
OR2018 Recap

Jun 19, 10:38 AM

We got this

NOTE: THIS WAS INITIALLY POSTED AS A DRAFT, it has been updated twice (see below). I reserve the right to add links to things that need them, as the idea occurs to me… but it’s mostly done now. —HJP 6/20/2018 10:41am CDT

Before OR2018, I went on vacation with my wife to Santa Fe, New Mexico. We drove from Missouri, so we were in the car for a while, and we checked out a book on CD from our library. The book we got was by Kelly McGonigal, who had recently spoken at a work retreat my wife attended… to give you a feel for where Kelly is coming from, here’s a TED talk of hers.

The book/CD set we checked out is called, The Neuroscience of Change: a compassion-based program for personal transformation.

Listening to this CD on a road trip was very relaxing… I told my wife I felt like I’d been on an all-day mindfulness retreat when we stepped out of the car.

Why am I bringing all this up? Well, while listening to this CD, I came to the realization that I have been resisting some change my career has been going through, and I also got in touch with a capacity I didn’t realize I had: a feeling, that, “I’ve got this.” I vowed to myself to take this confidence into OR2018. And I was startled to find that same confidence reflected back to me by everyone at Open Repositories, from the speakers to all of my colleagues. I can’t say whether I interpreted this “vibe” based on my own intention prior to the conference, or whether it was something other people could observe. I will say that the statement “Open Access has arrived” bounced around a bit, from speaker to speaker, and you could say that “we’ve got this” is a variation of that.

Enough preamble, on to the conference!

Informal Meetups day (Sunday, 6/3/2018)

I ran into my friend Dermot Frost in the airport in Denver, as well as Carolyn Cole from Penn State. We ended up hanging out after we landed in Bozeman. Carolyn led the Valkyrie Code Read workshop, which was one of the things I was most looking forward to at this conference, and I definitely wanted to find out more about how they are using Valkyrie at Penn State. So, we wandered Bozeman for an afternoon. Carolyn proposed walking until we could get a clearer view of the mountains, and Dermot and I agreed. We ended up walking to the very edge of Bozeman.

We got to the MSU Library in time for the “Informal Meetups” and I had a nice chat with a few of my DSpace friends. I broke the news that I’m going to be working more with Samvera and less on DSpace (which is the professional change I mentioned wrestling with above). I didn’t think this was news, I thought I’d told people, but it did seem to cause a few pouts. I’ll stick with the DSpace community and pitch in when I can, but… my focus will be Samvera from here on out. It’s just the way it is: my employer wishes it, I will make it so.

Workshops day (Monday, 6/4/2018)

Workshop: DSpace REST-API

If you’d like to follow the workshop, you can do it all on your own, self-guided.

One tool mentioned during this workshop, Postman, looks very helpful. I installed it and have played around a bit. It’s a nice suite of tools to work with a REST-API, the kind of thing that helps you remember all the various things you need to remember, so you don’t have to jump between browser sessions and read docs, copy/paste between windows… Postman will help you keep track of all the complicated things, so you can focus on using the API. That even (and especially) includes logging in and maintaining a session for an authenticated API.

Workshop: Valkyrie Code Read

Valkyrie on GitHub

My random notes: I realized just how much of a newbie I am to Ruby. As Carolyn read through the code, I found myself googling about inheritance in Ruby. I found this page in this tutorial (tutorial start), and I would like to follow that tutorial later.

And this one line in Valkyrie made me just marvel: I think that’s three maps deep?

I then started exploring the resources I found on Ruby, and discovered the Open Book Shelf. I’ll have to return to that later.

More notes from my notebook: change sets are a key part of how Valkyrie works, and the code around them is pretty clear. This is where we dove in during the code read, so if you want to duplicate the experience, or otherwise deepen your understanding of Valkyrie, start with change sets.
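
For anyone who hasn’t met the pattern before, here is a minimal sketch of the change set idea, written in Python rather than Ruby. This is not Valkyrie’s actual API: the `Resource` and `ChangeSet` classes, their field names, and their methods are all hypothetical, made up for illustration. What it tries to show is the general shape, as I understand it: proposed edits are collected and validated in a wrapper object, and the underlying resource is only touched when the changes are explicitly synced.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A plain record standing in for a repository object (hypothetical)."""
    title: str = ""
    creators: list = field(default_factory=list)


class ChangeSet:
    """Collects proposed edits to a resource without modifying it."""

    required = ("title",)  # fields that must not end up empty

    def __init__(self, resource):
        self.resource = resource
        self.changes = {}   # pending edits, not yet applied
        self.errors = []    # problems found by the last validate() call

    def validate(self, **attrs):
        """Stage attrs as pending changes; return True if they are acceptable."""
        self.errors = []
        for name in attrs:
            if not hasattr(self.resource, name):
                self.errors.append(f"unknown field: {name}")
        # Check required fields against what the resource WOULD look like.
        merged = {**vars(self.resource), **self.changes, **attrs}
        for name in self.required:
            if not merged.get(name):
                self.errors.append(f"{name} is required")
        if self.errors:
            return False
        self.changes.update(attrs)
        return True

    def changed(self, name):
        """True if a pending edit differs from the resource's current value."""
        return (name in self.changes
                and self.changes[name] != getattr(self.resource, name))

    def sync(self):
        """Apply the pending edits to the resource and clear them."""
        for name, value in self.changes.items():
            setattr(self.resource, name, value)
        self.changes = {}
        return self.resource


book = Resource(title="Draft")
change_set = ChangeSet(book)
if change_set.validate(title="Final", creators=["HJP"]):
    change_set.sync()  # only now does book.title change
```

The point of the separation is that a rejected edit never leaves the resource half-updated, and anything that saves the resource only ever sees changes that passed validation.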

Also, a thing I have learned in the past, but was nice to see in practice during the code read: specs/tests are great docs on how things are supposed to work, so, if you’re lost, start with the tests.

I should note that, while Valkyrie is not on the draft Hyrax road map, it’s clearly on the “road map.” (Hat tip to Tom Johnson for that turn of phrase and road map link.) As I spoke with other Samvera community members, and listened to them speak during sessions, it’s very clear that the entire community has accepted the inevitability of Valkyrie becoming part of the stack we will all use. It’s a tool we all anticipate having in our toolbox, sooner rather than later (see below for more on this theme). I’ve been here before, it’s a touchy subject, but ask any DSpace community member about “DSpace 2.0.” :-) However, all joking aside, most of the ideas that were floated as part of DSpace 2.0 did eventually make it into the core DSpace, it just didn’t happen all at once. I do believe that Valkyrie is on its way in to the Samvera code base.

After the code read was done, there were other workshops starting up, so I wandered in to the Redbox workshop…

RedBox workshop

…Where I found out about this handy tool: Data Curator, but it doesn’t run on Linux :-( (UPDATE: I’m wrong, it builds just fine on Linux. You need Yarn installed, and it builds an AppImage version, which is easy to install. Yay! A new toy!)

However, if you need a quick and dirty CSV tool (which is not Excel), and you have Atom already installed, Tablr works well. Though it’s a tad unstable until you patch a bug (patches and workarounds are posted on that issue I just linked).

RedBox is neat, one can learn a lot from what they’re doing and how they’re doing it… and a related thing: GitLab is a really handy tool for automating all sorts of services our users/stakeholders might want provided to them. GitLab looks like a way for us to say “yes, we can do that for you” which is cool to see more of. I know one developer currently at UCLA who has built his own personal CD stack using Rancher and GitLab. I intend to try to copy his setup. I’ve been nagging him for his docs, however, I also know that there is a nice blog post about this kind of thing, so, I think I ought to be able to muddle through on my own.


The opening keynote, by Casey Fiesler, was entitled “Growing Their Own: Building an Archive and a Community for Fanfiction”. Recording, Slides

It was inspiring to see what a group of dedicated volunteers could achieve, in bootstrapping a community-driven repository of user-generated content. I recommend watching this keynote.

The closing keynote, by Asaf Bartov, was entitled “Free Culture in the Periphery: A Personal Perspective”. Recording, Slides

This keynote was similarly inspiring, seeing what a dedicated community of volunteers is capable of, as well as the struggle and challenges this community faces.

I want to say more about one particular challenge, which I noticed during Asaf’s keynote, but I’ll save that for another time.

GT01: Samvera (June 5, 2018)

Esmé Cowles, from Princeton, started off this session with an introduction to Valkyrie, the process of how and why it came about, and the philosophy behind its development. Here is a remix of the video from the presentation with the slides added. I recommend watching it. Esmé’s talk kind of set the tone for Valkyrie for the rest of the conference, I think… it made it OK to talk about as if it’s a tool we can rely on being in our toolbox in the future.

UPDATE 6/20/2018: Oh, yeah, I had a part in this conference, too!

I was invited to serve as one of the Developer Track co-chairs, and after a brief wait for approval from my management, I said yes. It was pretty cool to help shape part of the conference. I’ve been a reviewer in the past, but rounding up and wrangling reviewers (and session chairs) for a track is surprisingly rewarding work. People are flattered you have asked them for help, and then they do help. I just want to say thank you to everyone who said yes to my pleas for help, I really appreciate it.

I especially want to say thanks to my fellow co-chair, Liz Krznarich, who was such a calm, steadying voice whenever I was inclined to simply freak out about whatever it was we needed to do. We got it done. Thanks, Liz!

Developer Workspaces panel

Here’s the abstract for the panel I proposed:

Some of us still develop the traditional way, and install the entire application stack on our own computers. But there are many other options available: Vagrant, Docker, or IDEs in the cloud. All approaches share the same aim: to minimize the effort required in standing up a new developer workspace, and to ensure this setup is shareable and repeatable. This panel will consist of live demos of all of these options, with plenty of opportunities to discuss best practices.

Here are the notes from the whole session (which includes links to all the slides). The panel was at the end of the session, so skip to the bottom of those notes if you just want to see the notes on the panel.

This panel was a lot of fun to do, (yes, even the live demo) and I hope it helps some people figure out what all these different tools are capable of, why one would choose to use them, and which is a good fit for what they want to do.

And because I skipped past the thank you slide at the end (it was a relief to be done!), here’s a link to that slide. Also, I’d like to thank all the panelists for the session, for agreeing to participate, and helping put together an amazing collection of work to demonstrate the current state of the art of developer workspaces. Begging your indulgence, I’ll just name them here (in alphabetical order): Terry Brady, Georgetown University Library, Liz Krznarich, ORCID, and Kate Lynch, University of Pennsylvania. Also, a huge thanks to former panelists who could not make it to OR: Erin Fahy, Stanford and Anusha Ranganathan, Cottage Labs. Even though they couldn’t make it, their participation and continuing advice helped shape the content of the panel presentations. Thanks again, I think we made a great team, and I hope to work with all of you again some day.

UPDATE 6/20/2018 10:41am CDT: Ideas Challenge

It’s hard for me to resist the allure of the Ideas Challenge, and I joined a team this year. Our team name was “GDPR – Wranglers vs Sheriffs”. My teammates were: Janet McDougall, Senior Data Archivist, Australian Data Archive, Saskia van Bergen, Senior Project Manager, Leiden University Libraries and Harish Maringanti, Associate Dean for ITS, The University of Utah. Our proposed solution was to develop a checklist similar to the GDPR Checklist site, but with guidance more specific to repositories and research data. I wanted to produce a working demo based on the GDPR Checklist site’s code. However, the static site generator it uses, Gatsby.js, proved too difficult for me to set up while also attending sessions, so I set that aside and just gave a hand-wavy demo using the actual GDPR Checklist site. I’m happy to report that I continued tinkering with Gatsby.js on the way home, and my first day back home… and… I got it working after all. Gatsby seems like a cool tool, I will have to play with it more. As many people know, static site generators are an interest of mine. OH, I’m also happy to report that The Medical Research Council in the UK has some advice re GDPR so… if you’re worried about how GDPR might affect you as a researcher or someone who helps facilitate research data storage, check that out.

Random thoughts

DSpace 7 will be amazing!

DSpace 7 slides
DSpace 7 demo

DSpace 7 will be amazing! Why? 1) Configurable entities (i.e. you can customize the data model!), this is potentially sharable with other repository shared data model work going on now. 2) ResourceSync is supported out of the box. 3) An industry-standard REST-API, courtesy of Spring Data REST, and a UI based on Angular 2. DSpace will feel like a desktop application! Expect to play with the beta in early 2019 (maybe earlier), it should be out and ready for deployments by next OR. Want to play earlier than that? They could use the help.

It’s exciting to see at least two communities rallying around the idea of customizing and sharing data models. It’ll be good to have at least two robust options for reflecting the sometimes complex metadata models our content requires of repository and digital library folks. Oh, and if you’re interested in this topic, I recommend checking out CASRAI. (Hat tip to Tim Donohue for that link.)

This past June I went to Open Repositories 2017 in Brisbane, Australia. I presented a workshop on how to get started with using Ansible and Serverspec. The workshop slides and materials are available. That was a great experience, and I’m happy to report that I was able to survive the virtual machine I brought to lead the workshop not working on my own notebook… thanks to the help of my pal, Kim Shepherd, who loaned me his own notebook, to run the machine. It did work for everyone else. When I returned from Australia, I rewrote the Vagrant configuration to make a sturdier VM for future workshops, calling it Workshop-o-matic. It is useful for anyone who might want to follow along with the workshop slides, so, if you’re interested, please do.

This blog post is very tardy, I’m sorry about that. Immediately after the conference, I took my wife on a vacation around northern Australia. We spent a wonderful week driving around in a rented campervan. It was a great time, and it has been a non-stop whirlwind to catch up with work and life, and everything else that piles up after a vacation, and at the start of a new school year. So, anyway, enough excuses, on with this recap.

The Keynote was given by Sir Timothy Gowers, entitled, “Perverse incentives: how the reward structures of academia are getting in the way of scholarly communication and good science.” There is a video recording. Incidentally, all the filmed conference sessions are also available (note only the general session tracks in the main ballroom were recorded).

Two things Sir Timothy said really struck a chord with me, and have held my attention through the conference and since:

The current culture doesn’t really favor sharing [incomplete ideas].

An obvious thought is that if we did all start sharing our little scribblings, we could end up with a complete mess.

After hearing this, my mind started racing, because we open source developers do exactly this: we already share our work in progress. And it hit me, I’d been thinking about this problem a while. I even had a phrase for my half-baked idea of how to approach the problem:

Are you going to eat that, mate?

And I filled a page with scribbly notes, and found a team to pitch this idea as part of the Ideas Challenge. Alas, it didn’t win, but we had great fun making the slides.

So, my main takeaway from this conference is that I need to figure out how network data analysis works, and I need to tackle this challenge on my own, because I’m convinced there are a lot of really great ideas—almost finished code—just out there on GitHub, waiting for us to find, and ask that question: Hey, if you’re not using this code, can we use it?

Pardon this digression, however, after the conference, my pal Kim sent me a note on Slack, and said he was fiddling with a citation database and ended up finding this article:

MODELING DISTRIBUTED COLLABORATION ON GITHUB Journal Article published Dec 2014 in Advances in Complex Systems volume 17 issue 07n08 on page 1450024 Authors: NORA McDONALD, KELLY BLINCOE, EVA PETAKOVIC, SEAN GOGGINS

And an author name leaps out at me: Sean Goggins, hey, I think I know that guy. We have shared friends, we go to the same neighborhood pool. So, we become facebook friends. I still haven’t taken Sean out to lunch, but he’s working on this really interesting project:


From the governance page, the mission of the project is to:

  1. produce integrated, open source software for analyzing software development, and definition of standards and models used in that software in specific use cases;
  2. establish implementation-agnostic metrics for measuring community activity, contributions, and health; and
  3. optionally produce standardized metric exchange formats, detailed use cases, models, or recommendations to analyze specific issues in the industry/OSS world.

Which is not quite what I want to do, but it is working with the same data set, to help foster the health of open source development communities. And goal 3 would at least help me in my own goal, which is essentially to build a recommendation engine for work in progress on GitHub.

Now, back to my OR17 recap. Here are some of the cool tools I found out about at the conference, the things I want to check out later:

In Dev Track 2, Conal Tuohy presented on mining linked data from text, and mentioned a tool I want to check out: XProc, which is a W3C Recommendation for an XML transformation language, using XML pipelines. There’s a book and a tutorial I found, I’ll check them out later.

Also, Peter Sefton presented a static repository builder tool, Calcyte which makes extremely high-performance and inexpensive data repositories with static HTML.

The real draw for me for this session was the presentation Visualizing Research Graph using Neo4j and Gephi by Dr. Amir Aryani and Hao Zhang. I knew after the keynote that I really needed to find out more about graph data, and I knew Neo4j and Gephi would be tools I’d need to be familiar with, so I ended up in Dev Track 2 to see this presentation and be inspired, and it did not disappoint. I came out of this presentation convinced that I could use the network graph data in GitHub to build the recommendation engine I wanted to build. And, even if I didn’t build a full-fledged tool, at a minimum I should be able to explore this data on my own, using Neo4j and Gephi.

I’m not ashamed to admit that I sought out Dr. Aryani’s next presentation on the following day, in General Track 10, “Research Graph: Building a Distributed Graph of Scholarly Works using Research Data Switchboard”. It was really interesting to find out how a distributed graph works, and why one would use it: it’s a way to produce a larger data set from more than one shared dataset, by connecting the graph data across disparate repositories. Doing so allows each partner institution to retain “ownership” of their own data, while still maintaining access to the shared whole of the larger dataset. Distributed graph databases also share a bit of the computational load of running large-scale queries, which helps the entire data set scale, and remain usable.
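
To make the “connecting the graph data across disparate repositories” part concrete for myself, here is a toy sketch in Python. It is only an illustration of the idea, not how Research Graph or Research Data Switchboard actually work: the two repositories, the identifiers, and the function names are all made up, and the merge happens in one process here, where a real distributed graph would leave each partner’s data in place. The assumption is that records in different repositories can be linked when they point at the same shared identifier (a researcher, in this example).

```python
from collections import defaultdict, deque


def merge_graphs(*graphs):
    """Union several adjacency maps; identical identifiers become one node."""
    merged = defaultdict(set)
    for graph in graphs:
        for node, neighbours in graph.items():
            merged[node] |= set(neighbours)
            for other in neighbours:
                merged[other].add(node)  # treat every edge as undirected
    return merged


def connected(graph, start, goal):
    """Breadth-first search: is there any path from start to goal?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Two hypothetical repositories, each holding only its own records.
repo_a = {"dataset:a1": ["researcher:r1"]}
repo_b = {"paper:b7": ["researcher:r1"]}

combined = merge_graphs(repo_a, repo_b)
found_alone = connected(repo_a, "dataset:a1", "paper:b7")
found_together = connected(combined, "dataset:a1", "paper:b7")
```

Neither repository on its own can relate the dataset to the paper; once the graphs are joined on the shared researcher identifier, the path between them exists.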

My other takeaway from OR17 is that I really need to keep better tabs on a particular colleague of mine, Andrea Schweer, as she often puts interesting code up on GitHub, and every time I look at her code I’m blown away by its quality, and how I can immediately make use of much of it. Don’t believe me? Look at this collection of cool stuff.

That’s the kind of thing I hope to be able to find with my future fiddling with the GitHub network graph. How many other developers have huge collections of interesting bits of code, maybe just half-finished, but still amazingly useful, waiting for us to discover, and use, and build communities around?

I’m really excited about this, and hope to be able to help make it happen.

UPDATE (12/15/2017):
Just to give you a tiny taste of what’s possible, GitHub has added a couple of recommendation-engine features. If you have a GitHub account, head on over to GitHub Discover and GitHub Explore, which are both giant rabbit holes of fun. Happy hunting! NOTE: neither of these features is what I had in mind, they’re just basic “you like these projects and follow these people, have you seen these projects?” or “hey, everyone else is excited about this, you should be, too” kinds of things. I’d like to focus in on branches in forks of a project, find the ones that have been pulled a lot, and mix that in with other social data (friends of friends, etc.).
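
Here is a rough sketch, in Python, of the kind of scoring I have in mind. Everything in it is an assumption for illustration: the sample data is invented, the `pulls` count is a hypothetical field (I haven’t worked out how to get that number from GitHub), and the weights are arbitrary. It ranks branches in forks by how often they have been pulled, then bumps up branches owned by people I follow, or by people they follow.

```python
def friends_of_friends(follows, user):
    """Return (people user follows, people THEY follow, minus overlaps)."""
    direct = set(follows.get(user, ()))
    second = set()
    for friend in direct:
        second |= set(follows.get(friend, ()))
    return direct, second - direct - {user}


def rank_branches(branches, follows, user, social_weight=5):
    """Order fork branches by pull count plus a social bonus for user."""
    direct, second = friends_of_friends(follows, user)
    scored = []
    for branch in branches:
        score = branch["pulls"]
        if branch["owner"] in direct:
            score += 2 * social_weight   # someone I follow
        elif branch["owner"] in second:
            score += social_weight       # a friend of a friend
        scored.append((score, branch["owner"] + ":" + branch["branch"]))
    return [name for score, name in sorted(scored, reverse=True)]


# Invented sample data: who follows whom, and branches found in forks.
follows = {"alice": ["bob"], "bob": ["carol"]}
branches = [
    {"owner": "dave", "branch": "fix-a", "pulls": 12},
    {"owner": "carol", "branch": "feature-b", "pulls": 9},
    {"owner": "bob", "branch": "wip-c", "pulls": 1},
]

ranking = rank_branches(branches, follows, "alice")
# ranking == ["carol:feature-b", "dave:fix-a", "bob:wip-c"]
```

With these made-up numbers, a friend-of-a-friend’s branch with 9 pulls outranks a stranger’s branch with 12, which is the “mix in the social data” effect I’m after. A real version would need better signals and much better weights.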

UPDATE (01/16/2018):
Kim Shepherd wrote a fun song inspired by events at OR17. Two warnings: there’s a bit of NSFW language in the middle, and this probably makes more sense if you were there. But, it’s a good song and in good fun, so give it a listen.

UPDATE (02/07/2018):
Ooooh, this looks fun