Go to content Go to menu

Samvera Connect 2018 Recap

Mar 15, 10:04 AM

Samvera Connect 2018 was hosted by the University of Utah’s J. Willard Marriott Library. For me, the theme of this conference was testing. I registered for two different workshops on testing. The first was run by Carolyn Cole, and based on the excellent Rails Bridge training. [Carolyn’s workshop materials] The second was run by Tom Johnson, and was a slightly more Samvera-specific workshop on testing topics. [Tom’s workshop materials] I have several pages of notes from both workshops. But I’ll try to mention a few bullet point takaways from them:

From Carolyn’s workshop:

  • try to write tests that make sense to you
  • tests are really only useful if you understand how they work
  • generated code doesn’t always have tests, or what tests it does have are incomplete
  • to quote Carolyn: “Yes there are smart people writing this stuff, but it’s important to understand what you’re using.”
  • use Coveralls to find places where you are testing too much (ooh, interesting, but good tip)
  • name your tests in such a way that, when you see them fail, the failure message makes sense… you’ll see your tests fail right away, if you’re doing TDD correctly, and it helps a lot if you can understand what the failure message is telling you
  • a thing to read about later: Shameless Green from Sandi Metz.
  • Feature specs are expensive, but that’s because they are complete… since big tests require a long time investment, ensure there is a big payoff for running them, especially if you’re running them all the time as part of some automated process

From Tom’s workshop:

  • Are your tests using too-specific dependencies?
  • Do your tests “know too much”?
  • Don’t just “rebuild the implementation” in the test
  • Complicated tests should worry you, they are a warning sign
  • this snippet is golden: it (require 'pry' binding pry) see this article for more details

Tom also had lots to say about mock objects, and other such things. For sure check out both Carolyn and Tom’s workshop materials, and if you’re doing any sort of development work with Samvera, you should actually go through both of those workshops and follow along with the exercises. It’s good experience, you’ll learn a lot.

OK, workshops done, on to the main conference. Here are some highlights from my notes: The Code Stability Working Group’s Recommendations is something I should read. Glancing at it again, I see there are a few interesting links from that page… I need to set aside some time to explore all of that info more thoroughly. Along those same lines, the Hyrax Roadmap is something anyone who works with Hyrax should be aware of, and read. Also, Hyrax doesn’t just travel down that road all on its own, there is likely lots of work to be done, and if you’re in a position to help, there will be Hyrax pull requests waiting for review. Have some spare time? Make yourself useful and pitch in!

Another theme to this conference was the community coming to terms with what bringing in Valkyrie will actually mean, and how that work will proceed. And while that might have been interesting many months ago, proceed they have! :-) Valkyrie is now up to version 1.5.1 with version 2.0 expected in coming weeks. The project has been promoted to the core Samvera repository, and has been added as a dependency of Hyrax as of February 2.

During the Hyrax Working Group, Valkyrie discussion dominated. At some point, someone mentioned Martin Fowler’s Branch by Abstraction article … I need to read that.

Now I’ll just list off the talks I got the most out of, in the order they appear in my notes (likely chronological order).

David Schober from Northwestern presented a talk on DevOps [slides]
…really useful from the trenches information on running containers in AWS, and using Terraform to do it. This is an area my team at UCLA Library is trying to develop expertise in, these slides will be useful for our team. I particularly like their DevStack tool for setting up dev environments for the various services they build and maintain.

Kate Lynch from University of Pennsylvania Libraries presented a talk about a workflow they’ve written to manage backup of content to Amazon Glacier, which they call “Guardian Workflow”. [slides]
… really cool stuff, and worth a look if you want to do something similar. Even if you’re not interested in backing up to Glacier, it’s worth a look to see how they’ve managed to tackle it, as the approach might be useful for other work you have to do.

Justin Coyne from Stanford University Library spoke about deploying with AWS Elastic Container Service [slides] From my notes, they’re using CloudWatch to pull all their log files together, and auto-scaling will save you money, you can turn off the service when you don’t need it.

James Griffin from Princeton University Library presented on Synchronizing Samvera [slides] Lots of helpful information in this talk about how to get Samvera to talk to other web services. This is information that will help anyone who is building microservices alongside a Samvera repository, or anyone who needs to integrate a Samvera repository in a larger IT ecosystem.

There was a session about IIIF, and my notes from this session are pretty sparse, but I did jot down that Simeon Warner from Cornell mentioned they use something called “Art Store” to handle moving files, and I think it’s actually Archival Storage Ingest … which actually looks pretty cool, and useful for my team.

There was a Batch Ingest Working Group meeting which I attended, but the facilitator couldn’t make it. My colleague Lisa McAulay jumped in to lead the conversation, and we ended up walking through the working group’s docs on the wiki, and added new information from those present at this meeting, so hopefully we ended up helping. Those notes have moved on the wiki since this meeting, I think this is their current location

I, along with my colleague from UCSD, Jon Robinson, facilitated an unconference session we called the “DevOps Sandbox Swap Meet”, here are the [notes from the session]

My colleague from UCLA Library, Stephen Gurnick, and I presented a poster on things we’ve learned about making DevOps work at UCLA Library. [our poster]

That’s about it. Sorry these notes are so late.

OR2018 Recap

Jun 19, 10:38 AM

We got this

NOTE: THIS WAS INITIALLY POSTED AS A DRAFT, it has been updated twice (see below). I reserve the right to add links to things that need them, as the idea occurs to me… but it’s mostly done now. —HJP 6/20/2018 10:41am CDT

Before OR2018, I went on vacation with my wife to Santa Fe, New Mexico. We drove from Missouri, so we were in the car for a while, and we checked out a book on CD from our library. The book we got was by Kelly McGonigal, who had recently spoken at a work retreat my wife attended… to give you a feel for where Kelly is coming from, here’s a TED talk of hers.

The book/CD set we checked out is called, The Neuroscience of Change: a compassion-based program for personal transformation.

Listening to this CD on a road trip was very relaxing… I told my wife I felt like I’d been on an all-day mindfulness retreat when we stepped out of the car.

Why am I bringing all this up? Well, while listening to this CD, I came to the realization that I have been resisting some change my career has been going through, and I also got in touch with a capacity I didn’t realize I had: a feeling, that, “I’ve got this.” I vowed to myself to take this confidence into OR2018. And I was startled to find that same confidence reflected back to me by everyone at Open Repositories, from the speakers to all of my colleagues. I can’t say whether I interpreted this “vibe” based on my own intention prior to the conference, or whether it was something other people could observe. I will say that the statement “Open Access has arrived” bounced around a bit, from speaker to speaker, and you could say that “we’ve got this” is a variation of that.

Enough preamble, on to the conference!

Informal Meetups day (Sunday, 6/3/2018)

I ran into my friend Dermot Frost in the airport in Denver, as well as Carloyn Cole from Penn State. We ended up hanging out after we landed in Bozeman. Carolyn lead the Valkyrie Code Read workshop, which was one of the things I was most looking forward to at this conference, and I definitely wanted to find out more about how they are using Valkyrie at Penn State. So, we wandered Bozeman for an afternoon. Carolyn proposed walking until we could get a clearer view of the mountains, Dermot and I agreed. We ended up walking to the very edge of Bozeman.

We got to the MSU Library in time for the “Informal Meetups” and I had a nice chat with a few of my DSpace friends. I broke the news that I’m going to be working more with Samvera and less on DSpace (which is the professional change I’ve been wrestling with, I mentioned above). I didn’t think this was news, I thought I’d told people, but it did seem to cause a few pouts. I’ll stick with the DSpace community and pitch in when I can, but… my focus will be Samvera from here on out. It’s just the way it is, my employer wishes it, I will make it so.

Workshops day (Monday, 6/4/2018)

Workshop: DSpace REST-API

If you’d like to follow the workshop, you can do it all on your own, self-guided.

One tool mentioned during this workshop, Postman looks very helpful, I installed it and have played around a bit. It’s a nice suite of tools to work with a REST-API, the kind of thing that helps you remember all the various things you need to remember, so you don’t have to jump between browser sessions and read docs, copy/paste between windows… Postman will help you keep track of all the complicated things, so you can focus on using the API. Even (and especially) logging in and maintaining a session for an authenticated API.

Workshop: Valkyrie Code Read

Valkyrie on GitHub

My random notes: I realized just how much of a newbie I am to Ruby. As Carolyn read through the code, I found myself googling about inheritance in Ruby. I found this page in this tutorial (tutorial start ) I would like to follow that tutorial later.

And this one line in Valkyrie made me just marvel I think that’s three maps deep?

I then started exploring the resources I found on Ruby, and discovered the Open Book Shelf I’ll have to return to that later.

More notes from my notebook: change sets are a key part of how Valkyrie works, and the code around them is pretty clear, this is where we dove in during the code read, if you want to duplicate the experience, or otherwise deepen your understanding of Valkyrie, start with change sets.

Also, a thing I have learned in the past, but was nice to see in practice during the code read: specs/tests are great docs on how things are supposed to work, so, if you’re lost, start with the tests.

I should note that, while Valkyrie is not on the draft Hyrax road map, it’s clearly on the “road map.” (Hat tip to Tom Johnson for that turn of phrase and road map link.) As I spoke with other Samvera community members, and listened to them speak during sessions, it’s very clear that the entire community has accepted the inevitability of Valkyrie becoming part of the stack we will all use. It’s a tool we all anticipate having in our toolbox, sooner rather than later (see below for more on this theme). I’ve been here before, it’s a touchy subject, but ask any DSpace community member about “DSpace 2.0.” :-) However, all joking aside, most of the ideas that were floated as part of DSpace 2.0 did eventually make it into the core DSpace, it just didn’t happen all at once. I do believe that Valkyrie is on its way in to the Samvera code base.

After the code read was done, there were other workshops starting up, so I wandered in to the Redbox workshop…

RedBox workshop

…Where I found out about this handy tool: Data Curator but it doesn’t run on Linux :-( (UPDATE: I’m wrong, it builds just fine on Linux, you need Yarn installed, and it builds an AppImage version, which is easy to install, yay! a new toy!)

However, if you need a quick and dirty CSV tool (which is not Excel), and you have Atom already installed, Tablr works well. Though it’s a tad unstable until you patch a bug (patches and workarounds are posted on that issue I just linked).

RedBox is neat, one can learn a lot from what they’re doing and how they’re doing it… and a related thing: GitLab is a really handy tool for automating all sorts of services our users/stakeholders might want provided to them. GitLab looks like a way for us to say “yes, we can do that for you” which is cool to see more of. I know one developer currently at UCLA who has built his own personal CD stack using Rancher and GitLab. I intend to try to copy his setup. I’ve been nagging him for his docs, however, I also know that there is a nice blog post about this kind of thing, so, I think I ought to be able to muddle through on my own.


The opening keynote, by Casey Fiesler, was entitled “Growing Their Own: Building an Archive and a Community for Fanfiction”. Recording, Slides

It was inspiring to see what a group of dedicated volunteers could achieve, in bootstrapping a community-driven repository of user-generated content. I recommend watching this keynote.

The closing keynote, by Asaf Bartov, was entitled “Free Culture in the Periphery: A Personal Perspective” Recording, Slides
This keynote was similarly inspiring, seeing what a dedicated community of volunteers is capable of, as well as the struggle and challenges this community faces.

I want to say more about one particular challenge, which I noticed during Asaf’s keynote, but I’ll save that for another time.

GT01: Samvera (June 5, 2018)

Esmé Cowles, from Princeton, started off this session with an introduction to Valkyrie, the process of how and why it came about, and the philosophy behind its development. Here is a remix of the video from the presentation with the slides added. I recommend watching it. Esmé’s talk kind of set the tone for Valkyrie for the rest of the conference, I think… it made it OK to talk about as if it’s a tool we can rely on being in our toolbox in the future.

UPDATE 6/20/2018: Oh, yeah, I had a part in this conference, too!

I was invited to serve as one of the Developer Track co-chairs, and after a brief wait for approval from my management, I said yes. It was pretty cool to help shape part of the conference. I’ve been a reviewer in the past, but rounding up and wrangling reviewers (and session chairs) for a track is surprisingly rewarding work. People are flattered you have asked them for help, and then they do help. I just want to say thank you to everyone who said yes to my pleas for help, I really appreciate it.

I especially want to say thanks to my fellow co-chair, Liz Krznarich, who was such a calm, steadying voice whenever I was inclined to simply freak out about whatever it was we needed to do. We got it done. Thanks, Liz!

Developer Workspaces panel

Here’s the abstract for the panel I proposed:

Some of us still develop the traditional way, and install the entire application stack on our own computers. But there are many other options available: Vagrant, Docker, or IDEs in the cloud. All approaches share the same aim: to minimize the effort required in standing up a new developer workspace, and to ensure this setup is shareable and repeatable. This panel will consist of live demos of all of these options, with plenty of opportunities to discuss best practices.

Here are the notes from the whole session (which includes links to all the slides). The panel was at the end of the session, so skip to the bottom of those notes if you just want to see the notes on the panel.

This panel was a lot of fun to do, (yes, even the live demo) and I hope it helps some people figure out what all these different tools are capable of, why one would choose to use them, and which is a good fit for what they want to do.

And because I skipped past the thank you slide at the end (it was a relief to be done!), here’s a link to that slide. Also, I’d like to thank all the panelists for the session, for agreeing to participate, and helping put together an amazing collection of work to demonstrate the current state of the art of developer workspaces. Begging your indulgence, I’ll just name them here (in alphabetical order): Terry Brady, Georgetown University Library, Liz Krznarich, ORCID, and Kate Lynch, University of Pennsylvania. Also, a huge thanks to former panelists who could not make it to OR: Erin Fahy, Stanford and Anusha Ranganathan, Cottage Labs. Even though they couldn’t make it, their participation and continuing advice helped shaped the content of the panel presentations. Thanks again, I think we made a great team, and I hope to work with all of you again some day.

UPDATE 6/20/2018 10:41am CDT: Ideas Challenge

It’s hard for me to resist the allure of the Ideas Challenge, and I joined a team this year. Our team name was “GDPR – Wranglers vs Sheriffs”, My team mates were: Janet McDougall, Senior Data Archivist, Australian Data Archive, Saskia van Bergen, Senior Project Manager, Leiden University Libraries and Harish Maringanti, Associate Dean for ITS, The University of Utah. Our proposed solution was to develop a checklist similar to the GDPR Checklist site, but with guidance more specific to repositories and research data. I wanted to produce a working demo based on the GDPR Checklist site’s code, however, the static site generator it uses, Gatsby.js, proved too difficult for me to set up while also attending sessions, so I set that aside and just gave a hand-wavy demo using the actual GDPR Checklist site. I’m happy to report that I continued tinkering with Gatsby.js on the way home, and my first day back home… and… I got it working after all. Gatsby seems like a cool tool, I will have to play with it more. As many people know, static site generators are an interest of mine. OH, I’m also happy to report that The Medical Research Council in the UK has some advice re GDPR so… if you’re worried about how GDPR might affect you as a researcher or someone who helps facilitate research data storage, check that out.

Random thoughts

DSpace 7 will be amazing!

DSpace 7 slides
DSpace 7 demo

DSpace 7 will be amazing! Why? 1) Configurable entities (i.e. you can customize the data model!), this is potentially sharable with other repository shared data model work going on now. 2) ResourceSync is supported out of the box. 3) An industry-standard REST-API, courtesy of Spring Data REST, and a UI based on Angular 2. DSpace will feel like a desktop application! Expect to play with the beta in early 2019 (maybe earlier), it should be out and ready for deployments by next OR. Want to play earlier than that? They could use the help.

It’s exciting to see at least two communities rallying around the idea of customizing and sharing data models. It’ll be good to have at least two robust options for reflecting the sometimes complex metadata models our content requires of repository and digital library folks. Oh, and if you’re interested in this topic, I recommend checking out CASRAI. (Hat tip to Tim Donohue for that link.)