
Over the past few months I have been looking into getting Hyrax (aka a Samvera reference implementation) set up on my work notebook. I think I’m close, and I wanted to share my working notes here, in case you want to follow along…

You must first love Ruby

OK, maybe “love” is too strong of a word, but you’ll at least need to install it, if you haven’t already.

Install Ruby

You’ll probably want to use a dedicated tool to manage Ruby versions, because that’s part of the fun of Ruby—you’ll need to use a different version of Ruby some day. And that day is going to be hard enough without trying to un-do or work around some other way of installing Ruby. Trust me.

My favorite way to install Ruby is with rbenv; however, uru is supposedly easier and cross-platform, and there are many other methods.

Gem Install Rails and Railties

You’re not going to be running all of Rails on your workstation (you could, but Hyrax has a bunch of dependencies, and you’re going to use Docker to manage that mess). However, you’ll need the Rails gem installed so you can use the CLI-based Rails generator to build a new Hyrax application. And you’ll need Railties version 5.0.6 so you can run the preferred Rails generator (Railties is the gem that manages the generator stuff). So, as soon as you have Ruby installed, run these commands:

gem install rails
gem install railties:5.0.6

An early milestone of success: let’s generate a Hyrax application!

We have all the pieces we need, so let’s get this out of the way now. Run this:

cd path/to/your/project/or/workspace/folder
rails _5.0.6_ new awesomenameforyournewapp -m https://raw.githubusercontent.com/samvera/hyrax/v2.0.1/template.rb

Blam, that felt good.

Gem Install Stack Car

We’re going to use Notch8’s Stack Car gem to help us manage Docker and Docker-Compose competently.

gem install stack_car

OK, don’t get too excited, but you’re almost ready to seriously hack on Hyrax. But first, you do have Docker and Docker-Compose installed, right?

Install Docker and Docker Compose

Sorry, that will probably be an epic journey of discovery. Docker seems to work better on Linux than any other OS… I’ve heard good things about OSX. But, this is a terse guide, and those links will get you started. Come back when you have Docker and Docker Compose installed. Good luck!
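
Once the installs seem to have worked, a quick check from a terminal can save some head-scratching later. This is a minimal sketch, assuming both tools ended up on your PATH under their usual names (docker and docker-compose). The missing_tools helper is a made-up name for illustration, not something Docker or Stack Car provides.

```shell
# Print the name of every command in the argument list that is NOT on the PATH.
# missing_tools is a hypothetical helper, not part of Docker or Stack Car.
missing_tools() {
  for tool in "$@"; do
    # command -v succeeds only when the shell can find the command
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: prints nothing when both tools are installed
# missing_tools docker docker-compose
```

If either name is printed, go back to the install docs before moving on to Stack Car.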

Right, back to Hyrax and Stack Car

cd path/to/your/project/or/workspace/folder/awesomenameforyournewapp
sc dockerize .

Oooh, shiny. Now, as awesome as this is, you’ll need to make some adjustments.

Add the following to the default .env file:


Remove the last 4 lines from the docker-compose.yml file



^^^ that may or may not be necessary, but on the version of Docker and Docker Compose I was using, it was.

Change the ports line in the docker-compose.yml file to read

      - "3000:80"

Let’s start this baby up

Now we can crank up our Dockerized Hyrax app:

sc up

Have fun

You should be in a good place to further explore what Hyrax is and what you can make it do. Need a place to start? Start here:

This is a work in progress!

So, I don’t yet have access to any documentation on how Stack Car works or what you should do next. I wish I did. If you’re following along, you can muddle through with me, or you can wait patiently for me to update this blog post with links to more documentation.

UPDATE: so, you’ll have an instance of Hyrax running on port 3000 of your host (per the ports line above), but you’ll need to do a couple of things before it’s actually usable.

1. You’ll need to run db:migrate:

sc be rails db:migrate

2. You'll need to chown and chmod your tmp folder:

sc exec bash
chown root tmp
chmod 777 tmp
# note this is really naughty, but you know, it's a dev environment, so get over it

Now reload the site and you should see your new Hyrax site waiting for you to hack on. Get to it, buddy. Oh, and for the db:migrate step, you might need a slightly different command; Rails will tell you what to run. You’re almost there.

UPDATE2: If you’re running Docker on a Linux notebook (I am) all sorts of things will be slightly off while you’re working with Stack Car. I’m not sure of the cause, I’m researching, but it’s something to do with the way the named mounts are loaded with the Docker Compose file created by Stack Car. The main problem is that Docker wants to run as root, which means files created by root in the containers are owned by root (hence the janky chown and “rootme” permissions up above). There are workarounds for most of the issues, but the one that I can’t seem to fix is that whenever a rails command is run on the container, the files that rails command creates will be owned by the root user. That ownership translates over to the host. Which means I won’t be able to edit those files. Which is a real downer as far as developer experience is concerned (the entire point is to be able to work with these files… not being able to work with them makes me very table-flippy). I suspect a bit more cleverness with file permissions might be enough to hobble along… but… I also suspect there may be a simple thing I can change in the Docker Compose file to have these mounts work correctly without any monkey business. I suppose the real question is: is it worth my time to invest any further effort to get this working environment to work for me or should I move back to Vagrant—a tool I trust to deliver a usable (albeit rather slow) working environment.

UPDATE3: I think this is the source of the magic on OSX… apparently things just work over there? More research required.

UPDATE4: Docker uses something called a storage driver to handle how containers talk to storage on the host computer. It looks like my default storage driver was set to aufs, which isn’t quite the same as overlay2, the recommended storage driver. So, I’ve followed this suggestion and now have overlay2 set as my Docker storage driver. I then did a bunch more research, because that config change was not enough to handle the permissions issues I am encountering. And I found this on StackOverflow, so I added a :z at the end of my volume lines in docker-compose.yml, like so:

      image: solr:latest
      env_file:
        - .env
        - .env.development
      ports:
        - "8983:8983"
      volumes:
        - './solr:/opt/solr/server/solr/mycores:z'
      entrypoint:
        - docker-entrypoint.sh
        - solr-precreate
        - samvera
        - /opt/solr/server/solr/mycores/config

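The :z flag asks Docker to relabel the host folder and files with a shared SELinux label, so the container is allowed to read and write them (it doesn’t change ordinary Unix permissions, which is why the chmod below is still needed). With this flag in place, you can run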

chmod -R 777 solr

in your host working directory, and you’ll have proper permissions set up, so you can edit those solr configs on your host and then have Docker load the Solr container correctly. The same strategy applies to all the other mounts you might want to work with (like the one for the web container, where all the app files and configs go). After you run a rails task to generate new MVC files, you’ll probably need to run the following in a terminal on your host:

find . -type f -user root -print0 | sudo xargs -0 chmod 777

You can then edit the files the rails tasks leave for you, via an editor or IDE (Atom or RubyMine), on your host. Is this annoying that you have to tinker with permissions all the time? Yes… but it does work, and you’ll only have to do it once for each file you create. Probably you could get creative with a sticky bit and setting the GID on the container… that will be fun for another day.
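
An alternative to opening everything up with chmod is to hand the root-owned files back to your own user. The sketch below assumes a Linux host with sudo available. files_not_mine is a hypothetical helper name, and the chown line is left as a comment because it changes ownership for real.

```shell
# List everything under a directory that is NOT owned by the current user.
# files_not_mine is a hypothetical helper name used only for this sketch.
files_not_mine() {
  dir="${1:-.}"
  # id -u prints the numeric user id; "! -user" inverts find's ownership match
  find "$dir" ! -user "$(id -u)" -print
}

# Review the list first, then (assumption: you have sudo on the host)
# take the files back:
# files_not_mine .
# sudo chown -R "$(id -u):$(id -g)" .
```

This still has to be repeated after each rails task that creates files, so it is a tidier workaround rather than a fix for the underlying mount behavior.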

Speaking of fun, it turns out you need to run another task before you can create the default admin_set. The docs are currently out of date, so I made a ticket for that issue.

TL;DR: here’s what you need to type in order to get your Hyrax app running in Stack Car:

sc be bundle install
sc be rails hyrax:default_collection_types:create
sc be rails hyrax:default_admin_set:create

And you need to add an admin role to the role_map.yml file.
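
A sketch of what that entry might look like, assuming the stock config/role_map.yml layout (environment name, then role name, then a list of user emails). The email address is a placeholder and print_admin_role is a hypothetical helper name. Check the file the generator created before merging anything in.

```shell
# Print a role_map.yml fragment that grants the admin role to one user.
# Assumes the environment -> role -> list-of-emails layout; merge the output
# into config/role_map.yml by hand rather than appending it blindly.
print_admin_role() {
  email="$1"
  env_name="${2:-development}"
  printf '%s:\n  admin:\n    - %s\n' "$env_name" "$email"
}

# Example with a placeholder address:
# print_admin_role admin@example.com
```

The address should match the email of a user account that exists (or will be registered) in the app.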

Basically you need to follow along with the getting started guide. ENJOY!

UPDATE5: Here are a few other helpful links for getting started:

This past June I went to Open Repositories 2017 in Brisbane, Australia. I presented a workshop on how to get started with using Ansible and Serverspec. The workshop slides and materials are available. That was a great experience, and I’m happy to report that I was able to survive the virtual machine I brought to lead the workshop not working on my own notebook… thanks to the help of my pal, Kim Shepherd, who loaned me his own notebook, to run the machine. It did work for everyone else. When I returned from Australia, I rewrote the Vagrant configuration to make a sturdier VM for future workshops, calling it Workshop-o-matic. It is useful for anyone who might want to follow along with the workshop slides, so, if you’re interested, please do.

This blog post is very tardy, and I’m sorry about that. Immediately after the conference, I took my wife on a vacation around northern Australia. We spent a wonderful week driving around in a rented campervan. It was a great time, and it has been a non-stop whirlwind to catch up with work and life, and everything else that piles up after a vacation and at the start of a new school year. So, anyway, enough excuses, on with this recap.

The keynote, given by Sir Timothy Gowers, was entitled “Perverse incentives: how the reward structures of academia are getting in the way of scholarly communication and good science.” There is a video recording. Incidentally, all the filmed conference sessions are also available (note that only the general session tracks in the main ballroom were recorded).

Two things Sir Timothy said really struck a chord with me, and have held my attention through the conference and ever since:

The current culture doesn’t really favor sharing [incomplete ideas].

An obvious thought is that if we did all start sharing our little scribblings, we could end up with a complete mess.

After hearing this, my mind started racing, because we open source developers do exactly this: we already share our work in progress. And it hit me: I’d been thinking about this problem for a while. I even had a phrase for my half-baked idea of how to approach the problem:

Are you going to eat that, mate?

And I filled a page with scribbly notes, and found a team to pitch this idea as part of the Ideas Challenge. Alas, it didn’t win, but we had great fun making the slides.

So, my main takeaway from this conference is that I need to figure out how network data analysis works, and I need to tackle this challenge on my own, because I’m convinced there are a lot of really great ideas—almost finished code—just out there on GitHub, waiting for us to find, and ask that question: Hey, if you’re not using this code, can we use it?

Pardon this digression, but after the conference, my pal Kim sent me a note on Slack, saying he was fiddling with a citation database and ended up finding this article:

MODELING DISTRIBUTED COLLABORATION ON GITHUB Journal Article published Dec 2014 in Advances in Complex Systems volume 17 issue 07n08 on page 1450024 Authors: NORA McDONALD, KELLY BLINCOE, EVA PETAKOVIC, SEAN GOGGINS

And an author name leaps out at me: Sean Goggins. Hey, I think I know that guy. We have shared friends, we go to the same neighborhood pool. So, we become Facebook friends. I still haven’t taken Sean out to lunch, but he’s working on this really interesting project:


From the governance page, the mission of the project is to:

  1. produce integrated, open source software for analyzing software development, and definition of standards and models used in that software in specific use cases;
  2. establish implementation-agnostic metrics for measuring community activity, contributions, and health; and
  3. optionally produce standardized metric exchange formats, detailed use cases, models, or recommendations to analyze specific issues in the industry/OSS world.

Which is not quite what I want to do, but it is working with the same data set, to help foster the health of open source development communities. And goal 3 would at least help me in my own goal, which is essentially to build a recommendation engine for work in progress on GitHub.

Now, back to my OR17 recap. Here are some of the cool tools I found out about at the conference, the things I want to check out later:

In Dev Track 2, Conal Tuohy presented on mining linked data from text, and mentioned a tool I want to check out: XProc, which is a W3C recommendation for an XML transformation language, using XML pipelines. There’s a book and a tutorial I found; I’ll check them out later.

Also, Peter Sefton presented a static repository builder tool, Calcyte, which makes extremely high-performance and inexpensive data repositories with static HTML.

The real draw for me for this session was the presentation Visualizing Research Graph using Neo4j and Gephi by Dr. Amir Aryani and Hao Zhang. I knew after the keynote that I really needed to find out more about graph data, and I knew Neo4j and Gephi would be tools I’d need to be familiar with, so I ended up in Dev Track 2 to see this presentation and be inspired, and it did not disappoint. I came out of this presentation convinced that I could use the network graph data in GitHub to build the recommendation engine I wanted to build. And, even if I didn’t build a full-fledged tool, at a minimum I should be able to explore this data on my own, using Neo4j and Gephi.

I’m not ashamed to admit that I sought out Dr. Aryani’s next presentation on the following day, in General Track 10, “Research Graph: Building a Distributed Graph of Scholarly Works using Research Data Switchboard”. It was really interesting to find out how a distributed graph works, and why one would use it: it’s a way to produce a larger data set from more than one shared dataset, by connecting the graph data across disparate repositories. Doing so allows each partner institution to retain “ownership” of their own data, while still maintaining access to the shared whole of the larger dataset. Distributed graph databases also share a bit of the computational load of running large-scale queries, which helps the entire data set scale, and remain usable.

My other takeaway from OR17 is that I really need to keep better tabs on a particular colleague of mine, Andrea Schweer, as she often puts interesting code up on GitHub, and every time I look at her code I’m blown away by its quality, and how I can immediately make use of much of it. Don’t believe me? Look at this collection of cool stuff.

That’s the kind of thing I hope to be able to find with my future fiddling with the GitHub network graph. How many other developers have huge collections of interesting bits of code, maybe just half-finished, but still amazingly useful, waiting for us to discover, and use, and build communities around?

I’m really excited about this, and hope to be able to help make it happen.

UPDATE (12/15/2017):
Just to give you a tiny taste of what’s possible, GitHub has added a couple of recommendation-engine features. If you have a GitHub account, head on over to GitHub Discover and GitHub Explore, which are both giant rabbit holes of fun. Happy hunting! NOTE: neither of these features is what I had in mind; they’re just basic “you like these projects and follow these people, have you seen these projects?” or “hey, everyone else is excited about this, you should be, too” kinds of things. I’d like to focus in on branches in forks of a project, find the ones that have been pulled a lot, and mix that in with other social data (friends of friends, etc.).

UPDATE (01/16/2018):
Kim Shepherd wrote a fun song inspired by events at OR17. Two warnings: there’s a bit of NSFW language in the middle, and this probably makes more sense if you were there. But, it’s a good song and in good fun, so give it a listen.

UPDATE (02/07/2018):
Ooooh, this looks fun