URIs in RubyGems

Eric’s blog post got me thinking about what GitHub implies for how RubyGems should be designed. In a nutshell, I think that although we may want to make small patches to the current system, ideally we’d be able to rethink the entire problem. (By “we” I really mean Eric, since he’s a maintainer and I’m just backseat-driving. Obviously if Eric doesn’t have the time to do any of this I’m in no position to complain.)

The immediate problem is that a lot of people want to maintain their own forks of libraries on GitHub and release them as gems. GitHub will publish them as gems, but downstream every user has to tweak her local RubyGems configuration:

$ gem sources -a http://gems.github.com
$ sudo gem install fhwang-activerecord

So, there are now two big gem servers, RubyForge and GitHub, and issues of reconciliation between the two. But thinking outside of the immediate issue, shouldn’t we ideally be doing fully distributed gem serving anyway? Consider:

  • For years a number of Rubyists have hosted their code somewhere other than RubyForge and piped their gems through RubyForge just for distribution. Off the top of my head I can think of Eric, Ryan Davis, and Greg Brown, and I’m probably missing a bunch of others, too.
  • There’s no guarantee that GitHub will be the last place to develop your Ruby code. What if Ryan and Eric start some new Perforce-driven developer’s portal tomorrow?

It seems to me that there are a few things RubyGems should be offering to the Rubysphere, and excelling at:

  • First and foremost, a guarantee that two people installing the same gem will get the same code
  • Reliable and robust distribution of those gems over the network
  • Ease of use in installing those gems—I’d say that having to configure your local RubyGems setup to install from GitHub detracts from this
  • Managing dependencies across all gems

I guess part of what’s changed is that you’d ideally consider a gem to be a certain chunk of code from a source, but the source no longer has to be RubyForge. I think the current system uncomfortably straddles centralized distribution, with RubyForge as the primary source, and the decentralized distribution we should ideally be using. To be clear, I have nothing but respect for the work that’s been done on RubyGems so far, and I can remember the dark days before we had RubyGems, which were way less fun. But I suspect that the system is not keeping pace with how a lot of Ruby (and Rails) programmers are working today.

And, as a contrast to what I said above, here are a few problems that RubyGems should not be trying to solve:

  • Whether a given gem is any good, or cruddy, or outright malicious. Caveat installor.
  • Ensuring cohesion among libraries: If twenty people want to fork ActiveRecord, then it’s RubyGems’ job to enable that and not try to force them to be one happy family.
  • Different ways that library developers might tag releases, branches, etc. More on this below.

It seems to me the way you’d solve this is with URIs. So, let’s say I’m running a gem server of my own, you might install one of my gems with:

$ sudo gem install http://gems.fhwang.net/admin_assistant

Maybe my library depends on Hoe, so my gemspec might look like:

Gem::Specification.new do |spec|
  spec.add_dependency 'http://gems.zenspider.com/hoe'
end

And RubyGems should be able to automatically track and install that dependency, like before. The fact that they come from two different servers should be no problem at all.
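
To make the cross-server idea concrete, here is a minimal sketch of how dependencies keyed by URI could be walked. Everything in it is an assumption for illustration: the `SPECS` table is made-up data standing in for whatever each gem server would report, and `install_order` is a hypothetical helper, not part of RubyGems.

```ruby
require 'set'

# Made-up data: each gem URI maps to the URIs of the gems it depends on.
# In a real system each server would supply this for its own gems.
SPECS = {
  'http://gems.fhwang.net/admin_assistant' => ['http://gems.zenspider.com/hoe'],
  'http://gems.zenspider.com/hoe'          => []
}.freeze

# Returns every URI that needs installing, dependencies first.
# `seen` guards against installing a gem twice (and against cycles).
def install_order(uri, specs = SPECS, seen = Set.new, order = [])
  return order if seen.include?(uri)
  seen << uri
  specs.fetch(uri).each { |dep| install_order(dep, specs, seen, order) }
  order << uri
end

puts install_order('http://gems.fhwang.net/admin_assistant')
```

Nothing in the walk cares which host a URI points at, which is the point: a dependency on another server is just another key.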

What do URIs give us? They give us two things. First, you get uniqueness because you’re using domain names, just as with XML namespace URIs. Second, they mean you don’t need everyone to agree on the best way to tag versions or branches or whatnot.

This matters because a lot of people are starting to use other people’s code without version numbers. This is most obvious in ./script/plugin install in Rails, but it happens in plenty of other ways too. While there are lots of situations in which this might be too risky (big corporate deployments, etc.), there appear to be tons of Ruby programmers who are perfectly comfortable with it in practice. Upstream, a lot of developers don’t bother to assign version numbers; instead they might branch, or use their own versioning scheme, and so on. URIs actually make this very easy:

sudo gem install http://gems.fhwang.net/admin_assistant/HEAD
sudo gem install http://gems.fhwang.net/admin_assistant/1.1

# install from a git SHA:
sudo gem install \

# install version 0.7 of the "ajax_themes" branch:
sudo gem install http://gems.fhwang.net/admin_assistant/ajax_themes/0.7

In examples like this, it’s not RubyGems’ problem what those URIs mean; it just does a GET and installs what it sees. Unfortunately, by doing this you lose a ton of RubyGems’ cool built-in versioning support, but it appears a lot of people are bypassing that anyway with no regrets. You could still leave it in for library authors who want to offer old-school version numbers.

In terms of implementation, some pretty big changes would be required. For one thing, the way gems are installed locally would be messed up by URI namespaces, so that would have to change. Maybe it’d be as simple as having lots o’ directories, mimicking how Java does its library setup:

$ find /Library/Ruby/Gems/1.8/
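
As a rough sketch of what a namespaced layout could mean, assume the host name is reversed the way Java package names are, with the URI path appended underneath. That mapping is my assumption, not anything RubyGems does, and `install_dir` is a hypothetical helper that reuses the directory above as its root.

```ruby
require 'uri'

# Root directory borrowed from the find command above.
GEM_ROOT = '/Library/Ruby/Gems/1.8'

# Maps a gem URI to a local directory. Host labels are reversed,
# Java-package style (gems.fhwang.net -> net/fhwang/gems), and the
# URI path segments follow unchanged.
def install_dir(gem_uri, root = GEM_ROOT)
  uri = URI.parse(gem_uri)
  host_parts = uri.host.split('.').reverse
  path_parts = uri.path.split('/').reject(&:empty?)
  File.join(root, *host_parts, *path_parts)
end

puts install_dir('http://gems.fhwang.net/admin_assistant/1.1')
puts install_dir('http://gems.zenspider.com/hoe')
```

Two gems with the same name from different hosts land in different directories, so forks would not overwrite each other.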

More complicated is the network distribution. Right now I believe RubyGems works on a push model, whereby RubyForge knows about all its gems and pushes out a gem index to its mirrors. In a decentralized world, where nobody is registering their gem with a central service, you’d need a pull model, which then brings up various issues regarding caches and latency and whatnot. Not sure how difficult this challenge would be in practice.
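
To give a feel for the cache-versus-latency trade-off, here is a minimal sketch of a pull-side cache that only re-fetches a URI once a time-to-live has passed. The `PullCache` class, its `ttl:` parameter, and the injected fetcher block are all hypothetical; a real client would put an HTTP GET in the block and would probably want something smarter than a fixed TTL.

```ruby
# Remembers what the fetcher returned for each URI and only pulls
# again once the cached copy is at least `ttl` seconds old.
class PullCache
  def initialize(ttl:, clock: -> { Time.now.to_f }, &fetcher)
    @ttl = ttl
    @clock = clock      # injectable so the behavior can be tested
    @fetcher = fetcher  # e.g. a block that performs an HTTP GET
    @entries = {}
  end

  def get(uri)
    entry = @entries[uri]
    now = @clock.call
    if entry.nil? || now - entry[:at] >= @ttl
      entry = { at: now, body: @fetcher.call(uri) }
      @entries[uri] = entry
    end
    entry[:body]
  end
end

# Usage with a fake fetcher that just counts how often it is called.
pulls = 0
cache = PullCache.new(ttl: 60) { |uri| pulls += 1; "index for #{uri}" }
cache.get('http://gems.fhwang.net/admin_assistant')
cache.get('http://gems.fhwang.net/admin_assistant')
puts pulls
```

A short TTL means fresher gems but more traffic to every author’s server; a long one means the opposite. That tension is the issue a pull model has to settle.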

And, yes, this whole idea may just be a ploy to get people to use urirequire again.

Tagged: ruby
