More than once, I've been asked to make sense of a document store underneath an out-of-control codebase. For my last project, I wrote JsonInference to help me see the entire data store at once and look for common patterns.

Given a bunch of JSON documents that are assumed to be similar, JsonInference reports statistics on what they have in common: how often each key appears, which types its values take, and how often those values are empty. For example, feed a report object a bunch of JSON hashes:

report = JsonInference.new_report
huge_json['docs'].each do |doc|
  report << doc
end
puts report.to_s

And you receive output that looks like this:

JsonInference report: 21 documents
:root > ._id: 21/21 (100.00%)
  String: 100.00%, 0.00% empty

:root > ._rev: 21/21 (100.00%)
  String: 100.00%, 0.00% empty

:root > .author_id: 14/21 (66.67%)
  Fixnum: 100.00%

:root > .sections: 21/21 (100.00%)
  Array: 100.00%, 0.00% empty
  :root > .sections:nth-child(): 50 children
    Hash: 100.00%
    :root > .sections:nth-child() > .title: 50/50 (100.00%)
      String: 100.00%, 0.00% empty
    :root > .sections:nth-child() > .subhead: 50/50 (100.00%)
      String: 100.00%, 2.00% empty
    :root > .sections:nth-child() > .body: 50/50 (100.00%)
      String: 100.00%, 0.00% empty
    :root > .sections:nth-child() > .permalink: 46/50 (92.00%)
      String: 100.00%, 15.22% empty
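
To give a feel for the kind of bookkeeping behind a report like this, here is a minimal sketch. It is not JsonInference's actual implementation: the `KeyTally` class is a hypothetical name made up for illustration, and it only looks at top-level keys, without descending into arrays or tracking empty values.

```ruby
# Hypothetical sketch, NOT JsonInference's real code.
# Tallies, per top-level key, how many documents contain the key and
# which Ruby classes its values have.
class KeyTally
  def initialize
    @doc_count = 0
    @presence = Hash.new(0)                          # key => number of docs containing it
    @types = Hash.new { |h, k| h[k] = Hash.new(0) }  # key => { class name => count }
  end

  # Add one parsed JSON document (a Hash) to the tally.
  def <<(doc)
    @doc_count += 1
    doc.each do |key, value|
      @presence[key] += 1
      @types[key][value.class.name] += 1
    end
    self
  end

  # Render the counts in a shape similar to the report above.
  def to_s
    lines = ["KeyTally report: #{@doc_count} documents"]
    @presence.each do |key, count|
      pct = format('%.2f', 100.0 * count / @doc_count)
      lines << ":root > .#{key}: #{count}/#{@doc_count} (#{pct}%)"
      @types[key].each do |type, type_count|
        type_pct = format('%.2f', 100.0 * type_count / count)
        lines << "  #{type}: #{type_pct}%"
      end
    end
    lines.join("\n")
  end
end

tally = KeyTally.new
tally << { '_id' => 'a', 'author_id' => 1 }
tally << { '_id' => 'b', 'author_id' => 2 }
tally << { '_id' => 'c' }
puts tally
```

With those three documents, `_id` shows up in 3/3 and `author_id` in 2/3, the same sort of partial coverage as `author_id` in the real report.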

I keep meaning to write more about document stores and the challenges they pose to teams modeling data. I don't necessarily think they're worse than relational stores, but they do seem to offer lots of unfamiliar pitfalls.


For a recent consulting project, I found myself comparing a lot of large JSON documents in tests, which can be frustrating since differences don't show up well when comparing the hashes normally. Hence JsonDeepCompare, a Ruby gem for comparing large JSON documents and showing the most specific points of difference if they are unequal.

Let's say you've got a test case:

class MyTest
  include JsonDeepCompare::Assertions

  def test_comparison
    left_value = {
      'total_rows' => 2,
      'rows' => [
        {
          'id' => 'foo',
          'doc' => {
            '_id' => 'foo', 'title' => 'Foo', 'sub_document' => { 'one' => 'two' }
          }
        }
      ]
    }
    right_value = {
      'total_rows' => 2,
      'rows' => [
        {
          'id' => 'foo',
          'doc' => {
            '_id' => 'foo', 'title' => 'Foo', 'sub_document' => { 'one' => '1' }
          }
        }
      ]
    }
    assert_json_equal(left_value, right_value)
  end
end

Running it will output this error:

RuntimeError: ":root > .rows :nth-child(1) > .doc > .sub_document > .one" expected to be "two" but was "1"

The selector syntax uses a limited subset of JSONSelect to describe where to find the differences.
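
If you're curious how a comparison can report the most specific point of difference, here is a rough sketch of the general idea: walk both structures in parallel and only report a mismatch once you reach values that can't be descended into any further. This is not JsonDeepCompare's actual code. The `deep_differences` method is a hypothetical helper, and it treats a missing key the same as a `nil` value.

```ruby
# Hypothetical sketch, NOT JsonDeepCompare's real code.
# Walks two parsed JSON values in parallel and returns an array of
# [selector, left, right] triples, one per deepest point of difference.
def deep_differences(left, right, selector = ':root')
  if left.is_a?(Hash) && right.is_a?(Hash)
    # Visit every key that appears on either side.
    (left.keys | right.keys).flat_map do |key|
      deep_differences(left[key], right[key], "#{selector} > .#{key}")
    end
  elsif left.is_a?(Array) && right.is_a?(Array)
    length = [left.length, right.length].max
    (0...length).flat_map do |i|
      # nth-child is 1-based, as in the error message above.
      deep_differences(left[i], right[i], "#{selector} :nth-child(#{i + 1})")
    end
  elsif left == right
    []
  else
    [[selector, left, right]]
  end
end

left  = { 'rows' => [{ 'doc' => { 'sub_document' => { 'one' => 'two' } } }] }
right = { 'rows' => [{ 'doc' => { 'sub_document' => { 'one' => '1' } } }] }

deep_differences(left, right).each do |sel, expected, actual|
  puts "#{sel.inspect} expected to be #{expected.inspect} but was #{actual.inspect}"
end
```

For the trimmed-down documents in the sketch, this prints one line naming `:root > .rows :nth-child(1) > .doc > .sub_document > .one`, rather than dumping both hashes in full.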


And there's this, too: Ellipsifier is a Javascript library that truncates HTML. It will retain the tag structure, counting only visible characters in the resulting text.

new Ellipsifier("to be or not to be", 5).result
//              "to be&nbsp;&hellip;"
new Ellipsifier('to <strong>be or</strong> not to be', 20).result
//              "to <strong>be or</strong> not to be"
new Ellipsifier('to <strong>be or</strong> not to be', 5).result
//              "to <strong>be</strong>&nbsp;&hellip;"

Another chunk of code written with the good folks at HowAboutWe.

Speaking at Goruco

So, I'm speaking at Goruco this year. On The Front-End Future:

With the rise of Javascript MVC frameworks like Ember and Backbone, web programmers find themselves at a fork in the road. If they keep doing server-side web programming, they'll benefit from tried-and-true tools and techniques. If they jump into Javascript MVC, they may be able to offer a more responsive web experience, but at significant added development cost. Which should they choose?

This talk will address the strategic costs and benefits of using Javascript MVC today. I will touch on subjects such as development speed, usability, conceptual similarities with desktop and mobile applications, the decoupling of rendering and routing from server logic, and the state of the emerging Javascript MVC community. I will also discuss the impact of this seismic change on Ruby, Rails, and your career as a software engineer.

Nobody should confuse me with a Javascript expert, and that's not why I'm giving this talk. There are many talks you can see that focus on the specifics of implementation that are being hashed out today. With my talk, I will be drawing out the macro trends in our field that affect the products we build, and the careers we craft.

In particular, I feel like the move to thick-client web apps is giving the Ruby and Rails community a bit of existential paralysis--we should be talking about this far more, and meeting this change head-on. The future is uncertain, but it is also bright.

Goruco is on Saturday, June 23. This is our sixth year, and without giving away the rest of the speakers, I think this might quite possibly be our best program yet. If you want to join us, tickets are still available.

Rich clients: On mobile, and history management

Martin Sutherland responds to my rich client thoughts with some insightful caveats:

Mobile browsers on devices not designed in Cupertino

First of all: mobile. If you're using an iPhone 4(S), you might not realize that a lot of web browsers on mobile devices are abominably slow. In terms of getting the first page of your app/site up and running on a mobile browser, an HTML page rendered on the server is going to beat a client-side JS application hands down in at least 90% of cases.


Should your web application be rich-client from day one?

For the sake of discussion, I'm going to make a recommendation about the state of web application development today:

If you are writing a new web application, you should make it a rich-client application from the start. Your servers should not generate any HTML. You should do all that work in the browser with a Javascript framework such as Backbone.js or Ember.js, and the server should only talk to the browser via a REST API.

I'm not saying I believe this idea 100%, mind you. But I feel like we may be reaching some sort of specific tipping point, and I'm interested in teasing out why this would or wouldn't be a good idea.
