fhwang.net

JsonInference

More than once, I've been asked to make sense of a document store underneath an out-of-control codebase. For my last project, I wrote JsonInference to help me see the entire data store all at once, looking for common patterns.

Given a bunch of JSON documents that are assumed to be similar, JsonInference reports on statistical patterns about commonality. For example, feed a report object a bunch of JSON hashes:

report = JsonInference.new_report
huge_json['docs'].each do |doc|
  report << doc
end
puts report.to_s

And you receive output that looks like this.

JsonInference report: 21 documents
:root > ._id: 21/21 (100.00%)
  String: 100.00%, 0.00% empty

:root > ._rev: 21/21 (100.00%)
  String: 100.00%, 0.00% empty

:root > .author_id: 14/21 (66.67%)
  Fixnum: 100.00%

:root > .sections: 21/21 (100.00%)
  Array: 100.00%, 0.00% empty
  :root > .sections:nth-child(): 50 children
    Hash: 100.00%
    :root > .sections:nth-child() > .title: 50/50 (100.00%)
      String: 100.00%, 0.00% empty
    :root > .sections:nth-child() > .subhead: 50/50 (100.00%)
      String: 100.00%, 2.00% empty
    :root > .sections:nth-child() > .body: 50/50 (100.00%)
      String: 100.00%, 0.00% empty
    :root > .sections:nth-child() > .permalink: 46/50 (92.00%)
      String: 100.00%, 15.22% empty

I keep meaning to write more about document stores and challenges they represent to teams in modeling data. I don't necessarily think they're worse than relational stores, but they do seem to offer lots of unfamiliar pitfalls.

blog comments powered by Disqus

« Previous post

Next post »