Following up on an earlier post, a roundup of some systems for querying JSON documents. I’m looking for small data extraction languages that make it very easy to extract bits of data from a bunch of JSON blobs. Kind of like regular expressions or XPath or CSS Selectors. A key thing here is being able to do stuff like “find me the parts of this JSON document that have a property foo with value bar”. Also dealing with arrays intelligently.
jq. It’s the most popular JSON query language, and quite powerful. I also find it really terse and confusing but maybe that’s because I don’t use it enough. But it’s definitely the tool to compare everything else to. There’s also been some work on a usable libjq so you can use it inside other languages, like these Python bindings.
OboeJS. It’s a streaming JSON parser but also has an XPath-like language for pulling out chunks of a document. I like the look of it.
jsonfilter is a very simple grep-like tool that works with hierarchical input. It looks good and simple for doing simple things. The author is someone I trust to think a tool like this through.
Postgres. It has native support for JSON types and a pretty decent set of functions for working with it. It also has functions for converting JSON to Postgres’ array and recordset types, which you can then further query. And the JSONB type has a few extra query options. I don’t think I’d load data into Postgres just for JSON queries, but if you have JSON there already it’s pretty powerful.
MongoDB‘s JSON query language is surprisingly limited for a system that’s all about searching JSON documents. Maybe that’s just my ignorance, but I can’t find any operators for diving into the document structure other than basically walking the JSON document tree manually.
Other options I didn’t explore. #1 question is whether these have momentum and/or a future. A bunch of them look active on first glance. Will update as I read about them and find anything interesting. There’s a bunch of other things not on this list, too.
Update: in further discussion, pvg notes it’s helpful to split JSON query systems into two groups: “QLs that do selection/filtering and QLs that do that and also processing and sorting and munging and aggregation and summing and map/reduce and …”.
I really want a system that does all the processing and sorting and munging too. But SQL is very good at that already. So really I just want to do some simple extraction with jq or whatever and then feed it to a SQL system for aggregate reporting. Or else maybe just use Postgres to do everything? I need to get my hands dirty with some feature extraction examples to see how jq / postgres / etc feel.