Nelson's log

A better Python object for JSON take 2

I wrote about wanting a better Python object for JSON. Now I’ve tried out a bunch of options, all inadequate, and am clearer about what I want.

The best of the libraries I found was addict. Or maybe easydict because it is slightly less magic. dotmap is also OK but has one wrinkle. attrdict was the only one I couldn’t make work well.

I’ve put sample code exercising various libraries in this gist.

Here are the things I want, in order of priority:

  1. Dot access to fields instead of []
  2. Sensible default value for missing keys instead of an exception
  3. Ease of adding new nested values
  4. Ease of serializing to JSON
  5. Small performance hit

All the libraries provide #1, dot syntax for reading keys. The challenge is #2, what happens if the field is missing (like d.noObj in my code). dotmap and addict both return an empty dict {} for missing keys. That’s not a terrible default although I’d prefer None. attrdict.AttrDefault() lets you choose a different default value but has other problems. easydict just throws exceptions for missing keys, which makes the dot syntax of questionable usefulness.
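To make the None default concrete, here is a minimal sketch using only the standard library. DotDict is a hypothetical name of my own and is not how any of the libraries above are implemented; it only illustrates wants #1 and #2 together.

```python
class DotDict(dict):
    """Hypothetical dict subclass: dot access, None for missing keys."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        # Leave dunder lookups alone so copy/pickle machinery still works.
        if name.startswith("__"):
            raise AttributeError(name)
        value = self.get(name)  # None when the key is missing
        # Wrap nested plain dicts so chained dot access keeps working.
        if isinstance(value, dict) and not isinstance(value, DotDict):
            value = DotDict(value)
            self[name] = value
        return value

    def __setattr__(self, name, value):
        self[name] = value


d = DotDict({"name": "Nelson", "address": {"city": "SF"}})
print(d.name)          # Nelson
print(d.address.city)  # SF
print(d.noObj)         # None
```

The catch with None is that d.noObj.deeper raises AttributeError, so a None default only helps one level down.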

The tricky thing is #3, and it ties to what happens if you then assign to the empty dict returned for a missing key. That’s required for convenient syntax for setting a deeply nested value, the ideal syntax of “d.other.nested.thing = 6”. Both dotmap and addict actually create an empty dict value in the data structure the moment you try to read a missing key. That lets nested assignment work, but leaves litter in your data structure. addict edges out dotmap here because it provides a .prune() method to remove the empty dicts. easydict doesn’t create empty dict values, so nested assignment isn’t convenient, but then it’s less magic.

#4, JSON serialization, is not too hard in any system. addict, easydict, and attrdict just let you pass the thing to json.dump(); their objects are subtypes of dict. dotmap requires that you explicitly call .toDict().
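The difference comes down to what the json module accepts. Here is a small stdlib-only sketch: SubDict stands in for any dict subclass, and Wrapper stands in for any object that needs an explicit conversion step first. Both are hypothetical, and Wrapper is not a claim about how dotmap is built.

```python
import json


class SubDict(dict):
    """Hypothetical stand-in for a library object that subclasses dict."""


class Wrapper:
    """Hypothetical object that holds a dict instead of being one."""

    def __init__(self, data):
        self._data = data

    def to_dict(self):
        return dict(self._data)


sub = SubDict(a=1, b={"c": 2})
print(json.dumps(sub))  # {"a": 1, "b": {"c": 2}}

wrapped = Wrapper({"a": 1})
try:
    json.dumps(wrapped)
except TypeError as exc:
    # json only knows dict, list, str, numbers, bool and None by default.
    print("not serializable:", exc)
print(json.dumps(wrapped.to_dict()))  # {"a": 1}
```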

I did not measure #5, performance. Each library has a different approach to when it copies data, which would be worth examining.

I mentioned above I couldn’t make attrdict work for me. That may be my error, not the library’s. It’s neat in that there are actually two options: AttrDefault, which provides alternate default values, and AttrDict, which is a bit simpler. But AttrDict has a mysterious / confusing design wrinkle for “Recursive attribute access” that’s a n00b trap. And I couldn’t figure out how to JSON serialize an AttrDefault, nor even how to turn it into a normal dict. Maybe I missed something.

Doing this exercise made me wonder about the wisdom of #2, sensible default values. The problem is there’s not a one-size-fits-all default. Often you want it to be None, for simple primitives. But sometimes you might want [] for an empty list or {} for an empty object. If you want to let the programmer choose those values, you might as well go fully verbose and use the dict.get() method with its default value.
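For comparison, this is what the fully verbose version looks like with plain dicts. The sample document and its field names are made up for illustration.

```python
import json

doc = json.loads('{"user": {"name": "Nelson", "tags": null}}')

# Chain .get() with {} so a missing parent doesn't raise.
name = doc.get("user", {}).get("name")
email = doc.get("user", {}).get("email")              # missing key: None
city = doc.get("address", {}).get("city", "unknown")  # missing parent and key

# The default in .get() only applies when the key is absent.
# A key that is present with a JSON null still comes back as None,
# so "or []" is needed to cover both cases.
tags = doc.get("user", {}).get("tags") or []

print(name, email, city, tags)  # Nelson None unknown []
```

It’s wordy, but each default is chosen right where the value is used.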

I could see using easydict in the future if I only wanted to read JSON and weren’t worried about #2, missing values. It’s the least magic. Or else addict if I also wanted to construct JSON and get empty dicts returned for missing keys.

Update: it strikes me that a lightweight use of JSON schema would be helpful here. If you know what types to expect, you can fill in any missing keys with appropriate empty values. Not sure anyone uses JSON schema in the wild though. I’d want to start with something that auto-infers schema from examples. And I don’t care about validation, just using type expectations as hints on what to do with ambiguity.
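Here is a rough sketch of what that could look like, with no real JSON Schema involved. The "schema" is just a dict of Python types inferred from one example document, and infer_hints and fill_missing are hypothetical helpers, not part of any library. I’m assuming lists should default to [], nested objects should be filled recursively, and everything else should default to None.

```python
def infer_hints(example):
    """Map each key to the type seen in the example; recurse into nested dicts."""
    hints = {}
    for key, value in example.items():
        hints[key] = infer_hints(value) if isinstance(value, dict) else type(value)
    return hints


def fill_missing(data, hints):
    """Return a copy of data with missing keys filled by type-appropriate empties."""
    result = dict(data)
    for key, hint in hints.items():
        if isinstance(hint, dict):
            child = result.get(key)
            if not isinstance(child, dict):
                child = {}  # missing or null object: start from empty
            result[key] = fill_missing(child, hint)
        elif key not in result:
            # No validation here, the hint only decides the empty value.
            result[key] = [] if hint is list else None
    return result


example = {"name": "x", "tags": ["a"], "address": {"city": "SF", "zip": "94110"}}
hints = infer_hints(example)

filled = fill_missing({"name": "Nelson", "address": {"city": "Grass Valley"}}, hints)
print(filled)
# {'name': 'Nelson', 'address': {'city': 'Grass Valley', 'zip': None}, 'tags': []}
```

Inferring from a single example is obviously fragile; a real version would want to merge hints from several documents.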