A better Python object for JSON take 2

I wrote about wanting a better Python object for JSON. Now I’ve tried out a bunch of options, all inadequate, and am clearer about what I want.

The best of the libraries I found was addict. Or maybe easydict because it is slightly less magic. dotmap is also OK but has one wrinkle. attrdict was the only one I couldn’t make work well.

I’ve put sample code exercising various libraries in this gist.

Here are the things I want, in order of priority:

  1. Dot access to fields instead of []
  2. Sensible default value for missing keys instead of an exception
  3. Ease of adding new nested values
  4. Ease of serializing to JSON
  5. Small performance hit

All the libraries provide #1, dot syntax for reading keys. The challenge is #2, what happens if the field is missing (like d.noObj in my code). dotmap and addict both return an empty dict {} for missing keys. That’s not a terrible default although I’d prefer None. attrdict.AttrDefault() lets you choose a different default value but has other problems. easydict just throws exceptions for missing keys, which makes the dot syntax of questionable usefulness.
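To make the None default concrete, here is a minimal sketch of what I mean. DotDict is a hypothetical name of my own, not the API of any of the libraries above, and it only handles reads.

```python
class DotDict(dict):
    """Hypothetical helper: dict with dot reads and None for missing keys."""

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails, so real
        # dict methods such as .keys() still take precedence over data keys.
        if name.startswith("__"):
            # Leave dunder lookups (copy, pickle, etc.) alone.
            raise AttributeError(name)
        value = self.get(name)
        # Wrap nested plain dicts so chained dot access keeps working.
        if isinstance(value, dict) and not isinstance(value, DotDict):
            value = DotDict(value)
        return value


d = DotDict({"name": "x", "obj": {"inner": 1}})
print(d.name)       # x
print(d.obj.inner)  # 1
print(d.noObj)      # None
```

The catch with None is that a chained read through a missing key, like d.noObj.inner, raises AttributeError, which is presumably why the libraries return an empty dict instead.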

The tricky thing is #3, and it ties to what happens if you then assign to the empty dict returned for a missing key. That’s required for convenient syntax for setting a deeply nested value, the ideal syntax of “d.other.nested.thing = 6”. Both dotmap and addict actually create an empty dict value in the data structure the moment you try to read a missing key. That lets nested assignment work, but leaves litter in your data structure. addict edges out dotmap here because it provides a .prune() method to remove the empty dicts. easydict doesn’t create empty dict values, so nested assignment isn’t convenient but then it’s less magic.

#4, JSON serialization, is not too hard in any system. addict, easydict, and attrdict just let you pass the thing to json.dump(); their objects are subtypes of dict. dotmap requires you to explicitly call .toDict().
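The difference comes down to whether the json module sees a dict. A small sketch with stand-in classes (SubDict and Wrapper are hypothetical, not taken from any of the libraries) shows both cases:

```python
import json


class SubDict(dict):
    """Stand-in for a library type that subclasses dict."""


class Wrapper:
    """Stand-in for a type that holds a dict but does not subclass it."""

    def __init__(self, data):
        self._data = data

    def to_dict(self):
        # Explicit conversion step, similar in spirit to a .toDict() call.
        return dict(self._data)


# A dict subclass serializes directly.
sub_json = json.dumps(SubDict(a=1))

# A non-dict object is rejected by the default encoder.
try:
    json.dumps(Wrapper({"a": 1}))
    wrapper_direct_ok = True
except TypeError:
    wrapper_direct_ok = False

# After explicit conversion it serializes fine.
wrapper_json = json.dumps(Wrapper({"a": 1}).to_dict())
```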

I did not measure #5, performance. Each library has a different approach to when it copies data, which would be worth examining.

I mentioned above I couldn’t make attrdict work for me. That may be my error, not the library’s. It’s neat in that there are actually two options: AttrDefault, which provides alternate default values, and AttrDict, which is a bit simpler. But AttrDict has a mysterious / confusing design wrinkle for “Recursive attribute access” that’s a n00b trap. And I couldn’t figure out how to JSON serialize an AttrDefault, nor even how to turn it into a normal dict. Maybe I missed something.

Doing this exercise made me wonder about the wisdom of #2, sensible default values. The problem is there’s not a one-size-fits-all default. Often you want it to be None, for simple primitives. But sometimes you might want [] for an empty list or {} for an empty object. If you want to let the programmer choose those values, you might as well go fully verbose and use the dict.get() method with its default value.
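For comparison, this is what the fully verbose version looks like with plain dicts and nothing but the standard library. The sample document and its field names are made up for illustration.

```python
import json

doc = json.loads('{"user": {"name": "ada"}}')

# Each .get() names its own default, so the caller picks None, [] or {}.
name = doc.get("user", {}).get("name")       # present key
email = doc.get("user", {}).get("email")     # missing primitive, default None
tags = doc.get("user", {}).get("tags", [])   # missing list, default []
city = doc.get("address", {}).get("city")    # missing parent object
```

It is noisy, but every default is visible at the point of use and nothing gets written back into doc.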

I could see using EasyDict in the future if I only wanted to read JSON and weren’t worried about #2, missing values. It’s the least magic. Or else Addict if I also wanted to construct JSON and get empty dicts returned for missing keys.

Update: it strikes me that a lightweight use of JSON schema would be helpful here. If you know what types to expect, you can fill in any missing keys with appropriate empty values. Not sure anyone uses JSON schema in the wild though. I’d want to start with something that auto-infers schema from examples. And I don’t care about validation, just using type expectations as hints on what to do with ambiguity.
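A rough sketch of the idea, assuming a hand-written schema that uses only the JSON Schema "type" and "properties" keywords. fill_missing is a hypothetical helper of mine; it does no validation and no schema inference.

```python
# Empty values for container types; anything else falls back to None.
EMPTY = {"object": dict, "array": list}


def fill_missing(data, schema):
    """Return a copy of data with missing keys filled from schema type hints."""
    result = dict(data)
    for key, sub in schema.get("properties", {}).items():
        if key not in result:
            make = EMPTY.get(sub.get("type"))
            result[key] = make() if make else None
        # Recurse so that nested objects get their own missing keys filled.
        if sub.get("type") == "object" and isinstance(result[key], dict):
            result[key] = fill_missing(result[key], sub)
    return result


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "tags": {"type": "array"},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
}

filled = fill_missing({"name": "ada"}, schema)
print(filled)  # {'name': 'ada', 'tags': [], 'address': {'city': None}}
```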

Update Nov 2019: see FlexDict. It has solutions for the creating-objects-for-keys problem.

4 thoughts on “A better Python object for JSON take 2”

  1. you’ve hooked me, i’m tempted to dive into this rabbithole yak shave now too. thanks for all the research!

    for reading at least, i wonder if we could come up with a magic default value for nonexistent keys that does what we want but avoids unpleasant surprises. we could easily write a small class that duck types like an empty dict *and* an empty list by boolean evaluating to `False` and responding to key access, attribute access, and indexing by returning itself.

    one catch is that `is None` would fail, which may not be possible to fix. another catch is that it’s magic, so it probably would surprise us at least a few times. still curious though.

    for writing, each instance could remember the access that generated it and serialize to the appropriate type: index => list, attr/key access => dict. the magic level is pretty scary at that point, of course.

  2. I like your idea, Ryan. I think the problem is much simpler if you only try to solve the reading JSON case. Most of the weirdness I describe above comes from trying to modify the objects and then reserialize them.

    It’d be worth circling back and seeing how Javascript handles it and why it feels simpler. Missing keys just have the value “undefined”, which works more or less like None in Python. It does not act like a list or a dict.

  3. I can see these being useful for some kinds of interactive use but how can any subclass of dict (which just about all of these seem to be) survive production use with attributes named ‘keys’ or ‘values’? That seems like a recipe for inscrutable disaster, well before any of the more complicated issues pop up.
