simple database exploration

Following up on an earlier post about wanting a simpler way to do database things: some nice results.

I’m liking dataset as a quick way to write data from Python into a database. Basically you just insert dicts into a table; it takes care of generating a schema, selecting types for things, etc. It helps read data too, but on that side it doesn’t do much beyond wrapping rows in dicts. That’s nice, but I wanted more. (It also has support for fancy Python queries, but I just want raw SQL.)
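To make the “insert dicts, get dicts back” idea concrete, here’s a rough stdlib-only sketch using sqlite3. It is not how dataset is implemented and it doesn’t use dataset’s API; insert_dict, the readings table, and the sample values are all made up for illustration, and it only copes with the simplest column types.

```python
import sqlite3

# Hypothetical mapping from Python types to SQLite column types.
TYPE_FOR = {int: "INTEGER", float: "REAL", str: "TEXT"}

def insert_dict(conn, table, row):
    """Insert a dict as a row, creating the table or adding columns as needed."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')}
    for column, value in row.items():
        sql_type = TYPE_FOR.get(type(value), "TEXT")
        if not existing:
            conn.execute(f'CREATE TABLE "{table}" ("{column}" {sql_type})')
        elif column not in existing:
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}" {sql_type}')
        existing.add(column)
    columns = ", ".join(f'"{c}"' for c in row)
    marks = ", ".join("?" for _ in row)
    conn.execute(f'INSERT INTO "{table}" ({columns}) VALUES ({marks})',
                 list(row.values()))

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each result row be turned into a dict

insert_dict(conn, "readings", {"sensor": "a", "temp": 21.5})
# A new key shows up later; the sketch just adds a column for it.
insert_dict(conn, "readings", {"sensor": "b", "temp": 19.0, "battery": 87})

# Reading back with raw SQL, rows wrapped as dicts.
rows = [dict(r) for r in conn.execute("SELECT * FROM readings ORDER BY sensor")]
print(rows)
```

The first row gets None for the battery column that was added after it was written, which is about what you’d expect from a schema that grows as you go.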

For reading data I’m liking ipython-sql. It adds a nice Jupyter interface on top of a raw SQL repl like that provided by sqlite or psql. It’s quite simple to install. And since it’s all implemented as an IPython magic you can mix it in with Python code in your notebook. I’m mostly just using it as a way to display SQL results as HTML tables; simple, but very useful. It has some support for fancier stuff like plotting and exporting the result sets into Python objects, Pandas DataFrames, and CSV files.

 

Python shelve: db type could not be determined

I hit a funny bug with the Python shelve module. I’d write a shelf database, then try to open it again and get this error:

db = shelve.open('foo.db')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/shelve.py", line 243, in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
  File "/usr/lib/python3.6/shelve.py", line 227, in __init__
    Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
  File "/usr/lib/python3.6/dbm/__init__.py", line 88, in open
    raise error[0]("db type could not be determined")
dbm.error: db type could not be determined

Turns out this is my mistake. I created the shelf with code like this:

db = shelve.open('foo')

This creates a file named foo.db, a Berkeley DB file. And then I tried to open it:

db = shelve.open('foo.db')

See the error? The shelve module (really the dbm backend underneath it) is appending the .db for me; I shouldn’t have added it to the filename myself. It’s a simple enough mistake, but the error it produces is confusing. It’d be nicer if the code raised a clearer error or opened a new database file named foo.db.db.

Python 3.6.3 on Ubuntu 16.04, which I think is libdb 5.3.

Note this error is different from an often-referenced bug in older Pythons not recognizing GDBM databases consistently (issue 13007).
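A minimal sketch of the safe pattern, assuming nothing beyond the standard library: create the shelf with a bare name, look at what actually landed on disk, and reopen with that same bare name. The temp directory and the answer key are just for illustration, and which files get created depends on the dbm backend your Python was built with.

```python
import os
import shelve
import tempfile

workdir = tempfile.mkdtemp()
base = os.path.join(workdir, "foo")  # bare name, no extension

# Create the shelf. The dbm backend decides what the file(s) on disk are
# called; that may or may not include a ".db" suffix.
with shelve.open(base) as db:
    db["answer"] = 42

created = sorted(os.listdir(workdir))
print(created)

# Reopen with exactly the name used to create it, not the on-disk filename.
with shelve.open(base) as db:
    value = db["answer"]

print(value)
```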

Some IndexedDB notes

I’m using IndexedDB for permanent storage for a webapp. It’s pretty nice; a NoSQL transactional database in your browser! And all built with asynchronous calls. The API is notoriously confusing so I’m using Dexie as my front end. It’s very friendly and I like the way it uses Promises so directly. It also has a smart story about schema versioning. (See also LocalForage.) Anyway, some notes:

How much disk your data uses is a mystery. I have a database with 1873 records that consist of an integer primary key, an integer timestamp, and two doubles. Let’s call it 64 bits each, so that’s 32 bytes a record. OTOH on disk it’s taking 464k, or 254 bytes a record. There’s no Javascript API for asking how much space you’re using or what your quota is.
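The back-of-envelope arithmetic, spelled out. This assumes “464k” means 464 KiB and that each of the four fields really is 8 bytes; both are my guesses about the numbers above, not anything measured separately.

```python
records = 1873
fields_per_record = 4   # primary key, timestamp, two doubles
bytes_per_field = 8     # "let's call it 64 bits each"

naive_bytes_per_record = fields_per_record * bytes_per_field
naive_total = records * naive_bytes_per_record

on_disk_total = 464 * 1024  # assumption: 464k is 464 KiB
on_disk_per_record = on_disk_total / records
overhead_factor = on_disk_per_record / naive_bytes_per_record

print(naive_bytes_per_record, naive_total)
print(round(on_disk_per_record), round(overhead_factor, 1))
```

So the raw data would fit in under 60KB, and the on-disk size is roughly eight times that.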

Storage limits are also confusing. Firefox has no fixed upper bound; it will use up to 50% of your disk’s free space and then start evicting stuff on an LRU basis. Chrome also has no fixed limit, though the limit may be based on the size of the user’s hard drive. Older references mention a strict 5MB limit, but the real limit seems to be much bigger now. For a desktop user it seems likely to be gigabytes.

Firefox implements IndexedDB as sqlite files, but the sqlite schema on disk isn’t anything like my application’s schema. There are tables named “database” and “index_data” and “object_store”, so I’m guessing they’re using sqlite more as a structured file format than as the actual database.

 

gzthermal compression visualization

Neat little tool: gzthermal-web, which visualizes the efficiency of compression from gzip. Each byte is colored depending on how well it compresses; dark blue for 1 bit up to red for 9 bits. It’s a wrapper for the gzthermal tool, as discussed on Hacker News and in this sort of odd article. (Undocumented feature; append &w=1 for a wider image.)

Here’s my weblog visualized with it, sorry for the giant long image. It looks better if you view the image alone shrunk-to-fit.

Structure is clear. The bright yellow parts are inlined PNG images which won’t gzip very well since they’re already compressed. The dark blue chunk at the front is my calendar, a bunch of stupidly repeated HTML. Blog posts are the motley blue/green patches separated by the dark blue of the HTML boilerplate between blog entries.

I’m not exactly sure how I’d act on this information. But it looks neat!

gzthermal.now.sh

Javascript Promises wot I learned

I program in 2006-era Javascript. But recently I started doing some more modern stuff. Arrow functions, those are great! And easy. Promises! Those are great. And confusing. Here’s some notes about what I’ve learned about Promises.

Warning: I’m a n00b here, some of this info could be wrong.

  • The best single guide to Promises I’ve read is Google’s. The folksy tone at the start is annoying but keep reading. MDN also has a basic guide that’s useful. You’ll want the MDN reference too.
  • The key thing no one explains is that when you create a Promise, its work starts right away: the function you pass to the constructor runs immediately, and the browser’s event loop takes care of running your then() callbacks once the result is ready. You do not have to schedule a promise or create your own event loop, like you do in Python async programming. It’s all automagic.
  • The exciting thing about Promises compared to other async frameworks is the .then() function, the ability to chain a second promise onto a first one. “First do this thing asynchronously, then when it is completed start doing this second thing. And make this chunk of two things itself a Promise so it runs asynchronously”.
  • You can also combine Promises with Promise.all() and Promise.race() but I don’t see them being used that way very often.
  • The other neat thing about Promises is that they have an API for error handling: reject() and .catch(). I’ve never used this for real (I’m an optimist programmer), so I can’t say more about it, but it seems important.
  • A Promise is a Javascript object wrapping some work that starts out like any other function call. The difference is that instead of blocking on network IO it lets other things run and hands over the result later. (At least I think that’s right, I’m not 100% confident.)
  • When I’m sloppy I think of a Promise as basically a Thread: an independent unit of execution that runs in parallel with everything else. That’s not really true; Promises aren’t Threads, and in particular there’s no preemptive multitasking. There is co-operative multitasking though, so as soon as your Promise is waiting on the network or whatever, something else will happen. Specifically the browser’s event loop will keep running and the page will stay responsive, which is the key reason to use async programming at all.
  • It’s OK to create Promises that never invoke their resolve or reject functions. That means the Promise won’t ever have a return value to pass on to the next Promise in the then chain, but that’s OK. Maybe you wanted the side effects or something. It’s probably not a good idea though, the promise will stay in pending state forever and nothing chained after it with then() or whatever will ever run.
  • It’s also OK to mix Promises and events for different styles of asynchronous programming. Promise chains let you program in a more or less declarative way: do A then B then C. But it can be tricky to arrange the control flow of a whole page program to make sense this way. Events are nice because the publisher of the event doesn’t need to know anything about who might be consuming it somewhere else. It’s fine to have a Promise post an event that something else listens to and acts on asynchronously.

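This code is laughably simple, but I found it useful when learning about Promises:

// Tiny experiment: what order do things run in when a Promise is created?
var state = 1;
console.log("starting");
console.log(state);                  // 1

// The executor function passed to new Promise runs immediately.
const p1 = new Promise((resolve, reject) => {
    console.log("p1 created");
    setTimeout(function() {
        console.log("p1 finished");
        state = 2;
        resolve();                   // settles p1 about a second later
    }, 1000);
});

console.log("made p1");
console.log(state);                  // still 1; the timer has not fired yet
p1.then(function() { console.log(state); });   // logs 2 once p1 resolves
console.log("made p1.then");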
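For contrast, here’s my rough guess at the Python asyncio equivalent of the same experiment (Python 3.7+ for asyncio.run). It’s only a sketch of the difference in scheduling, not a claim that the two models line up exactly; the log list is something I added so the order of events is easy to inspect.

```python
import asyncio

log = []
state = 1

async def p1():
    global state
    log.append("p1 started")
    await asyncio.sleep(0.01)   # stands in for the setTimeout
    state = 2
    log.append("p1 finished")

async def main():
    coro = p1()                 # creates a coroutine object; nothing runs yet
    log.append("made p1")
    log.append(state)           # still 1
    await coro                  # only now does p1 actually execute
    log.append(state)           # 2

asyncio.run(main())             # you start the event loop yourself
print(log)
```

Unlike the Javascript version, “p1 started” shows up after “made p1”: in Python the work doesn’t begin until something awaits or schedules it, and nothing happens at all until you start the loop.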

Ubiquiti EdgeMAX EdgeOS router notes

I’m setting up a new Ubiquiti router, an EdgeRouter X SFP using firmware v1.9.7+hotfix.4. Some notes on doing this:

Out of the box the router only works on ethernet port 0 (eth0). It does not run DHCP. You have to manually configure a computer to talk to it on 192.168.1.1.

The router will redirect you to HTTPS when you connect. However they of course don’t have an SSL certificate for 192.168.1.1 so your browser will refuse to connect. You can override this in Chrome and Firefox by looking for “Advanced” mode on the error screen.

The first thing to do with a new Ubiquiti router is upgrade the firmware. The stock firmware is old and missing important features. (As I remember it, upgrading UniFi access points is also necessary; the shipping version had a crippling bug.)

Once you flash it the first thing to do is run the “Basic Setup” Wizard. (Note, this is not available in the shipping firmware.) This will set the router up to do What You Expect from a consumer router; NAT routing for eth0 (the WAN port) and DHCP for the rest. For a simple home network that is probably all you need to do.

1.9.7 and other versions have a bug where if you manually configure some static IP addresses along with DHCP, then dhcpd won’t start when you reboot the router. The fix for this is to remove all static entries. I was able to add one static entry later via the DHCP control panel and rebooting still works. I’m not sure whether I can manually add an entry. This is all probably a bug in Vyatta, the system that EdgeOS and its config tools are built on.

DNS is not enabled for DHCP entries or static hosts. That’s a thing the dnsmasq used by many routers does for you. I can’t figure out how hard this would be to enable. The obvious wizard reportedly only adds your entries to /etc/hosts on the router itself which is not propagated by its DNS server. I didn’t try very hard.

UPnP is not enabled by default. You can add it via a feature wizard; internal is “switch0” and external is “eth0” (or whatever your WAN port is).

For my own notes, a network setup idiosyncratic to me, I added a static interface route to 192.168.1.0/24 going out eth0. Hoping to reach my invisible wireless ethernet bridge boxes on 192.168.1.110 and .111. No luck, but I think that may be because they’re configured to think the network is 192.168.0.0/23 and so can’t talk to my 192.168.3.* addresses. Oops.
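A quick way to sanity-check that theory with Python’s ipaddress module. The bridge addresses and the /23 come from the notes above; 192.168.3.10 is a hypothetical stand-in for one of my LAN machines.

```python
import ipaddress

# What the bridges believe their local network is.
bridge_net = ipaddress.ip_network("192.168.0.0/23")

bridges = [ipaddress.ip_address("192.168.1.110"),
           ipaddress.ip_address("192.168.1.111")]
lan_host = ipaddress.ip_address("192.168.3.10")  # hypothetical LAN machine

first, last = bridge_net[0], bridge_net[-1]
bridges_inside = all(b in bridge_net for b in bridges)
lan_host_inside = lan_host in bridge_net

print(first, last)
print(bridges_inside, lan_host_inside)
```

The /23 only covers 192.168.0.0 through 192.168.1.255, so from the bridges’ point of view anything on 192.168.3.* is off-link and they’d need a gateway to send replies back to it.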