    sudo bash
    apt-get update
    apt-get upgrade
    apt-get install git
    git clone https://github.com/openaddresses/machine.git
    cd machine/chef
    ./run.sh
Done! The shell command openaddr-process-one is now installed and works.
In brief, this:
- installs Chef and Ruby via apt
- runs a Python setup recipe. It installs a few Ubuntu Python packages with apt (including GDAL and Cairo), then runs “pip install” in the OpenAddresses machine directory, which tells pip to install the other Python packages we use.
- runs a recipe for OpenAddresses, which uses git to put the source JSON data files in /var/opt.
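The three steps above can be sketched as a dry-run shell script. This is not the repo's run.sh: it performs the recipes' effects directly instead of going through Chef, and the apt package names (`chef`, `ruby`, `python-pip`, `python-gdal`, `python-cairo`), the `MACHINE_DIR` default, and the data repository URL and `/var/opt/openaddresses` target are all assumptions. With `DRY_RUN=1` (the default) it only prints each command.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run sketch of what chef/run.sh does, per the summary above.
# Package names, the pip target and the data-repo URL are assumptions.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"          # 1 = print commands only, 0 = execute them
MACHINE_DIR="${MACHINE_DIR:-..}" # run.sh is started from machine/chef

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

run_sh_steps() {
    # 1. Chef and Ruby from apt (package names assumed).
    run apt-get install -y chef ruby
    # 2. Python recipe: Ubuntu Python packages, including GDAL and Cairo
    #    bindings (names assumed), then pip install of the machine checkout.
    run apt-get install -y python-pip python-gdal python-cairo
    run pip install "$MACHINE_DIR"
    # 3. OpenAddresses recipe: clone the source JSON files under /var/opt
    #    (repository URL and target directory assumed).
    run git clone https://github.com/openaddresses/openaddresses.git /var/opt/openaddresses
}

run_sh_steps
```

Printing by default keeps the sketch safe to read through on a workstation; only `DRY_RUN=0` on a throwaway Ubuntu box would actually install anything.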
But really, that’s too manual. If you just pip install openaddr-machine, it creates a /usr/local/bin/openaddr-ec2-run script that does the work for you. That script in turn invokes run.py, which runs on your local machine. Among other things, run.py fills in a templated shell script and hands it to a new EC2 instance, which uses it to set itself up and run the job.
The shell script that runs on EC2 is pretty basic. It:
- Updates apt (but does not upgrade)
- Installs git and apache2
- Clones the openaddresses/machine repo into /tmp/machine
- Runs scripts to set up swap on the machine, then invokes Chef to set up the machine
- Runs openaddr-process to do the job
- Shuts the machine down
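A minimal sketch of that user-data script follows, again printing commands unless `DRY_RUN=0`. The swap setup is the generic `fallocate`/`mkswap`/`swapon` sequence standing in for the repo's own swap scripts, the `SWAP_SIZE` default is made up, and the `openaddr-process` arguments are left out because they are not described here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the EC2 user-data script described above.
# DRY_RUN=1 (the default) prints each step instead of running it.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
SWAP_SIZE="${SWAP_SIZE:-4G}"   # assumed size; the repo's swap scripts may differ

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

user_data_steps() {
    # 1. Refresh package lists; there is deliberately no apt-get upgrade.
    run apt-get update
    # 2. Install git and apache2.
    run apt-get install -y git apache2
    # 3. Clone the machine repository into /tmp/machine.
    run git clone https://github.com/openaddresses/machine.git /tmp/machine
    # 4a. Generic swap-file setup (stands in for the repo's own swap scripts).
    run fallocate -l "$SWAP_SIZE" /swapfile
    run chmod 600 /swapfile
    run mkswap /swapfile
    run swapon /swapfile
    # 4b. Provision the machine with Chef via the repo's run.sh.
    run bash -c 'cd /tmp/machine/chef && ./run.sh'
    # 5. Do the job (arguments omitted; they are not described here).
    run openaddr-process
    # 6. Shut the machine down once the job finishes.
    run shutdown -h now
}

user_data_steps
```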
The run.py script you use on your own machine is mostly about getting an EC2 instance. It:
- Fills in the shell script template and puts the result in user_data
- Uses boto.ec2 to bid on a spot instance
- Waits up to 12 hours for the instance to be granted
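The first two run.py steps can be approximated in shell. This sketch swaps boto.ec2 for the AWS CLI's `aws ec2 request-spot-instances` and only prints that command rather than running it. The `@BRANCH@` placeholder, AMI ID, instance type, and spot price are hypothetical. The user data is base64-encoded because it travels as `UserData` inside the launch specification JSON.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of run.py's first two steps, using the AWS CLI in place
# of boto.ec2. The @BRANCH@ placeholder, AMI ID, instance type and price are
# made-up example values.
set -euo pipefail

# Fill in the template: replace every @BRANCH@ with the given branch name.
# Values containing "|" would break this sed expression.
detemplate() {
    local template_file="$1" branch="$2"
    sed "s|@BRANCH@|${branch}|g" "$template_file"
}

# Base64-encode stdin on a single line, as LaunchSpecification.UserData expects.
encode_user_data() {
    base64 | tr -d '\n'
}

# Print (not run) the spot request; the launch spec carries the user data.
spot_request_cmd() {
    local user_data_b64="$1" spec
    spec=$(printf '{"ImageId":"%s","InstanceType":"%s","UserData":"%s"}' \
        "ami-00000000" "m3.xlarge" "$user_data_b64")
    printf 'aws ec2 request-spot-instances --spot-price %s --instance-count 1 --type one-time --launch-specification %q\n' \
        "0.10" "$spec"
}

# Demo with a throwaway template file.
tmpl=$(mktemp)
printf '#!/bin/bash\ngit clone -b @BRANCH@ https://github.com/openaddresses/machine.git /tmp/machine\n' > "$tmpl"
user_data=$(detemplate "$tmpl" master | encode_user_data)
spot_request_cmd "$user_data"
rm -f "$tmpl"
```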
The details of how the EC2 instance is bid for, created, and waited on are a bit funky but seem well contained.
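The "wait up to 12 hours" step amounts to a polling loop with a deadline. A minimal sketch, again assuming the AWS CLI rather than boto.ec2: `wait_until` is a hypothetical helper, and `spot_request_active` treats a request state of `active` as fulfilled. The request ID `sir-example` is made up.

```shell
#!/usr/bin/env bash
# Hypothetical polling helper mirroring "wait up to 12 hours".
set -euo pipefail

# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS CMD [ARGS...]
# Runs CMD every INTERVAL seconds until it succeeds (returns 0) or
# TIMEOUT seconds have elapsed (returns 1).
wait_until() {
    local timeout="$1" interval="$2"
    shift 2
    local deadline=$(( SECONDS + timeout ))
    while (( SECONDS < deadline )); do
        if "$@"; then
            return 0
        fi
        sleep "$interval"
    done
    return 1
}

# Succeeds when the spot request's state is "active", i.e. it has been
# fulfilled with an instance.
spot_request_active() {
    local request_id="$1" state
    state=$(aws ec2 describe-spot-instance-requests \
        --spot-instance-request-ids "$request_id" \
        --query 'SpotInstanceRequests[0].State' --output text)
    [ "$state" = "active" ]
}

# Real use (not run here): wait up to 12 hours, polling once a minute.
#   wait_until $(( 12 * 60 * 60 )) 60 spot_request_active sir-example

# Local demo: a condition that is already true returns immediately.
wait_until 5 1 true && echo "condition met"
```

Passing the check as a command keeps the loop independent of EC2, so the same helper could wait on any condition.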