MacOS directory permissions

Macs have this weird problem where Unix file permissions get corrupted, and not just on a couple of top-level directories. The Finnish localization file for iBooks, an app I have literally never run, is marked group-writeable and shouldn’t be. How does this happen?

The most alarming thing is that my root directory, /, is mode 0777: world-writeable, and owned by my user account rather than root. Literally any program running on my computer could hijack the whole system because of that. Not the first time that’s happened, either. I’ve read somewhere that a bunch of bad Mac install scripts like to recursively make things world-writeable “to make it work” and end up working their way up to /. And there was that one time iTunes kept making /Users world-writeable. Quality programming there, Apple.

The problem is so common that Disk Utility has a dedicated “repair permissions” feature that compares the filesystem against records of what should be there, left behind by installers. That’s a little scary, though: what if it breaks something? Helpfully there’s an audit mode that just reports what’s changed. Run it from the GUI or via diskutil verifyPermissions /.

The audit below shows what the tool finds wrong on my Mac, after filtering out 3000+ lines of garbage. Mostly not too scary, although libruby.dylib being world-writeable sure seems like a potential security disaster. The most terrifying one is

Warning: SUID file “System/Library/CoreServices/RemoteManagement/” has been modified and will not be repaired

What, a setuid root executable that’s part of a remote management system has been modified? Why, that’s not suspicious at all! But have no fear: Apple itself says that’s one of roughly 100 audit messages you can “safely ignore”. So yes, the security audit tool prints a lot of false positives. Fucking garbage.

Note that the root directory does not appear anywhere in the audit report.

Feeling lucky, I ran the repair tool and it changed a bunch of things. Then I ran the audit again and it still found three problems, including the setuid root file I’m supposed to ignore.

It did not repair my root directory. I manually set that to 0755, owned by root:wheel.
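A sketch of that manual fix as shell commands. The real thing is two sudo commands against / (sudo chown root:wheel / followed by sudo chmod 0755 /); the version here operates on a scratch directory so it’s safe to run as-is.

```shell
# Safe demo of the repair: same chmod, but on a throwaway directory, not /.
d=$(mktemp -d)
chmod 0777 "$d"             # simulate the corrupted world-writeable mode
chmod 0755 "$d"             # the repair: rwxr-xr-x
ls -ld "$d" | cut -c1-10    # prints: drwxr-xr-x
rmdir "$d"
```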

Started verify/repair permissions on disk0s2 Macintosh HD
Permissions differ on "System/Library/CoreServices/Feedback"; should be drwxr-xr-x ; they are lrwxr-xr-x 
Permissions differ on "usr/lib/libruby.2.0.dylib"; should be lrwxrwxrwx ; they are lrwxr-xr-x 
Permissions differ on "usr/lib/libruby.dylib"; should be lrwxrwxrwx ; they are lrwxr-xr-x 
Warning: SUID file "System/Library/CoreServices/RemoteManagement/" has been modified and will not be repaired
Permissions differ on "Applications/"; should be lrwxr-xr-x ; they are -rw-r--r-- 
Group differs on "Library/Printers"; should be 80; group is 0
Group differs on "Library/Printers/Icons"; should be 80; group is 0
Group differs on "Library/Printers/InstalledPrinters.plist"; should be 80; group is 0
Permissions differ on "Library/Printers/InstalledPrinters.plist"; should be -rw-rw-rw- ; they are -rw-r--r-- 
Group differs on "Library/Java"; should be 0; group is 80
Permissions differ on "Library/Java"; should be drwxr-xr-x ; they are drwxrwxr-x 
Group differs on "Library/Preferences/SystemConfiguration/"; should be 80; group is 0
Group differs on "Library/Preferences/"; should be 80; group is 0
Group differs on "Library/Printers/PPDs"; should be 80; group is 0
Group differs on "Library/Printers/PPDs/Contents"; should be 80; group is 0
Group differs on "Library/Printers/PPDs/Contents/Resources"; should be 80; group is 0
Permissions differ on "System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/libruby.2.0.dylib"; should be lrwxrwxrwx ; they are lrwxr-xr-x 
Group differs on "private/var/db/GPURestartReporter"; should be 0; group is 80
Permissions differ on "private/var/db/GPURestartReporter"; should be drwxr-xr-x ; they are drwxrwx--- 
Finished verify/repair permissions on disk0s2 Macintosh HD

New Mac logic board: system identity

I brought my 2011 iMac in for repair, to replace a video card under an extended warranty program specific to my hardware configuration. (Reading between the lines, a cooling problem.) Once they got it on the testbench the tech decided to replace the system board as well because it tested faulty. At no charge! Despite the machine being a year+ out of warranty. Nice of them.

Anyway, a new system board means a new CPU, a new Ethernet MAC address, and a bunch of other small changes, which altered my machine’s identity for various services. I had to log back in to all of Apple’s online services. Time Machine also expressed concern, but offered to “inherit” the backup I had, which seems to have worked. And of course I had to set up a new IP assignment on my router for the new MAC address.

I’d also logged Chrome out of Google and wiped all its local state as a security precaution. Happy to report that logging back in to Chrome restored things pretty well: I got all my extensions back, but lost their configuration state. My browsing history is also gone forever; I’d sort of hoped that would sync through Google too.

Fixed wireless Internet: Cambium, Ubiquiti

I don’t believe in wireless communications. Too spooky. Also too unreliable, and slow. And 2.4GHz wifi is particularly bad in San Francisco, between the crowding, the lack of non-overlapping channels in 802.11b/g, and the faraday cages surrounding all our lath-and-plaster rooms.

But in Grass Valley wireless is the only option. AT&T and Comcast refuse to provide wired service to rural homes, and the PUC and the FCC are not regulating their monopolies effectively. My ISP is SmarterBroadband, a wireless ISP that works by establishing fixed wireless links from house to house, to a regional distribution point, to a central office, and then down to fast fiber backhaul. It works surprisingly well and reliably; the main limiting factors have been speed and the challenge of getting clear line-of-sight in a hilly area covered with trees.

My link for the last couple of years has been a 900 MHz link through oak trees to a house 1.4 miles north of me. The key thing about 900 MHz is that it works OK through tree cover, something higher frequencies have a harder time with. The hardware is Cambium Networks gear running Canopy, which I think dates back to the old Motorola Canopy days. It’s been pretty reliable; the only time my point-to-point link has failed was in heavy snow, and then only because snow accumulated on the antenna. It’s also low latency (20ms?) with low jitter. The problem has been speed. The max link rate is 2600 kbit/s, and the product they sell me only bursts that fast before throttling me down to 1000 kbit/s. That’s 300 megabytes / hour, and it sucks.

Yesterday they upgraded me. Sent a guy 70 feet up a tree with a Cambium ePMP 1000, a 5GHz radio. It’s pointed at the same house, but that far up we’re above the trees with clear line of sight. I’m told the link is good for 25 Mbit/s, though I’m being sold 12/2 Mbps. Which is still pretty great, and I’m grateful for the upgrade. I’m surprised how cheap the antennas are: they offered me the upgrade for a very low cost, and it looks like you can buy your own radio for about $140. Of course you still need the service!

In addition to 900MHz and 5GHz my ISP sells two other types of wireless links. There’s a 2.4GHz product they mostly don’t sell any more; it also requires clear line of sight, but is slower than 5GHz. And they have a new 3.4GHz product they label “4G”, I think because it’s based on cellular technology. It’s pretty fast too (they offer about 70% of the bandwidth of the 5GHz product) and can handle a little multipath, so it works with a small amount of tree cover. It didn’t work at my house though, and once we were committed to going up a tree we might as well use the 5GHz gear.

The other half of the problem I have now is getting the Internet connection from the tree, 200′ from my house, inside. I intend to run cable, but that may get complicated depending on the state of the existing conduits and the challenge of trenching. Right now I have a shitty old Linksys WRT54GL sitting outside, and it works surprisingly well. That’s making me think I should try a wireless link instead, even though I don’t believe in wireless.

I asked on Metafilter and got a clear consensus answer on the product I need: a Ubiquiti NanoStation M. These are consumer-grade point-to-point wireless links, available in a variety of frequencies and antenna sizes. They’re rated for 5+ miles, so way overkill for my 200 ft needs, which means I can probably use the tiny 9″ x 3″ antennas and be done with it. It looks like I can get a pair for about $150. I’ll need mounting hardware and a bit of wiring work at both ends to install it right, but that’s still got to be cheaper than digging a trench.

The big question is how reliable this kind of link is. Reports are promising! I’m also curious what’s going on at the protocol level. They call their protocol “airMAX”, which is apparently not 802.11 but some TDMA thing. OK, I’m fine with proprietary. But does the link work as an Ethernet bridge? Or does it operate at the IP level and mess with my packets? More to learn.

Wireless: maybe not as scary as I’d believed! Still, nothing beats a clean run of Cat5e. Except a pickup truck full of hard drives, of course.

Update: Ubiquiti answered a tweet I made asking for advice, with a suggestion to use the Nanostation locoM5 and a suggested wireless bridge configuration. That sure makes it easy!

Multivariate linear regression, gradient descent

I’m taking Andrew Ng’s online Machine Learning course on Coursera. It’s my first time doing a MOOC for real, and I’m on the fence about the learning style, but it is nice to have an organized class with weekly assignments.

Two weeks have gone by, and together they make up roughly one learning unit: you learn how to do linear regressions on datasets. What does that have to do with machine learning? Well, a linear regression model is a very simple form of predictive modelling. “I fit this straight line to my 100 data points, then use that line to predict values for arbitrary other inputs.”

The course is a bit schizophrenic about being math vs. computer programming. Ng’s lecture notes are entirely in terms of linear algebra, building up to equations like

θ := θ − (α/m) · Xᵀ(Xθ − y)

(WTF? X is a matrix of your input feature set: m rows of n features each. y is an m-row vector of expected outputs. θ is an n-row vector holding the coefficients of your linear regression prediction model. α is the “learning rate”, a number that’s picked essentially by intuition. The assignment := is shorthand for iteration; we keep iteratively improving the θ vector until it converges.)

I hate linear algebra. Always have, ever since I was 19 years old and it was my 8AM class. It was the only math class I nearly failed; then I crammed super hard the last week and got an A. Then promptly forgot it all. Happily, this class is also a programming class, and the actual exercises are “implement this function in Octave / Matlab”. So I get to turn that confusing math into simple code:

[Screenshot: my vectorized Octave gradientDescent function]

While I’m a good programmer, it’s been many years since I used a matrix programming language like Maple/Matlab/Octave/R, so getting to that function was hard-won. I ended up implementing it by following Ng’s lecture progression. He starts with a simple single-variable linear regression; I coded that using lots of loops, so all the actual arithmetic was scalar operations. Then I tediously hand-translated all those loops into vector forms and generalized to multivariable inputs. A good learning exercise, both to remind me how linear algebra works and to learn the funky vagaries of Octave/Matlab execution (TIL automatic broadcasting). It was gratifying to see how much faster the code ran in vector form!
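For the curious, here’s a minimal NumPy sketch of that vectorized batch gradient-descent update, θ := θ − (α/m)·Xᵀ(Xθ − y). This is my own translation, not the course’s Octave code, and the toy data and learning rate are made up for illustration:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Iterate theta := theta - (alpha/m) * X'(X theta - y) until (hopefully) converged."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        # The whole update is one vectorized expression -- no inner loops.
        theta -= (alpha / m) * X.T @ (X @ theta - y)
    return theta

# Toy data generated from y = 1 + 2x; gradient descent should recover [1, 2].
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # leading column of ones for the intercept
y = 1.0 + 2.0 * x
print(gradient_descent(X, y))  # ≈ [1. 2.]
```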

Of course the funny thing about doing gradient descent for linear regression is that there’s a closed-form analytic solution. No iterative hillclimbing required; just evaluate the equation and you’re done. But it’s nice to teach the optimization approach first, because you can then apply gradient descent to all sorts of more complex functions that don’t have analytic solutions. If I end up getting to do genetic algorithms again I’m gonna be thrilled.
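That closed-form solution is the normal equation, θ = (XᵀX)⁻¹Xᵀy. A sketch in NumPy (again my own illustration, with made-up toy data):

```python
import numpy as np

# Toy data generated from y = 1 + 2x, so we know theta should come out as [1, 2].
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # intercept column of ones
y = 1.0 + 2.0 * x

# Normal equation: solve (X'X) theta = X'y.
# np.linalg.solve is numerically safer than forming the inverse explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # ≈ [1. 2.]
```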

In the end I feel pretty proud of myself for completing week 2, doing all the optional extra work, and understanding it all. My long term goal here is just to understand enough about machine learning algorithms that I can stop worrying about how they are implemented, just bash about with someone else’s software libraries applied to my data. But it’s helpful to understand what’s going on under the hood.

Installing Octave on Mavericks

I’m using Octave for a machine learning course I’m taking online. It’s an open-source MATLAB clone, and true to open source form it is ugly, awkward, and doesn’t install easily. Of course it’s also great free software that I’m grateful for, but that doesn’t make it any easier to install.

There’s an official binary installer for Octave 3.8.0, but its command-line interface doesn’t seem able to plot graphics of any kind. The GUI version can, but there are so many other bugs in the GUI that I want the command line to work. Specifically, I want a program like this to run:

t = [0:0.2:2*pi];
plot(t, sin(t));  % second line reconstructed; any plot call exercises the graphics path

Homebrew has a formula for Octave 3.8.2 with a zillion dependencies and options. In theory “brew install octave” should work, but in practice it’s not sufficient. Here’s something approximating what worked for me, based mostly on these notes:

  1. brew install Caskroom/cask/aquaterm
    Some funky graphics system that allows plots to be drawn without X11.
  2. brew install gnuplot --with-x11 --with-aquaterm
    Install gnuplot with various graphics backends. The octave formula doesn’t seem to select these reliably.
  3. brew install octave --without-docs --without-gui --without-java
    Building the docs drags in a TeX installation. No thanks. I have no idea what optional Java things Octave depends on, but I’d rather not bother. And I don’t want the experimental GUI, so skip that too.
  4. brew link --overwrite gcc
    Octave’s formula is designed to be built with gcc and gcc’s Fortran. It installs gcc (which takes over an hour to compile!), but I had a different version of gfortran installed previously, which caused my Octave build to fail. The --overwrite makes the gcc package override gfortran.
  5. brew install ghostscript
    I had to run this three times by hand because the download server was broken.

This took about 3 hours total. Once completed, my simple 2 line plot program works. Yay!

One added complication: you can select different graphics backends in Octave at runtime. The brew info output even helpfully tells you how:

setenv('GNUTERM','qt') # Default graphics terminal with Octave GUI
setenv('GNUTERM','x11') # Requires XQuartz; install gnuplot --with-x
setenv('GNUTERM','wxt') # wxWidgets/pango; install gnuplot --wx
setenv('GNUTERM','aqua') # Requires AquaTerm; install gnuplot --with-aquaterm

If I don’t set this at all, I seem to be getting aqua. And it works. I can also use the x11 graphics system and it works as well.

That IPython web browser integration sure looks like a smart idea now doesn’t it?

JSON query languages

Following up on an earlier post, here’s a roundup of some systems for querying JSON documents. I’m looking for small data-extraction languages that make it very easy to pull bits of data out of a bunch of JSON blobs, kind of like regular expressions or XPath or CSS selectors. A key requirement is being able to say things like “find me the parts of this JSON document that have a property foo with value bar”. Handling arrays intelligently also matters.
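To make the requirement concrete, here’s that “property foo with value bar” query written out by hand in plain Python — exactly the kind of boilerplate a good query language should replace (document and helper name are made up for illustration):

```python
import json

def find_objects(node, key, value):
    """Recursively yield every dict anywhere in the document where node[key] == value."""
    if isinstance(node, dict):
        if node.get(key) == value:
            yield node
        for child in node.values():
            yield from find_objects(child, key, value)
    elif isinstance(node, list):
        for child in node:
            yield from find_objects(child, key, value)

doc = json.loads('{"items": [{"foo": "bar", "id": 1}, {"foo": "baz", "id": 2}]}')
print(list(find_objects(doc, "foo", "bar")))  # → [{'foo': 'bar', 'id': 1}]
```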

jq. The most popular JSON query language, and quite powerful. I also find it really terse and confusing, but maybe that’s because I don’t use it enough. It’s definitely the tool to compare everything else to. There’s also been some work on a usable libjq so you can embed it in other languages, like these Python bindings.

OboeJS. It’s a streaming JSON parser but also has an XPath-like language for pulling out chunks of a document. I like the look of it.

jsonfilter is a very simple grep-like tool that works with hierarchical input. It looks good and simple for doing simple things. The author is someone I trust to think a tool like this through.

Postgres. It has native support for JSON types and a pretty decent set of functions for working with it. It also has functions for converting JSON to Postgres’ array and recordset types, which you can then further query. And the JSONB type has a few extra query options. I don’t think I’d load data into Postgres just for JSON queries, but if you have JSON there already it’s pretty powerful.
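To sketch the “SQL over JSON columns” idea without standing up a Postgres server, here’s the same shape of query using SQLite’s JSON1 functions from Python’s standard library — not Postgres syntax (Postgres would use the -> / ->> operators instead of json_extract), but the concept is the same:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (body TEXT)")
db.executemany("INSERT INTO docs VALUES (?)", [
    ('{"foo": "bar", "n": 1}',),
    ('{"foo": "baz", "n": 2}',),
])

# json_extract pulls a value out of the document by path, so you can both
# filter on a JSON property and project one out, all in SQL.
rows = db.execute(
    "SELECT json_extract(body, '$.n') FROM docs "
    "WHERE json_extract(body, '$.foo') = 'bar'"
).fetchall()
print(rows)  # → [(1,)]
```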

MongoDB’s JSON query language is surprisingly limited for a system that’s all about searching JSON documents. Maybe that’s just my ignorance, but I can’t find any operators for diving into the document structure other than basically walking the JSON document tree manually.

Other options I didn’t explore. The #1 question is whether these have momentum and/or a future; a bunch of them look active at first glance. I’ll update as I read about them and find anything interesting. There are plenty of other things not on this list, too.

Update: in further discussion, pvg notes it’s helpful to split JSON query systems into two groups: “QLs that do selection/filtering and QLs that do that and also processing and sorting and munging and aggregation and summing and map/reduce and …”.

I really want a system that does all the processing and sorting and munging too. But SQL is very good at that already. So really I just want to do some simple extraction with jq or whatever and then feed it to a SQL system for aggregate reporting. Or else maybe just use Postgres to do everything? I need to get my hands dirty with some feature extraction examples to see how jq / postgres / etc feel.

2011 iMac crash: vertical lines on screen

My iMac just crashed hard: watching a video in VLC one second, dead machine with a mostly black screen the next, showing some strange light-colored vertical lines.

Googling around, this is apparently a known problem: a hardware failure of the GPU. This Apple support page notes a particular known issue for exactly my model of video card and CPU (2011 iMac, AMD Radeon HD 6970M GPU, 3.4GHz i7 CPU). Sounds like they have a repair program.