One long wire

Three years ago I kludged up my Internet access in Grass Valley using a Ubiquiti wireless ethernet bridge. Today I was able to remove that and replace it with a simple long wire. Yay! We finally got some electricians out to run cable through existing conduit. The conduit was crushed, but fortunately not under the driveway like the last electrician told us. The break was right at the edge of the driveway, probably where a heavy truck once ran off the edge and crushed it. We dug that up, fixed it, and ran three new Cat 6 cables, and now I’m wired from my house to the antenna up in a tree.

Shout out to the Ubiquiti Nano M5 and Nano Loco M5 though. Those things worked like a champ, rock solid for three years with zero problems. It wasn’t too demanding a job, 12 Mbps wirelessly over a couple hundred feet, but I don’t think I ever even had to reboot them. And when the power went out, they came back up on their own just fine. Very nice hardware and software.

I’m gonna do some tinkering with my new wired setup. I have about 450′ of cable run in total from my router to the tree antenna, broken up into three pieces: a 100′ segment from the router to an ethernet switch in my garage, a 300′ segment from the switch to a PoE module at the tree, and a 50′ segment from the PoE module to the antenna up in the tree. I’m gonna see if I can get away with a passive RJ45 coupler joining the 300′ cable from the garage to the 50′ cable that goes up the tree, and move the PoE module down into the garage. That makes the run from the garage to the antenna about 350′, and it would mean I no longer rely on outdoor power up by the tree. Update: turns out this works fine.

Ethernet has a well-known 100 meter (328 foot) limit on cable runs. But is that a real hard limit? The folklore is that it has to do with timing and collision detection, but apparently that may only have been true for very old 10BASE-T and/or non-switched networks. In modern times it may be more of a soft standard for signal strength and attenuation, in which case there’s some wiggle room. I’ve heard from friends with 500′ or longer runs who say it works.
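Here’s a quick sketch of the arithmetic, checking each run against the nominal 100 m figure. The segment lengths are from this post; the “coupled” entry assumes the passive coupler joins the 300′ and 50′ cables into one run, and the names are just labels I made up.

```python
# Check each cable run against the nominal 100 m twisted-pair Ethernet limit.
FEET_TO_METERS = 0.3048
LIMIT_M = 100.0

runs_ft = {
    "router -> garage switch": 100,
    "garage switch -> PoE at tree": 300,
    "PoE -> antenna": 50,
    # Assumption: a passive coupler makes 300' + 50' one electrical run.
    "garage -> antenna (coupled)": 300 + 50,
}

def check_runs(runs):
    """Return (name, feet, meters, within_limit) for each run."""
    results = []
    for name, feet in runs.items():
        meters = feet * FEET_TO_METERS
        results.append((name, feet, round(meters, 1), meters <= LIMIT_M))
    return results

for name, feet, meters, ok in check_runs(runs_ft):
    status = "ok" if ok else "over the nominal limit"
    print(f"{name}: {feet} ft = {meters} m ({status})")
```

The coupled run comes out around 107 m, a little past the nominal limit, and it still works for me.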

Identify Reddit deplorables

Interesting new Reddit tool: Masstagger. You install it and it pops up little red warnings next to users’ posts: “the_donald user”, “kotakuinaction user”, or the like. It’s a quick way to get some insight into a Redditor’s history and reputation. Makes it easy to identify the Nazi-wannabes, at least.

More about it in this Reddit discussion. I particularly like the author’s responses to the kind of crap these projects always attract. “Why not open source? … Because I don’t want to”. “This is just like giving Jews yellow stars! … No, not really.” “Can I add my own subreddits to tag to meet my own personal desires? … No, this identifies Nazis. My old tool was editable and people used it to stalk porn posters.”

Behind the scenes, they have a list of deplorable subreddits (104 right now) that they monitor. The backend server is constantly downloading posts to those subreddits and keeping statistics on which users post there. A second service lets you look up the scores for a list of usernames. The browser addon uses that: when you load a Reddit page it fetches the scores and annotates the page accordingly.
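A minimal in-memory sketch of those two pieces, an ingester that counts posts per (user, monitored subreddit) and a batch lookup for the addon. This is my guess at the shape, not Masstagger’s code; the class name, the two-subreddit set, and lowercasing names for matching are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the real list of ~104 monitored subreddits.
MONITORED = {"the_donald", "kotakuinaction"}

class TagStore:
    """Sketch: count posts per (user, monitored subreddit); batch lookups."""

    def __init__(self, monitored):
        self.monitored = {s.lower() for s in monitored}
        # username -> Counter(subreddit -> number of posts)
        self.counts = defaultdict(Counter)

    def ingest(self, username, subreddit):
        """Record one post; posts to unmonitored subreddits are ignored."""
        sub = subreddit.lower()
        if sub in self.monitored:
            self.counts[username.lower()][sub] += 1

    def lookup(self, usernames):
        """Return {username: {subreddit: count}} for users with any activity."""
        result = {}
        for name in usernames:
            stats = self.counts.get(name.lower())  # .get doesn't create entries
            if stats:
                result[name] = dict(stats)
        return result

store = TagStore(MONITORED)
for user, sub in [("alice", "the_donald"), ("alice", "the_donald"),
                  ("bob", "aww"), ("carol", "KotakuInAction")]:
    store.ingest(user, sub)
print(store.lookup(["alice", "bob", "carol"]))
```

Users with no activity in the monitored set simply don’t appear in the response, which keeps the payload small for a typical page.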

They had some scaling problems today; unfortunately the service generates the statistics dynamically when users ask. I was thinking they could just do things statically: generate a statistics file once an hour for the addon to download. But tracking 100,000 users across 100 subreddits is 10M records, or maybe 200MB of static data. That’s a lot to serve in a single file.
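The back-of-envelope math, as a sketch. The 200MB figure implies about 20 bytes per record. The sparse case, storing only (user, subreddit) pairs that actually have posts, uses a made-up average of 3 monitored subreddits per user purely for illustration.

```python
def static_size(users, subreddits, bytes_per_record, subs_per_user=None):
    """Estimate (records, bytes) for a static stats file.

    Dense: one record for every (user, subreddit) pair.
    Sparse: only pairs with activity, if subs_per_user is given.
    """
    pairs = subreddits if subs_per_user is None else subs_per_user
    records = users * pairs
    return records, records * bytes_per_record

dense = static_size(100_000, 100, 20)
sparse = static_size(100_000, 100, 20, subs_per_user=3)  # hypothetical average
for label, (records, size) in [("dense", dense), ("sparse", sparse)]:
    print(f"{label}: {records:,} records, about {size / 1e6:.0f} MB")
```

If most users only post in a handful of those subreddits, the real file could be much smaller than the dense worst case.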

There’s a variety of existing “profile Reddit users” sites; see SnoopSnoo, Reddit User Analyser, and Reddit Investigator. I wonder if any of them have a backend suitable for this use? Reddit User Analyser fetches comments from Reddit directly in the browser page, with no server, so it’s probably too heavyweight for this addon. SnoopSnoo appears to have a database on the backend; the report pages come back with little bits of data injected as scripts in the HTML source. Reddit Investigator is down right now.

Anyway, it’d be pretty simple to build a custom service for this. It’s less clear how hard it’d be to make it scale. Static files are clearly the best choice, but it’s a lot of data. Maybe one static file per profiled user? That would require the addon to fetch something like 40 static files with each page load; that’s not great, but it’s not awful.
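A sketch of the one-file-per-user idea: a batch job writes a small JSON file per tagged user, and the addon builds one URL per distinct username on the page. The base URL is hypothetical, and I’m assuming usernames are safe to use as filenames and should be matched case-insensitively.

```python
import json
import os
import tempfile

def write_user_files(stats, out_dir):
    """Batch side: write out_dir/<username>.json for each tagged user."""
    os.makedirs(out_dir, exist_ok=True)
    for username, counts in stats.items():
        path = os.path.join(out_dir, f"{username.lower()}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(counts, f)

def urls_for_page(usernames, base_url):
    """Addon side: one URL per distinct (case-folded) username on the page."""
    return [f"{base_url}/{name}.json"
            for name in sorted({n.lower() for n in usernames})]

with tempfile.TemporaryDirectory() as d:
    write_user_files({"alice": {"the_donald": 2}}, d)
    print(os.listdir(d))
print(urls_for_page(["Alice", "bob", "alice"], "https://example.com/tags"))
```

Untagged users would just be a 404, so there’s no need to publish files for the vast majority of Redditors.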


Environment variables and secrets

People put secrets in environment variables all the time. AWS keys, database credentials, etc.

But in Ye Olden Days environment variables were not secret. At least on Ultrix 2.2, the BSD 4.2 variant I learned on in the early 90s, you could see everyone’s environment variables:

Note that denying other users read permission [to your .profile] does not mean that they cannot find your PATH or any other of your environment variables. The -eaxww options to the ps command display in wide format the environment variables for all processes on the system

Linux explicitly does not do this: environment variables are only readable by the process’s user and by root. Now I’m curious about the history of environment privacy. I don’t know whether Linux always treated environment variables as secret, or what other Unix systems do or did. It seems like an interesting change in behavior. I’m not clear on what POSIX says, either.
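On Linux the environment of a running process shows up as /proc/<pid>/environ, and the kernel only lets the process’s owner (or root) read it. A minimal sketch that parses that file and tries another process; PID 1 is just a convenient example of a process that usually belongs to root.

```python
import os

def read_environ(pid="self"):
    """Parse /proc/<pid>/environ (Linux only) into a dict.

    The file holds NUL-separated KEY=VALUE entries from when the process
    started; changes the process makes later typically aren't reflected.
    """
    with open(f"/proc/{pid}/environ", "rb") as f:
        raw = f.read()
    env = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:  # skips the empty trailing entry
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env

if __name__ == "__main__" and os.path.exists("/proc/self/environ"):
    print(f"my own process: {len(read_environ())} variables")
    try:
        other = read_environ(1)  # PID 1 usually runs as root
        print(f"PID 1: read {len(other)} variables (root, or same owner)")
    except PermissionError:
        print("PID 1: PermissionError, another user's environment is private")
    except OSError as err:
        print(f"PID 1: not readable here ({err.__class__.__name__})")
```

That’s the contrast with the old `ps -eaxww` behavior: there the environment of every process was visible to everyone.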

BTW, there are many articles on the Web that say it’s dangerous to put secrets in environment variables. (Example.) They may only be readable by the process’s user, but they can often leak in debug dumps, logs, etc. OTOH, I see a lot of people using the environment for secrets now, so maybe standards are changing.
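One concrete leak path is inheritance: every child process gets a copy of the parent’s environment unless you scrub it. A small sketch; DEMO_SECRET is a made-up variable name, and the child is just another Python interpreter printing what it sees.

```python
import os
import subprocess
import sys

CHILD = "import os; print(os.environ.get('DEMO_SECRET', '<unset>'))"

def child_sees(env):
    """Run a tiny child process with `env` and return what it prints."""
    result = subprocess.run([sys.executable, "-c", CHILD], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Hypothetical secret, set only for this demo.
parent_env = dict(os.environ, DEMO_SECRET="hunter2")
print(child_sees(parent_env))  # the child inherits the secret

scrubbed = {k: v for k, v in parent_env.items() if k != "DEMO_SECRET"}
print(child_sees(scrubbed))    # removed before spawning the child
```

So any helper program, crash reporter, or shell-out a process launches ends up holding the secret too, unless someone remembers to strip it.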

See also this discussion.

Twitter robot purge

Twitter purged a bunch of robot accounts this week, as some sort of effort to clean up their platform. If you log in and look at your own follower count, you’ll see the falloff.

My personal account @nelson lost about 5% of its followers. It’s a pretty ordinary account, other than being old.

I have an account with a much simpler set of followers though: my robot account @somebitslinks. That’s an automated account posting interesting links I place on Pinboard. It’s not social in any way. But it got 10,000 followers all in one day in January 2012 when it was listed for a few hours as a recommended account on Twitter’s home page.

Over the years that account has lost followers, I imagine as people got fed up with all the robot spam in their timelines. It slowly dropped from 10,000 to about 9000. This week in the purge it dropped from 8947 to 8087, just under 10%.
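The percentages, computed from the follower counts above:

```python
def percent_lost(before, after):
    """Percentage of followers lost going from `before` to `after`."""
    return 100.0 * (before - after) / before

print(f"slow drift, 10,000 -> 8947: {percent_lost(10000, 8947):.1f}%")
print(f"this week's purge, 8947 -> 8087: {percent_lost(8947, 8087):.1f}%")
```

So the purge took about as many followers in one week as six years of slow attrition did.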

I don’t know what to conclude from that exactly, other than that Twitter’s bot-finding algorithm apparently flagged about 10% of this account’s followers, most of whom date from that January 2012 spike, as bots. Makes me wonder what the real number is.

Update: some context in this tweet, showing big accounts losing anywhere from 2% to 78% of followers. Also this NYT summary.

People visit roughly 25 places

Interesting study summarized over at The Economist. The researchers tracked the movements of 40,000 people as they went about their daily lives. They found the number of places that people go to regularly is about 25. That set of 25 places changes, but when someone adds a new place they tend to stop going to the old place. The result is kind of like a Dunbar’s number but for places, not relationships.

The article is a good summary, but the paper, “Evidence for a conserved quantity in human mobility”, is of course more detailed. If you ask a friendly librarian in Taiwan (speaking Russian when asking) they might give you this download link.

To be honest I didn’t get a lot more out of the paper than the Economist article. The statistical methods are unfamiliar to me and I’m too lazy to figure them out. But some details:

  • They define a “place” as anywhere someone dwells more than 10 minutes. These are characterized as “places offering commercial activities, metro stations, classrooms and other areas within the University campus”
  • People discover new places all the time. The fit is a sublinear power law, roughly
    locations ∝ days^0.7 over a span of ~1000 days.
  • The probability a new place becomes part of the permanent set is somewhere between 7% and 20%. The Lifelog dataset (their largest) yields 7%; the others are 15-20%.
  • There are four separate datasets. Sony Lifelog is the big one; that’s like Google Timeline combined with a fitness tracker. The other three are academic datasets. One of those, the Reality Mining Dataset from the MIT Media Lab, is publicly available and covers 94 people.
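To get a feel for what a 0.7 exponent means, here’s a sketch. The post only gives the exponent, not the constant in front, so this looks at ratios, where the unknown constant cancels out.

```python
ALPHA = 0.7  # rough exponent from the fit: locations grow like days ** 0.7

def growth_factor(day_ratio, alpha=ALPHA):
    """How much the count of discovered places grows when the observation
    window is multiplied by day_ratio (the unknown prefactor cancels)."""
    return day_ratio ** alpha

def days_to_double(alpha=ALPHA):
    """Window multiplier needed to double the count of discovered places."""
    return 2 ** (1 / alpha)

print(f"2x the days -> {growth_factor(2):.2f}x the places")
print(f"10x the days -> {growth_factor(10):.2f}x the places")
print(f"doubling the places takes {days_to_double():.2f}x the days")
```

Discovery keeps going but slows down: to find twice as many places you need to watch for roughly 2.7 times as long.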

Interesting research. I wonder if it’s really true? It seems plausible enough and matches my personal experience, particularly since I split my time between two cities: I go to fewer places in San Francisco regularly now that I am half time in Grass Valley.


Fixing bufferbloat in Ubiquiti EdgeOS

This Hacker News discussion got me to dive in and enable smart queuing on my Ubiquiti EdgeMAX routers, the ones running EdgeOS. There’s a quick-and-dirty explanation of how to set it up in these release notes; search for “smart queue”.

Long story short: under the QoS tab I created a new policy for eth0 and set bandwidth numbers a little higher than the 100/5 Mbps my ISP says it sells me. Then I tested with the DSLReports speed test:

  • No smart queue: I get about 105/5.5 with an F for bufferbloat (300ms+).
  • Smart queue at 100/5: I get 90/4.5 with an A+ for bufferbloat.
  • Smart queue at 110/6: I get about 100/5.3 with an A for bufferbloat.

Update: several folks have pointed out to me that smart queuing causes problems if you have a gigabit Internet connection. The CPU in a Ubiquiti router can only shape about 80-400 Mbps of traffic, depending on how new/expensive a router you bought. If you enable smart queuing on a gigabit Internet connection you will probably lose a lot of bandwidth.

The docs note that connections are throttled to 95% of the maximums you set, which probably explains that 90/4.5 reading. I think the only harm in setting the numbers a little high is that I might get a few ms of extra lag from buffers.
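A quick check of that 95% explanation against my measurements. The throttle factor is from the docs as described above; the measured numbers are from my speed tests. Speed tests report throughput after protocol overhead, so landing a bit under the cap is what I’d expect.

```python
THROTTLE = 0.95  # smart queue caps traffic at 95% of the configured rates

def ceiling(rate_mbps, throttle=THROTTLE):
    """Effective cap for a configured rate in Mbps."""
    return rate_mbps * throttle

# (configured down, up) -> (measured down, up), all in Mbps
results = {(100, 5): (90, 4.5), (110, 6): (100, 5.3)}

for (down, up), (got_down, got_up) in results.items():
    cap_down, cap_up = ceiling(down), ceiling(up)
    print(f"configured {down}/{up}: cap {cap_down:.2f}/{cap_up:.2f}, "
          f"measured {got_down}/{got_up} = "
          f"{got_down / cap_down:.0%}/{got_up / cap_up:.0%} of cap")
```

At 110/6 the upload cap (5.7) is above the ~5.5 I measured with no smart queue at all, so the queue may be building somewhere upstream of the router again. That could be why that setting only got an A instead of an A+.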

The Hacker News discussion has a bunch of other stuff in it too. Apparently “Cake” is the new hotness in traffic shaping, and it’s possible to add it to EdgeOS, but awkward. Also, it’s apparently hard to buy a consumer router that can really do gigabit speeds, particularly if you want traffic shaping. Huh.

Every single time I’ve enabled QoS I’ve ended up regretting it, because it breaks something I don’t figure out until months later. I wonder what it will be this time? I hate that I have to statically configure the bandwidth throttles.



UDP spam from DirecTV boxes

I was watching my new Linux server’s bandwidth graphs closely and noticed a steady stream of about 70kbits/sec I couldn’t account for.

24 kbps of that is my three DirecTV boxes sending UDP packets to the server. The destination port looks random and changes with each reboot; I’ve seen 34098 and 59521. There’s never a single response from the server, and it’s the only traffic I see from the DirecTV boxes to my Linux server. Each UDP packet has text in it like this:

HTTP/1.1 200 OK
Cache-Control: max-age=1800
Server: Linux/, UPnP/1.0 DIRECTV JHUPnP/1.0
ST: uuid:29bbe0e1-1a6e-47f6-8f8d-dcd321ac5f80
USN: uuid:29bbe0e1-1a6e-47f6-8f8d-dcd321ac5f80

So it looks like UPnP junk. The advertised Location is on port 49152, which is a bit of a tell: it’s the lowest-numbered dynamic/private port and is often used by UPnP servers to announce themselves. Sure enough, that Location URL serves XML gunk advertising a DLNA server or something. Each DirecTV box sends a burst of about 6 of these packets every few seconds; all three of them do it.
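These packets look like SSDP, the HTTP-over-UDP discovery protocol UPnP uses (the same protocol behind the port 1900 chatter). A small parser sketch for the packet text quoted above. One hedged observation: in SSDP, an “HTTP/1.1 200 OK” carrying an ST header is the form of a reply to an M-SEARCH query, while unsolicited announcements start with “NOTIFY * HTTP/1.1”. If that’s right, it may be worth checking whether something on the Linux box is sending searches from that changing port.

```python
def parse_ssdp(payload):
    """Split an SSDP message into its start line and a header dict.
    Header names are upper-cased because they are case-insensitive."""
    lines = payload.replace("\r\n", "\n").split("\n")
    start_line = lines[0].strip()
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the headers
        name, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            headers[name.strip().upper()] = value.strip()
    return start_line, headers

packet = """HTTP/1.1 200 OK
Cache-Control: max-age=1800
Server: Linux/, UPnP/1.0 DIRECTV JHUPnP/1.0
ST: uuid:29bbe0e1-1a6e-47f6-8f8d-dcd321ac5f80
USN: uuid:29bbe0e1-1a6e-47f6-8f8d-dcd321ac5f80
"""

start, hdrs = parse_ssdp(packet)
kind = "search response" if start.startswith("HTTP/") else "request or NOTIFY"
print(start, "->", kind)
print(hdrs["ST"], hdrs["CACHE-CONTROL"])
```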

I wonder why my Linux box is lucky enough to get these? My Windows box doesn’t seem to get them. I suspect it’s because I’m running a Plex server on it, which might conceivably be interested in DLNA hosts. But I turned off DLNA in Plex, rebooted, and it’s still getting them.

Oh well, it’s not much bandwidth. I’m not sure where the rest of the 70 kbps is going. There’s a lot of broadcast chatter on port 1900, more UPnP stuff, but nothing else targeted like this.