Routes at my WISP

I continue to be baffled by my WISP’s networking. For the last few days I’ve had frustrating small outages, 1-2 minutes at a time, but only during working hours. On one of those days support told me they were doing “some maintenance that might cause that”. I fear they think an occasional 2-minute outage while they reboot equipment is OK, but 5+ times a day is getting really old. I do have some sympathy for them, though. It’s a fixed wireless network: my house is connected via at least 2 radio relays at other customers’ houses before reaching a wired network. That’s gotta be hard to manage.

Anyway, I now realize part of my confusion is that traceroute is lying to me. Here’s an mtr report after running for about 25 minutes.

                                       Packets               Pings
 Host                                Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. OpenWrt.lan                       0.0%  1606    0.4   0.3   0.3   0.5   0.0
 2. sbbgateway.lan                    2.6%  1606    2.2   2.3   1.6  24.6   2.0
 3. 173.195.183.1                     0.0%  1605   35.7  39.6  16.0 136.5  15.0
 4. 173.195.188.45                    0.0%  1605   33.3  37.5  14.4 139.4  14.9
 5. 173.195.188.21                    0.0%  1605   46.1  39.7  17.2 142.3  15.5
 6. 173.195.177.1                     0.0%  1605   29.0  36.9  18.0 128.5  15.1
 7. 173.195.177.254                   0.0%  1605   37.8  39.0  17.5 145.9  15.3
 8. v122.core1.fmt2.he.net            0.0%  1605   39.1  51.3  28.0 176.6  14.3
 9. 10ge7-1.core1.sjc2.he.net         0.0%  1605   57.3  51.2  27.4 128.1  15.7
10. 72.14.219.161                     0.0%  1605  104.7  47.6  28.2 155.5  16.3
11. 108.170.242.81                    0.0%  1605   57.0  49.5  27.5 142.6  15.9
12. 216.239.49.103                    0.0%  1605   55.2  50.6  28.3 146.1  15.8
13. google-public-dns-a.google.com    0.0%  1605   49.0  47.2  27.4 158.4  14.3  

Hop 1 is my router. Hop 2 is the wireless antenna on my property. Hop 8 is Hurricane Electric, a network provider in California, and beyond that is the public Internet.

But what’s going on between hops 3 and 7? I used to think these were the wireless nodes in my WISP: hop 3 the first radio, hop 4 the second, and so on. But looking at the average ping times, I realize there’s no added latency between hops 3 and 7. So now I think hop 3 must be the first bit of wired infrastructure at my ISP, and the link between hops 2 and 3 is the sum total of all the fixed wireless links. Perhaps those radios act as bridges, transparent at the IP layer. That’s how my own Ubiquiti nodes operate; I actually have 2 Ubiquiti M2s running between hops 1 and 2 that are invisible to mtr.
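
To check the “no added latency” observation numerically, here is a minimal Python sketch that parses the mtr report above and prints how much each hop’s average RTT adds over the previous hop. It assumes the column layout shown in that report (Loss%, Snt, Last, Avg, Best, Wrst, StDev). The 5 ms threshold for calling something a real jump is an arbitrary assumption, chosen because the StDev column is around 15 ms, so a couple of ms between averages is noise.

```python
import re

# The mtr report from above, copied verbatim.
MTR_REPORT = """\
                                       Packets               Pings
 Host                                Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. OpenWrt.lan                       0.0%  1606    0.4   0.3   0.3   0.5   0.0
 2. sbbgateway.lan                    2.6%  1606    2.2   2.3   1.6  24.6   2.0
 3. 173.195.183.1                     0.0%  1605   35.7  39.6  16.0 136.5  15.0
 4. 173.195.188.45                    0.0%  1605   33.3  37.5  14.4 139.4  14.9
 5. 173.195.188.21                    0.0%  1605   46.1  39.7  17.2 142.3  15.5
 6. 173.195.177.1                     0.0%  1605   29.0  36.9  18.0 128.5  15.1
 7. 173.195.177.254                   0.0%  1605   37.8  39.0  17.5 145.9  15.3
 8. v122.core1.fmt2.he.net            0.0%  1605   39.1  51.3  28.0 176.6  14.3
 9. 10ge7-1.core1.sjc2.he.net         0.0%  1605   57.3  51.2  27.4 128.1  15.7
10. 72.14.219.161                     0.0%  1605  104.7  47.6  28.2 155.5  16.3
11. 108.170.242.81                    0.0%  1605   57.0  49.5  27.5 142.6  15.9
12. 216.239.49.103                    0.0%  1605   55.2  50.6  28.3 146.1  15.8
13. google-public-dns-a.google.com    0.0%  1605   49.0  47.2  27.4 158.4  14.3
"""

# One hop line: "N. host  Loss%  Snt  Last  Avg  Best  Wrst  StDev"
HOP_RE = re.compile(
    r"^\s*(\d+)\.\s+(\S+)\s+([\d.]+)%\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def parse_mtr(report):
    """Return a list of (hop number, host, average RTT in ms), one per hop line."""
    hops = []
    for line in report.splitlines():
        m = HOP_RE.match(line)
        if m:  # header lines don't start with "N." and are skipped
            hops.append((int(m.group(1)), m.group(2), float(m.group(6))))
    return hops


def latency_jumps(hops, threshold_ms=5.0):
    """Hops whose average RTT exceeds the previous hop's by more than threshold_ms.

    The 5 ms default is an arbitrary assumption, not something mtr defines.
    """
    return [hop for (_, _, prev_avg), (hop, _, avg) in zip(hops, hops[1:])
            if avg - prev_avg > threshold_ms]


hops = parse_mtr(MTR_REPORT)
for (_, _, prev_avg), (hop, host, avg) in zip(hops, hops[1:]):
    print(f"{hop:>2}. {host:<32} {avg - prev_avg:+6.1f} ms")
print(latency_jumps(hops))  # [3, 8]
```

With this data the only big steps are into hop 3 (the far side of the radio links) and into hop 8 (Hurricane Electric); hops 4 through 7 wobble by a few ms in either direction.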

FWIW the boxes at hops 3-5 look to be running MikroTik RouterOS. That’s also consistent with wired infrastructure; my wireless antenna at hop 2 is a Cambium device. No idea what 6 and 7 are. I don’t want to freak out my ISP by probing too hard.

One reason I’m looking at this so closely is that I’m trying to understand the 2.6% packet loss shown at hop 2. That looks like loss to my own equipment, which I should be able to fix. But every test I’ve done directly to that device, the Cambium antenna up in my tree, shows 0 packet loss. Now my theory is that the packet loss is actually in the wireless network between hops 2 and 3, the WISP’s wireless infrastructure. MTR can’t report it directly because those devices aren’t visible at the IP layer, so it just credits the packet loss to the last IP hop in the route, hop 2.

I think I see a similar thing when I run mtr in the other direction, from a public Internet host back to my router at the WISP: there’s no latency in my ISP’s network on the way in until a big latency jump at the last hop. The route back is different from the route out, which makes a direct comparison a bit trickier.

Latency

I did a ping test to 173.195.177.254, the edge of my ISP’s network.

51 packets transmitted, 51 received, 0% packet loss, time 50076ms
rtt min/avg/max/mdev = 17.645/38.194/84.315/14.119 ms

I also asked my neighbor to do one.

52 packets transmitted, 52 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 17.737/29.401/53.861/8.731 ms

So his average latency is 9ms better than mine, with significantly less jitter and a better worst case, too. I believe the difference between us is one radio hop. He has a radio link directly to Twin Cities Church. I think I have two hops, one north to a house and then from there south to the church.
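
A small Python sketch that pulls the min/avg/max/deviation numbers out of both summary lines (the Linux iputils format on my side, the BSD/macOS format on my neighbor’s) and prints the differences. The regex is an assumption that only covers these two summary formats.

```python
import re

# Linux iputils:  "rtt min/avg/max/mdev = a/b/c/d ms"
# BSD / macOS:    "round-trip min/avg/max/stddev = a/b/c/d ms"
SUMMARY_RE = re.compile(
    r"(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = "
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_summary(text):
    """Return the four summary statistics (in ms) from a ping summary line."""
    m = SUMMARY_RE.search(text)
    if m is None:
        raise ValueError("no ping summary line found")
    mn, avg, mx, dev = map(float, m.groups())
    return {"min": mn, "avg": avg, "max": mx, "dev": dev}


mine = parse_ping_summary("rtt min/avg/max/mdev = 17.645/38.194/84.315/14.119 ms")
neighbor = parse_ping_summary(
    "round-trip min/avg/max/stddev = 17.737/29.401/53.861/8.731 ms")

for key in ("min", "avg", "max", "dev"):
    print(f"{key:>3}: mine {mine[key]:7.3f}  neighbor {neighbor[key]:7.3f}  "
          f"difference {mine[key] - neighbor[key]:+7.3f} ms")
```

The averages differ by about 8.8 ms, while the minimums are nearly identical (17.645 vs 17.737 ms), so the gap shows up in the average and the tail rather than in the best case.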

I’d thought the church was where the wired Internet began; it’s a major hub for the WISP. But those latency numbers suggest there’s possibly more wireless. The WISP’s office is down in Alta Sierra. The distances involved are so short I don’t think signal propagation time matters; it’s at most 20km, or 0.07ms one way at the speed of light. Not relevant. Processing time on the antenna relays, though, not to mention congestion, does matter. Perhaps it’s 9ms per wireless hop, based on the difference between my link and my neighbor’s. That leaves ~20ms of his latency unaccounted for, which suggests there are two more wireless hops from the church to the wired infrastructure.
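
The back-of-the-envelope estimate above, as a few lines of Python. It assumes, as the text does, that the whole difference between the two ping averages is one radio hop, that every remaining wireless hop costs the same, and that the wired part of the path adds nothing; all three are guesses, not measurements.

```python
C_KM_PER_S = 299_792.458  # speed of light in vacuum; in air it is ~0.03% slower


def propagation_ms(distance_km):
    """One-way free-space propagation delay in milliseconds."""
    return distance_km / C_KM_PER_S * 1000


my_avg_ms = 38.194        # my ping average to 173.195.177.254
neighbor_avg_ms = 29.401  # my neighbor's average to the same host

# Assumption: the only difference between our paths is one extra radio hop.
per_hop_ms = my_avg_ms - neighbor_avg_ms

# He has one radio hop to the church; the rest of his RTT is unexplained.
unaccounted_ms = neighbor_avg_ms - per_hop_ms
extra_hops = unaccounted_ms / per_hop_ms

print(f"20 km propagation: {propagation_ms(20):.3f} ms one way, "
      f"{2 * propagation_ms(20):.3f} ms round trip")
print(f"~{per_hop_ms:.1f} ms per radio hop, {unaccounted_ms:.1f} ms unaccounted, "
      f"roughly {extra_hops:.1f} more hops")
```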

Exede satellite Internet

A MetaFilter discussion introduced me to a type of Internet access I didn’t know about before: Exede satellite Internet, part of ViaSat (and WildBlue). It seems better than the Hughesnet satellite Internet, which is so slow, metered, and high latency as to be nearly useless. That said, I took a brief look at Hughes and they seem to have improved their offering somewhat, so I’m not sure Exede is significantly better.

If you’re in the right regions, Exede offers Liberty Internet at 12/3 Mbps, or even 25/3, for $70-$150 a month. The problem is that after you use 12-30GB in a month, they drop you down to a lower service that’s only 1-5Mbps, or maybe worse. If you’re in the wrong area, all you get is Classic service, 5/1 Mbps, for $50-$130/month. That comes with a 10-25GB monthly cap, and I can’t tell if it’s a hard cap or another slowdown.

It’s still high latency, though. Their FAQ is full of mealy-mouthed bullshit about “web accelerators” and how latency is better, but the “what to expect” page confesses 700ms latency and that gaming won’t work. That’s the incontrovertible problem with geosync satellite systems: it’s 240ms just getting up to the satellite and back down, a request and its reply each have to make that trip, and that’s not counting all the other work that has to be done.
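
The 240ms figure falls out of geometry. A minimal sketch, assuming the best case where the ground stations sit directly under the satellite at the 35,786 km geostationary altitude; real slant ranges are longer, so real delays are somewhat higher.

```python
C_KM_PER_S = 299_792.458   # speed of light
GEO_ALTITUDE_KM = 35_786   # geostationary orbit altitude above the equator


def geo_delay_ms(slant_km=GEO_ALTITUDE_KM):
    """Ground -> satellite -> ground delay in ms, given the slant range of each leg."""
    return 2 * slant_km / C_KM_PER_S * 1000


one_way = geo_delay_ms()    # one bounce: up to the satellite and back down
round_trip = 2 * one_way    # the request bounces once, the reply bounces again
print(f"one bounce: {one_way:.0f} ms, request + reply: {round_trip:.0f} ms")
```

That is roughly 480ms of pure physics per round trip before any processing, queuing, or terrestrial routing, which makes the advertised 700ms unsurprising.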

The other part of Exede that got my attention is that it’s bidirectional: you send signals back to the satellite via the antenna. Old Hughes installs relied on a phone line for uploads. Hughes now advertises upload speeds of up to 1Mbps, though, so maybe they have a satellite upload option now.

I was baffled why the 12/25 Mbps service was available in San Francisco but only 5 Mbps in Grass Valley. It’s satellite; why does it matter where I am? Well, it turns out they’ve broken up the US into cells served by different “beams”. I found a good map and some technical detail. It looks like most of the western US isn’t served by ViaSat-1 at all, but by Anik-F2 instead.

[Images: satellite beam coverage maps]

Windows metered network connections

My ISP is down again, so I’m back to Verizon tethering. While setting up the network connection, I noticed Windows has a metered network option. The docs are very vague, but basically it’s a flag to tell the OS and applications “hey, this network isn’t free”. Windows itself will be a bit more careful, for example only downloading priority updates. There are also configurable options to not sync OneDrive or download device drivers.

In theory, third-party apps could also inspect this setting and be smarter on a metered connection, specifically apps on the new “Universal” platform. In practice I wonder how many apps are that smart. Dropbox isn’t, apparently. It’s all a bit confusing, really, and most of the search results you see on Google about metering are “why isn’t stuff updating, how do I turn off this metered connection thing?”
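
On the app side, a Universal (UWP) app can read the connection’s cost via `NetworkInformation.GetInternetConnectionProfile().GetConnectionCost()` in `Windows.Networking.Connectivity`, which exposes a `NetworkCostType` plus flags such as `Roaming` and `OverDataLimit`. The Python sketch below does not call Windows at all: it mirrors the `NetworkCostType` values in a local enum and shows one hypothetical policy for deciding when to hold off on big transfers. `should_defer_large_download` is an illustrative name, not a real API.

```python
from enum import IntEnum


class NetworkCostType(IntEnum):
    """Local mirror of the WinRT NetworkCostType enum values."""
    UNKNOWN = 0
    UNRESTRICTED = 1
    FIXED = 2      # a data plan with a cap
    VARIABLE = 3   # charged per byte


def should_defer_large_download(cost_type, roaming=False, over_data_limit=False):
    """Hypothetical policy: postpone big background transfers on costly links."""
    if roaming or over_data_limit:
        return True
    # Unknown is treated like unrestricted here; that choice is an assumption.
    return cost_type in (NetworkCostType.FIXED, NetworkCostType.VARIABLE)


print(should_defer_large_download(NetworkCostType.FIXED))         # True
print(should_defer_large_download(NetworkCostType.UNRESTRICTED))  # False
```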

Also TIL that Windows 10 enrolls your computer as a peer-to-peer server for uploading Windows updates to other Windows computers (the feature is called Delivery Optimization). Yes, by default Microsoft assumes you want to use your upstream bandwidth to serve other customers. That’s some fucking bullshit.

Ah, Windows, ye olde errors

The bloom is off the rose with Windows. Or at least with Civilization 6, one of the games I switched to Windows to be able to play well. (It runs on Mac but 50% slower, and I wanted higher-end graphics hardware.)

The whole system often freezes when exiting the game to the desktop. Well, that’s not quite right: Ctrl-Alt-Del still works and you can reboot or log out. But Alt-Tab doesn’t work to switch apps, which is odd because it works fine while the game is running.

G-Sync also doesn’t work. That’s NVIDIA’s fancy variable refresh rate stuff. I get all sorts of screen tearing unless I enable VSync.

Both of these bugs could be specific to Civ 6. But it’s the kind of thing you’d think the OS and its graphics drivers would take care of uniformly for all games. Ha, as if.

Windows S.M.A.R.T.

Windows doesn’t seem to have built-in S.M.A.R.T. support! After reading this guide and this Stack question, I think the best option is CrystalDiskInfo. It used to have some bundled adware, but the author promises he removed it all, and there’s a portable version for download (i.e., no installer). I took a chance and it seems to work.

[Screenshot: CrystalDiskInfo showing my USB drive]

That’s my USB drive. Ignore all the junk and focus on “Health Status: Good”. Its heuristics seem reasonable; at least it shows my two new drives as healthy and identifies plausible-looking errors in a failing drive.

Gripe

Why the fuck doesn’t Windows have basic SMART error monitoring and notification? It’s the first line of detection for a failing drive; popping up errors for the user would prevent untold amounts of lost data. (And ignore the advice to try Windows’ wmic; at least on the drive I know is failing, it shows “OK” even though it’s not.)

And why are SMART tools so bad? Note the same user-hostile display table at the bottom that every other SMART tool in the world shows. What’s the temperature? 123? 111? No, it’s 00000000001D, which for some reason is displayed in hex with leading zeros. At least this program translates it to 29°C. No idea what the spin-up time might mean, though. I don’t even know what “Current” and “Worst” are supposed to mean. Linux smartmontools shows the same nonsense.
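
In most SMART implementations “Current” and “Worst” are vendor-normalized scores (higher is better; “Worst” is the lowest value ever recorded) that get compared against the “Threshold” column, while the raw value is a vendor-specific 48-bit field that CrystalDiskInfo prints as 12 hex digits. A minimal decoding sketch, assuming the common layout where the temperature attribute (usually ID 194, 0xC2) keeps degrees Celsius in the lowest byte; some drives pack min/max temperatures into the higher bytes, which this ignores.

```python
def parse_raw(raw_hex):
    """Parse a SMART raw value shown as 12 hex digits (48 bits) into an int."""
    value = int(raw_hex, 16)
    if value >= 1 << 48:
        raise ValueError("SMART raw values are 48 bits")
    return value


def temperature_c(raw_hex):
    """Assumes degrees Celsius live in the lowest byte of the raw value."""
    return parse_raw(raw_hex) & 0xFF


def attribute_failing(current, threshold):
    """A normalized value at or below a non-zero threshold signals failure."""
    return threshold != 0 and current <= threshold


print(temperature_c("00000000001D"))  # 29
```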

Speaking of smartmontools, I tried installing like four versions of it for Windows from dodgy SourceForge binaries. They all sucked; many couldn’t even find my drives. So much terrible Windows software out there.

Ken’s computer is showing signs of a failing hard drive. The last CHKDSK run found about 30 randomly corrupted files. The machine is about five years old, just in time for a shitty consumer hard drive to fail. Ugh.