Simple monitoring for small projects

Just set up some long-overdue monitoring for small projects like my blog and linkblog.

updown.io makes HTTP requests to your server and alerts you if they fail or don’t contain the expected content.

healthchecks.io alerts you if an HTTP URL on their server is not pinged in a configurable amount of time. It’s intended for cron jobs; if your job doesn’t load the URL in time you get a warning.
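
The usual pattern is to chain the ping onto the end of the cron job itself; a minimal sketch (the job path and the check’s UUID here are placeholders, not my real ones):

```shell
# Crontab entry: run the nightly job, then ping healthchecks.io only
# if it succeeded. The script path and UUID are placeholders.
30 4 * * * /home/nelson/bin/nightly-backup.sh && curl -fsS -m 10 --retry 3 -o /dev/null https://hc-ping.com/your-uuid-here
```

If the job fails or hangs, the URL never gets hit and healthchecks.io emails you.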

Both are free at small scale, enough for at least a year of a small project (or more), and low cost beyond that. They’re both professionally made and have been around a while. I’m impressed with how many integrations they have; email notification is the default but there’s Slack, SMS, etc. as well.

Shout-out also to scrutineer.tech, which publishes a simple RSS feed warning you about upcoming SSL certificate expirations.

Starlink bandwidth over 3+ years

In my last post I was surprised to find I was getting 25 Mbps upload speeds, so that had me looking again at my bandwidth tests since I got Starlink. These are tested with the Ookla Speedtest CLI tool. They are a test of my house’s network connection, so occasionally speeds read slower than reality because I’m using the network or I’ve temporarily failed over to my backup ISP (not Starlink). But overall these reflect Starlink performance.

Fine grained graph

The main trend that interests me is the steady improvement of speeds in 2023 and 2024, particularly upload speed. This is a great trend to see in your ISP!

It’s also interesting to see those halcyon days of 2021 before Starlink got heavily subscribed. That’s a suggestion of what the maximum radio throughput of the service is. Service was worse then: more outages, higher latency, more variability. But download bandwidth was same-ish for a lot of this period. Upload has definitely gone up.

Monthly average

The graph above uses some fine-grained reporting interval that Grafana selected, probably a few hours per point. The graph below is an average over every 30 days. The same trend is more clearly visible, but of course it doesn’t capture the variance.

Starlink 7 days stats April 2024

This blog post contains screenshots of my monitoring dashboards for my residential Starlink node in Grass Valley, CA. Bottom line: the service is quite good these days.

  • Average speeds: 150 Mbps down / 25 Mbps up
  • Average latency: 36ms
  • Average packet loss: 0.6%

I do a lot of careful client-side monitoring. I used to pay more attention when the service was new and had some problems, but since January 2023, when their major over-capacity problem was solved (at least for my area), it’s worked very well for me. It got better again in January 2024 when they started to improve latency. Mostly now I don’t pay very close attention because it works fine.

The downsides are that median packet loss is high enough to matter, and performance is variable: it gets noticeably worse in evenings, presumably because of congestion. But it’s pretty good always. FWIW I think Starlink’s behavior is more like a cellular ISP than wired or geosync satellite: better quality in general than overloaded American cellular, but similarly unpredictable packet loss and jitter.

I still have my dashboards and monitors running and they give very detailed client-side views of performance. Here are screenshots of my 4 separate dashboards, all data for the last week (April 22 to 29). I think this week was mostly unremarkable and typical. The first couple are basic ISP monitoring; the other two are more specific and unique to me or Starlink.

FWIW I have a gen 1 Dishy with ordinary residential service.

Ookla speed tests

This graph shows results from hourly speedtests using Ookla’s command line tool. Main results:

  • highly variable download and upload speeds, with a mean of 150 Mbps down and 25 Mbps up.
  • few hours under 20 Mbps download speed

Compared to the old days, overall speeds are better. I used to be more like 100 / 15. Even better, the “hours under 20 Mbps” is greatly improved; that used to be 8 or more hours a day.
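
My collection setup isn’t shown here, but an hourly cron running the Speedtest CLI along these lines would produce this kind of data (paths and schedule are placeholders; -f json is the CLI’s machine-readable output flag):

```shell
# Hourly Ookla speedtest, appending one JSON result per line.
# The log path is a placeholder for wherever you collect metrics.
0 * * * * /usr/local/bin/speedtest -f json --accept-license >> /var/log/speedtest.jsonl
```

Each JSON line has download/upload bandwidth and latency fields you can feed into Grafana.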

ICMP Ping tests

Using telegraf to send 5 pings every 15s to 8.8.8.8 and measure round trip time and packet loss. See below for the better IRTT tests showing similar data; starting here because pings are familiar to everyone.

  • average latency around 36ms
  • 0.58% mean packet loss
  • fairly stable throughout the day but there’s some visible daily pattern
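
The telegraf side of this is a stock inputs.ping block; a sketch matching the numbers above (output plugin omitted):

```toml
# telegraf.conf fragment: 5 ICMP pings to 8.8.8.8 every 15 seconds,
# recording round trip time and percent packet loss.
[[inputs.ping]]
  urls = ["8.8.8.8"]
  count = 5
  interval = "15s"
```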

IRTT latency tests

These are much more detailed results using the IRTT tool to test latency and packet loss with UDP packets to a server I run in a datacenter at or very near the Starlink LA POP.

  • 0.61% mean packet loss
  • 36ms mean latency.
  • Several incidents a day of 20% or more packet loss. (You see this pattern in applications too; the network goes away for a few seconds).
  • Clear pattern of higher packet loss during evenings. 30 minute average gets as bad as 2% or so occasionally.
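
An IRTT run like the ones behind these graphs looks roughly like this (the server hostname is a placeholder for the one I run near the LA POP; 2112 is IRTT’s default port):

```shell
# One-hour IRTT test: -i is the send interval, -d the duration,
# -o writes full per-packet JSON results for later graphing.
irtt client -i 200ms -d 1h -o irtt-results.json irtt.example.com:2112
```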

Starlink monitoring

Starlink’s own monitoring data, using their gRPC endpoint to pull data from the dish.

  • Two 30s+ outages around 3am; these seem to be related to firmware upgrades
  • 36ms mean latency
  • 0.6% “obstruction fraction”. It’s unclear what the dish reports as obstructions; at least some of it is packet loss unrelated to my antenna. I have a very clear but not 100% perfect view where Dishy is pointed.
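
Pulling this data is a one-liner with grpcurl; the dish answers on a fixed address and supports gRPC reflection, so no proto files are needed:

```shell
# Query current status (uptime, latency, obstruction stats) from the
# dish's gRPC endpoint at its standard address.
grpcurl -plaintext -d '{"get_status":{}}' 192.168.100.1:9200 SpaceX.API.Device.Device/Handle
```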

Proxmox production container for linkblog

Starting to containerize some stuff that I originally migrated to a VM. My linkblog, for a start. I need a generic Linux environment that’s fairly beefy: glibc, not musl. Ideally I’d bake all this into a reusable container but I’m rolling it by hand for now. Here’s my notes.

Container creation

  • Create an unprivileged container with hostname “blogs”
  • Add root password, my SSH key
  • Use the Ubuntu 22.04 template (the latest LTS). Ubuntu so I can get deadsnakes for Python versions. I’d like the latest Ubuntu but can’t because Playwright (the screenshot library I need for the app) only supports LTS releases.
  • 16GB of disk, 4 CPU cores, 16GB of RAM, 1GB of swap. Or more; these are all soft limits, so it may be better to set them higher.
  • Default networking but enable DHCP instead of static.
  • Consider adding access to vmbr1 for the NFS server.
  • Edit the container options to make it start on boot

Linux configuration

  • Start the container, launch a console, log in as root with password
  • apt update; apt upgrade
  • apt install joe sudo curl avahi-daemon git zip unzip rsync webp sqlite3
  • locale-gen en_US.UTF-8
  • Install tailscale
    curl -fsSL https://tailscale.com/install.sh | sh
  • Shut down container
  • On the Proxmox hypervisor server, modify /etc/pve/lxc/???.conf to give access to tailscale in the container
  • Start the container
  • Log in on console again
  • tailscale up, approve the URL to join the machine. Consider disabling tailscale key expiry in the web console.
  • Use an ordinary ssh client to log in as root at the machine’s ts.net tailnet name.
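
For the /etc/pve/lxc edit in the steps above, the lines Tailscale’s LXC docs suggest pass the TUN device into the container:

```
lxc.cgroup2.devices.allow: c 10:200 rwm
lxc.mount.entry: /dev/net/tun dev/net/tun none bind,create=file
```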

(not sure avahi-daemon is working here, I can’t use the .local DNS name I am expecting. I tend to use Tailscale to connect to things anyway.)

Linkblog configuration

Specific to my application, but some generic notes here also about creating a prod environment for anything.

User setup

  • As root, make a user for the project.
    adduser linkblog; adduser linkblog sudo;
  • Log in as linkblog@ with password
  • ssh linkblog@localhost, to create the ~/.ssh directory.
  • create ~linkblog/.ssh/authorized_keys

Linux configuration

  • sudo apt install python-is-python3 python3.10-pip

App configuration

  • mkdir ~/prod
  • cd ~/prod
  • rsync -a nelson@sf.somebits.com:~nelson/somebits/linkblog/prod/ .
  • rm -r venv
  • python3 -m venv venv; source venv/bin/activate; python3 -m pip install -U setuptools pip wheel
  • pip install -r requirements.txt
  • sudo venv/bin/playwright install-deps
  • shot-scraper install

Testing and running for real

  • ./update.sh -f
  • cd site-generated; /usr/bin/python3 -m http.server 9191
  • Test HTML load in web browser
  • Push site manually and accept ssh host key.
    ./update.sh -f -p
  • Install crontab entry to run update.sh for user linkblog
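
The crontab entry I mean is along these lines; the schedule and log path are placeholders, -p is the push flag from the manual step above:

```shell
# Hypothetical crontab for user linkblog: regenerate and push the
# site periodically, logging output.
*/10 * * * * cd /home/linkblog/prod && ./update.sh -p >> /home/linkblog/update.log 2>&1
```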

Feed2toot for @somebitslinks@tech.lgbt

Separate cron job to update a Mastodon account from the RSS feed. Does not depend on any of the linkblog code, just the RSS feed.

  • As user linkblog…
  • mkdir ~/mastodon; cd ~/mastodon
  • python3 -m venv venv; source venv/bin/activate; python3 -m pip install -U setuptools pip wheel
  • pip install feed2toot
  • copy over the install from the old system. This gets credentials, config, and state.
    rsync -av --exclude=venv nelson@example.com:~/src/mastodon/linkblog-toot/ .
  • run a quick test, should show posting nothing if there are no new stories
    feed2toot -n --debug -c feed2toot.ini
  • Install crontab entry to run go.sh for user linkblog
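
My notes don’t show go.sh itself, but the obvious shape of it is a wrapper that activates the venv and runs feed2toot (contents here are an assumption, not the real script):

```shell
#!/bin/sh
# go.sh -- sketch of the cron wrapper: enter the mastodon dir,
# activate the venv, post anything new from the RSS feed.
cd "$HOME/mastodon" || exit 1
. venv/bin/activate
exec feed2toot -c feed2toot.ini
```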

TODO

  • Package linkblog as a proper Python package so deployment is better
  • Write linkblog code to manage crontab (or at least emit the line)
  • Write linkblog code to run test server

Windows and voice recognition

Looking for typing accommodations while my wrist is giving me trouble. I love how Google’s voice recognition works on Android; you’d think it would be even easier and better on a desktop PC. But the UI doesn’t quite seem to be there.

Update. I originally wrote this post using Windows 10. Turns out Windows 11 has an entire new thing called Voice Access and it’s quite good, just about what I need. I’ve edited the post to reflect all this, but it’s probably a little confusing.

I’m fortunate that I can use both hands still and mousing gives me no trouble, just trying to take some load off my left hand.

Windows 10

Windows 10 seems to have two different systems for voice recognition. There’s an older speech control system that is mostly focused on UI control like opening the start menu or closing windows. It has some speech to text dictation but the language model is terrible and not useful.

There’s also the newer Windows dictation, accessible via Win-H. It’s dictation only, not really system control, but reasonably good. However, the UI’s awkward, and the language model is not as good as what I get on my Pixel 8. But it is usable.

The biggest problem is activating dictation mode. I have to click in the window I want to type in, then click a weird microphone icon that’s hovering in a fixed window on the desktop. And even then it does not work reliably. I would dearly love to assign the “take dictation” function to something like a hotkey or mouse button 4. I haven’t found a way to script that; quick searches make it look like it’s not possible.

I had thought Microsoft put more work into usability. The full speech control system does work, but it’s pretty awkward, something you would only use if you had no alternative. And the fact that its basic dictation speech model is no good makes it useless to me. Maybe it’s better in Windows 11? I think the full-control Voice Access system is much better.

Windows 11

Microsoft did put more work into usability! Windows 11 is significantly better. The old speech control system that I didn’t like is basically gone. Windows Dictation has been extended to Windows Voice Access and gives you fairly reasonable control over the whole UI.

For my purposes, the most important thing is that the speech-to-text in the new Windows Voice Access is much better. The UI is also better: you can basically leave it running all the time, with no need to press a button to activate the microphone every time you need it. It doesn’t do anything until you start talking; then it just inserts what you say as if you typed it. That means I can leave the thing running all the time, and when I have something more extensive to say, I can talk instead of type. (Although it seems to pick up audio from the speaker, which is a little annoying if you are playing a video with someone talking.) The speech model is pretty good, other than the automatic punctuation system, which I’ve had to turn off. Totally usable though; it’s closer to Gboard speech detection than not.

There’s a good little voice control demo that I ran through, and I’m impressed that I could actually use this to control a PC with my voice if I’m patient enough. It works by you speaking voice commands. It has some generic controls for clicking named buttons or narrowing in on a region of the screen to click on. It also has some specific app integration. For instance, “close tab” seems to be something it understands explicitly to close tabs in a web browser. It doesn’t work in Chrome but it works great in Edge; maybe Edge has extra code to interface with Voice Access? One drawback is there’s some ambiguity: if I start a sentence with the word “start” or “close” it may interpret it as a command, not me trying to type those words. There is a dictation-only mode but I haven’t figured out how to turn it on (saying “switch to dictation mode” should do it but does not for me).

Voice In extension

Now I’m wondering why Chrome on desktop doesn’t have voice typing. It seems like it would be easy to build it into the browser. I’m finding some third party hacks like Voice In.

I gave Voice In a try; it seems to mostly work and has a much simpler trigger than the Windows built-in thing. You can bind it to any Windows hotkey, but in Chrome that means Ctrl or Alt + a letter. I really want just a mouse button; I probably can use AutoHotkey to make that happen. Being a Chrome extension makes it pretty limited, but then also probably easier to integrate.

I gave it more of a try and got mixed results. The language model these things recognize against is so important. I was writing an email to a friend about gay stuff and it kept missing very basic words that would be common in a gay context but not in others. It also has a habit of capitalizing random words like “extreme” or “City”. Mostly it just made me wish I could use Google’s voice typing model, because I am very good at getting what I want out of it.

Having the extension be Chrome-only is a little irritating, particularly when I switch to Slack. I guess I could run Slack in the browser. The free version works reasonably well, but really you’re going to pay $60 a year if you use it all the time. Not having a keystroke to activate it isn’t a big problem; mostly you just leave it active all the time.

Google could absolutely build a good Windows desktop or Chrome product for voice typing. I suspect the market just isn’t big enough for them.

Kinesis classic on a modern computer

My RSI is flaring up so I thought I’d fire up my ancient Kinesis Classic keyboard, the one that saved my career 20+ years ago when I first had problems.

The keyboard has an ancient AT keyboard cable. I found it with a PS/2 adapter already attached, those two ports are electrically identical so the adapter is easy. I don’t have a PS/2 to USB adapter around the house but fortunately my 2019 PC still has PS/2 ports on the back. Purple for keyboard! After plugging it in the keyboard lit up but I had to reboot to make it actually work. IIRC that’s a problem with the old Kinesis firmware. It works fine with a second keyboard on USB.

It mostly seems to still work fine. Sometimes keys seem to be repeating when I don’t mean it, like I’ll press x once and xxx is inserted. It’s not frequent though, a missed interrupt maybe? I mostly still remember how to type with it too, I suspect it’ll take a day to adjust (took a week or more the first time).

I can’t find any documentation of the custom remapping I set up long ago. It sounds complicated but the main effect is to move keys more to where they are on a normal keyboard. Here’s what I found I had:

  • Bottom row right hand: all 4 arrow keys in hjkl vi configuration.
  • []{} remapped to bottom row left hand (where left and right arrow were)
  • `~ is moved to upper left (where += is on the Kinesis)
  • += is moved to caps lock
  • Caps Lock is unmapped.
  • Escape is moved to the key marked “delete” on the thumb
  • Delete is on lower left (where ~` is on the Kinesis).
  • The Backspace key sends VK_BACK in Windows, a Backspace.
  • The Insert key is default, VK_INSERT.

I also went and found my ancient notes on the keyboard firmware for programming.

Progrm-Ctrl-F10   Reset keyboard memory

Progrm-Backslash  Toggle keyclick mode
Progrm-Hyphen     Toggle tones for caps lock, etc.
Progrm-F9-xx      Change repeat rate to xx:
                  F1 .5 cps   F2 3 cps    F3 5 cps    F4 7 cps    F5 10 cps    F6 15 cps
                  F7 20 cps   F8 30 cps   F9 40 cps   F10 60 cps  F11 125 cps  F12 300 cps

Progrm-Ctrl-F5 Toggle Dvorak/Qwerty mode

Progrm-Shift-F6 Toggle shift as sticky modifier
Progrm-Ctrl-F6 Toggle control as sticky modifier
Progrm-Alt-F6 Toggle alt as sticky modifier

Progrm-F12 Enter/exit remapping mode

Progrm-F11 Enter/exit macro definition mode
Progrm-F7 Make macro pause for data / End entering macro data
Progrm-F8 Put a half-second delay into a macro
Progrm-F10 Disable/Enable all macros

Shift-Shift-F12 Report firmware version (press both shift keys)

Python packaging, pytest, click, VS.Code

I’m working on my blog engine. I want to package it nicely as a Python package. I also want to be able to run my code from a command line using the click library, and write and run tests using pytest, and be able to do all this with the debugger in VS.Code. I’ve never made my peace with Python packaging, for many many years now I’ve avoided it. Because it’s a mess.

Anyway, I just tried again here in April 2024 and more or less succeeded in just an hour or so. This is where I would write clear docs and a GitHub repo with the code but... nah. I’m not confident it’s right and I don’t understand it well enough to present coherently. But I’ll write up what I learned.

I mostly followed the Packaging Python Projects tutorial which explains the creation of this sample project. Following that gets you a basic Python project which you can install with pip install -e . (note the dot). The key thing here is everything’s driven by the pyproject.toml file. If you are reading any docs that talk about setup.py they are old and outdated; the TOML is the new way.

Some subtleties in the pyproject.toml:

  • My project has dependencies that I specified in the TOML. It depends on pytest and click.
  • I had to pick a build backend; I’m using setuptools as the old non-fancy default.
  • My project has console scripts. You can specify these in entry points or in a scripts section.

Getting click working was not hard. There’s docs for click + setuptools. Note these are in terms of the old setup.py style but translating to pyproject.toml isn’t too hard.
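
Translated into the new style, a minimal pyproject.toml comes out roughly like this (the version number and script name are placeholders; the pytest importlib setting is the VS.Code fix mentioned below):

```toml
# pyproject.toml -- minimal sketch for the toy "whir" package.
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "whir"
version = "0.1.0"
dependencies = ["click", "pytest"]

# Console script: installs a "whir" command that calls main_cli().
[project.scripts]
whir = "whir.main:main_cli"

# Makes pytest use importlib mode, which VS.Code's test runner needs.
[tool.pytest.ini_options]
addopts = ["--import-mode=importlib"]
```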

Here’s my source tree for a toy package called “whir”

  • pyproject.toml in the root directory
  • Python code in src/whir/.
  • src/whir/__init__.py that imports all the public functions, particularly the click commands.
  • src/whir/lib.py is simple library code that other stuff imports.
  • src/whir/main.py contains a main_cli() function that’s a click command. __init__.py imports it so it can be used as a script in pyproject.toml. It also imports the lib code via import whir.lib, not a relative import.
  • tests/test_lib.py contains a pytest function that tests the lib code. It imports it with from whir import lib.

Getting pytest working was not hard either. The default setup worked just fine from a shell. Getting it to work inside VS.Code’s extension for it required adding something to pyproject.toml to get pytest to use importlib, as described here. But now I can just press a button in the IDE GUI and all my tests run with a nice display. Yay!

Getting the VS.Code debugger to work with click was tricky. AFAICT there’s no easy way to get VS.Code to invoke a click command. But there’s a hack where you call the click command with string command line arguments inside an if __name__ == '__main__' block; then VS.Code can just debug the file and it will work.
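
Concretely, the hack looks like this (the greeting command is a made-up stand-in for the real whir entry point):

```python
# main.py -- toy click command, debuggable in VS.Code by running the
# file directly. The --name option and greeting are for illustration.
import click


@click.command()
@click.option("--name", default="world", help="Who to greet.")
def main_cli(name):
    """Entry point wired up via [project.scripts] in pyproject.toml."""
    click.echo(f"hello {name}")
    return name


if __name__ == "__main__":
    # Debugger hack: pass explicit string argv. standalone_mode=False
    # keeps click from calling sys.exit() when the command finishes,
    # so the debugger session ends cleanly.
    main_cli.main(["--name", "debugger"], standalone_mode=False)
```

Running the script under the debugger hits breakpoints inside main_cli as usual; the installed "whir" console script ignores the __main__ block entirely.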

Ranting reflection

It all worked pretty straightforwardly! But I have no idea at all how it works. If something goes wrong I won’t understand what is happening or how to fix it. And the tools are creating little files and shims that get between me and my code. It’s always like this with packaging and build systems and I hate it.

Fix for Android Pixel Delayed Notifications

I’ve had an annoying problem on my phone for two months: notifications would be delayed by many minutes, mostly only showing up when I manually woke the phone. I think I found a fix.

Turn off Adaptive Connectivity. Details in this Reddit post the text of which I will duplicate below.

Update Apr 10: after a few days of trying this I’d say turning it off has helped but not fixed the problem entirely. Notifications still don’t always come immediately, but they are delayed by just a few minutes now, not 30+. I don’t have solid data on that, just observation.

Update Apr 24: notifications still seem better than before, but it’s not perfect. Also I’ve since learned about Android Doze, a feature dating back to Android 6 where the phone goes into a low-power Doze mode. It’d cause exactly the delay problem I’ve seen. Some more discussion here. I wonder if something changed recently in the implementation that made the delay much more noticeable?

But first, a rant. This is a supremely irritating kind of problem that shows how frustrating the modern tech support experience is. There was no realistic way to get helpful info from Google about what this bug might be or how to fix it. I don’t know if there’s even any real personalized support path at all, if there is one I assume it’s the usual “turn it off and on again” useless response. Web searches didn’t get me anywhere, in part because Google’s index is awash in spam. The Android docs are not good, Adaptive Connectivity doesn’t seem to be described adequately anywhere. I did finally find my way to Google Issue Tracker which has what looks like a relevant bug entry. But that’s write-only, barely any acknowledgment.

I’m a software engineer! I understand the tech, I’m careful, I’m willing to spend the time providing expert help to reproduce and debug this issue. But I have no way to get through to someone. I guess that’s inevitable given the economics of supporting millions of users. Still irritating for my $1000 flagship phone.

Not for the first time, I wished that shibboleet were a real thing. I did get some really helpful Google engineer feedback on Mastodon after a passkey problem I reported, but that was just dumb luck and a particularly friendly Googler; the right person happened to see my complaints.

The Reddit post

Are you getting delayed notifications on your phone? Try turning off the setting “Adaptive Connectivity” (under Network & internet in Android 14). I think that fixed it for me. Also check that “Adaptive Battery” and “Battery Saver” are off.

Two months ago I posted about getting delayed notifications; they’d only show up when I woke up the phone. A popular post, it keeps getting comments from people saying “me too”. Thanks to a hint in this comment I learned about the “Adaptive Connectivity” setting. I’d never heard of it before. But it was turned on; I turned it off and now 24 hours later my phone has been reliably getting notifications the moment they happen.

As a bonus, I think this may have also fixed my problem with the weather app showing me in the wrong location. Sometimes I’d get weather for Los Angeles where my IP address is rather than Grass Valley, CA where my phone is. I’m less certain of it but it’s plausible.

It’s not clear to me what Adaptive Connectivity is supposed to do, I can’t find clear docs about it. I think it may be a Google Pixel only thing? A lot of web pages talk about it only in terms of cellular, 4G vs 5G. I’m on WiFi almost always and still had notification problems. I don’t know what Adaptive Connectivity’s intent is but my experience is in Android 14 it means “break notifications”. My guess is it’s some power saving setting that is having the effect of taking the phone offline.

Note there are other phone settings that might cause delayed notifications. Adaptive Battery is the big one (I’ve always had it off). Battery Saver mode may disrupt notifications.

Also individual apps all have battery settings. That used to be Unrestricted / Optimized / Restricted and my understanding is Optimized was supposed to let notifications through immediately. The UI for all that just changed in Android 14 QPR2 but I think it functions the same. Most of my apps are set to Optimized and are getting notifications on time.

Bottom line: turn off Adaptive Connectivity and Adaptive Battery if you are having trouble with delayed notifications.

GPSLogger v129 preferences on Android 14

I use Mendhak’s GPSLogger to track my position. It’s a great tool but a little low level, and in particular its files access is a bit confusing because the docs are written in terms of a basic Unix model but Android has virtualized and secured file storage so many times it is hard to find things. (I hate this about Android; it still lets you use files, but it’s very confusing and no longer a visible part of the consumer product.)

My issue now is I want to back up my settings, reinstall the app, then restore the settings. GPSLogger allows for that! It’s a feature called profiles. Basically you get it to write all your settings to a .properties file, then you can restore them later. The question is, where is that file written?

I found the file by mounting my Pixel 8 as a USB drive, then on Windows going to This PC\Pixel 8\Internal shared storage\Android\data\com.mendhak.gpslogger\files. (Actually I found it using Windows file search, which took several minutes). There’s a Default Profile.properties there. Also a temp.properties I explicitly created. You may have to tell GPSLogger to save the profile first.

I can’t find the properties files using the Files app on the Android device itself. Nor Solid Explorer. Presumably these are being hidden or sandboxed by the OS?

The properties file itself is a simple text file with lines like startonbootup=true in it.

This whole path was deleted when I uninstalled the app. (No surprise! Android wipes apps’ saved files in most cases.)

Once I reinstalled the app and saved a profile, the folder was back. Just dropping Default Profile.properties back in the folder didn’t seem to work; the app may not reread that file? I created a new nelson.properties (editing the file first; the profile name is in the properties file). That let me switch to a new Nelson profile with my preferences loaded. There are a variety of other ways documented in the FAQ to load a profile’s file.

Everything seems to work on reboot. GPSLogger starts with the new Nelson profile.

I have GPSLogger storing my GPS tracks to /storage/emulated/0/Documents/GPSLogger. I can see that in Solid Explorer by looking at Internal Storage > Documents > GPSLogger. Syncthing also sees that folder and is syncing it for me. When I reinstalled the app it could no longer write to files there (an error in Log View). I had to re-enable “write all files” permission (GPSLogger prompts you to do this if you change the folder it writes to).

I verified everything’s working after the reinstall, all the way to the Syncthing server that archives my data.

OpenStreetMap: temporary road closures

I made another complex OSM edit, marking a road closed for construction. Details of the closure are on page 2 here. Nevada Street Bridge (OSM) is closed to cars until Nov 1 2024 or so.

As with all things OSM, how to actually do something this complex is not clear and there are many ways in the free-form schema.

First as a practical matter, serious mapping apps aren’t just relying on OSM for road information. They are pulling road closures from other sources. Google Maps has the closure and is routing around it. The fanciest OSM routing app I know, OsmAnd, does not (yet) know about the closure.

I chose to do the edit the aggressive way. I changed the way from highway=tertiary to highway=construction and construction=tertiary. Ordinarily this is the kind of thing you’d do for a new road being built. But I think it will have the effect I want, I just have to remember to go back and edit it again in November when the road is open. As a bonus I was able to tag it foot=yes which captures that you can still walk through the area, just not drive.

The other option was to just add a Conditional Restriction to the way, as seen here or here. A tag like motor_vehicle:conditional=no @ (2018 May 22-2018 Oct 7). I didn’t do this. One reason is this tag seems mostly useful for tagging permanent things with relative timestamps, like “road closed to cars 8am to 10am every weekday”. I was also worried that most OSM users don’t process such a complex tag and will default to showing the road open. And finally, that date range syntax I pulled as an example is really hinky. The ISO 8601 syntax would be 2018-05-22/2018-10-07, but near as I can tell OSM does not have an ironclad standard to use ISO 8601 for dates everywhere. I haven’t seen ISO 8601 slash syntax for time intervals in any of the docs.

End of the day I just want it to look right on the map. This is good enough.

Update: a day later OsmAnd routes around the closure, I think because it picked up my edit. Both Valhalla and OSRM on the OSM website still show a route through the closed road. I have no idea if they have updated data or not, but I’m guessing not. Update 2: it took most of a week but now the routing on the OSM website bypasses the closed bridge.

City reference map

From page 2 of the newsletter. The blue area is closed.