Starlink v1 vs. an inch of snow

Snowing in Grass Valley (again). Last two times it’s done this Ken had to switch us to our backup fixed wireless Internet. I wasn’t home to look at it. This time around I am home. I was able to confirm Dishy says it’s heating (in the mobile app, also alert_is_heating: True in the gRPC). But flaky connection that became progressively useless. Starlink’s not supposed to have this hard a time with snow but it sure seems to. I’m glad I have a backup.

Here’s some network diagnostic graphs, both from Starlink’s gRPC status and from my own IRTT ping tests. Starlink shows definite problems with a sort of random assortment of statuses that all boil down to “no signal”. My pings show much higher latencies and periods of very high packet loss. All consistent with “shit’s broke, yo.”

I can’t easily access the dish (it’s on the roof) so wasn’t able to try to brush snow off. Or get a good photo, but what I could do suggests there’s a layer of snow + ice maybe half the thickness of what’s on the roof. I’m guessing we’ve gotten an honest inch of snow here, two at the most. Note I have an O.G. round dish v1; no idea if the newer v2 ones handle snow better.

Update: an hour or two later and we’ve gotten a proper.. 4 inches? Not sure. I happened to be outside to see the dish pointing straight up (recovery mode) and then watched it tilt to its new normal position. And a bunch of water ran off. So the melting is working. OTOH there was still a bit of snow there, so it’s not working very well. Or maybe the signal is being blocked by snow in the atmosphere?

Either way I recall this working better in last year’s snow.

Economist Android app: two bugs when changing font size

Finally figured out what causes a couple of bugs that have been driving me crazy in the Android app for several months now. Both are caused by changing the font size of articles (with the Tt button displayed at the top of every article screen).

The simpler bug is that if you increase the font size from the default, the last paragraph’s line spacing is incorrect. It looks like every other paragraph has the line spacing adjusted to match the font size but this last paragraph does not. This is 100% consistent and reproducible on both my phone and tablet. Screenshot below.

The more pernicious bug is that sometimes the navigation at the end of the article doesn’t work. It’s a bunch of links to other articles in the same section. But tapping them sometimes takes you to some other story, generally one or two above the one you tapped on. I think this is also triggered by changing font sizes. (Could it be the page is calculating tap zones based on the original font size layout, not the visible one?) I can’t reproduce it 100% reliably.

Broken line spacing
This navigation menu doesn’t work

Syncthing and conflicts (also Joplin)

I’ve been using Syncthing for several years now to sync several gigabytes of files, mostly various kinds of user-generated documents. Like my entire personal source code repository. Also my Joplin notes. Across 3-5 systems: Windows, Linux, and an Android device. It works great. I’ve never had to think about it much. But something went wrong with Joplin recently and I learned something…

Syncthing has no conflict resolution tool.

Seriously. Syncthing detects conflicts and does something sensible with conflicted files but does not offer any sort of merge tool or UI to help you know about conflicts and deal with them. I got confused because SyncTrayzor, the Windows UI for Syncthing I use, does have a simple GUI Conflict Resolver. But there’s nothing like that in the main Syncthing web-based GUI.

All the core Syncthing does if it detects a conflict is try to guess which version of the file should be canonical. It has some heuristics based on modification dates and will prefer a modified file to a deleted one. It then renames the discarded version as <filename>.sync-conflict-<date>-<time>-<modifiedBy>.<ext>. That gets kept forever as a regular file in your Syncthing store and gets synced just like any other file. It’s up to you to notice these conflicts and do something about them; Syncthing will not tell you about them. (Hilariously, you can have conflicts on conflicts and end up with files named ...sync-conflict-…sync-conflict-…-sync-conflict...)
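Since nothing reports these files, one way to notice them is to search for the naming pattern. A minimal sketch, assuming the conflict copies all carry the ".sync-conflict-" marker described above; the function names are made up for illustration and are not part of Syncthing.

```shell
# List Syncthing conflict copies under a directory (default: current directory).
# Matches the ".sync-conflict-" marker that appears in the renamed files.
list_sync_conflicts() {
    local root="${1:-.}"
    find "$root" -type f -name '*.sync-conflict-*' -print
}

# Count them. Counts lines, so it assumes file names contain no newlines.
count_sync_conflicts() {
    list_sync_conflicts "${1:-.}" | wc -l | tr -d ' '
}
```

Running count_sync_conflicts on a synced folder gives the kind of number a GUI counter might show; list_sync_conflicts shows where the copies live, including any inside .git directories.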

I’m not complaining about Syncthing; what it does is very simple and understandable. Conflict resolution is hard and subtle, better not to do anything than do a bad job. Automated merge strategies are possible but fraught with dangers, and manually requiring users to merge is not in keeping with Syncthing’s background nature. OTOH I’m a little spooked that these potential errors have crept in without any visibility. It’d be nice if the Web GUI at least showed a count of conflicts, but maybe that’d just cause anxiety?

I’ve got about 50,000 files in my Windows Documents folder (synced between two PCs) and just today learned I have 27 conflicts. Mostly saved game files; it’s possible Steam Sync was also involved which no doubt causes all sorts of confusion. FWIW none of these conflicts have caused any problems, so maybe I don’t really need to care about them. I just deleted them; nothing was going to be reading those conflict files anyway. The 3 conflicts in my source tree worry me a bit more, they’re all in git repo metadata. But those git repos are working fine and git fsck isn’t complaining, so going to just delete those too.

How do conflicts happen? When two separate syncthing instances update the same file before they sync them to each other. This happens surprisingly seldom for me, I think because I have an always-on hub that everything syncs to and Syncthing seems to be watching the file system for changes and uploading them quickly. Conflicts seem more likely to happen on my Android phone where running in the background is a challenge and it may not sync at all when I’m not on WiFi.

Which brings me to Joplin. I’m looking at all this because I’d had a lot of problems with Joplin/Android and file sync. I recently encrypted my Joplin notes and something weird happened on the phone when it synced to it. Not sure what was really wrong but I just deleted all the conflicts my phone generated and I seem to still have all my notes. Joplin itself only syncs to local files every few minutes, then Syncthing itself only syncs those files to the hub every few minutes. Seems a little delicate.

Overheating Intel NUC

My 2018 home media server (an Intel NUC7i5BNH) is failing. It crashes under load; seems to fail when I use it for something intensive (like watching video via the Plex server there). Long story short, the hardware is flaking out; limiting the CPU speed to about 90% of maximum seems to make it stable again. Detailed notes on diagnosing and fixing it below.

When it crashes it locks solid; doesn’t respond to network, doesn’t send pings out (from an in-memory process), syslog doesn’t get updated, hard drive light doesn’t blink. Nothing in the syslog, it just stops writing, nothing in the dmesg file either.

Thanks to this blog post I realized telegraf was logging temperatures even though they weren’t on the dashboard graph. So I added them. Temperatures were going up to 84°C as the CPU load went up. That seems hot, but the CPU is rated to 100°C and looking online I see folks running NUCs at CPU temperatures of 95°C or more. (All temperatures I’m talking about here are from the CPU Package sensor.)
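For a quick look without telegraf, the kernel also exposes temperatures under /sys/class/thermal. This is a different route from the telegraf data above, sketched under assumptions: which zone corresponds to the CPU Package sensor varies by machine (on Intel systems it is commonly the zone whose type is x86_pkg_temp, but treat that as something to check), and print_zone_temps is a name made up for this example.

```shell
# Print each thermal zone's type and temperature in whole degrees Celsius.
# The kernel reports the temp file in millidegrees Celsius.
# thermal_root is a parameter only so the function can be pointed at a scratch directory.
print_zone_temps() {
    local thermal_root="${1:-/sys/class/thermal}"
    local zone millideg zone_type
    for zone in "$thermal_root"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        millideg=$(cat "$zone/temp")
        zone_type=$(cat "$zone/type" 2>/dev/null || echo unknown)
        printf '%s %d\n' "$zone_type" "$((millideg / 1000))"
    done
}
```

Something like watch -n 1 wrapped around this while a stress test runs would show the climb in real time.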

I installed the system stress tools s-tui and stress-ng and was able to crash the system again after stressing (with s-tui defaults) for 2-3 minutes, at around 83°C. Definitely crashing under load, but am not positive it’s temperature.

Thinking it was overheating, I figured the root cause is I added a USB drive to the system and stacked it on top of the NUC. It’s not blocking any vents or anything but will impede passive cooling. So I moved the USB drive. I also took the case off the machine and blew some dust out. When that didn’t work I took the thing apart and gave it a proper cleaning, although I did stop short of removing the heat sink since that’d require new thermal paste. I also updated the BIOS and set the fan settings, raising it from “quiet” to “balanced” and eventually “cool”. The fan is definitely spinning and speeding up under load.

But I was still able to crash it with s-tui after just a couple of minutes. Even after all the cleaning, and running with the case off, it still gets to 83°Cish and crashes under load. 83° isn’t that hot! So maybe it’s not actually overheating but failing under some other stress? I wish I had a second power supply to test with.

A full run of memtest86+ passes.

I tried turning off turbo boost in the BIOS; that limits the CPU to 2.2GHz and never lets it run at 3.4GHz. This seems to help; under stress temperatures max out about 57°C. System still works under load after 45 minutes. Of course I’m losing a lot of performance. I don’t think this particular NUC lets you tune overclock parameters in detail, the options are not available in the BIOS.

I got most of my speed back by using Linux’s fancy drivers for controlling CPU scaling and power usage. intel_pstate is the name of the driver, this Arch doc page is a good start as are these RedHat docs. cpupower is the recommended userspace tool but the systemd service definition isn’t available on Ubuntu.

Turns out intel_pstate has a file interface at /sys/devices/system/cpu/cpufreq/. A simple thing is just to write a max frequency to the /sys interface files.

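# /etc/rc.local snippet: cap the CPU frequency, but only on the flaky NUC (i5-7260U).
if grep -q i5-7260U /proc/cpuinfo; then
        echo -n "(nelson) throttling CPU in /etc/rc.local for flaky Intel NUC. Max speed: "
        # scaling_max_freq is in kHz, so 3000000 is 3.0GHz; tee also echoes the value to finish the log line.
        echo 3000000 | tee /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
fi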

The system seems to be stable at 3.2GHz, just a bit under the 3.4GHz max. At 3.1GHz I was able to stress for a full hour and the system got up to 78°C but never crashed. I set the max to 3.0GHz just to be safe.
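When hunting for the highest stable frequency it helps to set and read back the cap repeatedly. A rough sketch of the same scaling_max_freq writes wrapped in functions; the function names and the cpufreq_root parameter are assumptions for illustration (the parameter exists so the logic can be tried on a scratch directory), and writing to the real /sys tree needs root.

```shell
# Cap every cpufreq policy at the given frequency in kHz (the unit scaling_max_freq uses).
cap_cpu_freq() {
    local max_khz="$1"
    local cpufreq_root="${2:-/sys/devices/system/cpu/cpufreq}"
    local policy
    for policy in "$cpufreq_root"/policy*; do
        [ -w "$policy/scaling_max_freq" ] || continue
        echo "$max_khz" > "$policy/scaling_max_freq"
    done
}

# Report the current cap for each policy, converted from kHz to GHz.
show_cpu_caps() {
    local cpufreq_root="${1:-/sys/devices/system/cpu/cpufreq}"
    local policy khz
    for policy in "$cpufreq_root"/policy*; do
        [ -r "$policy/scaling_max_freq" ] || continue
        khz=$(cat "$policy/scaling_max_freq")
        printf '%s %d.%02d GHz\n' "${policy##*/}" \
            "$((khz / 1000000))" "$(( (khz % 1000000) / 10000 ))"
    done
}
```

For example, cap_cpu_freq 3100000 followed by show_cpu_caps (both as root) before each stress run. The cap does not survive a reboot, which is why the original snippet lives in /etc/rc.local.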

Still wish I knew what specifically was failing. I suspect it’s only going to get worse and I should be looking for replacement hardware now. Also curious what changed. December 14 the average temperature climbed from 37°C to 48°C; that’s when I upgraded from Ubuntu 20.04 to 22.10. All sorts of stuff changed then but the only obvious thing is that there’s a new behavior where average load creeps up from 0.15 to 0.2 over the course of a month (before, 0.15 steady was typical.)

Some graphs:

Temperature and CPU during a recent crash
Temperature for the last six months
s-tui after the system crashed

ChromeOS, Flatpak, and Obsidian

Bad experience with ChromeOS trying to install Obsidian following these instructions. I did eventually get it working but only with a bunch of ugly hacks as root.

Note the install instructions have you making bwrap setuid root. Ugh! I did not do that, because other things online say it isn’t necessary. I did try enabling setuid a few times and it didn’t fix anything.

First problem: flatpak just won’t work as anything other than root. Even flatpak list gives me a “Permission denied” error (unhelpfully not telling me what permission is denied.) This happens even with bwrap setuid. I don’t know if that’s a general flatpak thing or is a problem specific to ChromeOS. Docs I’ve read online suggest that at least parts of flatpak shouldn’t require root but that installing software probably will. Whatever, I’ll just do everything as root. (At least part of the problem is /home/nelson/.local/share/flatpak ended up being owned by root and permissions 0700; this happened on Ubuntu too, so a Flatpak bug? But even fixing that, still couldn’t flatpak list on the Chromebox.)
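A quick way to spot the ownership half of that problem is to compare the directory's owner with the current user. A small sketch assuming GNU stat (the -c option); check_dir_owner is a made-up helper, not a Flatpak tool.

```shell
# Report whether a directory is owned by the user running this shell.
# Compares numeric uids so it works even where user names do not resolve.
check_dir_owner() {
    local dir="$1"
    local owner_uid mode
    owner_uid=$(stat -c '%u' "$dir") || return 2
    mode=$(stat -c '%a' "$dir")
    if [ "$owner_uid" = "$(id -u)" ]; then
        echo "ok: $dir owned by uid $owner_uid (mode $mode)"
    else
        echo "mismatch: $dir owned by uid $owner_uid (mode $mode), expected uid $(id -u)"
        return 1
    fi
}
```

Pointing it at ~/.local/share/flatpak would flag the root-owned directory. A chown back to your own user should clear the mismatch, though as noted above that alone wasn't enough on the Chromebox.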

Installing with Flatpak went smoothly. There’s now an Obsidian app icon in the ChromeOS GUI. But when I launch it either nothing happens, or the logo for Obsidian shows up but with an infinite spinning circle on top of it. No feedback on what’s wrong. The online advice for access to ChromeOS system logs is all out of date. I finally found my way to a log viewer at chrome://device-log/ but there’s nothing there about the app not working. The log dump generated from chrome://network/ does seem to have useful info but I could not understand what was failing.

OK, so let’s launch from a command line instead. An hour later, here’s a short transcript of all the breakage I encountered.

~ ⬢ [Systemd] ❯ flatpak run md.obsidian.Obsidian
error: Permission denied

~ ⬢ [Systemd] ❯ sudo !!
sudo flatpak run md.obsidian.Obsidian
error: "flatpak run" is not intended to be ran with sudo

~ ⬢ [Systemd] ❯ sudo bash
root ~ ⬢ [Systemd] ❯ flatpak run md.obsidian.Obsidian
Debug: Will run Obsidian with the following arguments: 
Debug: Additionally, user gave: 
[13 zypak-helper] Failed to connect to session bus: [org.freedesktop.DBus.Error.NoReply] Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
[13 zypak-helper] src/helper/main.cc:40(DetermineZygoteStrategy): Assertion failed: bus
/app/bin/obsidian.sh: line 40:    13 Aborted                 (core dumped) zypak-wrapper /app/obsidian $@ ${EXTRA_ARGS[@]}

root ~ ⬢ [Systemd] ❯ dbus-run-session flatpak run md.obsidian.Obsidian
... 
[0319/141600.942543:FATAL:electron_main_delegate.cc(299)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.

root ~ ⬢ [Systemd] ❯ dbus-run-session flatpak run md.obsidian.Obsidian --no-sandbox
it worked!

So.. it has to run as root. Then it turns out it doesn’t want to run as root, at least via sudo. And then it needs access to the system’s DBus, which for some reason the Flatpak hasn’t set up. Then Electron complains about root needing --no-sandbox, and if I provide that option it works! Lololo.

Just a total mess of complex systems stuff that’s standing between me and my application. DBus, the Crostini container, the Flatpak container, the Chrome/Electron sandbox. Layers and layers of stuff and some of it is broken.

So a failed experiment, even if in the end I did get the thing running. To be fair to ChromeOS: this is really not what that product is designed for. A Linux program, Flatpak, a separate Chrome/Electron instance. Yeah, don’t do that. I suspect if I need to do this for real I’ll just end up using the Android app running on the Chromebook and put up with the less-than-optimal mobile UI.

Joplin works again on Android

I’ve been using Joplin for a few years now as my note taking app. It’s mostly good but has some rough edges. But there was one ugly thing; Android file system sync was broken for most of a year, without acknowledgement. I used this to sync my machines (via Syncthing for the files). This got fixed a few months ago with Joplin 2.9 but then that release was never pushed to the Google Play Store, making it much less accessible to users.

Happy to report that my phone currently has Joplin 2.10 from Google Play and filesystem sync is now working. The UI is still a little awkward but it seems to be functioning. (With some delay; Joplin is not super swift in re-syncing.)

The whole experience has left me with a sour feeling about Joplin. Maybe that’s not fair; it’s a free open source project. And it’s remarkably complex, supporting a bunch of platforms and sync methods. The underlying technical issue (Android file system permissions) was genuinely complicated. OTOH the developer didn’t handle the bug very well and between that and the Google Play release problem I get the impression he doesn’t much care about the Android platform. Also the UI really is clumsy in a lot of ways.

Another bug / UI problem: I just enabled encryption on my Windows desktop. It re-encrypted the notes and synced them. Then I went to my phone and tried to synchronize and got some generic error (“TypeError”) that suggested data was bad. So I manually turned encryption on in the phone app (it should have turned itself on) and now my phone is very slowly encrypting and re-uploading all its notes again. This is bad! In general the encryption implementation seems wonky.

Alternatives

So maybe I need an alternate notetaking product again. It seems like a simple product but then it needs to be really well designed and reliable, so that it becomes a piece of your memory and not some hated complex app. The main things I need are syncing between Windows and Android (and/or ChromeOS) and good Markdown support. And a good UI for displaying and editing markup.

There’s a lot of note apps out there; here’s a recent roundup of 7. Simplenote was the first I used and loved. I stopped using it because notes aren’t encrypted. Although ironically I never turned encryption on with Joplin because of my sync problems. I switched to Standard Notes for a while but the price went to $50/year and it had so many UI problems I didn’t want to pay that. So I switched to Joplin.

I’d try Google Keep but the UI is like a clown circus. Evernote is way, way, way too complicated. So I’m sticking with Joplin for now. Maybe I should just pay for Joplin Cloud, that would have solved my Android sync problem.

1Password has a pretty functional little notes app in it, btw, but it doesn’t have the Markdown UI stuff I want, not its intended purpose.

Anyway, Joplin seems to be working for now. If I see something better I’ll try it. If Joplin breaks again I’m going to consider going back to Simplenote. I love how Automattic supports this little product.

Update: thanks to a comment from reader Gina I tried Obsidian, a closed-source Electron app. I like it! The UI is very nice, I particularly like the default Markdown editing mode where the markup syntax is hidden except for the line you’re editing. Also despite all the fancy features it has the basic data model is “text files in a folder” which is refreshingly transparent. The app has a well-used plugin architecture with a lot of third party plugins. It supports advanced things like a canvas, a graph view, and links between notes, but the basic stuff is pretty solid. I hope they aren’t going down the Evernote road of increasing complexity.

I was able to import from Joplin by exporting markdown from it, although I couldn’t get the updated/created metadata to come over (this might be fixable; Joplin does export the dates). Obsidian doesn’t encrypt data files on local disk but supports end-to-end encryption in their $96/year cloud sync product. It also looks plausible to sync yourself via Syncthing, but you won’t have the fancy extra layer of conflict resolution that Joplin sync provides. All in all it looks like a very promising product.

Google Pixel security disaster

There’s a major, bad security flaw in Google Pixel phones and some Samsung phones. From the disclosure, the bugs:

allow an attacker to remotely compromise a phone at the baseband level with no user interaction, and require only that the attacker know the victim’s phone number.

This notification has been published despite the bug not being patched. Google’s announcement blames Samsung; it’s their modem software that has the flaws and apparently they “hit their 90-day deadline” for disclosure and so now it comes out in public. Even so, Google decided to withhold the worst details because fixes aren’t out.

There is a patch for Google Pixel phones, the March 2023 security update. But that update isn’t available yet for my Pixel 6 phone; rumor has it that it might come out March 20. (Part of a pattern of late updates for what should have been a flagship product.) In the meantime, well, my phone could be owned at any moment. More discussion on Reddit.

The recommended workaround is to disable VOIP (“WiFi calling”). Which I’ve done, and which makes my phone basically useless. The cellular chip in the Pixel 6 Pro has never worked reliably, its connection performance is terrible. Also Verizon hasn’t gotten around to building cellular infrastructure in obscure, non-techy places like Noe Valley, San Francisco. At least I still sort of have cellular access; folks using T-Mobile and Google Fi phones apparently are left without any way to make calls at all (I guess it requires Voice-over-LTE?)

Google really screwed up with the Pixel 6, it’s been a frustrating phone to own. I’d told myself the next phone I’d buy would be from Samsung but, well, they’re really the root cause of this problem. Apple often does seem to do a better job on building devices that work.

Nightly congestion: Starlink vs local WISP

I’ve complained a few times about Starlink’s congestion. This past week we switched over to my local WISP (snow outage, then I wasn’t there to switch it back). Here’s IRTT measuring packet loss and latency, you can clearly see the change around March 4/5.

The WISP is also congested, I think probably worse than Starlink. During congestion it has lower packet loss but then higher latencies, and for longer periods of time. I assume that means the WISP is doing more buffering. My WISP connection is 2 radio links from the wired infrastructure (I think).

Chromebook first impressions

I got my first Chromebook, a cheap machine just to try it out. I like it! I’ll probably end up getting a fancier Chromebook to replace my 2017 Razer Blade laptop when I travel. I’d love to get one that could also replace the tablet I use for books and videos.

ChromeOS makes a great first impression! I’d not realized it’s basically a whole new desktop OS, a third way other than Windows and MacOS. (And I guess Linux on the desktop, lol.) It’s an interesting middle ground between a full desktop OS and a mobile OS like a phone or tablet.

I can’t quite decide whether I’ll want to use a Chromebook like a desktop computer or like a tablet. ChromeOS seems squarely in the middle. I think it could basically replace what I use a tablet for (reading books, watching videos, light web surfing) if I could find a Chromebook where the keyboard gets out of the way. Assuming the UI works well as a touch screen, which it seems to with tablet mode. These are marketed as “2-in-1” devices.

I ran into a problem with hardware though; no one makes the Chromebook I want as a tablet replacement. Part of the problem is price; Chromebooks tend to be under $400 and so don’t use very high quality screens, etc like the $800 Galaxy Tab S8+ I’d be replacing. (Although see Samsung’s Chromebook 2, 4K Chromebook, HP Dragonfly Pro, or this article). Part of the problem is mechanical. I’d love a “detachable” where the keyboard comes entirely off like the Lenovo Duet 3. But then the keyboard attachment is not a strong hinge, which means you can’t reliably support the weight of the whole machine with the keyboard while sitting or propped up in bed. Maybe one of the 180° folding screens would work better. Really need to get my hands on them to see.

The other big question is CPU type: ARM or Intel. I firmly believe ARM is the future for devices like this. But I’d say 80% of the Chromebooks out there are Intel, a few even AMD. And the ARM systems tend to be the cheaper / lower powered ones. No one is making anything like Apple’s laptops for ChromeOS, Linux, or really even Windows.

The machine I bought is an Acer 311 C722-K4CN, aka N20Q9, aka willow. It cost $120 on Amazon (new!) and is marketed as a school / student laptop. I think these were first introduced in 2020 for $350, mine was built in Feb 2021. It’s definitely not a powerhouse, $120 implies a lot of compromises, but so far the hardware exceeds my expectations other than screen quality. I deliberately chose an ARM system; it’s a MediaTek MT8183 which is maybe 2-3x more powerful than a Raspberry Pi 4 CPU? It seems powerful enough for web browsing at least. No HDMI, microSD, and limited USB ports.

First thing I did was update ChromeOS from v88 or so to v105 and now v111. A little surprised it’s a 32 bit build; the CPU is definitely 64 bit. I also installed Linux via the official route (“Crostini”); it’s a VM, and arch reports aarch64. From what I’ve read on ARM the ChromeOS kernel is 64 bit but the userspace is 32 bit. I wonder if that’s partly a RAM savings thing?
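The kernel/userspace split can be checked from any shell with standard tools. A tiny sketch; note that inside Crostini it describes the Linux VM rather than the ChromeOS host, and I'm assuming getconf is installed (it is on typical Debian systems).

```shell
# Compare the kernel's architecture with the word size of the userspace running this shell.
kernel_arch=$(uname -m)          # e.g. aarch64 for a 64-bit ARM kernel
user_bits=$(getconf LONG_BIT)    # 32 or 64, as seen by this userspace
echo "kernel: $kernel_arch, userspace: ${user_bits}-bit"
```

A 64 bit kernel with a 32 bit userspace would show up as a 64-bit architecture name next to "32-bit".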

Chrome OS is quite big; 14 GB! The Linux install wants another 10 GB (although 3 GB seems sufficient.)

One hassle: there aren’t ChromeOS apps for everything I’d expect. Particularly surprised there’s no Slack app. Slack works in a browser though, and Chrome makes it easy to “Create Shortcut” so it looks more like a separate app in the GUI.

I’m also a little confused: there’s both a “Web Store” and a “Play Store”. I think the Play Store is for Android apps, they run natively on ChromeOS now and many things (like Tailscale) are installed as generic Android apps. The Chrome Web Store is a mix of things for the Chrome browser (like extensions) and then some ChromeOS enhancements like a compose key input method (which is written in JavaScript). There isn’t a whole lot on the Web Store.

Google has good docs for building Chrome OS apps. The basic message is “it’s an Android app with special accommodations for Chromebooks” with observations like “every Chromebook has a physical keyboard” and “A system-level back button is a pattern carried over from Android’s handheld roots—one that doesn’t fit as well in a desktop context.” It’s remarkable how many different kinds of systems Android runs on: phones, tablets, laptops, TVs, Android Auto, Android Automotive. That’s a lot of different UI metaphors!

I use Firefox everywhere else but feel like I should be using Chrome as my browser on a Chromebook. I mean, it’s in the name. Also the whole OS is basically one big Chrome instance? Firefox does work but it is the mobile Android app, not a desktop app. Without proper tabs, and the wrong user agent, and… I guess it’s possible to install a Linux version but I haven’t tried it.

The Linux support is interesting. It works via a system called Crostini which seems a bit like WSL2 in its approach; it creates a VM with an LXC container, then installs Debian inside that. See also the wiki here. You can install Arch Linux and Ubuntu, albeit with rough edges. Note Crostini is different than replacing ChromeOS with a full Linux install; that’s also often possible on Chromebook hardware. Crostini supports graphics via both X11 and Wayland. Crostini is an evolution of crouton, an earlier system for running different OSes inside ChromeOS.

The underlying Linux system is mostly hidden in Chrome OS but accessible. Ctrl-Alt-T will launch crosh, the Chrome OS shell for the host OS. This isn’t a full Unix shell, it’s got its own weird command set, but ping and top are there.

I thought I would hack the OS a lot more but one joy of a Chromebook is it mostly works just fine and doesn’t need a lot of tinkering. I do want to eventually get a full development environment for Python and web frontend. Between the Linux system and VS Code it seems entirely doable.

One big caveat: ChromeOS is designed as a cloud-centric system and definitely wants to be online all the time. It’s possible to run things offline but you have to do a little preparation. Honestly computers are mostly useless to me offline anyway, I can’t go 2 minutes without having to look something up on the Internet. As long as I can read a book or watch a downloaded video on an airplane, I’m good.

Tailscale key expiry

I had my first real problem with Tailscale. Not coincidentally, six months after first using and loving it. Tailscale expires keys every 180 days and you have to manually re-authenticate every single host. This is the first bad user experience I’ve had with Tailscale, the product is usually much friendlier.

I got no notice that this expiration was coming. And it’s potentially catastrophic; what if your host is unreachable except via Tailscale? (See below). I don’t use Tailscale a lot and I don’t have any proper monitoring. Maybe if I were paying closer attention I’d have known it was coming?

The rest of this note is a little ranty because I’m experiencing the indignation of a sharp corner on an otherwise very nicely designed product. Tailscale folks, I love your product. This part of it needs some attention. To be fair, key management and updates are a hard problem in all of software, Tailscale is not unusual.

How to update Tailscale key

Re-authenticating isn’t hard with the CLI: just run tailscale up --force-reauth. This gives you a URL that you load in a browser logged in to Tailscale’s management console. It’s a remarkably manual process. The Android GUI client is a little easier; you’re prompted to log in directly on the device itself.

But here’s the shit sandwich: “tailscale up --force-reauth currently involves bringing down the Tailscale connection and thus should not be done remotely over SSH or RDP.” Can confirm; I tried to update a remote host via ssh over Tailscale and the update hung before I saw the authentication URL. A terrible experience for a networking product whose major purpose is remote administration of machines. And the auth process is interactive, you can’t just fire it off in a cron job or something. (There’s a hint that maybe there’s a “Reauthenticate” option in the dashboard after a machine “has already signed a new authentication URL” but I no longer have an expired host to test that with.)

I do give the product team credit, they did think about the “what if I can’t access the host to update it?” problem. There’s a button you can press to temporarily enable an expired key (for 30 minutes), giving you access. You still have the “how do you update remotely” problem but at least you can get to the host.

There may be a better way to do all this; you can generate auth keys and automate using them. Maybe a more serious Tailscale deployment is managed that way, I haven’t looked into it. The default keys are a great user experience right until they expire.

Determining Tailscale key status

The software tools for detecting whether your Tailscale key is expired are not good. The best tool is the Machines dashboard; this does have a status line and you can filter for expired machines.

On a host with an expired key your first clue that something isn’t working is probably when you can’t access other hosts on the Tailnet. tailscale status reports a confusing message, “Health check: not connected to home DERP region 12. Logged out.”. If you try tailscale up it will prompt you to reauthenticate and afterwards tailscale status will keep showing you the auth URL until you reauth.

On a working connected host elsewhere on a tailnet tailscale status does not show you if another host has an expired key. The status shown is - which I think means “unknown”. It’s not offline or some special expired status. tailscale ping will report peer's node key has expired.

I could not find a way to see what machines were soon to be expired in the machine dashboard; the keys page doesn’t show the automatic keys that the default path uses. The command line tools don’t seem to tell you either, or at least I can’t find it. I guess you can periodically tailscale up --force-reauth to refresh the keys but it’s up to you to know when to do that? Again, I assume using managed auth keys is the real solution for this.
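If you can get an expiry timestamp for a host from somewhere, the "warn me before it expires" part is simple date arithmetic. A sketch assuming GNU date; the function names and the 14 day threshold are made up for illustration. Where the timestamp comes from is the open question: I believe the JSON from tailscale status --json carries a KeyExpiry field per node, but treat that as an assumption to verify against your client version.

```shell
# Whole days from NOW until EXPIRY, both given as timestamps GNU date can parse.
# NOW defaults to the current time; it is a parameter so the arithmetic can be checked.
days_until() {
    local expiry="$1" now="${2:-now}"
    local expiry_s now_s
    expiry_s=$(date -u -d "$expiry" +%s) || return 1
    now_s=$(date -u -d "$now" +%s) || return 1
    echo $(( (expiry_s - now_s) / 86400 ))
}

# Print a warning when a host's key is within warn_days of expiring.
# An already expired key shows up as a zero or negative number of days.
warn_if_expiring() {
    local host="$1" expiry="$2" warn_days="${3:-14}" now="${4:-now}"
    local left
    left=$(days_until "$expiry" "$now") || return 1
    if [ "$left" -le "$warn_days" ]; then
        echo "$host: key expires in $left days"
    fi
}
```

Run from cron with one warn_if_expiring call per host, this would at least turn a surprise outage into an email. It does nothing about the reauth itself still being interactive.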

Why expire at all?

I don’t quite understand why Tailscale expires host keys at all. There’s a general idea that expiring logins, etc is good security practice to solve various corner cases like a years-old server that gets recycled or is running an old insecure version. Sorta? But for that kind of thing I much prefer a system where an old auth token is only expired if it has not been used for a while; expiring something that was trusted just minutes ago seems like a bad decision.

It’s possible to disable key expiration for specific hosts via the machines dashboard. The docs suggest this is for “trusted servers, subnet routers, or remote IoT devices that are hard to reach.” There’s no way to disable expiration for a whole tailnet; all you can do is set the expiration time from 1 to 180 days.

Windows update

This isn’t about key expiration really, but the Windows client still has no auto-updater. I was still running 1.30 with a security hole! They’re working on it (and recently changed the installer they use) but it doesn’t seem a high priority. Installing an update is easy enough but it really needs to be automatic.