MicroSD card speeds simplified: V30 A2

If you want to buy a decent general purpose MicroSD card in 2022, buy something that says “V30” and “A2”. I’m happy with the Sandisk Extreme. It works pretty well as the primary disk in a Raspberry Pi and will also work for 4K video, maybe even 8K.

In detail: there’s a confusing array of different class definitions for SD cards. Ignore “Speed Class” (2, 4, 6, 10) and UHS Speed Class (U1 or U3); those are obsolete. Also ignore SDXC, SDHC, and SDUC; those refer to storage capacity and you can just read the size.

The primary throughput rating on a card now is V, for Video Speed, and the modern choices are V30, V60, or V90. V30 means 30MB/s of sustained sequential writes and is fine for 4K video. There’s a nice chart here of speeds. Note a V30+ card will also probably be labeled “Speed Class 10” and “U3”. Faster is better but you pay for it, so unless you know you need it, save your money.

The only random access rating on a card is Application Performance Class, which comes in two grades: A1 or A2. There’s no real price increase for A2 so get that. This isn’t so important for recording video but means absolutely everything for use as a general hard drive, say in a phone or a Raspberry Pi. A2 isn’t so great; it’s a minimum of 4000 read IOPS. Compare 600,000 IOPS for a fancy SSD. But at least it’s got a rating. Old fashioned spinning disks are more like 50-200 IOPS, so even that A2 is a big improvement over what we used to use.
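
If you want to sanity-check the random access side yourself on a Linux box, fio can measure 4K random-read IOPS directly, which is what the A ratings are about. A rough sketch; the mount point and sizes are placeholders for wherever your card is mounted:

# 4K random-read test of a card mounted at /mnt/sd (mount point and size are placeholders)
$ fio --name=randread --directory=/mnt/sd --rw=randread --bs=4k --size=256M \
      --ioengine=libaio --iodepth=16 --direct=1 --runtime=30 --time_based --group_reporting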

The Sandisk Extreme I like is $20 for 128GB, V30, A2. Some prices for various speeds of 128GB cards from NewEgg:

  • V10 A2 $13 – $17
  • V30 A1 $11 – $20
  • V30 A2 $11 – $20
  • V60 A2 $27
  • V90 A2 $100

V30 A2 is the sweet spot for performance; no point getting anything slower for general use. You pay a significant premium for V60 or V90. By all means pay that if you need it, but you probably don’t. 128GB seems to be the sweet spot for storage; 64GB cards cost almost the same.

Windows 11: 22H2 and Memory Integrity woes

More boring Windows sysadmin crap, no satisfactory conclusion.

Windows 11 22H2 came out today. I tried to install it. It gets to 37% and fails with basically no useful error message: “Windows Update error code 0x80070001”. Thanks dudes. Searching for that error code is not helpful, just a bunch of cargo cult advice for a zillion different things that might be wrong.

I tried using Get-WindowsUpdateLog to give me a text file log of what went wrong but it’s impenetrably detailed. This looks like the error, but I can’t tell what the actual problem is:

2022-09-20 15:37:00.9942721 16468 23084 Deployment      *FAILED* [80070001] file = onecore\enduser\windowsupdate\client\engine\handler\osdeployment\helper\osdeploymenthelper.cpp, line = 612
2022-09-20 15:37:00.9942763 16468 23084 Deployment      *FAILED* [80070001] file = onecore\enduser\windowsupdate\client\updatedeployment\handler\osdeployment\installer\osinstaller.cpp, line = 1157
2022-09-20 15:37:01.0008701 16468 23084 Handler         Install complete for update ID: 8E409263-45B4-438D-AD01-6E9674AB032A.1 Return code is 0x80070001. Requires Reboot:No
2022-09-20 15:37:01.0008721 16468 23084 Handler         Enter deployment handler NotifyResult
2022-09-20 15:37:01.0009405 20108 24312 Deployment      Deployment job Id 51AE112D-2F4D-46DB-A0A4-381CB450EAAE : Update 8e409263-45b4-438d-ad01-6e9674ab032a.1 failure delegate invoked.
2022-09-20 15:37:01.0009697 16468 23084 Handler         Leave deployment handler NotifyResult
2022-09-20 15:37:01.0010261 16468 23084 Deployment      *FAILED* [80070001] file = onecore\enduser\windowsupdate\client\updatedeployment\handler\osdeployment\installer\osinstaller.cpp, line = 356
2022-09-20 15:37:01.0010271 16468 23084 Handler         *FAILED* [80070001] Leave deployment handler Install

So I started trying to figure out what else might be wrong with my system. One persistent problem is Windows Security claims that Memory Integrity isn’t enabled; that’s a virtualization feature that isolates processes. It wouldn’t turn on because an obsolete driver was blocking it. What driver? Some ancient Western Digital thing; I don’t even have any WD hardware in my system. It just got an update via Windows Update, which left not only the old incompatible version installed but also a slightly newer, equally incompatible one. Great jerb. Anyway, you can manually remove the drivers by using pnputil on the wdcsam files. I have no idea if this could break a system but I chanced it and it worked.
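
For the record, the removal was roughly this, from an admin PowerShell. The oem42.inf name is a placeholder; use whichever published name pnputil lists for the wdcsam entries on your machine:

# find the published names for the offending driver packages
pnputil /enum-drivers | Select-String wdcsam -Context 2
# then delete each one (oem42.inf is a placeholder)
pnputil /delete-driver oem42.inf /uninstall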

Well, sorta. Now I could enable memory integrity. Then it reboots to install itself and during the reboot I get a blue screen of death with something about Bug Check 0x7E, which means “System Thread Exception Not Handled”. That’s programmer speak for “shit’s fucked up, yo”. Searching online finds other people with this problem with the usual scattershot advice for fixing things and the occasional malware recommendation.

Just before that, the event log had errors about IntelHaxm: “HAXM Failed to init VMX” and “HAXM can’t work on system without VT support”. I’m confused about this. I definitely have virtualization support; I’m using Hyper-V for WSL2. I don’t know about VT support specifically. I’m also wondering if the Intel thing is a problem; this is an AMD Ryzen CPU. This Windows system started life on an Intel CPU and I transplanted the drive to an AMD system. Everything’s worked fine (it installs new drivers for the hardware it detects), but maybe there’s some weird vestige? Or it could be completely unrelated. I give up.

Windows 11: fixing crackling microphone

My headset plugged into my motherboard (Realtek audio) stopped working on my machine, maybe around when I upgraded to Windows 11? Not sure, but when I talked people wouldn’t hear my voice, just a burst of static. The fix was to re-set the format to 2 channel, 16 bit, 44100 Hz.

Windows sound drivers are an absolute mess. The fancy new Windows 11 UI works fine for basic things, but all the important stuff is still hidden in the Windows 7 Control Panel style settings. Why TF hasn’t Microsoft fully ported all the control panel stuff yet? It’s been 7 years since Windows 10 came out and a lot of important settings are still buried in Windows 7-era UIs.

In the end I found this guide most useful, using the local “Listen to this Device” loopback, Discord, and Google Voice to test. The web app for Google Voice bafflingly has no microphone playback / test I could find, but if you place a phone call to +1 (909) 390-0003 something will answer that will echo calls back to you. (Sadly this number has been tagged “Gateway to Hell” online enough that you might see that name.)

In the end my problem was the microphone device had somehow been set to 32 bits per sample. (This happened two years ago, too.) Which breaks everything, not in some coherent way, but by just playing corrupted, staticky audio. Nice. Windows is doing ordinary users no favors by letting people choose all these formats. 16 bit 44100 Hz is just fine for anyone who isn’t running a recording studio. (Or 48000 Hz; love having two nearly equivalent choices.)

Tailscale first impressions: very good

Goodness what a networking miracle. After years of failing to get VPN tunneling set up between two houses, I finally gave Tailscale a try. I thought it’d take me a few hours to get going. No: 10 minutes, tops, and I was VPNing from one house to another without regard for the NAT on one side and the Starlink double NAT on the other.

I’ve never seen complex networking software like this work so easily. The install experience is a dream: great, clear docs and everything Just Works. Linux, Windows, Android: just install the thing, log in via Google with one click, done. I could immediately ping the new Tailscale IP address to reach my machine. There’s a great, easy to understand web console showing the status of all your Tailscale hosts.

Behind the scenes it’s not magic so much as well-put-together parts. WireGuard is the underlying VPN protocol, making it easy to set up point to point tunnels. Some sophisticated NAT traversal (and relay servers) ensures everything can connect to everything else. All the keys, negotiation, and naming are handled in a datastore Tailscale runs. You authenticate to Tailscale using one of several identity providers; Google is definitely very seamless.

The one thing I’m on the fence about with Tailscale is that it’s designed as a point-to-point VPN between hosts running Tailscale. That means that while I can access my Linux box behind NAT, I can’t access my thermostat on the same LAN directly, like you’d normally expect with a router VPN. They do support full subnet routing via a Tailscale node (that seems like a big commercial feature for them), but it’s not the simple default setup. OTOH this P2P model will cover 90% of what I need to do and has some advantages besides.

I assume there’s no way to use Tailscale without trusting Tailscale itself; its software and service are in the middle. I’m fine with that but I wonder how that limits their sales to big companies. There is an open source alternative server that so far the Tailscale company has been friendly about.

Configuration

I didn’t really have to do any setup beyond install, but I did a couple of extra things.

I set up DNS aliases in a domain I own for all my hosts. Now I can connect to sfwin.ts.example.com instead of some opaque IP address. Tailscale has their own beta feature for DNS (MagicDNS) but it works by having your client machines use their DNS server instead of your own. That’s probably fine but I didn’t want to quite jump to that yet. MagicDNS will automatically add new Tailscale hosts to DNS records; I’ll have to do that manually.
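
The records themselves are just ordinary A records pointing at the Tailscale 100.x addresses; something like this in my zone file (the names follow my sfwin/gvlin convention, and sfwin’s address here is a made-up placeholder):

; A records in the example.com zone pointing at Tailscale addresses
sfwin.ts    IN A    100.101.102.103
gvlin.ts    IN A    100.106.1.1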

I forwarded UDP port 41641 from my router to my Linux server in SF. This isn’t necessary, but I noticed that traffic between my two Linux boxes behind NAT routers was going via one of Tailscale’s relays, overhead I’d rather avoid. Tailscale does have a lot of NAT traversal, even simple UPnP would have worked, but this static forward is fine by me too. I’m not positive but I think it’s helped.
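
Tailscale has a built-in diagnostic for this sort of thing; my reading of its output is my own interpretation, but roughly:

$ tailscale netcheck                     # reports UDP reachability, port mapping support, DERP latencies
$ tailscale ping gvlin.ts.example.com    # shows whether a peer is reached directly or via a DERP relay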

Kicking the tires

tailscale status is an invaluable command for seeing what’s going on with your peers. It prints basic status of your hosts with extra information for open links like idle, tx 820 rx 732 or active; direct 98.97.1.1:16227

tailscale ping has some tailscale-specific info. For instance this is what it looks like when I ping a host behind double NAT when the link was previously down. I’m not positive but I take this as evidence that the link first comes up via a relay in Seattle, then switches to direct connection after a few pings.

nelson @tt ~ 1 ❯ tailscale ping gvlin.ts.example.com
pong from gvlin (100.106.1.1) via DERP(sea) in 316ms
pong from gvlin (100.106.1.1) via DERP(sea) in 97ms
pong from gvlin (100.106.1.1) via DERP(sea) in 76ms
pong from gvlin (100.106.1.1) via 98.97.1.1:16227 in 224ms
nelson @tt ~ 3s ❯ tailscale ping gvlin.ts.example.com
pong from gvlin (100.106.1.1) via 98.97.1.1:16227 in 95ms

I’m curious about performance. For two hosts without Tailscale I could get about 360Mbps copying a stream of random numbers over a direct link (via ssh and rsync, so maybe some encryption involved). With Tailscale I got maybe half that, 200Mbps, using 140% of CPU on the server and 80% of CPU on the client receiving the file. I did one try of a copy via a relay and maxed out at about 10Mbps; if that’s a real problem you can run your own relay server. I didn’t test packet loss or latency using iperf3 or the like, but I suspect it’s not much different than the underlying medium.
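
If I do get around to measuring properly it’d look something like this, with 100.106.1.1 standing in for the peer’s Tailscale address:

# on the server
$ iperf3 -s
# on the client, over the Tailscale address
$ iperf3 -c 100.106.1.1 -t 30         # TCP throughput through the tunnel
$ iperf3 -c 100.106.1.1 -u -b 200M    # UDP at a fixed rate, to see loss and jitter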

Tailscale hosts get addresses in the 100.64.0.0/10 block reserved for carrier-grade NAT. Boy I hope that never collides with Starlink’s use of this block! They also have IPv6 support but I’m not sure whether I’m likely to ever use those addresses.

One confusing thing: on Linux, Tailscale isn’t using the main routing table for its VPN traffic. While there is a tailscale0 network device, there is no route shown in ip route for the 100.64.0.0/10 block. Instead it’s handled in iptables, as iptables -S shows. I don’t really understand iptables so I couldn’t decipher all the rules, but it looks like there’s one blanket rule, not a specific rule for each destination host.
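
For reference, the things I poked at (the ts- chain names are what I believe Tailscale uses, but that may vary by version):

$ ip addr show tailscale0         # the tailscale0 interface and its 100.x address
$ ip route                        # no 100.64.0.0/10 entry in the main table
$ sudo iptables -S | grep ts-     # the blanket rules Tailscale installs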

Tailscale links go down after just a little bit of disuse but come back up immediately when used. Latency for that first packet through is under one second.

Making my Windows machine part of Tailscale means the WSL Linux images running on it also can access the private VPN addresses. I believe WSL2 works entirely via forwarding through the host Windows networking, so that’s not a surprise.

Tailscale has some higher end authentication features, in particular a way to integrate ssh logins. I haven’t tried them. There’s also an alpha feature for collecting a list of “services” running on Tailscale nodes for easy access.

HVAC thermostat access: Carrier Infinity Touch

Trying to diagnose an HVAC control problem. Software details below on how to get access to data from the Carrier thermostat.

I have a very fancy central AC system with a Carrier Infinity Touch thermostat and three zones controlled by dampers. My office is in the main zone 1 and is set to 75F, but in the late morning it cools down to as low as 69. The thermostat is in another room and unfortunately right next to the bedroom zone 2, with a door between them that we usually leave open.

I’m able to measure a few things: the temperature from the HVAC thermostat, the temperature from a Laseregg sensor about a foot away from the thermostat, and the position of the damper for each zone (0-15, scaled to 50-65). Unfortunately I can’t get access to HVAC fan speed or how hard the compressor is working (ie, what stage).

Here are details of zone 1 on three different days:

The key message here is that with the door shut (on the left), the two thermometers (blue and yellow) tracked each other within 2F. But with the door open (on the right) the divergence goes up to 5 degrees. The middle graph had me opening and closing the door during the day; I’m not positive, but basically every time the blue thermostat line dips down is when I closed the door.

I have no theory for why the two thermometers would show different temperature trends at all, or why the difference would vary depending on the door being closed. The thermostat is mounted on a wall, but I don’t think that wall is unusually hot or anything. Is it possible the thermostat isn’t reporting a simple temperature sensor reading but some complicated synthetic thing based on air temperature?

The other thing I learned from this experiment is what the dampers that control zone flow are doing. Zone 1 is the main room with the thermostat graphed above; zone 2 is on the other side of that door I’m opening and closing.

Three days of temperatures in zone 1 and damper positions for zones 1 and 2

It’s a bit messy, but the main thing this is showing is that every morning the AC isn’t even cooling zone 2 (purple); all the cold air is going to zone 1 (orange). Zone 2 comes on in the afternoon and zone 1 usually tapers off, although yesterday, a very hot day, both zones were basically full open.

No clear conclusions here; I really wish I understood why the thermostat and the Laseregg thermometer diverge so much. But I do see why my office in zone 1 gets so cold in the mornings; all the air is being diverted into that zone. With the door open I suspect that cold air then bleeds into zone 2 enough that the HVAC decides it doesn’t need to explicitly cool zone 2 itself. Closing the door helps with that, or maybe if I set zone 2’s temperature a little lower it’ll change things.

The real solution would be to move the zone 1 thermostat further away from zone 2.

Trying to understand all problems through my lens of software engineering is foolish. I suspect a real HVAC expert would be able to understand the problem quickly. I should call the folks who installed this; their system designer seemed good, but the techs they send out aren’t creative problem solvers.

Software details for accessing Carrier data

Here’s how I got this data from my Carrier HVAC.

The thermostat comes with a crappy website for controlling it and no API for data. Fortunately folks have hacked this; Infinitude is the key software and there’s a Home Assistant wrapper.

Infinitude is a Perl proxy server. You run it on your LAN, use the thermostat config to point it at the proxy, and now you have a MITM for recording data and control. It’s a pass-through; the Carrier Internet app still works. Infinitude has its own Web UI with a status display and the ability to configure some or all of what the thermostat does. It also has an API.

Infinitude also has support for a serial interface to the thermostat with high frequency data; I didn’t mess with that. And apparently the thermostat now has MQTT support so that’s an interesting alternative that someone may hack support for. (Or not; it may be that a side effect of MQTT is that the proxy hack won’t work any more.) The discussion in that link is interesting btw, this thermostat is a much more capable system than the consumer UI would indicate.

I’m running Infinitude as a Docker image. It wants to run on port 3000 but unfortunately that’s in use for Grafana already (even in the container?!). And the entrypoint doesn’t let you override that. So I ended up having to modify the Dockerfile and build my own image. I’m launching with ./infinitude daemon -m $MODE -l http://:3030
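
For reference, building and launching the modified image is nothing special; the image name and restart policy here are just my own choices:

$ docker build -t infinitude-3030 .
$ docker run -d --name infinitude --restart unless-stopped -p 3030:3030 infinitude-3030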

Infinitude is not well documented. The docs promise a --help but I couldn’t make that work. Not sure what other command line arguments the thing supports. I’ve lost my ability to read Perl (to the extent Perl was ever readable) so I gave up.

There are some docs for the REST API but they are incomplete. A key API call not mentioned there is /api/status. Between that and /api/config I can get a lot of data off the device. What’s there?

There’s some device-level stuff, most of it not filled out. There’s a timestamp with a bogus timezone (-08:01 when it should be -07:00. Lol.) Most of the interesting stuff is per-zone data, accessible with jq under the key .zones[0].zone[0] for my living room:

  • clsp, htsp: desired temperature setting (cool and heat)
  • rt, rh: room temperature (73-76) and humidity (40-41)
  • damperposition: numbers from 0 to 15. 15 is running full out, I think, but I’ve seen 10-15 and 0
  • zoneconditioning: active_cool, idle
  • currentActivity: home, manual (or sleep, etc?)
  • hold: off or on, depending on how I set it
  • otmr: 22:30, the time the current home program ends
  • enabled: always on?
  • fan: always off? (maybe manual fan control)
  • id: 1, the zone number
  • name: living room, the zone name

It seems to update at least once a minute.
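
Pulling those fields out is a one-liner with curl and jq; the hostname and port here are my local setup, yours will differ:

$ curl -s http://infinitude.local:3030/api/status |
    jq '.zones[0].zone[0] | {rt, rh, clsp, htsp, damperposition, zoneconditioning}'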

What I’m not seeing here is how hard the compressor is working or how fast the fan is running. That data can be displayed in the thermostat, but it’s more a property of the HVAC itself than the thermostat. I wonder if the HVAC unit itself has a serial port you can tap for data? I’m not aware of it having any Internet access.

I didn’t look as closely at /api/config; it didn’t seem very interesting. This is where the thermostat’s program (schedule) seems to be available: times and temperatures.

starship shell prompt magic

It’s been nearly a decade since I indulged in customizing my shell prompt, so here I am. I’m trying Starship, a very nicely engineered solution that’s cross-platform.

Last time I customized things I settled on liquidprompt, a very complicated set of shell scripts that works pretty well. It shows me things like Git status, Python venv activation, whether I have suspended screen sessions.

Here’s the default starship in the same context.

The big difference is that second line; Starship wants to stuff so much into the prompt that the lines wrap. (Do I really need to know the Python version?!)

The other obvious difference is the weird symbols; Starship relies on you using a Nerd Font, a regular font patched with like 4000 symbols stuffed in the Unicode private use area. I can’t decide if that’s clever or a terrible idea; AFAICT there’s no graceful fallback, so if you don’t happen to have the special font you get a lot of broken glyphs. (Or else you have to have a second nerdfont-free configuration.)

But what I really like about Starship is the very clean implementation. It’s a single 8MB Rust binary. It seems to run very fast. And front and center in the config are timeouts for when scanning files or running a command takes too long. The last thing you want is a slow prompt! That’s a real problem with liquidprompt in large Git repos.
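
Setup itself is trivial: one line in your shell rc file. This is the bash version; other shells have equivalent init commands:

# in ~/.bashrc
eval "$(starship init bash)"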

Configurating Starship

The config file is reasonable; there’s a top level prompt which is basically “combine these modules”. And then lots of modules. For instance the Python module defaults to displaying

'via [${symbol}${pyenv_prefix}(${version} )(\($virtualenv\) )]($style)'

So that’s where the text “via”, the Python version, etc. is coming from. You can override both the format string itself and the values of variables like $style.

When reading through the docs pay particular attention to $format since that’s what actually displays. Also note some modules are $disabled by default. The docs are out of date with the binary I installed; starship print-config shows the current active config. There’s a bunch of other neat commands too; explain, timings, etc.

There’s a strange combination of lots of extra text (like “via” or brackets around git status) and yet also everything being color coded, so the textual delineation is superfluous.

I can’t find a way to write general conditional behaviors. There’s support for conditionals on the empty string, and some modules are hard-coded to do conditional display (e.g. only show the hostname if remote). But there’s no general purpose conditional.

It’s quite voracious about showing language details. If you have a directory with both Python and Javascript files in it, you get “via  v18.7.0” and “via 🐍 v3.10.4” in every prompt.

git status bears some consideration. It often shows up as red !? which to me suggests an error state, but really it’s just saying “you have uncommitted work”.

My configuration

I spent most of an hour reading and tinkering and landed here.

That’s in a git repo with +3/-3 changes pending, a job in the background, and the last command exiting with status 1. This all runs in 1-2ms except for the git_metrics count of lines changed; that’s 10ms for this very small status. (git_metrics is off by default.) That’s a pretty maximal prompt, btw; this is what I get with a new local shell in my home directory

My config is below. Mostly what I did here was remove extra words like “via” or “in” from the various modules and suppress language version numbers entirely. No doubt I’ll keep tinkering.

add_newline = false
[line_break]
disabled = true

[username]
format = "[$user]($style)"
[hostname]
format = "[@$hostname]($style) "
[cmd_duration]
format = "[$duration]($style) "
[status]
disabled = false
symbol = ""

[git_branch]
symbol = ""
format = "[$symbol$branch(:$remote_branch) ]($style)"
ignore_branches = [ "master", "main" ]
[git_metrics]
disabled = false
[git_status]
format = '([$all_status$ahead_behind]($style) )'

[python]
format = '[($virtualenv )]($style)'
[nodejs]
disabled = true
[gcloud]
disabled = true

Rudimentary perf notes

Still working my way through learning ARM64; I can write loops now! I have my first “real” program, something that takes a 64 bit number and prints it out in hexadecimal. This got me interested in how the CPU is really running my code, which led me to the perf tool in the Linux kernel. It’s an strace or prof-like tool that monitors CPU performance counters, of which ARM64 has an awful lot.

Notes here are from a rank beginner; anyone who’s used perf before will not learn anything here.

I installed it on Raspberry Pi OS with apt install linux-perf. That installs perf_5.10, which you need to invoke with the version number; the perf shell script wrapper tries to run perf_5.15 to match my kernel version.

$ perf_5.10 stat ./ch4printword

 Performance counter stats for './ch4printword':

              0.22 msec task-clock:u              #    0.196 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
                 2      page-faults:u             #    0.009 M/sec
             1,611      cycles:u                  #    0.007 GHz
               167      instructions:u            #    0.10  insn per cycle
   <not supported>      branches:u
                13      branch-misses:u

       0.001135464 seconds time elapsed

       0.000000000 seconds user
       0.001426000 seconds sys

Neat! I’m a little surprised this worked as my regular user; let’s try it as root

$ sudo perf_5.10 stat ./ch4printword 2>&1 | grep cycles
           318,276      cycles                    #    1.720 GHz
$ sudo perf_5.10 stat --all-user ./ch4printword 2>&1 | grep cycles
             1,132      cycles                    #    0.006 GHz

Running perf as root also lets you see all the cycles the kernel used, which for my toy program is most of them. Presumably overhead from setting up the process? The --all-user flag restricts it to showing only user time. Or alternatively, don’t run as root.

Note the number of cycles it takes varies each time I run it. Why? Not sure, but I assume it has to do with some unpredictable interaction with other system activity and/or the way the process got created. But while the cycle count varies, the instruction count doesn’t. My program is deterministic.

$ for i in $(seq 1 5); do perf_5.10 stat ./ch4printword 2>&1 | grep cycles; done
             1,288      cycles:u                  #    0.004 GHz
             1,034      cycles:u                  #    0.004 GHz
             1,376      cycles:u                  #    0.005 GHz
             1,039      cycles:u                  #    0.004 GHz
               957      cycles:u                  #    0.005 GHz

perf supports running a job a bunch of times and reporting the average and standard deviation

$ perf_5.10 stat -r 1000  ./ch4printword > /dev/null

 Performance counter stats for './ch4printword' (1000 runs):

              0.18 msec task-clock:u        #    0.145 CPUs utilized ( +-  0.67% )
             1,130      cycles:u            #    0.006 GHz  ( +-  0.58% )
               167      instructions:u      #    0.15  insn per cycle
                 6      branch-misses:u     ( +-  1.62% )

        0.00126752 +- 0.00000893 seconds time elapsed  ( +-  0.70% )

perf by default prints counters for events it thinks I care about. You can see all possible events with perf list. One chunk is performance counters from the armv8_cortex_a72 itself. There’s a lot of data here, like cache misses and bus accesses and stuff. I was interested in branch prediction, so

$ perf_5.10 stat -r 1000 -e mem_access,cpu_cycles,instructions,br_pred,br_mis_pred  ./ch4printword > /dev/null

 Performance counter stats for './ch4printword' (1000 runs):

                20      mem_access:u                              ( +-  0.21% )
               985      cpu_cycles:u                              ( +-  0.72% )
               167      instructions:u
                63      br_pred:u                                 ( +-  0.22% )
                 5      br_mis_pred:u                             ( +-  1.81% )

        0.00122416 +- 0.00000844 seconds time elapsed  ( +-  0.69% )

One small mystery: my program executes 32-48 branch instructions. Why is br_pred bigger than 48? Perhaps this has to do with speculative execution?

167 instructions in 1000ish cycles is not very good. I thought ARM was RISCy enough you could expect 1 instruction per cycle but in reality it’s way more complicated. The Cortex-A72 is a superscalar CPU with a 15 stage pipeline and can be running five instructions at once! But this is a very short program so maybe not the best test.

I did some quick tries with perf on CPU-bound jobs.

  • 1.10 instructions / cycle: stress -c 1 -t 5
  • 0.94: zstd -19
  • 0.44: zstd -3
  • 0.89: gzip

Those last three are I/O intensive so probably not a great test.
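
The invocations were something like this; somefile is a placeholder, and since perf stat writes its counters to stderr the redirect only discards the compressed output:

$ perf_5.10 stat stress -c 1 -t 5
$ perf_5.10 stat zstd -19 -c somefile > /dev/null
$ perf_5.10 stat zstd -3 -c somefile > /dev/null
$ perf_5.10 stat gzip -c somefile > /dev/null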

This blog post on performance tuning demonstrates 2.053 instructions / cycle for matrix multiplication. It also says “In scientific code with a mix of integer and floating point operations, an IPC of 2 is a good starting goal.”

Learning all this reminds me how complex micro-optimization is. Particularly for C code; you’re having to not only predict what the CPU is doing but also what the C optimizer is capable of emitting. I’d naively thought programming in assembly would make it easier to get some highly optimized code for things but I’m quickly discovering it’s so complex you probably get better results letting a compiler do the real work.

California electricity grid; resource adequacy

California has its own electricity grid, roughly independent from other states. The California ISO that runs it has an excellent dashboard (and mobile app) showing grid status, predicted demand, the march towards renewable energy, etc. It’s very well done; I think it’s a response to the chaos around the year 2000 when we didn’t have enough electricity for the state.

We’re about to have a very hot week so I looked in detail at the 7 day resource adequacy graph. What this means is documented mostly here.

There are two datasets, graphed in teal and purple respectively. Teal is more or less actual power. The dashed line is forecast capacity; it fluctuates mostly with solar. The faint dotted line is forecast demand; it fluctuates mostly with air conditioning need and evening home demand. What this is showing us is that towards late afternoon on 9/4, 9/5, and 9/6 we’re going to have a little more demand than supply. That’s not a catastrophe; CAISO can buy power from other states, at least assuming any is available.

The purple graph is the same data but excludes wind and solar from production; that’s why it’s so steady. (It also excludes something from demand, but I’m not sure what.) An added complication is “credits”; I’m not sure what those are, but if you visualize them they’re only about 2GW out of 55GW.

It’s worth noting that these are predictions, promises of what the electricity providers are telling CAISO they can generate. I believe they’re obligated to actually supply what they promise; the shortages in 2000 and 2001 were partly because some of the generators were playing games with supply to manipulate the market.

There’s a lot of other neat data on the CAISO site, particularly the supply pages that highlight renewables. May 8 2022 was a banner day: we had enough solar and wind to cover all demand, so fossil fuels dipped very low. (Not zero, but we exported more power than we generated in fossil fuels. I assume that was a choice made for financial reasons.) Alas this is only possible in late spring when solar is high and demand is low.

Another interesting dataset is the curtailment reports, times CAISO tells solar or wind providers to stop providing power because it has no place to put it. It’s not a huge amount of power and seems mostly to do with local bottlenecks, not pure overcapacity. Extra power first goes to batteries and pumped hydro.

One thing CAISO doesn’t provide much of is graphs of trends over years; they’re very focussed on intraday reporting. They do publish monthly reports with data in PDF format.

Linux system call numbers

Since I’m in the guts of assembly programming I’m curious about system call numbers in Linux. What’s weird is they differ depending on system architecture; exit is 93 on aarch64 but 60 on x86_64. How does this work?

The usual source for getting system call numbers is /usr/include/sys/syscall.h. But trying to read that stuff is a twisty maze of nested includes and CPP magic (that path doesn’t even exist!). If you find your way to /usr/include/x86_64-linux-gnu/asm/unistd_64.h you’ll finally find a list, but even then it’s not exactly authoritative. Those are the numbers C programs actually use when they make system calls.
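
A shortcut through the include maze is to just ask the preprocessor; on an x86_64 box this prints the x86_64 number:

$ echo '#include <sys/syscall.h>' | gcc -E -dM - | grep -w __NR_exit
#define __NR_exit 60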

The Linux kernel itself defines the system calls in files named syscall*.tbl. There are 20 right now. These are also extracted in handy table form on this web page, which makes for a nice (if possibly outdated) reference. exit is 1, or 60, or 93, or 9437185, or… why are these all different?

This Stack Overflow question has a partial answer; the x86_64 numbers were rewritten in 2001 by aa to “optimize it at the cacheline usage level”. Talk about baking in implementation details for ancient hardware! But the numbers are arbitrary anyway and I doubt it’s harmful.

To complicate things further not every system call is implemented on every architecture. nice, for instance, is system call 34 on i386 but isn’t in x86_64; instead the more modern setpriority system call (141 or 154) is used. Presumably libc is papering over these differences for most programs.

I think system call numbers tend to be assigned in ascending order, so sorting by number gives you a little history of additions to the Linux kernel ABI.

Lightweight ARM64 build & run on x86

qemu-user-static is some serious magic.

So the reason I’m playing with Raspberry Pi stuff is I have a mind to learn ARM64 assembly, aarch64. And I figured it’d be fun to do that with a native hardware experience. There’s no practical reason for this; I just felt a desire to get back to my roots. (My first real programming was all 6502 assembly as a kid.)

So far I’ve been doing my little exercises in an aarch64 shell, either directly or using VS Code remote development to ssh in. I like the idea of actual ARM hardware being in the mix. But cross compiling and emulating is reasonable too. And so easy!

To cross-assemble on an x86_64 host, all you need to do is

$ apt install binutils-aarch64-linux-gnu
$ PATH=/usr/aarch64-linux-gnu/bin/:$PATH

That will install as and ld and related tools for assembling and linking object code. There are lots more packages for cross-compilers for C++, Go, Modula 2… apt search aarch64 gives you a list. (I once spent a month trying to get a gcc cross-compiler working.)
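
With the cross binutils at the front of PATH, assembling and linking a standalone file looks just like the native flow; hello.s here is a stand-in for whatever source you’re building:

$ as -o hello.o hello.s
$ ld -o hello hello.o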

But here’s the real magic:

$ make hello
$ file hello
hello: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, not stripped
$ arch
x86_64
$ ./hello
-bash: ./hello: cannot execute binary file: Exec format error
$ apt install qemu-user-static
$ ./hello
Hello World!

WTF? I’m running an aarch64 executable on my x86_64 system. That is some serious magic. For extra credit the x86_64 system is actually WSL, a virtualized Linux. Which probably doesn’t really matter but is kinda neat.

Half the magic is done with the binfmt-support package, which configures the kernel’s binfmt_misc facility that tells Linux how to execute binaries it doesn’t otherwise know about. It’s the same trick that makes .jar files directly executable. So that’s the part that’s intercepting the aarch64 binary to run it somehow.
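
You can inspect the registration qemu-user-static set up; the entry for 64 bit ARM is named qemu-aarch64:

$ update-binfmts --display qemu-aarch64      # how binfmt-support sees it
$ cat /proc/sys/fs/binfmt_misc/qemu-aarch64  # the raw kernel-side registration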

QEMU is doing the actual running of the foreign binary. It’s a software emulator, not hardware emulation, so it is pretty straightforward to have it support a bunch of architectures. qemu-user-static just packages it up nicely. It looks to support 29 different architectures including s390, riscv64, even the good ol’ mipsel I cut my teeth on in college. The emulator binaries are pretty big, 3-4MB each and statically linked.

I didn’t read up much on the environment QEMU is providing; there’s lots to read. It’s definitely not a whole operating system in a persistent VM. It seems to translate syscall numbers though; my aarch64 program works even though it calls exit as syscall 93, where on x86_64 it’s 60. The docs say it translates signals and maps threads too.

Advanced topics: running dynamically linked binaries, getting a debugger working.

Update: WSL isn’t completely seamless; after a reboot the QEMU stuff stops working because it relies on systemd to install stuff. You can run sudo update-binfmts --enable to fix it. There’s also a related problem where, after installing the emulator, WSL complains /proc/sys/fs/binfmt_misc/WSLInterop: No such file or directory; that is fixed by a reboot.