Reading a file inside a zip as text in Python 3

I have a zipfile. It has a text file inside it. I want to read the first file as text. Here’s how to get a file-like object that has a text stream.

import io
import zipfile

with zipfile.ZipFile(fn) as zf:
    data_fn = zf.namelist()[0]
    bfp = zf.open(data_fn)
    fp = io.TextIOWrapper(bfp, encoding='utf_8_sig')

The tricky bit is the last line. The problem is ZipFile.open returns a bytes stream, so you have to wrap it in TextIOWrapper to decode it to text. Also my source data has a Unicode byte order mark in it; utf_8_sig is a magic decoder that will strip out the BOM if present.
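Here's a self-contained round trip (building a throwaway in-memory zip, so the file name here is made up) showing the BOM getting stripped:

```python
import io
import zipfile

# Build an in-memory zip containing one text file that starts with a BOM.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('data.txt', '\ufeffhello'.encode('utf-8'))

# Read it back: ZipFile.open gives a bytes stream, so wrap it in
# TextIOWrapper with utf_8_sig to decode and drop the BOM.
with zipfile.ZipFile(buf) as zf:
    name = zf.namelist()[0]
    with zf.open(name) as bfp:
        fp = io.TextIOWrapper(bfp, encoding='utf_8_sig')
        text = fp.read()

print(text)  # hello -- no BOM
```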

This is very remedial Python but I had a hard time figuring it out.

Adding static type signatures to PugSQL

Got distracted by a yak today; trying to add Python static type checking to PugSQL generated code. I got something that kind of works but it’d take some effort to make really useful. The basic idea is to create a .pyi file for PugSQL’s queries module.

The challenge with PugSQL is that it generates functions dynamically. At runtime it reads a .sql file that says “create a function called pugsql.compiler.Module.count_links()”. So of course VS.Code’s static type checker pylance/pyright has no idea what count_links() is and it displays a red squiggly line.

I kludged it by using PyRight’s type stub support. I used PyRight to autogenerate type signatures for pugsql, including a file named typings/pugsql/compiler.pyi. (These generated signatures were not very good; the command line guessing tool does a worse job inferring types than VS.Code’s interactive tooling.)

Then I edited that file to add type signatures for the custom methods I’m creating with PugSQL.

    def count_links(self) -> int:
        ...
    def select_recent(self, count: int) -> list[dict]:
        ...
    def upsert_link(self, *args: dict[str, Any]) -> None:
        ...

count_links is the simplest, it’s a basic select count(*) from links query. Takes no arguments, returns a number. select_recent pulls off N rows and returns them as a list of dicts (select * from links limit 10). Finally upsert_link is taking a list of rows (as dicts) and inserting them.

These are pretty generic type signatures. For instance select_recent is returning typed data from sqlite; those dicts actually contain entries with types you could infer from the SQL schema. That’s a good deal more complicated to do automatically, though. It’d be awfully nice, but then my main goal here was just to shut up the red squiggly line and for that a more generic type is sufficient.

Having done all this it’s enough of a nuisance I probably won’t pursue this further. But I could imagine PugSQL emitting these .pyi files automatically, it could be pretty slick.
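A minimal sketch of what such a generator might look like, assuming PugSQL’s `-- :name fn :result` comment convention (the result-hint-to-type mapping here is my own guess, not anything PugSQL provides):

```python
import re

# Map PugSQL result hints to generic Python return types (my guesses).
RESULT_TYPES = {
    ':scalar': 'int',
    ':many': 'list[dict]',
    ':one': 'dict',
    ':insert': 'None',
    ':affected': 'int',
}

NAME_RE = re.compile(r'--\s*:name\s+(\w+)\s*(:\S+)?')

def stub_lines(sql_text):
    """Yield .pyi method stubs for each named query in a PugSQL .sql file."""
    for match in NAME_RE.finditer(sql_text):
        name, hint = match.groups()
        ret = RESULT_TYPES.get(hint or '', 'object')
        yield f'    def {name}(self, *args, **kwargs) -> {ret}: ...'

sql = """
-- :name count_links :scalar
select count(*) from links;

-- :name select_recent :many
select * from links order by created desc limit :count;
"""

stubs = list(stub_lines(sql))
print('\n'.join(stubs))
```

The real thing would also need to emit the surrounding Module class and imports, but the per-query part is about this simple.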

It’d be worth reviewing what SQLAlchemy is doing for static type checking. They have a similar problem and it looks like they’ve thought pretty carefully about a solution.

Very simple Python ORMs

Starting a new project and want something to take the tedium out of a small sqlite database (like 2 tables, a few thousand rows). I don’t want to write wrappers for SQL to insert and select rows. OTOH I don’t want a complex ORM like SQLAlchemy or Django. What are my options?

  • Peewee. Well established, seems to be the consensus choice. Don’t love the code examples but they’re fine. Docs are good. Supports migrations.
  • PonyORM. The other frequently mentioned option. Has nifty metaprogramming so you can write queries in Python. But it does not support migrations.
  • Datalite. A very small binding between Python Dataclass types and sqlite tables. Docs are thin but code is small and readable. Supports migrations, surprisingly.

Based on my last project with SQLAlchemy, migration support is important. So that rules out PonyORM. Peewee and Datalite are very different in scale. I’ll start with Datalite and if it’s too limiting, consider moving to Peewee.

Update: Dan reminds me of PugSQL which is a wrapper for easy SQL execution in Python. Not really an ORM but does take the tedium out of simple SQL queries. And avoids ORM magic.

Mikrotik / RouterOS v6 thoughts

Playing with a new router every month is not a rewarding hobby.

I got my hands on a Mikrotik Hex S and tried it as my home router. It mostly worked fine but I couldn’t figure out how to get WAN failover to work. So I gave up on it for now.

Mostly I was impressed with RouterOS as a basic router product for a prosumer. The UI is idiosyncratic and their OS approach is kind of old school, but the thing works. It’s also relatively simple and transparent for basic tasks. And I like the command line / Windows app / web UI trichotomy; they all work the same way. I was able to get a basic router up and running with no trouble. (It’s pretty sad that https for configuration doesn’t work; you can enable it but there are SSL problems.)

The docs are a mess. Apparently there used to be a user-edited Wiki with many lovingly written detailed articles. That all got thrown over in favor of official manufacturer docs. I’d applaud that investment but the execution is not great; many of the docs are bad. It took me a long time to understand the gestalt of RouterOS, the concepts you need to understand to use it.

Version management is a mess too. I used RouterOS v6.48.6, the latest long term release. They put out a v7 beta months ago which was buggy, fair enough. Then they declared v7 stable. But apparently several major features look enabled but do not work? That’s the vibe I got researching it. I stayed away from it. Which is too bad, because you need v7 for Wireguard, one of the features I was looking for.

The other feature I needed is WAN failover. And after a couple hours’ research I gave up and decided I couldn’t do it in a way I was comfortable with. There’s no simple way to just enable it; all methods require manually configuring routes, etc. The official docs are confusing. This blog post was more helpful and details three methods of doing it, all with drawbacks. The thing that held me up is they all require hard-coding routes to specific gateway IP addresses. But ISP gateway IP addresses are dynamic; you can’t reliably hardcode them like that. This discussion had a link to a funky script that looked like it might configure things by pulling data from the DHCP subsystem. Maybe it even works, but that was a level of complication too far for me.

Folks like Mikrotik routers. And I can understand why, particularly a few years ago. Cheap but good hardware, a solid “real” OS, very powerful configurability. But in 2022 the software felt pretty retro and the moment I ran into something complicated it got idiosyncratic. Which brings me back to OpenWRT; if I’m going to hack, might as well at least hack on something that feels like a Unix system I understand. Anyway I’m back on Ubiquiti’s router for now, at least I know what their bugs are like.

Google caches “what is my ip”

I’m in the habit of searching Google for “what is my ip” to see my IP address; they have a little onebox result right at the top that shows it.

Turns out it’s cached. Worse than useless, actively harmful.

At least that’s what it looked like. I searched “what is my IP” today when I was on WAN2, my backup, and saw my WAN2 address. Then I fixed WAN1 so it’d work again and it looked like my router returned to WAN1 from failover to WAN2. But I’d type “what is my IP” and Google still showed the WAN2 address, like the return hadn’t happened. I finally checked on some other site and saw that the failover had happened. A shift-reload on the Google search page did finally show the correct data. All of this sure looks like something is caching the IP address. Maybe it’s related to a persistent HTTP or QUIC session my browser has open?

Every time I do this I have to remind myself https://api.ipify.org/ is the best way to check your IP. Simple text result, a web service intended for this use. Ipify has other tricks too, including an inexpensive geolocation service.

Brisket smoking plan

Something different here, my notes on how to smoke a brisket. There’s a zillion pages on how to do this on the Internet with conflicting advice. (This page seems better than most). I’m cooking a 6lb flat cut at 250F, on a gas grill with wood chips. I’m going with the method that estimates 60-90 minutes / pound total (so, 6-9 hours) and has you wrap the meat during the stall to help it get hot. The key thing for all this is a good meat thermometer.

At 60-90 minutes / pound it should take 6-9 hours total, including resting time. Hour markers below are approximate and on the high end; I should be checking and ready to move to the next step earlier based on temperature.

Night before

  • Salt meat
  • Soak wood chips
  • Make rub

Hour 0

  • Put rub on meat
  • Heat grill hot
  • Get wood chips smoking
  • Lower grill temp to 250F
  • Put in a water pan to keep humidity up
  • Put the meat on indirect heat near smoker box. Fat cap down.

Hours 0-5ish

  • Check regularly, maintain wood chips, water, grill temperature
  • Some folks recommend spritzing meat with water
  • Consider rotating meat 180 degrees (fat still down) for even smoking

Hour 5ish (maybe sooner)

  • Meat should be at “the stall”; 150F – 165F.
  • Wrap meat in foil with a little liquid. Wrap air tight!
  • Remove wood chips
  • Put meat back, indirect heat

Hour 5-8ish

  • Maintain grill at 250F
  • Cook until meat at 203F

Hour 8-9ish

  • Meat is at 203F
  • Turn off grill. Put wrapped meat in an empty cooler.
  • Let rest for an hour or more. Longer is fine!

Hour 9-9.5ish

  • Heat grill to hot: 400F or so
  • Unwrap meat
  • Put meat back on grill, indirect heat
  • Leave just long enough to dry the bark on the surface. Maybe 15 minutes?

Hour 9.5ish

  • Slice meat just before serving
  • Slice meat with fat side up
  • Be sure to use a cutting board with a juice catcher

Updates

It’s going well, pictures on Twitter. Some things I’ve learned:

  • I want to do a point cut next time.
  • My 3 burner propane Weber gets to about 240F in the middle if I turn one side burner on the lowest setting. Hard to be exactly sure; it was windy enough that the temperature fluctuated.
  • The Weber’s thermometer in the lid reads about 20F low at these temperatures. Indicates 220F, actually 240F.
  • This little smoker box looks good. But I have to put in a lot of chips, like every 20-30 minutes adding a big handful.
  • The cook came out just as predicted. 8 hours on the fire to 203F, an hour for rest, a bit extra to dry it off.
  • The wrapped foil package comes out just full of meat juices; I must have gotten 3 cups. I was able to save it but the foil is an awkward vessel to collect it all. It also makes you realize how different the wrap makes this from a dry smoke; in a traditional brisket all that liquid would have slowly evaporated or dripped off.
  • The meat was good. Not excellent, but good and worth the effort. There’s a style of brisket, mostly burnt ends, which comes out almost like bacon; crispy and with a lot of lovely rendered fat. I think you must need a point cut for that. The flat I used was much more a solid cooked chunk of meat and that’s good too but not great. Other changes I’d make: more salt, more smoke (somehow).

Sunpower PVS6 web console

The Sunpower PVS monitor has a full HTML web console accessible on its internal network. The PVS acts as a full Internet router on a debug ethernet port you can plug into. Plug a laptop into it and the laptop will be online (routed through the PVS); the PVS also serves extra stuff at http://www.sunpowerconsole.com/. It’s mostly a web view of the data in the JSON API, but it’s an official debugging tool and is quite nice.

I have my RPi plugged into that debug port and am proxying all web traffic, so on my LAN I can access http://pvspi0.local/ and get at these pages.

Here’s a list of URLs that it supports. These come from this detailed comment which also describes what these URLs do. A lot of these are steps on a little configuration wizard that presumably installers use. I think all these URLs are safe to load but some clearly have buttons to press that could reconfigure or break your system. Don’t press those.

The info view is particularly good; here’s a partial screenshot (I didn’t include the serial numbers.) Three sections: the PVS as a whole, per-panel/inverter data, and then CT monitors.

Solar panel max output, microinverter details

My solar system is mostly doing great. But I’m obsessing over small details and one thing I’ve noted is some (but not all) of my panels seem to max out or even underperform in the middle of the day by 5-10%. I’m trying to figure out why.

Panel 2122: expected
Panel 9897: clipped in the middle

Above is a graph of two of my 28 SunPower SPR-A410 panels with SPR-A410-G-AC microinverters. These were built and installed in 2021. The graph on the left shows what I’d expect; a steady increase in output following the sun. The panels face west, so it’s not a symmetric curve. The falloff end of the day is shading.

The graph on the right is the weird thing; the DC power gets up to about 300W and then sticks there, even dropping a bit to under 290W. It does briefly get to 334W at the end of the day, so it’s not some absolute max. It’s like the panel’s max output is being limited somehow. It looks like 15 of my 28 panels do this; the other 13 have the smooth curve. It’s clipping about 10% of power production for the middle part of the day, so maybe 5% of a panel’s possible output or 3% of the whole system.
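The arithmetic behind those percentages, as a quick sanity check (the “half the day” fraction is my rough read of the graphs, not a measured number):

```python
# Back-of-envelope: ~10% clipping for roughly half the daylight hours
# is ~5% of an affected panel's daily output; 15 of 28 panels affected
# scales that to the whole system.
panel_loss = 0.10 * 0.5           # ~5% per affected panel
system_loss = panel_loss * 15 / 28

print(f'{panel_loss:.0%} per panel, {system_loss:.1%} system-wide')
```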

I know a 410W rated panel will never actually put out 410W. But how much should it put out? The maximum I’ve ever seen is 385W (DC) or 367W (AC).

But what I really want to know is why some panels do this odd clipping thing when others don’t? Some theories:

  • Shading. I really don’t think so, both on visual inspection and also just looking at the pattern.
  • Overheating. That was my first theory but the temperature graphs don’t really bear it out.
  • AC conversion. There’s no obvious difference in the AC vs DC power for the panels, they all look to have about the same conversion efficiency. The clipping pattern is visible in the DC as well.

Some things to do next:

  • More statistical exploration. Is it the same panels doing the same thing every day? Is the effect consistent day to day?
    select SERIAL, max(p_mppt1_kw) from pvs6_inverter where SERIAL='…9897' group by time(1d);
  • Get out the map of which panels are installed where and see if there’s a spatial correlation.

Data Details

I’m pulling this data from the SunPower PVS6 device_list API call which is accessible from the device itself. It reports the following variables for each microinverter at each panel:

  • Status: STATE, stat_ind. Always “working” or “error”
  • Temperature: t_htsnk_degc.
  • Cumulative production: ltea_3phsum_kwh.
  • Instant power measurements: p_3phsum_kw, p_mppt1_kw
  • Instant current measurements: i_3phsum_a, i_mppt1_a
  • AC frequency: freq_hz
  • No longer available: v_mppt1_v, vln_3phavg_va

The interesting measures here are temperature and power in kW.

Power (and current) are reported for both DC (mppt1) and AC (3phsum). DC is always(?) the higher number, so I assume the small difference is a measure of microinverter inefficiency, which seems to be about 5% when the panel as a whole is producing about 300W.
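As a sketch of that calculation, here’s a device_list-style record using the field names above (the values are invented for illustration, not real readings):

```python
# Sample microinverter record, shaped like one entry from the PVS6
# device_list API. Field names are the real ones; values are made up.
record = {
    't_htsnk_degc': '41',
    'p_mppt1_kw': '0.3002',   # DC power in
    'p_3phsum_kw': '0.2851',  # AC power out
}

dc_kw = float(record['p_mppt1_kw'])
ac_kw = float(record['p_3phsum_kw'])

# Microinverter conversion loss: DC in minus AC out, as a fraction of DC.
loss = (dc_kw - ac_kw) / dc_kw
print(f'{loss:.1%}')  # 5.0%
```

Note the API reports everything as strings, so you have to float() the values before doing arithmetic.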

Updates

This pattern didn’t happen on 2022-06-21, a fully sunny day. The only real difference I can think of is that my house was using more power; very hot day, AC running. So there’s more load in the house. But surely that doesn’t matter, PG&E will absorb however much power I can produce? Anyway, here’s a graph of 9897, the problem panel above. All of the panels looked like this that day.

9897

Korean writing: Hangul and Unicode

CAVEAT LECTOR: these are my notes after 2 hours of reading Wikipedia articles. I’m a rank amateur, in no way is any of this stuff authoritative. Better resources than this blog post: Learn Hangeul, Let’s Learn Hangul.

I’m interested in Korean writing, specifically Hangul. It’s so graphically beautiful! And I love writing systems. It seems possible to learn enough to pronounce words clumsily in just a few hours.

Hangul 한글 is what modern Korean is written in. But historically there were other writing systems. Hanja 한자 / 漢字 is the big one; it was used starting around the 4th century BCE and is based on Chinese characters. I think it’s a little like how kanji works in Japanese. Hanja is still in occasional use, but mostly in text about historical things or in some proper nouns. There are some other historical writing systems too, mostly efforts to use Chinese characters to represent Korean word sounds. There are several modern Romanizations including a standard one. Also Kontsevich, a transliteration into the Cyrillic alphabet.

Hangul was invented in the 15th century by King Sejong 조선 세종 (or his advisors) and published in a written Chinese document called the Hunminjeongeum 훈민정음 / 訓民正音. It wasn’t immediately popular, Wikipedia implies the educated classes were snobbish about their classical Chinese writing. Hangul became official right at the end of the 19th century. It’s been slightly reformed over time, with some phonemes being removed or unified as they fell out of use.

Despite looking a little like Chinese at first blush, Hangul is actually a phonetic writing system; each square that looks like a character is actually a syllable composed of (usually) three phonemes. Korean syllables are generally consonant-vowel-consonant (CVC). A block like 한 (“han”) is really three letters called jamo 자모. So 한 can be broken down into three jamos drawn in one block: ㅎㅏㄴ (H A N). There’s a claim that Hangul is a featural writing system, meaning the visual shape of a jamo reflects the shape the mouth makes when producing the sound. Not sure how accurate that is.

Jamos are pretty much never written alone. Instead they are combined into a square block with a fairly straightforward algorithm nicely visualized on Wikipedia. The last consonant is at the bottom. The first consonant is either to the left or above the vowel. It gets a little more complicated because some letters have complex forms.

One extra detail: at least in Unicode, jamos come in both leading (choseong) form like ᄀ and trailing (jongseong) form like ᆨ. The shapes look the same but in this font I’m using they’re shown at different heights. I’m not sure why leading vs. trailing is encoded differently or whether Korean native speakers think of them as different things; since jamos don’t normally appear alone it’s a pretty academic point.

English speakers should be aware that Korean phonology is quite different from English or similar European languages. A “t” sound could be a simple t ㄷ, a tense t͈ ㄸ, or an aspirated tʰ ㅌ. These variants are written as different jamos and are basically entirely different consonants even if they sound kind of the same to an English speaker’s ear. Modern Korean is not tonal but historically it was. Hangul used to use diacritics to indicate tone.

Korean in Unicode is a bit of a mess. My understanding is almost all systems encode Korean with Hangul Syllables, characters in the range U+AC00 to U+D7AF. Each code point is a single syllable. Use those and it’s pretty simple, there’s just a lot of them. The odd thing is these 11,172 code points are entirely algorithmically derivable as combinations of the individual jamos that make up the syllables. The graphical combination is also algorithmic. Instead of thousands of syllable code points you could write Korean unambiguously as strings of 67 different jamos, which are in Unicode (U+1100 to U+11FF). But that’s not what people do in practice (see the road not taken). I think the only time you’re likely to see the jamo form is if you normalize and decompose input into Unicode NFD.
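The derivation really is just arithmetic; the Unicode standard gives the formula. A quick sketch composing 한 from its jamos and then decomposing it back with NFD:

```python
import unicodedata

# Hangul syllable composition, per the Unicode standard:
#   code point = S_BASE + (L * 21 + V) * 28 + T
# where L, V, T index the 19 lead consonants, 21 vowels, and
# 27 tail consonants (plus one slot for "no tail").
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7

def compose(lead, vowel, tail=None):
    l = ord(lead) - L_BASE
    v = ord(vowel) - V_BASE
    t = ord(tail) - T_BASE if tail else 0
    return chr(S_BASE + (l * 21 + v) * 28 + t)

# "han": choseong HIEUH + vowel A + jongseong NIEUN
han = compose('\u1112', '\u1161', '\u11ab')
print(han, hex(ord(han)))  # 한 0xd55c

# NFD normalization goes the other way, back to the jamo string.
jamos = unicodedata.normalize('NFD', han)
print([hex(ord(c)) for c in jamos])  # ['0x1112', '0x1161', '0x11ab']
```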

A second historical complication of Korean Unicode is mentioned in a tart comment in the UTF-8 RFC

Amendment 5 to ISO/IEC 10646, however, has moved and expanded the Korean Hangul block, thereby making any previous data containing Hangul characters invalid under the new version.  Unicode 2.0 has the same difference from Unicode 1.1. The official justification for allowing such an incompatible change was that no implementations and no data containing Hangul existed, a statement that is likely to be true but remains unprovable.  The incident has been dubbed the “Korean mess”, and the relevant committees have pledged to never, ever again make such an incompatible change.

Mentioning this mostly because it’s funny, doesn’t seem relevant to contemporary systems.

Next thing for me is I’d like to learn more about Korean calligraphy and font design. How Korean artists think about their own writing and manipulate it.

Source

My recent Starlink POPs and IP address changes

A thrilling list of my Starlink ISP’s points of presence, where my packets come out to the Internet. I’m looking at this because this past week a bunch of users, including me, have had a problem where TCP sockets won’t stay open for more than a few minutes, breaking all sorts of stuff. Each date is when my IP address changed.

  • 2022-05-29: Seattle
  • 2022-06-01: Los Angeles
  • 2022-06-04: period of instability
  • 2022-06-04: LA
  • 2022-06-06: LA
  • 2022-06-07: instability
  • 2022-06-07: LA
  • 2022-06-10: LA
  • 2022-06-11: LA

I first noticed my TCP connection problem on 2022-06-07, about the time of that period of instability. It’s mostly been gone since 2022-06-08 but it came back at least a little bit today and is gone again.

The command to do this is

grep '^2022' homeip.log | fgrep -v '157.131' | fgrep -v '173.195' | uniq -c -f 2