Desktop feed readers

While working on my Atom feed for the linkblog I was frustrated that I had no way to preview a feed. The only reader I knew about was Feedly, the hosted app, and it doesn’t have a way to say “reload this feed” so I could easily see changes. I couldn’t find a modern desktop client feed reader at all. Turns out there are a few; here are three I tried.

  • QuiteRSS: simple, no nonsense. Last release was April 2020. There’s a fork with a little more tinkering but it’s not a lively project.
  • SeaMonkey: the continuation of the old Mozilla hairball of Internet apps. It has a decent feed reader and is under active development.
  • Liferea: a Linux app. Works in WSLg. Actively developed.

I used these readers to diagnose a formatting issue with my Atom feed. My summaries are now HTML, but I wasn’t really formatting them as HTML with <p> tags and the like, so the images were being inlined in an ugly way. Fixed now.
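For reference, the fix amounts to declaring the summary as HTML and wrapping it in real block-level markup, roughly like this hand-made example (not my actual feed):

    <entry>
      <title>Example link</title>
      <link href="https://example.com/article"/>
      <!-- type="html" means the body is escaped HTML rather than plain text -->
      <summary type="html">
        &lt;p&gt;A sentence or two about the link.&lt;/p&gt;
        &lt;p&gt;&lt;img src="https://example.com/preview.webp" alt=""&gt;&lt;/p&gt;
      </summary>
    </entry>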

Blocked by Akamai

For the past two days I’ve been taking web previews of all 25,000 links from my linkblog. Akamai seems to have blocked me in retaliation. Requests to Akamai-hosted services are giving me an old-school unstyled 403 Forbidden.

I assume they think I’m a scraper of some sort. Which I am, but an awfully low key one. I’ve got a single thread downloading web pages one at a time, every 1-10 seconds. Surprised that triggers Akamai’s defenses. It’s not a huge deal for my current project, but I sure hope it goes away once I stop my survey because it’d be awfully annoying to be blocked from 10% of the Internet permanently.

A little curious how they even caught on to me. I imagine user agents; the only place I make an effort to pretend to be a desktop browser is when I download the actual image named in OpenGraph data. (Oddly this fixed a lot of problems; apparently I can download the HTML with some user agents but not the linked image?!)
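That image fetch is just a plain HTTP GET with a browser-ish User-Agent header. A minimal sketch of the idea in Python (the UA string and the function are made-up examples, not what my tool actually sends):

    import urllib.request

    # Any mainstream desktop UA string seems to be enough; this one is invented for the example.
    BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36")

    def fetch_image(url, dest_path):
        """Download the image named in the OpenGraph data, pretending to be a desktop browser."""
        req = urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
        with urllib.request.urlopen(req, timeout=30) as resp, open(dest_path, "wb") as out:
            out.write(resp.read())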

I’ve got metascrape set up using undici as the HTTP client and it sends a user agent of undici.

shot-scraper’s default user-agent is Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/102.0.5005.40 Safari/537.36 which looks close to a legit desktop browser but not exactly. It’s possible to reconfigure what shot-scraper uses.

User agent is the dumbest sort of security; it’s trivially easy to spoof, so filtering on it really only blocks lazy, well-meaning people like me. But there are much more aggressive forms of user agent detection; see curl-impersonate, which tries to mimic desktop browsers’ SSL behavior to evade more hardcore detection.

Update: 24 hours later and still blocked. I’m mad about this now. If I’d been aggressively scraping one site repeatedly I’d understand. But some lightweight automated screenshotting of sites all over the Internet once every few seconds should not cause me to get blocked forever. I had to set up tinyproxy just to book a damn hotel. Ironically I can’t read Akamai’s own support notes on their bans, since they are all hosted on Akamai.

Update 2: the block slowly expired starting the morning of July 23, 3 days after my screenshotter tool stopped running. It wasn’t lifted all at once; some sites started working again before others, one took days longer, and for all I know some Akamai sites are still blocking me. Schwab showed a really irritating failure mode: the main page loaded fine (hosted elsewhere?) but the login iframe was blocked. At least that showed an error; I also hit some sites where some invisible AJAX refused to load, invisibly.

Tech notes on my linkblog

My linkblog website is done, see it here:

After a week or two of tinkering (and more research) I’ve come up with a nice clean web page for my linkblog. Each post has its own box with an image preview. I’m pretty excited about how it looks. I also reminded myself of a cardinal rule of design projects: get something that looks good and functional ASAP. It’s so motivating!

Here are screenshots of design iterations (from newest to oldest).

It’s a pretty straightforward layout but there’s one clever thing I’m doing: distinguishing positive sentiment posts (white) from negative ones (black). I like how I iterated from a simple vertical design to something with alternating left and right. It’s idiosyncratic but I think it’s interesting.

Code architecture

It’s all pretty simple. A custom-built Python static site generator driven from Pinboard data, making liberal use of external Unix tools for the complicated things. Very basic hand-coded HTML and CSS, no frameworks, no Javascript. Static sites for static data!

The main loop is to sync data from Pinboard once an hour into sqlite. Then run a process to generate image previews for new links. Finally render HTML from the data in sqlite and push to a web server.
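As a sketch, the driver looks something like this (the script names and paths are made up; my real code isn’t organized quite like this):

    import subprocess

    # Hypothetical step names; the real pipeline has the same shape
    STEPS = [
        ["python", "sync_pinboard.py"],   # Pinboard API -> sqlite
        ["python", "make_previews.py"],   # metascraper / shot-scraper / cwebp for new links
        ["python", "render_html.py"],     # sqlite -> static HTML
        ["rsync", "-a", "output/", "server:/var/www/linkblog/"],  # push to the web server
    ]

    for step in STEPS:
        subprocess.run(step, check=True)  # fail loudly if any step breaks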

I used my friend Dan’s PugSQL for database access in Python. Very nice little tool; you write actual SQL code with just the lightest sprinkling of metadata for variable names, then execute it in your Python code. It takes some of the hassle out of writing SQL in Python without doing anything too magic. Worked great for this project. I only have two tables in SQLite and a total of 7 SQL queries. One nice thing about sqlite is there’s less need to optimize; I’m happy to make several SQL calls to render one post rather than try to do some complicated join to minimize database round trips.
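The PugSQL pattern looks roughly like this (a toy table and query for illustration, not one of my real seven). The SQL lives in its own file with a one-line metadata comment naming the query:

    -- :name recent_links :many
    select * from links order by time desc limit :n

and the Python side loads the directory of SQL files and calls the query by name:

    import pugsql

    queries = pugsql.module("queries/")        # directory holding the .sql files
    queries.connect("sqlite:///linkblog.db")   # SQLAlchemy-style connection string

    for link in queries.recent_links(n=100):   # rows come back as dict-like records
        print(link["href"])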


I’ve written several blog posts about preview generation and I’m glad for all the research I did. I settled on a single 320 pixel wide image for a preview, height up to 320 with a preferred height of 180. My code supports multiple engines for generating an image preview but in the end I’m only using two. Metascraper, a standalone program that analyzes HTML for OpenGraph tags, etc to select an image. And shot-scraper, a standalone program that takes screenshots with a headless browser. I also tried the linkpreview service and a Python library called webpreview but they were redundant with metascraper.

Multiple engines yield multiple candidate images. I pretty much always take the Metascraper image; it’s available for about 90% of the links. In a few hand-coded exceptions I’ll prefer the screenshot. (Metascraper offers crappy images for Hacker News posts and for Wikipedia pages that don’t have featured images. Also sometimes it returns an SVG image, which the rest of my code can’t handle.)
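In spirit the selection rule is just a few lines. A sketch (the function and the exception list are illustrative; Hacker News comes from the note above, the rest is made up):

    from urllib.parse import urlparse

    # Hand-coded exceptions where the screenshot beats the metadata image
    SCREENSHOT_PREFERRED = {"news.ycombinator.com"}

    def choose_preview(link_url, metascraper_image, screenshot_image):
        host = urlparse(link_url).hostname or ""
        if metascraper_image and metascraper_image.endswith(".svg"):
            metascraper_image = None              # the rest of the pipeline can't handle SVG
        if host in SCREENSHOT_PREFERRED or not metascraper_image:
            return screenshot_image
        return metascraper_image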

Overall I’d say 90% or more of my posts have good images. A few of the Metascraper images are turkeys. Some screenshots are marred by cookie popups, etc. One neat thing is that web pages that don’t screenshot well also tend to be the ones with thoughtful OpenGraph previews.

The preview images are downloaded or generated, then resized to 640 pixels wide with lossy cwebp. Major savings there; the results are maybe 10% the size compared to just serving the actual image. Screenshots average 27kb; metascrape’s website images average 50kb, with a few oddball much larger ones because I couldn’t resize the source image. (cwebp can’t deal with animated GIFs!)

Page size and performance

I’m going for a fairly maximalist presentation so I don’t care too much about page size. 100 links on a page makes for a 5MB download, almost entirely the image previews. Google’s PageSpeed Insights gives me a pass on “core web vitals” and an 85 on Performance, which isn’t awful. The main complaint is just that it’s a 5 megabyte page with 100 medium sized images on it. I could easily solve this by just including fewer links on the page 👿. Or maybe go to an infinite scrolling / load on demand design, but the extra complexity of that does not seem worth it to me.

The one performance thing I’m not happy with is the image reflow. I don’t set an explicit height for the images because I’m using CSS to calculate the height. I’m not cropping the images but relying on overflow: hidden to contain them to max-height 320px. This all looks good, is less work for me, and has the nice property that the full preview image is available if you click on it. But it does cause a lot of reflowing while the page loads. I should revisit this.

Update: Thanks to a hint from Thomas S I’m now including the native image file dimensions in the HTML img tags. Combined with CSS rules of max-width: 100% and height: auto, the browser does a nice job laying out the images before loading them. I also added loading=lazy to the image tags at his suggestion, which is a big help. The new mystery is font loading; Firefox loads fonts after the visible images. But that’s a deep rabbit hole.
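The markup and CSS boil down to something like this (the class name and file names are invented for the example):

    <img class="preview" src="previews/example.webp"
         width="640" height="360" loading="lazy" alt="">

    <style>
      img.preview {
        max-width: 100%;   /* shrink to fit the column */
        height: auto;      /* keep the aspect ratio implied by width/height above */
      }
    </style>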

Responsive design

This is the first time I’ve coded a mobile-friendly view for a website with a responsive design. All by hand, no framework to help me. My design is intended for desktop use; I worked with an 800px-wide frame as the core design element: 320px for images, about 440px for text, and some gutters. But I wanted it to look reasonable on a phone too. So I read a couple of MDN articles and figured out how to do responsive design. It’s simpler than I realized.

The first principle is that you need to set <meta name="viewport">. Without it mobile browsers seem to go into some compatibility mode where they render the page for a screen that’s 960px wide and then shrink it to fit in the actual CSS pixels for the screen, often about half or a third the size. That’s why fonts are so tiny for non-mobile websites on phones! So you set the viewport to width=device-width and now the page will render at the phone’s native CSS width, typically around 400 CSS pixels. (Which is probably actually 800 or 1200 physical pixels thanks to high DPI.)

The second principle is just to design the site so the page looks good at various widths in a regular desktop browser. I originally hardcoded a rigid width: 800px in my main display element which looks terrible if you shrink the browser below that (it clips). Making a more flexible layout works better. I was a bit constrained because I really wanted the preview images to be fixed at 320px, but now the text could go anywhere from 440px to about 250px and still look good.

The third principle is media queries to define explicit CSS rules for different screen sizes. My main rule kicks in when the screen is narrower than 800 pixels: switch to a single-column layout (images below text), use smaller margins, etc. For smaller mobile-size screens I knock the font size down a bit too, to fit more text. I did the opposite of the recommended “mobile first” design; my default CSS rules are for wide desktop screens, with overrides for small mobile screens.
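Put together, the responsive machinery is only a few lines. A simplified example (the 800px breakpoint matches my layout; the narrower breakpoint, selectors, and rules are placeholders):

    <meta name="viewport" content="width=device-width">

    <style>
      /* default rules target wide desktop screens; overrides below are for small screens */
      @media (max-width: 800px) {
        .post { display: block; margin: 0.5em; }   /* single column, tighter margins */
      }
      @media (max-width: 500px) {
        body { font-size: 90%; }                   /* knock the font size down a bit */
      }
    </style>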

Firefox has a great “responsive design mode” that lets you simulate how your page looks on various real world phones. Big help.

Future work

I’m pretty happy with where this stands. But I’m sure I’ll tinker.

One future work idea is to do some sort of pagination / archive view. Right now I’m just showing the last 100 links, without even dates displayed. That’s fine, there’s an archive view on Pinboard. But maybe I’ll eventually do more on my static site. See notes above about infinite scrolling as a possibility, too.

The other thing to tweak is to keep getting better previews. Metascraper could be improved. Also I realized I could use Feedly to generate previews for me; they have an API where I can pull their preview images. There are diminishing returns for this kind of work though.

Finally I have one unexplored design idea. Display each preview as a full-bleed image in the background of a box for the post. Then put my descriptive text on top of the preview image, using some combination of blur and shading and maybe text halos to make it readable. It’s a pretty aggressive design but I’ve seen stuff like that look good before and it’d be fun to try.

WebP conversion

I just spent about an hour learning about converting images to WebP for my linkblog; I want my link previews to be small files. Target display is 320×180-ish images.

Here’s where I landed, resizing everything to 640 pixels wide with fairly high quality (the input and output filenames are placeholders):

cwebp -quiet \
  -af -q 90 \
  -resize 640 0 \
  -metadata all \
  input.png -o output.webp

These are guesses but seem to work about right.

-af and -q 90 are the quality settings. 90 is fairly high; the default is 75. -af means “spend more time making it look good” and seems to take about 2x as long. I also experimented with the -m setting to trade off time vs compression quality, but the default of -m 4 seems fine to me.

The resize forces all images to a width of 640 and whatever height is natural; that’s twice the target display resolution, for retina displays.

The one surprise is some of the images (but only some) seem to get a little brighter after conversion. I thought that was a symptom of ICC color profiles not being applied right, which is why I added the -metadata all flag. But it’s still happening. I don’t care enough to figure it out; it could just be the difference in the browser resize algorithm vs cwebp.

Starting with a bunch of 1280×720 screenshot PNGs the webp versions are about 10% the size. If I leave out the image resizing it’s about 25%. That’s mostly thanks to the lossy compression which is fine for my purposes given I’m resizing too.

Overall this got my full page down from 24MB to 2MB. Worth the time!

Website previews: images for unfurls

Some notes on another way to generate one image per link for my linkblog: unfurls. Those are the site previews that go by names like Twitter Cards, Slack unfurls, or Facebook link previews. They try to extract a few words and an image from the page; the resulting preview is structured data, not just a picture. Most of these basically work by looking for oEmbed tags, OpenGraph tags, or Twitter’s card tags, and then just guessing if those metadata tags aren’t present.

Most unfurls are a mix of text, a little structured data, and an image or video embed. I’m focused on the image. Possibly I could generate an image from textual data too.

Page metadata: OpenGraph and friends

Facebook, Twitter, Slack, etc rely primarily on page metadata to get a text summary and example image. This only works for sites that publish the metadata, but it’s so commonly used now that a lot of sites have it. These formats all offer one image. By custom that image tends to be 400+ pixels wide and roughly 2:1 aspect ratio.

OpenGraph is the big standard, originally from Facebook; here’s a useful description of how it works in practice. It dates to 2010. For my purposes the key tag is og:image; og:title gives the text.

Twitter’s cards are a popular expansion of the OpenGraph idea, dating to 2012. More details on supported tags here. twitter:image is the most relevant, and possibly twitter:player. There are more tags for textual metadata than in OpenGraph.
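For reference, in a page’s <head> the two sets of tags look something like this (the values are placeholders):

    <head>
      <meta property="og:title" content="Example article title">
      <meta property="og:image" content="https://example.com/preview.jpg">
      <meta name="twitter:card" content="summary_large_image">
      <meta name="twitter:image" content="https://example.com/preview.jpg">
    </head>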

oEmbed is the oldest standard, from 2008. The metadata isn’t in the page itself; instead you have to make a query to a special JSON endpoint. Interesting image fields are thumbnail_url and maybe the photo and video types. Lots of textual metadata too. I don’t know how popular oEmbed is these days but Slack supports it.

Schema.org Articles are also relevant, particularly since Google supports them. That effort started in 2011 as a way to tell search engines how to summarize a page. The metadata is in the page in one of three encodings (sigh). There’s also an overwhelming number of tags; maybe thumbnailUrl is what I want? Or image? Honestly this is all so complicated I’m not in a hurry to learn more.

Without metadata

Metadata is great if it’s present; what do you do if it’s not? Maybe pull an image from the page? This approach doesn’t seem so popular but Feedly does it. There’s a description in item 4 of what they do. It boils down to images tagged webfeedsFeaturedVisual, or else the first big image in the story, or else the biggest image on the page.
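A rough sketch of that fallback in Python (this assumes BeautifulSoup, treats webfeedsFeaturedVisual as a class, skips the “first big image in the story” step, and trusts declared width/height attributes rather than downloading every image the way a real implementation probably would):

    from bs4 import BeautifulSoup

    def guess_article_image(html):
        """Feedly-ish fallback: the featured visual if marked, else the biggest img by declared size."""
        soup = BeautifulSoup(html, "html.parser")
        featured = soup.find("img", class_="webfeedsFeaturedVisual")
        if featured and featured.get("src"):
            return featured["src"]

        def declared_area(img):
            try:
                return int(img.get("width", 0)) * int(img.get("height", 0))
            except ValueError:   # e.g. width="100%"
                return 0

        candidates = [img for img in soup.find_all("img") if img.get("src")]
        return max(candidates, key=declared_area)["src"] if candidates else None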

Another option would be to take a screenshot of the page itself. I have some notes on that but it’s not very simple to get a good result.

A third option would be to use the favicon. Not awesome, but it is there.

Heuristics for combining preview data

So now we have myriad ways to find an image for the site; which one do we use? Slack has a great post (from my friend Matt!) about how they generate unfurls. They give oEmbed priority, then Twitter+OpenGraph, then HTML meta tags as a fallback. They also seem to combine data from multiple sources; their cards have room for a lot of text.

Tyler Young from Felt recently tweeted about their solution. Light on details though, mostly it’s just “it’s complicated”. That echoes what I’ve heard informally from folks who’ve worked on this problem at various companies. Lots of one-off hacks in the end.

Code Tools

All this stuff is complicated and of general interest; is there a reusable library for generating previews I can just use? There’s a big list of them out there; here are some highlights from it and some others.

metascraper (GitHub) looks to be the most active of the npm packages. MIT license, active development. It has a lot of configurable options and also custom code for popular sites. This looks like a strong contender I should evaluate further.

iFramely (GitHub) is mostly a hosted service but their parser is on GitHub with an MIT License. Javascript, looks like fairly active development. Also promising.

unfurl (GitHub). TypeScript, active development, MIT license.

Link Preview (GitHub) generates OpenGraph, TwitterCard, and oEmbed previews of pages. The source is Javascript and MIT licensed. Last updated about 2 years ago. Looks like a small project.

pyUnfurl (GitHub) does the various metadata things and falls back to favicon. Python, MIT License, small project with last main development 3 years ago.

extruct is in Python (BSD license) and recently updated. OpenGraph but not Twitter (yet, there’s a pull request).

webpreview is in Python (MIT license) and was last updated two years ago. OpenGraph, Twitter, Schema, or else “from the webpage’s content”.

Metaphor is the Python code with the most Google juice but hasn’t been updated in 5 years.

Buying a service

Another option for unfurls is just buying a blackbox service for this. The one I know about touts “700+ Official Content Providers” they’ve worked to interoperate with as part of the value add. Their $9/mo product wants to embed “cards” though, which I take to mean Javascript running on your site. Not for me. The $99/mo product is a lot more flexible but is more than I’d want to spend. iFramely also looks promising, although it has a similar problem with pricing / embeds.

Update: Ryan B mentioned another service he uses, and it looks promising. It has a JSON API that returns data including an image. The free plan might work for me, or paid plans start at $8/mo.

It’d also be possible for me to do a bit of guerilla scraping for my own “service”. Post the link to Feedly or Twitter, then capture what preview they come up with. That might actually work pretty well, at least right until it doesn’t.

My Plan

I should start with Metascraper. It looks good; my only complaint is dealing with Node. The alternative is writing my own thing: a lightweight OpenGraph / Twitter card scraper plus a reimplementation of some image-choosing heuristics. That’s a lot of work I’d rather not do.

Once I have an unfurl tool I can use I should test it on my linkblog. I’m really curious how many of the links I’ve posted have useful metadata; I’d guess about half of those from the last 3 years. Seems worth doing a survey.

Screenshots may still prove to be a useful fallback.

Generating screenshots from web pages

I’m working on a way to get one image for each link in my linkblog. There are two basic approaches: screenshots and unfurls. This post has my notes on screenshots, particularly Simon Willison’s shot-scraper tool based on Microsoft Playwright.

Screenshots boil down to running something like a normal browser and capturing the page after it renders. Sounds simple but the modern web with ads, cookie popups, etc makes this hard. The problem is getting a screenshot that shows the intended content. I tested about 50 links of mine with shot-scraper and maybe 40% didn’t work at all, or had a paywall notice or a cookie consent popup covering everything or otherwise didn’t show anything useful.

One straightforward solution would be just to take screenshots manually when adding a new link. I already like what I’m seeing when I choose to blog a link, why not take a picture? I don’t want to create new work in my flow of posting to a linkblog but honestly this wouldn’t be too much, particularly with a browser tool automating the screenshot process. A related solution would be to pay someone else to do the screenshot for you via Mechanical Turk or the like. But at my small scale I don’t think that makes sense.

shot-scraper is a promising new tool for automated screenshots. It’s a wrapper around Microsoft Playwright, an automation framework that wraps up browsers like Chrome or Firefox and makes it easily programmable. Simon’s shot-scraper puts a nice command line interface around it and while it’s written in Python, it’s more of a command line tool than a Python API. It’s fairly subtle and allows for extracting specific CSS pieces, running some Javascript in the page context, etc. I’ve had promising results with it so far but it needs some tweaking to get the more recalcitrant pages to display.
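Basic usage is a one-liner. Something like this (the URL and sizes are just examples):

    # take a 1280x720 screenshot, waiting 2 seconds for the page to settle
    shot-scraper https://example.com/ -o example.png \
      --width 1280 --height 720 --wait 2000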

Ben Welsh’s News Homepages project is open source and also uses Playwright to take screenshots of newspaper pages. The clever thing there is a bunch of extra rules to make the 200 sites they care about look better. There are generic rules like “don’t show anything named popup_wrapper”. Also site-specific rules that boil down to hiding specific CSS classes or running a bit of extra code to adjust the page layout. The bespoke approach won’t work for me since I don’t have a short target list but some of the generic rules might work.

I wonder if it’s possible to install an ad blocker or other addons inside Playwright’s browser? (Simon thought maybe in the Javascript version, but not Python). That might be a good general purpose way to improve the screenshot quality.

Another tweak is to the user agent you send when taking the screenshot. Looking like a bot often gets you blocked so emulating a browser may be necessary. I have a suspicion that the screenshots from mobile versions of sites are likely to be better than desktop ones. Although Tyler warns that different sites work better with different user agents.

There are some web services that do screenshots that might work well: Site-Shot ($15/mo for 5000 screenshots), url2png ($29/mo for 5000), urlbox ($99/mo for 20,000), or the old Thumbshots (shut down). Update: another one is free for 7000 and $25/mo for 500k. A bunch more are listed among urlbox alternatives. (Pikwy looked promising but didn’t work for me.) One thing to check with these is how easy it is to store the screenshot yourself; some seem oriented toward hosting the screenshot themselves and charging per-view.

Overall I’m not very excited about screenshots and will probably not pursue this further for my linkblog. Unfurls / site previews seem a better path.

Linkblog design

I’m working on my linkblog and went looking at other linkblogs for inspiration. Some notes on their data design and presentation. Note that the heyday of the linkblog was around 2003, so a lot of these designs are quite dated. See Cameron Marlowe’s roundup for a bunch of the old ones.

Common data elements (“authored” means the person linkblogging created the data themselves; otherwise it’s from the page being linked or automated):

  • URL of page (maybe more than one in a post)
  • Authored short description of page (sometimes title of page; not authored)
  • Authored extended description of page
  • Authored tags
  • Date link was created

Uncommon elements

  • Permalink to linkblog post itself
  • Image chosen from page (not screenshot)
  • Screenshot (only on Hot Links)
  • Social media engagement: Tweet this, Tumblr reactions
  • Full text comments (on LinkMachineGo)
  • “Via” metadata, the source for the link (on Waxy links)
  • Morale, positive/negative sentiment (Erik Benson) (I’m doing this now myself and like it a lot.)

To me the defining characteristic of a linkblog is that each post is a single URL. And that the linkblog post itself has no permalink, page view, etc. The point of the linkblog is to point to other stuff, not to be a thing itself. (A key corollary is the RSS feed for a linkblog should have links directly to the destinations, not links to the linkblog posts.) But not all linkblogs are like this; Cal’s is more of a freeform blog, just with very short posts, as is Waxy’s. And there’s a constellation of link-heavy blogs like Today in Tabs or Webcurios.

I’m not thinking of a linkblog as a social conversation. To me it’s a one-way publication.

I think it’s interesting how text-heavy all these linkblogs are, even the modern ones. Hot Links is the only one that had page screenshots / thumbnails. A couple of the modern ones seem to be doing unfurl-like images too, like a youtube embed or a single picture from the page. Mostly text though. My guess is it’s because making images is a hassle.

Here are some screenshots from modern linkblogs.

And here are some from O.G. linkblogs (2003ish).

Thoughts for my design

I’m going to stick with the Pinboard data schema. The important elements there are URL, short description, extended description, and tags. I also have a special tag now which indicates sentiment. I’m ambivalent about whether the other tags are useful; the old tagging metadata ethos seems like a failure to me. But I dutifully type six or so random words per link, so I might as well display them.

What I’m most interested in is what else can I add to a linkblog entry to make it more interesting / convey more about the target. My big wish is for an image: either a screenshot or a Slack/Twitter/Facebook/OpenGraph like unfurl. I’ve been researching how to do that. I also am curious if there’s an AI driven web page summarizer that would be useful. Probably not, and limited to text-heavy links anyway. But worth a look.

As for visual design I’m most inspired by that Hot Links screenshot above. I want to avoid the “list of text links” look although with a fancy layout like Balaji’s (4th modern screenshot) it looks better.


Here’s a list of all the linkblog URLs I looked at for this little survey.

Modern linkblogs

O.G. linkblogs