Wanted: browser-based page scraping

I’m doing yet another HTML scraping project and contemplating my options: put up with the slowness and desolation that is BeautifulSoup, or spend hours learning Scrapy. Surely there’s something better by now?

There is: my web browser, with CSS queries against the DOM. Just load the page, call querySelector, and you’re done. Most modern HTML is quite nicely scrapable in browser JavaScript. The problem is that you can’t effectively script a browser to process thousands of pages. I’d hoped Node.js would offer a solution, but it doesn’t come with a battle-hardened HTML parser like a browser has. There are some options; I wonder if any are worth the time to learn about.
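
To make the querySelector idea concrete, here is a minimal sketch of the kind of extraction I mean. The helper name extractLinks, the record shape, and the small interfaces are all made up for illustration; the interfaces exist only so the code type-checks without browser typings, and a real browser document should satisfy them.

```typescript
// Minimal structural types (assumptions for this sketch) so the code
// compiles without DOM typings; a browser `document` satisfies them.
interface ElementLike {
  textContent: string | null;
  getAttribute(name: string): string | null;
}

interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<ElementLike>;
}

// Hypothetical record shape: the visible text and the href of each match.
interface LinkRecord {
  text: string;
  href: string | null;
}

// Hypothetical helper: run one CSS selector against a root and collect
// the trimmed text and href attribute of every matching element.
function extractLinks(root: QueryRoot, selector: string): LinkRecord[] {
  const matches = root.querySelectorAll(selector);
  const records: LinkRecord[] = [];
  for (let i = 0; i < matches.length; i++) {
    const el = matches[i];
    records.push({
      text: (el.textContent || "").trim(),
      href: el.getAttribute("href"),
    });
  }
  return records;
}
```

In a browser, once compiled to JavaScript, calling extractLinks(document, "a.title") would return one record per matching link; the selector "a.title" is just a placeholder for whatever the target page actually uses.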

 
