I've started working on this. The web has gotten sufficiently annoying, even with ad-blocking, that it feels worth it.
@madewokherd I wonder if a better architecture would be to build this around headless Chrome
@tbodt That would be a very different project.
@madewokherd The train of thought is: at first it's just HTML, but someday you're probably going to find some website that needs JavaScript to fetch its content, so maybe it does make sense to reuse all the browser crap but wrap it in a highly controlled layer. But that does make it a very different kind of project.
@tbodt When that happens, I will write my own logic to fetch the content.
@madewokherd Obligatory embroidery troubleshooting page (a header tag is left open at each level): https://web.archive.org/web/20140310190221/http://www.sewingandembroiderywarehouse.com/embtrb.htm
@mavica_again I'll make sure I test with this, thanks.
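For anyone following along, here is a rough sketch of what testing against that kind of markup might look like. It assumes "header tag" means the `h1` to `h6` heading elements, and it uses Python's standard `html.parser` rather than whatever this project actually uses. `HeadingTracker` and `find_unclosed_headings` are made-up names for illustration, and the rule "a new heading implicitly closes the open one" is just one possible recovery policy.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingTracker(HTMLParser):
    """Hypothetical helper: records heading tags that are never closed."""

    def __init__(self):
        super().__init__()
        self.open_headings = []  # headings currently open
        self.unclosed = []       # headings that never got an end tag

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            # Assumed policy: a new heading implicitly closes any open one.
            while self.open_headings:
                self.unclosed.append(self.open_headings.pop())
            self.open_headings.append(tag)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self.open_headings:
            # Be lenient: any heading end tag closes the open heading.
            self.open_headings.pop()

    def close(self):
        super().close()
        # Anything still open at end of input was never closed.
        self.unclosed.extend(self.open_headings)
        self.open_headings.clear()


def find_unclosed_headings(html):
    """Return the heading tags in `html` that were left open, in order."""
    tracker = HeadingTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.unclosed
```

Feeding it a fragment such as `<h1>Top<h2>Sub<p>text</p><h3>Deep` reports all three headings as unclosed, while `<h1>Fine</h1>` reports none.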
I reached a point where I can parse most of an HTML header and now I'm exhausted.