riddle me this: what's stopping AI scrapers from changing their user agent if they already don't respect robots.txt

stupid-ass arms race

@mavica_again Nerd-sniped into searching GitHub for the answer

I think maybe it's a deliberate choice not to make it any more complicated than necessary for actual existing bots: github.com/TecharoHQ/anubis/pu

@madewokherd i get it but like. then what's the point

this was prompted by a forum post pointing out that you can get rid of anubis (which they argue is harmful because it gets in the way of javascript-free browsing) by just setting your user-agent to wget's
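(a minimal sketch of the claimed bypass, assuming the forum post is right that Anubis by default only challenges browser-like user agents; the URL and the exact default rules here are illustrative assumptions, not verified)

```python
import requests

# Hypothetical Anubis-protected page, for illustration only.
url = "https://example.org/some-anubis-protected-page"

# Send a wget-style User-Agent instead of a Mozilla-style one,
# which is what the forum post says skips the challenge.
headers = {"User-Agent": "Wget/1.21.4"}

resp = requests.get(url, headers=headers, timeout=30)
print(resp.status_code)
print(resp.text[:200])
```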

@mavica_again The point is it doesn't take 5 minutes to load Bugzilla anymore. If that changes, then they escalate, I guess.

@madewokherd like i said, arms race

because it's less than trivial for ai scrapers to change their user agent

@mavica_again It's an arms race, but one where no one is currently escalating. Probably because it buys them very little. Yes, an AI scraper could trivially detect Anubis and change the user agent to wget or whatever. And they could scrape the very small portion of the web behind Anubis for maybe a weekend before people start configuring it to block that.

@mavica_again I bet AI companies could also trivially configure their scrapers to not DDoS the Internet, but apparently they don't care enough to do that either. It'd probably be less work for more payoff than participating in this particular arms race.
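(a rough sketch of what "not DDoSing the Internet" could look like: check robots.txt and rate-limit per host; the bot name, delay value, and overall structure are assumptions for illustration, not anything a real scraper is known to use)

```python
import time
import urllib.robotparser
from urllib.parse import urlparse

import requests

USER_AGENT = "ExampleBot/0.1"   # hypothetical bot name
MIN_DELAY = 5.0                 # assumed seconds between hits to the same host

_last_hit: dict[str, float] = {}
_robots: dict[str, urllib.robotparser.RobotFileParser] = {}


def polite_get(url: str) -> requests.Response | None:
    host = urlparse(url).netloc

    # Fetch and cache robots.txt once per host.
    if host not in _robots:
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(f"https://{host}/robots.txt")
        rp.read()
        _robots[host] = rp

    # Skip disallowed URLs instead of scraping them anyway.
    if not _robots[host].can_fetch(USER_AGENT, url):
        return None

    # Wait out the per-host delay before requesting.
    wait = MIN_DELAY - (time.monotonic() - _last_hit.get(host, 0.0))
    if wait > 0:
        time.sleep(wait)
    _last_hit[host] = time.monotonic()

    return requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=30)
```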
