The Internet Archive has decided to ignore robots.txt
blog.archive.org/2017/04/17/ro

This is amazing news for internet history.
a) All public stuff will be crawled. Don't want that? Don't make your shit public.
b) Lapsed domains replaced with parking pages that use a restrictive robots.txt won't prevent old, dead versions of sites from being visible.

Archive everything. #NewLibraryofAlexandria #OCD

@jbbdude point b seems especially good. not sure how i feel about point a, but i understand its reasoning.
