Archive.org doesn’t honor robots.txt

As I’ve mentioned before, you can block the Internet Archive from archiving your website by adding the following to your robots.txt file, as they state:
User-agent: ia_archiver
Disallow: /
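To see how a crawler that does follow the rules would read this, here is a minimal sketch using Python's standard urllib.robotparser module. It only shows how the rules are meant to be interpreted. It says nothing about what the Internet Archive's crawler actually does. The helper name is_allowed, the example.com URL and the "Googlebot" comparison agent are illustrative assumptions, not anything from the Archive's instructions.

```python
from urllib.robotparser import RobotFileParser

# The rules from above, which target only the "ia_archiver" user agent.
IA_ONLY = """\
User-agent: ia_archiver
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also records the check time
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    url = "https://example.com/some/page.html"  # example URL, not a real site
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, is_allowed(IA_ONLY, agent, url))
```

Run as-is, a compliant parser reports False for ia_archiver and True for Googlebot, because the rule names only the one crawler.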

Or you can block all robots, including the Wayback Machine:
User-agent: *
Disallow: /
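As a rough check of the wildcard version, this sketch feeds those rules to Python's urllib.robotparser and asks whether a couple of example user agents may fetch the site's root. The agent string "SomeOtherBot/1.0" and the example.com URL are made up for illustration. Like any robots.txt parser, this only reflects what a crawler is asked to do, not what a given crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# The wildcard rules from above: every crawler is asked to stay out.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_ALL.splitlines())

# "SomeOtherBot/1.0" is a hypothetical user agent used only for comparison.
for agent in ("ia_archiver", "SomeOtherBot/1.0"):
    print(agent, parser.can_fetch(agent, "https://example.com/"))
```

Both agents should come back as not allowed, since "User-agent: *" applies to any crawler that isn't named in a more specific group.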

However, I’ve found some sites that don’t honor robots.txt.

I also found out that once your domain expires and you no longer own it, you may be surprised to find an archive of your website, even though you had it set to be excluded the whole time you owned the domain. So while they may not show any archived copies while your robots.txt file is in place, they still archive the site anyway, for the day you no longer control it.
