Had to spend another 30 minutes blocking whole subnets of scraper bots that claimed to be Googlebot or similar and were overloading my server. This is all such a pain in the ass.
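(A minimal sketch of what "blocking whole subnets" can look like in practice, not the actual setup described above; the CIDR ranges and function name are placeholders using documentation addresses.)

```python
# Check an incoming client IP against a denylist of whole subnets,
# using only Python's standard-library ipaddress module.
import ipaddress

# Hypothetical blocked ranges (RFC 5737 documentation networks, not real scrapers).
BLOCKED_SUBNETS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_blocked(client_ip: str) -> bool:
    """Return True if the client IP falls inside any blocked subnet."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_SUBNETS)

print(is_blocked("203.0.113.42"))  # True: inside a blocked /24
print(is_blocked("192.0.2.7"))     # False: not in any blocked range
```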
@tante So claiming to be Googlebot works as an excuse to get through?
@Ulan_KA Naa, they have "googlebot" in their user agent string, which I normally have on my allowlist so search engines still work with my page; otherwise people just see iocaine chaos.
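(For context, a rough sketch of how a "claims to be Googlebot" check can be tightened beyond the user agent string: Google documents that genuine Googlebot IPs reverse-resolve to *.googlebot.com or *.google.com and forward-resolve back to the same IP. The function below is illustrative, not the filter actually used above.)

```python
# Forward-confirmed reverse DNS check for a client claiming to be Googlebot.
import socket

def is_real_googlebot(client_ip: str) -> bool:
    try:
        # Reverse lookup: real Googlebot hosts end in googlebot.com or google.com.
        hostname, _, _ = socket.gethostbyaddr(client_ip)
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward-confirm: the hostname must resolve back to the original IP.
        return client_ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:
        # Lookup failed, so the claim cannot be confirmed.
        return False

# Example: is_real_googlebot("66.249.66.1") should be True for a genuine crawler IP.
```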
@tante So being Googlebot (or another search engine) grants access, I understand. But that should still be limited, at least in frequency – would it be possible to let each search engine in once and then block it until the page's content has changed or some time has passed?
@Ulan_KA There are methods in place for that as well, but when you have a whole bunch of IPs trying to scrape your server it gets out of hand even with the systems that try to limit them.
@tante Yeah, same here, because rate limiting them wasn't enough anymore.
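(A minimal sketch of the kind of per-IP rate limiting being discussed, a token bucket keyed by client IP; the rate numbers are made up. It also shows why this stops helping: a limit applied per IP does little once the scraping is spread across a whole subnet, since each address individually stays under the limit.)

```python
# Simple per-IP token-bucket rate limiter.
import time
from collections import defaultdict

RATE = 1.0    # tokens refilled per second, per IP (illustrative value)
BURST = 10.0  # maximum bucket size (illustrative value)

_buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic()})

def allow_request(client_ip: str) -> bool:
    """Return True if this IP still has budget, False if it should be throttled."""
    bucket = _buckets[client_ip]
    now = time.monotonic()
    # Refill tokens based on elapsed time, capped at the burst size.
    bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["last"]) * RATE)
    bucket["last"] = now
    if bucket["tokens"] >= 1.0:
        bucket["tokens"] -= 1.0
        return True
    return False
```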