Had to spend another 30 minutes blocking whole subnets of scraper bots that claimed to be Googlebot or similar and were overloading my server. This is all such a pain in the ass.
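(A minimal sketch of what "blocking whole subnets" can look like in practice, not the actual setup described above; the CIDR ranges and function name are placeholders using documentation addresses.)

```python
# Check an incoming client IP against a denylist of whole subnets,
# using only Python's standard-library ipaddress module.
import ipaddress

# Hypothetical blocked ranges (RFC 5737 documentation networks, not real scrapers).
BLOCKED_SUBNETS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_blocked(client_ip: str) -> bool:
    """Return True if the client IP falls inside any blocked subnet."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_SUBNETS)

print(is_blocked("203.0.113.42"))  # True: inside a blocked /24
print(is_blocked("192.0.2.7"))     # False: not in any blocked range
```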
@tante So claiming to be Googlebot works as an excuse to get through?
@Ulan_KA Naa, they have "googlebot" in their user agent string, which I normally have on my allowlist so search engines still work with my page; otherwise people just see iocaine chaos.
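(For context, a rough sketch of how a "claims to be Googlebot" check can be tightened beyond the user agent string: Google documents that genuine Googlebot IPs reverse-resolve to *.googlebot.com or *.google.com and forward-resolve back to the same IP. The function below is illustrative, not the filter actually used above.)

```python
# Forward-confirmed reverse DNS check for a client claiming to be Googlebot.
import socket

def is_real_googlebot(client_ip: str) -> bool:
    try:
        # Reverse lookup: real Googlebot hosts end in googlebot.com or google.com.
        hostname, _, _ = socket.gethostbyaddr(client_ip)
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward-confirm: the hostname must resolve back to the original IP.
        return client_ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:
        # Lookup failed, so the claim cannot be confirmed.
        return False

# Example: is_real_googlebot("66.249.66.1") should be True for a genuine crawler IP.
```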
@tante So being Googlebot (or another search engine) grants access, I understand. But that should still be limited, at least in frequency – would it be possible to let each search engine in once and then block it until the page's content has changed or some time has passed?
@Ulan_KA There are methods in place for that as well, but when you have a whole bunch of IPs trying to scrape your server it gets out of hand even with the systems that try to limit them.
@tante Yeah, same here, because rate limiting them wasn't enough anymore.
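(A minimal sketch of the kind of per-IP rate limiting being discussed, a token bucket keyed by client IP; the rate numbers are made up. It also shows why this stops helping: a limit applied per IP does little once the scraping is spread across a whole subnet, since each address individually stays under the limit.)

```python
# Simple per-IP token-bucket rate limiter.
import time
from collections import defaultdict

RATE = 1.0    # tokens refilled per second, per IP (illustrative value)
BURST = 10.0  # maximum bucket size (illustrative value)

_buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic()})

def allow_request(client_ip: str) -> bool:
    """Return True if this IP still has budget, False if it should be throttled."""
    bucket = _buckets[client_ip]
    now = time.monotonic()
    # Refill tokens based on elapsed time, capped at the burst size.
    bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["last"]) * RATE)
    bucket["last"] = now
    if bucket["tokens"] >= 1.0:
        bucket["tokens"] -= 1.0
        return True
    return False
```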