Discussion
Jared Jennings
@jaredj@mastodon.bsd.cafe  ·  2 weeks ago

I'm feeding bots with #FreeBSD. https://j.agrue.info/bot-feeding-on-freebsd.html. Many thanks to https://maurycyz.com/projects/trap_bots/ ! I (think I) improved the software slightly.

Do you know of AI bots using #GeminiProtocol? Do you know what AI crawlers do and don't do, and which defenses work this week and which don't? What are the next moves in what feels like a long game of cat-and-mouse?

algernon the beaming
@algernon@come-from.mad-scientist.club replied  ·  2 weeks ago

@jaredj ...now that I've been tagged into this thread, I'll link to this lobste.rs comment of mine, wherein I mention a couple of trivial defenses that have been working quite well for me for many months now.

TL;DR: use ai.robots.txt, and if you see Firefox/ or Chrome/ in the user agent, check whether there's a Sec-Fetch-Mode header too. If not, 99% it's a bot. And if you can afford it, block Alibaba's & Huawei's ASNs.
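A minimal sketch of that check in Python (the header name is the real one; the function shape and dict-of-headers form are assumptions, not algernon's actual setup):

    def looks_like_fake_browser(user_agent: str, headers: dict[str, str]) -> bool:
        # Modern Firefox and Chrome send Sec-Fetch-Mode on their requests;
        # most crawlers that spoof a browser User-Agent do not.
        claims_browser = "Firefox/" in user_agent or "Chrome/" in user_agent
        missing_sec_fetch = not any(k.lower() == "sec-fetch-mode" for k in headers)
        return claims_browser and missing_sec_fetch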

These three are responsible for catching ~95% of the bots that crawl my sites (about 20-22 million requests caught each day, sometimes less, sometimes a whole lot more).

The reason for blocking the Alibaba & Huawei ASNs is that, as far as I can see, they're the ones whose crawlers piggyback on real browsers the most, and real browsers are hard to catch when you don't keep state. It was easier to catch them by their ASNs.
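In practice, an ASN block means matching client addresses against the prefixes that ASN announces. A sketch with Python's ipaddress module, using a documentation prefix as a placeholder (the real list would be exported from BGP/routing-registry data for the ASNs in question):

    import ipaddress

    # Placeholder prefix (TEST-NET-3); substitute the prefixes actually
    # announced by the ASNs you want to block.
    BLOCKED_PREFIXES = [ipaddress.ip_network("203.0.113.0/24")]

    def from_blocked_asn(ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in BLOCKED_PREFIXES)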

EK
@rqm@exquisite.social replied  ·  2 weeks ago

@jaredj On the BBS, someone wanted to use generative AI to populate Gemini with more content. It was a pointless and stupid idea, and to my knowledge it never went anywhere. Not quite crawling, but it shows the AI boosters are (or at least were) on Gemini too.

For now I think Gemini is protected only by obscurity, and that implementing a Gemini crawler would not be profitable for the crawling companies.

I haven't yet seen AI crawlers in my server logs that I could identify, but the protocol itself offers even less protection than HTTP. You cannot require cookies, and JavaScript-based defenses (CAPTCHAs, Anubis) won't work either. And because gemtext is plain text, it would be all the easier for crawlers to ingest.

I have been thinking about this, and I see three possible lines of defense for #GeminiProtocol:

1) Firewall-level blocking of crawler IPs
2) Requiring client certificates simply to view content (sketched below)
3) For a harsher lockdown, requiring server certificates
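For defense 2), the Gemini protocol reserves status code 60 ("client certificate required") for exactly this. A sketch of the application-level gate, assuming a handler that receives the raw client certificate (the status codes are from the spec; everything else here is an assumption):

    def handle(request_url: str, client_cert: bytes | None) -> bytes:
        # 60 is Gemini's "client certificate required" status;
        # 20 is success, followed by the MIME type of the body.
        if client_cert is None:
            return b"60 Client certificate required to view this capsule\r\n"
        return b"20 text/gemini\r\n# Welcome\r\n"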

Something like Iocaine by @algernon with an SNI-based reverse proxy in front of capsules would also be great fun.

Lasse Beyer
@lasse@social.tchncs.de replied  ·  2 weeks ago

@rqm @jaredj @algernon I think you could rate-limit your server to one request per second or so. As there are no inline images etc. in gemtext, that wouldn't bother any real user. And if an IP constantly hits the rate limit, you could still block it then.
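That could be as simple as a per-IP timestamp table with an escalation counter; a sketch in Python (all names assumed, and a real deployment would want to evict stale entries):

    import time
    from collections import defaultdict

    last_request: dict[str, float] = defaultdict(float)
    strikes: dict[str, int] = defaultdict(int)

    def allow(ip: str) -> bool:
        # Enforce the one-request-per-second limit described above.
        now = time.monotonic()
        if now - last_request[ip] < 1.0:
            strikes[ip] += 1  # repeated hits count toward a block
            return False
        last_request[ip] = now
        return True

    def should_block(ip: str, threshold: int = 100) -> bool:
        # Escalate persistent offenders to a firewall-level block.
        return strikes[ip] >= threshold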
