Hacker News
@h4ckernews@mastodon.social · 6 days ago

How I protect my Forgejo instance from AI web crawlers

https://her.esy.fun/posts/0031-how-i-protect-my-forgejo-instance-from-ai-web-crawlers/index.html

#HackerNews #AIProtection #Forgejo #WebCrawlers #Cybersecurity #TechTips

This article describes my nginx configuration and my strategy for preventing web crawlers from taking down my instance while still serving most people with minimal friction.
Nizar Kerkeni 🇹🇳 نزار القرقني boosted
t04d8b
@t04d8b@social.lol · 3 weeks ago

How a web crawler is supposed to work:

1. Reads /robots.txt
2. Parses robots.txt and honors User-Agent | Allow / Disallow designations
3. Returns periodically to retrieve permitted content

How AI/LLM training crawlers work:

1. Crawls entire website
2. Reads /robots.txt
3. Returns 10 minutes later
4. GOTO 1.

#AI #LLM #webCrawlers #robotsTxt 🔹
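
For reference, the "supposed to work" flow in the post above can be sketched with Python's standard `urllib.robotparser`. This is only an illustration, not code from the post or the linked article: the robots.txt content, the `ExampleBot` user agent, the `example.org` URLs, and the `allowed_urls` helper are all hypothetical. The rules are parsed from a string so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def allowed_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return only the URLs that robots.txt permits for this user agent."""
    parser = RobotFileParser()
    # Steps 1 and 2: read and parse robots.txt before touching any content.
    parser.parse(robots_txt.splitlines())
    # Step 3: only permitted URLs are kept for (periodic) retrieval.
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    candidates = [
        "https://example.org/index.html",
        "https://example.org/private/data",
    ]
    for url in allowed_urls(ROBOTS_TXT, "ExampleBot", candidates):
        print("would fetch:", url)
```

A real crawler would load the rules with `set_url()` and `read()` instead of a string, and would space out its return visits rather than coming back every 10 minutes.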
