Facebook's Fascination with My Robots.txt
https://blog.nytsoi.net/2026/02/23/facebook-robots-txt
#HackerNews #Facebook #RobotsTxt #SocialMedia #TechNews #WebCrawlers
How I protect my Forgejo instance from AI web crawlers
https://her.esy.fun/posts/0031-how-i-protect-my-forgejo-instance-from-ai-web-crawlers/index.html
#HackerNews #AIProtection #Forgejo #WebCrawlers #Cybersecurity #TechTips
How a web crawler is supposed to work:
1. Reads /robots.txt
2. Parses robots.txt and honors its User-agent, Allow, and Disallow rules
3. Returns periodically to retrieve permitted content
How AI/LLM training crawlers work:
1. Crawls entire website
2. Reads /robots.txt
3. Returns 10 minutes later
4. GOTO 1.
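
For contrast, the "supposed to work" flow can be sketched in a few lines with Python's standard-library urllib.robotparser. This is only a minimal illustration, not any real crawler's code: the robots.txt content, the "ExampleBot" user agent, the example.org URLs, and the permitted_urls helper are all made up for the example, and the file is parsed from an inline string so nothing touches the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def permitted_urls(robots_txt, user_agent, candidate_urls):
    """Return only the URLs that robots.txt allows user_agent to fetch."""
    parser = RobotFileParser()
    # Steps 1 and 2: read and parse robots.txt before requesting anything else.
    parser.parse(robots_txt.splitlines())
    # Step 3: keep only the URLs the rules permit for this user agent.
    return [url for url in candidate_urls if parser.can_fetch(user_agent, url)]


# Hypothetical crawl frontier for an illustrative site.
candidates = [
    "https://example.org/index.html",
    "https://example.org/private/notes.html",
]
allowed = permitted_urls(ROBOTS_TXT, "ExampleBot", candidates)
print(allowed)
```

A real crawler would fetch the live file instead (RobotFileParser also offers set_url() and read() for that) and re-check it on each periodic return visit.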