Facebook's Fascination with My Robots.txt
https://blog.nytsoi.net/2026/02/23/facebook-robots-txt
#HackerNews #Facebook #RobotsTxt #SocialMedia #TechNews #WebCrawlers
This is rather worrying!
@piccalilli My (admittedly cynical) assumption is that they will still hoover up anything they can find on your site, they’re just no longer showing it to anyone outside Google.
How a web crawler is supposed to work:
1. Reads /robots.txt
2. Parses robots.txt and honors the User-agent, Allow, and Disallow directives
3. Returns periodically to retrieve permitted content
How AI/LLM training crawlers work:
1. Crawls entire website
2. Reads /robots.txt
3. Returns 10 minutes later
4. GOTO 1.
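
For contrast, here is a minimal sketch of the well-behaved flow from the first list, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the `example.com` URLs, and the `allowed_urls` helper are all made up for illustration; a real crawler would fetch the file from the site and add rate limiting and revisit scheduling on top.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that robots.txt permits for this user agent."""
    parser = RobotFileParser()
    # Steps 1 and 2: read and parse robots.txt before touching any page.
    parser.parse(robots_txt.splitlines())
    # Step 3: keep only what the rules allow; everything else is skipped.
    return [url for url in urls if parser.can_fetch(user_agent, url)]


candidates = [
    "https://example.com/",
    "https://example.com/blog/post",
    "https://example.com/private/notes",
]
permitted = allowed_urls(ROBOTS_TXT, "ExampleBot", candidates)
print(permitted)
```

The point is the ordering: the rules are consulted first, and the `/private/` URL never makes it into the fetch list.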