#Tag · bonfire.cafe

Hacker News

Facebook's Fascination with My Robots.txt

https://blog.nytsoi.net/2026/02/23/facebook-robots-txt

#HackerNews #Facebook #RobotsTxt #SocialMedia #TechNews #WebCrawlers

Random Notes

Facebook is requesting my robots.txt thousands of times per hour.

Piccalilli

@piccalilli@front-end.social · 2 months ago

This is rather worrying!

https://www.alanwsmith.com/en/37/wa/jz/s1/

Fix Your robots.txt or Your Site Disappears from Google

a post from alan w. smith

Leonardo Di Ottio

@LeonardoDiOttio@mastodon.social · 2 months ago

@piccalilli My (admittedly cynical) assumption is that they will still hoover up anything they can find on your site, they’re just no longer showing it to anyone outside Google.

#Google #RobotsTxt #SEO

Nizar Kerkeni 🇹🇳 نزار القرقني boosted

t04d8b

@t04d8b@social.lol · 3 months ago

How a web crawler is supposed to work:

1. Reads /robots.txt
2. Parses robots.txt and honors its User-agent, Allow, and Disallow directives
3. Returns periodically to retrieve permitted content

How AI/LLM training crawlers work:

1. Crawls entire website
2. Reads /robots.txt
3. Returns 10 minutes later
4. GOTO 1.

#AI #LLM #webCrawlers #robotsTxt 🔹
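
The "supposed to work" steps in the post above can be sketched with Python's standard-library `urllib.robotparser`. This is only an illustration, not how any particular crawler is implemented: the robots.txt content, the `ExampleBot` user agent, the example.com URLs, and the helper names `build_parser` and `permitted` are all made up for the example, and the rules are parsed from an inline string so that it runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch).
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 60
Disallow: /private/
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Steps 1 and 2: read robots.txt and parse its directives."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def permitted(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs this user agent is allowed to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = build_parser(ROBOTS_TXT)
candidates = [
    "https://example.com/",
    "https://example.com/posts/1",
    "https://example.com/private/notes",
]

# Step 3: a polite crawler would fetch only these, waiting at least
# crawl_delay seconds between requests before coming back.
allowed = permitted(parser, "ExampleBot", candidates)
delay = parser.crawl_delay("ExampleBot")
```

A real crawler would load the live file with `set_url()` and `read()` instead of an inline string, and would cache the result rather than requesting robots.txt again on every visit.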