Discussion
Piccalilli
@piccalilli@front-end.social  ·  activity timestamp 5 days ago

This is rather worrying!

https://www.alanwsmith.com/en/37/wa/jz/s1/

Fix Your robots.txt or Your Site Disappears from Google

a post from alan w. smith
Leonardo Di Ottio
@LeonardoDiOttio@mastodon.social replied  ·  activity timestamp 5 days ago

@piccalilli My (admittedly cynical) assumption is that they will still hoover up anything they can find on your site, they’re just no longer showing it to anyone outside Google.

#Google #RobotsTxt #SEO

Nizar Kerkeni 🇹🇳 نزار القرقني boosted
t04d8b
@t04d8b@social.lol  ·  activity timestamp last month

How a web crawler is supposed to work:

1. Reads /robots.txt
2. Parses robots.txt and honors User-Agent | Allow / Disallow designations
3. Returns periodically to retrieve permitted content

How AI/LLM training crawlers work:

1. Crawls entire website
2. Reads /robots.txt
3. Returns 10 minutes later
4. GOTO 1.

#AI #LLM #webCrawlers #robotsTxt 🔹
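
For reference, the "supposed to work" steps above can be sketched with Python's standard-library `urllib.robotparser`. This is only a minimal illustration, not any real crawler's implementation: the `ExampleBot` user agent, the `example.com` URLs, the sample rules, and the helper names `build_parser` and `permitted` are all assumptions made up for the example. The rules are parsed from a string so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this sketch.
SAMPLE_ROBOTS = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Steps 1 and 2: read robots.txt and parse its rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def permitted(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Step 3: keep only the URLs this user agent is allowed to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS)
    candidates = [
        "https://example.com/",
        "https://example.com/private/notes.html",
    ]
    # ExampleBot may fetch everything except /private/.
    print(permitted(parser, "ExampleBot", candidates))
    # Any other agent falls under "User-agent: *" and is disallowed everywhere.
    print(permitted(parser, "OtherBot", candidates))
```

A well-behaved crawler would run the `permitted` check before every fetch and re-read robots.txt periodically, which is the opposite order from the second list in the post.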


bonfire.cafe

A space for Bonfire maintainers and contributors to communicate
