Discussion
Loading...

#Tag

Log in
  • About
  • Code of conduct
  • Privacy
  • Users
  • Instances
  • About Bonfire
Nizar Kerkeni 馃嚬馃嚦 賳夭丕乇 丕賱賯乇賯賳賷 boosted
t04d8b
t04d8b
@t04d8b@social.lol  路  activity timestamp 3 weeks ago

How a web crawler is supposed to work:

1. Reads /robots.txt
2. Parses robots.txt and honors User-Agent | Allow / Disallow designations
3. Returns periodically to retrieve permitted content

How AI/LLM training crawlers work:

1. Crawls entire website
2. Reads /robots.txt
3. Returns 10 minutes later
4. GOTO 1.

#AI #LLM #webCrawlers #robotsTxt 馃敼

  • Copy link
  • Flag this post
  • Block
t04d8b
t04d8b
@t04d8b@social.lol  路  activity timestamp 3 weeks ago

How a web crawler is supposed to work:

1. Reads /robots.txt
2. Parses robots.txt and honors User-Agent | Allow / Disallow designations
3. Returns periodically to retrieve permitted content

How AI/LLM training crawlers work:

1. Crawls entire website
2. Reads /robots.txt
3. Returns 10 minutes later
4. GOTO 1.

#AI #LLM #webCrawlers #robotsTxt 馃敼

  • Copy link
  • Flag this post
  • Block

bonfire.cafe

A space for Bonfire maintainers and contributors to communicate

bonfire.cafe: About 路 Code of conduct 路 Privacy 路 Users 路 Instances
Bonfire social 路 1.0.1-alpha.40 no JS en
Automatic federation enabled
Log in
  • Explore
  • About
  • Members
  • Code of Conduct