This is rather worrying!
@piccalilli My (admittedly cynical) assumption is that they will still hoover up anything they can find on your site; they’re just no longer showing it to anyone outside Google.
How a web crawler is supposed to work:
1. Reads /robots.txt
2. Parses robots.txt and honors User-Agent | Allow / Disallow designations
3. Returns periodically to retrieve permitted content
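For anyone curious what steps 1 and 2 look like in practice, here is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt body, the "ExampleBot" and "OtherBot" names, and the example.com URLs are all made up for illustration; a real crawler would fetch /robots.txt over the network first and re-check it on each return visit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body (an assumption for this sketch);
# a real crawler would download it from /robots.txt before anything else.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def allowed_urls(robots_txt, user_agent, candidate_urls):
    """Return only the URLs that robots.txt permits for this user agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [url for url in candidate_urls if parser.can_fetch(user_agent, url)]


urls = [
    "https://example.com/",
    "https://example.com/blog/post",
    "https://example.com/private/notes",
]

# ExampleBot has its own group: everything except /private/ is permitted.
permitted = allowed_urls(ROBOTS_TXT, "ExampleBot", urls)
# OtherBot falls back to the "*" group, which disallows the whole site.
others = allowed_urls(ROBOTS_TXT, "OtherBot", urls)

print(permitted)
print(others)
```

The point of the sketch is only that the check happens before any page is requested: a URL that fails can_fetch never gets fetched at all.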
How AI/LLM training crawlers work:
1. Crawls entire website
2. Reads /robots.txt
3. Returns 10 minutes later
4. GOTO 1.