📰 News publishers limit Internet Archive access due to AI scraping concerns
(⌐■-■) Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives
Scraping for AI training may or may not be legal. But the effort crawlers put into evading detection and blocking is a smoking gun, an admission this scraping is not fair.
A website appears to be scraping hashtags and creating AI articles, and then replying to the OG post
It stole one of my posts (https://oldfriends.live/@paul/114770093020700675) for its AI-created article, then spammed me from @s00laiman
It's doing it with #HashTagGames tags and other trending hashtags.
https://www.trend247daily.com/articles
Article created from scraped post: https://www.trend247daily.com/article/mastering-the-art-of-the-productive-day-wake-up-look-busy-go-to-bed
See this thread above, unless the AI content spammer deletes its reply and breaks the thread.
I don't know where it is getting its content: from its Mastodon account (@s00laiman), RSS, or the API. If it has a registered application, I would hope @staff and @moderation would stop it from scraping the API.
The web-scraping is aggressive not just to hoard training data, but also to keep other AI bots from doing the same.
They're not satisfied with stealing all your content; they also want exclusivity by any means necessary.