AI scrapers request commented scripts
https://cryptography.dog/blog/AI-scrapers-request-commented-scripts/
#HackerNews #AI #scrapers #commented #scripts #technology #automation
I just published a blog post summing up my most pertinent thoughts about dealing with badly-behaved web-scraping bots:
https://cryptography.dog/blog/AI-scrapers-request-commented-scripts/
It isn't exactly a Hallowe'en-themed article, but today is the 31st and the topic is concerned with pranking people who come knocking on my website's ports, so it's somewhat appropriate.
#infosec #bots #halloween #scrapers #AI #someMoreHashtagsHere
Since so many people are boosting this thread I think I'll take the opportunity to mention that I'm available for hire on a part-time or contract basis.
Feel free to reach out if you like my ideas about computer-related topics and have both the budget and need of someone who has such ideas.
I can be reached by Fediverse DM or the contact form on my website:
The developer of Bear Blog on the latest wave of scrapers he faced.
They've depleted all human-created writing on the internet, and are becoming increasingly ravenous for new wells of content.
[...]
I'm still speculating here, but I think [mobile] app developers have found another way to monetise their apps by offering them for free, and selling tunnel access to scrapers.
sysadmins/webmasters of fedi:
I am looking for suggestions of which search engine crawlers I should consider permitting in my robots.txt file.
There can definitely be value in having a site indexed by a search engine, but I would like to deliberately exclude all of those which are using the same data to train LLMs and other genAI. More specifically, I would only like to allow those which have an explicit stance against training on others' data in this fashion.
Currently I reject everything other than Marginalia (https://marginalia-search.com/). Are there any others I should consider?
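For illustration, an allow-one, deny-everyone-else robots.txt of the kind described above could look roughly like the string below, checked here with Python's standard urllib.robotparser. This is only a sketch: the user-agent token search.marginalia.nu is an assumption about how Marginalia's crawler identifies itself and should be verified against its documentation, and ExampleScraperBot and example.org are made-up placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow one named crawler, disallow all others.
# The "search.marginalia.nu" token is an assumption; check the crawler's docs.
ROBOTS_TXT = """\
User-agent: search.marginalia.nu
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str = "https://example.org/blog/") -> bool:
    """Return True if ROBOTS_TXT permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleScraperBot" is a made-up name standing in for any other crawler.
    for agent in ("search.marginalia.nu", "ExampleScraperBot"):
        print(agent, "allowed" if is_allowed(agent) else "blocked")
```

Note that robots.txt is purely advisory: it only affects crawlers that choose to honour it, so it does nothing against bots that ignore it or disguise their user agent.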
Scraping for AI training may or may not be legal. But the effort crawlers put into evading detection and blocking is a smoking gun: an admission that this scraping is not fair.
Website owner? Not keen on the Mellowtel browser library building a botnet of untraceable scrapers from unwitting users who are using a browser plugin that contains Mellowtel? I've raised a GitHub issue for them to explain how much contempt they have for our consent. Join in, politely, make them look like the jerks they are. https://github.com/mellowtel-inc/mellowtel-js/issues/41