Re: Meta scraping copyrighted content on millions of websites, including fedi instances, to train its AI:

https://www.dropsitenews.com/p/meta-facebook-tech-copyright-privacy-whistleblower

While mastodon.art blocks a bunch of crawlers and domains (including Meta) both at an IP level and in our robots.txt, and our instance domain isn't in the above list, unfortunately cdn.masto.host IS in the list. As we're hosted with masto.host, this is where all media uploaded to our instance is stored, and thus it has been part of Meta's scraping :(
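For anyone wondering what the robots.txt side of such a block looks like, here is a minimal sketch, not our actual config. The rules, the `example.social` URL, and the helper `is_allowed` are all made up for illustration, and the user-agent tokens are assumptions that should be checked against Meta's own crawler documentation. Keep in mind robots.txt is only a request that a crawler can ignore, and it does nothing for media served from a different host such as a CDN.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules. The user-agent tokens below are
# assumptions; verify them against the crawler operator's documentation.
ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /

User-agent: FacebookBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.social/media/a.png"  # placeholder URL
    for agent in ("meta-externalagent", "FacebookBot", "SomeOtherBot"):
        print(agent, is_allowed(agent, url))
```

A crawler that honours these rules would skip every path for the two listed agents, while agents not listed stay unaffected. An IP-level block is a separate layer, and it is the only one of the two that does not depend on the crawler's cooperation.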

@berrefjord So that its AI will produce 'better' results when used; as the linked article says, 'AI models require a tremendous amount of data for their training data to work effectively.' Meta's AI produces text and images; 'Use Meta AI assistant to get things done, create AI-generated images for free, and get answers to any of your questions.'
@Curator What do I expect, when they, what, pirated 8 TB of books without any of those authors' consent? They can, the laws are 20 years behind, so they will do it.

I hear their dataset is even crumbling, training on its own data slop, because they ALWAYS need NEW human data to train on, like a parasite.

All for the sake of pleasing shareholders, rich ppl, etc.

There's not really much we can do about this. Masto.host doesn't block anything (it leaves blocks to the discretion of customers), but even public content from our instance that federates to other instances would have been hit if those instances got crawled.

I think I've seen people mention building a lawsuit against them but can't find any info on this right now; I'll update if I do. Meta did just win a lawsuit about doing the same thing with training its AI on millions of books though :/
