LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI (Including Many Fediverse Instances!!!)

"The tech giant is sidestepping guardrails that websites use to prevent being scraped, data show, in a move whistleblowers say is unethical and potentially illegal."

ARTICLE: https://www.dropsitenews.com/p/meta-facebook-tech-copyright-privacy-whistleblower

FULL PDF: https://www.dropsitenews.com/api/v1/file/b3555944-e204-4f5e-9a64-e44281b19a82.pdf

#FediPact #meta #threads #AI

@FediPact
> Rather than scraping from sites directly, many of the addresses on Meta’s leaked list belong to Content Delivery Networks (CDNs) that are used by websites to cache and store information to improve site performance.

This is a critical point. An instance or website can defend itself in numerous ways, including actively adversarial strategies, and still succumb to extraction if it's using Cloudflare.

cc: @subMedia