Re. Meta scraping copyrighted content on millions of websites, including fedi instances, to train its AI:
https://www.dropsitenews.com/p/meta-facebook-tech-copyright-privacy-whistleblower
While mastodon.art blocks a bunch of crawlers and domains (including Meta) both at an IP level and in our robots.txt, and our instance domain isn't in the above list, unfortunately cdn.masto.host IS in the list. As we're hosted with masto.host, this is where all media uploaded to our instance is stored, so that media has been part of Meta's scraping :(