That would be great to know.
To be sure, this probably requires blocking at every possible level (e.g. robots.txt, user agent, IP ranges...). And if some bots are using ActivityPub for scraping, we could also block their HTTP signature public keys?
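To make the layering idea concrete, here's a rough sketch in Python of what the request-time checks could look like (robots.txt is only advisory, so it's left out). Everything in it is an assumption for illustration, not how Bonfire or any existing server does it: the `BlockList` class, the `block_reason` method, the bot name, the IP range (a reserved documentation range), and the key ID are all made up.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class BlockList:
    """Hypothetical container for the three request-level block layers."""
    user_agent_substrings: List[str] = field(default_factory=list)
    ip_networks: List[str] = field(default_factory=list)   # CIDR strings
    signature_key_ids: Set[str] = field(default_factory=set)

    def block_reason(self, user_agent: str, remote_ip: str,
                     key_id: Optional[str] = None) -> Optional[str]:
        """Return the name of the first layer that matches, or None."""
        # Layer 1: user agent (trivial to spoof, so only a first filter).
        ua = user_agent.lower()
        for needle in self.user_agent_substrings:
            if needle.lower() in ua:
                return "user_agent"

        # Layer 2: source IP against blocked CIDR ranges.
        addr = ipaddress.ip_address(remote_ip)
        for cidr in self.ip_networks:
            net = ipaddress.ip_network(cidr)
            if addr.version == net.version and addr in net:
                return "ip_range"

        # Layer 3: keyId taken from the HTTP Signature header of a
        # signed ActivityPub request (parsing the header is not shown).
        if key_id is not None and key_id in self.signature_key_ids:
            return "http_signature_key"

        return None


# All values below are placeholders.
blocks = BlockList(
    user_agent_substrings=["ExampleScraperBot"],
    ip_networks=["203.0.113.0/24"],
    signature_key_ids={"https://scraper.example/actor#main-key"},
)
```

A server would call `blocks.block_reason(...)` early in request handling and refuse the request when it returns anything other than `None`. Checking the key ID only helps against bots that actually sign their requests, so it complements the other layers rather than replacing them.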
We've prototyped a system that builds on Bonfire's circles/boundaries to define and enforce blocks at the instance, user, and post levels. Would love feedback and suggestions to make it stronger!