I started crawling 80,000,000 web pages for a personal project on August 1, after an initial trial run pinging 40,000,000 of them.
Cloudflare rate limiting is triggered at least twice a day, and almost immediately if I raise my own request rate.
I'm at 16% of the total as of September 17.
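For concreteness, here's a minimal sketch of the kind of throttled fetch loop I'm describing (the delay values, retry count, and URL are placeholders, not my actual settings):

```python
import random
import time
import urllib.request
from urllib.error import HTTPError

BASE_DELAY = 1.0   # seconds between requests at the "safe" rate (illustrative)
MAX_RETRIES = 5    # give up on a URL after this many rate-limit responses

def fetch(url):
    """Fetch one URL, backing off exponentially on 429/503 responses."""
    delay = BASE_DELAY
    for _ in range(MAX_RETRIES):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
        except HTTPError as e:
            if e.code in (429, 503):  # typical rate-limit / challenge responses
                # Sleep with jitter, then double the delay and retry
                time.sleep(delay + random.uniform(0, delay))
                delay *= 2
            else:
                raise
    return None  # repeatedly rate-limited; skip this URL for now

data = fetch("https://example.com/")  # placeholder URL
time.sleep(BASE_DELAY)  # pause before the next URL in the queue
```

Even with backoff like this, the limits kick in well below the throughput I'd need to finish in a reasonable time.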
Has rate limiting severely curtailed web archiving capability with the advent of AI?
Should I be crawling from multiple IP addresses?
What volume of pages is typically crawled in day-to-day archiving operations?