Discussion
#Digital ⚓️ #Vagabond
@beet_keeper@digipres.club · last month

I started crawling 80,000,000 web pages for a personal project on August 1, after an initial trial "PING-ing" 40,000,000 of them.

Cloudflare rate limiting is triggered at least twice a day, and almost immediately if I increase my own request rate.
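
For anyone hitting the same wall, here is a minimal sketch of how a crawler might decide how long to pause after being rate limited. It assumes the limit surfaces as an HTTP 429 (or 503) response that may carry a `Retry-After` header; the function name `backoff_delay` and the `base` and `cap` defaults are hypothetical, not taken from my crawler or from any Cloudflare documentation.

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=300.0):
    """Return the number of seconds to wait before retry number `attempt` (0-based).

    If the server sent a Retry-After value in seconds, honour it (clamped to
    the range 0..cap). Otherwise fall back to exponential backoff with full
    jitter: a random wait between 0 and min(cap, base * 2**attempt).
    """
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            # Retry-After may also be an HTTP date; this sketch does not
            # parse that form and falls through to the jittered backoff.
            pass
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


if __name__ == "__main__":
    print(backoff_delay(0, "120"))  # server asked for a 120 second pause
    print(backoff_delay(4))         # no header: random wait of up to 16 seconds
```

In a real fetch loop the returned value would be passed to `time.sleep()` before retrying the same URL, with the attempt counter reset after a successful response.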

Currently at 16% as of September 17.
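
To put those numbers in perspective, here is a rough back-of-the-envelope calculation of the average throughput they imply. It assumes the crawl ran continuously from August 1 to September 17 of the same year (the year in the code is a placeholder and does not affect the day count) and that the rate stays constant, which the rate limiting makes unlikely.

```python
from datetime import date

TOTAL_PAGES = 80_000_000
PERCENT_DONE = 16
START = date(2025, 8, 1)    # year is an assumed placeholder
AS_OF = date(2025, 9, 17)   # year is an assumed placeholder


def crawl_estimate(total, percent_done, start, as_of):
    """Average throughput so far and days left at that same average rate."""
    days_elapsed = (as_of - start).days
    pages_done = total * percent_done // 100
    pages_per_day = pages_done / days_elapsed
    return {
        "days_elapsed": days_elapsed,
        "pages_done": pages_done,
        "pages_per_day": pages_per_day,
        "pages_per_second": pages_per_day / 86_400,
        "days_remaining": (total - pages_done) / pages_per_day,
    }


estimate = crawl_estimate(TOTAL_PAGES, PERCENT_DONE, START, AS_OF)
for name, value in estimate.items():
    print(f"{name}: {value:,.2f}")
```

Under those assumptions, 47 days of crawling covered 12,800,000 pages, which works out to roughly 272,000 pages a day, or a little over 3 pages a second, with around 247 days still to go at that pace.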

Has rate limiting severely limited web archiving capability with the advent of AI?

Should I be crawling from multiple IP addresses?

What volume of pages is likely to be crawled in day-to-day archiving processes?

#WebArchiving #Digipres
