I just closed the loop on the first example of the distributed, bittorrent-backed, "ArchiveTeam Warrior except anyone can run it, and uploading to archive.org isn't the only means of sharing"... thing. And it's pretty good. In a couple months we've slowly spread our octopus tentacles over the whole bittorrent/scraping stack, and it's cool to start seeing the pieces connect.
We have a basic problem: someone recognizes a dataset will disappear, and then there's a whole convoluted forum process where we post what we're taking, blah blah heroism, volunteerism, solving hard problems on the fly, love this group. Except sometimes only half the data would show up, or one person would end up seeding 20TB from their home connection. Uneven labor distribution, in short.
Anyway, once we get the frontend and docs written, we'll have a sketch of an idea: what if, instead of just distributing the data, we also distributed the scraping? Deduplicated, distributed crawl tasks that feed automatically back into feeds of torrents. Once we convince arvindn to make webtorrent enabled by default, we've got some cool news to share with the replayweb.page folks. Being able to share the work and mutually validate snapshots of the web - that's your distributed wayback machine.
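The crawl-task side of that idea can be sketched roughly like this - a minimal, hypothetical Python model (all names here are mine, not the actual sciop code): tasks are deduplicated by hashing the URL, workers claim them so labor spreads out instead of piling on one seeder, and finished crawls land in a shared feed of torrent infohashes.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class CrawlCoordinator:
    """Toy coordinator: dedupe crawl tasks, hand them to workers,
    collect finished crawls into a torrent feed. Illustrative only."""
    pending: dict = field(default_factory=dict)   # task_id -> url, unclaimed
    claimed: dict = field(default_factory=dict)   # task_id -> worker name
    feed: list = field(default_factory=list)      # finished torrent infohashes

    def submit(self, url: str) -> str:
        # Deduplicate by URL hash: re-submitting the same URL is a no-op,
        # so many people can nominate the same dying dataset safely.
        task_id = hashlib.sha1(url.encode()).hexdigest()[:12]
        if task_id not in self.pending and task_id not in self.claimed:
            self.pending[task_id] = url
        return task_id

    def claim(self, worker: str):
        # Hand the next unclaimed task to a worker, so no one
        # ends up crawling (and seeding) everything alone.
        if not self.pending:
            return None
        task_id, url = next(iter(self.pending.items()))
        del self.pending[task_id]
        self.claimed[task_id] = worker
        return task_id, url

    def finish(self, task_id: str, infohash: str) -> None:
        # A worker reports a completed crawl as a torrent infohash,
        # which lands in the feed everyone else can seed from.
        del self.claimed[task_id]
        self.feed.append(infohash)
```

Real coordination would of course need trust/validation between peers (that's where the mutual snapshot-validation comes in), but the shape is: submit, dedupe, claim, crawl, publish to feed.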
Then it's time to start the federation part, where it gets really interesting - ActivityPub groups that can mutually coordinate archival work and publish it in private, overlapping torrent bubbles.
Edit: here is the thing in code form, docs are the cherry on top: https://codeberg.org/Safeguarding/-/projects/18027
#sciop #Bittorrent #BittorrentIsStillHappening #ProtocolEvolutionEvenIfItsALittleCrusty #SeriouslyTheresSoMuchOpenSpaceInBittorrent