@jonny this entire thread is amazing, top-notch tool development for a noble cause.
@ #academia: if you feel desperate about the wholesale breakdown of science under the current US administration, consider helping out with #SciOp: decentralized backups of datasets under threat, in a torrent swarm.

Have a disused laptop or Raspi? Make it part of the swarm and take the data outside the US (or any) administration's grasp!

#scienceunderattack #bittorrent #decentralizedbackup #libraryofcongress

check this out if you want to help preserve the archive of "most local newspapers through most of US history" that had its funding pulled. even if you only have a couple dozen gigabytes to spare, you can
a) make an account on https://sciop.net/,
b) run a qbittorrent instance, go to preferences > web ui, and check enable,

and just do this

python -m pip install sciop-scraping
sciop-cli login
sciop-cli client add
sciop-scrape chronicling-america --next

and that's all.

if you have spare storage, you can sort by seeders, ascending, and start from there. or subscribe to the rss feed and auto-download it (a sketch of that just below).
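
for the automation route, here's a minimal sketch of wiring that up against qbittorrent's web api with the qbittorrent-api package. the feed url, credentials, and save path are placeholders - substitute the real feed from sciop.net and your own settings:

import qbittorrentapi

client = qbittorrentapi.Client(
    host="localhost", port=8080,
    username="admin", password="adminadmin",  # whatever you set in the web ui
)
client.auth_log_in()

FEED_URL = "https://sciop.net/example-feed.rss"  # placeholder, not a real feed url
client.rss_add_feed(url=FEED_URL, item_path="sciop")
client.rss_set_rule(
    rule_name="sciop-auto",
    rule_def={
        "enabled": True,
        "affectedFeeds": [FEED_URL],
        "savePath": "/data/sciop",  # wherever your spare storage lives
    },
)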

this is an archive funded by the library of congress (threatened) and the national endowment for the humanities (actively being eliminated). the alternative is that an enormous amount of US history that doesn't percolate into history books is owned and operated by lexisnexis and other for-profit data brokers.

this is the first run of some tooling to lower the bar for participatory scraping. at the moment the archive is still online, and the scraper will automatically embed a webseed URL in the created torrent - so even if you don't have space to seed, you can scrape the data, upload the torrent, and make it possible for waiting peers to become mirrors.
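
for the curious: a webseed is just an http url baked into the torrent (bep 19's url-list key), so clients can fall back to fetching pieces straight from the original server. roughly what the scraper does, sketched with the torf package - the paths and urls here are made up:

from torf import Torrent

t = Torrent(
    path="chronicling-america-batch-0001/",  # the freshly scraped files
    trackers=["https://example.net/announce"],  # placeholder tracker
    webseeds=["https://example.gov/data/batches/batch-0001/"],  # hypothetical source url
)
t.generate()  # hash the pieces
t.write("chronicling-america-batch-0001.torrent")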

#sciop

if anyone is bored or wants to contribute to gray archive tech, i've done all the hard parts around this, but here is a set of things you could do to make "practical repack mutability for torrents" happen: https://codeberg.org/Safeguarding/-/projects/19508

so we have an indexer and a cli tool that can interact with clients. if we added one link table that let people declare relationships between torrents - e.g. one replaces another, is an updated version of it, a successor to it, and so on - then one could plug in the pieces so the cli periodically checks for updated versions of torrents and swaps them out in the local client.
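
to make that concrete, the schema is more or less one table. a sketch in sqlite flavor, all names made up:

import sqlite3

con = sqlite3.connect("indexer.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS torrent_link (
        from_infohash TEXT NOT NULL,  -- the newer/replacing torrent
        to_infohash   TEXT NOT NULL,  -- the older/replaced torrent
        relation      TEXT NOT NULL,  -- 'replaces', 'updates', 'succeeds', ...
        PRIMARY KEY (from_infohash, to_infohash, relation)
    )
""")
con.commit()

the cli loop then becomes roughly: for each seeded infohash, ask the indexer "does anything 'replace' this?", and if so add the new torrent and drop the old one from the local client.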

this could be your name in the credits: "what if bittorrent trackers weren't just static repositories of torrent files and generic peer connection machines but could facilitate socio-technological resolutions to basic problems in the protocol."

#bittorrent #sciop

I just closed da loop on the first example of the distributed, bittorrent-backed, "archiveteam-warrior except anyone can do it, and uploading to archive.org isn't the only means of sharing" thing. And it's pretty good. In a couple months we've slowly spread our octopus tentacles over the whole bittorrent/scraping stack, and it's cool to start seeing the pieces connect.

We have a basic problem: someone recognizes a dataset will disappear, and then we'd have a whole convoluted forum process where we post what we're taking - blah blah heroism, volunteerism, solving hard problems on the fly, love this group. Except that sometimes only half the data would show up, or it would end up with one person needing to seed 20TB from their home connection. So: uneven labor distribution.

Anyway, once we get the frontend and docs writ, we'll have a sketch of an idea: what if, instead of just distributing the data, we also distributed the scraping - deduplicated, distributed crawl tasks that feed automatically back into feeds of torrents. Once we convince arvindn to make webtorrent enabled by default, we've got some cool news to share with the replayweb.page folks. Along with being able to share the work and mutually validate snapshots of the web, that's your distributed wayback machine.
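
The coordination loop I'm imagining looks something like this - to be clear, every endpoint and helper name here is invented, it's the shape of the thing rather than an API that exists:

import requests

API = "https://sciop.example/api"  # hypothetical endpoint

def scrape(urls):  # stand-in for your actual scraper
    return [requests.get(u).content for u in urls]

def make_torrent(pages):  # stand-in: package + hash, e.g. with torf, webseed included
    raise NotImplementedError

while True:
    # claim the next unclaimed crawl chunk so no two warriors scrape the same pages
    resp = requests.post(f"{API}/crawl-tasks/claim")
    if resp.status_code == 404:  # nothing left to claim
        break
    task = resp.json()
    torrent = make_torrent(scrape(task["urls"]))
    # report back; the indexer folds the result into the public feed of torrents
    requests.post(f"{API}/crawl-tasks/{task['id']}/complete", files={"torrent": torrent})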

Then it's time to start the federation part, where it gets really interesting - making AP (ActivityPub) groups that can mutually coordinate archival work and publish it in private, overlapping torrent bubbles.

Edit: here is the thing in code form, docs are the cherry on top: https://codeberg.org/Safeguarding/-/projects/18027

#sciop #Bittorrent #BittorrentIsStillHappening #ProtocolEvolutionEvenIfItsALittleCrusty #SeriouslyTheresSoMuchOpenSpaceInBittorrent
