Last week Trump announced plans to "review" 8 Smithsonian museums. Today he doubled down, making his intent explicit: rewrite US history to match an ethno-nationalist fantasy.

You can do something about that! We are backing up the digital archives of those museums on sciop: https://sciop.net/tags/smithsonian

You can take direct action to preserve the historical artifacts the right wants to destroy:

1) You can download a copy and seed it; every seeder counts. Subscribe to the Smithsonian RSS feed to auto-download torrents as they are scraped.
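Most BitTorrent clients can handle the auto-downloading themselves (qBittorrent, for instance, has a built-in RSS downloader). For the curious, here is a rough Python sketch of what that subscription boils down to: read the feed and collect each item's torrent link. It assumes the feed exposes torrents as RSS `<enclosure>` elements with the `application/x-bittorrent` type, or as `<link>` URLs ending in `.torrent`. The real sciop feed layout may differ, and `torrent_links` and the sample feed are made up for illustration.

```python
import xml.etree.ElementTree as ET

# MIME type conventionally used for .torrent files.
TORRENT_MIME = "application/x-bittorrent"


def torrent_links(rss_xml: str) -> list:
    """Return the torrent URLs found in an RSS 2.0 document.

    Assumption: each <item> carries its torrent either as an <enclosure>
    with the BitTorrent MIME type or as a <link> ending in ".torrent".
    """
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("type") == TORRENT_MIME:
            links.append(enclosure.get("url"))
            continue
        link = (item.findtext("link") or "").strip()
        if link.endswith(".torrent"):
            links.append(link)
    return links


# Hypothetical feed with placeholder URLs, for illustration only.
SAMPLE_FEED = """<rss version="2.0"><channel><title>example</title>
<item><title>part-0001-jpg</title>
<enclosure url="https://example.org/part-0001-jpg.torrent"
           type="application/x-bittorrent" length="1234"/></item>
<item><title>part-0002-tif</title>
<link>https://example.org/part-0002-tif.torrent</link></item>
</channel></rss>"""

for url in torrent_links(SAMPLE_FEED):
    print(url)
```

In practice you'd just point your client's RSS downloader at the feed and let it add new torrents automatically.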

2) We have also written a crawler, connected to sciop, that distributes the scraping work. It automatically creates and uploads a validated torrent that piggybacks off the S3 bucket as a webseed source for as long as the bucket stays up (instructions in reply).
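For anyone wondering what "webseed" means here: a torrent can list plain HTTP(S) URLs under a `url-list` key (BEP 19), and clients that support it will fetch pieces from those URLs alongside regular peers, so the data stays downloadable even before many people seed it. Below is a minimal, hypothetical Python sketch of building such a single-file torrent with a hand-rolled bencoder. The bucket URL and function names are placeholders; this is not the sciop crawler's code, and a real tool would typically also add trackers.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder for int, str/bytes, list and dict (BEP 3)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be sorted in raw byte order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_webseeded_torrent(name, data, webseed_urls, piece_length=256 * 1024):
    """Return (bencoded metainfo, hex infohash) for a single-file torrent."""
    # Each piece is identified by its SHA-1 digest, concatenated in order.
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": pieces}
    # "url-list" is the BEP 19 webseed key: HTTP sources clients may fetch from.
    metainfo = {"info": info, "url-list": list(webseed_urls)}
    infohash = hashlib.sha1(bencode(info)).hexdigest()
    return bencode(metainfo), infohash


# Placeholder bucket URL; a real webseed would point at the actual object.
torrent_bytes, infohash = make_webseeded_torrent(
    "example-part.tar",
    b"x" * 600_000,
    ["https://example-bucket.s3.amazonaws.com/example-part.tar"],
)
print(infohash)
```

Because the infohash covers only the `info` dict, the webseed URLs can point anywhere that serves identical bytes without changing the torrent's identity.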

The data from the 8 threatened museums is ~10 TB, and we have split it by file type (JPG/TIF) so people without much spare storage can at least join in on the JPGs. The full public Smithsonian bucket is ~700 TB, so if we want a full independent copy we'll need lots more seeders.
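A back-of-the-envelope way to see why more seeders matter: the number of volunteers needed is the total size times the number of copies you want, divided by how much each volunteer can store. The 4 TB per volunteer below is an assumed figure, not something we've measured, and the sketch ignores how the data is actually chunked into torrents.

```python
import math


def volunteers_needed(total_tb: float, per_volunteer_tb: float, copies: int = 1) -> int:
    """Volunteers needed to hold `copies` full copies of `total_tb` terabytes."""
    return math.ceil(total_tb * copies / per_volunteer_tb)


# Assumption: each volunteer can spare about 4 TB.
PER_VOLUNTEER_TB = 4

print(volunteers_needed(700, PER_VOLUNTEER_TB))            # 175
print(volunteers_needed(700, PER_VOLUNTEER_TB, copies=3))  # 525
print(volunteers_needed(10, PER_VOLUNTEER_TB, copies=3))   # 8
```

Under those assumptions, one copy of the whole ~700 TB bucket needs 175 volunteers, three copies need 525, while three copies of the ~10 TB threatened-museum set need only 8.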

All this code is being written flat out, on the run, as volunteers with exactly zero resources need it, so it's not polished or well documented. If you're interested in helping damp the flames of the book burning by contributing to any of the code or docs, we'd love to have you.

#Smithsonian #Sciop

The first round of scraping is almost done, but in case any parts are still left to download by the time you see this, here is how the scraper works:

  • Sciop has an API for claiming a dataset/part, so the CLI queries the API for the next unclaimed thing to download
  • We write scrapers for each set of things that need to get scraped, hopefully refining the process so we need to write less each time
  • You download, validate, and pack the data, and a torrent is then created from it automatically
  • The sciop-cli can manage your login to sciop as well as a BitTorrent client running on the machine (currently qBittorrent only, but PRs are welcome for other client integrations), so once you have logged in, the torrent will automatically be uploaded to sciop and added to your client.
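In rough terms, the claim-then-validate steps above look something like the following Python sketch. Everything in it (`Part`, `ClaimQueue`, `claim_next`, `validate`) is hypothetical: the real claiming happens through sciop's API, and we're assuming "validate" means comparing a checksum against a known value, which may not be exactly what sciop-scraping checks.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Part:
    """Hypothetical unit of scraping work, e.g. one museum's JPGs."""
    name: str
    expected_sha256: str
    claimed_by: Optional[str] = None


class ClaimQueue:
    """In-memory stand-in for a server-side "claim the next part" endpoint."""

    def __init__(self, parts):
        self._parts = list(parts)

    def claim_next(self, worker: str) -> Optional[Part]:
        # Hand out the first part nobody has claimed yet, and mark it as taken.
        for part in self._parts:
            if part.claimed_by is None:
                part.claimed_by = worker
                return part
        return None  # everything has been claimed


def validate(data: bytes, part: Part) -> bool:
    """Assumed check: the downloaded bytes match the expected SHA-256."""
    return hashlib.sha256(data).hexdigest() == part.expected_sha256


payload = b"example image bytes"
queue = ClaimQueue([
    Part("museum-a/jpg", hashlib.sha256(payload).hexdigest()),
    Part("museum-a/tif", "0" * 64),
])
first = queue.claim_next("volunteer-1")
second = queue.claim_next("volunteer-2")
third = queue.claim_next("volunteer-3")
print(first.name, validate(payload, first))    # museum-a/jpg True
print(second.name, validate(payload, second))  # museum-a/tif False
print(third)                                   # None
```

On a real server the claim has to be atomic (e.g. done inside a single database transaction), so that two volunteers asking at the same moment never get handed the same part.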

so...

python -m pip install sciop-scraping sciop-cli
sciop-cli login
sciop-cli client login
sciop-scrape smithsonian --next

As I said, all this is written on the fly with very little spare time for good structure and docs, so if you would like to pitch in by making things work better, documenting things, or adding a crawler of your own and calling a scrape quest, you are more than welcome. Sciop isn't just for what we target: it's a place for any threatened information, and anyone can participate.