Discussion
Cory Doctorow
@pluralistic@mamot.fr  ·  activity timestamp last week

Hey #rsync experts! I have a dilemma. I do a daily backup to an external disk, but some of the files (VM containers) are HUGE (100GB), and because they are zero-padded, their size is always the same. By the time the rsync for these files finishes, they have wildly different mod dates from the originals on my HDD. I can set --modify-window to $BIGNUM, but then all the regular files *don't* back up.

Maybe the answer is to touch all the big files (originals and backup) at the end of the rsync?
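
The touch idea can be tried on scratch files first. A minimal sketch, assuming a POSIX shell and a `touch` that supports `-r` (GNU and BSD both do); the file names are hypothetical stand-ins, not the real backup paths. Rather than stamping both copies with the current time, it copies the original's modification time onto the backup:

```shell
#!/bin/sh
# Sketch: copy the modification time of an original onto its backup copy.
# All paths are hypothetical scratch files created under a temp directory.
set -eu

work=$(mktemp -d)
orig="$work/vmcontainer.img"
backup="$work/vmcontainer.img.bak"

# Stand-ins for the original and its backup, with deliberately different mtimes.
printf 'image data\n' > "$orig"
printf 'image data\n' > "$backup"
touch -t 202001010000 "$orig"
touch -t 202106151200 "$backup"

# -r takes the timestamps from the reference file instead of the current time.
touch -r "$orig" "$backup"

# If neither file is newer than the other, the mtimes now match.
if [ "$orig" -nt "$backup" ] || [ "$backup" -nt "$orig" ]; then
    result="mtimes differ"
else
    result="mtimes match"
fi
echo "$result"

rm -rf "$work"
```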

Chris Markiewicz
@effigies@mas.to replied  ·  activity timestamp last week

@pluralistic `rsync -az`? `-a` preserves most filesystem metadata, including modification times. If there are significant blocks of all zeros, `-z` could reduce the transfer time, but could also end up bottlenecking on the real data.

Cory Doctorow
@pluralistic@mamot.fr replied  ·  activity timestamp last week

@effigies Thanks - the zero-padding is pre-encryption, so the ciphertext at rest is a bunch of random numbers, not a string of zeroes.

Chris Markiewicz
@effigies@mas.to replied  ·  activity timestamp last week

@pluralistic Right. Makes sense.

HankB
@HankB@fosstodon.org replied  ·  activity timestamp last week

@pluralistic
I am not an rsync expert, though I used it for years for backups. I'm now using ZFS send/receive for backups on Debian hosts. rsync backs up files. ZFS send/receive backs up filesystems, sending only blocks that have changed. And as a result of the copy-on-write behavior, snapshots are nearly free (in terms of execution time), and so is identifying changed blocks. The cost in terms of disk space is the changed blocks.

1/

HankB
@HankB@fosstodon.org replied  ·  activity timestamp last week

@pluralistic

ZFS "knows" which blocks are changed and need not compare files on local and remote so it sidesteps that issue.

What I cannot assert is how effective this will be for your use case. It depends on the disk write patterns of your VMs. If you want to explore in further depth, I suggest contacting Klara Systems (with whom I have no relationship other than listening to their people on various podcasts).

Daniel Lakeland
@dlakelan@mastodon.sdf.org replied  ·  activity timestamp last week

@pluralistic
I think you have a bigger issue than the dates. If a VM is running and you don't have something like btrfs to snapshot it, and you just do a backup, you are not at all guaranteed to have a functional, restorable backup that makes sense by the end of it... The master filesystem is changing during the sync. I think rsync will work hard, but it doesn't know if a chunk of the filesystem changes after it thinks it's synced that chunk...

Matt Panaro
@eigen@mattstodon.panar.ooo replied  ·  activity timestamp last week

@pluralistic I feel like you would've already thought of/tried this: but if it didn't work, then that means I've been using rsync wrong and I'd like to know about it: if you pass the `-a` flag to rsync (or go dig out the specific flag from all the ones 'a' turns on), that should make rsync match the timestamps on the new file to the timestamps on the old file.

Matt Panaro
@eigen@mattstodon.panar.ooo replied  ·  activity timestamp last week

@pluralistic here's mod times and access times ('a' includes 't' but not 'U'):

--times, -t preserve modification times
--atimes, -U preserve access (use) times
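
A minimal sketch of what `-t` changes, assuming rsync is installed and using hypothetical scratch directories. It copies the same file once without `-t` and once with it, then compares each copy's modification time against the source:

```shell
#!/bin/sh
# Sketch: compare a plain rsync copy with one made using -t.
# Directory and file names are hypothetical scratch paths.
set -eu

plain_result="skipped"
timed_result="skipped"

if command -v rsync >/dev/null 2>&1; then
    work=$(mktemp -d)
    mkdir "$work/src" "$work/plain" "$work/timed"
    printf 'data\n' > "$work/src/file.txt"
    touch -t 202001010000 "$work/src/file.txt"   # give the source an old mtime

    rsync -r  "$work/src/" "$work/plain/"   # no -t: copy is stamped with the transfer time
    rsync -rt "$work/src/" "$work/timed/"   # -t: copy gets the source's mtime

    if [ "$work/plain/file.txt" -nt "$work/src/file.txt" ]; then
        plain_result="newer than source"
    else
        plain_result="not newer than source"
    fi

    if [ "$work/timed/file.txt" -nt "$work/src/file.txt" ] ||
       [ "$work/src/file.txt" -nt "$work/timed/file.txt" ]; then
        timed_result="differs from source"
    else
        timed_result="matches source"
    fi
    rm -rf "$work"
fi

echo "without -t: $plain_result"
echo "with -t:    $timed_result"
```

When the modification times match, rsync's default size-and-mtime comparison can skip an unchanged file on the next run.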

sash@noc.social
@sash@noc.social replied  ·  activity timestamp last week

@pluralistic Not sure if I understand the question correctly. If you want to preserve the timestamps and such, use -a. This will fix the copy-on-change issue (timestamp changes on source->copy). If you don't want to rely on timestamps, use -c (slower, reads source and copy).
My personal rsync command is:
>> rsync -abzO --partial-dir=.rsync-partial --force --ignore-errors --delete --backup-dir=/backup/archive/`date +%Y-%m-%d` source/dir /backup/snapshot/<<<

Cory Doctorow
@pluralistic@mamot.fr replied  ·  activity timestamp last week

@sash Thanks for this. Just read the manpage for -a and I think what I really need is just -t (which is bundled into -a).

Jef Poskanzer
@jef@mastodon.social replied  ·  activity timestamp last week

@pluralistic @sash Yeah -t. Also if you ever use scp directly instead of via rsync, or even do local cp, these aliases are good:
alias scp scp -p
alias cp cp -p
I also like the -i flag on cp and mv.

theincredibleholg
@theincredibleholg@mastodon.social replied  ·  activity timestamp last week

@pluralistic Perhaps -c, or --cc=ALGO to pick the checksum algorithm, could help by forcing a better check than size and date on them. Idk.
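
A minimal sketch of why a checksum comparison matters for same-size files, assuming rsync is installed; the directories and file contents are hypothetical. The source file is changed without changing its size, its modification time is forced to match the copy, and the two comparison modes are then run in turn:

```shell
#!/bin/sh
# Sketch: same size + same mtime hides a content change from the default
# comparison, while -c (checksum) notices it. Paths are hypothetical.
set -eu

quick="skipped"
with_c="skipped"

if command -v rsync >/dev/null 2>&1; then
    work=$(mktemp -d)
    mkdir "$work/src" "$work/dst"

    printf 'AAAA\n' > "$work/src/f"
    rsync -rt "$work/src/" "$work/dst/"      # initial backup

    printf 'BBBB\n' > "$work/src/f"          # new content, identical size
    touch -r "$work/dst/f" "$work/src/f"     # force identical mtime

    rsync -rt "$work/src/" "$work/dst/"      # size and mtime match: file is skipped
    quick=$(cat "$work/dst/f")

    rsync -rtc "$work/src/" "$work/dst/"     # checksums differ: file is transferred
    with_c=$(cat "$work/dst/f")

    rm -rf "$work"
fi

echo "after default comparison: $quick"
echo "after -c:                 $with_c"
```

The trade-off is that `-c` reads every file in full on both sides, which is slow for 100GB images.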

Cory Doctorow
@pluralistic@mamot.fr replied  ·  activity timestamp last week

@theincredibleholg That's a possibility, thanks.

François Galea
@zerkman@pouet.chapril.org replied  ·  activity timestamp last week

@pluralistic Which filesystem is it on? Using btrfs (and possibly XFS or ZFS) you could create a read-only snapshot, then rsync that snapshot instead of the original path, then remove the snapshot.

Cory Doctorow
@pluralistic@mamot.fr replied  ·  activity timestamp last week

@zerkman It's ext4

François Galea
@zerkman@pouet.chapril.org replied  ·  activity timestamp last week

@pluralistic Seems you can make "rsync snapshots" using a tool such as timeshift. No idea if it will work with your huge files though.

https://github.com/linuxmint/timeshift

GitHub - linuxmint/timeshift: System restore tool for Linux. Creates filesystem snapshots using rsync+hardlinks, or BTRFS snapshots. Supports scheduled snapshots, multiple backup levels, and exclude filters. Snapshots can be restored while system is running or from Live CD/USB.

Julien Goodwin
@LapTop006@aus.social replied  ·  activity timestamp last week

@pluralistic there's a million different ways to deal with this, one of the easiest is to use an underlying filesystem that can do snapshots (in general that's ZFS although there are others) and then just sync the snapshot.

Cory Doctorow
@pluralistic@mamot.fr replied  ·  activity timestamp last week

@LapTop006 Thanks. I think changing the fs is out of scope here.

Julien Goodwin
@LapTop006@aus.social replied  ·  activity timestamp last week

@pluralistic there may be a way to achieve similar with whatever VM system you're using, but that'd probably be even more of a mess

Matt
@mattw@mast.hpc.social replied  ·  activity timestamp last week

@pluralistic Just checking you have sparse file support on? --sparse or -S. That may speed things up or handle those files better.

Cory Doctorow
@pluralistic@mamot.fr replied  ·  activity timestamp last week

@mattw Yup, I've got -S on.

Oblomov
@oblomov@sociale.network replied  ·  activity timestamp last week

@pluralistic @mattw what format are the VM containers in? If they're raw images, it might be worth converting them into qcow2 so that the padding isn't there in the first place, which might both eliminate the issue of the size staying constant and even speed up change detection.

Matt
@mattw@mast.hpc.social replied  ·  activity timestamp last week

@pluralistic OK, my memory suggested support for sparse was fairly new to rsync, so I wanted to check if it was on. I thought rsync was supposed to copy the file, then match metadata once complete. Seems odd that it doesn’t; that would make it difficult to use as a synchronisation tool if it's out of sync once complete. Don't suppose you could dump the flags?

Cory Doctorow
@pluralistic@mamot.fr replied  ·  activity timestamp last week

@mattw Looks like -t will sync up timestamps after the xfer.

Matt
@mattw@mast.hpc.social replied  ·  activity timestamp last week

@pluralistic Ahh, that will do it.. Hah, I've been using -avz as my default for so long I'd forgotten it was a flag.

Evan Prodromou
@evan@cosocial.ca replied  ·  activity timestamp last week

@pluralistic I am not an expert, and I personally hate it when I ask a question and people answer a different question, BUT I'm going to do that anyway:

Could you mount the VM container image(s) on source and destination and rsync their filesystems separately?

Cory Doctorow
@pluralistic@mamot.fr replied  ·  activity timestamp last week

@evan For complicated reasons, I don't want to do that. Thanks, though.

Evan Prodromou
@evan@cosocial.ca replied  ·  activity timestamp last week

@pluralistic I am stealing this response.

Yrrsinn@GPN DECT 6300
@yrrsinn@chaos.social replied  ·  activity timestamp last week

@pluralistic Is excluding these big files and setting up a separate run of your backup script for them (not) an option?

Cory Doctorow
@pluralistic@mamot.fr replied  ·  activity timestamp last week

@yrrsinn I think maybe this could work, but I'm not sure.

Run one would exclude vmcontainer*

Run two would apply only to vmcontainer*, using mod-dates, not size. Afterwards I'd touch vmcontainer* and also /media/doctorow/backupdrive/vmcontainer* so that they all have the same mod-date, so that only vmcontainers that changed between this run and the next would get backed up?

?
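
A minimal sketch of the two-run split, assuming rsync is installed and a flat directory layout; `src`, `dst`, and the file names are hypothetical scratch stand-ins. Both runs use `-t`, which copies modification times across, so a separate touch step may not be needed:

```shell
#!/bin/sh
# Sketch of the two-run plan. "src" and "dst" are scratch directories and the
# vmcontainer* names are stand-ins; a flat directory layout is assumed.
set -eu

run_one="skipped"
run_two="skipped"

if command -v rsync >/dev/null 2>&1; then
    work=$(mktemp -d)
    mkdir "$work/src" "$work/dst"
    printf 'notes\n' > "$work/src/notes.txt"
    printf 'image\n' > "$work/src/vmcontainer1.img"

    # Run one: everything except the big containers.
    rsync -rt --exclude='vmcontainer*' "$work/src/" "$work/dst/"
    run_one=$(ls "$work/dst" | xargs echo)

    # Run two: only the containers. Filter rules are checked in order, so the
    # include matches first and the trailing exclude drops every other name.
    rsync -rt --include='vmcontainer*' --exclude='*' "$work/src/" "$work/dst/"
    run_two=$(ls "$work/dst" | xargs echo)

    rm -rf "$work"
fi

echo "after run one: $run_one"
echo "after run two: $run_two"
```

If the containers sit in subdirectories, run two would also need a rule such as `--include='*/'` so that rsync descends into them.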
