Should I be surprised that this works so well? (dump over ssh)

raindog308 Administrator, Veteran
edited November 2016 in General

To my astonishment, this has consistently worked:

# dump -0 -f - / | ssh backup-server.example.com "cd /vault && cat > dump.0"

There's a little more to it in that I specify ssh keys, ports, per-server destinations, etc., but that's essentially the command. I've examined the dump files on the backup server, done restores from them, etc. Of course, swap level 0 for any level you like.

So, um, why isn't everyone using dump for backing up their VMs? I mean, I'm doing this over the WAN and ending up with a nice full/incremental rotation, I can pull out subsets for restore, it's compressed/secure, I could probably pipe a gpg encryption in there if I wished...
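A fuller sketch of that pipeline with a key, an alternate port, a per-server destination, and the optional gpg stage (the key path, port, and recipient here are purely illustrative):

# illustrative only - drop the gpg stage if you don't want encryption at rest
dump -0 -f - / \
  | gpg --encrypt --recipient backups@example.com \
  | ssh -i /root/.ssh/backup_key -p 2222 backup-server.example.com \
      "cd /vault/$(hostname -s) && cat > dump-$(date +%F).0"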

Let's Do Some Tests

Backup source: 1-core, 768M Vultr in Seattle.

Backup destination: DO in NYC. ~28ms.

Backing up:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        30G   11G   18G  39% /
Dump command: dump -0 -f - /
Where compression is listed, -z (default level 2), -z5, or -z9 was added.
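For example, the level-5 run is just the same pipeline with the compression flag added:

dump -0 -z5 -f - / | ssh backup-server.example.com "cd /vault && cat > dump.0"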

Test Results (SCIENCE!)

Compression Level          Time              Dump Size     Source Server Impact

None                      3m28.997s             11.0G       nil (6% cpu)
2 (default)               3m36.289s              7.0G       noticeable if you look
5                         4m5.272s               6.9G       noticeable even if you don't look
9 (max)                   6m5.260s               6.9G       this is all you're doing

I'm being comical about the source server impact, but with level 5 or 9, for example, the load average was well over 2.0, while with level 2 it was usually around 1.0.

Destination side barely showed load - sshd was using 6% of CPU.

I was being lazy and using du -sh...I'm sure level 9 is a little smaller than level 5, but not so much that I'd care.

Of course, these are all full backups of the entire OS and in practice, I'd exclude some things (/tmp, etc.) and the daily incrementals would be much, much smaller (files changed since yesterday, compressed).

Given SSD disk speed these days, I think one could do a level 0 less frequently than the traditional once a week...more incrementals to play back but SSDs are fast.

Honorable Intentions

This method seems to meet all my needs/wants:

  • captures everything - default include, not default "remember to include"
  • can do incrementals which saves on my bandwidth
  • can extract a subset of files to restore.
  • encrypted in transit
  • compressible
  • haven't played with encryption yet but that's just a gpg command in the pipeline before ssh
  • doesn't require staging space on the client
  • can run unattended with passwordless ssh.
  • on the backup server, I can move the backups somewhere out of the clients' access once backups are done, and the client doesn't depend on looking at that for an rsync-type incremental (and can't destroy backups with a malicious rsync)

The only negative is that I'd prefer to go over sftp so the client is completely locked down and limited to sftp only. But I can chroot the client into an incoming directory where it can only put files and can't escape to do anything else.
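For what it's worth, that sftp-only chroot is only a few lines of sshd_config on the backup server. A minimal sketch, assuming a dedicated backup user and per-user incoming directories (names illustrative); note the chroot directory itself must be root-owned and not writable by the user (uploads go into a writable subdirectory), and forcing internal-sftp means the dump stream has to arrive via an sftp-capable client rather than the cat pipe above:

# /etc/ssh/sshd_config on the backup server (sketch)
Match User backupclient
    ChrootDirectory /vault/incoming/%u
    ForceCommand internal-sftp
    AllowTcpForwarding no
    X11Forwarding no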

I was concerned that maybe going over the WAN would result in broken connections, etc. but I just did half a dozen transcontinental dumps (please, no crude humor) and things seem to be working fine...

Someone stop me before I fall in love with this solution, get it pregnant, and elope to Buffalo.

Comments

  • That's a really good find. Just want to know what happens if there's a local file error or a network hiccup. Did you try piping to rsync?

  • ehab Member
    edited November 2016

    @Neoon said:

    the above gif is giving me a headache ... thank you

  • raindog308 said: So, um, why isn't everyone using dump for backing up their VMs?

    hm? your way is already overcomplicated?

    Just run the entire tar stream by SSH and gpg it:
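    (The command itself was eaten by the forum's filter - William's actual script is linked in a later comment. A minimal sketch of that kind of tar-over-ssh-with-gpg pipeline, with an illustrative recipient and destination:)

    tar -czf - --one-file-system / \
      | gpg --encrypt --recipient backups@example.com \
      | ssh backup-server.example.com "cat > /vault/full-$(date +%F).tar.gz.gpg"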

  • William Member
    edited November 2016

    https://paster.li/?9d26cd53e1ffbb15#2GssgcudHKu8W/jfdVc416tGMmz8te8XBpqH3Q1scdk=

    @jarland your crap of a free CF account again blocks anyone using the word "ssh" in a post. Get a better CDN provider or pay... a tech forum that dies on pasting any bash excerpt is just... lame.

  • @William said:
    Just run the entire tar stream by SSH and gpg it:

    Can you please give an example command line?

  • I tried, but every time I do, CF tells me I can't post this and trashes my entire post. I added a link with basic info but won't type it up again.

    https://github.com/willgrz/Autobackup/blob/master/backup.sh

  • I've been doing tar piped into SSH for years. Works fantastic; everything I back up tends to be damn near wire-speed. More info here: damtp.cam.ac.uk/user/ejb48/sshspeedtests.html#newer

    raindog308 said: So, um, why isn't everyone using dump for backing up their VMs?

    Because then we wouldn't have customers who have had a service with us for four years get angry when their "life's work" gets deleted because they missed/ignored the 10+ invoice/overdue emails, and who then contact us three months later to say they never took a backup of that "life's work" in four years.

  • raindog308 Administrator, Veteran

    @William said:

    raindog308 said: So, um, why isn't everyone using dump for backing up their VMs?

    hm? your way is already overcomplicated?

    Just run the entire tar stream by SSH and gpg it:

    Can I do incrementals with tar?

  • raindog308 said: Can I do incrementals with tar?

    You cannot do incremental with GPG realistically.

  • doghouch Member
    edited November 2016

    @William said:
    I tried, but every time I do, CF tells me I can't post this and trashes my entire post. I added a link with basic info but won't type it up again.

    https://github.com/willgrz/Autobackup/blob/master/backup.sh

    Also, typing MySQL and PHP triggers it.

  • geekalot Member
    edited November 2016

    @William said:

    raindog308 said: So, um, why isn't everyone using dump for backing up their VMs?

    hm? your way is already overcomplicated?

    Just run the entire tar stream by SSH and gpg it:

    ^This

  • raindog308 Administrator, Veteran

    William said: You cannot do incremental with GPG realistically.

    I think there are at least a couple of ways.

    1. Dump
    dump -1 -f - / | gpg... | ssh...
    
    dump -2 -f - / | gpg... | ssh...
    
    etc.
    

    dump's knowledge of what point in time to base the incremental off of is kept in a file under /var (the dumpdates file).

    2. Use find with an mtime argument to build a list of files that you tar ("everything changed since yesterday's backup time") - see the sketch below.

    There's tar -u, but I'd have to think about how that would work over a networked pipeline...mmm, maybe not.
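    A minimal sketch of option 2, assuming GNU find and tar (the marker files and destination are illustrative):

    # stamp the start of this run, then tar up everything changed since the last run
    touch /var/backups/.this-run
    find / -xdev -newer /var/backups/.last-run -type f -print0 \
      | tar --null --files-from=- -czf - \
      | ssh backup-server.example.com "cat > /vault/incr-$(date +%F).tar.gz"
    mv /var/backups/.this-run /var/backups/.last-run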

  • Why no love for tar --listed-incremental? :'(

  • Abdussamad Member
    edited November 2016

    dump is obsolete and by using it you are setting yourself up for trouble in the future:

    The fact that dump reads the block device directly...

    The problem is that the filesystem may be changing while you are dumping it. You have this problem with all backup utilities, but with dump it is more serious. When you are using tar, for example, a file could be changed at the time it is read by tar; in that case, that particular file would be corrupted in the resulting tar file. But whereas for tar this is a problem only if it so happens that the file is changed the instant it is read, dump could backup corrupted versions of files if they changed some time before dump attempts to read them.

    http://dump.sourceforge.net/isdumpdeprecated.html

    With ext4 you have pretty aggressive caching of write operations. You are going to lose data if you use dump.

  • raindog308 Administrator, Veteran
    edited November 2016

    Abdussamad said: With ext4 you have pretty aggressive caching of write operations. You are going to lose data if you use dump.

    Yes, I eventually came across that... boo. Not sure if xfsdump suffers the same limitations, and there seems to be considerable difference of opinion.

    Well, on to tar --listed-incremental, or something else that can write an incremental to stdout...
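    A minimal sketch of a listed-incremental run streamed over ssh (snapshot-file path, excludes, and destination are illustrative; delete the .snar file to force the next run to be a full):

    tar --listed-incremental=/var/backups/root.snar --one-file-system \
        --exclude=/tmp -czf - / \
      | ssh backup-server.example.com "cd /vault && cat > incr-$(date +%F).tar.gz"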

  • ... or dar, then: http://dar.linux.free.fr/doc/presentation.html

    All-in-one solution: archive, compress, diff/incr backup, encrypt :)

  • deadbeef Member
    edited November 2016

    Doesn't borg fit your everyone's use case?

    https://github.com/borgbackup/borg

  • @deadbeef said:
    Doesn't borg fit your everyone's use case?

    https://github.com/borgbackup/borg

    Someone opened a thread about it IIRC?

    Also, there are duply and duplicity; I use the second one and find it good enough.

  • vimalware Member
    edited November 2016

    @deadbeef said:
    Doesn't borg fit your everyone's use case?

    https://github.com/borgbackup/borg

    Borg is push-only.

    But boy does it do a good job of de-duplicating similar blocks if you have an intermediate pull-backup machine (I have a $12 2TB Kimsufi special that only 'pulls' in from all VPSes using rsnapshot over an ssh key).

    (Do NOT run anything besides a key-only SSH server on this box.
    Use dropbear-unlocked full-disk LUKS encryption to mitigate against OVH-management-level 'attacks'.)

    Borg then creates incremental snapshots based on changed blocks across ALL the VPS 'pulls'; compresses that with LZMA (level 3-6 is good if you have an i3/i5 CPU), encrypts it, and pushes to Time4VPS, maxing out my 100mbit Kimsufi uplink.

    So I can afford to lose either the Kimsufi or the Time4VPS box in this setup.

    Rsnapshot gives you quick filesystem snapshots for the occasional FUBAR.

    Borg serves as the long-term archival tier.

    I haven't set up pruning on Borg yet.
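    A minimal sketch of that borg tier (repo path, archive naming, and retention below are illustrative; the repo can just as well be an ssh://user@host path for the off-box push):

    borg init --encryption=repokey /backup/borg-repo
    borg create --compression lzma,6 --stats \
        /backup/borg-repo::'{hostname}-{now:%Y-%m-%d}' /backup/rsnapshot
    borg prune --keep-daily=7 --keep-weekly=4 --keep-monthly=6 /backup/borg-repo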

  • raindog308 Administrator, Veteran
    edited November 2016


    Life is short. I'm just going to use Duplicity like a normal person.

  • New version, based on ZFS snapshots and rsync for easy access on a secure storage machine, plus GPG for external storage (also via rsync) - works well for me:

    https://github.com/willgrz/wBak-Autobackup-ZFS

    This is an extended version of my old backup script. It is based on ZFS snapshots, as these are simply accessible for restores - implemented on e.g. dm-crypt or another encrypted drive this provides somewhat secure storage.

    The actual backup process is a simple rsync of /, after which a ZFS snapshot is taken and, if configured, a GPG-encrypted (thus secure to share) file is created & synced locally/remotely by rsync.

    Features:
    - Simple configuration and installation (e.g. an Ubuntu 16.04 system plus a ZFS volume and some packages)
    - Supports general excludes as well as per-server excludes in the server config
    - Allows backup scheduling per server (30 min server A, 5 min server B, etc.) with a default of 60 min (set in the general config)
    - rsync is rather reliable
    - ZFS snapshots provide easy access to any point in time
    - Incremental backups; deduplication can be enabled (not recommended on ZFS on Linux, though)
    - GPG-encrypted backups are highly secure and can be shared anywhere, automated (by default the latest snapshot every 12 hours); extend the script with e.g. FTP or Dropbox/whatever...
    
    See INSTALL for configuration.
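    A minimal sketch of the flow described above (dataset names, hosts, and the gpg recipient are illustrative, not taken from the script itself):

    SNAP=$(date +%Y-%m-%d_%H%M)
    # pull the client's filesystem, then freeze it in a ZFS snapshot
    rsync -aAXH --delete --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/tmp \
        root@client.example.com:/ /tank/backups/client/
    zfs snapshot tank/backups/client@$SNAP
    # optionally package that snapshot as a GPG-encrypted file for offsite rsync
    tar -czf - -C /tank/backups/client/.zfs/snapshot/$SNAP . \
      | gpg --encrypt --recipient backups@example.com \
      > /tank/offsite/client-$SNAP.tar.gz.gpg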
    