How to properly achieve asymmetric multi-site backups #939

Open
Tectu opened this issue Jul 23, 2024 · 3 comments

@Tectu

Tectu commented Jul 23, 2024

Let there be three hosts: A, B and C. All of them use ZFS.

  • A is the primary production server.
  • B is an on-site backup destination host with plenty of storage.
  • C is a remote backup destination host with not so much storage.

What I'd like to achieve is to back up everything from host A to host B. This part already works nicely.
On top of that, I'd like host B to back up some of those datasets to host C.
What is the proper, correct way of achieving this?

On host A I use the ZFS property syncoid:sync to select which datasets to push to host B (or rather, which datasets not to push).

I basically see two options to achieve this:

  1. On host B, also use the syncoid:sync property to select which datasets to push to host C.
  2. On host A, set the syncoid:sync property to either B or B,C selectively.
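
As a rough illustration of the two options, the sketch below only prints the `zfs set` commands each option would involve; it does not run them. The dataset names (`poolB/media`, `poolA/documents`, and so on) are made up, and the accepted values of `syncoid:sync` (`false`, or a comma-separated host list) are assumed from the sanoid README, so check them against the installed version.

```shell
#!/bin/sh
# Dry-run sketch: prints commands instead of executing them.
# All dataset names below are hypothetical placeholders.

# Option 1: on host B, mark the datasets that must NOT go on to C.
option1_commands() {
    for ds in "$@"; do
        printf 'zfs set syncoid:sync=false %s\n' "$ds"
    done
}

# Option 2: on host A, list the allowed target hosts per dataset.
# $1 = comma-separated host list, remaining arguments = datasets.
option2_commands() {
    hosts=$1
    shift
    for ds in "$@"; do
        printf 'zfs set syncoid:sync=%s %s\n' "$hosts" "$ds"
    done
}

option1_commands poolB/media poolB/scratch
option2_commands B poolA/media
option2_commands B,C poolA/documents
```

With option 1 the selection lives entirely on B, so A never needs to know that C exists; with option 2 every decision is made on A.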

Specifically I'd like to ask:

  1. Are the two options I listed above correct / reasonable?
  2. Are there any other options worth considering?
  3. Which of the two options makes more sense? Personally, I feel that option 1 is better for my particular scenario/taste, as I don't have to deal with the hostname resolution briefly mentioned in the sanoid readme.
@jimsalterjrs
Owner

Personally, I just put things in a hierarchical order that works for me, and use recursion.

In other words, put the stuff to back up to C below the stuff to back up to B. So, you have a data structure on host A like poolA/thingsforB/thingsforC. Then, root@B:~# syncoid -r root@A:poolA/thingsforB poolB/thingsforB, followed by root@C:~# syncoid -r root@B:poolB/thingsforB/thingsforC poolC/thingsforC.
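
Spelled out as a dry run, the hierarchical layout might look like the sketch below. It only prints the two pull commands quoted above; `pull_cmd` is a helper invented for this illustration, and the pool and dataset names are the same placeholders.

```shell
#!/bin/sh
# Dry-run sketch of the hierarchical layout: prints the two pull commands.
# Pool and dataset names are the placeholders used in the comment above.

# Build a recursive syncoid pull command: $1 = source, $2 = local target.
pull_cmd() {
    printf 'syncoid -r %s %s\n' "$1" "$2"
}

# Run on host B: pull everything under thingsforB from A.
pull_cmd root@A:poolA/thingsforB poolB/thingsforB

# Run on host C: pull only the nested subtree from B.
pull_cmd root@B:poolB/thingsforB/thingsforC poolC/thingsforC
```

Because `thingsforC` is nested inside `thingsforB`, the first command already carries it to B, and the second command picks up only that subtree.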

If for some reason you can't possibly organize your datasets hierarchically, well, you already found the sync property. :)

The biggest thing that usually trips people up about n-way replication is that the sync snapshots don't really work well anymore, because they get replicated to foreign machines that don't know what to do with them (and therefore they pile up and run you out of disk space, if you don't manually keep up with them).

Generally, if you're doing A->B and A->C (or any other topology of more-than-two-way replication) you should be using --no-sync-snap to keep syncoid from creating its own ephemeral snapshots, and just rely on sanoid on the source machine to provide regular snapshots.

Obviously, you do need to monitor your ongoing replication to make sure you don't fall out of sync--otherwise, if you fail to replicate too many times in a row, you eventually wind up with no common snapshots with the source.
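
One possible way to monitor this is to compare the snapshot lists of both sides and alert when they no longer overlap. The helper below is only a sketch: `newest_common` is a made-up function name, the input files are assumed to hold one snapshot identifier per line (oldest first), and the `zfs list` invocation in the comment is an assumption about how such a list could be produced.

```shell
#!/bin/sh
# Sketch: find the newest snapshot that exists on both sides.
# Inputs are two files with one snapshot identifier per line, oldest first,
# for example produced on each host by something like:
#   zfs list -H -t snapshot -d 1 -o guid -s creation poolA/data > src.txt
# The dataset name and file names are placeholders.

# $1 = source list, $2 = target list.
# Prints the newest entry of the source list that also appears in the
# target list; prints nothing and returns 1 if the lists share no entry.
newest_common() {
    match=$(grep -Fx -f "$2" "$1" | tail -n 1)
    [ -n "$match" ] && printf '%s\n' "$match"
}
```

A cron job could call `newest_common src.txt dst.txt` and raise an alert when it returns a non-zero status, since that means an incremental send is no longer possible.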

@Tectu
Author

Tectu commented Jul 23, 2024

Thank you for the quick answer on this!

The hierarchical approach does not work for some of my situations.

Generally, if you're doing A->B and A->C [...]

What I actually want is A -> B and B -> C. I do not want A to know anything about C.
The only addition is that I want to select which of the datasets that A pushed to B are then pushed on to C. In other words, C is only a "partial copy" of B.

I'm already making use of --no-sync-snap.

Is my understanding correct that therefore I should go with option 1 outlined in my opening post?
Is that a "common" approach?

@Tectu changed the title from "How to properly achieve asymetric multi-site backups" to "How to properly achieve asymmetric multi-site backups" on Jul 26, 2024

@Pajkastare

I don't know whether it is "common" or not, but messing with the syncoid:sync property on the backup server seems very fragile. Are you really confident that you will know exactly which of those properties to keep (and which to delete) if you need to restore from backup? And does nothing ever change over time in your zpools?

You have not given any reason not to simply cherry-pick the filesystems to replicate from B to C and run syncoid for each of them.
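
A minimal sketch of that cherry-picking approach, run on B, could look like this. It only prints one syncoid command per hand-picked dataset; the dataset names, pool names, and the `root@C` target are hypothetical, and `--no-sync-snap` is included only because it was mentioned earlier in the thread.

```shell
#!/bin/sh
# Dry-run sketch: print one syncoid command per hand-picked dataset.
# The dataset list, pool names, and target host are hypothetical.

TARGET_HOST="root@C"
SRC_POOL="poolB"
DST_POOL="poolC"

# Print the push command for each dataset name given as an argument.
cherry_pick_cmds() {
    for ds in "$@"; do
        printf 'syncoid --no-sync-snap %s/%s %s:%s/%s\n' \
            "$SRC_POOL" "$ds" "$TARGET_HOST" "$DST_POOL" "$ds"
    done
}

cherry_pick_cmds documents photos
```

The list of datasets then lives in one script on B instead of in ZFS properties, which keeps the selection visible and leaves the received datasets untouched.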
