
Ensure that binary logs for PITR are in a shared directory #541

Open
wants to merge 27 commits into
base: main
from


mattlord
Collaborator

@mattlord mattlord commented Mar 12, 2024

When executing the vtctldclient RestoreFromBackup --restore-to-pos <value> command, the vttablet process (running in the vttablet container within the vttablet pod) handles the RestoreFromBackup tabletmanager RPC. Using the configured backup engine (e.g. xtrabackup), it first restores the full backup into the VTDATAROOT (specifically /vt/vtdataroot/vt_<tabletUID>/ for the mysql data), which is shared by all containers within the pod. It orchestrates that in conjunction with the mysqlctld process that's running inside the mysqld container within the same vttablet pod. In the end there is a running mysqld instance inside the mysqld container that was started from the restored full backup.

Once the full backup is in place and the mysqld process is running, the vttablet process uses the OS tmp dir of /tmp to restore the binary logs from the backup (via the builtinbackupengine) for subsequent application. However, /tmp is not a shared mount point within the pod so when mysqlbinlog subsequently tries to read them from within the mysqld container it cannot find them in its container's /tmp directory and it fails with an error.
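The failure mode described above can be illustrated with a small Go sketch. This is only a toy model, not operator code: the `sharedMounts` list and the `visibleToAllContainers` helper are hypothetical names, and the sketch assumes that the VTDATAROOT volume at /vt/vtdataroot is the only mount shared by the containers in the pod.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// sharedMounts lists the mount points assumed to be visible to every
// container in the pod. Hypothetical: only the VTDATAROOT volume is shared.
var sharedMounts = []string{"/vt/vtdataroot"}

// visibleToAllContainers reports whether a file written at p by one
// container could be read at the same path by another container, i.e.
// whether p lies on (or under) one of the shared mounts.
func visibleToAllContainers(p string) bool {
	clean := path.Clean(p)
	for _, m := range sharedMounts {
		if clean == m || strings.HasPrefix(clean, m+"/") {
			return true
		}
	}
	return false
}

func main() {
	// /tmp is private to each container, so binary logs restored there by
	// vttablet are invisible to mysqlbinlog in the mysqld container.
	for _, dir := range []string{"/tmp", "/vt/vtdataroot"} {
		fmt.Printf("%-16s shared=%v\n", dir, visibleToAllContainers(dir))
	}
}
```

Under this model, any restore path below /vt/vtdataroot is readable from both containers, while a path below /tmp is not.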

vtctldclient

https://github.com/vitessio/vitess/blob/3ae5cf7e690e560dd5630119215bcc3f5ecf31c8/go/cmd/vtctldclient/command/backups.go#L227-L263

vtctld[server]

https://github.com/vitessio/vitess/blob/3ae5cf7e690e560dd5630119215bcc3f5ecf31c8/go/vt/vtctl/grpcvtctldserver/server.go#L3260-L3286

vttablet

https://github.com/vitessio/vitess/blob/3ae5cf7e690e560dd5630119215bcc3f5ecf31c8/go/vt/vttablet/tabletmanager/rpc_backup.go#L173-L193

https://github.com/vitessio/vitess/blob/3ae5cf7e690e560dd5630119215bcc3f5ecf31c8/go/vt/vttablet/tabletmanager/restore.go#L191-L273

mysqlctld (rather than mysqlctl; it runs in the mysqld container)

https://github.com/vitessio/vitess/blob/3ae5cf7e690e560dd5630119215bcc3f5ecf31c8/go/vt/mysqlctl/backup.go#L364-L487

https://github.com/vitessio/vitess/blob/3ae5cf7e690e560dd5630119215bcc3f5ecf31c8/go/vt/mysqlctl/builtinbackupengine.go#L995-L1060

vttablet builtinbackupengine

https://github.com/vitessio/vitess/blob/3ae5cf7e690e560dd5630119215bcc3f5ecf31c8/go/vt/mysqlctl/builtinbackupengine.go#L995-L1060

Related issues and PRs:

@mattlord mattlord force-pushed the point-in-time-recovery branch 5 times, most recently from 1824f91 to 0562d49 Compare March 13, 2024 01:15
@mattlord mattlord requested review from shlomi-noach, frouioui and GuptaManan100 and removed request for GuptaManan100 March 13, 2024 01:15
@mattlord mattlord changed the title from "Ensure that binary logs for PITR are restored to a shared location" to "Ensure that binary logs for PITR are use a shared location" Mar 13, 2024
@mattlord mattlord changed the title from "Ensure that binary logs for PITR are use a shared location" to "Ensure that binary logs for PITR are use a shared directory" Mar 13, 2024
@mattlord mattlord changed the title from "Ensure that binary logs for PITR are use a shared directory" to "Ensure that binary logs for PITR are in a shared directory" Mar 13, 2024
@mattlord mattlord force-pushed the point-in-time-recovery branch from 0562d49 to d259730 Compare March 13, 2024 02:25
Collaborator

@shlomi-noach shlomi-noach left a comment


Nice! Should we also provide the flag in yaml files?

@mattlord
Collaborator Author

> Nice! Should we also provide the flag in yaml files?

Yeah. I think this does it. e2e5e8b

@shlomi-noach
Collaborator

shlomi-noach commented Mar 17, 2024

> Yeah. I think this does it.

How is the value being set, and to what specific value?

@mattlord
Collaborator Author

> How is the value being set, and to what specific value?

The user would specify the flag and value in their cluster yaml definition using the extraFlags parameter, just as they do for mysqld flags, e.g. If they don't specify a value then we enforce the default within the operator.

Collaborator

@shlomi-noach shlomi-noach left a comment


Seems to me like it's feature complete and can be taken out of Draft?

@mattlord mattlord marked this pull request as ready for review March 18, 2024 12:18
@mattlord mattlord requested a review from GuptaManan100 March 18, 2024 12:18
@shlomi-noach shlomi-noach requested a review from a team March 20, 2024 06:28
@mattlord
Collaborator Author

> How is the value being set, and to what specific value?
>
> The user would specify the flag and value in their cluster yaml definition using the extraFlags parameter, just as they do for mysqld flags, e.g. If they don't specify a value then we enforce the default within the operator.

The flag ended up being for vttablet and vtbackup, not mysqlctld (although vtbackup is a modified mysqlctld). I will leave the mysqlctld extra flags support though as that may come to be useful.
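The defaulting behaviour discussed here (use the user's value if one is given, otherwise fall back to a shared location) can be sketched in isolation as follows. This is a minimal sketch rather than the operator's actual implementation: `withDefaultRestorePath` is a hypothetical helper, the flag name is the one discussed in this PR, and /vt/vtdataroot is assumed as the shared default based on the PR description.

```go
package main

import "fmt"

// restorePathFlag is the vttablet/vtbackup flag discussed in this PR.
const restorePathFlag = "builtinbackup-incremental-restore-path"

// withDefaultRestorePath returns a copy of the user's extra flags in which
// the restore path falls back to vtDataRoot when it was not set explicitly.
// Hypothetical helper: copying the map (rather than mutating the input) is a
// choice made for this sketch only.
func withDefaultRestorePath(userFlags map[string]string, vtDataRoot string) map[string]string {
	out := make(map[string]string, len(userFlags)+1)
	for k, v := range userFlags {
		out[k] = v
	}
	if _, ok := out[restorePathFlag]; !ok {
		out[restorePathFlag] = vtDataRoot
	}
	return out
}

func main() {
	// No user-supplied value: the shared default is enforced.
	fmt.Println(withDefaultRestorePath(nil, "/vt/vtdataroot")[restorePathFlag])

	// A user-supplied value is kept as-is; keeping it on a shared mount is
	// then the user's responsibility.
	custom := map[string]string{restorePathFlag: "/vt/vtdataroot/pitr"}
	fmt.Println(withDefaultRestorePath(custom, "/vt/vtdataroot")[restorePathFlag])
}
```

Note that an explicit value always wins in this sketch, so a user who points the flag at a directory that is not shared between containers would still hit the original problem.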

Comment on lines 90 to 94
// Ensure that binary logs are restored to/from a location that all containers
// in the pod can access if no location was explicitly provided.
if _, ok := vttabletAllFlags["builtinbackup-incremental-restore-path"]; !ok {
	vttabletAllFlags["builtinbackup-incremental-restore-path"] = vtDataRootPath
}
Member


What would happen if the path specified in --builtinbackup-incremental-restore-path is not accessible to all containers in the pod?

Member


I am guessing it is up to the user to set the same value on all components too? mysqlctl, vttablet and vtbackup

Collaborator Author

@mattlord mattlord Mar 27, 2024


The same thing that happens now to every user. It doesn't work. PITR does not generally work in the operator today.

@@ -29,6 +29,7 @@ const (
vtRootInitScript = `set -ex
mkdir -p /mnt/vt/bin
cp --no-clobber /vt/bin/mysqlctld /mnt/vt/bin/
cp --no-clobber $(command -v mysqlbinlog) /mnt/vt/bin/ || true
Member


The directory /mnt/.../ is shared across all the containers in the pod I am assuming? In which case, this line would resolve what you wrote in the PR's description:

> /tmp is not a shared mount point within the pod so when mysqlbinlog subsequently tries to read them from within the mysqld container it cannot find them in its container's /tmp directory and it fails with an error

Collaborator Author

@mattlord mattlord Mar 27, 2024


This is simply about copying the mysqlbinlog binary from the vitess/lite container image to the mysqlctld/vtbackup container (if it's not already there), as it looks like we'll need to keep that around in the lite image because the MySQL images do not contain that binary and it's needed for PITR.

@mattlord mattlord force-pushed the point-in-time-recovery branch 4 times, most recently from 4abb9ce to a2d80d2 Compare March 27, 2024 21:45
@mattlord mattlord force-pushed the point-in-time-recovery branch from a2d80d2 to 63abf80 Compare March 27, 2024 22:08
mattlord and others added 2 commits March 27, 2024 23:26
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
@mattlord mattlord force-pushed the point-in-time-recovery branch from c9793dd to e069303 Compare April 10, 2024 18:32
This includes mysqld (of course) and mysqlbinlog

But it does NOT include xtrabackup

Signed-off-by: Matt Lord <[email protected]>
@mattlord mattlord force-pushed the point-in-time-recovery branch from e069303 to 2aeb36b Compare April 10, 2024 20:30
@mattlord mattlord force-pushed the point-in-time-recovery branch from ae78d72 to 5e7df2b Compare July 12, 2024 18:55

3 participants