Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

restore: precheck cluster is empty when first time full restore #45014

Merged
merged 15 commits into from
Jul 28, 2023
6 changes: 3 additions & 3 deletions br/cmd/br/restore.go
Original file line number Diff line number Diff line change
Expand Up @@ -77,12 +77,12 @@ func printWorkaroundOnFullRestoreError(command *cobra.Command, err error) {
fmt.Println("#######################################################################")
switch {
case errors.ErrorEqual(err, berrors.ErrRestoreNotFreshCluster):
fmt.Println("# the target cluster is not fresh, br cannot restore system tables.")
fmt.Println("# the target cluster is not fresh, cannot restore.")
fmt.Println("# you can drop existing databases and tables and start restore again")
case errors.ErrorEqual(err, berrors.ErrRestoreIncompatibleSys):
fmt.Println("# the target cluster is not compatible with the backup data,")
fmt.Println("# br cannot restore system tables.")
fmt.Println("# you can remove 'with-sys-table' flag to skip restoring system tables")
}
fmt.Println("# you can remove 'with-sys-table' flag to skip restoring system tables")
fmt.Println("#######################################################################")
}

Expand Down
2 changes: 1 addition & 1 deletion br/pkg/backup/push.go
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,7 @@ func (push *pushDown) pushBackup(
}
failpoint.Inject("backup-timeout-error", func(val failpoint.Value) {
msg := val.(string)
logutil.CL(ctx).Debug("failpoint backup-timeout-error injected.", zap.String("msg", msg))
logutil.CL(ctx).Info("failpoint backup-timeout-error injected.", zap.String("msg", msg))
resp.Error = &backuppb.Error{
Msg: msg,
}
Expand Down
38 changes: 22 additions & 16 deletions br/pkg/task/restore.go
Original file line number Diff line number Diff line change
Expand Up @@ -708,22 +708,6 @@ func runRestore(c context.Context, g glue.Glue, cmdName string, cfg *RestoreConf
return errors.Trace(err)
}

// todo: move this check into InitFullClusterRestore, we should move restore config into a separate package
// to avoid import cycle problem which we won't do it in this pr, then refactor this
//
// if it's point restore and reached here, then cmdName=FullRestoreCmd and len(cfg.FullBackupStorage) > 0
if cmdName == FullRestoreCmd && cfg.WithSysTable {
client.InitFullClusterRestore(cfg.ExplicitFilter)
}
if client.IsFullClusterRestore() && client.HasBackedUpSysDB() {
if err = client.CheckTargetClusterFresh(ctx); err != nil {
return errors.Trace(err)
}
if err = client.CheckSysTableCompatibility(mgr.GetDomain(), tables); err != nil {
return errors.Trace(err)
}
}

if client.IsIncremental() {
// don't support checkpoint for the ddl restore
log.Info("the incremental snapshot restore doesn't support checkpoint mode, so unuse checkpoint.")
Expand Down Expand Up @@ -769,6 +753,28 @@ func runRestore(c context.Context, g glue.Glue, cmdName string, cfg *RestoreConf
}()
}

if isFullRestore(cmdName) {
// we need check cluster is fresh every time. except restore from a checkpoint.
if client.IsFull() && len(checkpointSetWithTableID) == 0 {
if err = client.CheckTargetClusterFresh(ctx); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make BR fail once there are any table in the cluster. (Even there isn't intersection of them and the backup archive.) Will this break some use cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, It's intended @benmaoer has confirmed it. so this behaviour will introduced in v7.3. and won't cherry-pick back to any release versions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are 2 options:

  1. When restoring the entire cluster from a full backup, it is necessary to check whether the target cluster is an empty cluster, which has no user tables.

  2. When restoring specific tables or databases from the backup, it is necessary to detect if there are any conflicts between the tables to be restored and the existing tables in the target cluster. If conflicts are found, an error should be reported in advance to avoid data integrity issues.

Copy link

@dbakit dbakit Feb 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes br restore full don't mean I realy want to resotre the entire cluster. In certain scenarios, I just want to restore multiple tables, but they are distributed across different databases. But neither br restore table nor br restore db supports regular expressions.
So, can you add a --force option to skip this check??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe we should skip the check if user has specified --filter argument already. @benmaoer @YuJuncen WDYT?

return errors.Trace(err)
}
}
// todo: move this check into InitFullClusterRestore, we should move restore config into a separate package
// to avoid import cycle problem which we won't do it in this pr, then refactor this
//
// if it's point restore and reached here, then cmdName=FullRestoreCmd and len(cfg.FullBackupStorage) > 0
if cfg.WithSysTable {
client.InitFullClusterRestore(cfg.ExplicitFilter)
}
}

if client.IsFullClusterRestore() && client.HasBackedUpSysDB() {
if err = client.CheckSysTableCompatibility(mgr.GetDomain(), tables); err != nil {
return errors.Trace(err)
}
}

sp := utils.BRServiceSafePoint{
BackupTS: restoreTS,
TTL: utils.DefaultBRGCSafePointTTL,
Expand Down
2 changes: 1 addition & 1 deletion br/tests/_utils/run_br
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
set -eux

br.test -test.coverprofile="$TEST_DIR/cov.$TEST_NAME.$$.out.log" DEVEL "$@" \
-L "debug" \
-L "info" \
--ca "$TEST_DIR/certs/ca.pem" \
--cert "$TEST_DIR/certs/br.pem" \
--key "$TEST_DIR/certs/br.key"
3 changes: 3 additions & 0 deletions br/tests/br_backup_empty/run.sh
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ if [ $? -ne 0 ]; then
exit 1
fi

i=1
while [ $i -le $DB_COUNT ]; do
run_sql "DROP DATABASE $DB$i;"
i=$(($i+1))
Expand All @@ -71,6 +72,7 @@ run_sql "CREATE TABLE ${DB}1.usertable1 ( \
echo "backup empty table start..."
run_br --pd $PD_ADDR backup full -s "local://$TEST_DIR/empty_table"

i=1
while [ $i -le $DB_COUNT ]; do
run_sql "DROP DATABASE $DB$i;"
i=$(($i+1))
Expand All @@ -83,6 +85,7 @@ run_br --pd $PD_ADDR restore full -s "local://$TEST_DIR/empty_table"
# insert one row to make sure table is restored.
run_sql "INSERT INTO ${DB}1.usertable1 VALUES (\"a\", \"b\");"

i=1
while [ $i -le $DB_COUNT ]; do
run_sql "DROP DATABASE $DB$i;"
i=$(($i+1))
Expand Down
1 change: 1 addition & 0 deletions br/tests/br_full_ddl/run.sh
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,7 @@ fi

# clear restore environment
run_sql "DROP DATABASE $DB;"
run_sql "DROP DATABASE __tidb_br_temporary_mysql;"
# restore full
echo "restore start..."
export GO_FAILPOINTS="github.com/pingcap/tidb/br/pkg/pdutil/PDEnabledPauseConfig=return(true)"
Expand Down
2 changes: 1 addition & 1 deletion br/tests/br_systables/run.sh
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ add_test_data() {
}

delete_test_data() {
run_sql "DROP TABLE usertest.test;"
run_sql "DROP DATABASE usertest;"
}

rollback_modify() {
Expand Down