
pkg/cmd/roachtest: Add large schema test for backup/restore. #139047

Open · wants to merge 3 commits into master
Conversation

benbardin (Collaborator):

Informs: #138747
Release note: None

@benbardin benbardin requested a review from msbutler January 14, 2025 17:19
@cockroach-teamcity (Member):

This change is Reviewable

@benbardin (Collaborator, Author):

Michael, if this looks good to you, I'll see what Faizan thinks.

I tried running this last night; backup/restore worked at 10k tables with no problem.


dest := destinationName(c)
uri := `gs://` + backupTestingBucket + `/` + dest + `?AUTH=implicit`
t.L().Printf("Backing up to %s\n", uri)
Collaborator:

what exactly is the checkbox we're seeking to fill? Running a single full backup on a cluster with 10k tables seems straightforward. We could begin pushing the system in a variety of ways:

  • run x incremental backups
  • run schema changes during the backup process (maybe this workload already does this?)
  • run backups with revision history

As an aside, it seems more realistic to run the backup(s) while the main workload is running. Instead of running backups manually after the fact, could we instead enable automatic scheduled backups? And then run a restore after the workload finishes?
https://github.com/msbutler/cockroach/blob/butler-ldr-init-scan-roachtest/pkg/cmd/roachtest/tests/large_schema_benchmark.go#L150

If foundations wants certain variants of the benchmark free of backups, perhaps we should add a different flavor of this test withBackups via the args to registerLargeSchemaBenchmark.

@alicia-l2 (Jan 15, 2025):

> As an aside, it seems more realistic to run the backup(s) while the main workload is running. Instead of running backups manually after the fact, could we instead enable automatic scheduled backups? And then run a restore after the workload finishes?

Let's do this and run hourly incrementals, daily fulls, and restore from one of the hourly incrementals
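A minimal sketch of that cadence in CockroachDB SQL (the schedule label and bucket path below are placeholders, not values from this PR):

```sql
-- Daily full backups with hourly incrementals layered on top,
-- plus revision history, writing to a GCS bucket with implicit auth.
CREATE SCHEDULE IF NOT EXISTS large_schema_backup
  FOR BACKUP INTO 'gs://<backup-testing-bucket>/<dest>?AUTH=implicit'
  WITH revision_history
  RECURRING '@hourly'
  FULL BACKUP '@daily'
  WITH SCHEDULE OPTIONS first_run = 'now';
```

The restore step after the workload finishes could then read from the most recent chain, e.g. `RESTORE FROM LATEST IN 'gs://<backup-testing-bucket>/<dest>?AUTH=implicit'`.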

@benbardin benbardin force-pushed the restore10k branch 2 times, most recently from cbcc726 to d01adb0, on January 21, 2025 at 20:04
@benbardin (Collaborator, Author):

Wrote a new test based on our discussion last week, PTAL?

@benbardin benbardin changed the title from "pkg/cmd/roachtest: Add Backup/Restore to large schema test" to "pkg/cmd/roachtest: Add large schema test for backup/restore." on Jan 21, 2025
// and multi-region).
func registerLargeSchemaBackupRestores(r registry.Registry) {
// 10k is probably the upper limit right now, but we haven't tested further.
for _, scale := range []int{1000, 5000, 10000} {
Collaborator:

why do we need 3 new roachtests? couldn't we just have one with 10k?

)

// registerLargeSchemaBackupRestores registers all permutations of
// multi-region large schema benchmarking (for different scales
Collaborator:

stale comment?

spec.GCEVolumeType("pd-ssd"),
spec.GCEMachineType("n2-standard-8"),
}
testTimeout := 19 * time.Hour
Collaborator:

how long does this test actually take to run?

// quickly, on MR the job retention can slow things down. Let's
// minimize how long jobs are kept, so that the creation / ingest
// completes in a reasonable amount of time.
_, err := conn.Exec("SET CLUSTER SETTING jobs.retention_time='1h'")
Collaborator:

since this isn't MR, do we need this?

_, err = conn.Exec("SET CLUSTER SETTING kv.transaction.internal.max_auto_retries=500")
require.NoError(t, err)
// Create a user that will be used for authentication for the REST
// API calls.
Collaborator:

is this necessary?

// Create all the databases
importConcurrencyLimit := 32
options := tpccOptions{
WorkloadCmd: "tpccmultidb",
Collaborator:

I think you can leave WorkloadCmd empty, as we're not setting up MR TPCC

testUser := fmt.Sprintf("test-user-%d", i)
_, err = conn.Exec(fmt.Sprintf("CREATE USER %q", testUser))
require.NoError(t, err)
cmd := fmt.Sprintf("%s %q", grantPreamble, testUser)
Collaborator:

does this grant command create N new descriptors or does it modify N descriptors? Any way to verify that we're indeed conducting N descriptor changes?
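One possible way to spot-check this, assuming the hidden `crdb_internal_mvcc_timestamp` system column on `system.descriptor` (available in recent CockroachDB versions); the timestamp placeholder is something the test would capture, not a value from this PR:

```sql
-- Take a logical timestamp before issuing the GRANT ...
SELECT cluster_logical_timestamp();

-- ... then, after the GRANT, count descriptors rewritten since that point.
-- If the GRANT touched N descriptors, this should return N.
SELECT count(*)
  FROM system.descriptor
 WHERE crdb_internal_mvcc_timestamp > <timestamp captured above>;
```

This only shows that descriptors were rewritten, not what changed; a `SHOW GRANTS` per table would confirm the privilege itself landed.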

spec.WorkloadNode(),
spec.WorkloadNodeCPU(8),
spec.VolumeSize(800),
spec.GCEVolumeType("pd-ssd"),
Collaborator:

how big is this cluster once set up? Do we need pd-ssds, or can we use cheap local SSDs (375 GB per node)?

setupTPCC(ctx, t, t.L(), c, options)

// The backup testing bucket is single-region, because that's cheaper.
// So skip this test in multi-region clusters, to avoid the expense of
Collaborator:

stale comment about multi region clusters?

require.NoError(t, err)
}

//time.Sleep(600 * time.Second)
Collaborator:

stale comment
