Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sstable: Fix restarts integer overflow #4232

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

EdwardX29
Copy link

Updates block and iter structs to use int64 for restarts, offset, and nextOffset. Changes prevent integer overflow found in this issue (#4216). Tested using go test ./...

@EdwardX29 EdwardX29 requested a review from a team as a code owner January 7, 2025 19:35
@EdwardX29 EdwardX29 requested a review from RaduBerinde January 7, 2025 19:35
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@EdwardX29 EdwardX29 requested a review from jbowens January 7, 2025 19:35
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add a test with a very large block and verify we can iterate it.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @EdwardX29 and @jbowens)


sstable/rowblk/rowblk_iter.go line 116 at r3 (raw file):

	// offset is the byte index that marks where the current key/value is
	// encoded in the block.
	offset int64

You can consider adding a type offsetInBlock int64 which would make it evident what fields are offsets.


sstable/rowblk/rowblk_iter.go line 200 at r3 (raw file):

type blockEntry struct {
	offset   int64
	keyStart uint32

Aren't the rest of these also offsets?

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @RaduBerinde)


sstable/rowblk/rowblk_iter.go line 116 at r3 (raw file):

Previously, RaduBerinde wrote…

You can consider adding a type offsetInBlock int64 which would make it evident what fields are offsets.

Good idea—we can add a comment on the type discussing the motivation for using an int64, including our expectation that an offset may be > math.MaxUint32 (even though restart points are serialized as uint32s, because the offset fields are sometimes updated to point after the last KV in the block)


sstable/rowblk/rowblk_iter.go line 200 at r3 (raw file):

Previously, RaduBerinde wrote…

Aren't the rest of these also offsets?

+1, I think they're all offsets except for valSize

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4, 1 of 1 files at r5.
Reviewable status: 2 of 3 files reviewed, 6 unresolved discussions (waiting on @EdwardX29 and @RaduBerinde)


sstable/rowblk/rowblk_iter_test.go line 481 at r8 (raw file):

	var largeKeySize int64 = 2 << 30   // 2GB key size
	var largeValueSize int64 = 2 << 30 // 2GB value size

nit: these can be const instead of var, and if you omit the int64too, then down below we won't have to cast into an int when calling bytes.Repeat


sstable/rowblk/rowblk_iter_test.go line 534 at r8 (raw file):

	}

	KVTestPairs := func() []KVTestPair {

curious why use an anonymous function here?


sstable/rowblk/rowblk_iter_test.go line 538 at r8 (raw file):

		for i := 0; i < numKVs; i++ {
			key := fmt.Sprintf("key-%04d", i)
			value := strings.Repeat("a", valueSize)

nit: since we're memory constrained, maybe lift this to the outer scope as:

value4MB := bytes.Repeat([]byte("a"), valueSize)

so that every KV we add is reusing the same constructed byte slice rather than re-creating it


sstable/rowblk/rowblk_iter_test.go line 584 at r8 (raw file):

	numKVs := 63
	valueSize := 4 * (1 << 20)
	var FourGB int64 = 4 * (1 << 30)

nit: use consts for those of these that are constant


sstable/rowblk/rowblk_iter.go line 198 at r8 (raw file):

}

// Type representing an offset in a block

nit: it's idiomatic to start a Go comment on an identifier with the name of the identifier, in this case:

// offsetInBlock represents an offset in a block.

Copy link
Author

@EdwardX29 EdwardX29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions (waiting on @jbowens and @RaduBerinde)


sstable/rowblk/rowblk_iter_test.go line 481 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: these can be const instead of var, and if you omit the int64too, then down below we won't have to cast into an int when calling bytes.Repeat

32 bit architectures are not happy with large constants being used as ints in bytes.Repeat as constant expressions are evaluated at compile time. Adding a check for strconv.IntSize == 32 doesn't solve the issue either. CI run. However, using var and int64 solves this issue as overflow is at runtime instead; run here


sstable/rowblk/rowblk_iter_test.go line 534 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

curious why use an anonymous function here?

Originally was going to have both Multi large KV tests in one test so I wrote a lambda to generate KVs for both. switched to standard initialization.

@EdwardX29 EdwardX29 requested a review from jbowens January 9, 2025 21:12
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Make sure to squash the commits before merging

Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions (waiting on @EdwardX29 and @RaduBerinde)


sstable/rowblk/rowblk_iter_test.go line 481 at r8 (raw file):

Previously, EdwardX29 (Edward Xiao) wrote…

32 bit architectures are not happy with large constants being used as ints in bytes.Repeat as constant expressions are evaluated at compile time. Adding a check for strconv.IntSize == 32 doesn't solve the issue either. CI run. However, using var and int64 solves this issue as overflow is at runtime instead; run here

Ah, got it.

fix restarts overflow

add comment for int64 change

update blockEntry offset members to uint32

add check for when block is greater than MaximumSize

add offsetInBlock type to represent offsets

add tests for integer overflow in SeekGE() and writing large blocks

adjust integers in tests for 32 bit machines

Skip block overflow tests for CI mem constraints

Use constants in tests and strconv.IntSize check

revert tests to int64 constants and int cast

revert tests to int64 vars

add explicit check for CI in tests
@EdwardX29 EdwardX29 force-pushed the edwardx29/block-restarts-overflow2 branch from 10f183e to 1b3ca39 Compare January 9, 2025 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants