-
Notifications
You must be signed in to change notification settings - Fork 478
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
internal/compact: defer retrieval of external values #4198
Conversation
I saw that error in CI also: https://github.com/cockroachdb/pebble/actions/runs/12304080308/job/34340708418?pr=4199 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)
internal/compact/iterator.go
line 1291 at r1 (raw file):
i.value = base.LazyValue{} return } else if !callerOwned {
[nit] remove the else
or the return
internal/compact/iterator.go
line 1408 at r1 (raw file):
} lv = base.MakeInPlaceValue(value) return
[nit] return base.MakeInPlaceValue(value), needDelete, closer, err
and _ base.LazyValue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on @itsbilal and @sumeerbhola)
internal/compact/iterator.go
line 1291 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] remove the
else
or thereturn
Done
internal/compact/iterator.go
line 1408 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit]
return base.MakeInPlaceValue(value), needDelete, closer, err
and_ base.LazyValue
Done
cb24a4f
to
2c1fd7e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 5 files at r1, 1 of 2 files at r3, all commit messages.
Reviewable status: 5 of 6 files reviewed, 11 unresolved discussions (waiting on @itsbilal and @jbowens)
internal/compact/run.go
line 218 at r3 (raw file):
continue } v, _, err := value.Value(nil)
We used to call LazyValue.Value(nil)
to initialize compact.Iter.iterValue
for the current iterator position. That made sense since it would never be caller owned. And then any instability was handled by reusing valueBuf
.
Now we have two kinds of LazyValue
s here: those that have not been cloned, and those that have been. For the latter, the valueBlockFetcher
may be closed, in which case it will need to allocate. So I think we need another scratch buffer in this Runner
to avoid those allocations.
internal/compact/iterator_test.go
line 204 at r3 (raw file):
d.Fatalf(t, "unknown iter command: %s", parts[0]) } value := lv.InPlaceValue()
why is the test assuming in-place instead of using the more general LazyValue.Value
?
internal/compact/iterator.go
line 560 at r3 (raw file):
return &i.key, i.value } else if i.err != nil { return nil, base.LazyValue{}
nit: consider repeating the comment from above // SINGLEDELs are value-less
.
internal/compact/iterator.go
line 586 at r3 (raw file):
origSnapshotIdx := i.curSnapshotIdx var valueMerger base.ValueMerger // MERGE values are always stored in-place.
Can you add an invariants.Enabled assertion.
internal/compact/iterator.go
line 668 at r3 (raw file):
func (i *Iter) iterNext() bool { i.iterKV = i.iter.Next() if i.err != nil {
It seems that we should no longer need to look at i.err
given we are not changing it, and it will be nil when this function is called.
internal/compact/iterator.go
line 893 at r3 (raw file):
// continue looping. // // MERGE values are always stored in-place.
some comment about adding an assertion
internal/compact/iterator.go
line 919 at r3 (raw file):
// Save the current key. i.saveKey() i.value = i.iterKV.V
This i.value = ...
seems dubious. We are going to immediately call i.nextInStripe()
, and we haven't cloned the LazyValue
, so this value is no longer backed by anything.
I think it worked because i.iterValue
was always the empty slice in this case. And now with the LazyValue
being in-place, the LazyFetcher
will be nil, so it will work. But is in principle incorrect. I think we should explicitly assert that this is in-place, and the empty slice, and then set i.value = LazyValue{}
.
internal/compact/iterator.go
line 1114 at r3 (raw file):
// In this case, we still peek forward in case there's another DELSIZED key // with a lower sequence number, in which case we'll adopt its value. // If the DELSIZED does have a value, it must be in-place.
assertion please
internal/compact/iterator.go
line 1160 at r3 (raw file):
// If the tombstone has a value, it must be in-place. To save it, we // can just copy the in-place value directly. i.valueBuf = append(i.valueBuf[:0], i.iterKV.InPlaceValue()...)
assertion please
internal/compact/iterator.go
line 1286 at r3 (raw file):
// result into i.valueBuf). func (i *Iter) saveValue() { v, callerOwned, err := i.iterKV.V.Value(i.valueBuf[:0])
I don't think we should be fetching. And should be doing i.value, i.valueBuf = i.iterKV.V.Clone(i.valueBuf[:0], &i.scratchFetcher)
internal/compact/iterator.go
line 192 at r3 (raw file):
valueBuf []byte iterKV *base.InternalKV iterValue []byte
I'm glad iterValue
is no more. It seemed to be redundant with iterKV
but if iterKV
was nil we wouldn't set it to nil, which seemed error prone.
2c1fd7e
to
d416e2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 6 files reviewed, 10 unresolved discussions (waiting on @itsbilal and @sumeerbhola)
internal/compact/iterator.go
line 560 at r3 (raw file):
Previously, sumeerbhola wrote…
nit: consider repeating the comment from above
// SINGLEDELs are value-less
.
Done.
internal/compact/iterator.go
line 586 at r3 (raw file):
Previously, sumeerbhola wrote…
Can you add an invariants.Enabled assertion.
InPlaceValue()
already asserts.
internal/compact/iterator.go
line 668 at r3 (raw file):
Previously, sumeerbhola wrote…
It seems that we should no longer need to look at
i.err
given we are not changing it, and it will be nil when this function is called.
You're right, although I'm confused as to why we're not checking i.iter.Error()
here?
internal/compact/iterator.go
line 893 at r3 (raw file):
Previously, sumeerbhola wrote…
some comment about adding an assertion
InPlaceValue()
already asserts.
internal/compact/iterator.go
line 919 at r3 (raw file):
Previously, sumeerbhola wrote…
This
i.value = ...
seems dubious. We are going to immediately calli.nextInStripe()
, and we haven't cloned theLazyValue
, so this value is no longer backed by anything.
I think it worked becausei.iterValue
was always the empty slice in this case. And now with theLazyValue
being in-place, theLazyFetcher
will be nil, so it will work. But is in principle incorrect. I think we should explicitly assert that this is in-place, and the empty slice, and then seti.value = LazyValue{}
.
Agreed.
internal/compact/iterator.go
line 1114 at r3 (raw file):
Previously, sumeerbhola wrote…
assertion please
InPlaceValue()
already asserts.
internal/compact/iterator.go
line 1160 at r3 (raw file):
Previously, sumeerbhola wrote…
assertion please
InPlaceValue()
already asserts.
internal/compact/iterator.go
line 1286 at r3 (raw file):
Previously, sumeerbhola wrote…
I don't think we should be fetching. And should be doing
i.value, i.valueBuf = i.iterKV.V.Clone(i.valueBuf[:0], &i.scratchFetcher)
I had that initially but decided I did not want introduce the possibility of needing to load the value from an external block reader in this PR. Fetching the value keeps the behavior closer to what was here originally, constraining the surface area of this PR.
internal/compact/run.go
line 218 at r3 (raw file):
Previously, sumeerbhola wrote…
We used to call
LazyValue.Value(nil)
to initializecompact.Iter.iterValue
for the current iterator position. That made sense since it would never be caller owned. And then any instability was handled by reusingvalueBuf
.Now we have two kinds of
LazyValue
s here: those that have not been cloned, and those that have been. For the latter, thevalueBlockFetcher
may be closed, in which case it will need to allocate. So I think we need another scratch buffer in thisRunner
to avoid those allocations.
Since we're not currently Clone
-ing, this value is necessarily in-place. My preference is to defer any use of Clone until a subsequent PR. I'm unsure it's ever worthwhile for compactions to Clone value block references. I don't think there's ever an instance where we call saveValue
and then don't end up reading the value, because we only need to iterate forward like this to determine SET
vs SETWITHDEL
internal/compact/iterator_test.go
line 204 at r3 (raw file):
Previously, sumeerbhola wrote…
why is the test assuming in-place instead of using the more general
LazyValue.Value
?
This particular unit test just does not support anything other than in-place values. It's trivial to use Value
. Done, so there's one less thing to edit if we extend the test to construct non in-place values.
d416e2f
to
6e17efb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 6 files reviewed, 10 unresolved discussions (waiting on @itsbilal and @sumeerbhola)
internal/compact/run.go
line 218 at r3 (raw file):
Since we're not currently
Clone
-ing, this value is necessarily in-place.
I mean, this value is necessarily in-place or for the current iterator position.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r3, 1 of 2 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @itsbilal and @jbowens)
internal/compact/iterator.go
line 668 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
You're right, although I'm confused as to why we're not checking
i.iter.Error()
here?
It's probably because what is returned in i.iter.Error()
will also be returned in i.iter.Close()
.
internal/compact/iterator.go
line 1286 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I had that initially but decided I did not want introduce the possibility of needing to load the value from an external block reader in this PR. Fetching the value keeps the behavior closer to what was here originally, constraining the surface area of this PR.
I'm slightly worried we will forget it. Could you add a TODO, and also include in that scope that run.go should use a scratch buffer?
internal/compact/run.go
line 218 at r3 (raw file):
I don't think there's ever an instance where we call
saveValue
and then don't end up reading the value, because we only need to iterate forward like this to determineSET
vsSETWITHDEL
But we will in the future, yes? When a compaction is not rewriting its references to blob files.
During compaction iteration, defer the retrieval of values stored in value blocks until either: a) we need to copy the value before stepping the iterator, or b) the compaction iterator yields the value to the main compaction loop. This refactor ensures that when a KV is elided by a tombstone, we avoid unnecessarily loading its value from the external value block. Additionally, refactoring the compaction iterator interfaces to propagate LazyValues will be used by value separation (cockroachdb#112) in which we will sometimes propagate external value references to output sstables without ever retrieving the value. Close cockroachdb#4197. Informs cockroachdb#112.
6e17efb
to
114a269
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @itsbilal and @sumeerbhola)
internal/compact/iterator.go
line 1286 at r3 (raw file):
Previously, sumeerbhola wrote…
I'm slightly worried we will forget it. Could you add a TODO, and also include in that scope that run.go should use a scratch buffer?
Added a TODO; I think the run.go Value
call will go away altogether
internal/compact/run.go
line 218 at r3 (raw file):
Previously, sumeerbhola wrote…
I don't think there's ever an instance where we call
saveValue
and then don't end up reading the value, because we only need to iterate forward like this to determineSET
vsSETWITHDEL
But we will in the future, yes? When a compaction is not rewriting its references to blob files.
Yeah, and then I think we'll remove this Value
call altogether
TFTRs! |
During compaction iteration, defer the retrieval of values stored in value blocks until either:
a) we need to copy the value before stepping the iterator, or b) the compaction iterator yields the value to the main compaction loop.
This refactor ensures that when a KV is elided by a tombstone, we avoid unnecessarily loading its value from the external value block. Additionally, refactoring the compaction iterator interfaces to propagate LazyValues will be used by value separation (#112) in which we will sometimes propagate external value references to output sstables without ever retrieving the value.
Close #4197.
Informs #112.