
gzip: Use sync.pool #1080

Open

Jeremyyang920 wants to merge 2 commits into master
Conversation

@Jeremyyang920 Jeremyyang920 commented Nov 26, 2024

This commit uses a sync.Pool for both the gzip writer and reader so that we reduce the number of allocations and the time GC takes. Previously, every mutation that needed to be gzipped would call NewWriter and allocate a new object, which spent a lot of time and created unneeded extra objects on the heap, driving up GC time.
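The diff itself isn't reproduced in this thread, but the writer-pooling approach described above can be sketched roughly as follows (names like gzipWriterPool and compress are illustrative, not taken from the PR):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"sync"
)

// gzipWriterPool reuses gzip.Writer objects instead of allocating a
// fresh writer (and its sizable internal state) on every mutation.
var gzipWriterPool = sync.Pool{
	New: func() any { return gzip.NewWriter(nil) },
}

// compress gzips data using a pooled writer: Get a writer, Reset it
// against the destination buffer, use it, and return it to the pool.
func compress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	gw := gzipWriterPool.Get().(*gzip.Writer)
	defer gzipWriterPool.Put(gw)
	gw.Reset(&buf)
	if _, err := gw.Write(data); err != nil {
		return nil, err
	}
	if err := gw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	out, err := compress([]byte("hello hello hello"))
	fmt.Println(len(out) > 0, err == nil)
}
```

The key point is that gzip.Writer.Reset makes the expensive writer state reusable, so only the first Get per pooled object pays the allocation cost.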



@Jeremyyang920 (Author):
[image: pprof flame graph for reference, over a 15s profile. The majority of time is spent in malloc and GC due to all the object allocations.]

r, err := gzip.NewReader(bytes.NewReader(data))

gzReader := gzipReaderPool.Get().(*gzip.Reader)
defer gzipReaderPool.Put(gzReader)


If Reset() failed below, are we sure we want to put this back in the pool?

Author (@Jeremyyang920):

Hmm, that's a good question. I'm looking at the Reset code, and I think it just calls bufio.NewReader(r) to reset it. So the Put call is really just putting the base gzip.Reader{} object back, and Reset internally resets z.r, which is the actual reader.

Does that check out with your understanding of Reset as well?

@stevendanna commented Dec 12, 2024:

I've never looked at the inside of Reset before today :D.

Looking at this code, I'd be a little concerned that this could result in us keeping data alive for longer than is necessary because the gzipReader will still have a reference to it.

Given that your profile only shows the writer being a problem, I wonder if it would be better to start by just pooling the writers and not the readers.

But don't feel a need to block on my feedback here.

Author (@Jeremyyang920):

It's a good call out. The reader doesn't show up in this profile since I think the other things dominated it.

We're doing a lot of profile work on this now, so I can certainly test that as well. The new heap profiles after this pool change don't seem to point to a big issue of data living for longer than necessary. But it could also be that GC is running very aggressively due to other poorly managed allocations.

Author (@Jeremyyang920):

[image: CPU profile after the pooling changes]
For reference, here is a new CPU profile taken after the pooling changes. I'm highlighting where the reader shows up, which is much smaller than the writer. But there are certainly more places to look for bad allocations, or things remaining on the heap that still lead to that giant GC block.

@ryanluu12345 (Contributor):
Unrelated to the change itself, but I think @noelcrl mentioned this in standup that the container images for v20.1 and v20.2 are no longer present. You can just rebase after this goes in:
#1081

@ryanluu12345 (Contributor) left a comment:

General implementation looks good to me, pending Steven's question and light testing to see the profile changes before/after the change.

internal/staging/stage/gzip.go (review threads resolved)
@cockroachdb cockroachdb deleted a comment from ryanluu12345 Nov 26, 2024