gzip: Use sync.Pool #1080
Conversation
```diff
-	r, err := gzip.NewReader(bytes.NewReader(data))
+	gzReader := gzipReaderPool.Get().(*gzip.Reader)
+	defer gzipReaderPool.Put(gzReader)
```
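For context, here is a minimal, self-contained sketch of how this pooled read path fits together. The `gzipReaderPool` and `gzReader` names come from the diff; the `gunzip` helper and everything else are assumptions, not the PR's actual code:

```go
package gzippool

import (
	"bytes"
	"compress/gzip"
	"io"
	"sync"
)

// gzipReaderPool reuses gzip.Reader objects across decompressions.
// A zero-value gzip.Reader works here because Reset (re)initializes
// it before first use.
var gzipReaderPool = sync.Pool{
	New: func() interface{} { return new(gzip.Reader) },
}

// gunzip is a hypothetical helper showing the Get/Reset/Put pattern.
func gunzip(data []byte) ([]byte, error) {
	gzReader := gzipReaderPool.Get().(*gzip.Reader)
	if err := gzReader.Reset(bytes.NewReader(data)); err != nil {
		// If Reset failed (e.g. a corrupt gzip header), one option is
		// to skip the Put entirely; see the question raised below.
		return nil, err
	}
	defer gzipReaderPool.Put(gzReader)
	out, err := io.ReadAll(gzReader)
	if cerr := gzReader.Close(); err == nil {
		err = cerr
	}
	return out, err
}
```

Note that this sketch deliberately skips the Put when Reset fails, which is exactly the question raised next.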
If Reset() failed below, are we sure we want to put this back in the pool?
Hmm, that's a good question. I'm looking at the Reset code, and I think it just calls bufio.NewReader(r) to reset it back. So the Put call is really just putting the base gzip.Reader{} object back, and Reset internally resets z.r, which is the actual reader.
Does that check out with your understanding of Reset as well?
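For reference, here's a small runnable sketch of that reuse pattern (the helper and payloads are made up, just to keep it self-contained): a single gzip.Reader is Reset against two different sources, so only the underlying reader changes between iterations while the rest of the object is reused.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"log"
)

// mustGzip compresses s; it exists only to produce test input.
func mustGzip(s string) []byte {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write([]byte(s)); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
	return buf.Bytes()
}

func main() {
	// One gzip.Reader, Reset across two inputs: Reset swaps the
	// underlying source and reuses the existing flate state.
	r := new(gzip.Reader)
	for _, payload := range []string{"first", "second"} {
		if err := r.Reset(bytes.NewReader(mustGzip(payload))); err != nil {
			log.Fatal(err)
		}
		out, err := io.ReadAll(r)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(string(out)) // "first", then "second"
	}
}
```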
I've never looked at the inside of Reset before today :D.
Looking at this code, I'd be a little concerned that this could result in us keeping data alive for longer than necessary, because the gzipReader will still hold a reference to it.
Given that your profile only shows the writer being a problem, I wonder if it would be better to start by pooling just the writers and not the readers; a sketch of what I mean follows.
But don't feel a need to block on my feedback here.
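A minimal sketch of that writer-only approach, assuming nothing from the PR itself (the pool, helper, and variable names here are all hypothetical):

```go
package gzippool

import (
	"bytes"
	"compress/gzip"
	"io"
	"sync"
)

// gzipWriterPool reuses gzip.Writer objects. io.Discard is just a
// placeholder destination until Reset points the writer somewhere real.
var gzipWriterPool = sync.Pool{
	New: func() interface{} { return gzip.NewWriter(io.Discard) },
}

// compress is a hypothetical helper showing the writer-only pattern.
func compress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	gzWriter := gzipWriterPool.Get().(*gzip.Writer)
	gzWriter.Reset(&buf)
	_, werr := gzWriter.Write(data)
	cerr := gzWriter.Close()
	// Point the writer back at io.Discard before pooling so the
	// pooled object doesn't keep buf alive; this sidesteps the
	// retention concern raised for the reader above.
	gzWriter.Reset(io.Discard)
	gzipWriterPool.Put(gzWriter)
	if werr != nil {
		return nil, werr
	}
	if cerr != nil {
		return nil, cerr
	}
	return buf.Bytes(), nil
}
```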
It's a good callout. The reader doesn't show up in this profile since, I think, the other things dominated it.
We're doing a lot of profiling work on this now, so I can certainly test that as well. The new heap profiles after this pool change don't seem to point to a big issue of data living longer than necessary. But it could also be that GC is running very aggressively due to other poorly managed allocations.
For reference, here is a new CPU profile taken after the pooling changes. I'm highlighting where the reader shows up, which is much smaller than the writer. But there are certainly more places to look for bad allocations, or for things staying on the heap, that still lead to that giant GC block.
The general implementation looks good to me, pending Steven's question and some light testing to compare the profiles before and after the change.
This commit uses a sync.Pool for both the gzip writer and reader to reduce the number of allocations and the time GC takes. Previously, every mutation that needed to be gzipped called newWriter and allocated a new object, which spent a lot of time and created extra, unneeded objects on the heap, driving up GC time.