perf(tm2/pkg/amino): reduce RAM heavy-handedness by *bytes.Buffer pooled reuse #3489
base: master
Kindly cc-ing @veorq
PR Checks Summary: All automated checks passed. ✅
Codecov Report: All modified and coverable lines are covered by tests. ✅
Please use a specialised library; elsewhere in the codebase, we already use https://github.com/libp2p/go-buffer-pool.
I suggest you use its Buffer type, which is indicated for these use cases.
Hey @thehowl, I would not advise using libp2p/go-buffer-pool's Buffer over the standard library's bytes.Buffer. Here are results from their own benchmarks, to which I added equivalent benchmarks for the Go standard library's bytes.Buffer (orijtech/go-buffer-pool@8ea7d61):
$ benchstat libp2p.txt stdlib.txt
name old time/op new time/op delta
WriteByte-8 16.8µs ± 4% 16.7µs ± 9% ~ (p=0.400 n=9+10)
BufferFullSmallReads-8 61.1µs ± 6% 46.4µs ± 3% -24.06% (p=0.000 n=10+10)
BufferNotEmptyWriteRead-8 417µs ± 3% 182µs ± 5% -56.28% (p=0.000 n=10+10)
name old speed new speed delta
WriteByte-8 244MB/s ± 3% 245MB/s ± 8% ~ (p=0.400 n=9+10)
name old alloc/op new alloc/op delta
WriteByte-8 0.00B 0.00B ~ (all equal)
BufferFullSmallReads-8 0.00B 0.00B ~ (all equal)
BufferNotEmptyWriteRead-8 0.90B ±122% 0.00B -100.00% (p=0.011 n=10+10)
name old allocs/op new allocs/op delta
WriteByte-8 0.00 0.00 ~ (all equal)
BufferFullSmallReads-8 0.00 0.00 ~ (all equal)
BufferNotEmptyWriteRead-8 0.00 0.00 ~ (all equal)
And that's just for the Buffer. The standard library's sync.Pool is also battle-tested across millions of repositories and is continually being improved, while libp2p's pool doesn't get much attention. Some years ago I found a glaring bug there that had gone undetected due to inadequate testing (libp2p/go-buffer-pool#26), and even that PR took some time to push through.
The p2p implementation uses sync.Pool under the hood as well. My issue is as follows: using a bytes.Buffer with a sync.Pool directly is not useful, because the buffer you get out may have an underlying array of any size, and it is not guaranteed to be reused. Some buffers may become very large and end up wasting memory. Here's another suggestion: https://github.com/valyala/bytebufferpool. It seems to implement the kind of protection I was worried about :)
Thanks for the response @thehowl!
Before a buffer is put back into the pool, we invoke buf.Reset(), which internally empties the underlying byte slice with b.buf[:0] so that it can be reused. Its capacity does not have to be re-created, which is where the savings come from. I believe that alleviates your concern. It is directly useful, and sync.Pool is the incumbent pool in many high-performance projects too.
I have more faith in valyala's implementation, thank you! Sadly though, it doesn't implement all of *bytes.Buffer's methods, such as Truncate(..), and some callees such as json.Indent, which is used inside amino in more than 3 places, take a *bytes.Buffer directly.
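The json.Indent constraint can be seen in a small sketch. `indentPool` and `indentJSON` are hypothetical helpers, not amino's code; the point is only that the standard library's json.Indent takes a *bytes.Buffer as its destination, so a pool of a different buffer type could not be passed to it without an adapter or a copy.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// indentPool is a hypothetical pool of standard library buffers.
var indentPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// indentJSON is an illustrative helper that indents src using a pooled buffer.
func indentJSON(src []byte) (string, error) {
	buf := indentPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()
		indentPool.Put(buf)
	}()

	// json.Indent's destination parameter is a *bytes.Buffer, not an io.Writer.
	if err := json.Indent(buf, src, "", "  "); err != nil {
		return "", err
	}
	// String copies the contents, so the result stays valid after Reset.
	return buf.String(), nil
}

func main() {
	s, err := indentJSON([]byte(`{"a":1}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```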
c507270 to 96aee1d
Yes, but if I'm encoding a value which takes 16MB to store (which would happen quite exceptionally, anyway), I want the allocated buffer to be freed rather than being put back into the pool. So either we hard-code a limit beyond which we don't put buffers back into the pool, or we use something like bytebufferpool (with the additional methods), which automatically calibrates to an ideal buffer size.
@thehowl thanks for your discourse!
What you are asking for is only performed by p2p/go-buffer-pool.Buffer.Reset() which sets [...]
I don't think it is a good idea to build a complex heuristic that checks for memory limits and then frees buffers, unless you are writing a garbage collector from scratch. Go's garbage collector handles this complex logic when alleviating pressure, and I don't think we should be trying to re-invent the wheel, especially with the massive memory bloat that gnovm already has.
0fd0b68 to 7f95d1a
…led reuse

This change comes from an analysis of a bunch of RAM and CPU profiles and noticing that realm storage needs to invoke amino.MustMarshalAny but that in the profile for TestStdlibs, it was consuming 1.28GB.

```shell
ROUTINE ======================== github.com/gnolang/gno/tm2/pkg/amino.MustMarshalAny in /Users/emmanuelodeke/go/src/github.com/gnolang/gno/tm2/pkg/amino/amino.go
         0     1.28GB (flat, cum)  0.61% of Total
         .          .     80:func MustMarshalAny(o interface{}) []byte {
         .     1.28GB     81:    return gcdc.MustMarshalAny(o)
         .          .     82:}
         .          .     83:
         .          .     84:func MarshalAnySized(o interface{}) ([]byte, error) {
         .          .     85:    return gcdc.MarshalAnySized(o)
         .          .     86:}
```

and

```shell
focus=MarshalAny
Showing nodes accounting for 1303.02MB, 0.6% of 217023.96MB total
Dropped 13 nodes (cum <= 1085.12MB)
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                          539.49MB   100% |   bytes.(*Buffer).grow
  539.49MB  0.25%  0.25%   539.49MB  0.25%                | bytes.growSlice
----------------------------------------------------------+-------------
                                          706.50MB   100% |   bytes.(*Buffer).Write
  167.01MB 0.077%  0.33%   706.50MB  0.33%                | bytes.(*Buffer).grow
                                          539.49MB 76.36% |   bytes.growSlice
----------------------------------------------------------+-------------
                                              93MB 58.68% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryInterface (inline)
                                           56.50MB 35.65% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryStruct (inline)
                                               9MB  5.68% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryList (inline)
  158.51MB 0.073%   0.4%   158.51MB 0.073%                | bytes.NewBuffer
----------------------------------------------------------+-------------
                                          145.01MB 57.77% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).writeFieldIfNotEmpty
                                              86MB 34.26% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryInterface
                                              20MB  7.97% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryList
   85.50MB 0.039%  0.44%   251.01MB  0.12%                | github.com/gnolang/gno/tm2/pkg/amino.encodeFieldNumberAndTyp3
                                          165.51MB 65.94% |   bytes.(*Buffer).Write
----------------------------------------------------------+-------------
                                           77.01MB   100% |   github.com/gnolang/gno/tm2/pkg/amino.EncodeByteSlice
   61.50MB 0.028%  0.47%    77.01MB 0.035%                | github.com/gnolang/gno/tm2/pkg/amino.EncodeUvarint
                                           15.51MB 20.14% |   bytes.(*Buffer).Write
----------------------------------------------------------+-------------
```

but after this change, we see more than 560MB shaved off

```shell
ROUTINE ======================== github.com/gnolang/gno/tm2/pkg/amino.MustMarshalAny in /Users/emmanuelodeke/go/src/github.com/gnolang/gno/tm2/pkg/amino/amino.go
         0   560.95MB (flat, cum)  0.26% of Total
         .          .     80:func MustMarshalAny(o interface{}) []byte {
         .   560.95MB     81:    return gcdc.MustMarshalAny(o)
         .          .     82:}
         .          .     83:
         .          .     84:func MarshalAnySized(o interface{}) ([]byte, error) {
         .          .     85:    return gcdc.MarshalAnySized(o)
         .          .     86:}
```

and

```shell
----------------------------------------------------------+-------------
                                           16.35MB 52.46% |   github.com/gnolang/gno/tm2/pkg/amino.EncodeByteSlice
                                           14.81MB 47.54% |   github.com/gnolang/gno/tm2/pkg/amino.writeMaybeBare
         0     0%  0.26%    31.16MB 0.014%                | bytes.(*Buffer).Write
                                           31.16MB   100% |   bytes.(*Buffer).grow
----------------------------------------------------------+-------------
                                           31.16MB   100% |   bytes.(*Buffer).Write
         0     0%  0.26%    31.16MB 0.014%                | bytes.(*Buffer).grow
                                           31.16MB   100% |   bytes.growSlice
----------------------------------------------------------+-------------
```

and even more after the change on ensuring that tm2/pkg/amino benchmarks could run we have quite good improvements! Running out of RAM is much worse than a couple of microseconds so we can tolerate an increase in some CPU time benchmarks.

```shell
name                                    old time/op    new time/op    delta
Binary/EmptyStruct:encode-8             3.86µs ± 5%    3.92µs ± 5%      ~      (p=0.548 n=5+5)
Binary/EmptyStruct:decode-8             3.79µs ± 5%    3.79µs ± 6%      ~      (p=0.690 n=5+5)
Binary/PrimitivesStruct:encode-8        35.5µs ± 2%    36.5µs ± 5%      ~      (p=0.151 n=5+5)
Binary/PrimitivesStruct:decode-8        35.0µs ± 2%    38.6µs ±11%   +10.17%   (p=0.016 n=5+5)
Binary/ShortArraysStruct:encode-8       5.91µs ± 6%    6.36µs ± 8%    +7.61%   (p=0.032 n=5+5)
Binary/ShortArraysStruct:decode-8       6.07µs ±21%    6.39µs ± 8%      ~      (p=0.151 n=5+5)
Binary/ArraysStruct:encode-8            95.1µs ± 8%   100.6µs ± 7%      ~      (p=0.222 n=5+5)
Binary/ArraysStruct:decode-8            91.3µs ± 5%    98.5µs ±12%      ~      (p=0.222 n=5+5)
Binary/ArraysArraysStruct:encode-8       131µs ± 3%     132µs ± 6%      ~      (p=0.841 n=5+5)
Binary/ArraysArraysStruct:decode-8       136µs ± 9%     134µs ± 3%      ~      (p=0.548 n=5+5)
Binary/SlicesStruct:encode-8            85.4µs ± 1%    92.3µs ± 9%    +8.15%   (p=0.008 n=5+5)
Binary/SlicesStruct:decode-8            87.1µs ± 8%    94.8µs ± 7%      ~      (p=0.056 n=5+5)
Binary/SlicesSlicesStruct:encode-8       506µs ± 2%     545µs ± 9%      ~      (p=0.151 n=5+5)
Binary/SlicesSlicesStruct:decode-8       506µs ± 3%     523µs ± 3%      ~      (p=0.095 n=5+5)
Binary/PointersStruct:encode-8          56.8µs ± 4%    65.5µs ±20%   +15.43%   (p=0.016 n=5+5)
Binary/PointersStruct:decode-8          57.5µs ± 3%    55.9µs ± 3%      ~      (p=0.095 n=5+5)
Binary/PointerSlicesStruct:encode-8      162µs ± 4%     172µs ±21%      ~      (p=0.841 n=5+5)
Binary/PointerSlicesStruct:decode-8      163µs ± 5%     185µs ±13%      ~      (p=0.095 n=5+5)
Binary/ComplexSt:encode-8                314µs ± 3%     354µs ±11%   +12.90%   (p=0.008 n=5+5)
Binary/ComplexSt:decode-8                319µs ± 2%     338µs ± 4%    +5.87%   (p=0.008 n=5+5)
Binary/EmbeddedSt1:encode-8             39.8µs ± 7%    39.3µs ± 8%      ~      (p=1.000 n=5+5)
Binary/EmbeddedSt1:decode-8             37.0µs ± 4%    37.8µs ± 6%      ~      (p=0.690 n=5+5)
Binary/EmbeddedSt2:encode-8              316µs ± 7%     307µs ± 3%      ~      (p=0.222 n=5+5)
Binary/EmbeddedSt2:decode-8              316µs ± 3%     306µs ± 2%      ~      (p=0.095 n=5+5)
Binary/EmbeddedSt3:encode-8              217µs ± 7%     201µs ± 1%    -7.26%   (p=0.008 n=5+5)
Binary/EmbeddedSt3:decode-8              222µs ±10%     204µs ± 2%    -8.50%   (p=0.032 n=5+5)
Binary/EmbeddedSt4:encode-8              332µs ± 4%     325µs ± 3%      ~      (p=0.421 n=5+5)
Binary/EmbeddedSt4:decode-8              332µs ± 4%     324µs ± 5%      ~      (p=0.095 n=5+5)
Binary/EmbeddedSt5:encode-8              218µs ± 2%     212µs ± 3%      ~      (p=0.056 n=5+5)
Binary/EmbeddedSt5:decode-8              224µs ± 8%     209µs ± 1%    -6.85%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:encode-8   9.03µs ± 6%    8.97µs ±12%      ~      (p=0.841 n=5+5)
Binary/AminoMarshalerStruct1:decode-8   8.91µs ± 5%    8.81µs ± 4%      ~      (p=0.841 n=5+5)
Binary/AminoMarshalerStruct2:encode-8   13.2µs ±10%    12.2µs ± 2%    -7.26%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:decode-8   13.2µs ± 6%    12.5µs ± 5%      ~      (p=0.095 n=5+5)
Binary/AminoMarshalerStruct3:encode-8   7.17µs ± 3%    7.50µs ± 8%      ~      (p=0.548 n=5+5)
Binary/AminoMarshalerStruct3:decode-8   7.12µs ± 4%    7.84µs ±10%   +10.12%   (p=0.016 n=5+5)
Binary/AminoMarshalerInt4:encode-8      6.60µs ± 5%    6.96µs ±11%      ~      (p=0.421 n=5+5)
Binary/AminoMarshalerInt4:decode-8      6.79µs ±12%    7.04µs ±15%      ~      (p=0.690 n=5+5)
Binary/AminoMarshalerInt5:encode-8      6.64µs ± 4%    6.92µs ± 5%    +4.09%   (p=0.032 n=5+5)
Binary/AminoMarshalerInt5:decode-8      6.55µs ± 3%    7.76µs ±10%   +18.44%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:encode-8   11.7µs ± 5%    13.2µs ±10%   +13.09%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:decode-8   11.4µs ± 3%    11.6µs ± 2%      ~      (p=0.222 n=5+5)
Binary/AminoMarshalerStruct7:encode-8   9.86µs ± 1%   10.10µs ±19%      ~      (p=0.310 n=5+5)
Binary/AminoMarshalerStruct7:decode-8   9.55µs ± 3%    9.75µs ±10%      ~      (p=0.690 n=5+5)

name                                    old alloc/op   new alloc/op   delta
Binary/EmptyStruct:encode-8             1.50kB ± 0%    1.41kB ± 0%    -6.32%   (p=0.008 n=5+5)
Binary/EmptyStruct:decode-8             1.50kB ± 0%    1.41kB ± 0%    -6.32%   (p=0.008 n=5+5)
Binary/PrimitivesStruct:encode-8        10.4kB ± 0%     9.6kB ± 0%    -7.82%   (p=0.008 n=5+5)
Binary/PrimitivesStruct:decode-8        10.4kB ± 0%     9.6kB ± 0%    -7.82%   (p=0.000 n=4+5)
Binary/ShortArraysStruct:encode-8       2.11kB ± 0%    1.92kB ± 0%    -9.04%   (p=0.008 n=5+5)
Binary/ShortArraysStruct:decode-8       2.11kB ± 0%    1.92kB ± 0%    -9.04%   (p=0.008 n=5+5)
Binary/ArraysStruct:encode-8            25.9kB ± 0%    22.0kB ± 0%   -15.04%   (p=0.008 n=5+5)
Binary/ArraysStruct:decode-8            25.9kB ± 0%    22.0kB ± 0%   -15.04%   (p=0.008 n=5+5)
Binary/ArraysArraysStruct:encode-8      37.7kB ± 0%    25.3kB ± 0%   -33.07%   (p=0.008 n=5+5)
Binary/ArraysArraysStruct:decode-8      37.7kB ± 0%    25.3kB ± 0%   -33.07%   (p=0.008 n=5+5)
Binary/SlicesStruct:encode-8            28.2kB ± 0%    25.1kB ± 0%   -10.96%   (p=0.008 n=5+5)
Binary/SlicesStruct:decode-8            28.2kB ± 0%    25.1kB ± 0%   -10.97%   (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:encode-8       183kB ± 0%     147kB ± 0%   -19.92%   (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:decode-8       183kB ± 0%     147kB ± 0%   -19.92%   (p=0.008 n=5+5)
Binary/PointersStruct:encode-8          14.4kB ± 0%    13.6kB ± 0%    -5.64%   (p=0.008 n=5+5)
Binary/PointersStruct:decode-8          14.4kB ± 0%    13.6kB ± 0%    -5.64%   (p=0.008 n=5+5)
Binary/PointerSlicesStruct:encode-8     43.9kB ± 0%    40.2kB ± 0%    -8.49%   (p=0.008 n=5+5)
Binary/PointerSlicesStruct:decode-8     43.9kB ± 0%    40.2kB ± 0%    -8.49%   (p=0.008 n=5+5)
Binary/ComplexSt:encode-8               95.3kB ± 0%    78.2kB ± 0%   -17.97%   (p=0.008 n=5+5)
Binary/ComplexSt:decode-8               95.3kB ± 0%    78.2kB ± 0%   -17.97%   (p=0.008 n=5+5)
Binary/EmbeddedSt1:encode-8             11.3kB ± 0%    10.2kB ± 0%    -9.62%   (p=0.000 n=5+4)
Binary/EmbeddedSt1:decode-8             11.3kB ± 0%    10.2kB ± 0%    -9.61%   (p=0.000 n=5+4)
Binary/EmbeddedSt2:encode-8             95.5kB ± 0%    78.3kB ± 0%   -17.96%   (p=0.008 n=5+5)
Binary/EmbeddedSt2:decode-8             95.5kB ± 0%    78.4kB ± 0%   -17.94%   (p=0.008 n=5+5)
Binary/EmbeddedSt3:encode-8             68.3kB ± 0%    56.6kB ± 0%   -17.22%   (p=0.008 n=5+5)
Binary/EmbeddedSt3:decode-8             68.3kB ± 0%    56.6kB ± 0%   -17.21%   (p=0.008 n=5+5)
Binary/EmbeddedSt4:encode-8             97.2kB ± 0%    82.3kB ± 0%   -15.32%   (p=0.008 n=5+5)
Binary/EmbeddedSt4:decode-8             97.2kB ± 0%    82.3kB ± 0%   -15.31%   (p=0.008 n=5+5)
Binary/EmbeddedSt5:encode-8             65.9kB ± 0%    55.3kB ± 0%   -16.19%   (p=0.008 n=5+5)
Binary/EmbeddedSt5:decode-8             66.0kB ± 0%    55.3kB ± 0%   -16.18%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:encode-8   2.87kB ± 0%    2.66kB ± 0%    -7.23%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:decode-8   2.87kB ± 0%    2.66kB ± 0%    -7.23%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:encode-8   4.58kB ± 0%    3.62kB ± 0%   -20.95%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:decode-8   4.58kB ± 0%    3.62kB ± 0%   -20.95%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:encode-8   2.42kB ± 0%    2.31kB ± 0%    -4.62%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:decode-8   2.42kB ± 0%    2.31kB ± 0%    -4.62%   (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:encode-8      2.38kB ± 0%    2.15kB ± 0%    -9.38%   (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:decode-8      2.38kB ± 0%    2.15kB ± 0%    -9.38%   (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:encode-8      2.36kB ± 0%    2.27kB ± 0%    -4.07%   (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:decode-8      2.36kB ± 0%    2.27kB ± 0%    -4.07%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:encode-8   3.51kB ± 0%    3.19kB ± 0%    -9.05%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:decode-8   3.51kB ± 0%    3.19kB ± 0%    -9.05%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:encode-8   2.89kB ± 0%    2.67kB ± 0%    -7.72%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:decode-8   2.89kB ± 0%    2.67kB ± 0%    -7.72%   (p=0.008 n=5+5)

name                                    old allocs/op  new allocs/op  delta
Binary/EmptyStruct:encode-8               38.0 ± 0%      36.0 ± 0%    -5.26%   (p=0.008 n=5+5)
Binary/EmptyStruct:decode-8               38.0 ± 0%      36.0 ± 0%    -5.26%   (p=0.008 n=5+5)
Binary/PrimitivesStruct:encode-8           439 ± 0%       429 ± 0%    -2.28%   (p=0.008 n=5+5)
Binary/PrimitivesStruct:decode-8           439 ± 0%       429 ± 0%    -2.28%   (p=0.008 n=5+5)
Binary/ShortArraysStruct:encode-8         56.0 ± 0%      52.0 ± 0%    -7.14%   (p=0.008 n=5+5)
Binary/ShortArraysStruct:decode-8         56.0 ± 0%      52.0 ± 0%    -7.14%   (p=0.008 n=5+5)
Binary/ArraysStruct:encode-8               977 ± 0%       919 ± 0%    -5.94%   (p=0.008 n=5+5)
Binary/ArraysStruct:decode-8               977 ± 0%       919 ± 0%    -5.94%   (p=0.008 n=5+5)
Binary/ArraysArraysStruct:encode-8       1.28k ± 0%     1.08k ± 0%   -15.05%   (p=0.008 n=5+5)
Binary/ArraysArraysStruct:decode-8       1.28k ± 0%     1.08k ± 0%   -15.05%   (p=0.008 n=5+5)
Binary/SlicesStruct:encode-8             1.01k ± 0%     0.97k ± 0%    -3.77%   (p=0.008 n=5+5)
Binary/SlicesStruct:decode-8             1.01k ± 0%     0.97k ± 0%    -3.77%   (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:encode-8       6.33k ± 0%     5.95k ± 0%    -5.90%   (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:decode-8       6.33k ± 0%     5.95k ± 0%    -5.90%   (p=0.008 n=5+5)
Binary/PointersStruct:encode-8             637 ± 0%       627 ± 0%    -1.57%   (p=0.008 n=5+5)
Binary/PointersStruct:decode-8             637 ± 0%       627 ± 0%    -1.57%   (p=0.008 n=5+5)
Binary/PointerSlicesStruct:encode-8      1.62k ± 0%     1.56k ± 0%    -3.28%   (p=0.008 n=5+5)
Binary/PointerSlicesStruct:decode-8      1.62k ± 0%     1.56k ± 0%    -3.28%   (p=0.008 n=5+5)
Binary/ComplexSt:encode-8                3.37k ± 0%     3.22k ± 0%    -4.62%   (p=0.008 n=5+5)
Binary/ComplexSt:decode-8                3.37k ± 0%     3.22k ± 0%    -4.62%   (p=0.008 n=5+5)
Binary/EmbeddedSt1:encode-8                453 ± 0%       440 ± 0%    -2.87%   (p=0.008 n=5+5)
Binary/EmbeddedSt1:decode-8                453 ± 0%       440 ± 0%    -2.87%   (p=0.008 n=5+5)
Binary/EmbeddedSt2:encode-8              3.37k ± 0%     3.22k ± 0%    -4.62%   (p=0.008 n=5+5)
Binary/EmbeddedSt2:decode-8              3.37k ± 0%     3.22k ± 0%    -4.62%   (p=0.008 n=5+5)
Binary/EmbeddedSt3:encode-8              2.32k ± 0%     2.20k ± 0%    -5.38%   (p=0.008 n=5+5)
Binary/EmbeddedSt3:decode-8              2.32k ± 0%     2.20k ± 0%    -5.38%   (p=0.008 n=5+5)
Binary/EmbeddedSt4:encode-8              3.67k ± 0%     3.54k ± 0%    -3.73%   (p=0.008 n=5+5)
Binary/EmbeddedSt4:decode-8              3.67k ± 0%     3.54k ± 0%    -3.73%   (p=0.008 n=5+5)
Binary/EmbeddedSt5:encode-8              2.32k ± 0%     2.20k ± 0%    -5.00%   (p=0.008 n=5+5)
Binary/EmbeddedSt5:decode-8              2.32k ± 0%     2.20k ± 0%    -5.00%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:encode-8     97.0 ± 0%      94.0 ± 0%    -3.09%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:decode-8     97.0 ± 0%      94.0 ± 0%    -3.09%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:encode-8      149 ± 0%       133 ± 0%   -10.74%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:decode-8      149 ± 0%       133 ± 0%   -10.74%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:encode-8     77.0 ± 0%      76.0 ± 0%    -1.30%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:decode-8     77.0 ± 0%      76.0 ± 0%    -1.30%   (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:encode-8        71.0 ± 0%      68.0 ± 0%    -4.23%   (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:decode-8        71.0 ± 0%      68.0 ± 0%    -4.23%   (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:encode-8        74.0 ± 0%      73.0 ± 0%    -1.35%   (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:decode-8        74.0 ± 0%      73.0 ± 0%    -1.35%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:encode-8      122 ± 0%       117 ± 0%    -4.10%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:decode-8      122 ± 0%       117 ± 0%    -4.10%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:encode-8      101 ± 0%        98 ± 0%    -2.97%   (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:decode-8      101 ± 0%        98 ± 0%    -2.97%   (p=0.008 n=5+5)
```

Fixes gnolang#3488
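The direction of the allocs/op deltas in the commit message can be reproduced in miniature with testing.AllocsPerRun. This is a toy comparison under stated assumptions, not amino's benchmark suite: `encodeFresh`, `encodePooled`, `measure`, and the 1KiB payload are hypothetical, and the absolute numbers will not match the tables above.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"testing"
)

var (
	pool    = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}
	payload = bytes.Repeat([]byte("x"), 1024)
)

// encodeFresh allocates a new backing array on every call.
func encodeFresh() int {
	buf := new(bytes.Buffer)
	buf.Write(payload)
	return buf.Len()
}

// encodePooled reuses a pooled buffer, so steady-state calls need not allocate.
func encodePooled() int {
	buf := pool.Get().(*bytes.Buffer)
	buf.Write(payload)
	n := buf.Len()
	buf.Reset()
	pool.Put(buf)
	return n
}

// measure returns the average allocations per call for both variants.
func measure() (fresh, pooled float64) {
	fresh = testing.AllocsPerRun(1000, func() { encodeFresh() })
	pooled = testing.AllocsPerRun(1000, func() { encodePooled() })
	return fresh, pooled
}

func main() {
	fresh, pooled := measure()
	fmt.Printf("fresh: %.0f allocs/op, pooled: %.0f allocs/op\n", fresh, pooled)
}
```

sync.Pool may drop its contents during garbage collection, so the pooled figure is a typical value rather than a guarantee.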
7f95d1a to 8e64450
Kind ping @thehowl and team: please take a look at my response. Landing this PR gives me a lot more room to further improve performance and reduce allocations.
There is prior literature on what I'm talking about, by reputable people who know the runtime and the pool implementation: golang/go#23199. I think bytebufferpool is a reasonable implementation which adjusts to the average size of the buffers; otherwise putting a limit on the maximum