Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(tm2/pkg/amino): reduce RAM heavy-handedness by *bytes.Buffer pooled reuse #3489

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

odeke-em
Copy link
Contributor

This change comes from an analysis of a bunch of RAM and CPU profiles and noticing that realm storage needs to invoke amino.MustMarshalAny but that in the profile for TestStdlibs, it was consuming 1.28GB.

ROUTINE ======================== github.com/gnolang/gno/tm2/pkg/amino.MustMarshalAny in /Users/emmanuelodeke/go/src/github.com/gnolang/gno/tm2/pkg/amino/amino.go
         0     1.28GB (flat, cum)  0.61% of Total
         .          .     80:func MustMarshalAny(o interface{}) []byte {
         .     1.28GB     81:	return gcdc.MustMarshalAny(o)
         .          .     82:}
         .          .     83:
         .          .     84:func MarshalAnySized(o interface{}) ([]byte, error) {
         .          .     85:	return gcdc.MarshalAnySized(o)
         .          .     86:}

and

   focus=MarshalAny
Showing nodes accounting for 1303.02MB, 0.6% of 217023.96MB total
Dropped 13 nodes (cum <= 1085.12MB)
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                          539.49MB   100% |   bytes.(*Buffer).grow
  539.49MB  0.25%  0.25%   539.49MB  0.25%                | bytes.growSlice
----------------------------------------------------------+-------------
                                          706.50MB   100% |   bytes.(*Buffer).Write
  167.01MB 0.077%  0.33%   706.50MB  0.33%                | bytes.(*Buffer).grow
                                          539.49MB 76.36% |   bytes.growSlice
----------------------------------------------------------+-------------
                                              93MB 58.68% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryInterface (inline)
                                           56.50MB 35.65% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryStruct (inline)
                                               9MB  5.68% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryList (inline)
  158.51MB 0.073%   0.4%   158.51MB 0.073%                | bytes.NewBuffer
----------------------------------------------------------+-------------
                                          145.01MB 57.77% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).writeFieldIfNotEmpty
                                              86MB 34.26% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryInterface
                                              20MB  7.97% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryList
   85.50MB 0.039%  0.44%   251.01MB  0.12%                | github.com/gnolang/gno/tm2/pkg/amino.encodeFieldNumberAndTyp3
                                          165.51MB 65.94% |   bytes.(*Buffer).Write
----------------------------------------------------------+-------------
                                           77.01MB   100% |   github.com/gnolang/gno/tm2/pkg/amino.EncodeByteSlice
   61.50MB 0.028%  0.47%    77.01MB 0.035%                | github.com/gnolang/gno/tm2/pkg/amino.EncodeUvarint
                                           15.51MB 20.14% |   bytes.(*Buffer).Write
----------------------------------------------------------+-------------

but after this change, we see more than 560MB shaved off

ROUTINE ======================== github.com/gnolang/gno/tm2/pkg/amino.MustMarshalAny in /Users/emmanuelodeke/go/src/github.com/gnolang/gno/tm2/pkg/amino/amino.go
         0   560.95MB (flat, cum)  0.26% of Total
         .          .     80:func MustMarshalAny(o interface{}) []byte {
         .   560.95MB     81:	return gcdc.MustMarshalAny(o)
         .          .     82:}
         .          .     83:
         .          .     84:func MarshalAnySized(o interface{}) ([]byte, error) {
         .          .     85:	return gcdc.MarshalAnySized(o)
         .          .     86:}

and

----------------------------------------------------------+-------------
                                           16.35MB 52.46% |   github.com/gnolang/gno/tm2/pkg/amino.EncodeByteSlice
                                           14.81MB 47.54% |   github.com/gnolang/gno/tm2/pkg/amino.writeMaybeBare
         0     0%  0.26%    31.16MB 0.014%                | bytes.(*Buffer).Write
                                           31.16MB   100% |   bytes.(*Buffer).grow
----------------------------------------------------------+-------------
                                           31.16MB   100% |   bytes.(*Buffer).Write
         0     0%  0.26%    31.16MB 0.014%                | bytes.(*Buffer).grow
                                           31.16MB   100% |   bytes.growSlice
----------------------------------------------------------+-------------

and even more after the change on ensuring that tm2/pkg/amino benchmarks could run we have quite good improvements! Running out of RAM is much worse than a couple of microseconds so we can tolerate an increase in some CPU time benchmarks.

name                                   old time/op    new time/op    delta
Binary/EmptyStruct:encode-8              3.86µs ± 5%    3.92µs ± 5%     ~     (p=0.548 n=5+5)
Binary/EmptyStruct:decode-8              3.79µs ± 5%    3.79µs ± 6%     ~     (p=0.690 n=5+5)
Binary/PrimitivesStruct:encode-8         35.5µs ± 2%    36.5µs ± 5%     ~     (p=0.151 n=5+5)
Binary/PrimitivesStruct:decode-8         35.0µs ± 2%    38.6µs ±11%  +10.17%  (p=0.016 n=5+5)
Binary/ShortArraysStruct:encode-8        5.91µs ± 6%    6.36µs ± 8%   +7.61%  (p=0.032 n=5+5)
Binary/ShortArraysStruct:decode-8        6.07µs ±21%    6.39µs ± 8%     ~     (p=0.151 n=5+5)
Binary/ArraysStruct:encode-8             95.1µs ± 8%   100.6µs ± 7%     ~     (p=0.222 n=5+5)
Binary/ArraysStruct:decode-8             91.3µs ± 5%    98.5µs ±12%     ~     (p=0.222 n=5+5)
Binary/ArraysArraysStruct:encode-8        131µs ± 3%     132µs ± 6%     ~     (p=0.841 n=5+5)
Binary/ArraysArraysStruct:decode-8        136µs ± 9%     134µs ± 3%     ~     (p=0.548 n=5+5)
Binary/SlicesStruct:encode-8             85.4µs ± 1%    92.3µs ± 9%   +8.15%  (p=0.008 n=5+5)
Binary/SlicesStruct:decode-8             87.1µs ± 8%    94.8µs ± 7%     ~     (p=0.056 n=5+5)
Binary/SlicesSlicesStruct:encode-8        506µs ± 2%     545µs ± 9%     ~     (p=0.151 n=5+5)
Binary/SlicesSlicesStruct:decode-8        506µs ± 3%     523µs ± 3%     ~     (p=0.095 n=5+5)
Binary/PointersStruct:encode-8           56.8µs ± 4%    65.5µs ±20%  +15.43%  (p=0.016 n=5+5)
Binary/PointersStruct:decode-8           57.5µs ± 3%    55.9µs ± 3%     ~     (p=0.095 n=5+5)
Binary/PointerSlicesStruct:encode-8       162µs ± 4%     172µs ±21%     ~     (p=0.841 n=5+5)
Binary/PointerSlicesStruct:decode-8       163µs ± 5%     185µs ±13%     ~     (p=0.095 n=5+5)
Binary/ComplexSt:encode-8                 314µs ± 3%     354µs ±11%  +12.90%  (p=0.008 n=5+5)
Binary/ComplexSt:decode-8                 319µs ± 2%     338µs ± 4%   +5.87%  (p=0.008 n=5+5)
Binary/EmbeddedSt1:encode-8              39.8µs ± 7%    39.3µs ± 8%     ~     (p=1.000 n=5+5)
Binary/EmbeddedSt1:decode-8              37.0µs ± 4%    37.8µs ± 6%     ~     (p=0.690 n=5+5)
Binary/EmbeddedSt2:encode-8               316µs ± 7%     307µs ± 3%     ~     (p=0.222 n=5+5)
Binary/EmbeddedSt2:decode-8               316µs ± 3%     306µs ± 2%     ~     (p=0.095 n=5+5)
Binary/EmbeddedSt3:encode-8               217µs ± 7%     201µs ± 1%   -7.26%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:decode-8               222µs ±10%     204µs ± 2%   -8.50%  (p=0.032 n=5+5)
Binary/EmbeddedSt4:encode-8               332µs ± 4%     325µs ± 3%     ~     (p=0.421 n=5+5)
Binary/EmbeddedSt4:decode-8               332µs ± 4%     324µs ± 5%     ~     (p=0.095 n=5+5)
Binary/EmbeddedSt5:encode-8               218µs ± 2%     212µs ± 3%     ~     (p=0.056 n=5+5)
Binary/EmbeddedSt5:decode-8               224µs ± 8%     209µs ± 1%   -6.85%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:encode-8    9.03µs ± 6%    8.97µs ±12%     ~     (p=0.841 n=5+5)
Binary/AminoMarshalerStruct1:decode-8    8.91µs ± 5%    8.81µs ± 4%     ~     (p=0.841 n=5+5)
Binary/AminoMarshalerStruct2:encode-8    13.2µs ±10%    12.2µs ± 2%   -7.26%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:decode-8    13.2µs ± 6%    12.5µs ± 5%     ~     (p=0.095 n=5+5)
Binary/AminoMarshalerStruct3:encode-8    7.17µs ± 3%    7.50µs ± 8%     ~     (p=0.548 n=5+5)
Binary/AminoMarshalerStruct3:decode-8    7.12µs ± 4%    7.84µs ±10%  +10.12%  (p=0.016 n=5+5)
Binary/AminoMarshalerInt4:encode-8       6.60µs ± 5%    6.96µs ±11%     ~     (p=0.421 n=5+5)
Binary/AminoMarshalerInt4:decode-8       6.79µs ±12%    7.04µs ±15%     ~     (p=0.690 n=5+5)
Binary/AminoMarshalerInt5:encode-8       6.64µs ± 4%    6.92µs ± 5%   +4.09%  (p=0.032 n=5+5)
Binary/AminoMarshalerInt5:decode-8       6.55µs ± 3%    7.76µs ±10%  +18.44%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:encode-8    11.7µs ± 5%    13.2µs ±10%  +13.09%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:decode-8    11.4µs ± 3%    11.6µs ± 2%     ~     (p=0.222 n=5+5)
Binary/AminoMarshalerStruct7:encode-8    9.86µs ± 1%   10.10µs ±19%     ~     (p=0.310 n=5+5)
Binary/AminoMarshalerStruct7:decode-8    9.55µs ± 3%    9.75µs ±10%     ~     (p=0.690 n=5+5)

name                                   old alloc/op   new alloc/op   delta
Binary/EmptyStruct:encode-8              1.50kB ± 0%    1.41kB ± 0%   -6.32%  (p=0.008 n=5+5)
Binary/EmptyStruct:decode-8              1.50kB ± 0%    1.41kB ± 0%   -6.32%  (p=0.008 n=5+5)
Binary/PrimitivesStruct:encode-8         10.4kB ± 0%     9.6kB ± 0%   -7.82%  (p=0.008 n=5+5)
Binary/PrimitivesStruct:decode-8         10.4kB ± 0%     9.6kB ± 0%   -7.82%  (p=0.000 n=4+5)
Binary/ShortArraysStruct:encode-8        2.11kB ± 0%    1.92kB ± 0%   -9.04%  (p=0.008 n=5+5)
Binary/ShortArraysStruct:decode-8        2.11kB ± 0%    1.92kB ± 0%   -9.04%  (p=0.008 n=5+5)
Binary/ArraysStruct:encode-8             25.9kB ± 0%    22.0kB ± 0%  -15.04%  (p=0.008 n=5+5)
Binary/ArraysStruct:decode-8             25.9kB ± 0%    22.0kB ± 0%  -15.04%  (p=0.008 n=5+5)
Binary/ArraysArraysStruct:encode-8       37.7kB ± 0%    25.3kB ± 0%  -33.07%  (p=0.008 n=5+5)
Binary/ArraysArraysStruct:decode-8       37.7kB ± 0%    25.3kB ± 0%  -33.07%  (p=0.008 n=5+5)
Binary/SlicesStruct:encode-8             28.2kB ± 0%    25.1kB ± 0%  -10.96%  (p=0.008 n=5+5)
Binary/SlicesStruct:decode-8             28.2kB ± 0%    25.1kB ± 0%  -10.97%  (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:encode-8        183kB ± 0%     147kB ± 0%  -19.92%  (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:decode-8        183kB ± 0%     147kB ± 0%  -19.92%  (p=0.008 n=5+5)
Binary/PointersStruct:encode-8           14.4kB ± 0%    13.6kB ± 0%   -5.64%  (p=0.008 n=5+5)
Binary/PointersStruct:decode-8           14.4kB ± 0%    13.6kB ± 0%   -5.64%  (p=0.008 n=5+5)
Binary/PointerSlicesStruct:encode-8      43.9kB ± 0%    40.2kB ± 0%   -8.49%  (p=0.008 n=5+5)
Binary/PointerSlicesStruct:decode-8      43.9kB ± 0%    40.2kB ± 0%   -8.49%  (p=0.008 n=5+5)
Binary/ComplexSt:encode-8                95.3kB ± 0%    78.2kB ± 0%  -17.97%  (p=0.008 n=5+5)
Binary/ComplexSt:decode-8                95.3kB ± 0%    78.2kB ± 0%  -17.97%  (p=0.008 n=5+5)
Binary/EmbeddedSt1:encode-8              11.3kB ± 0%    10.2kB ± 0%   -9.62%  (p=0.000 n=5+4)
Binary/EmbeddedSt1:decode-8              11.3kB ± 0%    10.2kB ± 0%   -9.61%  (p=0.000 n=5+4)
Binary/EmbeddedSt2:encode-8              95.5kB ± 0%    78.3kB ± 0%  -17.96%  (p=0.008 n=5+5)
Binary/EmbeddedSt2:decode-8              95.5kB ± 0%    78.4kB ± 0%  -17.94%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:encode-8              68.3kB ± 0%    56.6kB ± 0%  -17.22%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:decode-8              68.3kB ± 0%    56.6kB ± 0%  -17.21%  (p=0.008 n=5+5)
Binary/EmbeddedSt4:encode-8              97.2kB ± 0%    82.3kB ± 0%  -15.32%  (p=0.008 n=5+5)
Binary/EmbeddedSt4:decode-8              97.2kB ± 0%    82.3kB ± 0%  -15.31%  (p=0.008 n=5+5)
Binary/EmbeddedSt5:encode-8              65.9kB ± 0%    55.3kB ± 0%  -16.19%  (p=0.008 n=5+5)
Binary/EmbeddedSt5:decode-8              66.0kB ± 0%    55.3kB ± 0%  -16.18%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:encode-8    2.87kB ± 0%    2.66kB ± 0%   -7.23%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:decode-8    2.87kB ± 0%    2.66kB ± 0%   -7.23%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:encode-8    4.58kB ± 0%    3.62kB ± 0%  -20.95%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:decode-8    4.58kB ± 0%    3.62kB ± 0%  -20.95%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:encode-8    2.42kB ± 0%    2.31kB ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:decode-8    2.42kB ± 0%    2.31kB ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:encode-8       2.38kB ± 0%    2.15kB ± 0%   -9.38%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:decode-8       2.38kB ± 0%    2.15kB ± 0%   -9.38%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:encode-8       2.36kB ± 0%    2.27kB ± 0%   -4.07%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:decode-8       2.36kB ± 0%    2.27kB ± 0%   -4.07%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:encode-8    3.51kB ± 0%    3.19kB ± 0%   -9.05%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:decode-8    3.51kB ± 0%    3.19kB ± 0%   -9.05%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:encode-8    2.89kB ± 0%    2.67kB ± 0%   -7.72%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:decode-8    2.89kB ± 0%    2.67kB ± 0%   -7.72%  (p=0.008 n=5+5)

name                                   old allocs/op  new allocs/op  delta
Binary/EmptyStruct:encode-8                38.0 ± 0%      36.0 ± 0%   -5.26%  (p=0.008 n=5+5)
Binary/EmptyStruct:decode-8                38.0 ± 0%      36.0 ± 0%   -5.26%  (p=0.008 n=5+5)
Binary/PrimitivesStruct:encode-8            439 ± 0%       429 ± 0%   -2.28%  (p=0.008 n=5+5)
Binary/PrimitivesStruct:decode-8            439 ± 0%       429 ± 0%   -2.28%  (p=0.008 n=5+5)
Binary/ShortArraysStruct:encode-8          56.0 ± 0%      52.0 ± 0%   -7.14%  (p=0.008 n=5+5)
Binary/ShortArraysStruct:decode-8          56.0 ± 0%      52.0 ± 0%   -7.14%  (p=0.008 n=5+5)
Binary/ArraysStruct:encode-8                977 ± 0%       919 ± 0%   -5.94%  (p=0.008 n=5+5)
Binary/ArraysStruct:decode-8                977 ± 0%       919 ± 0%   -5.94%  (p=0.008 n=5+5)
Binary/ArraysArraysStruct:encode-8        1.28k ± 0%     1.08k ± 0%  -15.05%  (p=0.008 n=5+5)
Binary/ArraysArraysStruct:decode-8        1.28k ± 0%     1.08k ± 0%  -15.05%  (p=0.008 n=5+5)
Binary/SlicesStruct:encode-8              1.01k ± 0%     0.97k ± 0%   -3.77%  (p=0.008 n=5+5)
Binary/SlicesStruct:decode-8              1.01k ± 0%     0.97k ± 0%   -3.77%  (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:encode-8        6.33k ± 0%     5.95k ± 0%   -5.90%  (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:decode-8        6.33k ± 0%     5.95k ± 0%   -5.90%  (p=0.008 n=5+5)
Binary/PointersStruct:encode-8              637 ± 0%       627 ± 0%   -1.57%  (p=0.008 n=5+5)
Binary/PointersStruct:decode-8              637 ± 0%       627 ± 0%   -1.57%  (p=0.008 n=5+5)
Binary/PointerSlicesStruct:encode-8       1.62k ± 0%     1.56k ± 0%   -3.28%  (p=0.008 n=5+5)
Binary/PointerSlicesStruct:decode-8       1.62k ± 0%     1.56k ± 0%   -3.28%  (p=0.008 n=5+5)
Binary/ComplexSt:encode-8                 3.37k ± 0%     3.22k ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/ComplexSt:decode-8                 3.37k ± 0%     3.22k ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/EmbeddedSt1:encode-8                 453 ± 0%       440 ± 0%   -2.87%  (p=0.008 n=5+5)
Binary/EmbeddedSt1:decode-8                 453 ± 0%       440 ± 0%   -2.87%  (p=0.008 n=5+5)
Binary/EmbeddedSt2:encode-8               3.37k ± 0%     3.22k ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/EmbeddedSt2:decode-8               3.37k ± 0%     3.22k ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:encode-8               2.32k ± 0%     2.20k ± 0%   -5.38%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:decode-8               2.32k ± 0%     2.20k ± 0%   -5.38%  (p=0.008 n=5+5)
Binary/EmbeddedSt4:encode-8               3.67k ± 0%     3.54k ± 0%   -3.73%  (p=0.008 n=5+5)
Binary/EmbeddedSt4:decode-8               3.67k ± 0%     3.54k ± 0%   -3.73%  (p=0.008 n=5+5)
Binary/EmbeddedSt5:encode-8               2.32k ± 0%     2.20k ± 0%   -5.00%  (p=0.008 n=5+5)
Binary/EmbeddedSt5:decode-8               2.32k ± 0%     2.20k ± 0%   -5.00%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:encode-8      97.0 ± 0%      94.0 ± 0%   -3.09%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:decode-8      97.0 ± 0%      94.0 ± 0%   -3.09%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:encode-8       149 ± 0%       133 ± 0%  -10.74%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:decode-8       149 ± 0%       133 ± 0%  -10.74%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:encode-8      77.0 ± 0%      76.0 ± 0%   -1.30%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:decode-8      77.0 ± 0%      76.0 ± 0%   -1.30%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:encode-8         71.0 ± 0%      68.0 ± 0%   -4.23%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:decode-8         71.0 ± 0%      68.0 ± 0%   -4.23%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:encode-8         74.0 ± 0%      73.0 ± 0%   -1.35%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:decode-8         74.0 ± 0%      73.0 ± 0%   -1.35%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:encode-8       122 ± 0%       117 ± 0%   -4.10%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:decode-8       122 ± 0%       117 ± 0%   -4.10%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:encode-8       101 ± 0%        98 ± 0%   -2.97%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:decode-8       101 ± 0%        98 ± 0%   -2.97%  (p=0.008 n=5+5)

Fixes #3488

@github-actions github-actions bot added the 📦 🌐 tendermint v2 Issues or PRs tm2 related label Jan 12, 2025
@odeke-em
Copy link
Contributor Author

Kindly cc-ing @veorq

@Gno2D2
Copy link
Collaborator

Gno2D2 commented Jan 12, 2025

🛠 PR Checks Summary

All Automated Checks passed. ✅

Manual Checks (for Reviewers):
  • IGNORE the bot requirements for this PR (force green CI check)
  • The pull request description provides enough details (checked by @thehowl)
Read More

🤖 This bot helps streamline PR reviews by verifying automated checks and providing guidance for contributors and reviewers.

✅ Automated Checks (for Contributors):

🟢 Maintainers must be able to edit this pull request (more info)

☑️ Contributor Actions:
  1. Fix any issues flagged by automated checks.
  2. Follow the Contributor Checklist to ensure your PR is ready for review.
    • Add new tests, or document why they are unnecessary.
    • Provide clear examples/screenshots, if necessary.
    • Update documentation, if required.
    • Ensure no breaking changes, or include BREAKING CHANGE notes.
    • Link related issues/PRs, where applicable.
☑️ Reviewer Actions:
  1. Complete manual checks for the PR, including the guidelines and additional checks if applicable.
📚 Resources:
Debug
Automated Checks
Maintainers must be able to edit this pull request (more info)

If

🟢 Condition met
└── 🟢 The pull request was created from a fork (head branch repo: odeke-em/gno)

Then

🟢 Requirement satisfied
└── 🟢 Maintainer can modify this pull request

Manual Checks
**IGNORE** the bot requirements for this PR (force green CI check)

If

🟢 Condition met
└── 🟢 On every pull request

Can be checked by

  • Any user with comment edit permission
The pull request description provides enough details

If

🟢 Condition met
└── 🟢 And
    ├── 🟢 Not (🔴 Pull request author is a member of the team: core-contributors)
    └── 🟢 Not (🔴 Pull request author is user: dependabot[bot])

Can be checked by

  • team core-contributors

Copy link

codecov bot commented Jan 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

📢 Thoughts on this report? Let us know!

@notJoon notJoon added the review/triage-pending PRs opened by external contributors that are waiting for the 1st review label Jan 13, 2025
@Kouteki Kouteki added the in focus Core team is prioritizing this work label Jan 13, 2025
Copy link
Member

@thehowl thehowl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a specialised library; elsewhere in the codebase, we already use https://github.com/libp2p/go-buffer-pool .

I suggest you use its Buffer type which is indicated for these use cases.

@odeke-em
Copy link
Contributor Author

odeke-em commented Jan 13, 2025

Hey @thehowl I would not advise using libp2p-go-pool's Buffer over the standard library’s bytes.Buffer and here are some benchmark results from their own benchmarks that I added the Go standard library bytes.Buffer benchmarks orijtech/go-buffer-pool@8ea7d61

$ benchstat libp2p.txt stdlib.txt
name                       old time/op    new time/op    delta
WriteByte-8                  16.8µs ± 4%    16.7µs ± 9%      ~     (p=0.400 n=9+10)
BufferFullSmallReads-8       61.1µs ± 6%    46.4µs ± 3%   -24.06%  (p=0.000 n=10+10)
BufferNotEmptyWriteRead-8     417µs ± 3%     182µs ± 5%   -56.28%  (p=0.000 n=10+10)

name                       old speed      new speed      delta
WriteByte-8                 244MB/s ± 3%   245MB/s ± 8%      ~     (p=0.400 n=9+10)

name                       old alloc/op   new alloc/op   delta
WriteByte-8                   0.00B          0.00B           ~     (all equal)
BufferFullSmallReads-8        0.00B          0.00B           ~     (all equal)
BufferNotEmptyWriteRead-8    0.90B ±122%     0.00B       -100.00%  (p=0.011 n=10+10)

name                       old allocs/op  new allocs/op  delta
WriteByte-8                    0.00           0.00           ~     (all equal)
BufferFullSmallReads-8         0.00           0.00           ~     (all equal)
BufferNotEmptyWriteRead-8      0.00           0.00           ~     (all equal)

and that's just for the Buffer; also the standard library's sync.Pool is battle tested across millions of repositories and is continually being improved while libp2p doesn't get much attention and some years ago I found a bug which was glaring but hadn't been detected due to inadequate testing libp2p/go-buffer-pool#26 but even the PR took some time just to push through.

@odeke-em odeke-em requested a review from thehowl January 14, 2025 11:32
@thehowl
Copy link
Member

thehowl commented Jan 14, 2025

The p2p implementation uses sync.Pool under the hood as well. My issue is as follows: using a byte buffer with a sync.Pool directly is not useful, as the buffer you get out may have any size of the underlying buffer, which is not guaranteed to be reused. Some buffers may become very very large and end up wasting memory.

Here's another suggestion: https://github.com/valyala/bytebufferpool - it seems to implement the kind of protection I was worried about :)

@odeke-em
Copy link
Contributor Author

odeke-em commented Jan 14, 2025

Thanks for the response @thehowl!

My issue is as follows: using a byte buffer with a sync.Pool directly is not useful, as the buffer you get out may have any size of the underlying buffer, which is not guaranteed to be reused. Some buffers may become very very large and end up wasting memory.

Before being put back into the pool, we invoke buf.Reset() which then internally moves the underlying byteslice for reuse with b.buf[:0] by making it empty, but its capacity won't have to be re-created, which is where the savings come from. I believe that alleviates your concern. It is directly useful as the incumbent pool used in many high performance projects too.

Here's another suggestion: https://github.com/valyala/bytebufferpool - it seems to implement the kind of protection I was worried about :)

I have more faith in valyala's implementation, thank you! Sadly though, it doesn't fully implement all *bytes.Buffer's methods like Truncate(..) and also in some instances like json.Indent which is used inside amino in more than 3 places which takes a *bytes.Buffer directly.

@odeke-em odeke-em force-pushed the tm2-pkg-amino-slash-RAM-wastage-with-bytes.Buffer-reuse branch from c507270 to 96aee1d Compare January 14, 2025 15:06
@thehowl
Copy link
Member

thehowl commented Jan 14, 2025

Before being put back into the pool, we invoke buf.Reset() which then internally moves the underlying byteslice for reuse with b.buf[:0] by making it empty, but its capacity won't have to be re-created, which is where the savings come from. I believe that alleviates your concern. It is directly useful as the incumbent pool used in many high performance projects too.

yes, but if I'm encoding a value which takes 16MB to store (which would happen quite exceptionally, anyway) I want the allocated buffer to be freed rather than being put into the buffer. So, either we hard-code a limit beyond which we don't put things back into the buffer, or we use something like bytebufferpool (with the additional methods) which automatically calibrates to an ideal amount of buffer sizes

@thehowl thehowl removed the review/triage-pending PRs opened by external contributors that are waiting for the 1st review label Jan 15, 2025
@odeke-em
Copy link
Contributor Author

odeke-em commented Jan 18, 2025

@thehowl thanks for your discourse!

I want the allocated buffer to be freed rather than being put into the buffer.

What you are asking for is only performed by p2p/go-buffer-pool.Buffer.Reset() which sets b.buf = nil but for which I don't think is a good idea. Constantly garbage collecting buffers is what causes the very memory bloat and garbage collection pressure that we are trying to reduce. If the data was allocated and not used, the pool gets shrunk as needed to be, sync.Pool doesn't retain items forever if the only reference is in the pool, and it calls this out in the docs.
Reusing the underlying byteslice alleviates pressure that later needs can grow up to those limits without many allocations and usually commonly used patterns match: one common piece of wisdom in runtime building and garbage collection is that "new objects get freed more often and older objects get freed much less", same thing with usage patterns, the bigger the allocations, usually the more common they'll be.

I don't think it is a good idea to try to make a complex heuristic that checks for memory limits then frees them, unless you are writing a garbage collector from scratch. Go's garbage collector handles this complex logic when alleviating pressure and I don't think we should be trying to re-invent the wheel, especially with the massive memory bloat that gnovm already has.
In the 15 years of Go's lifetime, for which I've been a contributor for 11 years, I have never seen a report of someone reporting memory bloat due to sync.Pool.bytes.Buffer re-usage.

@odeke-em odeke-em force-pushed the tm2-pkg-amino-slash-RAM-wastage-with-bytes.Buffer-reuse branch 2 times, most recently from 0fd0b68 to 7f95d1a Compare January 20, 2025 11:38
…led reuse

This change comes from an analysis of a bunch of RAM and CPU profiles
and noticing that realm storage needs to invoke amino.MustMarshalAny
but that in the profile for TestStdlibs, it was consuming 1.28GB.
```shell
ROUTINE ======================== github.com/gnolang/gno/tm2/pkg/amino.MustMarshalAny in /Users/emmanuelodeke/go/src/github.com/gnolang/gno/tm2/pkg/amino/amino.go
         0     1.28GB (flat, cum)  0.61% of Total
         .          .     80:func MustMarshalAny(o interface{}) []byte {
         .     1.28GB     81:	return gcdc.MustMarshalAny(o)
         .          .     82:}
         .          .     83:
         .          .     84:func MarshalAnySized(o interface{}) ([]byte, error) {
         .          .     85:	return gcdc.MarshalAnySized(o)
         .          .     86:}
```
and
```shell
   focus=MarshalAny
Showing nodes accounting for 1303.02MB, 0.6% of 217023.96MB total
Dropped 13 nodes (cum <= 1085.12MB)
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                          539.49MB   100% |   bytes.(*Buffer).grow
  539.49MB  0.25%  0.25%   539.49MB  0.25%                | bytes.growSlice
----------------------------------------------------------+-------------
                                          706.50MB   100% |   bytes.(*Buffer).Write
  167.01MB 0.077%  0.33%   706.50MB  0.33%                | bytes.(*Buffer).grow
                                          539.49MB 76.36% |   bytes.growSlice
----------------------------------------------------------+-------------
                                              93MB 58.68% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryInterface (inline)
                                           56.50MB 35.65% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryStruct (inline)
                                               9MB  5.68% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryList (inline)
  158.51MB 0.073%   0.4%   158.51MB 0.073%                | bytes.NewBuffer
----------------------------------------------------------+-------------
                                          145.01MB 57.77% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).writeFieldIfNotEmpty
                                              86MB 34.26% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryInterface
                                              20MB  7.97% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryList
   85.50MB 0.039%  0.44%   251.01MB  0.12%                | github.com/gnolang/gno/tm2/pkg/amino.encodeFieldNumberAndTyp3
                                          165.51MB 65.94% |   bytes.(*Buffer).Write
----------------------------------------------------------+-------------
                                           77.01MB   100% |   github.com/gnolang/gno/tm2/pkg/amino.EncodeByteSlice
   61.50MB 0.028%  0.47%    77.01MB 0.035%                | github.com/gnolang/gno/tm2/pkg/amino.EncodeUvarint
                                           15.51MB 20.14% |   bytes.(*Buffer).Write
----------------------------------------------------------+-------------
```

but after this change, we see more than 560MB shaved off

```shell
ROUTINE ======================== github.com/gnolang/gno/tm2/pkg/amino.MustMarshalAny in /Users/emmanuelodeke/go/src/github.com/gnolang/gno/tm2/pkg/amino/amino.go
         0   560.95MB (flat, cum)  0.26% of Total
         .          .     80:func MustMarshalAny(o interface{}) []byte {
         .   560.95MB     81:	return gcdc.MustMarshalAny(o)
         .          .     82:}
         .          .     83:
         .          .     84:func MarshalAnySized(o interface{}) ([]byte, error) {
         .          .     85:	return gcdc.MarshalAnySized(o)
         .          .     86:}
```

and

```shell
----------------------------------------------------------+-------------
                                           16.35MB 52.46% |   github.com/gnolang/gno/tm2/pkg/amino.EncodeByteSlice
                                           14.81MB 47.54% |   github.com/gnolang/gno/tm2/pkg/amino.writeMaybeBare
         0     0%  0.26%    31.16MB 0.014%                | bytes.(*Buffer).Write
                                           31.16MB   100% |   bytes.(*Buffer).grow
----------------------------------------------------------+-------------
                                           31.16MB   100% |   bytes.(*Buffer).Write
         0     0%  0.26%    31.16MB 0.014%                | bytes.(*Buffer).grow
                                           31.16MB   100% |   bytes.growSlice
----------------------------------------------------------+-------------
```

and even more after the change on ensuring that tm2/pkg/amino benchmarks
could run we have quite good improvements! Running out of RAM is much
worse than a couple of microseconds so we can tolerate an increase in
some CPU time benchmarks.

```shell
name                                   old time/op    new time/op    delta
Binary/EmptyStruct:encode-8              3.86µs ± 5%    3.92µs ± 5%     ~     (p=0.548 n=5+5)
Binary/EmptyStruct:decode-8              3.79µs ± 5%    3.79µs ± 6%     ~     (p=0.690 n=5+5)
Binary/PrimitivesStruct:encode-8         35.5µs ± 2%    36.5µs ± 5%     ~     (p=0.151 n=5+5)
Binary/PrimitivesStruct:decode-8         35.0µs ± 2%    38.6µs ±11%  +10.17%  (p=0.016 n=5+5)
Binary/ShortArraysStruct:encode-8        5.91µs ± 6%    6.36µs ± 8%   +7.61%  (p=0.032 n=5+5)
Binary/ShortArraysStruct:decode-8        6.07µs ±21%    6.39µs ± 8%     ~     (p=0.151 n=5+5)
Binary/ArraysStruct:encode-8             95.1µs ± 8%   100.6µs ± 7%     ~     (p=0.222 n=5+5)
Binary/ArraysStruct:decode-8             91.3µs ± 5%    98.5µs ±12%     ~     (p=0.222 n=5+5)
Binary/ArraysArraysStruct:encode-8        131µs ± 3%     132µs ± 6%     ~     (p=0.841 n=5+5)
Binary/ArraysArraysStruct:decode-8        136µs ± 9%     134µs ± 3%     ~     (p=0.548 n=5+5)
Binary/SlicesStruct:encode-8             85.4µs ± 1%    92.3µs ± 9%   +8.15%  (p=0.008 n=5+5)
Binary/SlicesStruct:decode-8             87.1µs ± 8%    94.8µs ± 7%     ~     (p=0.056 n=5+5)
Binary/SlicesSlicesStruct:encode-8        506µs ± 2%     545µs ± 9%     ~     (p=0.151 n=5+5)
Binary/SlicesSlicesStruct:decode-8        506µs ± 3%     523µs ± 3%     ~     (p=0.095 n=5+5)
Binary/PointersStruct:encode-8           56.8µs ± 4%    65.5µs ±20%  +15.43%  (p=0.016 n=5+5)
Binary/PointersStruct:decode-8           57.5µs ± 3%    55.9µs ± 3%     ~     (p=0.095 n=5+5)
Binary/PointerSlicesStruct:encode-8       162µs ± 4%     172µs ±21%     ~     (p=0.841 n=5+5)
Binary/PointerSlicesStruct:decode-8       163µs ± 5%     185µs ±13%     ~     (p=0.095 n=5+5)
Binary/ComplexSt:encode-8                 314µs ± 3%     354µs ±11%  +12.90%  (p=0.008 n=5+5)
Binary/ComplexSt:decode-8                 319µs ± 2%     338µs ± 4%   +5.87%  (p=0.008 n=5+5)
Binary/EmbeddedSt1:encode-8              39.8µs ± 7%    39.3µs ± 8%     ~     (p=1.000 n=5+5)
Binary/EmbeddedSt1:decode-8              37.0µs ± 4%    37.8µs ± 6%     ~     (p=0.690 n=5+5)
Binary/EmbeddedSt2:encode-8               316µs ± 7%     307µs ± 3%     ~     (p=0.222 n=5+5)
Binary/EmbeddedSt2:decode-8               316µs ± 3%     306µs ± 2%     ~     (p=0.095 n=5+5)
Binary/EmbeddedSt3:encode-8               217µs ± 7%     201µs ± 1%   -7.26%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:decode-8               222µs ±10%     204µs ± 2%   -8.50%  (p=0.032 n=5+5)
Binary/EmbeddedSt4:encode-8               332µs ± 4%     325µs ± 3%     ~     (p=0.421 n=5+5)
Binary/EmbeddedSt4:decode-8               332µs ± 4%     324µs ± 5%     ~     (p=0.095 n=5+5)
Binary/EmbeddedSt5:encode-8               218µs ± 2%     212µs ± 3%     ~     (p=0.056 n=5+5)
Binary/EmbeddedSt5:decode-8               224µs ± 8%     209µs ± 1%   -6.85%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:encode-8    9.03µs ± 6%    8.97µs ±12%     ~     (p=0.841 n=5+5)
Binary/AminoMarshalerStruct1:decode-8    8.91µs ± 5%    8.81µs ± 4%     ~     (p=0.841 n=5+5)
Binary/AminoMarshalerStruct2:encode-8    13.2µs ±10%    12.2µs ± 2%   -7.26%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:decode-8    13.2µs ± 6%    12.5µs ± 5%     ~     (p=0.095 n=5+5)
Binary/AminoMarshalerStruct3:encode-8    7.17µs ± 3%    7.50µs ± 8%     ~     (p=0.548 n=5+5)
Binary/AminoMarshalerStruct3:decode-8    7.12µs ± 4%    7.84µs ±10%  +10.12%  (p=0.016 n=5+5)
Binary/AminoMarshalerInt4:encode-8       6.60µs ± 5%    6.96µs ±11%     ~     (p=0.421 n=5+5)
Binary/AminoMarshalerInt4:decode-8       6.79µs ±12%    7.04µs ±15%     ~     (p=0.690 n=5+5)
Binary/AminoMarshalerInt5:encode-8       6.64µs ± 4%    6.92µs ± 5%   +4.09%  (p=0.032 n=5+5)
Binary/AminoMarshalerInt5:decode-8       6.55µs ± 3%    7.76µs ±10%  +18.44%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:encode-8    11.7µs ± 5%    13.2µs ±10%  +13.09%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:decode-8    11.4µs ± 3%    11.6µs ± 2%     ~     (p=0.222 n=5+5)
Binary/AminoMarshalerStruct7:encode-8    9.86µs ± 1%   10.10µs ±19%     ~     (p=0.310 n=5+5)
Binary/AminoMarshalerStruct7:decode-8    9.55µs ± 3%    9.75µs ±10%     ~     (p=0.690 n=5+5)

name                                   old alloc/op   new alloc/op   delta
Binary/EmptyStruct:encode-8              1.50kB ± 0%    1.41kB ± 0%   -6.32%  (p=0.008 n=5+5)
Binary/EmptyStruct:decode-8              1.50kB ± 0%    1.41kB ± 0%   -6.32%  (p=0.008 n=5+5)
Binary/PrimitivesStruct:encode-8         10.4kB ± 0%     9.6kB ± 0%   -7.82%  (p=0.008 n=5+5)
Binary/PrimitivesStruct:decode-8         10.4kB ± 0%     9.6kB ± 0%   -7.82%  (p=0.000 n=4+5)
Binary/ShortArraysStruct:encode-8        2.11kB ± 0%    1.92kB ± 0%   -9.04%  (p=0.008 n=5+5)
Binary/ShortArraysStruct:decode-8        2.11kB ± 0%    1.92kB ± 0%   -9.04%  (p=0.008 n=5+5)
Binary/ArraysStruct:encode-8             25.9kB ± 0%    22.0kB ± 0%  -15.04%  (p=0.008 n=5+5)
Binary/ArraysStruct:decode-8             25.9kB ± 0%    22.0kB ± 0%  -15.04%  (p=0.008 n=5+5)
Binary/ArraysArraysStruct:encode-8       37.7kB ± 0%    25.3kB ± 0%  -33.07%  (p=0.008 n=5+5)
Binary/ArraysArraysStruct:decode-8       37.7kB ± 0%    25.3kB ± 0%  -33.07%  (p=0.008 n=5+5)
Binary/SlicesStruct:encode-8             28.2kB ± 0%    25.1kB ± 0%  -10.96%  (p=0.008 n=5+5)
Binary/SlicesStruct:decode-8             28.2kB ± 0%    25.1kB ± 0%  -10.97%  (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:encode-8        183kB ± 0%     147kB ± 0%  -19.92%  (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:decode-8        183kB ± 0%     147kB ± 0%  -19.92%  (p=0.008 n=5+5)
Binary/PointersStruct:encode-8           14.4kB ± 0%    13.6kB ± 0%   -5.64%  (p=0.008 n=5+5)
Binary/PointersStruct:decode-8           14.4kB ± 0%    13.6kB ± 0%   -5.64%  (p=0.008 n=5+5)
Binary/PointerSlicesStruct:encode-8      43.9kB ± 0%    40.2kB ± 0%   -8.49%  (p=0.008 n=5+5)
Binary/PointerSlicesStruct:decode-8      43.9kB ± 0%    40.2kB ± 0%   -8.49%  (p=0.008 n=5+5)
Binary/ComplexSt:encode-8                95.3kB ± 0%    78.2kB ± 0%  -17.97%  (p=0.008 n=5+5)
Binary/ComplexSt:decode-8                95.3kB ± 0%    78.2kB ± 0%  -17.97%  (p=0.008 n=5+5)
Binary/EmbeddedSt1:encode-8              11.3kB ± 0%    10.2kB ± 0%   -9.62%  (p=0.000 n=5+4)
Binary/EmbeddedSt1:decode-8              11.3kB ± 0%    10.2kB ± 0%   -9.61%  (p=0.000 n=5+4)
Binary/EmbeddedSt2:encode-8              95.5kB ± 0%    78.3kB ± 0%  -17.96%  (p=0.008 n=5+5)
Binary/EmbeddedSt2:decode-8              95.5kB ± 0%    78.4kB ± 0%  -17.94%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:encode-8              68.3kB ± 0%    56.6kB ± 0%  -17.22%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:decode-8              68.3kB ± 0%    56.6kB ± 0%  -17.21%  (p=0.008 n=5+5)
Binary/EmbeddedSt4:encode-8              97.2kB ± 0%    82.3kB ± 0%  -15.32%  (p=0.008 n=5+5)
Binary/EmbeddedSt4:decode-8              97.2kB ± 0%    82.3kB ± 0%  -15.31%  (p=0.008 n=5+5)
Binary/EmbeddedSt5:encode-8              65.9kB ± 0%    55.3kB ± 0%  -16.19%  (p=0.008 n=5+5)
Binary/EmbeddedSt5:decode-8              66.0kB ± 0%    55.3kB ± 0%  -16.18%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:encode-8    2.87kB ± 0%    2.66kB ± 0%   -7.23%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:decode-8    2.87kB ± 0%    2.66kB ± 0%   -7.23%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:encode-8    4.58kB ± 0%    3.62kB ± 0%  -20.95%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:decode-8    4.58kB ± 0%    3.62kB ± 0%  -20.95%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:encode-8    2.42kB ± 0%    2.31kB ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:decode-8    2.42kB ± 0%    2.31kB ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:encode-8       2.38kB ± 0%    2.15kB ± 0%   -9.38%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:decode-8       2.38kB ± 0%    2.15kB ± 0%   -9.38%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:encode-8       2.36kB ± 0%    2.27kB ± 0%   -4.07%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:decode-8       2.36kB ± 0%    2.27kB ± 0%   -4.07%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:encode-8    3.51kB ± 0%    3.19kB ± 0%   -9.05%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:decode-8    3.51kB ± 0%    3.19kB ± 0%   -9.05%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:encode-8    2.89kB ± 0%    2.67kB ± 0%   -7.72%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:decode-8    2.89kB ± 0%    2.67kB ± 0%   -7.72%  (p=0.008 n=5+5)

name                                   old allocs/op  new allocs/op  delta
Binary/EmptyStruct:encode-8                38.0 ± 0%      36.0 ± 0%   -5.26%  (p=0.008 n=5+5)
Binary/EmptyStruct:decode-8                38.0 ± 0%      36.0 ± 0%   -5.26%  (p=0.008 n=5+5)
Binary/PrimitivesStruct:encode-8            439 ± 0%       429 ± 0%   -2.28%  (p=0.008 n=5+5)
Binary/PrimitivesStruct:decode-8            439 ± 0%       429 ± 0%   -2.28%  (p=0.008 n=5+5)
Binary/ShortArraysStruct:encode-8          56.0 ± 0%      52.0 ± 0%   -7.14%  (p=0.008 n=5+5)
Binary/ShortArraysStruct:decode-8          56.0 ± 0%      52.0 ± 0%   -7.14%  (p=0.008 n=5+5)
Binary/ArraysStruct:encode-8                977 ± 0%       919 ± 0%   -5.94%  (p=0.008 n=5+5)
Binary/ArraysStruct:decode-8                977 ± 0%       919 ± 0%   -5.94%  (p=0.008 n=5+5)
Binary/ArraysArraysStruct:encode-8        1.28k ± 0%     1.08k ± 0%  -15.05%  (p=0.008 n=5+5)
Binary/ArraysArraysStruct:decode-8        1.28k ± 0%     1.08k ± 0%  -15.05%  (p=0.008 n=5+5)
Binary/SlicesStruct:encode-8              1.01k ± 0%     0.97k ± 0%   -3.77%  (p=0.008 n=5+5)
Binary/SlicesStruct:decode-8              1.01k ± 0%     0.97k ± 0%   -3.77%  (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:encode-8        6.33k ± 0%     5.95k ± 0%   -5.90%  (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:decode-8        6.33k ± 0%     5.95k ± 0%   -5.90%  (p=0.008 n=5+5)
Binary/PointersStruct:encode-8              637 ± 0%       627 ± 0%   -1.57%  (p=0.008 n=5+5)
Binary/PointersStruct:decode-8              637 ± 0%       627 ± 0%   -1.57%  (p=0.008 n=5+5)
Binary/PointerSlicesStruct:encode-8       1.62k ± 0%     1.56k ± 0%   -3.28%  (p=0.008 n=5+5)
Binary/PointerSlicesStruct:decode-8       1.62k ± 0%     1.56k ± 0%   -3.28%  (p=0.008 n=5+5)
Binary/ComplexSt:encode-8                 3.37k ± 0%     3.22k ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/ComplexSt:decode-8                 3.37k ± 0%     3.22k ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/EmbeddedSt1:encode-8                 453 ± 0%       440 ± 0%   -2.87%  (p=0.008 n=5+5)
Binary/EmbeddedSt1:decode-8                 453 ± 0%       440 ± 0%   -2.87%  (p=0.008 n=5+5)
Binary/EmbeddedSt2:encode-8               3.37k ± 0%     3.22k ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/EmbeddedSt2:decode-8               3.37k ± 0%     3.22k ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:encode-8               2.32k ± 0%     2.20k ± 0%   -5.38%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:decode-8               2.32k ± 0%     2.20k ± 0%   -5.38%  (p=0.008 n=5+5)
Binary/EmbeddedSt4:encode-8               3.67k ± 0%     3.54k ± 0%   -3.73%  (p=0.008 n=5+5)
Binary/EmbeddedSt4:decode-8               3.67k ± 0%     3.54k ± 0%   -3.73%  (p=0.008 n=5+5)
Binary/EmbeddedSt5:encode-8               2.32k ± 0%     2.20k ± 0%   -5.00%  (p=0.008 n=5+5)
Binary/EmbeddedSt5:decode-8               2.32k ± 0%     2.20k ± 0%   -5.00%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:encode-8      97.0 ± 0%      94.0 ± 0%   -3.09%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:decode-8      97.0 ± 0%      94.0 ± 0%   -3.09%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:encode-8       149 ± 0%       133 ± 0%  -10.74%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:decode-8       149 ± 0%       133 ± 0%  -10.74%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:encode-8      77.0 ± 0%      76.0 ± 0%   -1.30%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:decode-8      77.0 ± 0%      76.0 ± 0%   -1.30%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:encode-8         71.0 ± 0%      68.0 ± 0%   -4.23%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:decode-8         71.0 ± 0%      68.0 ± 0%   -4.23%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:encode-8         74.0 ± 0%      73.0 ± 0%   -1.35%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:decode-8         74.0 ± 0%      73.0 ± 0%   -1.35%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:encode-8       122 ± 0%       117 ± 0%   -4.10%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:decode-8       122 ± 0%       117 ± 0%   -4.10%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:encode-8       101 ± 0%        98 ± 0%   -2.97%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:decode-8       101 ± 0%        98 ± 0%   -2.97%  (p=0.008 n=5+5)
```

Fixes gnolang#3488
@odeke-em odeke-em force-pushed the tm2-pkg-amino-slash-RAM-wastage-with-bytes.Buffer-reuse branch from 7f95d1a to 8e64450 Compare January 20, 2025 12:13
@odeke-em
Copy link
Contributor Author

Kind ping @thehowl and team, to please take a look at my response; landing this PR gives lots of better rope for me to further improve performance and reduce allocations.

@thehowl
Copy link
Member

thehowl commented Jan 20, 2025

In the 15 years of Go's lifetime, for which I've been a contributor for 11 years, I have never seen a report of someone reporting memory bloat due to sync.Pool.bytes.Buffer re-usage.

There is prior literature on what I'm talking about, and by reputable people who know about the runtime and the pool implementation: golang/go#23199

I think bytebufferpool is a reasonable implementation which adjusts to the average size of the buffers; otherwise putting a limit on the maximum cap of the buffer is reasonable, too and requires no additional dependency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in focus Core team is prioritizing this work 📦 🌐 tendermint v2 Issues or PRs tm2 related
Projects
Status: In Review
5 participants