perf(tm2/pkg/amino): reduce RAM heavy-handedness by *bytes.Buffer pooled reuse #3489

odeke-em · 2025-01-12T11:47:12Z

This change comes from analyzing a number of RAM and CPU profiles: realm storage needs to invoke amino.MustMarshalAny, and in the profile for TestStdlibs that call was consuming 1.28GB.

ROUTINE ======================== github.com/gnolang/gno/tm2/pkg/amino.MustMarshalAny in /Users/emmanuelodeke/go/src/github.com/gnolang/gno/tm2/pkg/amino/amino.go
         0     1.28GB (flat, cum)  0.61% of Total
         .          .     80:func MustMarshalAny(o interface{}) []byte {
         .     1.28GB     81:	return gcdc.MustMarshalAny(o)
         .          .     82:}
         .          .     83:
         .          .     84:func MarshalAnySized(o interface{}) ([]byte, error) {
         .          .     85:	return gcdc.MarshalAnySized(o)
         .          .     86:}

and

   focus=MarshalAny
Showing nodes accounting for 1303.02MB, 0.6% of 217023.96MB total
Dropped 13 nodes (cum <= 1085.12MB)
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                          539.49MB   100% |   bytes.(*Buffer).grow
  539.49MB  0.25%  0.25%   539.49MB  0.25%                | bytes.growSlice
----------------------------------------------------------+-------------
                                          706.50MB   100% |   bytes.(*Buffer).Write
  167.01MB 0.077%  0.33%   706.50MB  0.33%                | bytes.(*Buffer).grow
                                          539.49MB 76.36% |   bytes.growSlice
----------------------------------------------------------+-------------
                                              93MB 58.68% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryInterface (inline)
                                           56.50MB 35.65% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryStruct (inline)
                                               9MB  5.68% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryList (inline)
  158.51MB 0.073%   0.4%   158.51MB 0.073%                | bytes.NewBuffer
----------------------------------------------------------+-------------
                                          145.01MB 57.77% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).writeFieldIfNotEmpty
                                              86MB 34.26% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryInterface
                                              20MB  7.97% |   github.com/gnolang/gno/tm2/pkg/amino.(*Codec).encodeReflectBinaryList
   85.50MB 0.039%  0.44%   251.01MB  0.12%                | github.com/gnolang/gno/tm2/pkg/amino.encodeFieldNumberAndTyp3
                                          165.51MB 65.94% |   bytes.(*Buffer).Write
----------------------------------------------------------+-------------
                                           77.01MB   100% |   github.com/gnolang/gno/tm2/pkg/amino.EncodeByteSlice
   61.50MB 0.028%  0.47%    77.01MB 0.035%                | github.com/gnolang/gno/tm2/pkg/amino.EncodeUvarint
                                           15.51MB 20.14% |   bytes.(*Buffer).Write
----------------------------------------------------------+-------------

but after this change, the same routine is down to 560.95MB, which is more than 700MB shaved off

ROUTINE ======================== github.com/gnolang/gno/tm2/pkg/amino.MustMarshalAny in /Users/emmanuelodeke/go/src/github.com/gnolang/gno/tm2/pkg/amino/amino.go
         0   560.95MB (flat, cum)  0.26% of Total
         .          .     80:func MustMarshalAny(o interface{}) []byte {
         .   560.95MB     81:	return gcdc.MustMarshalAny(o)
         .          .     82:}
         .          .     83:
         .          .     84:func MarshalAnySized(o interface{}) ([]byte, error) {
         .          .     85:	return gcdc.MarshalAnySized(o)
         .          .     86:}

and

----------------------------------------------------------+-------------
                                           16.35MB 52.46% |   github.com/gnolang/gno/tm2/pkg/amino.EncodeByteSlice
                                           14.81MB 47.54% |   github.com/gnolang/gno/tm2/pkg/amino.writeMaybeBare
         0     0%  0.26%    31.16MB 0.014%                | bytes.(*Buffer).Write
                                           31.16MB   100% |   bytes.(*Buffer).grow
----------------------------------------------------------+-------------
                                           31.16MB   100% |   bytes.(*Buffer).Write
         0     0%  0.26%    31.16MB 0.014%                | bytes.(*Buffer).grow
                                           31.16MB   100% |   bytes.growSlice
----------------------------------------------------------+-------------

and even more: after the change that ensured the tm2/pkg/amino benchmarks could run, we see quite good improvements in allocations. Running out of RAM is much worse than losing a couple of microseconds, so we can tolerate an increase in some of the CPU time benchmarks.

name                                   old time/op    new time/op    delta
Binary/EmptyStruct:encode-8              3.86µs ± 5%    3.92µs ± 5%     ~     (p=0.548 n=5+5)
Binary/EmptyStruct:decode-8              3.79µs ± 5%    3.79µs ± 6%     ~     (p=0.690 n=5+5)
Binary/PrimitivesStruct:encode-8         35.5µs ± 2%    36.5µs ± 5%     ~     (p=0.151 n=5+5)
Binary/PrimitivesStruct:decode-8         35.0µs ± 2%    38.6µs ±11%  +10.17%  (p=0.016 n=5+5)
Binary/ShortArraysStruct:encode-8        5.91µs ± 6%    6.36µs ± 8%   +7.61%  (p=0.032 n=5+5)
Binary/ShortArraysStruct:decode-8        6.07µs ±21%    6.39µs ± 8%     ~     (p=0.151 n=5+5)
Binary/ArraysStruct:encode-8             95.1µs ± 8%   100.6µs ± 7%     ~     (p=0.222 n=5+5)
Binary/ArraysStruct:decode-8             91.3µs ± 5%    98.5µs ±12%     ~     (p=0.222 n=5+5)
Binary/ArraysArraysStruct:encode-8        131µs ± 3%     132µs ± 6%     ~     (p=0.841 n=5+5)
Binary/ArraysArraysStruct:decode-8        136µs ± 9%     134µs ± 3%     ~     (p=0.548 n=5+5)
Binary/SlicesStruct:encode-8             85.4µs ± 1%    92.3µs ± 9%   +8.15%  (p=0.008 n=5+5)
Binary/SlicesStruct:decode-8             87.1µs ± 8%    94.8µs ± 7%     ~     (p=0.056 n=5+5)
Binary/SlicesSlicesStruct:encode-8        506µs ± 2%     545µs ± 9%     ~     (p=0.151 n=5+5)
Binary/SlicesSlicesStruct:decode-8        506µs ± 3%     523µs ± 3%     ~     (p=0.095 n=5+5)
Binary/PointersStruct:encode-8           56.8µs ± 4%    65.5µs ±20%  +15.43%  (p=0.016 n=5+5)
Binary/PointersStruct:decode-8           57.5µs ± 3%    55.9µs ± 3%     ~     (p=0.095 n=5+5)
Binary/PointerSlicesStruct:encode-8       162µs ± 4%     172µs ±21%     ~     (p=0.841 n=5+5)
Binary/PointerSlicesStruct:decode-8       163µs ± 5%     185µs ±13%     ~     (p=0.095 n=5+5)
Binary/ComplexSt:encode-8                 314µs ± 3%     354µs ±11%  +12.90%  (p=0.008 n=5+5)
Binary/ComplexSt:decode-8                 319µs ± 2%     338µs ± 4%   +5.87%  (p=0.008 n=5+5)
Binary/EmbeddedSt1:encode-8              39.8µs ± 7%    39.3µs ± 8%     ~     (p=1.000 n=5+5)
Binary/EmbeddedSt1:decode-8              37.0µs ± 4%    37.8µs ± 6%     ~     (p=0.690 n=5+5)
Binary/EmbeddedSt2:encode-8               316µs ± 7%     307µs ± 3%     ~     (p=0.222 n=5+5)
Binary/EmbeddedSt2:decode-8               316µs ± 3%     306µs ± 2%     ~     (p=0.095 n=5+5)
Binary/EmbeddedSt3:encode-8               217µs ± 7%     201µs ± 1%   -7.26%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:decode-8               222µs ±10%     204µs ± 2%   -8.50%  (p=0.032 n=5+5)
Binary/EmbeddedSt4:encode-8               332µs ± 4%     325µs ± 3%     ~     (p=0.421 n=5+5)
Binary/EmbeddedSt4:decode-8               332µs ± 4%     324µs ± 5%     ~     (p=0.095 n=5+5)
Binary/EmbeddedSt5:encode-8               218µs ± 2%     212µs ± 3%     ~     (p=0.056 n=5+5)
Binary/EmbeddedSt5:decode-8               224µs ± 8%     209µs ± 1%   -6.85%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:encode-8    9.03µs ± 6%    8.97µs ±12%     ~     (p=0.841 n=5+5)
Binary/AminoMarshalerStruct1:decode-8    8.91µs ± 5%    8.81µs ± 4%     ~     (p=0.841 n=5+5)
Binary/AminoMarshalerStruct2:encode-8    13.2µs ±10%    12.2µs ± 2%   -7.26%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:decode-8    13.2µs ± 6%    12.5µs ± 5%     ~     (p=0.095 n=5+5)
Binary/AminoMarshalerStruct3:encode-8    7.17µs ± 3%    7.50µs ± 8%     ~     (p=0.548 n=5+5)
Binary/AminoMarshalerStruct3:decode-8    7.12µs ± 4%    7.84µs ±10%  +10.12%  (p=0.016 n=5+5)
Binary/AminoMarshalerInt4:encode-8       6.60µs ± 5%    6.96µs ±11%     ~     (p=0.421 n=5+5)
Binary/AminoMarshalerInt4:decode-8       6.79µs ±12%    7.04µs ±15%     ~     (p=0.690 n=5+5)
Binary/AminoMarshalerInt5:encode-8       6.64µs ± 4%    6.92µs ± 5%   +4.09%  (p=0.032 n=5+5)
Binary/AminoMarshalerInt5:decode-8       6.55µs ± 3%    7.76µs ±10%  +18.44%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:encode-8    11.7µs ± 5%    13.2µs ±10%  +13.09%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:decode-8    11.4µs ± 3%    11.6µs ± 2%     ~     (p=0.222 n=5+5)
Binary/AminoMarshalerStruct7:encode-8    9.86µs ± 1%   10.10µs ±19%     ~     (p=0.310 n=5+5)
Binary/AminoMarshalerStruct7:decode-8    9.55µs ± 3%    9.75µs ±10%     ~     (p=0.690 n=5+5)

name                                   old alloc/op   new alloc/op   delta
Binary/EmptyStruct:encode-8              1.50kB ± 0%    1.41kB ± 0%   -6.32%  (p=0.008 n=5+5)
Binary/EmptyStruct:decode-8              1.50kB ± 0%    1.41kB ± 0%   -6.32%  (p=0.008 n=5+5)
Binary/PrimitivesStruct:encode-8         10.4kB ± 0%     9.6kB ± 0%   -7.82%  (p=0.008 n=5+5)
Binary/PrimitivesStruct:decode-8         10.4kB ± 0%     9.6kB ± 0%   -7.82%  (p=0.000 n=4+5)
Binary/ShortArraysStruct:encode-8        2.11kB ± 0%    1.92kB ± 0%   -9.04%  (p=0.008 n=5+5)
Binary/ShortArraysStruct:decode-8        2.11kB ± 0%    1.92kB ± 0%   -9.04%  (p=0.008 n=5+5)
Binary/ArraysStruct:encode-8             25.9kB ± 0%    22.0kB ± 0%  -15.04%  (p=0.008 n=5+5)
Binary/ArraysStruct:decode-8             25.9kB ± 0%    22.0kB ± 0%  -15.04%  (p=0.008 n=5+5)
Binary/ArraysArraysStruct:encode-8       37.7kB ± 0%    25.3kB ± 0%  -33.07%  (p=0.008 n=5+5)
Binary/ArraysArraysStruct:decode-8       37.7kB ± 0%    25.3kB ± 0%  -33.07%  (p=0.008 n=5+5)
Binary/SlicesStruct:encode-8             28.2kB ± 0%    25.1kB ± 0%  -10.96%  (p=0.008 n=5+5)
Binary/SlicesStruct:decode-8             28.2kB ± 0%    25.1kB ± 0%  -10.97%  (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:encode-8        183kB ± 0%     147kB ± 0%  -19.92%  (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:decode-8        183kB ± 0%     147kB ± 0%  -19.92%  (p=0.008 n=5+5)
Binary/PointersStruct:encode-8           14.4kB ± 0%    13.6kB ± 0%   -5.64%  (p=0.008 n=5+5)
Binary/PointersStruct:decode-8           14.4kB ± 0%    13.6kB ± 0%   -5.64%  (p=0.008 n=5+5)
Binary/PointerSlicesStruct:encode-8      43.9kB ± 0%    40.2kB ± 0%   -8.49%  (p=0.008 n=5+5)
Binary/PointerSlicesStruct:decode-8      43.9kB ± 0%    40.2kB ± 0%   -8.49%  (p=0.008 n=5+5)
Binary/ComplexSt:encode-8                95.3kB ± 0%    78.2kB ± 0%  -17.97%  (p=0.008 n=5+5)
Binary/ComplexSt:decode-8                95.3kB ± 0%    78.2kB ± 0%  -17.97%  (p=0.008 n=5+5)
Binary/EmbeddedSt1:encode-8              11.3kB ± 0%    10.2kB ± 0%   -9.62%  (p=0.000 n=5+4)
Binary/EmbeddedSt1:decode-8              11.3kB ± 0%    10.2kB ± 0%   -9.61%  (p=0.000 n=5+4)
Binary/EmbeddedSt2:encode-8              95.5kB ± 0%    78.3kB ± 0%  -17.96%  (p=0.008 n=5+5)
Binary/EmbeddedSt2:decode-8              95.5kB ± 0%    78.4kB ± 0%  -17.94%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:encode-8              68.3kB ± 0%    56.6kB ± 0%  -17.22%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:decode-8              68.3kB ± 0%    56.6kB ± 0%  -17.21%  (p=0.008 n=5+5)
Binary/EmbeddedSt4:encode-8              97.2kB ± 0%    82.3kB ± 0%  -15.32%  (p=0.008 n=5+5)
Binary/EmbeddedSt4:decode-8              97.2kB ± 0%    82.3kB ± 0%  -15.31%  (p=0.008 n=5+5)
Binary/EmbeddedSt5:encode-8              65.9kB ± 0%    55.3kB ± 0%  -16.19%  (p=0.008 n=5+5)
Binary/EmbeddedSt5:decode-8              66.0kB ± 0%    55.3kB ± 0%  -16.18%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:encode-8    2.87kB ± 0%    2.66kB ± 0%   -7.23%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:decode-8    2.87kB ± 0%    2.66kB ± 0%   -7.23%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:encode-8    4.58kB ± 0%    3.62kB ± 0%  -20.95%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:decode-8    4.58kB ± 0%    3.62kB ± 0%  -20.95%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:encode-8    2.42kB ± 0%    2.31kB ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:decode-8    2.42kB ± 0%    2.31kB ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:encode-8       2.38kB ± 0%    2.15kB ± 0%   -9.38%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:decode-8       2.38kB ± 0%    2.15kB ± 0%   -9.38%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:encode-8       2.36kB ± 0%    2.27kB ± 0%   -4.07%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:decode-8       2.36kB ± 0%    2.27kB ± 0%   -4.07%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:encode-8    3.51kB ± 0%    3.19kB ± 0%   -9.05%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:decode-8    3.51kB ± 0%    3.19kB ± 0%   -9.05%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:encode-8    2.89kB ± 0%    2.67kB ± 0%   -7.72%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:decode-8    2.89kB ± 0%    2.67kB ± 0%   -7.72%  (p=0.008 n=5+5)

name                                   old allocs/op  new allocs/op  delta
Binary/EmptyStruct:encode-8                38.0 ± 0%      36.0 ± 0%   -5.26%  (p=0.008 n=5+5)
Binary/EmptyStruct:decode-8                38.0 ± 0%      36.0 ± 0%   -5.26%  (p=0.008 n=5+5)
Binary/PrimitivesStruct:encode-8            439 ± 0%       429 ± 0%   -2.28%  (p=0.008 n=5+5)
Binary/PrimitivesStruct:decode-8            439 ± 0%       429 ± 0%   -2.28%  (p=0.008 n=5+5)
Binary/ShortArraysStruct:encode-8          56.0 ± 0%      52.0 ± 0%   -7.14%  (p=0.008 n=5+5)
Binary/ShortArraysStruct:decode-8          56.0 ± 0%      52.0 ± 0%   -7.14%  (p=0.008 n=5+5)
Binary/ArraysStruct:encode-8                977 ± 0%       919 ± 0%   -5.94%  (p=0.008 n=5+5)
Binary/ArraysStruct:decode-8                977 ± 0%       919 ± 0%   -5.94%  (p=0.008 n=5+5)
Binary/ArraysArraysStruct:encode-8        1.28k ± 0%     1.08k ± 0%  -15.05%  (p=0.008 n=5+5)
Binary/ArraysArraysStruct:decode-8        1.28k ± 0%     1.08k ± 0%  -15.05%  (p=0.008 n=5+5)
Binary/SlicesStruct:encode-8              1.01k ± 0%     0.97k ± 0%   -3.77%  (p=0.008 n=5+5)
Binary/SlicesStruct:decode-8              1.01k ± 0%     0.97k ± 0%   -3.77%  (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:encode-8        6.33k ± 0%     5.95k ± 0%   -5.90%  (p=0.008 n=5+5)
Binary/SlicesSlicesStruct:decode-8        6.33k ± 0%     5.95k ± 0%   -5.90%  (p=0.008 n=5+5)
Binary/PointersStruct:encode-8              637 ± 0%       627 ± 0%   -1.57%  (p=0.008 n=5+5)
Binary/PointersStruct:decode-8              637 ± 0%       627 ± 0%   -1.57%  (p=0.008 n=5+5)
Binary/PointerSlicesStruct:encode-8       1.62k ± 0%     1.56k ± 0%   -3.28%  (p=0.008 n=5+5)
Binary/PointerSlicesStruct:decode-8       1.62k ± 0%     1.56k ± 0%   -3.28%  (p=0.008 n=5+5)
Binary/ComplexSt:encode-8                 3.37k ± 0%     3.22k ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/ComplexSt:decode-8                 3.37k ± 0%     3.22k ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/EmbeddedSt1:encode-8                 453 ± 0%       440 ± 0%   -2.87%  (p=0.008 n=5+5)
Binary/EmbeddedSt1:decode-8                 453 ± 0%       440 ± 0%   -2.87%  (p=0.008 n=5+5)
Binary/EmbeddedSt2:encode-8               3.37k ± 0%     3.22k ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/EmbeddedSt2:decode-8               3.37k ± 0%     3.22k ± 0%   -4.62%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:encode-8               2.32k ± 0%     2.20k ± 0%   -5.38%  (p=0.008 n=5+5)
Binary/EmbeddedSt3:decode-8               2.32k ± 0%     2.20k ± 0%   -5.38%  (p=0.008 n=5+5)
Binary/EmbeddedSt4:encode-8               3.67k ± 0%     3.54k ± 0%   -3.73%  (p=0.008 n=5+5)
Binary/EmbeddedSt4:decode-8               3.67k ± 0%     3.54k ± 0%   -3.73%  (p=0.008 n=5+5)
Binary/EmbeddedSt5:encode-8               2.32k ± 0%     2.20k ± 0%   -5.00%  (p=0.008 n=5+5)
Binary/EmbeddedSt5:decode-8               2.32k ± 0%     2.20k ± 0%   -5.00%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:encode-8      97.0 ± 0%      94.0 ± 0%   -3.09%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct1:decode-8      97.0 ± 0%      94.0 ± 0%   -3.09%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:encode-8       149 ± 0%       133 ± 0%  -10.74%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct2:decode-8       149 ± 0%       133 ± 0%  -10.74%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:encode-8      77.0 ± 0%      76.0 ± 0%   -1.30%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct3:decode-8      77.0 ± 0%      76.0 ± 0%   -1.30%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:encode-8         71.0 ± 0%      68.0 ± 0%   -4.23%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt4:decode-8         71.0 ± 0%      68.0 ± 0%   -4.23%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:encode-8         74.0 ± 0%      73.0 ± 0%   -1.35%  (p=0.008 n=5+5)
Binary/AminoMarshalerInt5:decode-8         74.0 ± 0%      73.0 ± 0%   -1.35%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:encode-8       122 ± 0%       117 ± 0%   -4.10%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct6:decode-8       122 ± 0%       117 ± 0%   -4.10%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:encode-8       101 ± 0%        98 ± 0%   -2.97%  (p=0.008 n=5+5)
Binary/AminoMarshalerStruct7:decode-8       101 ± 0%        98 ± 0%   -2.97%  (p=0.008 n=5+5)

Fixes #3488

odeke-em · 2025-01-12T11:47:41Z

Kindly cc-ing @veorq

Gno2D2 · 2025-01-12T11:48:15Z

🛠 PR Checks Summary

All Automated Checks passed. ✅

Manual Checks (for Reviewers):

IGNORE the bot requirements for this PR (force green CI check)
The pull request description provides enough details (checked by @thehowl)

✅ Automated Checks (for Contributors):

🟢 Maintainers must be able to edit this pull request (more info)

☑️ Contributor Actions:

Fix any issues flagged by automated checks.
Follow the Contributor Checklist to ensure your PR is ready for review.
- Add new tests, or document why they are unnecessary.
- Provide clear examples/screenshots, if necessary.
- Update documentation, if required.
- Ensure no breaking changes, or include BREAKING CHANGE notes.
- Link related issues/PRs, where applicable.

☑️ Reviewer Actions:

Complete manual checks for the PR, including the guidelines and additional checks if applicable.

Debug

Automated Checks
Maintainers must be able to edit this pull request (more info)
If
🟢 Condition met
└── 🟢 The pull request was created from a fork (head branch repo: odeke-em/gno)
Then
🟢 Requirement satisfied
└── 🟢 Maintainer can modify this pull request
Manual Checks
**IGNORE** the bot requirements for this PR (force green CI check)
If
🟢 Condition met
└── 🟢 On every pull request
Can be checked by

Any user with comment edit permission
The pull request description provides enough details
If
🟢 Condition met
└── 🟢 And
    ├── 🟢 Not (🔴 Pull request author is a member of the team: core-contributors)
    └── 🟢 Not (🔴 Pull request author is user: dependabot[bot])
Can be checked by

team core-contributors

codecov · 2025-01-12T11:50:34Z

Codecov Report

All modified and coverable lines are covered by tests ✅

thehowl

Please use a specialised library; elsewhere in the codebase, we already use https://github.com/libp2p/go-buffer-pool .

I suggest you use its Buffer type, which is intended for these use cases.

odeke-em · 2025-01-13T22:25:49Z

Hey @thehowl, I would not advise using libp2p/go-buffer-pool's Buffer over the standard library's bytes.Buffer. Here are results from their own benchmarks, to which I added equivalent benchmarks for the Go standard library's bytes.Buffer: orijtech/go-buffer-pool@8ea7d61

$ benchstat libp2p.txt stdlib.txt
name                       old time/op    new time/op    delta
WriteByte-8                  16.8µs ± 4%    16.7µs ± 9%      ~     (p=0.400 n=9+10)
BufferFullSmallReads-8       61.1µs ± 6%    46.4µs ± 3%   -24.06%  (p=0.000 n=10+10)
BufferNotEmptyWriteRead-8     417µs ± 3%     182µs ± 5%   -56.28%  (p=0.000 n=10+10)

name                       old speed      new speed      delta
WriteByte-8                 244MB/s ± 3%   245MB/s ± 8%      ~     (p=0.400 n=9+10)

name                       old alloc/op   new alloc/op   delta
WriteByte-8                   0.00B          0.00B           ~     (all equal)
BufferFullSmallReads-8        0.00B          0.00B           ~     (all equal)
BufferNotEmptyWriteRead-8    0.90B ±122%     0.00B       -100.00%  (p=0.011 n=10+10)

name                       old allocs/op  new allocs/op  delta
WriteByte-8                    0.00           0.00           ~     (all equal)
BufferFullSmallReads-8         0.00           0.00           ~     (all equal)
BufferNotEmptyWriteRead-8      0.00           0.00           ~     (all equal)

And that's just for the Buffer. The standard library's sync.Pool is also battle-tested across millions of repositories and is continually being improved, while libp2p's pool doesn't get much attention. Some years ago I found a glaring bug there that had gone undetected due to inadequate testing (libp2p/go-buffer-pool#26), and even that PR took some time to push through.

thehowl · 2025-01-14T14:20:24Z

The p2p implementation uses sync.Pool under the hood as well. My issue is as follows: using a byte buffer with a sync.Pool directly is not useful, as the buffer you get out may have any size of the underlying buffer, which is not guaranteed to be reused. Some buffers may become very very large and end up wasting memory.

Here's another suggestion: https://github.com/valyala/bytebufferpool - it seems to implement the kind of protection I was worried about :)

odeke-em · 2025-01-14T15:05:30Z

Thanks for the response @thehowl!

> My issue is as follows: using a byte buffer with a sync.Pool directly is not useful, as the buffer you get out may have any size of the underlying buffer, which is not guaranteed to be reused. Some buffers may become very very large and end up wasting memory.

Before being put back into the pool, we invoke buf.Reset() which then internally moves the underlying byteslice for reuse with b.buf[:0] by making it empty, but its capacity won't have to be re-created, which is where the savings come from. I believe that alleviates your concern. It is directly useful as the incumbent pool used in many high performance projects too.

> Here's another suggestion: https://github.com/valyala/bytebufferpool - it seems to implement the kind of protection I was worried about :)

I have more faith in valyala's implementation, thank you! Sadly though, it doesn't implement all of *bytes.Buffer's methods, such as Truncate(..), and some callers need a *bytes.Buffer directly: json.Indent, which is used inside amino in more than 3 places, takes one as its destination.

thehowl · 2025-01-14T17:12:58Z

> Before being put back into the pool, we invoke buf.Reset() which then internally moves the underlying byteslice for reuse with b.buf[:0] by making it empty, but its capacity won't have to be re-created, which is where the savings come from. I believe that alleviates your concern. It is directly useful as the incumbent pool used in many high performance projects too.

Yes, but if I'm encoding a value which takes 16MB to store (which would happen quite exceptionally, anyway) I want the allocated buffer to be freed rather than being put into the buffer. So either we hard-code a limit beyond which we don't put things back into the pool, or we use something like bytebufferpool (with the additional methods), which automatically calibrates to an ideal range of buffer sizes.
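
The first option, a hard-coded limit, could look roughly like the sketch below. The threshold `maxPooledCap` (64 KiB here) and the helpers `getBuffer` and `putBuffer` are assumptions for illustration; nothing in this PR defines them.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// maxPooledCap is an assumed threshold, not a value taken from the PR:
// buffers whose capacity grew beyond it are not returned to the pool.
const maxPooledCap = 64 << 10 // 64 KiB

var pool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// getBuffer fetches a buffer from the pool, allocating one if it is empty.
func getBuffer() *bytes.Buffer { return pool.Get().(*bytes.Buffer) }

// putBuffer resets and pools buf unless it has grown past maxPooledCap.
// It reports whether the buffer was pooled.
func putBuffer(buf *bytes.Buffer) bool {
	if buf.Cap() > maxPooledCap {
		return false // drop it so the garbage collector can reclaim it
	}
	buf.Reset()
	pool.Put(buf)
	return true
}

func main() {
	small := getBuffer()
	small.WriteString("small value")
	fmt.Println("small pooled:", putBuffer(small))

	large := getBuffer()
	large.Write(make([]byte, 16<<20)) // the 16MB case from the discussion
	fmt.Println("large pooled:", putBuffer(large))
}
```

The trade-off is the one debated in this thread: dropping oversized buffers bounds the memory held by the pool, at the cost of re-allocating whenever a large value is encoded again.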

odeke-em · 2025-01-18T08:45:52Z

@thehowl thanks for your discourse!

> I want the allocated buffer to be freed rather than being put into the buffer.

What you are asking for is only performed by libp2p/go-buffer-pool's Buffer.Reset(), which sets b.buf = nil, and I don't think that is a good idea. Constantly garbage collecting buffers is what causes the very memory bloat and garbage collection pressure that we are trying to reduce. If data was allocated and is no longer used, the pool shrinks as needed: sync.Pool doesn't retain items forever if the only reference is in the pool, and it calls this out in the docs.
Reusing the underlying byte slice alleviates pressure, because later needs can grow up to those limits without many allocations, and commonly used patterns usually match. One common piece of wisdom in runtime building and garbage collection is that "new objects get freed more often and older objects get freed much less"; the same goes for usage patterns: usually, the bigger the allocations, the more common they'll be.

I don't think it is a good idea to build a complex heuristic that checks memory limits and then frees buffers, unless you are writing a garbage collector from scratch. Go's garbage collector already handles this complex logic when alleviating pressure, and I don't think we should be re-inventing the wheel, especially with the massive memory bloat that gnovm already has.
In the 15 years of Go's lifetime, for 11 of which I've been a contributor, I have never seen a report of memory bloat due to reusing bytes.Buffer through sync.Pool.

0% -5.90% (p=0.008 n=5+5) Binary/PointersStruct:encode-8 637 ± 0% 627 ± 0% -1.57% (p=0.008 n=5+5) Binary/PointersStruct:decode-8 637 ± 0% 627 ± 0% -1.57% (p=0.008 n=5+5) Binary/PointerSlicesStruct:encode-8 1.62k ± 0% 1.56k ± 0% -3.28% (p=0.008 n=5+5) Binary/PointerSlicesStruct:decode-8 1.62k ± 0% 1.56k ± 0% -3.28% (p=0.008 n=5+5) Binary/ComplexSt:encode-8 3.37k ± 0% 3.22k ± 0% -4.62% (p=0.008 n=5+5) Binary/ComplexSt:decode-8 3.37k ± 0% 3.22k ± 0% -4.62% (p=0.008 n=5+5) Binary/EmbeddedSt1:encode-8 453 ± 0% 440 ± 0% -2.87% (p=0.008 n=5+5) Binary/EmbeddedSt1:decode-8 453 ± 0% 440 ± 0% -2.87% (p=0.008 n=5+5) Binary/EmbeddedSt2:encode-8 3.37k ± 0% 3.22k ± 0% -4.62% (p=0.008 n=5+5) Binary/EmbeddedSt2:decode-8 3.37k ± 0% 3.22k ± 0% -4.62% (p=0.008 n=5+5) Binary/EmbeddedSt3:encode-8 2.32k ± 0% 2.20k ± 0% -5.38% (p=0.008 n=5+5) Binary/EmbeddedSt3:decode-8 2.32k ± 0% 2.20k ± 0% -5.38% (p=0.008 n=5+5) Binary/EmbeddedSt4:encode-8 3.67k ± 0% 3.54k ± 0% -3.73% (p=0.008 n=5+5) Binary/EmbeddedSt4:decode-8 3.67k ± 0% 3.54k ± 0% -3.73% (p=0.008 n=5+5) Binary/EmbeddedSt5:encode-8 2.32k ± 0% 2.20k ± 0% -5.00% (p=0.008 n=5+5) Binary/EmbeddedSt5:decode-8 2.32k ± 0% 2.20k ± 0% -5.00% (p=0.008 n=5+5) Binary/AminoMarshalerStruct1:encode-8 97.0 ± 0% 94.0 ± 0% -3.09% (p=0.008 n=5+5) Binary/AminoMarshalerStruct1:decode-8 97.0 ± 0% 94.0 ± 0% -3.09% (p=0.008 n=5+5) Binary/AminoMarshalerStruct2:encode-8 149 ± 0% 133 ± 0% -10.74% (p=0.008 n=5+5) Binary/AminoMarshalerStruct2:decode-8 149 ± 0% 133 ± 0% -10.74% (p=0.008 n=5+5) Binary/AminoMarshalerStruct3:encode-8 77.0 ± 0% 76.0 ± 0% -1.30% (p=0.008 n=5+5) Binary/AminoMarshalerStruct3:decode-8 77.0 ± 0% 76.0 ± 0% -1.30% (p=0.008 n=5+5) Binary/AminoMarshalerInt4:encode-8 71.0 ± 0% 68.0 ± 0% -4.23% (p=0.008 n=5+5) Binary/AminoMarshalerInt4:decode-8 71.0 ± 0% 68.0 ± 0% -4.23% (p=0.008 n=5+5) Binary/AminoMarshalerInt5:encode-8 74.0 ± 0% 73.0 ± 0% -1.35% (p=0.008 n=5+5) Binary/AminoMarshalerInt5:decode-8 74.0 ± 0% 73.0 ± 0% -1.35% (p=0.008 n=5+5) 
Binary/AminoMarshalerStruct6:encode-8 122 ± 0% 117 ± 0% -4.10% (p=0.008 n=5+5) Binary/AminoMarshalerStruct6:decode-8 122 ± 0% 117 ± 0% -4.10% (p=0.008 n=5+5) Binary/AminoMarshalerStruct7:encode-8 101 ± 0% 98 ± 0% -2.97% (p=0.008 n=5+5) Binary/AminoMarshalerStruct7:decode-8 101 ± 0% 98 ± 0% -2.97% (p=0.008 n=5+5) ``` Fixes gnolang#3488

odeke-em · 2025-01-20T12:14:42Z

Kind ping @thehowl and team: please take a look at my response. Landing this PR gives me much more room to further improve performance and reduce allocations.

thehowl · 2025-01-20T18:21:01Z

> In the 15 years of Go's lifetime, for which I've been a contributor for 11 years, I have never seen a report of someone reporting memory bloat due to sync.Pool.bytes.Buffer re-usage.

There is prior literature on what I'm talking about, and by reputable people who know about the runtime and the pool implementation: golang/go#23199

I think bytebufferpool is a reasonable implementation, since it adjusts to the average size of the buffers; otherwise, putting a limit on the maximum cap of the buffer is reasonable too, and requires no additional dependency.
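
For illustration, the second option (capping the capacity of buffers that go back into the pool) could look roughly like the sketch below. This is not the code in this PR: `maxPooledCap`, `bufPool`, `getBuffer`, `putBuffer` and `encode` are hypothetical names, and the 64 KiB threshold is an arbitrary assumption that would need to be tuned against real profiles.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// maxPooledCap is a hypothetical threshold (64 KiB, chosen arbitrarily):
// buffers that grew beyond it are dropped instead of being pooled, so one
// unusually large encode cannot pin a large backing array indefinitely.
const maxPooledCap = 64 << 10

// bufPool hands out reusable *bytes.Buffer values.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// getBuffer fetches a buffer from the pool (or allocates a fresh one).
func getBuffer() *bytes.Buffer {
	return bufPool.Get().(*bytes.Buffer)
}

// putBuffer resets buf and returns it to the pool, unless its capacity
// exceeds maxPooledCap. It reports whether the buffer was pooled.
func putBuffer(buf *bytes.Buffer) bool {
	if buf.Cap() > maxPooledCap {
		return false // leave oversized buffers to the garbage collector
	}
	buf.Reset()
	bufPool.Put(buf)
	return true
}

// encode stands in for a marshal routine: it writes into a pooled buffer
// and returns a copy, because the buffer's memory is reused after Put.
func encode(payload []byte) []byte {
	buf := getBuffer()
	defer putBuffer(buf)
	buf.Write(payload)
	return append([]byte(nil), buf.Bytes()...)
}

func main() {
	fmt.Printf("%s\n", encode([]byte("hello"))) // prints "hello"

	big := getBuffer()
	big.Grow(maxPooledCap + 1)  // force capacity above the threshold
	fmt.Println(putBuffer(big)) // prints "false": not returned to the pool
}
```

The copy in `encode` matters: returning `buf.Bytes()` directly would alias memory that the next caller of `getBuffer` may overwrite.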

github-actions bot assigned odeke-em Jan 12, 2025

github-actions bot added the 📦 🌐 tendermint v2 Issues or PRs tm2 related label Jan 12, 2025

notJoon added the review/triage-pending PRs opened by external contributors that are waiting for the 1st review label Jan 13, 2025

Kouteki requested review from thehowl, mvertes, ltzmaxwell and petar-dambovaliev January 13, 2025 12:23

Kouteki added the in focus Core team is prioritizing this work label Jan 13, 2025

thehowl reviewed Jan 13, 2025

odeke-em requested a review from thehowl January 14, 2025 11:32

odeke-em force-pushed the tm2-pkg-amino-slash-RAM-wastage-with-bytes.Buffer-reuse branch from c507270 to 96aee1d Compare January 14, 2025 15:06

thehowl removed the review/triage-pending PRs opened by external contributors that are waiting for the 1st review label Jan 15, 2025

odeke-em force-pushed the tm2-pkg-amino-slash-RAM-wastage-with-bytes.Buffer-reuse branch 2 times, most recently from 0fd0b68 to 7f95d1a Compare January 20, 2025 11:38

odeke-em force-pushed the tm2-pkg-amino-slash-RAM-wastage-with-bytes.Buffer-reuse branch from 7f95d1a to 8e64450 Compare January 20, 2025 12:13
