Add zfree_with_size to optimize sdsfree since we can get zmalloc_size from the header #453

Merged
merged 11 commits into valkey-io:unstable on Jun 19, 2024

Conversation

@poiuj (Contributor) commented May 7, 2024

Description

zfree updates memory statistics. It gets the size of the buffer from jemalloc by calling zmalloc_size, which is a costly operation. We can avoid it if we already know the buffer size. For example, we can calculate the size of an sds string from the data in its header.

This commit introduces the zfree_with_size function, which accepts both a pointer to a buffer and its size. zfree is refactored to call zfree_with_size.

sdsfree uses the new interface for all sds types except SDS_TYPE_5.
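
For illustration, here is a minimal sketch of the shape of this interface. It is simplified and not the exact Valkey source; helpers like `update_zmalloc_stat_free` and `zmalloc_size` are assumed to be declared elsewhere in zmalloc.c, and prefix-size handling for non-jemalloc builds is omitted.

```c
#include <stdlib.h>
#include <stddef.h>

void zfree_with_size(void *ptr, size_t size) {
    if (ptr == NULL) return;
    update_zmalloc_stat_free(size); /* statistics update uses the caller-supplied size */
    free(ptr);                      /* with jemalloc this can become a sized free */
}

void zfree(void *ptr) {
    if (ptr == NULL) return;
    /* Fallback path: ask the allocator for the size, as before. */
    zfree_with_size(ptr, zmalloc_size(ptr));
}
```

A caller such as sdsfree can then compute the allocation size from the sds header and pass it in, skipping the allocator lookup entirely.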

Benchmark

The dataset is 3 million strings. Each benchmark run uses its own value size (8192, 512, and 120 bytes). The benchmark is a 100% write load for 5 minutes.

| value size | new tps | old tps | tps Δ% | new us/call | old us/call | us/call Δ% |
| --- | --- | --- | --- | --- | --- | --- |
| 8k | 272088.53 | 269971.75 | 0.78 | 1.83 | 1.92 | -4.69 |
| 512 | 356881.91 | 352856.72 | 1.14 | 1.27 | 1.35 | -5.93 |
| 120 | 377523.81 | 368774.78 | 2.37 | 1.14 | 1.19 | -4.20 |

codecov bot commented May 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.05%. Comparing base (4176604) to head (1a3a6d6).
Report is 37 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #453      +/-   ##
============================================
- Coverage     70.07%   70.05%   -0.03%     
============================================
  Files           110      110              
  Lines         59989    60085      +96     
============================================
+ Hits          42037    42092      +55     
- Misses        17952    17993      +41     
| Files | Coverage Δ |
| --- | --- |
| src/sds.c | 85.86% <100.00%> (-0.13%) ⬇️ |
| src/zmalloc.c | 85.05% <100.00%> (+0.41%) ⬆️ |

... and 37 files with indirect coverage changes

@lipzhu (Contributor) commented May 8, 2024

I think zmalloc_size is really expensive for updating used_memory; we had a similar discussion in #308 (comment). For this patch, I'm curious whether the assertion that sds->alloc == allocated size holds in all cases. What happens if sdsResize is called?

@zvi-code (Contributor) commented May 8, 2024

@lipzhu the update of used_memory has 2 cost aspects: the zmalloc_size call and the fact that it's an atomic operation.

  1. Regarding the zmalloc_size cost: it is actually called twice, once in the zmalloc code and another time inside je_free, because the tcache needs to know which bin the buffer belongs to. So even if you remove the zmalloc_size call for the stats, you are still paying the cost. When je_sdallocx is used to free (this PR), jemalloc internally takes a fast path that avoids fetching the size to begin with (see the sketch below). BTW: I think for C++ the delete operator is actually overloaded in jemalloc.cpp to use sized delete.
  2. [edit: I see you already have a PR for this] Regarding the atomic cost: I believe this can also be eliminated with the use of thread-local vars (I guess the team has considered this before).
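
To make the sized-free point concrete, a hedged illustration (not Valkey code): jemalloc's non-standard `sdallocx()` takes the size up front, so the allocator can skip the lookup that a plain `free()` needs.

```c
#include <stdlib.h>
#include <jemalloc/jemalloc.h>

void free_known_size(void *p, size_t sz) {
    sdallocx(p, sz, 0);   /* sized deallocation: fast path, no size fetch */
}

void free_unknown_size(void *p) {
    free(p);              /* jemalloc must first look up which bin/extent p belongs to */
}
```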

@poiuj (Contributor, Author) commented May 8, 2024

@lipzhu good point about sds->alloc == allocated size. I was sure it was. But now I see that sds->alloc is truncated when the sds length is close enough to the sds type's max length (2^8, 2^16, etc.). I'll update the PR with a fix.
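
As a hedged illustration of that edge case (the struct layout follows sds.h, but the concrete sizes below are made up, not measured):

```c
#include <stdint.h>

struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len;         /* used length */
    uint8_t alloc;       /* recorded capacity, excluding header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

/* alloc is a uint8_t, so it can never record more than 255 bytes. If the
 * allocator actually returns a slightly larger usable buffer (say 272 bytes
 * of payload space), the header-derived size no longer matches zmalloc_size,
 * and freeing with the header-derived size would skew the statistics. */
```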

madolson pushed a commit that referenced this pull request May 16, 2024
…to align the logic with sdsnewlen function. (#476)

This patch tries to correct the actual allocated size from the allocator
when calling sdsResize, to align the logic with the sdsnewlen function.

Maybe the #453 optimization should depend on this.

Signed-off-by: Lipeng Zhu <[email protected]>
@madolson madolson requested a review from ranshid May 16, 2024 16:53
srgsanky pushed a commit to srgsanky/valkey that referenced this pull request May 19, 2024
…to align the logic with sdsnewlen function. (valkey-io#476)

adetunjii pushed a commit to adetunjii/valkey that referenced this pull request May 22, 2024
…to align the logic with sdsnewlen function. (valkey-io#476)

poiuj added 3 commits June 3, 2024 15:14
zfree updates memory statistics. It gets the size of the buffer from
jemalloc by calling zmalloc_size. This operation is costly. We can
avoid it if we know the buffer size. For example, we can calculate
the size of an sds string from the data in its header.

This commit introduces the zfree_with_size function, which accepts
both a pointer to a buffer and its size. zfree is refactored to call
zfree_with_size.

sdsfree uses the new interface for all sds types except SDS_TYPE_5.

Signed-off-by: Vadym Khoptynets <[email protected]>
For SDS_TYPE_5, sdsalloc returns the string's length, which is not the
actual allocation size.

This patch makes sdsAllocSize call zmalloc_size for SDS_TYPE_5.
sdsalloc is a lower-level function and continues to return the length
for SDS_TYPE_5.

Signed-off-by: Vadym Khoptynets <[email protected]>
Instead of zmalloc_size use s_malloc_size.

Signed-off-by: Vadym Khoptynets <[email protected]>
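
To make the SDS_TYPE_5 distinction from the two commits above concrete, a hedged sketch (helper names such as s_malloc_size, sdsHdrSize, and sdsAllocPtr follow sds.c conventions, but this is not the exact source):

```c
size_t sdsAllocSize(sds s) {
    char type = s[-1] & SDS_TYPE_MASK;
    /* The SDS_TYPE_5 header has no alloc field, so only the allocator
     * knows the real allocation size. */
    if (type == SDS_TYPE_5) return s_malloc_size(sdsAllocPtr(s));
    return sdsHdrSize(type) + sdsalloc(s) + 1; /* header + capacity + null terminator */
}
```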
@poiuj poiuj requested a review from madolson June 3, 2024 16:15
@poiuj poiuj requested a review from ranshid June 11, 2024 08:46
Change the doc string for `zfree_with_size`

Signed-off-by: Vadym Khoptynets <[email protected]>
Signed-off-by: Vadym Khoptynets <[email protected]>
@poiuj poiuj requested a review from ranshid June 13, 2024 14:52
@ranshid (Member) left a comment


LGTM

@madolson (Member) commented
OK, I wasn't able to get performance numbers that were as consistent. Diving into a perf analysis shows about a 20% reduction in CPU in sdsfree, though. So it seems like there might be other bottlenecks, but this is definitely a good optimization.

@madolson (Member) left a comment


Overall it looks good to me. I have a question about an alternative, but it's not a blocker from me. If you still want to go with this approach, I think it's OK to merge.

@madolson madolson added the performance and release-notes labels Jun 18, 2024
@poiuj (Contributor, Author) commented Jun 18, 2024

Yes, let's go with this approach.

@enjoy-binbin enjoy-binbin changed the title Add zfree_with_size Add zfree_with_size to optimize sdsfree since we can get zmalloc_size from the header Jun 19, 2024
@enjoy-binbin (Member) left a comment


LGTM, it is a good optimization.

Signed-off-by: Madelyn Olson <[email protected]>
@madolson madolson merged commit 0143b7c into valkey-io:unstable Jun 19, 2024
19 checks passed
madolson pushed a commit that referenced this pull request Jun 25, 2024
Issue introduced by #453.
When we check the SDS_TYPE_5 allocation size, we mistakenly used
zmalloc_size, which DOES take the PREFIX size into account when there
is no malloc_size support.
Later, when we free, we add PREFIX_SIZE again, which leads to negative
memory accounting in some tests.
Example test failure:
https://github.com/valkey-io/valkey/actions/runs/9654170962/job/26627901497

Signed-off-by: ranshid <[email protected]>
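
A hedged illustration of the double-counting described above, for builds without native malloc_size support (simplified; not the exact zmalloc.c):

```c
size_t zmalloc_size(void *ptr) {
    void *realptr = (char *)ptr - PREFIX_SIZE; /* prefix stores the requested size */
    size_t size = *((size_t *)realptr);
    return size + PREFIX_SIZE;                 /* note: prefix is already included */
}
/* If a caller computes the size via zmalloc_size() and the free path adds
 * PREFIX_SIZE on top of that again, used_memory is decremented by more than
 * was ever added, which shows up as negative memory accounting. */
```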
ranshid added a commit that referenced this pull request Dec 1, 2024
**Motivation**

Copy-on-write (COW) amplification refers to the issue where writing to a
small object leads to the entire page being cloned, resulting in
inefficient memory usage. This issue arises during the BGSAVE process,
which can be particularly problematic on instances with limited memory.
If the BGSAVE process could release unneeded memory, it could reduce
memory consumption. To address this, the BGSAVE process calls the
`madvise` function to signal the operating system to reclaim the buffer.
However, this approach does not work for buffers smaller than a page
(usually 4KiB). Even after multiple such calls, where a full page may be
free, the operating system will not reclaim it.
To solve this issue, we can call `zfree` directly. This allows the
allocator (jemalloc) to handle the bookkeeping and release pages when
buffers are no longer needed. This approach reduces copy-on-write
events.

**Benchmarks**
To understand how the use of `zfree` affects BGSAVE and memory
consumption, I ran 45 benchmarks comparing my clone with the vanilla
version. The benchmark has the following steps:
1. Start a new Valkey process
2. Fill the DB with data sequentially
3. Run a warmup to randomize the memory layout
4. Introduce fragmentation by deleting part of the keys
5. In parallel:
    1. Trigger BGSAVE
    2. Start 80/20 get/set load

I varied the following parameters to understand their influence:

1. Number of keys: 3M, 6M, and 12M.
2. Value size. While the keys themselves have a fixed length of ~30 bytes,
the value size is 120, 250, 500, 1000, and 2000 bytes.
3. Fragmentation. I deleted 5%, 10%, and 15% of the original key range.

I'm attaching graphs of the BGSAVE process's memory consumption. Instead of
all benchmarks, I show what I consider the most representative runs.

<img width="1570" alt="3m-fixed"
src="https://github.com/user-attachments/assets/3dbbc528-01c1-4821-a3c2-6be455e7f78a">


For 2000-byte values, peak memory usage is ~53% of vanilla's. The peak
happens at 57% BGSAVE progress.
For 500-byte values, the peak is ~80% of vanilla's and happens at ~80%
progress.
For 120-byte values, the difference is under 5%, and the patched version
can even use more memory.



![500b-fixed](https://github.com/user-attachments/assets/b09451d3-4bce-4f33-b3db-2b5df2178ed2)


For 12M keys, the peak is ~85% of the vanilla's and happens at the ~70% mark.
For 6M keys, the peak is ~87% of the vanilla's and happens at the ~77% mark.
For 3M keys, the peak is ~87% of the vanilla's and happens at the ~80% mark.

**Changes**

The PR contains 2 changes:
1. Static buffer for RDB compression.
RDB compression leads to COW events even without any write load if we
use `zfree`. This happens because the compression function allocates a
new buffer for each object; together with freeing objects with `zfree`,
it leads to reuse of the memory shared with the main process.
To deal with this problem, we use a pre-allocated constant 8K buffer for
compression. If the object size is too big for this buffer, then we fall
back to the ad hoc allocation behavior (see the sketch after this list).

2. Freeing string objects instead of dismissing them.
A call to `zfree` is more expensive than a direct call to `madvise`, but
with #453 strings use the fast path – `zfree_with_size`. As a possible
next step, we can optimize `zfree` for other data types as well.
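
A hedged sketch of the static-buffer idea from change 1 (the identifier names here are invented for illustration and are not the Valkey implementation):

```c
#include <stddef.h>

#define RDB_STATIC_COMPRESS_BUF_SIZE (8 * 1024)
static char static_compress_buf[RDB_STATIC_COMPRESS_BUF_SIZE];

static void *getCompressBuffer(size_t needed, int *is_static) {
    if (needed <= RDB_STATIC_COMPRESS_BUF_SIZE) {
        *is_static = 1;
        return static_compress_buf; /* reused buffer: no fresh allocation, no extra COW pages */
    }
    *is_static = 0;
    return zmalloc(needed);         /* large objects fall back to an ad hoc allocation */
}
```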

---------

Signed-off-by: Vadym Khoptynets <[email protected]>
Signed-off-by: ranshid <[email protected]>
Co-authored-by: ranshid <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>