-
Notifications
You must be signed in to change notification settings - Fork 714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Significance of absolute accurate used_memory
?
#467
Comments
I think it's safe to remove this exact counting. It's already not exact anyway (see discussion in the PR). In the contributor summit, someone mentioned that we should remove the memory counter in zmalloc and instead rely on the metrics from jemalloc's
We can optimize the balance between exactness and performance using some heuristics. The worst case scenario is that two threads allocates a huge amount of memory almost simultaneously when we're close to the maxmemory limit. To avoid that, we could fetch the metric more frequently when we're closer to the maxmemory, say when we're over 90% of maxmemory, and less often otherwise. Another possible heuristic is to count large allocations immediately (say over 10MB) and increment the global counter immediately from zmalloc in this case, and otherwise rely on the value we fetch from jemalloc less often, e.g. in cron. |
@zuiderkwast, did you get a chance to look at @lipzhu's proposal at #308 (comment)? The idea is to cache the small delta locally and only when the accumulated changes exceed some threshold commit them to the global variable atomically . On the reporting path, we could sum up the global and the deltas from all threads to get a close enough reading. Note that the local delta would need to be a signed number. I have a few questions about using
|
@PingXie Yes, I looked at the PR but it isn't currently implemented as in the #308 (comment) comment, right? I think this comment looks good and it is more simple than the current array of counters per thread. @lipzhu Do I understand correctly? Is the suggestion to compute the diff and report back to the main atomic static _Atomic size_t used_memory = 0;
static _Thread int64_t thread_used_memory_delta = 0;
#define THREAD_MEM_MAX_DELTA (100 * 1024)
static inline void update_zmalloc_stat_alloc(size_t size) {
thread_used_memory_delta += size;
if (thread_used_memory_delta >= THREAD_MEM_MAX_DELTA) {
atomic_fetch_add_explicit(&used_memory, thread_used_memory_delta, memory_order_relaxed);
thread_used_memory_delta = 0;
}
}
static inline void update_zmalloc_stat_free(size_t size) {
thread_used_memory_delta -= size;
if (thread_used_memory_delta <= THREAD_MEM_MAX_DELTA) {
atomic_fetch_sub_explicit(&used_memory, -thread_used_memory_delta, memory_order_relaxed);
thread_used_memory_delta = 0;
}
} So maybe there's no point using jemalloc's stats, but I'm answering the questions anyway, Ping:
I think But if we use
We would need fallback to count this ourselves, but it can be conditional (ifdef) so no cost if jemalloc is used. |
No. I don't think @lipzhu has implemented it yet. But yes your implementation is what I like to see and 100 KB (or 128 KB?) seems good to me too. |
Actually I wonder if we should go even higher, like 1MB. |
@zuiderkwast @PingXie Thanks for your comments. Let me clarify the decision according to your comments in this issue and pr. Please correct me if I misunderstood.
|
I think we are recommending just option 1 |
The problem/use-case that the feature addresses
Currently, when call the zmalloc/zfree related functions, it will call the zmalloc_size and atomic operations to update
used_memory
, these are all costly operations, is it worth to put such expensive operation in a frequently low API?Some pull requests are submitted to optimize for such kind of this issue.
#308
#453
Alternatives you've considered
Maybe we can consider to remove the absolute accurate property of
used_memory
to trade off between the performance?The text was updated successfully, but these errors were encountered: