[feature] sentinel use info sentinel command to run faster #1511

Open
kukey wants to merge 1 commit into base: unstable

@kukey (Contributor) commented Jan 6, 2025

In a Sentinel deployment, each sentinel sends the INFO command to collect valkey-server state.
In a large cluster, however, the INFO response is heavy and slow to produce, so Sentinel needs a
lighter INFO response.

Changes in this PR:

valkey-server:

  • add the info-simple-for-sentinel flag, so Sentinel can tell whether a valkey-server supports the simple INFO response;
  • add a sentinel subcommand to INFO that returns only the information Sentinel uses.
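A minimal sketch of the server side, assuming the reduced reply carries only a few fields Sentinel reads from INFO (run_id, role and, for replicas, the link fields from the replication section). The struct server_state_t, the functions build_info_sentinel and append_capability, and the capability line info_simple_for_sentinel:1 are hypothetical illustrations, not the PR's actual code or output format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of the state a reduced INFO reply needs. */
typedef struct {
    const char *run_id;
    int is_replica;
    const char *primary_host;     /* used only when is_replica is set */
    int primary_port;
    int link_up;                  /* replication link to the primary is up */
    long long link_down_secs;     /* seconds since the link went down */
    int info_simple_for_sentinel; /* mirrors the info-simple-for-sentinel config */
} server_state_t;

/* Writes a compact "INFO sentinel"-style reply into buf.
 * Returns the number of bytes written, or -1 if buf is too small. */
int build_info_sentinel(char *buf, size_t len, const server_state_t *s) {
    assert(buf != NULL && s != NULL);
    int n = snprintf(buf, len,
                     "# Sentinel\r\n"
                     "run_id:%s\r\n"
                     "role:%s\r\n",
                     s->run_id, s->is_replica ? "slave" : "master");
    if (n < 0 || (size_t)n >= len) return -1;
    if (s->is_replica) {
        int m = snprintf(buf + n, len - (size_t)n,
                         "master_host:%s\r\n"
                         "master_port:%d\r\n"
                         "master_link_status:%s\r\n",
                         s->primary_host, s->primary_port,
                         s->link_up ? "up" : "down");
        if (m < 0 || (size_t)m >= len - (size_t)n) return -1;
        n += m;
        /* Like the replication section, report the down time only while the link is down. */
        if (!s->link_up) {
            m = snprintf(buf + n, len - (size_t)n,
                         "master_link_down_since_seconds:%lld\r\n", s->link_down_secs);
            if (m < 0 || (size_t)m >= len - (size_t)n) return -1;
            n += m;
        }
    }
    return n;
}

/* Hypothetical capability line appended to the full INFO output so Sentinel
 * can detect support; emitted only when the config is enabled. */
int append_capability(char *buf, size_t len, const server_state_t *s) {
    assert(buf != NULL && s != NULL);
    if (!s->info_simple_for_sentinel) return 0;
    int n = snprintf(buf, len, "info_simple_for_sentinel:1\r\n");
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Keeping the capability line in the full INFO output means an older Sentinel simply ignores it, while a newer one can switch to the reduced command.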

sentinel:

  • add an info_simple flag to the sentinelValkeyInstance struct;
  • sentinelSendPeriodicCommands checks info_simple to decide which INFO command to send;
  • reset the info_simple flag when reconnecting to a valkey instance;
  • parse the info_simple flag from the server's INFO response.
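The Sentinel-side changes can be sketched as follows. This is a minimal illustration, not the PR's code: instance_t is a hypothetical stand-in for sentinelValkeyInstance, the helper names are made up, and the server is assumed to advertise support with a hypothetical info_simple_for_sentinel:1 line in its INFO reply.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for sentinelValkeyInstance, reduced to the new flag. */
typedef struct {
    int info_simple; /* 1 once the instance has advertised simple-INFO support */
} instance_t;

/* Hypothetical capability key the server is assumed to emit in its INFO reply. */
static const char INFO_SIMPLE_KEY[] = "info_simple_for_sentinel:";

/* Chooses the command for the periodic refresh of one instance. */
const char *periodic_info_command(const instance_t *ri) {
    assert(ri != NULL);
    return ri->info_simple ? "INFO sentinel" : "INFO";
}

/* On reconnect the peer may have restarted or been downgraded, so forget the
 * capability and fall back to plain INFO until it is advertised again. */
void on_instance_reconnect(instance_t *ri) {
    assert(ri != NULL);
    ri->info_simple = 0;
}

/* Updates info_simple from an INFO reply. The key is matched only at the
 * start of a line; the flag is left unchanged when the key is absent. */
void parse_info_simple_flag(instance_t *ri, const char *info) {
    assert(ri != NULL && info != NULL);
    size_t klen = strlen(INFO_SIMPLE_KEY);
    for (const char *p = info; (p = strstr(p, INFO_SIMPLE_KEY)) != NULL; p += klen) {
        if (p == info || p[-1] == '\n') {
            ri->info_simple = (p[klen] == '1');
            return;
        }
    }
}
```

Resetting the flag on reconnect means a restarted or downgraded instance is probed with plain INFO again before Sentinel relies on the reduced reply.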

Performance test:

  • INFO command:
    valkey-benchmark -q -n 1000000 info
    info: 65841.45 requests per second, p50=0.703 msec
  • info sentinel:
    valkey-benchmark -q -n 1000000 info sentinel
    info sentinel: 236910.67 requests per second, p50=0.111 msec
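The relative gain from the two benchmark runs can be computed directly. The macro and function names below are illustrative; the figures are the ones reported above.

```c
#include <assert.h>

/* Figures copied from the valkey-benchmark runs in the PR description. */
#define INFO_RPS             65841.45
#define INFO_SENTINEL_RPS    236910.67
#define INFO_P50_MS          0.703
#define INFO_SENTINEL_P50_MS 0.111

/* Ratio of two positive measurements. */
double ratio(double numerator, double denominator) {
    assert(denominator > 0.0);
    return numerator / denominator;
}

/* How many times more requests per second INFO sentinel served. */
double throughput_gain(void) { return ratio(INFO_SENTINEL_RPS, INFO_RPS); }

/* How many times lower the median latency of INFO sentinel was. */
double p50_gain(void) { return ratio(INFO_P50_MS, INFO_SENTINEL_P50_MS); }
```

With these numbers INFO sentinel serves roughly 3.6x the requests per second at about a 6.3x lower p50. Note that this measures the server in isolation, not the Sentinel side of the exchange.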

codecov bot commented Jan 6, 2025

Codecov Report

Attention: Patch coverage is 67.50000% with 13 lines in your changes missing coverage. Please review.

Project coverage is 70.83%. Comparing base (b3b4bdc) to head (2837997).
Report is 2 commits behind head on unstable.

Files with missing lines   Patch %   Lines
src/sentinel.c             0.00%     7 Missing ⚠️
src/server.c               81.81%    6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1511      +/-   ##
============================================
- Coverage     70.83%   70.83%   -0.01%     
============================================
  Files           120      120              
  Lines         64911    64930      +19     
============================================
+ Hits          45982    45993      +11     
- Misses        18929    18937       +8     
Files with missing lines   Coverage Δ
src/config.c               78.33% <ø> (ø)
src/server.h               100.00% <ø> (ø)
src/server.c               87.33% <81.81%> (-0.14%) ⬇️
src/sentinel.c             0.00% <0.00%> (ø)

... and 10 files with indirect coverage changes

@kukey force-pushed the opti-sentinel-info-simple branch from 2837997 to 2a5d3f6 on January 6, 2025 13:54
@@ -3373,6 +3373,9 @@ standardConfig static_configs[] = {
createSpecialConfig("replicaof", "slaveof", IMMUTABLE_CONFIG | MULTI_ARG_CONFIG, setConfigReplicaOfOption, getConfigReplicaOfOption, rewriteConfigReplicaOfOption, NULL),
createSpecialConfig("latency-tracking-info-percentiles", NULL, MODIFIABLE_CONFIG | MULTI_ARG_CONFIG, setConfigLatencyTrackingInfoPercentilesOutputOption, getConfigLatencyTrackingInfoPercentilesOutputOption, rewriteConfigLatencyTrackingInfoPercentilesOutputOption, NULL),

/* Capabalities */
createBoolConfig("info-simple-for-sentinel", NULL, IMMUTABLE_CONFIG, server.info_simple_for_sentinel, 1, NULL, NULL),

Member

Is this config only for Sentinel? If so, I think it should be moved to sentinel.c, where we already have some Sentinel-specific config parameters.

Contributor Author

@kukey Jan 14, 2025

Yes, this config is only for Sentinel, but it means that valkey-server has a capability that lets Sentinel send a different command to collect instance stats.
It's a server capability, so I don't think it can move to sentinel.c.

@gmbnomis (Contributor) commented Jan 9, 2025

I am wondering why you consider the performance of the INFO command relevant in this case (or, more precisely, what you mean by "large cluster"):

Sentinels do not scale by sharding: all members of the sentinel cluster manage all primaries. The reason to have multiple sentinel instances is to increase robustness. By default, each sentinel instance sends an INFO command every 10 seconds to each monitored Valkey node (and, IIRC, that timing can only be changed with a debug command).

So, even if we assume 9 Sentinels (I assume most deployments use 3 or 5), each Valkey node receives only around one INFO command per second. For example:

$ valkey-cli -p 30000 info commandstats | grep cmdstat_info ; sleep 60 ; valkey-cli -p 30000 info commandstats | grep cmdstat_info

cmdstat_info:calls=123,usec=25555,usec_per_call=207.76,rejected_calls=0,failed_calls=0
cmdstat_info:calls=178,usec=34934,usec_per_call=196.26,rejected_calls=0,failed_calls=0

That is 54 INFO commands per minute, which means that the performance of the INFO command is completely negligible.
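The arithmetic in this comment can be reproduced with a short sketch. The function names are hypothetical, and observed_sentinel_calls rests on an assumption not stated in the comment: the first `valkey-cli info commandstats` call is counted only in the second sample, which would explain why 178 − 123 = 55 calls correspond to 54 Sentinel-issued ones.

```c
#include <assert.h>

/* INFO calls per minute reaching one node when each of `sentinels`
 * polls it once every info_period_ms. */
double expected_info_per_minute(int sentinels, double info_period_ms) {
    assert(sentinels > 0 && info_period_ms > 0.0);
    return sentinels * 60000.0 / info_period_ms;
}

/* Sentinel-issued calls between two cmdstat_info samples. Assumption: the
 * first `valkey-cli info commandstats` call shows up only in the second
 * sample, so it is subtracted here. */
long observed_sentinel_calls(long calls_before, long calls_after) {
    assert(calls_after > calls_before);
    return calls_after - calls_before - 1;
}

/* Fraction of one CPU core spent executing INFO over the sampling window. */
double info_cpu_share(long usec_before, long usec_after, double window_sec) {
    assert(usec_after >= usec_before && window_sec > 0.0);
    return (double)(usec_after - usec_before) / (window_sec * 1e6);
}
```

With 9 Sentinels and the default 10 s period this gives 54 calls per minute, and the sampled usec delta (34934 − 25555) over roughly 60 s is about 0.016% of one core.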

@kukey Could you explain in which scenario the performance improvement achieved by this PR becomes relevant?

@kukey (Contributor Author) commented Jan 10, 2025

@gmbnomis We use 3-6 sentinels to manage thousands of Valkey instances. If Sentinel uses this simple INFO, it saves CPU and bandwidth, and clients and servers see more stable, smoother latency.
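The load in this setup sits on each Sentinel, which polls every monitored instance. A rough sketch of that load follows; the function names are hypothetical, and the average reply size is deliberately an input, since it depends on the deployment and would have to be measured.

```c
#include <assert.h>

/* INFO replies one Sentinel receives per second when it monitors
 * `instances` nodes, each polled once every info_period_ms. */
double info_replies_per_sec(long instances, double info_period_ms) {
    assert(instances >= 0 && info_period_ms > 0.0);
    return instances * 1000.0 / info_period_ms;
}

/* Inbound INFO bytes per second, given an average reply size that has to be
 * measured on the real deployment (an input here, not a known value). */
double info_bytes_per_sec(long instances, double info_period_ms, double avg_reply_bytes) {
    assert(avg_reply_bytes >= 0.0);
    return info_replies_per_sec(instances, info_period_ms) * avg_reply_bytes;
}
```

For example, an illustrative 3,000 instances at the default 10 s period means 300 INFO replies per second for each Sentinel to parse, and ten times that for any node polled at a 1 s period.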

@gmbnomis (Contributor) commented Jan 14, 2025

> @gmbnomis We use 3-6 sentinels to manage thousands of Valkey instances. If Sentinel uses this simple INFO, it saves CPU and bandwidth, and clients and servers see more stable, smoother latency.

I see. So, the performance improvements of the INFO command given in the PR description are not really the point of this PR. It is about sentinel performance.

I noticed that the master_link_down_since_seconds field (usually, it is in the "replication" section) is not part of your proposal. However, it is expected by sentinel. If this field is present and greater than zero, the info period for the respective node decreases from 10s to 1s, i.e. a tenfold increase in INFO commands.

I wonder whether the improvements you see may be caused (partly) by the omission of this field.
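A simplified sketch of the period rule described here, showing why a missing field matters. The constants and function names are hypothetical, and the real logic in sentinel.c has additional conditions that this sketch does not model; it only encodes "present and greater than zero selects 1 s, otherwise 10 s".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_INFO_PERIOD_MS 10000 /* default INFO period per the discussion */
#define FAST_INFO_PERIOD_MS    1000  /* period while the replication link is down */

/* Returns master_link_down_since_seconds from an INFO reply, or -1 when the
 * field is absent (matched only at the start of a line). */
long long link_down_seconds(const char *info) {
    static const char key[] = "master_link_down_since_seconds:";
    size_t klen = strlen(key);
    assert(info != NULL);
    for (const char *p = info; (p = strstr(p, key)) != NULL; p += klen) {
        if (p == info || p[-1] == '\n')
            return strtoll(p + klen, NULL, 10);
    }
    return -1;
}

/* Simplified rule: a positive value selects the fast period; a missing field
 * silently keeps the default one. */
int info_period_ms(const char *info) {
    return link_down_seconds(info) > 0 ? FAST_INFO_PERIOD_MS : DEFAULT_INFO_PERIOD_MS;
}
```

If a reduced reply omits the field, a replica whose link is down stays on the 10 s period instead of switching to 1 s, which lowers INFO traffic for the wrong reason.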

4 participants