
Lettuce created more than 1 native redis connection to redis cluster node #3127

Open · jmf0526 opened this issue Jan 15, 2025 · 4 comments
jmf0526 commented Jan 15, 2025

Current Behavior

In my environment there are 40 Redis clusters; each cluster has 20 master nodes and 20 replica nodes, so there are 1600 Redis nodes in total. When my Redis client application starts up, there are 800+ Redis connections on the client side (observed with netstat), roughly one connection per Redis master. But when the application workload is heavy, the connection count goes up to 7000+, on average 9 connections per master node, and those connections are never closed.
There are multiple application instances on one server, so I fear that if the connection count grows indefinitely, TCP ports could get exhausted on the application server.

Java client code

The following code is used to create one RedisTemplate per Redis cluster; since there are 40 clusters, 40 RedisTemplates are created in total.

ClusterTopologyRefreshOptions clusterTopologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
        .enableAllAdaptiveRefreshTriggers()
        .enablePeriodicRefresh(Duration.ofSeconds(300))
        .build();

ClusterClientOptions clusterClientOptions = ClusterClientOptions.builder()
        .topologyRefreshOptions(clusterTopologyRefreshOptions)
        .build();

LettuceClientConfiguration lettuceClientConfiguration = LettucePoolingClientConfiguration.builder()
        .readFrom(ReadFrom.MASTER_PREFERRED)
        .clientOptions(clusterClientOptions)
        .clientResources(DefaultClientResources.builder()
                .commandLatencyRecorder(CommandLatencyRecorder.disabled())
                .build())
        .build();

// 20 masters, 20 replicas per cluster
RedisClusterConfiguration redisClusterConfiguration = createRedisClusterConfiguration();
LettuceConnectionFactory lettuceConnectionFactory =
        new LettuceConnectionFactory(redisClusterConfiguration, lettuceClientConfiguration);
// required when the factory is created outside a Spring container
lettuceConnectionFactory.afterPropertiesSet();

StringRedisTemplate redisTemplate = new StringRedisTemplate();
redisTemplate.setConnectionFactory(lettuceConnectionFactory);
redisTemplate.afterPropertiesSet();
return redisTemplate;

Expected behavior/code

Based on this comment (#860 (comment)) the TCP connection count per cluster should be (2 * number of nodes) + 1, but this does not hold in my env: with 40 clusters of 40 nodes each, the connection count should be 40 * ((2 * 40) + 1) = 3240, yet in my environment there are 7000+ connections.
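For reference, that expected-count arithmetic can be written out as a tiny sketch (the constants are just the numbers from this environment; the per-cluster formula comes from the #860 comment):

```java
public class ConnectionEstimate {
    public static void main(String[] args) {
        int clusters = 40;
        int nodesPerCluster = 40;                 // 20 masters + 20 replicas
        int perCluster = 2 * nodesPerCluster + 1; // (2 * nodes) + 1, per #860
        int total = clusters * perCluster;
        System.out.println(total);                // prints 3240
    }
}
```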

My Question

How many native Redis connections does Lettuce create to connect to a Redis cluster node? From what I understand, Lettuce shares native Redis connections between threads. However, in my environment, I observe that up to 10+ connections can be created per cluster node.

Is there something wrong in my code?
Or is this behavior configurable?

Environment

  • Lettuce version(s): 6.1.10.RELEASE
  • Spring data redis version: 2.3.2.RELEASE
  • Redis version: 6.2.7
@tishun tishun added for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage labels Jan 16, 2025

jmf0526 commented Jan 20, 2025

Sorry, this was a wrong report. I just found that I was accidentally using the pooling client configuration...

LettuceClientConfiguration lettuceClientConfiguration = LettucePoolingClientConfiguration.builder()

After changing the above code to the plain (non-pooling) client configuration, the TCP connection count remains at a stable, normal value.

LettuceClientConfiguration lettuceClientConfiguration = LettuceClientConfiguration.builder()

But there is still a problem: this configuration causes a huge number (more than 200K) of TIME_WAIT connections.
The wiki says a RedisClusterConnection object consists of native connections to every node. Does this mean that if a single RedisClusterConnection object is shared between threads to handle all operations, there shouldn't be so many TIME_WAIT connections? Does spring-data-redis or Lettuce create a new RedisClusterConnection object for each Redis operation?
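For context, the single-shared-connection pattern the wiki describes looks roughly like this in plain Lettuce (a minimal sketch, not spring-data-redis; the URI is a placeholder seed node and requires a live cluster to actually run):

```java
import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;

public class SharedClusterConnection {
    public static void main(String[] args) {
        RedisClusterClient client =
                RedisClusterClient.create(RedisURI.create("redis://127.0.0.1:7000"));
        // Thread-safe: share this one connection for all non-blocking commands
        // instead of creating a new connection per operation.
        StatefulRedisClusterConnection<String, String> connection = client.connect();
        try {
            connection.sync().set("key", "value");
        } finally {
            connection.close();
            client.shutdown();
        }
    }
}
```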

tishun (Collaborator) commented Jan 20, 2025

Hey @jmf0526,

But there is still a problem, this configuration caused a huge number (more than 200K) of TIME_WAIT connection.

It's really hard to speculate on this topic, as you have a very specific setup that would take a lot of time to replicate and investigate.

I can however comment on some of the questions you have:

The wiki says a RedisClusterConnection object consists of native connections to every node. Does this mean that if a single RedisClusterConnection object is shared between threads to handle all operations, there shouldn't be so many TIME_WAIT connections?

Not necessarily. Depending on your usage of the driver, it could open or close connections on different occasions, such as, but not limited to, a periodic topology refresh. You seem to have enabled all adaptive refresh triggers and also set a periodic refresh, so these are all connections established to keep the topology up to date, and they are separate from the connections you use to send commands.
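To illustrate, refresh-driven churn can be reduced by narrowing the triggers; the snippet below is a sketch only, and the specific values (10-minute period, MOVED-only trigger) are illustrative, not recommendations:

```java
import java.time.Duration;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;

ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
        // longer periodic refresh instead of enableAllAdaptiveRefreshTriggers()
        .enablePeriodicRefresh(Duration.ofMinutes(10))
        // refresh adaptively only on MOVED redirects
        .enableAdaptiveRefreshTrigger(ClusterTopologyRefreshOptions.RefreshTrigger.MOVED_REDIRECT)
        .closeStaleConnections(true)  // the default, shown for clarity
        .build();

ClusterClientOptions options = ClusterClientOptions.builder()
        .topologyRefreshOptions(refreshOptions)
        .build();
```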

Does spring-data-redis or lettuce create a new redisClusterConnection object for each redis operation?

Depends. The driver has an internal cache (or pool) of connections so it can reuse connections to nodes, but in a huge environment like yours many things might trigger a connection to be closed: topology refresh, lack of resources, etc. Connecting to 1600 nodes from a single driver is a challenging task for many reasons.

You'd have to do some digging yourself to see whether this behavior is acceptable or whether it is caused by a driver bug or a configuration issue. One way to do that is to record the driver's actions with Java Flight Recorder, for instance, and/or register for events on the EventBus.
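Registering on the EventBus might look roughly like this (a sketch assuming Lettuce 6.x; the `resources` instance would then be passed into the client configuration so the bus observes that client's connection events):

```java
import io.lettuce.core.event.connection.ConnectedEvent;
import io.lettuce.core.event.connection.DisconnectedEvent;
import io.lettuce.core.resource.ClientResources;
import io.lettuce.core.resource.DefaultClientResources;

public class ConnectionEventLogger {
    public static void main(String[] args) {
        ClientResources resources = DefaultClientResources.create();
        // EventBus.get() returns a reactive stream of driver events
        resources.eventBus().get().subscribe(event -> {
            if (event instanceof ConnectedEvent || event instanceof DisconnectedEvent) {
                System.out.println(event);
            }
        });
        // ... build the client/factory with .clientResources(resources) ...
    }
}
```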

Does that help?

@tishun tishun added status: waiting-for-feedback We need additional information before we can continue and removed for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage labels Jan 23, 2025

jmf0526 commented Jan 24, 2025

@tishun Thank you for your reply! It's indeed helpful!
I have switched the configuration back to LettucePoolingClientConfiguration and tuned the pool parameters and the topology refresh period; now both the normal and TIME_WAIT connection counts stay at an acceptable number. But if I use LettuceClientConfiguration, the number of TIME_WAIT connections is still very high; I will try to figure out the reason if I get time in the future.
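For anyone landing here, pool tuning of the kind mentioned above is done through the commons-pool2 config that LettucePoolingClientConfiguration accepts; the values below are purely illustrative, not the ones used in this environment:

```java
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
import org.springframework.data.redis.connection.lettuce.LettucePoolingClientConfiguration;

GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
poolConfig.setMaxTotal(8);  // upper bound on pooled connections per factory
poolConfig.setMaxIdle(8);   // don't tear down connections below this idle count
poolConfig.setMinIdle(2);   // keep a few warm connections to reduce churn

LettucePoolingClientConfiguration clientConfiguration =
        LettucePoolingClientConfiguration.builder()
                .poolConfig(poolConfig)
                .build();
```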

Depends. The driver has an internal cache (or pool) of connections to reuse connections to nodes,

Another question: is this statement true if a non-pooled configuration (LettuceClientConfiguration) is used? I thought only one StatefulRedisClusterConnection would be used in a non-pooled configuration. And am I supposed to use a non-pooled configuration if there are no blocking Redis operations in my application?

tishun (Collaborator) commented Jan 24, 2025

@tishun Thank you for your reply! It's indeed helpful! I have switched the configuration back to LettucePoolingClientConfiguration and tuned the pool parameters and the topology refresh period; now both the normal and TIME_WAIT connection counts stay at an acceptable number. But if I use LettuceClientConfiguration, the number of TIME_WAIT connections is still very high; I will try to figure out the reason if I get time in the future.

Hm, I can't tell why this worked out better without doing some investigation on the environment itself.
I did some digging and found there already is an issue that describes the exact same behaviour: #2463
A plausible explanation is that the manual connection pool does a better job at recycling connections.

Some additional background on the TIME_WAIT state: https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux

Depends. The driver has an internal cache (or pool) of connections to reuse connections to nodes,

Another question: is this statement true if a non-pooled configuration (LettuceClientConfiguration) is used? I thought only one StatefulRedisClusterConnection would be used in a non-pooled configuration. And am I supposed to use a non-pooled configuration if there are no blocking Redis operations in my application?

Yes, the driver pools connections when there is a cluster connection (RedisClusterClient) and no manual connection pool is configured. The reason is that each thread might end up talking to a different node. If thread A opens a connection to node A and then thread B wants to send a command to node A, the driver will pull the connection to node A from the pool instead of making a new connection. Connections are created lazily, to save resources. Each node has 2 connections (READ + WRITE). See some more details here.

In this mode, however, threads are not guaranteed to have dedicated connections, whereas when you manually create a pool they are (which enables blocking operations and transactions). Manually creating a connection pool means more configuration and could result in worse performance, but it all depends on the use case.

@tishun tishun added status: waiting-for-triage and removed status: waiting-for-feedback We need additional information before we can continue labels Jan 24, 2025