LettuceConnectionProvider getConnection hang forever #2289

Open
heoYH opened this issue Mar 29, 2022 · 8 comments
Labels: theme: 4.0, type: enhancement
heoYH commented Mar 29, 2022

We use spring-data-redis and Lettuce to access a Redis cluster.

I don't know the root cause, but the connection could not complete its initialization. As a result, the sharedConnection was never initialized and the calling thread waited forever. After that, all subsequent requests blocked as well.

### Thread dump

"lettuce-epollEventLoop-5-6" tid=0x3f native=false suspended=false
   java.lang.Thread.State: WAITING
	at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
	at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
	at java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1796)
	at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3128)
	at java.util.concurrent.CompletableFuture.waitingGet([email protected]/CompletableFuture.java:1823)
	at java.util.concurrent.CompletableFuture.get([email protected]/CompletableFuture.java:1998)
	at io.lettuce.core.cluster.RedisClusterClient.get(RedisClusterClient.java:937)
	at io.lettuce.core.cluster.RedisClusterClient.getPartitions(RedisClusterClient.java:329)
	at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnectionAsync(ClusterConnectionProvider.java:92)
	at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnectionAsync(ClusterConnectionProvider.java:40)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionProvider.getConnection(LettuceConnectionProvider.java:53)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.getConnection(LettuceConnectionFactory.java:1527)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getNativeConnection(LettuceConnectionFactory.java:1315)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getConnection(LettuceConnectionFactory.java:1298)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getSharedReactiveConnection(LettuceConnectionFactory.java:1049)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveClusterConnection(LettuceConnectionFactory.java:481)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:457)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:101)
	at org.springframework.data.redis.core.ReactiveRedisTemplate.lambda$doInConnection$0(ReactiveRedisTemplate.java:198)
	at org.springframework.data.redis.core.ReactiveRedisTemplate$$Lambda$1725/0x0000000100cadc40.get(Unknown Source)
	at reactor.core.publisher.MonoSupplier.call(MonoSupplier.java:85)
"lettuce-epollEventLoop-5-5" tid=0x3e native=false suspended=false
   java.lang.Thread.State: BLOCKED
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getConnection(LettuceConnectionFactory.java:1297)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getSharedReactiveConnection(LettuceConnectionFactory.java:1049)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveClusterConnection(LettuceConnectionFactory.java:481)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:457)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:101)
	at org.springframework.data.redis.core.ReactiveRedisTemplate.lambda$doInConnection$0(ReactiveRedisTemplate.java:198)
	at org.springframework.data.redis.core.ReactiveRedisTemplate$$Lambda$1725/0x0000000100cadc40.get(Unknown Source)
	at reactor.core.publisher.MonoSupplier.call(MonoSupplier.java:85)
	at reactor.core.publisher.FluxUsingWhen.subscribe(FluxUsingWhen.java:80)
	at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:236)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:203)
	at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onComplete(ScopePassingSpanSubscriber.java:102)
	at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onComplete(FluxSwitchIfEmpty.java:84)
	at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onComplete(ScopePassingSpanSubscriber.java:102)
	at reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber.onComplete(MonoIgnoreElements.java:88)
	at reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber.onComplete(MonoIgnoreElements.java:88)
	at reactor.core.publisher.Operators.complete(Operators.java:136)

We are still looking into why the connection fails to initialize. My suggestion is that getConnection should apply a timeout instead of joining indefinitely:

return LettuceFutureUtils.join(getConnectionAsync(connectionType));

It should not hang forever when the sharedConnection initialization fails for whatever reason.
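For illustration, a bounded wait could look roughly like the following. This is a hypothetical helper sketch, not the actual Spring Data Redis code; the timeout value and the exception type are assumptions made for the example.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedConnectionJoin {

    // Illustrative timeout; a real fix would likely derive this from the configured command timeout.
    private static final long INIT_TIMEOUT_SECONDS = 10;

    // Waits for the async connection future with an upper bound instead of blocking forever.
    static <T> T joinWithTimeout(CompletableFuture<T> connectionFuture) {
        try {
            return connectionFuture.get(INIT_TIMEOUT_SECONDS, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            throw new IllegalStateException("Connection initialization timed out", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the connection", e);
        } catch (Exception e) {
            throw new IllegalStateException("Connection initialization failed", e);
        }
    }
}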

spring-projects-issues added the status: waiting-for-triage label on Mar 29, 2022
mp911de (Member) commented Mar 29, 2022

This is a known issue and we plan to address it. Meanwhile, you can enable eager connection initialization to initialize the connection early on.

mp911de added the type: enhancement and theme: 4.0 labels and removed the status: waiting-for-triage label on Mar 29, 2022
heoYH (Author) commented Mar 29, 2022

@mp911de
Could you please explain this issue if possible?
(I wonder what is causing the problem.)

mp911de (Member) commented Mar 29, 2022

Sure. SharedConnection synchronizes access to a single shared connection to ensure that only one connection is created. When one or more event loop threads block inside that synchronized section, they end up waiting for a completion that can only be produced by one of the blocked threads. Because they are waiting for each other, the result is effectively a deadlock.
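The pattern can be reproduced in isolation. Below is a hypothetical, self-contained sketch (not Spring Data Redis code) of the same shape: a single-threaded "event loop" blocks on a future whose completion is queued on that very thread, so neither task can make progress, which mirrors the WAITING/BLOCKED pair in the thread dump above.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EventLoopSelfBlockDemo {

    public static void main(String[] args) throws Exception {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        CompletableFuture<String> connectionFuture = new CompletableFuture<>();

        // Task 1: the only event loop thread parks in get(), like the WAITING thread above.
        eventLoop.submit(() -> connectionFuture.get());

        // Task 2: would complete the future, but it can never run because the single
        // event loop thread is already parked in task 1.
        eventLoop.submit(() -> connectionFuture.complete("connected"));

        Thread.sleep(1000);
        System.out.println("Future completed? " + connectionFuture.isDone()); // prints false
        eventLoop.shutdownNow(); // interrupt the parked thread so the demo JVM can exit
    }
}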

TomHaughton commented
Any updates on this? I'm experiencing what I believe to be the same/similar problem and I'm unable to send any commands through the shared connection. I've tried enabling eager initialization to no avail.

123jiehao commented
@mp911de Is there any configuration that avoids this BLOCKED state, or is restarting Redis the only option?

remnov commented Feb 22, 2024

The proposed workaround works.

To enable eager connection initialization, setEagerInitialization(true) has to be called before afterPropertiesSet().

Example:

LettuceConnectionFactory connectionFactory = new LettuceConnectionFactory(redisConfig, clientConfig);
connectionFactory.setEagerInitialization(true);
connectionFactory.afterPropertiesSet();
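When the LettuceConnectionFactory is declared as a Spring bean, the container invokes afterPropertiesSet() itself, so it is enough to set the flag in the bean definition before returning the factory.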

roshansinghkande commented
@mp911de Has there been a permanent fix for this? We are also facing this issue and, in turn, a lot of downstream issues.

mp911de (Member) commented Jan 15, 2025

The fix is to enable eager connection initialization and not to use connection validation.
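Putting both recommendations together, the configuration could look roughly like the sketch below. It builds on the earlier example; redisConfig and clientConfig stand for your own Redis and client configuration objects.

LettuceConnectionFactory connectionFactory = new LettuceConnectionFactory(redisConfig, clientConfig);
connectionFactory.setEagerInitialization(true);   // create and initialize the shared connection at startup
connectionFactory.setValidateConnection(false);   // leave connection validation off (false is the default)
connectionFactory.afterPropertiesSet();           // or let the Spring container call this on the bean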
