Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close connections in case of an exception #2201

Merged
merged 2 commits into from
Dec 12, 2024
Merged

Conversation

MZauchner
Copy link
Contributor

@MZauchner MZauchner commented Dec 5, 2024

Fixes: celery/celery#9259

Issue

When an amqp exception occurs while a connection is in use, the connection becomes unusable. Any attempts to use the connection can potentially block indefinitely.
Currently, such broken connections are placed back into the connection pool, which causes a later acquire call on the pool to return a broken connection.

Reproducer

See celery/celery#9259
When a celery app is created with the confirm_publish=True option, it is possible for a celery task's `apply_async' method to block indefinitely if a broken connection is retrieved from the connection pool.

Fix

The ProducerPool::acquire and ConnectionPool::acquire method now return a wrapped Connection/Producer, that ensures the connections are properly closed in case of an exception. A reproducer for the original bug was added as an integration test, to ensure that the fix worked.

Additional comments

I believe there is a similar issue with the ChannelPool, but in this case I think it is not possible to simply close the connection, as other channels acquired from the pool may still be in use.

@Nusnus Nusnus self-requested a review December 5, 2024 15:15
Copy link
Member

@Nusnus Nusnus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some tests to this please?
That confirms the bug fails without the fix on current main.

Thank you! 🙏

@MZauchner MZauchner marked this pull request as draft December 5, 2024 15:22
@MZauchner
Copy link
Contributor Author

Sure, I'll also have a look at the tests that failed.

@Nusnus
Copy link
Member

Nusnus commented Dec 5, 2024

Sure, I'll also have a look at the tests that failed.

You can run it locally with tox -e 3.13-unit.

@MZauchner MZauchner force-pushed the main branch 2 times, most recently from 82e31c9 to 8869895 Compare December 5, 2024 20:32
@MZauchner
Copy link
Contributor Author

@Nusnus Thanks a lot for your tip with the tests! I added an integration test to the py_amqp tests that should fail without the change.
Also reworked it a bit so the public API of the connection and producer pools are not changed.

@MZauchner MZauchner marked this pull request as ready for review December 5, 2024 20:37
@MZauchner MZauchner force-pushed the main branch 2 times, most recently from f5f43c1 to 388b7c8 Compare December 5, 2024 21:07
Copy link

codecov bot commented Dec 5, 2024

Codecov Report

Attention: Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 81.50%. Comparing base (5ae6ebc) to head (c568007).
Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
kombu/messaging.py 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2201      +/-   ##
==========================================
+ Coverage   81.49%   81.50%   +0.01%     
==========================================
  Files          77       77              
  Lines        9509     9522      +13     
  Branches     1148     1151       +3     
==========================================
+ Hits         7749     7761      +12     
  Misses       1569     1569              
- Partials      191      192       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MZauchner MZauchner requested a review from Nusnus December 5, 2024 21:19
@MZauchner MZauchner force-pushed the main branch 3 times, most recently from d49e6f6 to 2c3bd2c Compare December 6, 2024 10:48
Copy link
Member

@Nusnus Nusnus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments.

Good work 🔥

kombu/connection.py Outdated Show resolved Hide resolved
t/unit/test_connection.py Show resolved Hide resolved
@Nusnus
Copy link
Member

Nusnus commented Dec 6, 2024

Sure, I'll also have a look at the tests that failed.

You can run it locally with tox -e 3.13-unit.

See also: tox -e lint ;)

kombu/pools.py Outdated Show resolved Hide resolved
@MZauchner MZauchner force-pushed the main branch 2 times, most recently from d79f9dc to 0839f26 Compare December 7, 2024 01:32
Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@MZauchner MZauchner requested a review from Nusnus December 7, 2024 15:54
MZauchner and others added 2 commits December 12, 2024 21:18
When an exception occurs while the connection is in use, the connection
may become unsuable. Such connections need to be closed, such that the
connection to the broker can be reestablished at a later point.
Copy link
Member

@Nusnus Nusnus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved!
I also checked a patched Kombu vs all of Celery main’s CI to make sure we won’t break anything there.
Everything passes.

Good work!

Merge once the CI here goes full green :)

@Nusnus Nusnus merged commit 2560060 into celery:main Dec 12, 2024
42 checks passed
zs-neo pushed a commit to zs-neo/kombu that referenced this pull request Dec 24, 2024
* Close connections in case of an exception

When an exception occurs while the connection is in use, the connection
may become unsuable. Such connections need to be closed, such that the
connection to the broker can be reestablished at a later point.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
zs-neo pushed a commit to zs-neo/kombu that referenced this pull request Dec 24, 2024
* Close connections in case of an exception

When an exception occurs while the connection is in use, the connection
may become unsuable. Such connections need to be closed, such that the
connection to the broker can be reestablished at a later point.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
zs-neo pushed a commit to zs-neo/kombu that referenced this pull request Dec 24, 2024
* Close connections in case of an exception

When an exception occurs while the connection is in use, the connection
may become unsuable. Such connections need to be closed, such that the
connection to the broker can be reestablished at a later point.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

apply_async hangs indefinitely with confirm_publish=True
3 participants