CNDB-12553: ensure that memtable is reclaimed even when notification subscribers throw #1545

Open: wants to merge 1 commit into base: main
jakubzytka

What is the issue

Cassandra doesn't properly handle notification subscribers that throw
and thereby fail a flush. In such a case the flush is interrupted (despite
multiple uses of exception-safe code and accumulating exceptions)
after the sstable creation transaction has committed, but before the
memtable has been reclaimed. As a result, the memtable allocator believes
that more and more memory is in use and being reclaimed, which eventually
stops writes due to an apparent lack of memtable memory.
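The failure mode can be illustrated with a deliberately simplified toy model. Every name below (`BuggyFlushSketch`, `ToyAllocator`, `markReclaiming`, `release`) is hypothetical and is not Cassandra's actual memtable or allocator API, and the model omits the exception accumulation the real code uses. It only shows the shape of the problem: once a flush has moved a memtable's memory into the "reclaiming" state, an exception thrown by a subscriber skips the release step, so that memory is never returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model of the bug. All names are hypothetical; this is not Cassandra's API.
public class BuggyFlushSketch {

    /** Tracks memory owned by live memtables and memory of flushing memtables not yet released. */
    static final class ToyAllocator {
        final long limit;
        long owned;       // bytes held by memtables still accepting writes
        long reclaiming;  // bytes of flushing memtables that have not been released yet

        ToyAllocator(long limit) { this.limit = limit; }

        void allocate(long bytes) {
            // Writes are refused once owned + reclaiming memory would exceed the limit.
            if (owned + reclaiming + bytes > limit)
                throw new IllegalStateException("memtable memory exhausted");
            owned += bytes;
        }

        void markReclaiming(long bytes) { owned -= bytes; reclaiming += bytes; }  // flush started

        void release(long bytes) { reclaiming -= bytes; }                        // memtable discarded
    }

    final ToyAllocator allocator;
    final List<Consumer<String>> subscribers = new ArrayList<>();

    BuggyFlushSketch(long limit) { allocator = new ToyAllocator(limit); }

    void flush(long memtableBytes, String sstableName) {
        allocator.markReclaiming(memtableBytes);
        // ... write the sstable and commit the transaction (elided) ...
        for (Consumer<String> subscriber : subscribers)
            subscriber.accept(sstableName);   // a throwing subscriber aborts the flush here...
        allocator.release(memtableBytes);     // ...so this line is never reached
    }

    public static void main(String[] args) {
        BuggyFlushSketch store = new BuggyFlushSketch(100);
        store.subscribers.add(name -> { throw new RuntimeException("subscriber failed for " + name); });
        for (int i = 0; i < 5; i++) {
            try {
                store.allocator.allocate(30);
                store.flush(30, "sstable-" + i);
            } catch (RuntimeException e) {
                System.out.println(e.getMessage() + " (reclaiming=" + store.allocator.reclaiming + ")");
            }
        }
    }
}
```

In this model the first three flushes fail in the subscriber and leave 90 bytes stuck as "reclaiming"; from the fourth iteration on every allocation is refused even though no live memtable owns any memory.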

What does this PR fix and why was it fixed

This patch changes memtable flushing behaviour so that the memtable
is reclaimed iff it has been removed from the View, regardless
of whether the flush fails or not.
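The rule "reclaim iff removed from the View" maps naturally onto a `try`/`finally` that records whether the memtable left the view. The sketch below is a hypothetical toy model, not the patch itself: `FixedFlushSketch`, `ToyAllocator`, the `Set<String>` standing in for the View, and the `failBeforeViewUpdate` flag are all illustrative assumptions. A subscriber exception still propagates, so the flush is still reported as failed, but the memory is released first.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Toy model of the fix. All names are hypothetical; this is not Cassandra's API.
public class FixedFlushSketch {

    static final class ToyAllocator {
        long owned;       // bytes held by memtables still accepting writes
        long reclaiming;  // bytes of flushing memtables that have not been released yet

        void allocate(long bytes) { owned += bytes; }
        void markReclaiming(long bytes) { owned -= bytes; reclaiming += bytes; }
        void release(long bytes) { reclaiming -= bytes; }
    }

    final ToyAllocator allocator = new ToyAllocator();
    final Set<String> view = new HashSet<>();  // stand-in for the View: memtables readers can see
    final List<Consumer<String>> subscribers = new ArrayList<>();

    // failBeforeViewUpdate simulates an error while writing or committing the sstable.
    void flush(String memtable, long bytes, boolean failBeforeViewUpdate) {
        allocator.markReclaiming(bytes);
        boolean removedFromView = false;
        try {
            // ... write and commit the sstable (elided) ...
            if (failBeforeViewUpdate)
                throw new IllegalStateException("flush failed before the view was updated");
            removedFromView = view.remove(memtable);  // the sstable replaces the memtable
            for (Consumer<String> subscriber : subscribers)
                subscriber.accept(memtable);           // may throw; the exception still propagates
        } finally {
            // Reclaim iff the memtable left the view. While it is still in the view,
            // readers may reference it, so its memory must not be released.
            if (removedFromView)
                allocator.release(bytes);
        }
    }

    public static void main(String[] args) {
        FixedFlushSketch store = new FixedFlushSketch();
        store.subscribers.add(name -> { throw new RuntimeException("subscriber failed for " + name); });
        store.view.add("memtable-1");
        store.allocator.allocate(30);
        try {
            store.flush("memtable-1", 30, false);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage() + " (reclaiming=" + store.allocator.reclaiming + ")");
        }
    }
}
```

Keying the release on `removedFromView` rather than on flush success covers both directions of the "iff": a memtable that was swapped out of the view is always released, and one that is still visible to readers is never released early.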


github-actions bot commented Feb 4, 2025

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

CNDB-12553: ensure that memtable is reclaimed even when notification subscribers throw

The direct cause of CNDB-12553 is that a CNDB-specific subscriber to
SSTableAddingNotification throws an error, which Cassandra doesn't
handle properly. In such a case the flush is interrupted (despite
multiple uses of exception-safe code and accumulating exceptions)
after the sstable creation transaction has committed, but before the
memtable has been reclaimed. As a result, the memtable allocator believes
that more and more memory is in use and being reclaimed, which eventually
stops writes due to an apparent lack of memtable memory.

This patch changes memtable flushing behaviour so that the memtable
is reclaimed iff it has been removed from the View, regardless
of whether the flush fails or not.

jakubzytka force-pushed the cndb-12553-ensure-memtable-reclaimed-when-notification-subscriber-throws branch from 337bcb0 to 67e28f6 on February 5, 2025 09:53
jakubzytka requested a review from a team on February 5, 2025 10:30
cassci-bot

❌ Build ds-cassandra-pr-gate/PR-1545 rejected by Butler


1 new test failure(s) in 2 builds


Found 1 new test failure

| Test | Explanation | Branch history | Upstream history |
| --- | --- | --- | --- |
| o.a.c.u.b.BinLogTest.testTruncationReleasesLogS... | regression | 🔴🔴 | 🔵🔵🔵🔵🔵🔵🔵 |

Found 7 known test failures

jacek-lewandowski self-requested a review on February 5, 2025 14:36