Manual Flush() or CompactRange() can stall in 7.04, 7.3.1, 8.67, 9.7 #13280

Open
matthewvon opened this issue Jan 8, 2025 · 2 comments

Comments

@matthewvon
Contributor

It is possible to stall/hang the thread calling Flush() or CompactRange() if min_write_buffer_number_to_merge is greater than 1. We previously used v6.20.3, which never had this problem.

Expected behavior

Calls to Flush() or CompactRange() never stall/hang the calling thread when min_write_buffer_number_to_merge is greater than 1.

Actual behavior

rocksdb::DBImpl::WaitForFlushMemTables() will wait forever, or until some other activity causes a buffer omitted by PickMemtablesToFlush() to flush.

Steps to reproduce the behavior

Stack trace of the hung thread in v7.3.1 looks like this:
#6 0x00007febda17b9ad in rocksdb::DBImpl::WaitForFlushMemTables (this=0x7feb04cc9300, cfds=..., flush_memtable_ids=..., resuming_from_bg_err=false) at /stardog/libs/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2355
#7 0x00007febda17a1d7 in rocksdb::DBImpl::FlushMemTable (this=0x7feb04cc9300, cfd=0x7febd127c140, flush_options=..., flush_reason=rocksdb::FlushReason::kManualFlush, writes_stopped=false) at /stardog/libs/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2101
#8 0x00007febda177dfb in rocksdb::DBImpl::Flush (this=0x7feb04cc9300, flush_options=..., column_family=0x7fecfeb9b650) at /stardog/libs/rocksdb/db/db_impl/db_impl_compaction_flush.cc:1711
#9 0x00007febd9f4d612 in rocksdb::StackableDB::Flush (this=0x7febd394eeb0, fopts=..., column_family=0x7fecfeb9b650) at /home/mmaszewski/.gradle/caches/8.11/transforms/da103af47e391520fee616b7e7f114b5/transformed/cpp-api-headers/rocksdb/utilities/stackable_db.h:344

Call sequence of background thread looks like this:
FlushMemTableToOutputFile needs_to_sync_closed_wals 1, GetLatestMemTableID 101 (~line 194 db_impl_compaction_flush.cc)
FlushMemTableToOutputFile-2 needs_to_sync_closed_wals 1, GetLatestMemTableID 102 (~line 231 db_impl_compaction_flush.cc)
break in PickMemtablesToFlush: GetID 102, max_memtable_id 101 (~line 365 in memtable_list.cc)
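
One way to read the sequence above: the flush job captured max_memtable_id while the latest memtable was 101, a newer memtable (102) appeared before selection ran, and the selection loop stopped at 102. The following is a minimal sketch of that selection behavior only. `ModelMemTable`, `PickForFlush`, and `CountNotStarted` are hypothetical names for illustration and are not RocksDB code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for an immutable memtable; not the RocksDB class.
struct ModelMemTable {
  uint64_t id;
  bool flush_in_progress;
};

// Toy selection loop: walk from oldest to newest and stop at the first
// memtable whose ID exceeds max_memtable_id.
// Returns the IDs claimed by this flush job.
std::vector<uint64_t> PickForFlush(std::vector<ModelMemTable>& imm,
                                   uint64_t max_memtable_id) {
  std::vector<uint64_t> picked;
  for (ModelMemTable& m : imm) {  // imm is ordered oldest first in this model
    if (m.id > max_memtable_id) {
      break;  // corresponds to the "break in PickMemtablesToFlush" line
    }
    if (!m.flush_in_progress) {
      assert(picked.empty() || picked.back() < m.id);  // IDs must increase
      m.flush_in_progress = true;
      picked.push_back(m.id);
    }
  }
  return picked;
}

// Number of memtables that no flush job has claimed yet.
int CountNotStarted(const std::vector<ModelMemTable>& imm) {
  int n = 0;
  for (const ModelMemTable& m : imm) {
    if (!m.flush_in_progress) ++n;
  }
  return n;
}
```

With memtables 101 and 102 and max_memtable_id = 101, this model picks only 101 and leaves one memtable unclaimed, which matches the state the trace describes.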

The background thread will not schedule a follow-up flush job because IsFlushPending() (~line 332 memtable_list.cc) sees that num_flush_not_started_ is less than min_write_buffer_number_to_merge_.
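
To make the condition concrete, here is a toy model of the check described above. It assumes IsFlushPending() returns true in two cases (an explicitly requested flush with unclaimed memtables, or enough unclaimed memtables to reach the merge threshold). That two-case shape is an assumption based on this discussion, and `FlushStateModel` is a hypothetical struct, not the RocksDB MemTableList.

```cpp
#include <cassert>

// Toy model of the counters discussed above. Field names mirror the issue
// text, but this is not the RocksDB MemTableList class.
struct FlushStateModel {
  int num_flush_not_started;
  int min_write_buffer_number_to_merge;
  bool flush_requested;

  // Assumed shape of the check: pending if a flush was explicitly requested
  // and something is unclaimed, or if enough memtables have accumulated.
  bool IsFlushPending() const {
    assert(min_write_buffer_number_to_merge >= 1);
    return (flush_requested && num_flush_not_started > 0) ||
           (num_flush_not_started >= min_write_buffer_number_to_merge);
  }
};
```

In this model, one leftover memtable with a threshold of 2 and no outstanding request (`FlushStateModel{1, 2, false}`) is not pending, so nothing would schedule the follow-up flush. The same state with a threshold of 1 is pending, which is consistent with the workaround of setting min_write_buffer_number_to_merge to 1.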

Calling SyncWAL() immediately prior to Flush() or CompactRange() is not a reliable workaround. It works sometimes, but not always.

The only known workaround at this point is setting min_write_buffer_number_to_merge to 1.

I am currently looking for ways to enhance IsFlushPending() so that it overrides the comparison of num_flush_not_started_ to min_write_buffer_number_to_merge_ while a manual flush or compaction is active.

@matthewvon
Contributor Author

I suspect that the cleanest solution is to add an atomic int to MemTableList that counts the number of active manual flush / compact range calls, i.e. an active manual action reference count. IsFlushPending() would then gain a third condition that returns true when reference_count > 0 and num_flush_not_started_ > 0. This would effectively make min_write_buffer_number_to_merge_ equal to 1, but only while one or more manual flushes / compactions are active.
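
A rough sketch of that idea, for discussion only. `MemTableListSketch`, `BeginManualAction`, `EndManualAction`, `manual_action_refs_`, and the setters are all hypothetical names, and the first two conditions in IsFlushPending() are an assumed simplification of the existing check. None of this is existing RocksDB code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch of the proposal; every name here is illustrative.
class MemTableListSketch {
 public:
  explicit MemTableListSketch(int min_to_merge)
      : min_write_buffer_number_to_merge_(min_to_merge) {
    assert(min_to_merge >= 1);
  }

  // Intended to bracket a manual Flush() or CompactRange() call.
  void BeginManualAction() { manual_action_refs_.fetch_add(1); }
  void EndManualAction() {
    int prev = manual_action_refs_.fetch_sub(1);
    assert(prev > 0);  // unbalanced End without a matching Begin
    (void)prev;
  }

  // Test hooks standing in for the real bookkeeping.
  void SetNumFlushNotStarted(int n) { num_flush_not_started_ = n; }
  void SetFlushRequested(bool v) { flush_requested_ = v; }

  bool IsFlushPending() const {
    const bool manual_active = manual_action_refs_.load() > 0;
    return (flush_requested_ && num_flush_not_started_ > 0) ||
           (num_flush_not_started_ >= min_write_buffer_number_to_merge_) ||
           // Proposed third condition: any unclaimed memtable is pending
           // while at least one manual action is in flight.
           (manual_active && num_flush_not_started_ > 0);
  }

 private:
  const int min_write_buffer_number_to_merge_;
  int num_flush_not_started_ = 0;
  bool flush_requested_ = false;
  std::atomic<int> manual_action_refs_{0};
};
```

A reference count rather than a boolean lets overlapping manual calls from several threads each hold the override open until the last one finishes. In real code the other fields would presumably stay under the DB mutex, with only the counter being atomic; that part is not modeled here.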

Thoughts?

@matthewvon
Contributor Author

I am wondering if this bug could also hang the calling thread for snapshot and backup operations. I have not looked yet.
