It is possible to stall/hang the thread calling Flush() or CompactRange() if min_write_buffer_number_to_merge is greater than 1. We previously used v6.20.3, which never had this problem.
Expected behavior
Calls to Flush() or CompactRange() never stall or hang the calling thread when min_write_buffer_number_to_merge is greater than 1.
Actual behavior
rocksdb::DBImpl::WaitForFlushMemTables() will wait forever, or until some other activity causes a buffer omitted by PickMemtablesToFlush() to flush.
Steps to reproduce the behavior
Stack trace of the hung thread in v7.3.1 looks like this:
#6 0x00007febda17b9ad in rocksdb::DBImpl::WaitForFlushMemTables (this=0x7feb04cc9300, cfds=..., flush_memtable_ids=..., resuming_from_bg_err=false) at /stardog/libs/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2355
#7 0x00007febda17a1d7 in rocksdb::DBImpl::FlushMemTable (this=0x7feb04cc9300, cfd=0x7febd127c140, flush_options=..., flush_reason=rocksdb::FlushReason::kManualFlush, writes_stopped=false) at /stardog/libs/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2101
#8 0x00007febda177dfb in rocksdb::DBImpl::Flush (this=0x7feb04cc9300, flush_options=..., column_family=0x7fecfeb9b650) at /stardog/libs/rocksdb/db/db_impl/db_impl_compaction_flush.cc:1711
#9 0x00007febd9f4d612 in rocksdb::StackableDB::Flush (this=0x7febd394eeb0, fopts=..., column_family=0x7fecfeb9b650) at /home/mmaszewski/.gradle/caches/8.11/transforms/da103af47e391520fee616b7e7f114b5/transformed/cpp-api-headers/rocksdb/utilities/stackable_db.h:344
Call sequence of background thread looks like this:
FlushMemTableToOutputFile needs_to_sync_closed_wals 1, GetLatestMemTableID 101 (~line 194 db_impl_compaction_flush.cc)
FlushMemTableToOutputFile-2 needs_to_sync_closed_wals 1, GetLatestMemTableID 102 (~line 231 db_impl_compaction_flush.cc)
break in PickMemtablesToFlush: GetID 102, max_memtable_id 101 (~line 365 in memtable_list.cc)
The background thread will not schedule a follow-up flush job because IsFlushPending() (~line 332 in memtable_list.cc) sees that num_flush_not_started_ is less than min_write_buffer_number_to_merge_.
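The sequence above can be reduced to a toy model. The sketch below is not RocksDB code: ToyMemTableList, its members, and LeavesMemtableStranded are hypothetical stand-ins, and only the count-versus-threshold comparison described above is modeled (any other condition in the real IsFlushPending() is left out). It illustrates how a single memtable left behind by the max_memtable_id cutoff stays below the threshold, so nothing reschedules the flush.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical toy model of the stall; NOT the RocksDB implementation.
struct ToyMemTableList {
  int min_write_buffer_number_to_merge;  // models the option of the same name
  std::deque<uint64_t> not_started;      // immutable memtable IDs, oldest first

  int NumFlushNotStarted() const {
    return static_cast<int>(not_started.size());
  }

  // Models only the threshold comparison described in this report.
  bool IsFlushPending() const {
    return NumFlushNotStarted() >= min_write_buffer_number_to_merge;
  }

  // Picks memtables oldest first and stops (the "break") at the first ID
  // above max_memtable_id. Returns how many memtables were picked.
  int PickMemtablesToFlush(uint64_t max_memtable_id) {
    int picked = 0;
    while (!not_started.empty() && not_started.front() <= max_memtable_id) {
      not_started.pop_front();
      ++picked;
    }
    return picked;
  }
};

// Replays the reported sequence: IDs 101 and 102 are waiting, but the flush
// job was created when the latest memtable ID was still 101.
inline bool LeavesMemtableStranded(int min_to_merge) {
  assert(min_to_merge >= 1);
  ToyMemTableList list{min_to_merge, {101, 102}};
  list.PickMemtablesToFlush(/*max_memtable_id=*/101);
  // True when 102 is still waiting yet no follow-up flush would be scheduled.
  return list.NumFlushNotStarted() > 0 && !list.IsFlushPending();
}
```

In this model, LeavesMemtableStranded(2) is true and LeavesMemtableStranded(1) is false, which matches the observation that setting the option to 1 avoids the hang.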
Calling SyncWAL() immediately prior to Flush() or CompactRange() is not a reliable workaround: it sometimes avoids the hang and sometimes does not.
The only known workaround at this point is setting min_write_buffer_number_to_merge to 1.
I am currently looking for ways to enhance IsFlushPending() so that it overrides the comparison of num_flush_not_started_ to min_write_buffer_number_to_merge_ while a manual flush/compact is active.
I suspect that the cleanest solution is to add an atomic int to MemTableList that counts the number of active manual flush / compact range calls, i.e. a reference count of active manual actions. IsFlushPending() would then get a third condition that returns true when reference_count > 0 and num_flush_not_started_ > 0. This would effectively make min_write_buffer_number_to_merge_ behave as 1, but only while one or more manual flushes / compactions are active.
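A minimal sketch of that idea, written as a standalone model rather than a patch: ToyFlushGate, ManualActionGuard, and every member name here are hypothetical, and only the threshold comparison plus the proposed extra condition are modeled. It assumes num_flush_not_started_ is protected by whatever lock the caller already holds, with only the manual-action count being atomic.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of the proposal; names are illustrative, NOT RocksDB's.
class ToyFlushGate {
 public:
  explicit ToyFlushGate(int min_write_buffer_number_to_merge)
      : min_to_merge_(min_write_buffer_number_to_merge) {}

  // Assumed to be called under the caller's lock, like the real counter.
  void AddNotStarted(int n) { num_flush_not_started_ += n; }

  void BeginManualAction() { ++active_manual_actions_; }
  void EndManualAction() {
    const int previous = active_manual_actions_.fetch_sub(1);
    assert(previous > 0);  // every End must pair with a Begin
    (void)previous;        // silence unused warning when NDEBUG is set
  }

  bool IsFlushPending() const {
    // Existing behavior as described in this report: wait for enough buffers.
    if (num_flush_not_started_ >= min_to_merge_) return true;
    // Proposed extra condition: any unstarted memtable counts as pending
    // while at least one manual flush / compaction is in progress.
    return active_manual_actions_.load() > 0 && num_flush_not_started_ > 0;
  }

 private:
  const int min_to_merge_;
  int num_flush_not_started_ = 0;
  std::atomic<int> active_manual_actions_{0};
};

// RAII helper so the count is released even on an early return.
class ManualActionGuard {
 public:
  explicit ManualActionGuard(ToyFlushGate& gate) : gate_(gate) {
    gate_.BeginManualAction();
  }
  ~ManualActionGuard() { gate_.EndManualAction(); }
  ManualActionGuard(const ManualActionGuard&) = delete;
  ManualActionGuard& operator=(const ManualActionGuard&) = delete;

 private:
  ToyFlushGate& gate_;
};
```

The guard is a design choice of this sketch, not part of the proposal: scoping the count to the lifetime of the manual call keeps the relaxed threshold from leaking into normal background flushing if the call exits through an error path.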