Number of files explodes, compaction does not work.. #165

Open
beviah opened this issue Dec 29, 2024 · 4 comments

Comments

@beviah

beviah commented Dec 29, 2024

I have tried numerous settings, but something is not working right.

There are thousands of tiny log and SST files that are not getting merged.
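
A quick way to quantify this is to count the files in the DB directory by extension (a rough sketch; the path is a placeholder for the actual database directory):

from pathlib import Path
from collections import Counter

# RocksDB names WAL files *.log and table files *.sst
counts = Counter(p.suffix for p in Path("path/to/db").iterdir() if p.is_file())
print(counts)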

@Congyuwang
Collaborator

Congyuwang commented Dec 30, 2024

Which version are you using? And what platform are you on?
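
For reference, these details can be collected with the Python standard library (a minimal sketch):

import sys, platform
from importlib.metadata import version

print("rocksdict", version("rocksdict"))    # installed package version
print("python   ", sys.version.split()[0])  # interpreter version
print("platform ", platform.platform())     # OS / kernel string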

@beviah
Author

beviah commented Jan 1, 2025

rocksdict 0.3.24
Python 3.12.3
Ubuntu 24.04.1 LTS

@Congyuwang
Collaborator

That's kind of strange. Are you perhaps using too many column families? Do you have a minimal code example that reproduces it?

@beviah
Author

beviah commented Jan 5, 2025

I managed to get manual compaction working. Here are the options I use:

from rocksdict import (
    Options, DBCompactionStyle, SliceTransform,
    Rdict, AccessType, WriteBatch,
)

def speedb_options():
    opt = Options()
    opt.create_if_missing(True)
    opt.create_missing_column_families(True)
    opt.set_max_open_files(-1)  # -1 = keep all files open, no table-cache limit
    opt.set_max_background_jobs(4)
    opt.set_max_compaction_bytes(512 * 1024 * 1024)
    opt.set_max_subcompactions(4)
    opt.set_compaction_style(DBCompactionStyle.universal())
    opt.increase_parallelism(4)
    opt.set_use_direct_io_for_flush_and_compaction(True)
    opt.set_use_direct_reads(True)
    opt.set_writable_file_max_buffer_size(1024 * 1024)
    opt.set_write_buffer_size(64 * 1024 * 1024)
    opt.set_min_write_buffer_number(2)
    opt.set_max_write_buffer_number(6)
    opt.set_min_write_buffer_number_to_merge(2)
    opt.set_target_file_size_base(64 * 1024 * 1024)
    opt.set_prefix_extractor(SliceTransform.create_max_len_prefix(8))
    opt.set_atomic_flush(True)
    return opt

I have 4 column families with the above options; I do not use the defaults.

db = Rdict(
    shard_path,
    speedb_options(),
    column_families=column_families,
    access_type=AccessType.read_write()
)

wb = WriteBatch()
# route all writes in this batch to one column family (x.cf is the CF name from my code)
wb.set_default_column_family(db.get_column_family_handle(x.cf))
for vid, content in vector_contents.items():
    wb[vid] = content
db.write(wb)

The contents are just small JSONs or lists of integers, depending on the column family; the vids are integers.

I will try to reproduce this with a separate minimal example.
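
For reference, a minimal sketch of how manual compaction can be triggered per column family with rocksdict (this assumes compact_range(None, None) covers the entire key range, and iterates the same column_families mapping used to open the DB):

# flush memtables, then force a full-range compaction on each column family
for cf_name in column_families:
    cf = db.get_column_family(cf_name)  # Rdict instance bound to this column family
    cf.flush()
    cf.compact_range(None, None)        # None, None = compact the whole key range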
