Duplicate creation of index in join #70

harshitgupta412 · 2025-01-08T02:19:05Z

Describe the bug
Every join that goes through sem_join_cascade rebuilds the index for the second series (l2) from scratch, since it calls sem_index on it each time. It can also overwrite an existing index.

See:

lotus/lotus/sem_ops/sem_join.py, line 287 in 9761855:

l2_df = l2_df.sem_index(col2_label, f"{col2_label}_index")

Expected behavior
The index should be rebuilt only when the data has changed. Versioning the index by timestamp, or storing it under ~/.cache, might be better so that we don't overwrite the user's existing indices.
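
A rough sketch of one way to get this behavior, keyed on a content hash rather than a timestamp (a swapped-in alternative, not what the code currently does). The helper names `content_fingerprint`, `index_dir_for`, and `get_or_build_index`, the `.complete` marker file, and the `build_fn` callback are all hypothetical; in lotus the callback would presumably wrap the existing `sem_index` call, with `cache_root` pointing somewhere under ~/.cache.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def content_fingerprint(values: Iterable[str]) -> str:
    """Return a short, stable hash of the column contents."""
    h = hashlib.sha256()
    for v in values:
        data = str(v).encode("utf-8")
        # Length-prefix each value so ["ab", "c"] and ["a", "bc"] hash differently.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()[:16]


def index_dir_for(values: Iterable[str], col_label: str, cache_root: str) -> Path:
    """Derive an index directory from the column label and its contents."""
    return Path(cache_root) / f"{col_label}_index_{content_fingerprint(values)}"


def get_or_build_index(
    values: Iterable[str],
    col_label: str,
    cache_root: str,
    build_fn: Callable[[List[str], Path], None],
) -> Tuple[Path, bool]:
    """Return (index_dir, was_built); build only if no finished index exists."""
    values = list(values)
    index_dir = index_dir_for(values, col_label, cache_root)
    marker = index_dir / ".complete"  # hypothetical marker written after a successful build
    if marker.exists():
        return index_dir, False
    index_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: build_fn wraps the real indexing call, e.g. something like
    # l2_df.sem_index(col2_label, str(index_dir)).
    build_fn(values, index_dir)
    marker.touch()
    return index_dir, True
```

Because the directory name depends on the data, changed data gets a fresh index and unchanged data reuses the old one. Keeping everything under a dedicated cache root also means a user-created `{col2_label}_index` directory is never touched.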


harshitgupta412 added the bug Something isn't working label Jan 8, 2025
