Duplicate creation of index in join #70

Open
harshitgupta412 opened this issue Jan 8, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@harshitgupta412
Collaborator

Describe the bug
Every call to sem_join_cascade rebuilds the index for the second series (l2) from scratch, because it calls sem_index on it. It can also overwrite an existing index at the same location.

See:

l2_df = l2_df.sem_index(col2_label, f"{col2_label}_index")

Expected behavior
The index should be recreated only if the data has changed. Using a timestamp to distinguish versions, or storing the index under ~/.cache, might be better so that we don't overwrite the user's existing indices.
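
One possible way to get this behavior is sketched below. It is not the project's implementation: it keys the index directory on a hash of the column contents instead of a timestamp, so unchanged data maps to the same directory and changed data gets a new one. The names `content_fingerprint`, `get_or_build_index`, `DEFAULT_CACHE_ROOT`, and the `build_index` callback are all hypothetical, and the cache location is only an assumption based on the `~/.cache` suggestion above. The sketch also assumes the column label is safe to use as part of a directory name.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical cache root; the issue only suggests "~/.cache".
DEFAULT_CACHE_ROOT = Path.home() / ".cache" / "lotus_indices"


def content_fingerprint(values: Iterable[object]) -> str:
    """Return a stable hash of the column values, in order."""
    digest = hashlib.sha256()
    for value in values:
        encoded = str(value).encode("utf-8")
        # Length-prefix each value so ["ab", "c"] and ["a", "bc"] differ.
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
    return digest.hexdigest()


def get_or_build_index(
    values: Iterable[object],
    column: str,
    build_index: Callable[[list, Path], None],
    cache_root: Path = DEFAULT_CACHE_ROOT,
) -> tuple[Path, bool]:
    """Return (index_dir, was_built); build only if no matching index is cached."""
    values = list(values)
    index_dir = cache_root / f"{column}_{content_fingerprint(values)[:16]}"
    marker = index_dir / ".complete"
    if marker.exists():
        # Same data was indexed before: reuse it instead of rebuilding.
        return index_dir, False
    index_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical callback that writes the index into index_dir.
    build_index(values, index_dir)
    # Written last, so an interrupted build is not mistaken for a finished one.
    marker.touch()
    return index_dir, True
```

Because the directory name is derived from the data and lives under a dedicated cache root, a user-created index such as `f"{col2_label}_index"` would not be touched by the join.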

@harshitgupta412 harshitgupta412 added the bug Something isn't working label Jan 8, 2025