Speed Optimizations #89

sidjha1 · 2025-01-26T01:01:13Z

Replace pydantic with dataclasses to save validation time
Use a smaller K in run_sem_sim_join
Improve batching behavior in sem_join
Improve batch execution in LM::__call__
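
For context on the first item, here is a minimal sketch of why a plain dataclass is cheaper to construct than a validating model: the generated `__init__` only assigns fields and never checks them against their annotations. The class name `JoinOutputSketch` and its fields are hypothetical and are not taken from the lotus codebase.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class JoinOutputSketch:
    """Hypothetical output container; names are illustrative only."""

    join_results: List[tuple] = field(default_factory=list)
    filter_outputs: List[bool] = field(default_factory=list)
    explanations: List[Any] = field(default_factory=list)


# Normal construction: fields are assigned as given, with no validation pass.
out = JoinOutputSketch(join_results=[(0, 1, None)], filter_outputs=[True])

# Annotations are not enforced at runtime, so a wrong type is stored unchanged.
# This is the validation work that is skipped, and also the safety that is given up.
unchecked = JoinOutputSketch(filter_outputs="not a list")
```

The trade-off is that malformed values are no longer rejected at construction time, so any checks that matter have to live elsewhere.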

sidjha1 · 2025-01-26T01:05:46Z

lotus/sem_ops/sem_join.py

@@ -286,7 +310,7 @@ def run_sem_sim_join(l1: pd.Series, l2: pd.Series, col1_label: str, col2_label:
    l2_df = l2.to_frame(name=col2_label)
    l2_df = l2_df.sem_index(col2_label, f"{col2_label}_index")

-    K = len(l2) * len(l1)
+    K = len(l2)


@melissa-pan can you verify this change? I don't think K should be multiplied by len(l1) since we are only searching an index on l2

melissa-pan · 2025-01-26

As per discussion, K should be len(l2). Thanks!
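
The reasoning in this thread can be illustrated with a small sketch. It uses a brute-force dot-product search as a stand-in for the real index (the helper `top_k` and the toy vectors are assumptions, not lotus code) and shows that each query against an index built on l2 can return at most len(l2) matches, so a larger K yields no additional pairs.

```python
from typing import List, Tuple


def top_k(query: List[float], corpus: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Brute-force stand-in for an index search: best k corpus rows by dot product."""
    scores = [
        (i, sum(q * c for q, c in zip(query, row)))
        for i, row in enumerate(corpus)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]  # slicing caps the result at len(corpus) entries


# Toy embeddings: l1 holds the queries, l2 is the indexed side.
l1_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
l2_vecs = [[1.0, 0.0], [0.5, 0.5]]

k_old = len(l2_vecs) * len(l1_vecs)  # value before this change
k_new = len(l2_vecs)                 # value after this change

pairs_old = [(i, j) for i, q in enumerate(l1_vecs) for j, _ in top_k(q, l2_vecs, k_old)]
pairs_new = [(i, j) for i, q in enumerate(l1_vecs) for j, _ in top_k(q, l2_vecs, k_new)]
```

In this sketch both settings produce the same candidate pairs. How a real vector index behaves when asked for more neighbors than it holds (extra cost, padded results) depends on the backend and is not shown here.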

sidjha1 added 2 commits January 25, 2025 16:58

Optimize

6595019

Merge branch 'main' into sid/speed

30e8f73

sidjha1 commented Jan 26, 2025

sidjha1 requested a review from melissa-pan January 26, 2025 01:05

liana313 approved these changes Jan 26, 2025

liana313 merged commit 1532528 into main Jan 26, 2025
5 of 6 checks passed

melissa-pan Jan 26, 2025
