feat: build invert index distributely #3452
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3452 +/- ##
==========================================
- Coverage 78.93% 78.62% -0.32%
==========================================
Files 251 251
Lines 92267 93129 +862
Branches 92267 93129 +862
==========================================
+ Hits 72833 73224 +391
- Misses 16463 16929 +466
- Partials 2971 2976 +5
@BubbleCal could you please help review this PR?
Related to #3269.
I want to build an inverted index for Lance on a distributed system (Ray/Spark). So far, I have modified the index creation interface so that an array of fragment IDs can be passed in. When this array is passed in, the index creation interface returns an index object.
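To make the idea concrete, here is a minimal pure-Python sketch of building an inverted index per group of fragments. It does not use the Lance API: `partition_fragments`, `build_partial_index`, and `merge_indices` are hypothetical names, the in-memory `fragments` dict stands in for real fragments, and the final merge step is an assumption for illustration rather than something this PR implements.

```python
from collections import defaultdict


def partition_fragments(fragment_ids, num_workers):
    """Round-robin split of fragment IDs into one list per worker (hypothetical helper)."""
    return [fragment_ids[i::num_workers] for i in range(num_workers)]


def build_partial_index(fragments, fragment_ids):
    """Build postings (token -> [(fragment_id, row)]) for the given fragments only."""
    postings = defaultdict(list)
    for fid in fragment_ids:
        for row, text in enumerate(fragments[fid]):
            # Each distinct token in a row contributes one posting.
            for token in set(text.lower().split()):
                postings[token].append((fid, row))
    return dict(postings)


def merge_indices(partials):
    """Combine partial postings into one index (assumed step, not part of the PR)."""
    merged = defaultdict(list)
    for partial in partials:
        for token, plist in partial.items():
            merged[token].extend(plist)
    return {token: sorted(plist) for token, plist in merged.items()}


# Toy data: fragment ID -> rows of text.
fragments = {
    0: ["lance is fast", "inverted index"],
    1: ["distributed index build"],
    2: ["lance on ray"],
}

parts = partition_fragments(sorted(fragments), num_workers=2)
# In a real Ray/Spark job, each call below would run on a separate worker.
partials = [build_partial_index(fragments, ids) for ids in parts]
index = merge_indices(partials)
```

Each worker only touches the fragments it was assigned, which is what passing an array of fragment IDs makes possible.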
I also changed the definition of the CreateIndex operation in Python to make it similar to the Rust version. I don't know why it differed from the Rust version in the first place.
https://www.elastic.co/blog/understanding-query-then-fetch-vs-dfs-query-then-fetch
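Both query-then-fetch and dfs-query-then-fetch are supported.
query-then-fetch is fast but less accurate, because each shard scores documents with its own local term statistics. When the number of documents is large, this mode is good enough; it is also the default mode in Elasticsearch.
dfs-query-then-fetch is slower but accurate, because term statistics are collected from all shards before scoring. It is very useful when the data is skewed or small.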
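The difference between the two modes can be sketched with a toy scorer. This is an illustration only, not the code in this PR: the scoring is a simplified BM25-style `tf * idf` without length normalization, and `shard_stats`, `query_then_fetch`, and `dfs_query_then_fetch` are hypothetical names.

```python
import math


def idf(n_docs, doc_freq):
    """BM25-style inverse document frequency."""
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def shard_stats(docs, term):
    """Return (number of docs, number of docs containing term) for one shard."""
    return len(docs), sum(1 for d in docs if term in d.split())


def score(doc, term, n_docs, doc_freq):
    """Simplified score: term frequency times idf, no length normalization."""
    return doc.split().count(term) * idf(n_docs, doc_freq)


def query_then_fetch(shards, term):
    """Each shard scores with its own local statistics."""
    hits = []
    for docs in shards:
        n, df = shard_stats(docs, term)
        hits += [(score(d, term, n, df), d) for d in docs if term in d.split()]
    return sorted(hits, reverse=True)


def dfs_query_then_fetch(shards, term):
    """Statistics are summed across shards first, then every shard scores with them."""
    stats = [shard_stats(docs, term) for docs in shards]
    n = sum(s[0] for s in stats)
    df = sum(s[1] for s in stats)
    hits = [
        (score(d, term, n, df), d)
        for docs in shards
        for d in docs
        if term in d.split()
    ]
    return sorted(hits, reverse=True)


# Skewed data: "lance" is rare in shard A but appears in every doc of shard B.
shard_a = ["lance", "ray", "spark", "arrow"]
shard_b = ["lance lance lance", "lance index"]

local_top = query_then_fetch([shard_a, shard_b], "lance")[0][1]
global_top = dfs_query_then_fetch([shard_a, shard_b], "lance")[0][1]
```

With this skewed toy data the two modes disagree on the top hit: local statistics inflate the score of the single match in shard A, while global statistics rank the document with the highest term frequency first.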
As a next step, I also want to run FTS queries in a distributed fashion.