add arrays contains function #860

Merged: 1 commit into main from add-array-contains-function, Feb 3, 2025

Conversation

shcheklein
Member

@shcheklein shcheklein commented Jan 27, 2025

Adds an array.contains function that checks whether an array column contains a given value:

chain.filter(array.contains("emd", 1.0)).show()

It is meant to be used in queries like:

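# Imports assumed; the original snippet omitted them.
from datachain import DataChain
from datachain.func import array
embeddings = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
chain = DataChain.from_values(emd=embeddings).save("embeddings")
chain.filter(array.contains("emd", 1.0)).show()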
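
As a rough illustration of which rows the query above should select, here is a plain-Python model of the filter. It does not use DataChain at all: `contains` and `filter_rows` are hypothetical helpers introduced only for this sketch, and the real function is presumably evaluated by the database backend rather than in Python.

```python
def contains(values, needle):
    """Return True if needle is an element of the array values."""
    return needle in values


def filter_rows(rows, column, needle):
    """Keep only the rows whose array-valued column contains needle."""
    return [row for row in rows if contains(row[column], needle)]


# Same data as in the PR description, one row per embedding.
embeddings = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
rows = [{"emd": e} for e in embeddings]

matches = filter_rows(rows, "emd", 1.0)
print(matches)  # [{'emd': [1.0, 2.0, 3.0]}]
```

Only the first embedding holds the value 1.0, so a single row survives the filter.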

TODO:

  • Add unit tests
  • Add func tests
  • Check basic ClickHouse (CH) support
  • Add complex structures and Pydantic model inputs support (can be a followup)
  • Ability to pass column as a second value (can be a followup)
  • Ability to pass Func as a second value (can be a followup)
  • Update docs
  • Update examples
  • Check return type of the function (bool vs int)
    • CH returns 0/1, so it is better to be consistent
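
The bool vs int question in the last item can be sketched with the standard-library `sqlite3` module. This is not how the PR implements the function: the `array_contains` UDF name and the choice to store arrays as JSON text are assumptions made only for this example. It shows a containment check that yields integer 0/1, matching the ClickHouse convention mentioned above.

```python
import json
import sqlite3


def array_contains(serialized, needle):
    # Hypothetical UDF: assumes the array is stored as JSON text.
    # Returns an int (1 or 0) rather than a bool, mirroring ClickHouse.
    return int(needle in json.loads(serialized))


conn = sqlite3.connect(":memory:")
conn.create_function("array_contains", 2, array_contains)
conn.execute("CREATE TABLE embeddings (emd TEXT)")
conn.executemany(
    "INSERT INTO embeddings (emd) VALUES (?)",
    [(json.dumps(e),) for e in ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])],
)

results = conn.execute(
    "SELECT array_contains(emd, 1.0) FROM embeddings ORDER BY rowid"
).fetchall()
print(results)  # [(1,), (0,)]
```

Because the result is an integer, the same expression works unchanged in a `WHERE` clause, where SQLite treats 1 as true and 0 as false.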


cloudflare-workers-and-pages bot commented Jan 27, 2025

Deploying datachain-documentation with Cloudflare Pages

Latest commit: b646f98
Status: ✅  Deploy successful!
Preview URL: https://8e632ef3.datachain-documentation.pages.dev
Branch Preview URL: https://add-array-contains-function.datachain-documentation.pages.dev


@shcheklein shcheklein marked this pull request as draft January 27, 2025 03:13

codecov bot commented Jan 27, 2025

Codecov Report

Attention: Patch coverage is 96.77419% with 1 line in your changes missing coverage. Please review.

Project coverage is 87.75%. Comparing base (aed4ae7) to head (b646f98).
Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
src/datachain/sql/sqlite/types.py | 66.66% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #860      +/-   ##
==========================================
+ Coverage   87.69%   87.75%   +0.05%     
==========================================
  Files         128      128              
  Lines       11429    11456      +27     
  Branches     1542     1544       +2     
==========================================
+ Hits        10023    10053      +30     
+ Misses       1019     1016       -3     
  Partials      387      387              
Flag | Coverage Δ
datachain | 87.67% <96.77%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown.


@shcheklein shcheklein force-pushed the add-array-contains-function branch 3 times, most recently from bfa2119 to cf9912d, on February 2, 2025 00:12
@shcheklein shcheklein force-pushed the add-array-contains-function branch from cf9912d to b646f98 on February 2, 2025 00:27
@shcheklein shcheklein marked this pull request as ready for review February 2, 2025 00:48
@shcheklein shcheklein requested a review from a team February 2, 2025 00:48
@shcheklein shcheklein merged commit 07df868 into main Feb 3, 2025
37 checks passed
@shcheklein shcheklein deleted the add-array-contains-function branch February 3, 2025 00:45