Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow replacement of entire datafile when the schema lines up correctly #3408

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

chebbyChefNEQ
Copy link
Contributor

@chebbyChefNEQ chebbyChefNEQ commented Jan 22, 2025

For internal design doc see notion or ping me

What this PR implements
image

Plan of attack:

  • This PR: basic functionality, i.e. when there is no conflict calling this tx should just work
  • Next PR: implement more fine-grained conflict resolution
  • Potential future PR (when time permits): Allow partial replacement of a datafile. This can be done by "dropping" column indice in a datafile, thereby dropping the column in favor of another

TODO:

  • proto definition of the new transaction
  • simple rust tests
  • test error handling
  • PR desc
  • python tests
  • implement conflict detection

@chebbyChefNEQ chebbyChefNEQ changed the title feat: allow replacement of entire datafile when the schema lines up c… feat: allow replacement of entire datafile when the schema lines up correctly Jan 22, 2025
@github-actions github-actions bot added the enhancement New feature or request label Jan 22, 2025
@chebbyChefNEQ chebbyChefNEQ force-pushed the rmeng/null-col branch 3 times, most recently from f636176 to 7881bf7 Compare January 30, 2025 02:25
@@ -23,21 +23,25 @@
//! them to be compatible.
//!
//! | | Append | Delete / Update | Overwrite/Create | Create Index | Rewrite | Merge | Project | UpdateConfig |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed (1), (2), (3) to emoji so the table width align better

@chebbyChefNEQ chebbyChefNEQ force-pushed the rmeng/null-col branch 6 times, most recently from c1845b6 to c615eb0 Compare January 30, 2025 02:59
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 0% with 109 lines in your changes missing coverage. Please review.

Project coverage is 78.77%. Comparing base (a7c5216) to head (c615eb0).

Files with missing lines Patch % Lines
rust/lance/src/dataset/transaction.rs 0.00% 109 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3408      +/-   ##
==========================================
- Coverage   78.86%   78.77%   -0.10%     
==========================================
  Files         250      250              
  Lines       91602    91711     +109     
  Branches    91602    91711     +109     
==========================================
  Hits        72244    72244              
- Misses      16395    16503     +108     
- Partials     2963     2964       +1     
Flag Coverage Δ
unittests 78.77% <0.00%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@chebbyChefNEQ chebbyChefNEQ force-pushed the rmeng/null-col branch 3 times, most recently from a532430 to 4d50cac Compare February 1, 2025 21:40
@chebbyChefNEQ chebbyChefNEQ force-pushed the rmeng/null-col branch 2 times, most recently from 6feccd3 to 895d8d0 Compare February 1, 2025 21:53
@chebbyChefNEQ chebbyChefNEQ marked this pull request as ready for review February 1, 2025 22:06
@eddyxu eddyxu requested a review from wkalt February 1, 2025 22:07
@chebbyChefNEQ chebbyChefNEQ added the rust Rust related tasks label Feb 1, 2025
@chebbyChefNEQ chebbyChefNEQ force-pushed the rmeng/null-col branch 3 times, most recently from 7f65391 to d757a62 Compare February 2, 2025 18:04
@github-actions github-actions bot added the python label Feb 2, 2025
@chebbyChefNEQ chebbyChefNEQ force-pushed the rmeng/null-col branch 3 times, most recently from 63e6794 to b300e35 Compare February 3, 2025 14:16
@@ -134,7 +134,7 @@ def take_rows(
if indices[i] > indices[i + 1]:
raise ValueError(
f"Indices must be sorted in ascending order for \
file API, got {indices[i]} > {indices[i+1]}"
file API, got {indices[i]} > {indices[i + 1]}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ruff did this

@@ -161,7 +161,7 @@ def on_write_complete(

if len(write_results) == 0:
warnings.warn(
"write results is empty. please check ray version " "or internal error",
"write results is empty. please check ray version or internal error",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ruff did this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request python rust Rust related tasks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants