Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReferenceFileSystem: use fs.open instead of fs._open #823

Merged
merged 1 commit into from
Jan 16, 2025
Merged

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Jan 16, 2025

There is a bug in fsspec==2024.12.0 that causes the ReferenceFileSystem to incorrectly make fs._open return a coroutine object instead of a file-like object. (See a proposed PR to fix this issue: fsspec/filesystem_spec#1769.)

We have a test for the expected behavior (test_arrow_generator_partitioned in tests/unit/lib/test_arrow.py) running in the CI environment.
But that does not fail because the latest version of fsspec does not get installed in the CI due to the upper limit set by the datasets library.

The datasets library is only installed as part of the hf and tests extras, so the default installation of datachain will encounter this issue.

How has this been tested?

I have tested this PR with older and newer version of fsspec and the test passes on both. And the test fails with the latest version of fsspec without this patch.

Fixes #806.

There is a bug in `fsspec==2024.12.0` that causes the `ReferenceFileSystem` to
incorrectly make `fs._open` return a coroutine object instead of a file-like object.
(See a proposed PR to fix this issue: fsspec/filesystem_spec#1769.)

We have a test for the expected behavior (`test_arrow_generator_partitioned` in `tests/unit/lib/test_arrow.py`)
running in the CI environment.
But that does not fail because the latest version of `fsspec` does not get installed in the CI
due to the upper limit set by the `datasets` library.

The `datasets` library is only installed as part of the `hf` and `tests` extras,
so the default installation of `datachain` will encounter this issue.

Fixes #806.
@skshetry skshetry requested a review from a team January 16, 2025 11:10
Copy link

codecov bot commented Jan 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.50%. Comparing base (aad99e2) to head (59d8dba).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #823   +/-   ##
=======================================
  Coverage   87.50%   87.50%           
=======================================
  Files         128      128           
  Lines       11344    11344           
  Branches     1538     1538           
=======================================
  Hits         9926     9926           
  Misses       1031     1031           
  Partials      387      387           
Flag Coverage Δ
datachain 87.43% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@skshetry skshetry enabled auto-merge (squash) January 16, 2025 14:16
@skshetry skshetry merged commit 1962a64 into main Jan 16, 2025
38 checks passed
@skshetry skshetry deleted the fix-806 branch January 16, 2025 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ArrowGenerator is broken on caching with latest fsspec release
2 participants