Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

referencefs: add streaming _open support #1778

Merged
merged 3 commits into from
Feb 1, 2025

Conversation

skshetry
Copy link
Contributor

Closes #1771.

@skshetry skshetry force-pushed the referencefs-stream-open branch 2 times, most recently from 666d9b5 to a598a9e Compare January 27, 2025 15:40
@skshetry skshetry marked this pull request as ready for review January 27, 2025 15:40
Comment on lines 1103 to 1111
part_or_url, start0, end0 = self._cat_common(path)
if isinstance(part_or_url, bytes):
return io.BytesIO(part_or_url[start0:end0])

protocol, _ = split_protocol(part_or_url)
if start0 is None and end0 is None:
return self.fss[protocol]._open(
Copy link
Contributor Author

@skshetry skshetry Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we could put these in ReferenceFile, but I avoided adding unnecessary redirection. That does mean we will do _cat_common twice if it ends up going to the ReferenceFile. It does not look very critical, so I think that's acceptable(?).

That said, I don't have a strong opinion on this and can move it if deemed necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. Maybe add a comment to say what it's doing.

Copy link
Contributor Author

@skshetry skshetry Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a small comment. PTAL.

Copy link
Member

@martindurant martindurant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks totally good to me.

There is apparently something being picked up by codespell.

The dask installation failure in the downstream run would be fixed by removing dask-expr from the environment. Please do fix this.

The error in the s3fs might already be fixed, but is in any case due to a recent release of botocore and nothing to do with this PR.

Comment on lines 1103 to 1111
part_or_url, start0, end0 = self._cat_common(path)
if isinstance(part_or_url, bytes):
return io.BytesIO(part_or_url[start0:end0])

protocol, _ = split_protocol(part_or_url)
if start0 is None and end0 is None:
return self.fss[protocol]._open(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. Maybe add a comment to say what it's doing.

@skshetry skshetry force-pushed the referencefs-stream-open branch from a598a9e to 4ed21b1 Compare January 27, 2025 16:01
@martindurant martindurant merged commit 7cc1644 into fsspec:master Feb 1, 2025
10 checks passed
@skshetry skshetry deleted the referencefs-stream-open branch February 1, 2025 15:46
skshetry added a commit to iterative/datachain that referenced this pull request Feb 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ReferenceFileSystem: streaming support for _open
2 participants