This FileSystem is read-only. It is designed to be used with async
targets (for now). This FileSystem only allows whole-file access, no
``open``. We do not get original file details from the target FS.
A more complete implementation of a file-like object based on ReferenceFS is totally possible and would be welcome. The main use case of the filesystem today is with zarr, which always loads a whole reference at a time, so it was not needed.
I believe all the pieces are there for a relatively simple implementation: AbstractFileSystem merely needs to know how to fetch a given byte range for a particular path, which would typically be (offset + loc) to (offset + loc + size), where offset is the chunk's position according to the references, loc is the file-like object's current position, and size is the number of bytes requested.
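To make the byte-range arithmetic above concrete, here is a minimal sketch in plain Python. It is not the fsspec implementation: the class name `ReferenceRangeReader` and the `fetch(start, end)` callable are hypothetical stand-ins for whatever the target filesystem provides, and the target is assumed to return the bytes in `[start, end)`.

```python
import io


class ReferenceRangeReader(io.RawIOBase):
    """Hypothetical read-only, seekable view of one reference entry.

    `fetch(start, end)` is an assumed callable returning the target's
    bytes in [start, end); `offset` and `length` describe where the
    referenced chunk sits inside the target file.
    """

    def __init__(self, fetch, offset, length):
        super().__init__()
        self._fetch = fetch
        self._offset = offset  # chunk position according to the references
        self._length = length  # chunk size in bytes
        self._loc = 0          # current position within the chunk

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._loc

    def seek(self, pos, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new = pos
        elif whence == io.SEEK_CUR:
            new = self._loc + pos
        elif whence == io.SEEK_END:
            new = self._length + pos
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new < 0:
            raise ValueError("negative seek position")
        self._loc = new
        return new

    def readinto(self, b):
        # Never read past the end of the referenced chunk.
        size = min(len(b), self._length - self._loc)
        if size <= 0:
            return 0
        start = self._offset + self._loc
        data = self._fetch(start, start + size)  # (offset + loc) to (offset + loc + size)
        n = len(data)
        b[:n] = data
        self._loc += n
        return n


# Usage with an in-memory stand-in for the target file.
target = bytes(range(100))
reader = ReferenceRangeReader(lambda s, e: target[s:e], offset=10, length=20)
first = reader.read(5)   # fetches target[10:15] only
reader.seek(15)
tail = reader.read(100)  # clipped to the last 5 bytes of the chunk
```

Each `read` here translates into one ranged fetch, so only the requested bytes are pulled from the target. A real implementation would also need to handle references that hold inline data rather than a (path, offset, size) triple, which this sketch ignores.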
Hi, I am using `ReferenceFileSystem` as a sort of virtual filesystem, similar to how you have described it on Stack Overflow: "Does fsspec support virtual filesystems such as pyfilesystem". It works great for my use case, but I have encountered an issue: the `_open` API reads the entire file instead of streaming it.

filesystem_spec/fsspec/implementations/reference.py
Lines 1102 to 1104 in 30af5e1

This behaviour is expected and is documented as such:

filesystem_spec/fsspec/implementations/reference.py
Lines 597 to 599 in 30af5e1

I’m curious if there’s a specific reason `_open` was implemented to load the entire file instead of allowing for streaming access. Could it be that I’m misusing `ReferenceFileSystem`? If not, I’d be happy to work on a PR to implement streaming support. Let me know if this would be useful!

EDIT: I'm basically using it as follows, for pyarrow to preserve the partitioning format that it infers from the filepath.