Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add http-sync #1177

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

add http-sync #1177

wants to merge 5 commits into from

Conversation

martindurant
Copy link
Member

@philippjfr this was for you

@philippjfr
Copy link

This is great thanks. I'll try to test this asap. Is there anything you have to enable to switch to the sync implementation?

@martindurant
Copy link
Member Author

You will need to either explicitly instantiate the class yourself, or call fsspec.register_implementation with both "http" and "https" to have it kick out the standard async version when using the standard API, which something like pd.read_csv does.

Mostly copied from async

Question: should async FS use sync/requests stuff for things that don't multiplex?
We would be able to share code.
@martindurant martindurant marked this pull request as ready for review March 11, 2023 20:56
@martindurant
Copy link
Member Author

Would appreciate someone knowledgeable to chime in: how to test the pyodide functionality here?

Also, should this be a separate package, or should it instead merge like code with normal HTTPFileSystem, which of course has a lot in common?

Last question: I implement my own version of JS requests here, but there are others out there. Can we use one of those, so that the CPython and JS versions of this FS look identical? I suspect no...

@martindurant
Copy link
Member Author

Considering merging this as-is, but experimental and unsupported, so that it can be included in a release for wider testing.

@lobis
Copy link
Contributor

lobis commented Nov 13, 2023

Hello @martindurant, I think this PR would be enough so that fsspec can be used to get http files using pyodide, right? Are there any plans to merge this PR in the near future? We would be very interested in this feature and can provide testing. Thanks!

@martindurant
Copy link
Member Author

Yes, I was of the opinion that this PR was at least ready for serious testing. Unfortunately, I don't really know how to test it within CI - binary blocking transfers must happen in a webworker (but pyscript has made running python code in a worker smoother recently). There are probably unfinished features. Also, it's ugly to copy so much code, but perhaps that's not important.

There has been some renewed interest in getting this in, and I would dearly love to demo a pydata flow (e.g., intake-fsspec-pandas, or even fastparquet) in the browser. Note that https://github.com/koenvo/pyodide-http is probably better established than my requests shim, however - we should at least look at how they do things and find the sharp edges.

@wachsylon
Copy link

@martindurant I recently used this http-sync filesystem successfully with pyodide. Is this an outdated approach, especially with zarr v3? If not, can we somehow make it usable with the reference filesystem and create sth similar for an s3 filesystem? that would enhance pyodide-support directly. Thanks a lot!

@martindurant
Copy link
Member Author

Is this an outdated approach, especially with zarr v3?

Zarr v3 is async internally and runs an event loop on a thread, in a way similar to how fsspec does for async FSs. This is definitely not compatible with pyodide! However, there is an async interface to zarr groups/arrays too, so it's possible that #960 would work, or wrapping the implementation in this PR with AsyncFileSystemWrapper, but I am not sure.

I don't believe the zarr team has any thoughts about pyodide, I would raise an issue on zarr-python.

@martindurant
Copy link
Member Author

That pyodide URL does load the interface, but no data. Can it be filled in with a public (no HTTP header needed) URL just to prove the technology stack?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants