-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add http-sync #1177
base: master
Are you sure you want to change the base?
add http-sync #1177
Conversation
This is great thanks. I'll try to test this asap. Is there anything you have to enable to switch to the sync implementation? |
You will need to either explicitly instantiate the class yourself, or call fsspec.register_implementation with both "http" and "https" to have it kick out the standard async version when using the standard API, which something like pd.read_csv does. |
Mostly copied from async Question: should async FS use sync/requests stuff for things that don't multiplex? We would be able to share code.
Would appreciate someone knowledgeable to chime in: how to test the pyodide functionality here? Also, should this be a separate package, or should it instead merge like code with normal HTTPFileSystem, which of course has a lot in common? Last question: I implement my own version of JS requests here, but there are others out there. Can we use one of those, so that the CPython and JS versions of this FS look identical? I suspect no... |
Considering merging this as-is, but experimental and unsupported, so that it can be included in a release for wider testing. |
Hello @martindurant, I think this PR would be enough so that fsspec can be used to get http files using pyodide, right? Are there any plans to merge this PR in the near future? We would be very interested in this feature and can provide testing. Thanks! |
Yes, I was of the opinion that this PR was at least ready for serious testing. Unfortunately, I don't really know how to test it within CI - binary blocking transfers must happen in a webworker (but pyscript has made running python code in a worker smoother recently). There are probably unfinished features. Also, it's ugly to copy so much code, but perhaps that's not important. There has been some renewed interest in getting this in, and I would dearly love to demo a pydata flow (e.g., intake-fsspec-pandas, or even fastparquet) in the browser. Note that https://github.com/koenvo/pyodide-http is probably better established than my requests shim, however - we should at least look at how they do things and find the sharp edges. |
@martindurant I recently used this http-sync filesystem successfully with pyodide. Is this an outdated approach, especially with zarr v3? If not, can we somehow make it usable with the reference filesystem and create sth similar for an s3 filesystem? that would enhance pyodide-support directly. Thanks a lot! |
Zarr v3 is async internally and runs an event loop on a thread, in a way similar to how fsspec does for async FSs. This is definitely not compatible with pyodide! However, there is an async interface to zarr groups/arrays too, so it's possible that #960 would work, or wrapping the implementation in this PR with AsyncFileSystemWrapper, but I am not sure. I don't believe the zarr team has any thoughts about pyodide, I would raise an issue on zarr-python. |
That pyodide URL does load the interface, but no data. Can it be filled in with a public (no HTTP header needed) URL just to prove the technology stack? |
@philippjfr this was for you