fsspec URIs with '+' instead of '::' #1752

Open
hhoeflin opened this issue Nov 17, 2024 · 2 comments

Comments

@hhoeflin

hhoeflin commented Nov 17, 2024

Hi,

I wanted to ask about the fsspec convention of using '::' to chain URL handling protocols (e.g. simplecache with s3fs). Using '::' means the resulting string is not a URI (e.g. 'simplecache::s3fs://bucket_name/myfile'), because 'simplecache::s3fs' is not a valid scheme: a scheme may only contain letters, digits, '+', '-' and '.' (see https://datatracker.ietf.org/doc/html/rfc3986#section-3.1).
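
To illustrate the point, here is a minimal sketch that checks the part before '://' against the scheme grammar from RFC 3986 section 3.1. The helper name `scheme_is_valid` is hypothetical, and treating everything before the first '://' as the scheme is an assumption made for this example; it is not how fsspec or pandas necessarily parse these strings.

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def scheme_is_valid(url: str) -> bool:
    """Hypothetical helper: treat the text before the first '://' as the scheme."""
    scheme, sep, _rest = url.partition("://")
    # A missing '://' means there is nothing to check, so report False.
    return bool(sep) and SCHEME_RE.fullmatch(scheme) is not None


for candidate in (
    "simplecache::s3fs://bucket_name/myfile",  # '::' chaining
    "simplecache+s3fs://bucket_name/myfile",   # proposed '+' chaining
):
    print(candidate, scheme_is_valid(candidate))
```

Under this assumption the '::' form is rejected because ':' is not an allowed scheme character, while the '+' form is accepted.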

This is also related to pandas not being able to use these URLs (see pandas-dev/pandas#59684), since pandas checks whether the path it is given follows RFC 3986. My question is whether fsspec could also accept '+' as a chaining separator, which would then result in a valid URI.

This would not cover the more complicated chaining cases, but at least the simple one.

Thanks

@martindurant
Member

This was broken in pandas by pandas-dev/pandas#44619

I would say that "+" is far more likely than "::" to appear as a legitimate part of a URL, which makes it a riskier choice of separator.

fsspec URLs are already non-compliant with the RFC even if you make the replacement; consider something like "zip://path/to/file::s3://bucket/file.zip". The change should be made in pandas, so that valid fsspec paths are picked up, rather than here, where it would break all other users of the current fsspec pattern.
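
A rough sketch of the separator concern, assuming a naive split on the separator string. The helper `split_chain` and the bucket and file names are hypothetical, and this is not fsspec's actual parsing code.

```python
def split_chain(url: str, sep: str = "::") -> list[str]:
    """Hypothetical helper: naively split a chained URL on a separator."""
    return url.split(sep)


# '::' separates the two links of the chain cleanly.
print(split_chain("zip://path/to/file::s3://bucket/file.zip"))

# With '+', a plus sign inside the path is indistinguishable from the
# separator, so the naive split cuts the file name in two.
print(split_chain("simplecache+s3://bucket/a+b.csv", sep="+"))
```

The second call shows how a '+' that legitimately belongs to a path would be mistaken for a chain boundary, which is much less likely to happen with '::'.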

(aside: I'm not really sure why pandas would accept a string in read_json, given that StringIO exists)

@martindurant
Member

Does using a URL like "simplecache://::s3://bucket_name/myfile" work?
