Spaces in path parameters are encoded as '+' instead of '%20' #12308

Open
1 of 3 tasks
rshkv opened this issue Feb 18, 2025 · 0 comments
Labels
bug Something isn't working

rshkv commented Feb 18, 2025

Apache Iceberg version

1.8.0 (latest release)

Query engine

None

Please describe the bug 🐞

I expect someone hit this before and there's some prior discussion that explains this. I just couldn't find it. Sorry in advance if this is a duplicate.

Iceberg clients encode spaces in URL paths as +, not as %20. Servers that decode paths according to RFC-3986, the standard that defines URI syntax and percent-encoding, will not turn the + back into a space and will see a literal + instead.

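Servers don't decode + in paths because + is a "reserved" character according to RFC-3986 (2.2). Reserved characters may serve as delimiters and are left as-is when a path is decoded, so + is never shorthand for a space there. And if clients want to send a literal + as data between delimiters, they need to percent-encode it as %2B.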

The problem with Iceberg is that clients using the Java utilities (mainly RESTUtil#encodeString) are encoding strings according to x-www-form-urlencoded instead of RFC-3986. x-www-form-urlencoded is a standard for form data and URL query strings, but not for URL paths.

The difference is that x-www-form-urlencoded produces characters that are reserved in RFC-3986, i.e. they won't get decoded. The concrete issue we're hitting is that x-www-form-urlencoded encodes spaces as + instead of %20.

I can't tell from the code that there was a conscious decision to encode paths using form-encoding. It's just that ResourcePaths uses RESTUtil#encodeString which uses java.net.URLEncoder which encodes according to x-www-form-urlencoded, not RFC-3986.
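To make the mismatch concrete, here is a minimal sketch of what happens when a form-encoded value lands in a path. It calls java.net.URLEncoder directly rather than RESTUtil#encodeString, and uses java.net.URI as a stand-in for a server that only decodes percent-escapes in paths. The class name, method names, host, and path layout are made up for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class FormEncodingInPath {
  // Form-encodes a value the way java.net.URLEncoder does: a space becomes '+'.
  static String formEncode(String value) {
    return URLEncoder.encode(value, StandardCharsets.UTF_8);
  }

  // Returns the decoded path of a URI. java.net.URI only decodes
  // percent-escapes here, so a '+' in the path is left untouched.
  static String percentDecodedPath(String rawUri) {
    return URI.create(rawUri).getPath();
  }

  public static void main(String[] args) {
    String encoded = formEncode("my table");
    System.out.println(encoded); // my+table

    // Hypothetical path layout, for illustration only.
    String uri = "http://localhost/v1/namespaces/ns/tables/" + encoded;
    System.out.println(percentDecodedPath(uri)); // /v1/namespaces/ns/tables/my+table
  }
}
```

The receiving side ends up with a table named "my+table" rather than "my table".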

The encoding difference might extend to other reserved characters used in form encoding but treated as delimiters in RFC-3986. At least with spaces, the fix should be straightforward because %20 is correctly decoded by both x-www-form-urlencoded and RFC-3986. I.e., RESTUtil#decodeString will continue to work and there should be no break.
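One possible shape for the space fix, as a sketch only: keep using URLEncoder and rewrite + to %20 afterwards. The helper name encodePathSegment and the example URL are hypothetical and not part of Iceberg, and this is not necessarily how the project would choose to fix it. The replacement is safe because URLEncoder emits a literal + as %2B, so any + left in its output can only stand for a space.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class PathSegmentEncoding {
  // Hypothetical helper: form-encode, then rewrite every '+' as "%20".
  // URLEncoder turns a literal '+' into "%2B", so each '+' in its
  // output represents a space and can be replaced without ambiguity.
  static String encodePathSegment(String value) {
    return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
  }

  public static void main(String[] args) {
    String encoded = encodePathSegment("my table+v2");
    System.out.println(encoded); // my%20table%2Bv2

    // A form-style decoder still recovers the original value.
    System.out.println(URLDecoder.decode(encoded, StandardCharsets.UTF_8)); // my table+v2

    // A percent-escape-only decoder (java.net.URI as a stand-in) does too.
    System.out.println(URI.create("http://localhost/tables/" + encoded).getPath()); // /tables/my table+v2
  }
}
```

Both decoders return the original string, which matches the expectation above that existing form-style decoding keeps working.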

See also:

  • JDK-8204530: Explaining that URLEncoder is not compliant with RFC-3986.
  • StackOverflow questions here and here. Not necessarily "spec", just supporting evidence.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time