I expect someone has hit this before and that there's some prior discussion explaining it. I just couldn't find it. Sorry in advance if this is a duplicate.
Iceberg clients encode spaces in URL paths as +, not as %20. Servers that decode paths according to RFC-3986, the standard that defines URI syntax and percent-encoding, will not decode the + back to a space; they see a literal +.
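A small sketch of the mismatch, using only the JDK: java.net.URLEncoder stands in for the client side, and java.net.URI (whose getPath() percent-decodes the path per RFC-3986) stands in for a server. The class name and the localhost/v1/namespaces URL are illustrative assumptions, not Iceberg code.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: form-encode a path segment the way java.net.URLEncoder does,
// then decode the resulting path with an RFC-3986 parser (java.net.URI).
class SpaceInPathDemo {

  // application/x-www-form-urlencoded: a space becomes "+".
  static String formEncode(String segment) {
    return URLEncoder.encode(segment, StandardCharsets.UTF_8);
  }

  // URI.getPath() decodes %XX escapes but leaves "+" untouched.
  static String decodePath(String url) {
    return URI.create(url).getPath();
  }

  public static void main(String[] args) {
    String encoded = formEncode("my namespace");
    System.out.println(encoded); // my+namespace
    // The server sees a literal "+", not a space.
    System.out.println(decodePath("http://localhost/v1/namespaces/" + encoded)); // /v1/namespaces/my+namespace
    // With %20 the space is recovered.
    System.out.println(decodePath("http://localhost/v1/namespaces/my%20namespace")); // /v1/namespaces/my namespace
  }
}
```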
Servers don't treat + as a space in paths because + is a "reserved" character according to RFC-3986 (Section 2.2), meaning it may act as a delimiter and has no implicit "space" meaning. If clients want a literal + as data inside a path segment, they need to percent-encode it as %2B.
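To illustrate the literal-+ case, here is a JDK-only sketch (class name and URL are hypothetical). URLEncoder already turns a literal + into %2B, which both decoders map back to +; the asymmetry is that a form decoder also turns a bare + into a space, while an RFC-3986 decoder does not.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: how a literal "+" travels through form encoding and
// how the two decoding schemes treat "%2B" versus a bare "+".
class LiteralPlusDemo {

  static String formEncode(String segment) {
    return URLEncoder.encode(segment, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String encoded = formEncode("a+b");
    System.out.println(encoded); // a%2Bb

    // RFC-3986 decoding (java.net.URI) maps %2B back to "+".
    System.out.println(URI.create("http://localhost/ns/" + encoded).getPath()); // /ns/a+b

    // Form decoding maps %2B to "+", but a bare "+" to a space.
    System.out.println(URLDecoder.decode(encoded, StandardCharsets.UTF_8)); // a+b
    System.out.println(URLDecoder.decode("a+b", StandardCharsets.UTF_8)); // a b
  }
}
```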
The problem with Iceberg is that clients using the Java utilities (mainly RESTUtil#encodeString) encode strings according to x-www-form-urlencoded instead of RFC-3986. x-www-form-urlencoded is a standard for form data and URL query strings, but not for URL paths.
The difference is that x-www-form-urlencoded can produce characters that are reserved in RFC-3986, which an RFC-3986 decoder will not decode. The concrete issue we're hitting is that x-www-form-urlencoded encodes spaces as + instead of %20.
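For a side-by-side view, the sketch below compares URLEncoder with a hand-written strict RFC-3986 encoder that escapes everything outside the unreserved set (ALPHA, DIGIT, "-", ".", "_", "~"; Section 2.3). The helper rfc3986Encode is a hypothetical illustration, not an Iceberg or JDK API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: compare form encoding with strict RFC-3986 encoding.
class EncodingComparison {

  // Hypothetical strict encoder: keep RFC-3986 unreserved characters,
  // percent-encode every other UTF-8 byte as %XX (uppercase hex).
  static String rfc3986Encode(String s) {
    StringBuilder out = new StringBuilder();
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
      int c = b & 0xFF;
      boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
          || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
      if (unreserved) {
        out.append((char) c);
      } else {
        out.append('%').append(String.format("%02X", c));
      }
    }
    return out.toString();
  }

  // What RESTUtil#encodeString delegates to, per the description above.
  static String formEncode(String s) {
    return URLEncoder.encode(s, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    for (String s : new String[] {" ", "+", "*", "~", "/"}) {
      System.out.printf("'%s' form=%s rfc3986=%s%n", s, formEncode(s), rfc3986Encode(s));
    }
  }
}
```

In this comparison the space is the only input whose form-encoded output (+) an RFC-3986 decoder reads as something else. "*" is left unencoded by URLEncoder even though it is a reserved sub-delimiter in RFC-3986, but a decoder still reads it as a literal "*", and "%7E" decodes back to "~".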
Nothing in the code suggests a conscious decision to encode paths using form encoding. It's just that ResourcePaths uses RESTUtil#encodeString, which uses java.net.URLEncoder, which encodes according to x-www-form-urlencoded, not RFC-3986.
The encoding difference might extend to other characters that form encoding leaves unencoded but RFC-3986 treats as reserved delimiters. At least for spaces, the fix should be straightforward because %20 is decoded correctly under both x-www-form-urlencoded and RFC-3986. In other words, RESTUtil#decodeString will continue to work and nothing should break.
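A minimal sketch of the space fix, assuming it is applied on top of URLEncoder: encodePathSegment is a hypothetical helper, not an existing Iceberg method. Rewriting + as %20 afterwards is safe because URLEncoder only emits + for a space (a literal + already becomes %2B). The round-trip check uses java.net.URLDecoder, on the assumption that RESTUtil#decodeString delegates to it the way encodeString delegates to URLEncoder.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: encode path segments with %20 for spaces and check that
// both form decoding and RFC-3986 decoding recover the original string.
class PathSegmentFix {

  // Hypothetical helper: form-encode, then rewrite "+" (which URLEncoder only
  // produces for a space) as "%20".
  static String encodePathSegment(String segment) {
    return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
  }

  public static void main(String[] args) {
    String original = "my ns+1";
    String encoded = encodePathSegment(original);
    System.out.println(encoded); // my%20ns%2B1

    // Form decoding still round-trips.
    System.out.println(URLDecoder.decode(encoded, StandardCharsets.UTF_8)); // my ns+1
    // RFC-3986 decoding now round-trips as well.
    System.out.println(URI.create("http://localhost/ns/" + encoded).getPath()); // /ns/my ns+1
  }
}
```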
See also:
JDK-8204530: explains that URLEncoder is not compliant with RFC-3986.
StackOverflow questions here and here. Not necessarily "spec", just supporting evidence.
Willingness to contribute
I can contribute a fix for this bug independently
I would be willing to contribute a fix for this bug with guidance from the Iceberg community
I cannot contribute a fix for this bug at this time
Apache Iceberg version
1.8.0 (latest release)
Query engine
None