Use the default float/double parser as fallback #779

grcevski · 2022-06-24T14:14:01Z

The FastDoubleParser introduced via #747 and #766 will fail on parsing certain esoteric OpenJDK Double/Float supported inputs.

This PR tries to remedy this by introducing a try/catch around the FastDoubleParser and then using the Java default Double/Float parsers as a fall-back.

Relates to #778 and wrandelshofer/FastDoubleParser#19

plokhotnyuk · 2022-06-24T14:22:05Z

@grcevski Have you measured slowdown of parsing additional formats with turned on fast parse routines?

I bet that for deep stacks it could be vulnerable for DoS/DoW attacks when non-trusted input is used.

grcevski · 2022-06-24T14:29:44Z

I don't have performance numbers for the invalid formats, obviously the exception handling will introduce some overhead, but assuming these are exceptional cases the overall impact on performance should be minimal.

The number of frames when an exception is hit for these unsupported formats is 5 at most, which isn't deeper than what FloatingDecimal would do in OpenJDK to parse the numbers.

pjfanning · 2022-06-24T15:12:05Z

@grcevski could you report this issue to https://github.com/wrandelshofer/FastDoubleParser ?

My gut feeling is that jackson-core shouldn't have the try-catch. As @plokhotnyuk, there are DoS/DoW risks with relying on exceptions. The fast parser support is optional and disabled by default. So users who want to parse JSON from untrusted sources would be better off not enabling the fast parser support.

grcevski · 2022-06-24T15:17:41Z

@grcevski could you report this issue to https://github.com/wrandelshofer/FastDoubleParser ?

My gut feeling is that jackson-core shouldn't have the try-catch. As @plokhotnyuk, there are DoS/DoW risks with relying on exceptions. The fast parser support is optional and disabled by default. So users who want to parse JSON from untrusted sources would be better off not enabling the fast parser support.

Sure thing, I'll open an issue there and link back.

grcevski · 2022-06-24T15:27:53Z

If the first example is fixed for 1.1e-23f, I guess an alternative fix would be to avoid using the FastDoubleParser for hexadecimal floating point numbers. I'll double check to see if I can find other non-hexadecimal error patterns.

pjfanning · 2022-06-24T15:33:55Z

If the first example is fixed for 1.1e-23f, I guess an alternative fix would be to avoid using the FastDoubleParser for hexadecimal floating point numbers. I'll double check to see if I can find other non-hexadecimal error patterns.

I understand your concern but every check in the code will slow down the throughput. So far, the issues appear to be with edge cases that are unlikely to appear in real data sets - at least ones that have not been manipulated by a malicious actor.

We may still accept this PR or a something similar but I would be interested to see if @wrandelshofer has any ideas on a solution to supporting these edge cases in his parser code.

grcevski · 2022-06-24T15:34:03Z

My gut feeling is that jackson-core shouldn't have the try-catch. As @plokhotnyuk, there are DoS/DoW risks with relying on exceptions. The fast parser support is optional and disabled by default. So users who want to parse JSON from untrusted sources would be better off not enabling the fast parser support.

I do understand the general concern about try and catch, but I'm not sure if it's an issue here, the default OpenJDK implementation for parseDouble will try catch:

https://github.com/openjdk/jdk/blob/160944bc6bd94d2927f398cf7732027c1b836a42/src/java.base/share/classes/jdk/internal/math/FloatingDecimal.java#L1837

So I don't think this approach is making things worse in this respect.

pjfanning · 2022-06-24T15:36:20Z

@grcevski the use of try/catch in the existing JDK parser code is one of the reasons it is slower than the fast parser support.

grcevski · 2022-06-24T15:43:55Z

I apologize, I wasn't clear in what I meant to say. I meant to say that the default OpenJDK implementation will involve the exception handler for bad input, meaning that any concerns about DoS/DoW risks with relying on exceptions are not made worse in any way by wrapping the fast double parser code in an exception handler. The fast double parser doesn't internally try and catch, but the default one already is, not in Jackson's code, but downstream.

plokhotnyuk · 2022-06-24T15:58:57Z

@grcevski Please, just measure ;)

The problem is not in try/catch itself but in the frequency of throwing of exceptions that will unwind a whole stack trace.

Here is a benchmark that can be easily modified to have hexadecimal mantissas.

You can run it with the following command for a locally published artifacts from your branch and compare results for the latest code from 2.14 branch:

sbt 'jsoniter-scala-benchmarkJVM/jmh:run -p size=128 ArrayOf(Doubles|Floats)Reading.jacksonScala'

Good w/a would be using of some static exception (see https://shipilev.net/blog/2014/exceptional-performance/) or special NaN value with encoded error type in the internal representation.

ChrisHegarty · 2022-06-24T16:26:33Z

@plokhotnyuk To clarify, is the concern related to the additional cost of materialising the stack, with fillInStackTrace, for the initial NFE that will be swallowed? Or is the concern related to the general runtime performance of the JIT'ed code for the regular non-throwing cases (which should not be negatively affected by this change)?

grcevski · 2022-06-24T18:10:47Z

Based on the comment by @wrandelshofer wrandelshofer/FastDoubleParser#19 (comment) on the upstream issue this is likely a non-issue with jackson, since the number parsing differences are only when format specifiers are used. I'll run the full suite of the JDK float tests without format specifiers and if it's all good I'll close this PR and the related issue.

wrandelshofer · 2022-06-24T18:33:00Z

I can easily refactor the back-end classes of FastDoubleParser to better suit your needs.

The back-end classes are DoubleFromByteArray, DoubleFromCharArray
DoubleFromCharSequence, FloatFromByteArray, FloatFromCharArray,
FloatFromCharSequence.

They internally return a primitive long-value that they will then convert into a double or a float using Double.longBitsToDouble(long) or Float.intBitsToFloat(int).

So, instead of throwing an Exception, I can make them encode the error message in the long-value, by using some unused NaN bits.
(Note: unused NaN bits can not be reliably used in a primitive double-value or a float-value, because Double.longBitsToDouble(long) and Float.intBitsToFloat(int) may throw them away! Quote from Double.longBitsToDouble(long): "Note that this method may not be able to return a double NaN with exactly same bit pattern as the long argument."

Also, it might be possible to provide multiple entry points in the back-end classes. For example one entry point that accepts the full lexical grammar of java.util.Double.parseDouble, one that accepts XML Schema grammar, one for JSON grammar, and so on.

pjfanning · 2022-06-24T18:42:30Z

@wrandelshofer thanks for the offer to refactor your code. At the moment, all jackson-core needs are parseDouble and parseFloat methods that are as close as possible to being drop-in replacements for the JDK methods. Whatever you choose to focus on will be of great interest to us nonetheless.

wrandelshofer · 2022-06-24T19:40:39Z

I bet that for deep stacks it could be vulnerable for DoS/DoW attacks when non-trusted input is used.

I honestly haven't looked at this yet from a vulnerability perspective.

In an earlier version of the FastDoubleParser library I had a complete code path for all cases in there, which would have allowed to fully assess and control this. Then - for the sake of code size (and performance) - I decided to fall back to the JDK code for the "slow path". 🤔
For example, we never need to pass more than 768 mantissa digits to the JDK to get the correct results. However, the current implementation of FastDoubleParserFromByteArray/CharArray, will naïvely allocate a String of the same size as the input... The maximal supported input size is 2^31 - 1 characters (!)

cowtowncoder · 2022-06-24T21:59:08Z

I know that the Javadocs in helper methods do not explanation rationale well, but note that these parse methods are only meant to support floating-point values that Jackson parser/decoder (JsonParser) accepts -- which by default is only JSON spec fps, although with various JsonReadFeature flags potentially with some deviations. But trailing type qualifier (f and d) are not accepted ever as far I know.

So, it'd first be necessary to have a problem to fix; I am not sure this PR makes sense at jackson-core level.

There may be other related questions about jackson-databind which may (... unfortunately) allow wider selection of floating-point values for coercion cases -- that is, JSON String that contains number representation may allow more lenient handling. This could be something we have to address, although even there the expected behavior is probably not documented at this point.
(... which means users will invariably assume that the observed behavior is the Correct and Expected behavior :) ).

grcevski · 2022-06-24T23:27:19Z

I'm closing this PR, I ran all OpenJDK Double tests with stripped trailing 'f' and 'd' characters, no problems at all, except for other invalid values like '+inifinity' and lowercase 'nan'.

Use the default float/double parser as fallback

a772343

grcevski mentioned this pull request Jun 24, 2022

FastDoubleParser doesn't support all input formats as the default OpenJDK Float/Double parsers #778

Closed

grcevski closed this Jun 24, 2022

grcevski deleted the fastdoubleparse/try_catch branch June 24, 2022 23:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use the default float/double parser as fallback #779

Use the default float/double parser as fallback #779

grcevski commented Jun 24, 2022 •

edited

Loading

plokhotnyuk commented Jun 24, 2022

grcevski commented Jun 24, 2022

pjfanning commented Jun 24, 2022

grcevski commented Jun 24, 2022

grcevski commented Jun 24, 2022

pjfanning commented Jun 24, 2022

grcevski commented Jun 24, 2022

pjfanning commented Jun 24, 2022

grcevski commented Jun 24, 2022

plokhotnyuk commented Jun 24, 2022 •

edited

Loading

ChrisHegarty commented Jun 24, 2022

grcevski commented Jun 24, 2022

wrandelshofer commented Jun 24, 2022

pjfanning commented Jun 24, 2022

wrandelshofer commented Jun 24, 2022

cowtowncoder commented Jun 24, 2022

grcevski commented Jun 24, 2022

Use the default float/double parser as fallback #779

Use the default float/double parser as fallback #779

Conversation

grcevski commented Jun 24, 2022 • edited Loading

plokhotnyuk commented Jun 24, 2022

grcevski commented Jun 24, 2022

pjfanning commented Jun 24, 2022

grcevski commented Jun 24, 2022

grcevski commented Jun 24, 2022

pjfanning commented Jun 24, 2022

grcevski commented Jun 24, 2022

pjfanning commented Jun 24, 2022

grcevski commented Jun 24, 2022

plokhotnyuk commented Jun 24, 2022 • edited Loading

ChrisHegarty commented Jun 24, 2022

grcevski commented Jun 24, 2022

wrandelshofer commented Jun 24, 2022

pjfanning commented Jun 24, 2022

wrandelshofer commented Jun 24, 2022

cowtowncoder commented Jun 24, 2022

grcevski commented Jun 24, 2022

grcevski commented Jun 24, 2022 •

edited

Loading

plokhotnyuk commented Jun 24, 2022 •

edited

Loading