In 665c428, I've added a benchmark that focuses on the performance of decoding strings. Here's the performance of decoding 4KB of json, most of which is just 30-character strings:
Not bad. We're ahead of aeson by a factor of three, but can this number be improved further? String decode currently walks the string, byte-by-byte, until it finds the end of it. As it walks the string, it keeps track of whether anything will need to be unescaped. I think that it should be possible to instead walk the string w64-by-w64. This could be done by adapting the approach in bytestring-encodings to work with ByteArray instead of Ptr and adding some additional bit-twiddling hacks. The general idea (sketched in code after this list) would be:
Fail if you encounter a backslash
Fail if you encounter a byte less than 0x20
Fail if you encounter a byte greater than 0x7E
Fail if your read would give you a w64 that straddles the end of the string (this simplifies things a little)
Succeed if you encounter a "
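A minimal sketch of what those per-byte checks might look like as SWAR predicates over a Word64 (the names broadcast, matchByte, etc. are hypothetical, not anything from this repo; the tricks themselves are the standard zero-byte/hasless/hasmore ones from Bit Twiddling Hacks):

```haskell
import Data.Bits (complement, xor, (.&.), (.|.))
import Data.Word (Word64)

-- Copy a byte into all eight lanes of a Word64.
broadcast :: Word64 -> Word64
broadcast b = b * 0x0101010101010101

-- High bit of a lane is set iff that byte of the argument is zero
-- (the classic "determine if a word has a zero byte" trick).
zeroBytes :: Word64 -> Word64
zeroBytes x = (x - broadcast 0x01) .&. complement x .&. broadcast 0x80

-- High bit of a lane is set iff that byte of @w@ equals @b@,
-- e.g. matchByte 0x5C for backslash, matchByte 0x22 for the quote.
matchByte :: Word64 -> Word64 -> Word64
matchByte b w = zeroBytes (w `xor` broadcast b)

-- High bit of a lane is set iff that byte is below 0x20
-- (valid because 0x20 is below 0x80).
below0x20 :: Word64 -> Word64
below0x20 x = (x - broadcast 0x20) .&. complement x .&. broadcast 0x80

-- High bit of a lane is set iff that byte is above 0x7E,
-- i.e. 0x7F or anything with the high bit set.
above0x7E :: Word64 -> Word64
above0x7E x = ((x + broadcast 0x01) .|. x) .&. broadcast 0x80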
Failure just means falling back to the existing string-decode logic, and success means that we may perform a memcpy (as we do now). This whole thing is a little bit tricky because it's possible to encounter both a " and a failing byte in the same w64, in which case the order they appeared in matters. But I think that the conservative action of always failing, even if the quote showed up first, is probably the best course. It simplifies the implementation, and the bytes after a string ends are probably ASCII characters anyway, so this shouldn't cost much performance.
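Combining the masks, that conservative rule might look like the following (Step and classify are made-up names, and this reuses the lane predicates from the sketch above; countTrailingZeros comes from Data.Bits and is discussed further down). The point is that a bad byte anywhere in the word wins, even when the quote came first:

```haskell
import Data.Bits (countTrailingZeros)

-- Result of scanning one w64 of string content.
data Step
  = Continue     -- all eight bytes are plain ASCII string content
  | Quote !Int   -- closing quote found at this byte offset in the word
  | Bail         -- backslash, control byte, or non-ASCII: slow path

classify :: Word64 -> Step
classify w
  | bad /= 0  = Bail  -- conservative: fail even if the quote came first
  | quo /= 0  = Quote (countTrailingZeros quo `div` 8)
  | otherwise = Continue
  where
    bad = matchByte 0x5C w .|. below0x20 w .|. above0x7E w
    quo = matchByte 0x22 w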
This needs to be implemented and benchmarked, but I think it could make this benchmark at least 2x faster.
For the beginning w64, we cannot use the same trick as we can for the end w64. The beginning w64 will include the opening double quote, and we cannot count that quote as a terminator there.
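Two hypothetical ways around this: start the first read one byte past the opening quote (an unaligned read), or overwrite the quote's lane with an innocuous ASCII byte so the first word can go through the same classifier as the rest. A sketch of the second option, assuming a little-endian load so the opening quote sits in the low lane:

```haskell
-- Replace the low lane (the opening quote on a little-endian load)
-- with 0x5F ('_'), an arbitrary byte that trips none of the checks.
maskOpeningQuote :: Word64 -> Word64
maskOpeningQuote w = (w .&. complement 0xFF) .|. 0x5F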
For the end double quote, when we detect it, we have to do some kind of CLZ-like operation to figure out where in the word it actually was, since we have to know the size of what we are copying.
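One detail worth hedging here: on a little-endian machine the first byte in memory lands in the least significant lane of the w64, so locating the quote is really a count-trailing-zeros rather than a CLZ (a big-endian load would want countLeadingZeros instead, and would also need an exact match mask, since the zero-byte trick's spurious bits sit in the more significant lanes). Continuing the sketch:

```haskell
-- Byte offset of the first set lane in a match mask, assuming a
-- little-endian load. The zero-byte trick can produce spurious bits
-- in lanes above the first true match, but never below it, so the
-- lowest set bit (what countTrailingZeros finds) is always exact.
firstMatch :: Word64 -> Int
firstMatch mask = countTrailingZeros mask `div` 8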