Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Deprecate SIMD version of SkipWhitespace to prevent BOF #2213

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

junwha
Copy link

@junwha junwha commented Oct 31, 2023

Hi all,
I'm Junwha Hong, from the research group S2Lab in UNIST, we found a buffer-overflow from rapidjson by our custom tool, and patched it.
the detailed information is as follows.

Summary

  • OS: Ubuntu 22.04
  • version: f9d5341
  • BOF occurs during the schematest reads "additionalItems.json"
  • at include/rapidjson/reader.h:396

Problem Statement

The buffer overflow arises when the reader utilizes RAPIDJSON_SIMD and a filestream which has no specification of length or end.

SkipWhitespace_SIMD(const char* p) checks whether all the char characters of 16-bytes matches the whitespaces characters. thus, it will escape the for loop if there is a null string inside a 16-bytes batch.
The problem occurs here, because null string is not always placed at the end of the 16-bytes. thus, if the null string is placed at the 16-bytes aligned address, there will be loads of 15 invalid bytes.

inline const char *SkipWhitespace_SIMD(const char* p) {
    // Fast return for single non-whitespace
    if (*p == ' ' || *p == '\n' || *p == '\r' || *p == '\t')
        ++p;
    else
        return p;

    // 16-byte align to the next boundary
    const char* nextAligned = reinterpret_cast<const char*>((reinterpret_cast<size_t>(p) + 15) & static_cast<size_t>(~15));
    while (p != nextAligned)
        if (*p == ' ' || *p == '\n' || *p == '\r' || *p == '\t')
            ++p;
        else
            return p;

    // The rest of string
    #define C16(c) { c, c, c, c, c, c, c, c, c, c, c, c, c, c, c, c }
    static const char whitespaces[4][16] = { C16(' '), C16('\n'), C16('\r'), C16('\t') };
    #undef C16

    const __m128i w0 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(&whitespaces[0][0]));
    const __m128i w1 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(&whitespaces[1][0]));
    const __m128i w2 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(&whitespaces[2][0]));
    const __m128i w3 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(&whitespaces[3][0]));

    for (;; p += 16) {
        const __m128i s = _mm_load_si128(reinterpret_cast<const __m128i *>(p));
        __m128i x = _mm_cmpeq_epi8(s, w0);
        x = _mm_or_si128(x, _mm_cmpeq_epi8(s, w1));
        x = _mm_or_si128(x, _mm_cmpeq_epi8(s, w2));
        x = _mm_or_si128(x, _mm_cmpeq_epi8(s, w3));
        unsigned short r = static_cast<unsigned short>(~_mm_movemask_epi8(x));
        if (r != 0) {   // some of characters may be non-whitespace
#ifdef _MSC_VER         // Find the index of first non-whitespace
            unsigned long offset;
            _BitScanForward(&offset, r);
            return p + offset;
#else
            return p + __builtin_ffs(r) - 1;
#endif
        }
    }
}

Patch

The SkipWhitespace_SIMD(const char* p, const char* end) checks sanity safely by for (; p <= end - 16; p += 16). thus we need to deprecate the SIMD feature for the simple StringStream and InsituStringStream which have no end specification.

Callstack at BOF

    #0 0x55ecfcc0ab5d in rapidjson::SkipWhitespace_SIMD(char const*) /home/qbit/rapidjson/include/rapidjson/reader.h:303:27
    #1 0x55ecfcc0ab5d in void rapidjson::SkipWhitespace<rapidjson::GenericStringStream<rapidjson::UTF8<char>>>(rapidjson::GenericStringStream<rapidjson::UTF8<char>>&) /home/qbit/rapidjson/include/rapidjson/reader.h:511:15
    #2 0x55ecfcc0ab5d in void rapidjson::GenericReader<rapidjson::UTF8<char>, rapidjson::UTF8<char>, rapidjson::CrtAllocator>::SkipWhitespaceAndComments<0u, rapidjson::GenericStringStream<rapidjson::UTF8<char>>>(rapidjson::GenericStringStream<rapidjson::UTF8<char>>&) /home/qbit/rapidjson/include/rapidjson/reader.h:712:9
    #3 0x55ecfcc0ab5d in rapidjson::ParseResult rapidjson::GenericReader<rapidjson::UTF8<char>, rapidjson::UTF8<char>, rapidjson::CrtAllocator>::Parse<0u, rapidjson::GenericStringStream<rapidjson::UTF8<char>>, rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>>(rapidjson::GenericStringStream<rapidjson::UTF8<char>>&, rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>&) /home/qbit/rapidjson/include/rapidjson/reader.h:579:17
    #4 0x55ecfcc0a4cd in rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>& rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>::ParseStream<0u, rapidjson::UTF8<char>, rapidjson::GenericStringStream<rapidjson::UTF8<char>>>(rapidjson::GenericStringStream<rapidjson::UTF8<char>>&) /home/qbit/rapidjson/include/rapidjson/document.h:2646:40
    #5 0x55ecfcc96e96 in rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>& rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>::Parse<0u, rapidjson::UTF8<char>>(rapidjson::UTF8<char>::Ch const*) /home/qbit/rapidjson/include/rapidjson/document.h:2711:16
    #6 0x55ecfcc96e96 in rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>& rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>::Parse<0u>(char const*) /home/qbit/rapidjson/include/rapidjson/document.h:2720:16
    #7 0x55ecfcc96e96 in rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>, rapidjson::CrtAllocator>::Parse(char const*) /home/qbit/rapidjson/include/rapidjson/document.h:2727:16
    #8 0x55ecfcc96e96 in Schema::SetUp() /home/qbit/rapidjson/test/perftest/schematest.cpp:103:15
    #9 0x55ecfcda2ad5 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/qbit/rapidjson/thirdparty/gtest/googletest/src/gtest.cc:2418:10
    #10 0x55ecfcda2ad5 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/qbit/rapidjson/thirdparty/gtest/googletest/src/gtest.cc:2454:14
    #11 0x55ecfcd2c469 in testing::Test::Run() /home/qbit/rapidjson/thirdparty/gtest/googletest/src/gtest.cc:2488:3
    #12 0x55ecfcd31336 in testing::TestInfo::Run() /home/qbit/rapidjson/thirdparty/gtest/googletest/src/gtest.cc:2668:11
    #13 0x55ecfcd32c96 in testing::TestCase::Run() /home/qbit/rapidjson/thirdparty/gtest/googletest/src/gtest.cc:2786:28
    #14 0x55ecfcd62463 in testing::internal::UnitTestImpl::RunAllTests() /home/qbit/rapidjson/thirdparty/gtest/googletest/src/gtest.cc:5048:43
    #15 0x55ecfcda5031 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/qbit/rapidjson/thirdparty/gtest/googletest/src/gtest.cc:2418:10
    #16 0x55ecfcda5031 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/qbit/rapidjson/thirdparty/gtest/googletest/src/gtest.cc:2454:14
    #17 0x55ecfcd613e9 in testing::UnitTest::Run() /home/qbit/rapidjson/thirdparty/gtest/googletest/src/gtest.cc:4664:10
    #18 0x55ecfcbdee12 in RUN_ALL_TESTS() /home/qbit/rapidjson/thirdparty/gtest/googletest/include/gtest/gtest.h:2329:46
    #19 0x55ecfcbdee12 in main /home/qbit/rapidjson/test/perftest/perftest.cpp:23:12
    #20 0x7f6a64c29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #21 0x7f6a64c29e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #22 0x55ecfcade754 in _start (/home/qbit/rapidjson/build2/bin/perftest+0x99754)

Thank you.

@tencent-adm
Copy link
Member

tencent-adm commented Oct 31, 2023

CLA assistant check
All committers have signed the CLA.

@junwha junwha force-pushed the fix-buffer-overflow branch 3 times, most recently from 69b7030 to 560de19 Compare October 31, 2023 08:05
- checking null string in 16-bytes batch manner causes buffer-overflow
@junwha junwha force-pushed the fix-buffer-overflow branch from 560de19 to 5fa465e Compare October 31, 2023 08:07
@junwha
Copy link
Author

junwha commented Nov 1, 2023

This pattern also occurs for ScanCopyUnescapedString(StringStream& is, StackStream<char>& os) in reader.h. _mm_load_si128 instruction should be used when the length is available.

- disable ScanCopyUnescapedString because it do nothing on generic version (sequential version)
@junwha
Copy link
Author

junwha commented Nov 2, 2023

It is the root cause of the bug fix in #2101 , but the patch only fixed the part of user side codes, and still some test cases are leading to buffer-overflow.
Any user who naively uses RapidJSON's SIMD feature without specifying the length for a file which has non-16-bytes aligned end address will suffer from buffer-overflow.
thus, it is needed to deprecate the SIMD API for the non-length specified use cases.

If you want to keep the way you suggested before in #1723, I will just align the end of buffers for the problematic tests. but I still recommend to provide only end-specified version for SIMD.

@junwha junwha changed the title Deprecate SIMD version of SkipWhitespace to prevent BOF fix: Deprecate SIMD version of SkipWhitespace to prevent BOF Dec 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants