Tarantool JIT engine #17

Brugarolas · 2024-01-08T08:09:40Z

No description provided.

Thanks to Peter Cawley. (cherry-picked from commit e86990f) There is a case of valid Lua C API usage when non-throwing code is executed on coroutine, other than one that an entry-point C function was called with, which results in a segmentation fault because cur_L is not being restored correspondingly. For a comprehensive example, see the test case added within this patch. According to the analysis done in LuaJIT#1066, to hit this issue, we first need to call into a C function. That C function needs to change cur_L by calling lua_resume/lua_p?call, and then the only problematic path is leaving the C function by throwing an error, which is caught by the fast-function x?pcall. When x?pcall then returns to its caller, we can be executing bytecode in the VM with an incorrect cur_L, and things will go badly if we enter a trace before correcting cur_L. The fix proposed in LuaJIT#740 was to set cur_L on error throw; however, the analysis from LuaJIT#1066 suggests that it can be set on the catching phase. This patch changes the error-catching routine, so now the patch sets the actual cur_L there. Now it is possible to throw errors on non-executing coroutines, which is a violation of the Lua C API. So, even though it is now possible, that behavior should be avoided anyway. Maxim Kokryashkin: * added the description and the test for the problem Resolves tarantool/tarantool#6323 Part of tarantool/tarantool#9145 Reviewed-by: Sergey Bronnikov <[email protected]> Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

This reverts commit 97699d9. As was mentioned in tarantool/tarantool#6189, throwing an error not on the currently executed coroutine is a violation[1][2] of the Lua/C API. This patch is a part of the patchset that supports this violation and is reverted because of it. [1]: https://www.lua.org/manual/5.4/manual.html#4.4 [2]: http://lua-users.org/lists/lua-l/2021-08/msg00086.html Part of tarantool/tarantool#6323 Reviewed-by: Sergey Bronnikov <[email protected]> Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

This reverts commit ed412cd. As was mentioned in tarantool/tarantool#6189, throwing an error not on the currently executed coroutine is a violation[1][2] of the Lua/C API. This patch is a part of the patchset that supports this violation and is reverted because of it. [1]: https://www.lua.org/manual/5.4/manual.html#4.4 [2]: http://lua-users.org/lists/lua-l/2021-08/msg00086.html Part of tarantool/tarantool#6323 Reviewed-by: Sergey Bronnikov <[email protected]> Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

This reverts commit 7570ff6. As was mentioned in tarantool/tarantool#6189, throwing an error not on the currently executed coroutine is a violation[1][2] of the Lua/C API. This patch is a part of the patchset that supports this violation and is reverted because of it. [1]: https://www.lua.org/manual/5.4/manual.html#4.4 [2]: http://lua-users.org/lists/lua-l/2021-08/msg00086.html Part of tarantool/tarantool#6323 Reviewed-by: Sergey Bronnikov <[email protected]> Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

This reverts commit 5ccd25d. As was mentioned in tarantool/tarantool#6189, throwing an error not on the currently executed coroutine is a violation[1][2] of the Lua/C API. This patch is a part of the patchset that supports this violation and is reverted because of it. [1]: https://www.lua.org/manual/5.4/manual.html#4.4 [2]: http://lua-users.org/lists/lua-l/2021-08/msg00086.html Part of tarantool/tarantool#6323 Reviewed-by: Sergey Bronnikov <[email protected]> Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

Reported by Sergey Kaplun. (cherry-picked from commit d133d67) This patch is the follow-up for the commit 1672bdc ("x64: Fix __call metamethod return dispatch."). That patch uses the incorrect macro x64 (which is undefined in dasm) instead of X64. `KBASE` (r15d) is still compared against `BASE` (edx) instead of `KBASEa` (r15) against rdx. This patch fixes the typo, so the correct registers are chosen. Sergey Kaplun: * added the description for the problem Part of tarantool/tarantool#9145 Reviewed-by: Igor Munkin <[email protected]> Reviewed-by: Maxim Kokryashkin <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

@1

Analyzed by Sergey Kaplun. (cherry-picked from commit b7a8c7c) Assume we have parent and child traces with the following IRs from the IR dump: Parent: | 0009 rax > tab TNEW 0 0 | 0010 p32 FLOAD 0008 tab.node | 0011 p32 HREFK 0010 "Name" @1 | 0012 {0008} tab HSTORE 0011 0009 | .... SNAP 2 [ ---- 0001 0002 0008 ---- ] | 0013 {sink} tab TNEW 0 0 | 0014 {0008} fal HSTORE 0011 false | .... SNAP 3 [ ---- 0001 0002 0008 ---- ] Child: | 0001 r15 tab SLOAD 1 PI | 0002 rbp tab SLOAD 2 PI | 0003 tab PVAL 9 As we can see from the trace dump above, the `rax` register is missing in the `0003 PVAL` IR for the side trace -- so it is assumed to be available in the allow RegSet inside `asm_stack_check()` and its value is spoiled during this check, so if we are restoring from the 3rd snapshot by stack overflow -- we are in trouble. The moment when IR is spoiled is when we set a hint on the register inherited from the parent trace (see `asm_setup_regsp()` for details). The 0th register (`rax`) shapeshifts into `RID_NONE`. Hence, when collecting register dependencies from the parent trace, `0003 PVAL` is considered the IR with `RID_NONE`, i.e., without an assigned register. So, this register is considered free (picked as bottom from the free set) in the `asm_stack_check()` and is used for stack overflow check, so the table reference is gone. This patch introduces another register set for the context of the parent trace to use in the stack check. All registers used on the child trace are excluded from this set. The test case for this patch is omitted since it requires specific register allocation, which is hard to construct and not stable in any future patches. Sergey Kaplun: * added the description for the problem Part of tarantool/tarantool#9145 Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Igor Munkin <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

Thanks to Sergey Kaplun, NiLuJe and Peter Cawley. (cherry-picked from commit aa2db7e) The previous patch fixed just part of the problem with the register coalescing. For example, the parent base register may be used inside the parent or child register sets when it shouldn't. This leads to incorrect register allocations, which may lead to crashes or undefined behaviour. This patch fixes it by excluding the parent base register from both register sets. The test case for this patch doesn't fail before the commit since it requires specific register allocation, which is hard to construct and very fragile. Due to the lack of ideal sync with the upstream repository, the test is passed before the patch. It should become correct after backporting future patches. Sergey Kaplun: * added the description and the test for the problem Resolves tarantool/tarantool#8767 Part of tarantool/tarantool#9145 Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Igor Munkin <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

E.g. __concat = function() return setmetatable(...) end Reported by Fezile Manana. (cherry-picked from commit 75ee3a6) During the recording of concat with tailcall to fast function, the fast function recording is postponed. This implementation may lead to the absence of side effects from fast-function and incorrect behavior as a result. For a comprehensive example, see the comment in the test. Maxim Kokryashkin: * added the description and the test for the problem Part of tarantool/tarantool#9145 Reviewed-by: Sergey Kaplun <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

Reported by Sergey Kaplun. (cherry-picked from commit 433d7e8) `cp->packstack` is the array of size `CPARSE_MAX_PACKSTACK` (7). Before the patch, `cp->curpack` is checked to be less than `CPARSE_MAX_PACKSTACK`, but then `cp->packstack` is accessed at `cp->curpack + 1`, which is out of bounds, so `cp->curpack` value is overwritten. This patch fixes a condition and also adds the error throw when counter is overflow (instead of rewriting a top `cp->packstack` value). Sergey Kaplun: * added the description and the test for the problem Resolves tarantool/tarantool#9339 Part of tarantool/tarantool#9145 Reviewed-by: Igor Munkin <[email protected]> Reviewed-by: Maxim Kokryashkin <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

Reported by XmiliaH. (cherry-picked from commit 7b994e0) Fold optimization x - (-0) ==> x is INVALID for x = -0 in FP arithmetic. Its result is -0 instead of +0. This patch allows only x - (+0) ==> x optimization. Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#9145 Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

(cherry-picked from commit 44bd743) When dumping IRs via `jit.dump()`, the incorrect value of `IR_CONV` mode shift (`IRCONV_CSH`) is used. This patch fixes the value. Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#9145 Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

Thanks to Peter Cawley. (cherry-picked from commit 7a2b83a) Even after the commit c05d103 ("Fix math.min()/math.max() inconsistencies."), some of the corner cases (see tests in the commit for details) for `math.min()`/`math.max()` still inconsistent in the JIT and the VM. This happens because `IR_MIN` and `IR_MAX` are marked as commutative, so their operands were swapped by `asm_swapops()`. As a result, because `minsd`[1]/`maxsd`[2] instructions don't change the sign byte of the destination register, its sign is preserved if we compare 0 with -0. When we compare something with NaN, the second source operand (either a NaN or a valid floating-point value) is written to the result. Hence, swapping the operands changed the resulting value. This patch removes the commutative flag from the aforementioned IRs to prevent swapping of their operands. [1]: https://c9x.me/x86/html/file_module_x86_id_173.html [2]: https://c9x.me/x86/html/file_module_x86_id_168.html Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#9145 Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

This commit is the follow-up for b8c8e25 ("Fix jit.dump() output for IR_CONV."). When recording two traces in the aforementioned test for `step` conversion, the second trace isn't always recorded. This happens due to the hotcount penalty during compiling the first trace. Setting `hotloop=2` to the first trace and calling it twice helps to fix the issue. Signed-off-by: Igor Munkin <[email protected]>

Reported by minoki. (cherry-picked from commit 674afcd) The `ceil` (`floor`) math function implementation calculates (|x| + 2^52) - 2^52 for its argument to determine the fractional part of x, so it will be rounded to the nearest integer and its sign is restored. After that, if the original value is < (>) than the result, the -1 (1) is subtracted from it. Take a look at the `ceil()` case. The result of the operation `-1 - (-1)` is +0 for FP arithmetic, against -0 expected as a result. This patch changes the `- (-1)` operation to `+ 1` and restores sign after it again. NB: Since in DUALNUM mode on x86/x64 all results are tried to be converted to integers, the sign of 0 is neglected. Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#9145 Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

Finalizers are not supposed to throw errors -- this is undefined behavior. Lua 5.1 - 5.3 and (previously) LuaJIT rethrow the error. This randomly breaks some unrelated code that just happens to do an allocation. Bad. Lua 5.4 catches the error and emits a warning instead. But warnings are not enabled by default, so it fails silently. Even worse. LuaJIT (now) catches the error and emits a VM event. The default event handler function prints "ERROR in finalizer: ...". Set a custom handler function with: jit.attach(handler, "errfin") (cherry-picked from commit 1c27912) The default handler for finalizer errors is set during the Lua initialization. Namely, in the `luaL_newstate`. Along with the introduction of the new `ERRFIN` VM event, the high bits for the old VM events are removed since they are scratched anyway by the bitwise operation `(hash)<<3` in the `VMEVENT_DEF` macro. This patch results in a regression in the PUC-Rio test suite. The test in the suite for the error in the GC finalizer fails after the patch because the error is now handled with the VM event handler instead of being rethrown. Hence, the `collectgarbage` finishes successfully despite the error in the GC finalizer. Considering this change, the test was disabled. There is also another regression in the `misclib-getmetrics-capi`, because there are a few test cases reliant on the `lua_gettop(L)` value, which is broken after this patch. The `_VMEVENTS` table, where the error handler for GC finalizers is set, was not cleared from the stack after the initialization. This issue is fixed in the following patch. Maxim Kokryashkin: * added the test for the problem and the description for the patch Part of tarantool/tarantool#9145 Reviewed-by: Sergey Kaplun <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

Reported by PluMGMK. (cherry-picked from commit 224129a) The `_VMEVENTS` table, where the error handler for GC finalizers is set, was not cleared from the stack after the initialization. This commit adds stack cleanup. Maxim Kokryashkin: * added the description and the test for the problem Part of tarantool/tarantool#9145 Reviewed-by: Sergey Kaplun <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

Sometimes, the Lua stack can be inconsistent during the FFUNC execution, which may lead to a sysprof crash during the stack unwinding. This patch replaces the `top_frame` property of `global_State` with `lj_sysprof_topframe` structure, which contains `top_frame` and `ffid` properties. `ffid` property makes sense only when the LuaJIT VM state is set to `FFUNC`. That property is set to the ffid of the fast function that VM is about to execute. In the same time, `top_frame` property is not updated now, so the top frame of the Lua stack can be streamed based on the ffid, and the rest of the Lua stack can be streamed as usual. Also, this patch fixes the build via Makefile.original by adding the `LJ_HASSYSPROF` flag support to it. Resolves tarantool/tarantool#8594 Reviewed-by: Sergey Kaplun <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

This patch refactors symbol resolution in memprof, so now they are resolved during the process of parsing. This makes the generation mechanism excessive since symtab updates are no longer affecting the previous events. Follows up tarantool/tarantool#8700 Reviewed-by: Sergey Kaplun <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

Since both of the profiler parsers are now processing the events in a stream-like fashion, the generation mechanism is excessive and can be purged. This results in a significant memory consumption drop, especially for the AVL-tree part. Consider this script: | jit.off() | misc.sysprof.start{mode = 'C', interval=10} | for i = 1, 1e7 do tostring(i) end | misc.sysprof.stop() After executing it with LuaJIT, you can parse it like this: | $ time -v luajit-parse-sysprof sysprof.bin So, before the patch: | Maximum resident set size (kbytes): 224928 And after the patch: | Maximum resident set size (kbytes): 32780 That is the 85% reduction in memory consumption. Follows up tarantool/tarantool#8700 Reviewed-by: Sergey Kaplun <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

It is really inconvenient to use a standalone shell script to parse binary dumps from profilers. That is why this commit introduces a CLI flag for tools in the LuaJIT, so now it is possible to parse a memprof dump as simple as: ``` luajit -tm memprof.bin luajit -tm --leak-only memprof.bin ``` And the sysprof too: ``` luajit -ts sysprof.bin ``` The scripts `luajit-parse-memprof` and `luajit-parse-sysprof` are purged as a result of these changes. Resolves tarantool/tarantool#5688 Reviewed-by: Sergey Kaplun <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

Fix typos found with codespell in files with our own source code. Furthermore, several warnings found by the checkpatch.pl [1] script are fixed either. 1. https://github.com/tarantool/checkpatch Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

The patch introduces a new CMake target: "LuaJIT-codespell", that spellchecks files specified in the whitelist by codespell [1]. 1. https://github.com/codespell-project/codespell Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

The patch enables codespell in GitHub workflow 'lint' by installing `codespell` using `pip` and renames job name and description. Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

The patch follows up commit a0483bd ("test: introduce module for C tests") and adds additional asserts suitable for comparing strings. Reviewed-by: Sergey Kaplun <[email protected]> Reviewed-by: Igor Munkin <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

Reported by Mathias Westerdahl. (cherry picked from commit 633f265) Lua 5.1 Reference Manual [1] defines a function `lua_concat`, that: > void lua_concat (lua_State *L, int n); > > Concatenates the n values at the top of the stack, pops them, and leaves > the result at the top. Without the patch, `lua_concat()` behaved incorrectly with userdata with the defined `__concat` metamethod. The problem is GC64-specific. Assuming we have three literals and a userdata with the defined `__concat` metamethod on top of the Lua stack: 1 [string] 2 [string] 3 [string] 4 [string] 5 [userdata] <--- top On attempt to concatenate *two* items on top of the Lua stack, `lua_concat()` concatenates *four* items and leaves the result on top of the Lua stack: 1 [string] 2 [string][string][string][userdata] <--- top The problem is in the incorrect calculation of `n` counter in the loop in the implementation of function `lua_concat`. Without the fix, `n` is equal to 3 at the end of the first iteration, and therefore it goes to the next iteration of concatenation. In the fixed implementation of `lua_concat()`, `n` is equal to 1 at the end of the first loop iteration, decremented in a loop postcondition, and breaks the loop. For details see implementation of `lj_meta_cat()` in <src/lj_meta.c>. The patch fixes incorrect behaviour. 1. https://www.lua.org/manual/5.1/manual.html Sergey Bronnikov: * added the description and the test for the problem Part of tarantool/tarantool#9145 Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

This test could do allocation outside of the protected frame, which could result in an uncaught OOM and test failure. This patch adds extra `collectgarbage` calls to the test to avoid such situations. Reviewed-by: Sergey Kaplun <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

Thanks to Peter Cawley. (cherry-picked from commit 7f9907b) When emitting IR NEWREF, there is no check for a non-NaN stored key value. Thus, when the NaN number value is given to trace, it may be stored as a key. This patch adds the corresponding check. If fold optimization is enabled, this IR EQ check is dropped if it references CONV IR from any (unsigned) integer type since NaN can be created via conversion from an integer. Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#9145 Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

This patch adds the test for commit 1a40162 ("Fix assertions."). This patch removes incorrect assertions in the fold optimizations for conversions from numbers to different integer types. Since the issue affects only branch 2.0, there is no need to fix it. Nevertheless, the test is required to avoid regressions in the future. Part of tarantool/tarantool#9145 Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

Reported by XmiliaH. (cherry-picked from commit c8bcf1e) `fold_abc_k()` doesn't patch the first ABC check when the right constant operand is negative. This leads to out-of-bounds access from the array on a trace. This patch casts the operands to uint32_t for comparison. If the right IR contains a negative integer, the second IR will always be patched. Also, because the ABC check on the trace is unordered, this guard will always fail. Also, this fold rule creates new instructions that reference operands across PHIs. This opens the room for other optimizations (like DCE), so some guards become eliminated, and we use out-of-bounds access from the array part of the table on trace. This patch adds the missing `PHIBARRIER()` check. Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#9145 Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Igor Munkin <[email protected]>

The LuaJIT-tests suite lacks the ability to specify tags with custom values. This patch adds the ability to specify number tags in the format `+tag=number`. It is useful for version specification of system libraries (like libc). Required for the next patch. Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

The `strtod parsing` subtest in the <lib/base/tonumber_scan.lua> checks the results yielded by the `strtod()` via FFI call. In GLibc versions before 2.19 it returns an incorrect result for "0x1p-2075" [1]. This patch skips this test for a smaller version of the libc installed. [1]: https://sourceware.org/bugzilla/show_bug.cgi?id=16151 Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

- Fix a typo in a comment. - Replace `skip_all()` with `skip()` because it is strange to skip the overall test group inside a testcase. Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

LuaJIT has a macro LUAJIT_DISABLE_SYSPROF that disables sysprof support. Sysprof tests don't respect this macro, and therefore some of them failed when the macro is enabled. The proposed patch: - Skips sysprof testcases in a suite tarantool-c-tests. - Introduces an environment variable LUAJIT_DISABLE_SYSPROF in a suite tarantool-tests that is set to 1 when sysprof support is disabled. - Propagates a status of sysprof support to Lua tests and skip testing when sysprof is disabled. Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

If the `foo()` function itself starts to be recorded on the very first call, it leads to the changing of TNEW bytecode when table bump optimization is enabled. This patch skips the test for this type of build. Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

When building <tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c> on macOS Sequoia 15.0, I've got the following error from including <sys/ptrace.h>: | error: unknown type name 'caddr_t' | int ptrace(int _request, pid_t _pid, caddr_t _addr, int _data); It can be fixed by including <sys/types.h>, but since this test is skipped for macOS anyway, I prefer just to move all necessary includes under the corresponding define. Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

This patch fixes the warning in <libctest.c> produced by Clang: | warning: plain '_Complex' requires a type specifier; assuming | '_Complex double' by adding the `double` type specifier where it is needed. Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

Since Alpine uses musl [1] as its C standard library, the build for it failed after the commit af0f59d ("test: fix LuaJIT-tests for old libc version"), since `GetLibCVersion()` raises an error. This patch adds the check of the ID in the </etc/os-release> [2][3] of the Linux distribution and avoids setting the glibc version if the distro is "alpine". [1]: https://wiki.alpinelinux.org/wiki/Musl [2]: https://www.linux.org/docs/man5/os-release.html [3]: https://www.freedesktop.org/software/systemd/man/latest/os-release.html Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

The aforementioned test is flaky when it is run by Tarantool, since the necessary trace isn't compiled due to hotcount collisions. This patch fixes this by adding the additional reset of hot counters. Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

The `strtod parsing` subtest in the <lib/base/tonumber_scan.lua> checks the results yielded by the `strtod()` via FFI call. In glibc versions before 2.28 it returns an incorrect result (NaN instead of inf) for "0x3p1023" [1]. This patch hardens the skipcond for this test for an older version of the libc installed. [1]: https://sourceware.org/bugzilla/show_bug.cgi?id=23279 Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

The patch fixes a problem with recording `getmetatable()` for an I/O object: recording the `getmetatable` call with file descriptors represented by userdata object `UDTYPE_IO_FILE` (like `io.stdout`) leads to violation of assertion in `rec_check_slots`. Note, the problem was fixed upstream in different manner, see commit 5141cbc ("Fix compiliation of getmetatable() for UDTYPE_IO_FILE."). Note, the specialization omits the check of `__metatable` field precense and the check for the metatable itself. So, if we change the metatable on the object after the trace is compiled, the trace becomes invalid. The proposed test demonstrates these cases as well. Reviewed-by: Maxim Kokryashkin <[email protected]> Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

(cherry picked from commit de6b1a1) Before this patch, Valgrind produces tons of warnings during the VMevent calls since the `IR_NOP` instruction isn't fully initialized. Hence, any parsing operations for it in handlers during `jit.dump()` leads to the "uninitialised value" error. This can be reproduced with the following tests from the tarantool-tests suite: * fix-jit-dump-ir-conv.test.lua * lj-1128-table-new-opt-tnew.test.lua * lj-981-folding-0.test.lua * unit-jit-parse.test.lua This patch fixes the issue by the proper init of such IRs. Sergey Kaplun: * added the description for the problem Needed for tarantool/tarantool#3705 Part of tarantool/tarantool#10709 Reviewed-by: Maksim Tiushev <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

This patch enables running tests with Valgrind. There is a `VALGRIND_OPTS` variable [1] that we can set -- it makes the usage of Valgrind more flexible -- we can define any necessary flags in the command line (not at the building stage). By default, the suppression files are set to <src/lj.supp> (original suppression file in LuaJIT) and an additional one <src/lj_extra.supp> (maintained by us). Also, this patch disables the following tests when running with Valgrind due to failures: The <tarantool-tests/lj-512-profiler-hook-finalizers.test.lua> test is disabled due to its time sensitivity (it is not run the expected amount of time with Valgrind). These tests from the tarantool-tests suite are disabled due to tarantool/tarantool#10803: - lj-726-profile-flush-close.test.lua - profilers/gh-5688-tool-cli-flag.test.lua - profilers/gh-7264-add-proto-trace-sysprof-default.test.lua - profilers/misclib-sysprof-lapi.test.lua Timed out due to running under Valgrind: - tarantool-c-tests/gh-8594-sysprof-ffunc-crash.c_test - tarantool-tests/gh-7745-oom-on-trace.test.lua - tarantool-tests/lj-1034-tabov-error-frame.test.lua [1]: https://valgrind.org/docs/manual/manual-core.html#manual-core.defopts Part of tarantool/tarantool#3705 Reviewed-by: Sergey Bronnikov <[email protected]> Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

This patch adds CI testing with Valgrind in three scenarios: - Full leak checks enabled. - No leak checks, with memory fill set to `--malloc-fill=0x00` and `--free-fill=0x00`. - No leak checks, with memory fill set to `--malloc-fill=0xff` and `--free-fill=0xff`. The use of `0x00` and `0xff` for memory fill helps to detect dirty reads. `0x00` mimics zero-initialized memory, which can mask some uninitialized memory usage. `0xff` fills memory with non-zero values to make such errors easier to spot. Closes tarantool/tarantool#3705 Reviewed-by: Sergey Bronnikov <[email protected]> Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

Reported by Peter Cawley. (cherry picked from commit 45c88b7) Load fusing optimization takes into account only the presence of the corresponding stores, but not any calls that may affect the table content. This may lead to the incorrect stores if the fusing optimization occurs across the `table.clear()` call, leading to inconsistent behaviour between the JIT and the VM. This patch adds the corresponding check. Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#10709 Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

(cherry picked from commit 113a168) The `noconflict()` function checks if there's no conflicting instruction between the current instruction and the given `ref` instruction. Also, it avoids fusing loads if there are multiple references of the given `ref`. The last check is performed in the presence of the `noload` parameter. Since the `noconflict()`, which is added in the previous patch, checks conflicts for the same `ref` as the call before, there is no need to perform these checks again, so the corresponding parameter is adjusted. Sergey Kaplun: * added the description for the problem Part of tarantool/tarantool#10709 Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

Reported by Peter Cawley. (cherry picked from commit 6447236) Load fusing optimization doesn't take into account the presence of the `IR_NEWREF` which may cause rehashing and deallocation of the array part of the table. This may lead to the incorrect stores if the fusing optimization occurs across this IR, leading to inconsistent behaviour between the JIT and the VM. This patch adds the corresponding check by the refactoring of the `noconflict()` function -- it now accepts the mask of the `check` as the last argument. The first bit stands for the `IR_NEWREF` check, the second for the multiple reference of the given instruction. Unfortunately, this commit misses the check for the `table.clear()` introduced for the preprevious patch. Thus, the corresponding test fails again. This will be fixed in the next commit. Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#10709 Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

(cherry picked from commit 43d0a19) The previous commit accidentally removes the check that fusing optimization isn't performing across the call to the `table.clear()`. This commit fixes the behaviour by adding the corresponding check that depends on the first bit of `check` masks in the `noconflict()` routine. Sergey Kaplun: * added the description for the problem Part of tarantool/tarantool#10709 Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

(cherry picked from commit 0eddcbe) This patch does the following refactoring: 1) Drops optimizations for the Intel Atom CPU [1]: removes the `JIT_F_LEA_AGU` flag and related optimizations. The considerations for the use of LEA are complex and very CPU-specific, mostly dependent on the number of operands. Mostly, it isn't worth it due to the extra register pressure and/or extra instructions. Be aware that it applies to the original and obsolete Atom architecture. Today "Intel Atom" is just a trade name for reduced-performance implementations of the current Intel architecture. 2) Drops optimizations for the AMD K8, K10 CPU [2][3]: removes the `JIT_F_PREFER_IMUL` flag and related optimizations. 3) Refactors JIT flags defined in the <lj_jit.h>. Now all CPU-specific JIT flags are defined as the left shift of `JIT_F_CPU` instead of hardcoded constants, similar for the optimization flags. 4) Adds detection of the ARM8 CPU. 5) Drops the check for SSE2 since the VM already presumes CPU supports it. 6) Adds checks for `__ARM_ARCH`[4] macro in <lj_arch.h>. 7) Drops outdated comment in the amalgamation file about memory requirements. Sergey Kaplun: * added the description for the patch [1]: https://en.wikipedia.org/wiki/Intel_Atom [2]: https://en.wikipedia.org/wiki/AMD_K8 [3]: https://en.wikipedia.org/wiki/AMD_K10 [4]: https://developer.arm.com/documentation/dui0774/l/Other-Compiler-specific-Features/Predefined-macros Part of tarantool/tarantool#10709 Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

See the discussion in the corresponding ticket for the rationale. (cherry picked from commit de2e1ca) For the modulo operation, the arm64 VM uses `fmsub` [1] instruction, which is the fused multiply-add (FMA [2]) operation (more precisely, multiply-sub). Hence, it may produce different results compared to the unfused one. This patch fixes the behaviour by using the unfused instructions by default. However, the new JIT optimization flag (fma) is introduced to make it possible to take advantage of the FMA optimizations. Sergey Kaplun: * added the description and the test for the problem [1]: https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB [2]: https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation Part of tarantool/tarantool#10709 Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

Thanks to Peter Cawley. (cherry picked from commit d854d00) Assembling some instructions (like `IR_CONV int.num`, for example) with many mcode to be emitted may overflow the `MCLIM_REDZONE` (64) at once due to the huge mcode emitting. For example `IR_CONV` in this test requires 66 bytes of the machine code: | cvttsd2si r15d, xmm5 | xorps xmm9, xmm9 | cvtsi2sd xmm9, r15d | ucomisd xmm5, xmm9 | jnz 0x11edb00e5 ->37 | jpe 0x11edb00e5 ->37 | mov [rsp+0x80], r15d | mov r15, [rsp+0xe8] | movsd xmm9, [rsp+0xe0] | movsd xmm5, [rsp+0xd8] The reproducer needs sufficient register pressure as to immediately spill the result of the instruction to the stack and then reload the three registers used by the instruction, and to have chosen enough registers with numbers >=8 (because shaving off a REX prefix [1] or two would get 66 back down to <= `MCLIM_REDZONE`), and to be using lots of spill slots (because memory offsets <= 0x7f are shorter to encode compared to those >= 0x80. So, each reload instruction consumes 9 bytes. This makes this reproducer unstable (regarding the register allocator changes). Thus, only original test case is added as a regression test. This patch adds the red zone overflow checks more often for the IRs with many instructions to be emitted. Sergey Kaplun: * added the description and the test for the problem [1]: https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix Part of tarantool/tarantool#10709 Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

Reported by XmiliaH. (cherry picked from commit 0e66fc9) It is possible to get an infinite loop in a function `snap_usedef` when a `UCLO` makes a tight loop. This infinite loop could happen when `snap_usedef()` called during trace recording (more precisely, on the creation of the snapshot for the guarded trace check) processes UCLO bytecode instruction, and this instruction attempts a jump back with a negative offset value. The patch fixes the problem by checking a number of slots in a jump argument and replacing this value by `maxslot` if a value is negative, this means that no values will be purged from the snapshot. Sergey Bronnikov: * added the description and the test for the problem Part of tarantool/tarantool#10709 Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

Thanks to Peter Cawley. (cherry picked from commit 304da39) Before the patch, with the missed coercion from string, there is the cast to `i64` from `p64`, where the last one is the string address. This leads to an incorrect result of the bit operation. This patch adds the missing coercion everywhere for bit operations recording. Only the `recff_bit64_nary()` is affected, since all other routines have the corresponding type check and cast emitting if necessary. However, for the consistency, all functions have the same checking routine `crec_bit64_arg()` now. Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#10709 Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

(cherry-picked from commit 9ebebc9) Before the patch, there was a situation where `luaL_newstate()` could fail in `main()` and the `argv[0]` could be used as a progname in `l_message()`. However, `argv[0]` is not guaranteed to be non-NULL, so the segmentation fault could occur. This patch fixes the issue by using the predefined name in that case. Moreover, it refactors the `l_message()`, so now there is no need to pass the program name everywhere. The patch is tested with the help of the mocking of `luaL_newstate` by providing an error-injected implementation of it and preloading it. For preload to work, the LuaJIT must be built with dynamic build mode enabled. The corresponding flavor is added to the CI only for x86_64 Debug build to avoid CI jobs outgrowing. The <gh-8594-sysprof-ffunc-crash.test.c> inspects internal symbols from the LuaJIT static library, so it can't be linked for shared build. The test is disabled for the dynamic build mode. Since the Linux kernel 5.18-rc1 release, `argv` is forced to a single empty string if it is empty [1], so the issue is not reproducible on new kernels. You may inspect the `dmesg` log for the corresponding warning entrance. Maxim Kokryashkin: * added the description and the test for the problem [1]: https://lore.kernel.org/all/[email protected]/ Part of tarantool/tarantool#10709 Reviewed-by: Sergey Kaplun <[email protected]> Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

Reported by Sergey Kaplun. (cherry picked from commit f602f01) Before the patch `predict_next()` uses the pc allocation limit (`fs->bclim`) instead of the real limit of the defined bytecodes (`fs->pc`). This leads to the use of undefined value and possible crash. This patch fixes the check. Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#10709 Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

The macro `UNUSED` is widely used across the suite `tarantool-c-tests`. The patch defines macro only once in `test.h` to reuse it in other tests and removes definitions in tests. Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

Thanks to Sergey Kaplun. (cherry picked from commit 9d77734) When the number represented the level value is given to the `getfenv()`/`setfenv()`, it is cast to the `int`. Assume the given value is `2^31`, i.e. the resulting value after the cast is `INT_MIN`. After this, it will be decremented in `lj_debug_level()` and underflowed to the `INT_MAX`. That produces the UBSan warning about signed integer overflow. This patch raises the error early in the aforementioned functions, since a negative level value is meaningless. Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#11055 Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

Thanks to Sergey Kaplun. (cherry picked from commit 811e448) This commit is a follow-up to the commit 3a2e484 ("Detect inconsistent renames even in the presence of sunk values."). Due to the reversed assembling order of a trace, all registers are allocated from the bottom of the trace to the top. Furthermore, if the snapshot contains sunk values, IRs for them will contain the `RID_SUNK` after the first processing. If there is another snapshot (with a smaller number) containing this sunk IR, this IR will not be processed during the bloom filter marking in the allocation of the slot that escapes this snapshot (since it is already sunk). Hence, such IRs still may lead to the rename invariant violation like in the aforementioned commit. This patch prevents scanning from stopping when already sunk IR is faced during snapshot processing so bloom filters contain actual information. The disadvantage of this approach is that it assumes that any sunk table-typed IR can't refer to the same table inside it (so there should not be any cycling references in the sunk table). Sergey Kaplun: * added the description and the test for the problem Resolves tarantool/tarantool#10746 Relates to tarantool/aeon#282 Part of tarantool/tarantool#11055 Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

Thanks to Sergey Bronnikov and Peter Cawley. (cherry picked from commit b138ccf) When recording a trace using a Lua code like `repeat until `a >= 'b' > 'b'` a Lua error is encountered (`attempt to compare string with nil`), which (along with raising the error) causes an asynchronous trace abort. The trace abort remains pending until the next reentrance to the LuaJIT VM via `lj_vm_call()` or `lj_vm_pcall(). On handling abort LuaJIT is searching for the topmost Lua frame on the stack, that was active when the trace abort happened, it is needed to generate a better error message. Unfortunately, because the abort was due to an error, and the error was caught by a `lua_pcall` with an unspecified error function (4th argument), the Lua frame that caused the abort was already removed as part of error processing, so the search cannot find it. Furthermore, in this particular case, there are no Lua frames on the stack, which isn't something the search code had considered possible. Sergey Bronnikov: * added the description and the test for the problem Part of tarantool/tarantool#11055 Reviewed-by: Sergey Kaplun <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

Reported by Bachir Bendrissou. (cherry picked from commit 62e362a) When the trace is started after the stitching, it may not have some slots from the first one. That slot may be the first slot given to the `select()` function (if it is determined by the call that caused the stitch). Before the patch, there is no loading of this slot in the `rec_varg()` for the trace, and the 0 value from slots is taken instead. Hence, the following recording of `BC_VARG` is incorrect. This patch fixes this by using the `getslot()` instead of taking the value directly. Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#11055 Reviewed-by: Sergey Bronnikov <[email protected]> Signed-off-by: Sergey Kaplun <[email protected]>

Mike Pall and others added 30 commits October 31, 2023 12:47

Buristan and others added 30 commits December 12, 2024 10:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tarantool JIT engine #17

Tarantool JIT engine #17

Brugarolas commented Jan 8, 2024

Tarantool JIT engine #17

Are you sure you want to change the base?

Tarantool JIT engine #17

Conversation

Brugarolas commented Jan 8, 2024