Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tarantool JIT engine #17

Open
wants to merge 652 commits into
base: tarantool
Choose a base branch
from

Conversation

Brugarolas
Copy link
Owner

No description provided.

Mike Pall and others added 30 commits October 31, 2023 12:47
Thanks to Peter Cawley.

(cherry-picked from commit e86990f)

There is a case of valid Lua C API usage when non-throwing code is
executed on coroutine, other than one that an entry-point C function was
called with, which results in a segmentation fault because cur_L is not
being restored correspondingly. For a comprehensive example, see the
test case added within this patch.

According to the analysis done in LuaJIT#1066, to hit this issue,
we first need to call into a C function. That C function needs to change
cur_L by calling lua_resume/lua_p?call, and then the only problematic
path is leaving the C function by throwing an error, which is caught
by the fast-function x?pcall. When x?pcall then returns to its caller,
we can be executing bytecode in the VM with an incorrect cur_L, and
things will go badly if we enter a trace before correcting cur_L.

The fix proposed in LuaJIT#740 was to set cur_L on error throw;
however, the analysis from LuaJIT#1066 suggests that it can be
set on the catching phase.

This patch changes the error-catching routine, so now the patch sets the
actual cur_L there. Now it is possible to throw errors on non-executing
coroutines, which is a violation of the Lua C API. So, even though it is
now possible, that behavior should be avoided anyway.

Maxim Kokryashkin:
* added the description and the test for the problem

Resolves tarantool/tarantool#6323
Part of tarantool/tarantool#9145

Reviewed-by: Sergey Bronnikov <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
This reverts commit 97699d9.

As was mentioned in tarantool/tarantool#6189, throwing an error
not on the currently executed coroutine is a violation[1][2] of the
Lua/C API. This patch is a part of the patchset that supports
this violation and is reverted because of it.

[1]: https://www.lua.org/manual/5.4/manual.html#4.4
[2]: http://lua-users.org/lists/lua-l/2021-08/msg00086.html

Part of tarantool/tarantool#6323

Reviewed-by: Sergey Bronnikov <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
This reverts commit ed412cd.

As was mentioned in tarantool/tarantool#6189, throwing an error
not on the currently executed coroutine is a violation[1][2] of the
Lua/C API. This patch is a part of the patchset that supports
this violation and is reverted because of it.

[1]: https://www.lua.org/manual/5.4/manual.html#4.4
[2]: http://lua-users.org/lists/lua-l/2021-08/msg00086.html

Part of tarantool/tarantool#6323

Reviewed-by: Sergey Bronnikov <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
This reverts commit 7570ff6.

As was mentioned in tarantool/tarantool#6189, throwing an error
not on the currently executed coroutine is a violation[1][2] of the
Lua/C API. This patch is a part of the patchset that supports
this violation and is reverted because of it.

[1]: https://www.lua.org/manual/5.4/manual.html#4.4
[2]: http://lua-users.org/lists/lua-l/2021-08/msg00086.html

Part of tarantool/tarantool#6323

Reviewed-by: Sergey Bronnikov <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
This reverts commit 5ccd25d.

As was mentioned in tarantool/tarantool#6189, throwing an error
not on the currently executed coroutine is a violation[1][2] of the
Lua/C API. This patch is a part of the patchset that supports
this violation and is reverted because of it.

[1]: https://www.lua.org/manual/5.4/manual.html#4.4
[2]: http://lua-users.org/lists/lua-l/2021-08/msg00086.html

Part of tarantool/tarantool#6323

Reviewed-by: Sergey Bronnikov <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Reported by Sergey Kaplun.

(cherry-picked from commit d133d67)

This patch is the follow-up for the commit
1672bdc ("x64: Fix __call metamethod
return dispatch."). That patch uses the incorrect macro x64 (which is
undefined in dasm) instead of X64. `KBASE` (r15d) is still compared
against `BASE` (edx) instead of `KBASEa` (r15) against rdx.

This patch fixes the typo, so the correct registers are chosen.

Sergey Kaplun:
* added the description for the problem

Part of tarantool/tarantool#9145

Reviewed-by: Igor Munkin <[email protected]>
Reviewed-by: Maxim Kokryashkin <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Analyzed by Sergey Kaplun.

(cherry-picked from commit b7a8c7c)

Assume we have parent and child traces with the following IRs from the
IR dump:

Parent:
| 0009 rax   >  tab TNEW   0     0
| 0010          p32 FLOAD  0008  tab.node
| 0011          p32 HREFK  0010  "Name" @1
| 0012  {0008}  tab HSTORE 0011  0009
| ....              SNAP   2    [ ---- 0001 0002 0008 ---- ]
| 0013  {sink}  tab TNEW   0     0
| 0014  {0008}  fal HSTORE 0011  false
| ....              SNAP   3    [ ---- 0001 0002 0008 ---- ]

Child:
| 0001 r15      tab SLOAD  1     PI
| 0002 rbp      tab SLOAD  2     PI
| 0003          tab PVAL   9

As we can see from the trace dump above, the `rax` register is missing
in the `0003 PVAL` IR for the side trace -- so it is assumed to be
available in the allow RegSet inside `asm_stack_check()` and its value
is spoiled during this check, so if we are restoring from the 3rd
snapshot by stack overflow -- we are in trouble.

The moment when IR is spoiled is when we set a hint on the register
inherited from the parent trace (see `asm_setup_regsp()` for details).
The 0th register (`rax`) shapeshifts into `RID_NONE`. Hence, when
collecting register dependencies from the parent trace, `0003 PVAL` is
considered the IR with `RID_NONE`, i.e., without an assigned register.
So, this register is considered free (picked as bottom from the free
set) in the `asm_stack_check()` and is used for stack overflow check, so
the table reference is gone.

This patch introduces another register set for the context of the parent
trace to use in the stack check. All registers used on the child trace
are excluded from this set.

The test case for this patch is omitted since it requires specific
register allocation, which is hard to construct and not stable in any
future patches.

Sergey Kaplun:
* added the description for the problem

Part of tarantool/tarantool#9145

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Igor Munkin <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Thanks to Sergey Kaplun, NiLuJe and Peter Cawley.

(cherry-picked from commit aa2db7e)

The previous patch fixed just part of the problem with the register
coalescing. For example, the parent base register may be used inside the
parent or child register sets when it shouldn't. This leads to incorrect
register allocations, which may lead to crashes or undefined behaviour.
This patch fixes it by excluding the parent base register from both
register sets.

The test case for this patch doesn't fail before the commit since it
requires specific register allocation, which is hard to construct and
very fragile. Due to the lack of ideal sync with the upstream
repository, the test is passed before the patch.
It should become correct after backporting future patches.

Sergey Kaplun:
* added the description and the test for the problem

Resolves tarantool/tarantool#8767
Part of tarantool/tarantool#9145

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Igor Munkin <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
E.g. __concat = function() return setmetatable(...) end
Reported by Fezile Manana.

(cherry-picked from commit 75ee3a6)

During the recording of concat with tailcall to fast function,
the fast function recording is postponed. This implementation
may lead to the absence of side effects from fast-function and
incorrect behavior as a result. For a comprehensive example,
see the comment in the test.

Maxim Kokryashkin:
* added the description and the test for the problem

Part of tarantool/tarantool#9145

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Reported by Sergey Kaplun.

(cherry-picked from commit 433d7e8)

`cp->packstack` is the array of size `CPARSE_MAX_PACKSTACK` (7). Before
the patch, `cp->curpack` is checked to be less than
`CPARSE_MAX_PACKSTACK`, but then `cp->packstack` is accessed at
`cp->curpack + 1`, which is out of bounds, so `cp->curpack` value is
overwritten.

This patch fixes a condition and also adds the error throw when counter
is overflow (instead of rewriting a top `cp->packstack` value).

Sergey Kaplun:
* added the description and the test for the problem

Resolves tarantool/tarantool#9339
Part of tarantool/tarantool#9145

Reviewed-by: Igor Munkin <[email protected]>
Reviewed-by: Maxim Kokryashkin <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Reported by XmiliaH.

(cherry-picked from commit 7b994e0)

Fold optimization x - (-0) ==> x is INVALID for x = -0 in FP arithmetic.
Its result is -0 instead of +0. This patch allows only x - (+0) ==> x
optimization.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#9145

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
(cherry-picked from commit 44bd743)

When dumping IRs via `jit.dump()`, the incorrect value of `IR_CONV` mode
shift (`IRCONV_CSH`) is used. This patch fixes the value.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#9145

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Thanks to Peter Cawley.

(cherry-picked from commit 7a2b83a)

Even after the commit c05d103 ("Fix
math.min()/math.max() inconsistencies."), some of the corner cases (see
tests in the commit for details) for `math.min()`/`math.max()` still
inconsistent in the JIT and the VM. This happens because `IR_MIN` and
`IR_MAX` are marked as commutative, so their operands were swapped by
`asm_swapops()`. As a result, because `minsd`[1]/`maxsd`[2] instructions
don't change the sign byte of the destination register, its sign is
preserved if we compare 0 with -0. When we compare something with NaN,
the second source operand (either a NaN or a valid floating-point value)
is written to the result. Hence, swapping the operands changed the
resulting value.

This patch removes the commutative flag from the aforementioned IRs to
prevent swapping of their operands.

[1]: https://c9x.me/x86/html/file_module_x86_id_173.html
[2]: https://c9x.me/x86/html/file_module_x86_id_168.html

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#9145

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
This commit is the follow-up for
b8c8e25 ("Fix jit.dump() output for
IR_CONV."). When recording two traces in the aforementioned test for
`step` conversion, the second trace isn't always recorded. This happens
due to the hotcount penalty during compiling the first trace. Setting
`hotloop=2` to the first trace and calling it twice helps to fix the
issue.

Signed-off-by: Igor Munkin <[email protected]>
Reported by minoki.

(cherry-picked from commit 674afcd)

The `ceil` (`floor`) math function implementation calculates (|x| +
2^52) - 2^52 for its argument to determine the fractional part of x, so
it will be rounded to the nearest integer and its sign is restored.
After that, if the original value is < (>) than the result, the -1 (1)
is subtracted from it. Take a look at the `ceil()` case. The result of
the operation `-1 - (-1)` is +0 for FP arithmetic, against -0 expected
as a result.

This patch changes the `- (-1)` operation to `+ 1` and restores sign
after it again.

NB: Since in DUALNUM mode on x86/x64 all results are tried to be
converted to integers, the sign of 0 is neglected.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#9145

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Finalizers are not supposed to throw errors -- this is undefined behavior.
Lua 5.1 - 5.3 and (previously) LuaJIT rethrow the error. This randomly
breaks some unrelated code that just happens to do an allocation. Bad.
Lua 5.4 catches the error and emits a warning instead. But warnings are
not enabled by default, so it fails silently. Even worse.
LuaJIT (now) catches the error and emits a VM event. The default event
handler function prints "ERROR in finalizer: ...".
Set a custom handler function with: jit.attach(handler, "errfin")

(cherry-picked from commit 1c27912)

The default handler for finalizer errors is set during the
Lua initialization. Namely, in the `luaL_newstate`.

Along with the introduction of the new `ERRFIN` VM event, the high
bits for the old VM events are removed since they are scratched
anyway by the bitwise operation `(hash)<<3` in the `VMEVENT_DEF`
macro.

This patch results in a regression in the PUC-Rio test suite. The
test in the suite for the error in the GC finalizer fails after
the patch because the error is now handled with the VM event
handler instead of being rethrown. Hence, the `collectgarbage`
finishes successfully despite the error in the GC finalizer.
Considering this change, the test was disabled.

There is also another regression in the `misclib-getmetrics-capi`,
because there are a few test cases reliant on the `lua_gettop(L)`
value, which is broken after this patch. The `_VMEVENTS` table,
where the error handler for GC finalizers is set, was not cleared
from the stack after the initialization. This issue is fixed in
the following patch.

Maxim Kokryashkin:
* added the test for the problem and the description for the patch

Part of tarantool/tarantool#9145

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Reported by PluMGMK.

(cherry-picked from commit 224129a)

The `_VMEVENTS` table, where the error handler for GC finalizers
is set, was not cleared from the stack after the initialization.
This commit adds stack cleanup.

Maxim Kokryashkin:
* added the description and the test for the problem

Part of tarantool/tarantool#9145

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Sometimes, the Lua stack can be inconsistent during the FFUNC execution,
which may lead to a sysprof crash during the stack unwinding.

This patch replaces the `top_frame` property of `global_State` with
`lj_sysprof_topframe` structure, which contains `top_frame` and `ffid`
properties. `ffid` property makes sense only when the LuaJIT VM state is
set to `FFUNC`. That property is set to the ffid of the fast function
that VM is about to execute.  In the same time, `top_frame` property is
not updated now, so the top frame of the Lua stack can be streamed based
on the ffid, and the rest of the Lua stack can be streamed as usual.

Also, this patch fixes the build via Makefile.original by adding the
`LJ_HASSYSPROF` flag support to it.

Resolves tarantool/tarantool#8594

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
This patch refactors symbol resolution in memprof, so
now they are resolved during the process of parsing.
This makes the generation mechanism excessive since
symtab updates are no longer affecting the previous events.

Follows up tarantool/tarantool#8700

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Since both of the profiler parsers are now processing
the events in a stream-like fashion, the generation
mechanism is excessive and can be purged. This results
in a significant memory consumption drop, especially
for the AVL-tree part.

Consider this script:
| jit.off()
| misc.sysprof.start{mode = 'C', interval=10}
| for i = 1, 1e7 do tostring(i) end
| misc.sysprof.stop()

After executing it with LuaJIT, you can parse it like this:
| $ time -v luajit-parse-sysprof sysprof.bin

So, before the patch:
| Maximum resident set size (kbytes): 224928

And after the patch:
| Maximum resident set size (kbytes): 32780

That is the 85% reduction in memory consumption.

Follows up tarantool/tarantool#8700

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
It is really inconvenient to use a standalone shell script to parse
binary dumps from profilers. That is why this commit introduces a
CLI flag for tools in the LuaJIT, so now it is possible to parse
a memprof dump as simple as:
```
luajit -tm memprof.bin
luajit -tm --leak-only memprof.bin
```

And the sysprof too:
```
luajit -ts sysprof.bin
```

The scripts `luajit-parse-memprof` and `luajit-parse-sysprof`
are purged as a result of these changes.

Resolves tarantool/tarantool#5688

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Fix typos found with codespell in files with our own source code.
Furthermore, several warnings found by the checkpatch.pl [1] script are
fixed either.

1. https://github.com/tarantool/checkpatch

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
The patch introduces a new CMake target: "LuaJIT-codespell", that
spellchecks files specified in the whitelist by codespell [1].

1. https://github.com/codespell-project/codespell

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
The patch enables codespell in GitHub workflow 'lint' by installing
`codespell` using `pip` and renames job name and description.

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
The patch follows up commit a0483bd ("test: introduce module for C
tests") and adds additional asserts suitable for comparing strings.

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Igor Munkin <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Reported by Mathias Westerdahl.

(cherry picked from commit 633f265)

Lua 5.1 Reference Manual [1] defines a function `lua_concat`, that:

> void lua_concat (lua_State *L, int n);
>
> Concatenates the n values at the top of the stack, pops them, and leaves
> the result at the top.

Without the patch, `lua_concat()` behaved incorrectly with userdata with
the defined `__concat` metamethod. The problem is GC64-specific.

Assuming we have three literals and a userdata with the defined
`__concat` metamethod on top of the Lua stack:

1 [string]
2 [string]
3 [string]
4 [string]
5 [userdata] <--- top

On attempt to concatenate *two* items on top of the Lua stack,
`lua_concat()` concatenates *four* items and leaves the result on top of
the Lua stack:

1 [string]
2 [string][string][string][userdata] <--- top

The problem is in the incorrect calculation of `n` counter in the loop
in the implementation of function `lua_concat`. Without the fix, `n` is
equal to 3 at the end of the first iteration, and therefore it goes to
the next iteration of concatenation. In the fixed implementation of
`lua_concat()`, `n` is equal to 1 at the end of the first loop iteration,
decremented in a loop postcondition, and breaks the loop.

For details see implementation of `lj_meta_cat()` in <src/lj_meta.c>.

The patch fixes incorrect behaviour.

1. https://www.lua.org/manual/5.1/manual.html

Sergey Bronnikov:
* added the description and the test for the problem

Part of tarantool/tarantool#9145

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
This test could do allocation outside of the protected frame,
which could result in an uncaught OOM and test failure. This
patch adds extra `collectgarbage` calls to the test to avoid
such situations.

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Thanks to Peter Cawley.

(cherry-picked from commit 7f9907b)

When emitting IR NEWREF, there is no check for a non-NaN stored key
value. Thus, when the NaN number value is given to trace, it may be
stored as a key. This patch adds the corresponding check. If fold
optimization is enabled, this IR EQ check is dropped if it references
CONV IR from any (unsigned) integer type since NaN can be created via
conversion from an integer.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#9145

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
This patch adds the test for commit
1a40162 ("Fix assertions."). This patch
removes incorrect assertions in the fold optimizations for conversions
from numbers to different integer types. Since the issue affects only
branch 2.0, there is no need to fix it. Nevertheless, the test is
required to avoid regressions in the future.

Part of tarantool/tarantool#9145

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Reported by XmiliaH.

(cherry-picked from commit c8bcf1e)

`fold_abc_k()` doesn't patch the first ABC check when the right constant
operand is negative. This leads to out-of-bounds access from the array
on a trace. This patch casts the operands to uint32_t for comparison. If
the right IR contains a negative integer, the second IR will always be
patched. Also, because the ABC check on the trace is unordered, this
guard will always fail.

Also, this fold rule creates new instructions that reference operands
across PHIs. This opens the room for other optimizations (like DCE), so
some guards become eliminated, and we use out-of-bounds access from the
array part of the table on trace. This patch adds the missing
`PHIBARRIER()` check.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#9145

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
Buristan and others added 30 commits December 12, 2024 10:53
The LuaJIT-tests suite lacks the ability to specify tags with custom
values. This patch adds the ability to specify number tags in the format
`+tag=number`. It is useful for version specification of system
libraries (like libc).

Required for the next patch.

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
The `strtod parsing` subtest in the <lib/base/tonumber_scan.lua> checks
the results yielded by the `strtod()` via FFI call. In GLibc versions
before 2.19 it returns an incorrect result for "0x1p-2075" [1]. This
patch skips this test for a smaller version of the libc installed.

[1]: https://sourceware.org/bugzilla/show_bug.cgi?id=16151

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
- Fix a typo in a comment.
- Replace `skip_all()` with `skip()` because it is strange
  to skip the overall test group inside a testcase.

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
LuaJIT has a macro LUAJIT_DISABLE_SYSPROF that disables
sysprof support. Sysprof tests don't respect this macro,
and therefore some of them failed when the macro is enabled.

The proposed patch:

- Skips sysprof testcases in a suite tarantool-c-tests.
- Introduces an environment variable LUAJIT_DISABLE_SYSPROF in
  a suite tarantool-tests that is set to 1 when sysprof support is
  disabled.
- Propagates a status of sysprof support to Lua tests and skip
  testing when sysprof is disabled.

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
If the `foo()` function itself starts to be recorded on the very first
call, it leads to the changing of TNEW bytecode when table bump
optimization is enabled. This patch skips the test for this type of
build.

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
When building <tarantool-c-tests/gh-8594-sysprof-ffunc-crash.test.c> on
macOS Sequoia 15.0, I've got the following error from including
<sys/ptrace.h>:

| error: unknown type name 'caddr_t'
| int     ptrace(int _request, pid_t _pid, caddr_t _addr, int _data);

It can be fixed by including <sys/types.h>, but since this test is
skipped for macOS anyway, I prefer just to move all necessary includes
under the corresponding define.

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
This patch fixes the warning in <libctest.c> produced by Clang:
| warning: plain '_Complex' requires a type specifier; assuming
| '_Complex double'
by adding the `double` type specifier where it is needed.

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
Since Alpine uses musl [1] as its C standard library, the build for it
failed after the commit af0f59d ("test:
fix LuaJIT-tests for old libc version"), since `GetLibCVersion()` raises
an error. This patch adds the check of the ID in the
</etc/os-release> [2][3] of the Linux distribution and avoids setting
the glibc version if the distro is "alpine".

[1]: https://wiki.alpinelinux.org/wiki/Musl
[2]: https://www.linux.org/docs/man5/os-release.html
[3]: https://www.freedesktop.org/software/systemd/man/latest/os-release.html

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
The aforementioned test is flaky when it is run by Tarantool, since the
necessary trace isn't compiled due to hotcount collisions. This patch
fixes this by adding the additional reset of hot counters.

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
The `strtod parsing` subtest in the <lib/base/tonumber_scan.lua> checks
the results yielded by the `strtod()` via FFI call. In glibc versions
before 2.28 it returns an incorrect result (NaN instead of inf) for
"0x3p1023" [1]. This patch hardens the skipcond for this test for an
older version of the libc installed.

[1]: https://sourceware.org/bugzilla/show_bug.cgi?id=23279

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
The patch fixes a problem with recording `getmetatable()`
for an I/O object: recording the `getmetatable` call with file
descriptors represented by userdata object `UDTYPE_IO_FILE`
(like `io.stdout`) leads to violation of assertion in
`rec_check_slots`.

Note, the problem was fixed upstream in different manner, see
commit 5141cbc
("Fix compiliation of getmetatable() for UDTYPE_IO_FILE.").
Note, the specialization omits the check of `__metatable` field
precense and the check for the metatable itself. So, if we change
the metatable on the object after the trace is compiled, the trace
becomes invalid. The proposed test demonstrates these cases as
well.

Reviewed-by: Maxim Kokryashkin <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
(cherry picked from commit de6b1a1)

Before this patch, Valgrind produces tons of warnings during the VMevent
calls since the `IR_NOP` instruction isn't fully initialized. Hence, any
parsing operations for it in handlers during `jit.dump()` leads to the
"uninitialised value" error.

This can be reproduced with the following tests from the
tarantool-tests suite:
* fix-jit-dump-ir-conv.test.lua
* lj-1128-table-new-opt-tnew.test.lua
* lj-981-folding-0.test.lua
* unit-jit-parse.test.lua

This patch fixes the issue by the proper init of such IRs.

Sergey Kaplun:
* added the description for the problem

Needed for tarantool/tarantool#3705
Part of tarantool/tarantool#10709

Reviewed-by: Maksim Tiushev <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
This patch enables running tests with Valgrind. There is a
`VALGRIND_OPTS` variable [1] that we can set -- it makes the usage of
Valgrind more flexible -- we can define any necessary flags in the
command line (not at the building stage). By default, the suppression
files are set to <src/lj.supp> (original suppression file in LuaJIT) and
an additional one <src/lj_extra.supp> (maintained by us).

Also, this patch disables the following tests when running with Valgrind
due to failures:

The <tarantool-tests/lj-512-profiler-hook-finalizers.test.lua> test is
disabled due to its time sensitivity (it is not run the expected amount
of time with Valgrind).

These tests from the tarantool-tests suite are disabled due to
tarantool/tarantool#10803:
  - lj-726-profile-flush-close.test.lua
  - profilers/gh-5688-tool-cli-flag.test.lua
  - profilers/gh-7264-add-proto-trace-sysprof-default.test.lua
  - profilers/misclib-sysprof-lapi.test.lua

Timed out due to running under Valgrind:
  - tarantool-c-tests/gh-8594-sysprof-ffunc-crash.c_test
  - tarantool-tests/gh-7745-oom-on-trace.test.lua
  - tarantool-tests/lj-1034-tabov-error-frame.test.lua

[1]: https://valgrind.org/docs/manual/manual-core.html#manual-core.defopts

Part of tarantool/tarantool#3705

Reviewed-by: Sergey Bronnikov <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
This patch adds CI testing with Valgrind in three scenarios:
  - Full leak checks enabled.
  - No leak checks, with memory fill set to `--malloc-fill=0x00`
    and `--free-fill=0x00`.
  - No leak checks, with memory fill set to `--malloc-fill=0xff`
    and `--free-fill=0xff`.

The use of `0x00` and `0xff` for memory fill helps to detect dirty
reads. `0x00` mimics zero-initialized memory, which can mask some
uninitialized memory usage. `0xff` fills memory with non-zero values to
make such errors easier to spot.

Closes tarantool/tarantool#3705

Reviewed-by: Sergey Bronnikov <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
Reported by Peter Cawley.

(cherry picked from commit 45c88b7)

Load fusing optimization takes into account only the presence of the
corresponding stores, but not any calls that may affect the table
content. This may lead to the incorrect stores if the fusing
optimization occurs across the `table.clear()` call, leading to
inconsistent behaviour between the JIT and the VM.

This patch adds the corresponding check.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#10709

Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
(cherry picked from commit 113a168)

The `noconflict()` function checks if there's no conflicting instruction
between the current instruction and the given `ref` instruction. Also,
it avoids fusing loads if there are multiple references of the given
`ref`. The last check is performed in the presence of the `noload`
parameter. Since the `noconflict()`, which is added in the previous
patch, checks conflicts for the same `ref` as the call before, there is
no need to perform these checks again, so the corresponding parameter is
adjusted.

Sergey Kaplun:
* added the description for the problem

Part of tarantool/tarantool#10709

Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
Reported by Peter Cawley.

(cherry picked from commit 6447236)

Load fusing optimization doesn't take into account the presence of the
`IR_NEWREF` which may cause rehashing and deallocation of the array part
of the table. This may lead to the incorrect stores if the fusing
optimization occurs across this IR, leading to inconsistent behaviour
between the JIT and the VM.

This patch adds the corresponding check by the refactoring of the
`noconflict()` function -- it now accepts the mask of the `check` as the
last argument. The first bit stands for the `IR_NEWREF` check, the
second for the multiple reference of the given instruction.
Unfortunately, this commit misses the check for the `table.clear()`
introduced for the preprevious patch. Thus, the corresponding test fails
again. This will be fixed in the next commit.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#10709

Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
(cherry picked from commit 43d0a19)

The previous commit accidentally removes the check that fusing
optimization isn't performing across the call to the `table.clear()`.
This commit fixes the behaviour by adding the corresponding check that
depends on the first bit of `check` masks in the `noconflict()` routine.

Sergey Kaplun:
* added the description for the problem

Part of tarantool/tarantool#10709

Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
(cherry picked from commit 0eddcbe)

This patch does the following refactoring:
1) Drops optimizations for the Intel Atom CPU [1]: removes the
   `JIT_F_LEA_AGU` flag and related optimizations. The considerations
   for the use of LEA are complex and very CPU-specific, mostly
   dependent on the number of operands. Mostly, it isn't worth it due to
   the extra register pressure and/or extra instructions.
   Be aware that it applies to the original and obsolete Atom
   architecture. Today "Intel Atom" is just a trade name for
   reduced-performance implementations of the current Intel
   architecture.
2) Drops optimizations for the AMD K8, K10 CPU [2][3]: removes the
   `JIT_F_PREFER_IMUL` flag and related optimizations.
3) Refactors JIT flags defined in the <lj_jit.h>. Now all CPU-specific
   JIT flags are defined as the left shift of `JIT_F_CPU` instead of
   hardcoded constants, similar for the optimization flags.
4) Adds detection of the ARM8 CPU.
5) Drops the check for SSE2 since the VM already presumes CPU supports
   it.
6) Adds checks for `__ARM_ARCH`[4] macro in <lj_arch.h>.
7) Drops outdated comment in the amalgamation file about memory
   requirements.

Sergey Kaplun:
* added the description for the patch

[1]: https://en.wikipedia.org/wiki/Intel_Atom
[2]: https://en.wikipedia.org/wiki/AMD_K8
[3]: https://en.wikipedia.org/wiki/AMD_K10
[4]: https://developer.arm.com/documentation/dui0774/l/Other-Compiler-specific-Features/Predefined-macros

Part of tarantool/tarantool#10709

Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
See the discussion in the corresponding ticket for the rationale.

(cherry picked from commit de2e1ca)

For the modulo operation, the arm64 VM uses `fmsub` [1] instruction,
which is the fused multiply-add (FMA [2]) operation (more precisely,
multiply-sub). Hence, it may produce different results compared to the
unfused one. This patch fixes the behaviour by using the unfused
instructions by default. However, the new JIT optimization flag (fma) is
introduced to make it possible to take advantage of the FMA
optimizations.

Sergey Kaplun:
* added the description and the test for the problem

[1]: https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FMSUB
[2]: https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation

Part of tarantool/tarantool#10709

Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
Thanks to Peter Cawley.

(cherry picked from commit d854d00)

Assembling some instructions (like `IR_CONV int.num`, for example) with
many mcode to be emitted may overflow the `MCLIM_REDZONE` (64) at once
due to the huge mcode emitting.

For example `IR_CONV` in this test requires 66 bytes of the
machine code:
|  cvttsd2si r15d, xmm5
|  xorps xmm9, xmm9
|  cvtsi2sd xmm9, r15d
|  ucomisd xmm5, xmm9
|  jnz 0x11edb00e5       ->37
|  jpe 0x11edb00e5       ->37
|  mov [rsp+0x80], r15d
|  mov r15, [rsp+0xe8]
|  movsd xmm9, [rsp+0xe0]
|  movsd xmm5, [rsp+0xd8]

The reproducer needs sufficient register pressure as to immediately
spill the result of the instruction to the stack and then reload the
three registers used by the instruction, and to have chosen enough
registers with numbers >=8 (because shaving off a REX prefix [1] or two
would get 66 back down to <= `MCLIM_REDZONE`), and to be using lots of
spill slots (because memory offsets <= 0x7f are shorter to encode
compared to those >= 0x80. So, each reload instruction consumes 9 bytes.
This makes this reproducer unstable (regarding the register allocator
changes). Thus, only original test case is added as a regression test.

This patch adds the red zone overflow checks more often for the IRs with
many instructions to be emitted.

Sergey Kaplun:
* added the description and the test for the problem

[1]: https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix

Part of tarantool/tarantool#10709

Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
Reported by XmiliaH.

(cherry picked from commit 0e66fc9)

It is possible to get an infinite loop in a function `snap_usedef` when
a `UCLO` makes a tight loop. This infinite loop could happen when
`snap_usedef()` called during trace recording (more precisely, on the
creation of the snapshot for the guarded trace check) processes UCLO
bytecode instruction, and this instruction attempts a jump back with a
negative offset value. The patch fixes the problem by checking a number
of slots in a jump argument and replacing this value by `maxslot` if a
value is negative, this means that no values will be purged from the
snapshot.

Sergey Bronnikov:
* added the description and the test for the problem

Part of tarantool/tarantool#10709

Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
Thanks to Peter Cawley.

(cherry picked from commit 304da39)

Before the patch, with the missed coercion from string, there is the
cast to `i64` from `p64`, where the last one is the string address. This
leads to an incorrect result of the bit operation.

This patch adds the missing coercion everywhere for bit operations
recording. Only the `recff_bit64_nary()` is affected, since all other
routines have the corresponding type check and cast emitting if
necessary. However, for the consistency, all functions have the same
checking routine `crec_bit64_arg()` now.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#10709

Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
(cherry-picked from commit 9ebebc9)

Before the patch, there was a situation where `luaL_newstate()` could
fail in `main()` and the `argv[0]` could be used as a progname in
`l_message()`. However, `argv[0]` is not guaranteed to be non-NULL, so
the segmentation fault could occur. This patch fixes the issue by using
the predefined name in that case. Moreover, it refactors the
`l_message()`, so now there is no need to pass the program name
everywhere.

The patch is tested with the help of the mocking of `luaL_newstate` by
providing an error-injected implementation of it and preloading it. For
preload to work, the LuaJIT must be built with dynamic build mode
enabled. The corresponding flavor is added to the CI only for x86_64
Debug build to avoid CI jobs outgrowing.

The <gh-8594-sysprof-ffunc-crash.test.c> inspects internal symbols from
the LuaJIT static library, so it can't be linked for shared build. The
test is disabled for the dynamic build mode.

Since the Linux kernel 5.18-rc1 release, `argv` is forced to a single
empty string if it is empty [1], so the issue is not reproducible on new
kernels. You may inspect the `dmesg` log for the corresponding warning
entrance.

Maxim Kokryashkin:
* added the description and the test for the problem

[1]: https://lore.kernel.org/all/[email protected]/

Part of tarantool/tarantool#10709

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
Reported by Sergey Kaplun.

(cherry picked from commit f602f01)

Before the patch `predict_next()` uses the pc allocation limit
(`fs->bclim`) instead of the real limit of the defined bytecodes
(`fs->pc`). This leads to the use of undefined value and possible
crash. This patch fixes the check.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#10709

Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
The macro `UNUSED` is widely used across the suite
`tarantool-c-tests`. The patch defines macro only once in
`test.h` to reuse it in other tests and removes definitions in
tests.

Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
Thanks to Sergey Kaplun.

(cherry picked from commit 9d77734)

When the number represented the level value is given to the
`getfenv()`/`setfenv()`, it is cast to the `int`. Assume the given value
is `2^31`, i.e. the resulting value after the cast is `INT_MIN`. After
this, it will be decremented in `lj_debug_level()` and underflowed to
the `INT_MAX`. That produces the UBSan warning about signed integer
overflow.

This patch raises the error early in the aforementioned functions, since
a negative level value is meaningless.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#11055

Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
Thanks to Sergey Kaplun.

(cherry picked from commit 811e448)

This commit is a follow-up to the commit
3a2e484 ("Detect inconsistent renames
even in the presence of sunk values."). Due to the reversed assembling
order of a trace, all registers are allocated from the bottom of the
trace to the top. Furthermore, if the snapshot contains sunk values, IRs
for them will contain the `RID_SUNK` after the first processing. If
there is another snapshot (with a smaller number) containing this sunk
IR, this IR will not be processed during the bloom filter marking in the
allocation of the slot that escapes this snapshot (since it is already
sunk). Hence, such IRs still may lead to the rename invariant violation
like in the aforementioned commit.

This patch prevents scanning from stopping when already sunk IR is faced
during snapshot processing so bloom filters contain actual information.
The disadvantage of this approach is that it assumes that any sunk
table-typed IR can't refer to the same table inside it (so there should
not be any cycling references in the sunk table).

Sergey Kaplun:
* added the description and the test for the problem

Resolves tarantool/tarantool#10746
Relates to tarantool/aeon#282
Part of tarantool/tarantool#11055

Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
Thanks to Sergey Bronnikov and Peter Cawley.

(cherry picked from commit b138ccf)

When recording a trace using a Lua code like
`repeat until `a >= 'b' > 'b'` a Lua error is encountered
(`attempt to compare string with nil`), which (along with
raising the error) causes an asynchronous trace abort.
The trace abort remains pending until the next reentrance to the
LuaJIT VM via `lj_vm_call()` or `lj_vm_pcall().

On handling abort LuaJIT is searching for the topmost Lua frame
on the stack, that was active when the trace abort happened,
it is needed to generate a better error message.
Unfortunately, because the abort was due to an error, and
the error was caught by a `lua_pcall` with an unspecified
error function (4th argument), the Lua frame that caused the abort
was already removed as part of error processing, so the search
cannot find it. Furthermore, in this particular case, there are
no Lua frames on the stack, which isn't something the search code
had considered possible.

Sergey Bronnikov:
* added the description and the test for the problem

Part of tarantool/tarantool#11055

Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
Reported by Bachir Bendrissou.

(cherry picked from commit 62e362a)

When the trace is started after the stitching, it may not have some
slots from the first one. That slot may be the first slot given to the
`select()` function (if it is determined by the call that caused the
stitch). Before the patch, there is no loading of this slot in the
`rec_varg()` for the trace, and the 0 value from slots is taken instead.
Hence, the following recording of `BC_VARG` is incorrect.

This patch fixes this by using the `getslot()` instead of taking the
value directly.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#11055

Reviewed-by: Sergey Bronnikov <[email protected]>
Signed-off-by: Sergey Kaplun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants