Divergent behavior between regalloc_algorithm={backtracking,single_pass} #9980

Closed
alexcrichton opened this issue Jan 10, 2025 · 8 comments · Fixed by #10087
Labels
fuzz-bug Bugs found by a fuzzer

Comments

@alexcrichton
Member

Found via oss-fuzz in https://issues.oss-fuzz.com/issues/387110342. I've minimized this to:

(module
  (func (export "")
    call 1
    f64.const 0
    f64.const 0
    f64.ne
    if
    end
    call 0
  )

  (func
    f64.const nan
    f64.const 0
    f64.eq
    if
      loop
      end
    end
  )
)

which behaves differently depending on the register allocation algorithm:

$ wasmtime run --invoke '' -W fuel=$((1<<62)) -C cranelift-regalloc-algorithm=backtracking foo.wat
Error: failed to run main module `foo.wat`

Caused by:
    0: failed to invoke ``
    1: error while executing at wasm backtrace:
           0: <unknown>!<wasm function 1>
           1:   0x1e - <unknown>!<wasm function 0>
           2:   0x36 - <unknown>!<wasm function 0>
...
         16365:   0x36 - <unknown>!<wasm function 0>
         16366:   0x36 - <unknown>!<wasm function 0>
    2: wasm trap: call stack exhausted

That's expected: this recurses infinitely. With single_pass, though:

$ wasmtime run --invoke '' -W fuel=$((1<<62)) -C cranelift-regalloc-algorithm=single_pass foo.wat
Error: failed to run main module `foo.wat`

Caused by:
    0: failed to invoke ``
    1: error while executing at wasm backtrace:
           0:   0x1d - <unknown>!<wasm function 0>
           1:   0x36 - <unknown>!<wasm function 0>
    2: wasm trap: all fuel consumed by WebAssembly
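For anyone who wants to script this comparison, here is a minimal sketch of a differential harness. It only reuses the flags shown in the two commands above. The helper names `build_command`, `last_line` and `run_both` are illustrative, and the sketch assumes a `wasmtime` binary on `PATH` that writes its error report to stderr.

```python
import subprocess

FUEL = 1 << 62  # same value as the shell expression $((1<<62))


def build_command(algorithm, wat_path="foo.wat"):
    """Return the argv list for one run, mirroring the commands above."""
    return [
        "wasmtime", "run", "--invoke", "",
        "-W", f"fuel={FUEL}",
        "-C", f"cranelift-regalloc-algorithm={algorithm}",
        wat_path,
    ]


def last_line(report):
    """Return the final non-empty line, e.g. the 'wasm trap: ...' line."""
    lines = [ln.strip() for ln in report.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def run_both(wat_path="foo.wat"):
    """Run both allocators and return {algorithm: final stderr line}.

    Assumption: `wasmtime` is installed and reports errors on stderr.
    """
    results = {}
    for algorithm in ("backtracking", "single_pass"):
        proc = subprocess.run(build_command(algorithm, wat_path),
                              capture_output=True, text=True)
        results[algorithm] = last_line(proc.stderr)
    return results
```

If the two values returned by `run_both` differ, the module is a candidate for the kind of divergence reported here.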
@alexcrichton added the fuzz-bug label on Jan 10, 2025
@cfallin
Member

cfallin commented Jan 10, 2025

For context: Alex and I briefly discussed this offline and concluded that it is not a security issue. Backtracking (the production algorithm) behaves correctly, and single-pass is off by default and not part of any support tier (or perhaps tier 3 by default).

@cfallin
Member

cfallin commented Jan 10, 2025

Additional context: this seems to reproduce only on x86-64. I cannot reproduce it on aarch64, where single-pass also recurses infinitely, as expected. It does reproduce on macOS/x86-64 (via Rosetta 2) in addition to Linux.

@primoly
Contributor

primoly commented Jan 12, 2025

The strange thing is that it seems to be caused by the NaN. If you replace that NaN with any other value (for example -12.345), single-pass recurses infinitely as well. If you compare the disassembly of both variants, the only difference is the constant loaded into r15 (at address 0000024f). It doesn’t even change the address of any instruction. I have no idea why this would make a difference in fuel consumption. Mysterious.
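As a quick sanity check on the two constants, the sketch below decodes the immediates that the listings below load into r15 at address 0000024f. The helper name `f64_from_bits` is illustrative. It only confirms what the bit patterns mean and does not explain the divergence.

```python
import math
import struct


def f64_from_bits(bits):
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


nan_value = f64_from_bits(0x7FF8000000000000)    # immediate in the NaN variant
other_value = f64_from_bits(0xC028B0A3D70A3D71)  # immediate in the -12.345 variant

# A NaN compares unordered with every value, including 0.0.
nan_is_unordered = math.isnan(nan_value)
# -12.345 compares ordered with 0.0, and is simply not equal to it.
other_is_ordered = not math.isnan(other_value)
```

So the only semantic difference between the two variants is whether the later vucomisd sees an unordered or an ordered comparison.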

single-pass and fuel with NaN
Disassembly of function <function[0]>:

00000000    55                                push rbp
00000001    48 89 e5                          mov rbp, rsp
00000004    4c 8b 57 08                       mov r10, qword ptr [rdi + 8]
00000008    4d 8b 12                          mov r10, qword ptr [r10]
0000000b    49 81 c2 90 00 00 00              add r10, 0x90
00000012    49 39 e2                          cmp r10, rsp
00000015    0f 87 4a 01 00 00                 ja 0x165
0000001b    48 81 ec 80 00 00 00              sub rsp, 0x80
00000022    48 89 5c 24 50                    mov qword ptr [rsp + 0x50], rbx
00000027    4c 89 64 24 58                    mov qword ptr [rsp + 0x58], r12
0000002c    4c 89 6c 24 60                    mov qword ptr [rsp + 0x60], r13
00000031    4c 89 74 24 68                    mov qword ptr [rsp + 0x68], r14
00000036    4c 89 7c 24 70                    mov qword ptr [rsp + 0x70], r15
0000003b    48 89 7c 24 18                    mov qword ptr [rsp + 0x18], rdi
00000040    48 8b 44 24 18                    mov rax, qword ptr [rsp + 0x18]
00000045    48 8b 50 08                       mov rdx, qword ptr [rax + 8]
00000049    48 89 14 24                       mov qword ptr [rsp], rdx
0000004d    48 8b 0c 24                       mov rcx, qword ptr [rsp]
00000051    48 8b 71 08                       mov rsi, qword ptr [rcx + 8]
00000055    48 89 74 24 30                    mov qword ptr [rsp + 0x30], rsi
0000005a    4c 8b 44 24 30                    mov r8, qword ptr [rsp + 0x30]
0000005f    4d 8d 48 01                       lea r9, [r8 + 1]
00000063    4c 89 4c 24 40                    mov qword ptr [rsp + 0x40], r9
00000068    4c 8b 54 24 40                    mov r10, qword ptr [rsp + 0x40]
0000006d    48 8b 5c 24 40                    mov rbx, qword ptr [rsp + 0x40]
00000072    4c 85 d3                          test rbx, r10
00000075    48 8b 5c 24 40                    mov rbx, qword ptr [rsp + 0x40]
0000007a    48 89 5c 24 38                    mov qword ptr [rsp + 0x38], rbx
0000007f    0f 8d 0f 00 00 00                 jge 0x94
00000085    48 8b 5c 24 38                    mov rbx, qword ptr [rsp + 0x38]
0000008a    48 89 5c 24 20                    mov qword ptr [rsp + 0x20], rbx
0000008f    e9 33 00 00 00                    jmp 0xc7
00000094    4c 8b 5c 24 30                    mov r11, qword ptr [rsp + 0x30]
00000099    4d 8d 6b 01                       lea r13, [r11 + 1]
0000009d    4c 8b 24 24                       mov r12, qword ptr [rsp]
000000a1    4d 89 6c 24 08                    mov qword ptr [r12 + 8], r13
000000a6    48 8b 7c 24 18                    mov rdi, qword ptr [rsp + 0x18]
000000ab    e8 7e 03 00 00                    call 0x42e
000000b0    4c 8b 34 24                       mov r14, qword ptr [rsp]
000000b4    4d 8b 7e 08                       mov r15, qword ptr [r14 + 8]
000000b8    4c 89 7c 24 28                    mov qword ptr [rsp + 0x28], r15
000000bd    4c 8b 7c 24 28                    mov r15, qword ptr [rsp + 0x28]
000000c2    4c 89 7c 24 20                    mov qword ptr [rsp + 0x20], r15
000000c7    48 8b 44 24 20                    mov rax, qword ptr [rsp + 0x20]
000000cc    48 8d 50 01                       lea rdx, [rax + 1]
000000d0    48 8b 0c 24                       mov rcx, qword ptr [rsp]
000000d4    48 89 51 08                       mov qword ptr [rcx + 8], rdx
000000d8    48 8b 74 24 18                    mov rsi, qword ptr [rsp + 0x18]
000000dd    48 8b 7c 24 18                    mov rdi, qword ptr [rsp + 0x18]
000000e2    e8 99 00 00 00                    call 0x180
000000e7    4c 8b 04 24                       mov r8, qword ptr [rsp]
000000eb    4d 8b 48 08                       mov r9, qword ptr [r8 + 8]
000000ef    4c 89 4c 24 10                    mov qword ptr [rsp + 0x10], r9
000000f4    c4 41 09 57 fe                    vxorpd xmm15, xmm14, xmm14
000000f9    c5 79 2e 3d 6f 00 00 00           vucomisd xmm15, qword ptr [rip + 0x6f]
00000101    0f 8a 00 00 00 00                 jp 0x107
00000107    48 8b 7c 24 18                    mov rdi, qword ptr [rsp + 0x18]
0000010c    4c 8b 54 24 10                    mov r10, qword ptr [rsp + 0x10]
00000111    49 8d 5a 05                       lea rbx, [r10 + 5]
00000115    4c 8b 1c 24                       mov r11, qword ptr [rsp]
00000119    49 89 5b 08                       mov qword ptr [r11 + 8], rbx
0000011d    48 89 fe                          mov rsi, rdi
00000120    e8 db fe ff ff                    call 0
00000125    4c 8b 24 24                       mov r12, qword ptr [rsp]
00000129    4d 8b 6c 24 08                    mov r13, qword ptr [r12 + 8]
0000012e    4c 89 6c 24 08                    mov qword ptr [rsp + 8], r13
00000133    4c 8b 7c 24 08                    mov r15, qword ptr [rsp + 8]
00000138    4c 8b 34 24                       mov r14, qword ptr [rsp]
0000013c    4d 89 7e 08                       mov qword ptr [r14 + 8], r15
00000140    48 8b 5c 24 50                    mov rbx, qword ptr [rsp + 0x50]
00000145    4c 8b 64 24 58                    mov r12, qword ptr [rsp + 0x58]
0000014a    4c 8b 6c 24 60                    mov r13, qword ptr [rsp + 0x60]
0000014f    4c 8b 74 24 68                    mov r14, qword ptr [rsp + 0x68]
00000154    4c 8b 7c 24 70                    mov r15, qword ptr [rsp + 0x70]
00000159    48 81 c4 80 00 00 00              add rsp, 0x80
00000160    48 89 ec                          mov rsp, rbp
00000163    5d                                pop rbp
00000164    c3                                ret
00000165    0f 0b                             ud2
00000167    00 00                             add byte ptr [rax], al
00000169    00 00                             add byte ptr [rax], al
0000016b    00 00                             add byte ptr [rax], al
0000016d    00 00                             add byte ptr [rax], al
0000016f    00 00                             add byte ptr [rax], al
00000171    00 00                             add byte ptr [rax], al
00000173    00 00                             add byte ptr [rax], al
00000175    00 00                             add byte ptr [rax], al
00000177    00 00                             add byte ptr [rax], al
00000179    00 00                             add byte ptr [rax], al
0000017b    00 00                             add byte ptr [rax], al
0000017d    00 00                             add byte ptr [rax], al
0000017f    00                                .byte 0x00

Disassembly of function <function[1]>:

00000180    55                                push rbp
00000181    48 89 e5                          mov rbp, rsp
00000184    4c 8b 57 08                       mov r10, qword ptr [rdi + 8]
00000188    4d 8b 12                          mov r10, qword ptr [r10]
0000018b    49 81 c2 b0 00 00 00              add r10, 0xb0
00000192    49 39 e2                          cmp r10, rsp
00000195    0f 87 b0 01 00 00                 ja 0x34b
0000019b    48 81 ec a0 00 00 00              sub rsp, 0xa0
000001a2    48 89 5c 24 70                    mov qword ptr [rsp + 0x70], rbx
000001a7    4c 89 64 24 78                    mov qword ptr [rsp + 0x78], r12
000001ac    4c 89 ac 24 80 00 00 00           mov qword ptr [rsp + 0x80], r13
000001b4    4c 89 b4 24 88 00 00 00           mov qword ptr [rsp + 0x88], r14
000001bc    4c 89 bc 24 90 00 00 00           mov qword ptr [rsp + 0x90], r15
000001c4    48 89 7c 24 28                    mov qword ptr [rsp + 0x28], rdi
000001c9    4c 8b 7c 24 28                    mov r15, qword ptr [rsp + 0x28]
000001ce    49 8b 77 08                       mov rsi, qword ptr [r15 + 8]
000001d2    48 89 34 24                       mov qword ptr [rsp], rsi
000001d6    48 8b 04 24                       mov rax, qword ptr [rsp]
000001da    48 8b 48 08                       mov rcx, qword ptr [rax + 8]
000001de    48 89 4c 24 58                    mov qword ptr [rsp + 0x58], rcx
000001e3    48 8b 54 24 58                    mov rdx, qword ptr [rsp + 0x58]
000001e8    4c 8d 42 01                       lea r8, [rdx + 1]
000001ec    4c 89 44 24 68                    mov qword ptr [rsp + 0x68], r8
000001f1    4c 8b 4c 24 68                    mov r9, qword ptr [rsp + 0x68]
000001f6    4c 8b 54 24 68                    mov r10, qword ptr [rsp + 0x68]
000001fb    4d 85 ca                          test r10, r9
000001fe    4c 8b 54 24 68                    mov r10, qword ptr [rsp + 0x68]
00000203    4c 89 54 24 60                    mov qword ptr [rsp + 0x60], r10
00000208    0f 8d 0f 00 00 00                 jge 0x21d
0000020e    4c 8b 54 24 60                    mov r10, qword ptr [rsp + 0x60]
00000213    4c 89 54 24 20                    mov qword ptr [rsp + 0x20], r10
00000218    e9 32 00 00 00                    jmp 0x24f
0000021d    48 8b 5c 24 58                    mov rbx, qword ptr [rsp + 0x58]
00000222    4c 8d 63 01                       lea r12, [rbx + 1]
00000226    4c 8b 1c 24                       mov r11, qword ptr [rsp]
0000022a    4d 89 63 08                       mov qword ptr [r11 + 8], r12
0000022e    48 8b 7c 24 28                    mov rdi, qword ptr [rsp + 0x28]
00000233    e8 f6 01 00 00                    call 0x42e
00000238    4c 8b 2c 24                       mov r13, qword ptr [rsp]
0000023c    4d 8b 75 08                       mov r14, qword ptr [r13 + 8]
00000240    4c 89 74 24 50                    mov qword ptr [rsp + 0x50], r14
00000245    4c 8b 74 24 50                    mov r14, qword ptr [rsp + 0x50]
0000024a    4c 89 74 24 20                    mov qword ptr [rsp + 0x20], r14
0000024f    49 bf 00 00 00 00 00 00 f8 7f     movabs r15, 0x7ff8000000000000
00000259    c4 41 f9 6e ff                    vmovq xmm15, r15
0000025e    48 8b 74 24 20                    mov rsi, qword ptr [rsp + 0x20]
00000263    48 8d 46 04                       lea rax, [rsi + 4]
00000267    48 89 44 24 48                    mov qword ptr [rsp + 0x48], rax
0000026c    c5 79 2e 3d dc 00 00 00           vucomisd xmm15, qword ptr [rip + 0xdc]
00000274    0f 8a 10 00 00 00                 jp 0x28a
0000027a    48 8b 44 24 48                    mov rax, qword ptr [rsp + 0x48]
0000027f    48 89 44 24 40                    mov qword ptr [rsp + 0x40], rax
00000284    0f 84 0f 00 00 00                 je 0x299
0000028a    48 8b 44 24 40                    mov rax, qword ptr [rsp + 0x40]
0000028f    48 89 44 24 08                    mov qword ptr [rsp + 8], rax
00000294    e9 77 00 00 00                    jmp 0x310
00000299    48 8b 4c 24 20                    mov rcx, qword ptr [rsp + 0x20]
0000029e    48 8d 51 04                       lea rdx, [rcx + 4]
000002a2    48 89 54 24 38                    mov qword ptr [rsp + 0x38], rdx
000002a7    4c 8b 44 24 38                    mov r8, qword ptr [rsp + 0x38]
000002ac    4c 8b 4c 24 38                    mov r9, qword ptr [rsp + 0x38]
000002b1    4d 85 c1                          test r9, r8
000002b4    4c 8b 4c 24 38                    mov r9, qword ptr [rsp + 0x38]
000002b9    4c 89 4c 24 30                    mov qword ptr [rsp + 0x30], r9
000002be    0f 8d 0f 00 00 00                 jge 0x2d3
000002c4    4c 8b 4c 24 30                    mov r9, qword ptr [rsp + 0x30]
000002c9    4c 89 4c 24 10                    mov qword ptr [rsp + 0x10], r9
000002ce    e9 33 00 00 00                    jmp 0x306
000002d3    48 8b 7c 24 28                    mov rdi, qword ptr [rsp + 0x28]
000002d8    4c 8b 54 24 20                    mov r10, qword ptr [rsp + 0x20]
000002dd    49 8d 5a 04                       lea rbx, [r10 + 4]
000002e1    4c 8b 1c 24                       mov r11, qword ptr [rsp]
000002e5    49 89 5b 08                       mov qword ptr [r11 + 8], rbx
000002e9    e8 40 01 00 00                    call 0x42e
000002ee    4c 8b 24 24                       mov r12, qword ptr [rsp]
000002f2    4d 8b 6c 24 08                    mov r13, qword ptr [r12 + 8]
000002f7    4c 89 6c 24 18                    mov qword ptr [rsp + 0x18], r13
000002fc    4c 8b 6c 24 18                    mov r13, qword ptr [rsp + 0x18]
00000301    4c 89 6c 24 10                    mov qword ptr [rsp + 0x10], r13
00000306    4c 8b 6c 24 10                    mov r13, qword ptr [rsp + 0x10]
0000030b    4c 89 6c 24 08                    mov qword ptr [rsp + 8], r13
00000310    4c 8b 7c 24 08                    mov r15, qword ptr [rsp + 8]
00000315    4c 8b 34 24                       mov r14, qword ptr [rsp]
00000319    4d 89 7e 08                       mov qword ptr [r14 + 8], r15
0000031d    48 8b 5c 24 70                    mov rbx, qword ptr [rsp + 0x70]
00000322    4c 8b 64 24 78                    mov r12, qword ptr [rsp + 0x78]
00000327    4c 8b ac 24 80 00 00 00           mov r13, qword ptr [rsp + 0x80]
0000032f    4c 8b b4 24 88 00 00 00           mov r14, qword ptr [rsp + 0x88]
00000337    4c 8b bc 24 90 00 00 00           mov r15, qword ptr [rsp + 0x90]
0000033f    48 81 c4 a0 00 00 00              add rsp, 0xa0
00000346    48 89 ec                          mov rsp, rbp
00000349    5d                                pop rbp
0000034a    c3                                ret
0000034b    0f 0b                             ud2
0000034d    00 00                             add byte ptr [rax], al
0000034f    00 00                             add byte ptr [rax], al
00000351    00 00                             add byte ptr [rax], al
00000353    00 00                             add byte ptr [rax], al
00000355    00 00                             add byte ptr [rax], al
00000357    00 00                             add byte ptr [rax], al
00000359    00 00                             add byte ptr [rax], al
0000035b    00 00                             add byte ptr [rax], al
0000035d    00 00                             add byte ptr [rax], al
0000035f    00                                .byte 0x00
single-pass and fuel with -12.345 instead of NaN
Disassembly of function <function[0]>:

00000000    55                                push rbp
00000001    48 89 e5                          mov rbp, rsp
00000004    4c 8b 57 08                       mov r10, qword ptr [rdi + 8]
00000008    4d 8b 12                          mov r10, qword ptr [r10]
0000000b    49 81 c2 90 00 00 00              add r10, 0x90
00000012    49 39 e2                          cmp r10, rsp
00000015    0f 87 4a 01 00 00                 ja 0x165
0000001b    48 81 ec 80 00 00 00              sub rsp, 0x80
00000022    48 89 5c 24 50                    mov qword ptr [rsp + 0x50], rbx
00000027    4c 89 64 24 58                    mov qword ptr [rsp + 0x58], r12
0000002c    4c 89 6c 24 60                    mov qword ptr [rsp + 0x60], r13
00000031    4c 89 74 24 68                    mov qword ptr [rsp + 0x68], r14
00000036    4c 89 7c 24 70                    mov qword ptr [rsp + 0x70], r15
0000003b    48 89 7c 24 18                    mov qword ptr [rsp + 0x18], rdi
00000040    48 8b 44 24 18                    mov rax, qword ptr [rsp + 0x18]
00000045    48 8b 50 08                       mov rdx, qword ptr [rax + 8]
00000049    48 89 14 24                       mov qword ptr [rsp], rdx
0000004d    48 8b 0c 24                       mov rcx, qword ptr [rsp]
00000051    48 8b 71 08                       mov rsi, qword ptr [rcx + 8]
00000055    48 89 74 24 30                    mov qword ptr [rsp + 0x30], rsi
0000005a    4c 8b 44 24 30                    mov r8, qword ptr [rsp + 0x30]
0000005f    4d 8d 48 01                       lea r9, [r8 + 1]
00000063    4c 89 4c 24 40                    mov qword ptr [rsp + 0x40], r9
00000068    4c 8b 54 24 40                    mov r10, qword ptr [rsp + 0x40]
0000006d    48 8b 5c 24 40                    mov rbx, qword ptr [rsp + 0x40]
00000072    4c 85 d3                          test rbx, r10
00000075    48 8b 5c 24 40                    mov rbx, qword ptr [rsp + 0x40]
0000007a    48 89 5c 24 38                    mov qword ptr [rsp + 0x38], rbx
0000007f    0f 8d 0f 00 00 00                 jge 0x94
00000085    48 8b 5c 24 38                    mov rbx, qword ptr [rsp + 0x38]
0000008a    48 89 5c 24 20                    mov qword ptr [rsp + 0x20], rbx
0000008f    e9 33 00 00 00                    jmp 0xc7
00000094    4c 8b 5c 24 30                    mov r11, qword ptr [rsp + 0x30]
00000099    4d 8d 6b 01                       lea r13, [r11 + 1]
0000009d    4c 8b 24 24                       mov r12, qword ptr [rsp]
000000a1    4d 89 6c 24 08                    mov qword ptr [r12 + 8], r13
000000a6    48 8b 7c 24 18                    mov rdi, qword ptr [rsp + 0x18]
000000ab    e8 7e 03 00 00                    call 0x42e
000000b0    4c 8b 34 24                       mov r14, qword ptr [rsp]
000000b4    4d 8b 7e 08                       mov r15, qword ptr [r14 + 8]
000000b8    4c 89 7c 24 28                    mov qword ptr [rsp + 0x28], r15
000000bd    4c 8b 7c 24 28                    mov r15, qword ptr [rsp + 0x28]
000000c2    4c 89 7c 24 20                    mov qword ptr [rsp + 0x20], r15
000000c7    48 8b 44 24 20                    mov rax, qword ptr [rsp + 0x20]
000000cc    48 8d 50 01                       lea rdx, [rax + 1]
000000d0    48 8b 0c 24                       mov rcx, qword ptr [rsp]
000000d4    48 89 51 08                       mov qword ptr [rcx + 8], rdx
000000d8    48 8b 74 24 18                    mov rsi, qword ptr [rsp + 0x18]
000000dd    48 8b 7c 24 18                    mov rdi, qword ptr [rsp + 0x18]
000000e2    e8 99 00 00 00                    call 0x180
000000e7    4c 8b 04 24                       mov r8, qword ptr [rsp]
000000eb    4d 8b 48 08                       mov r9, qword ptr [r8 + 8]
000000ef    4c 89 4c 24 10                    mov qword ptr [rsp + 0x10], r9
000000f4    c4 41 09 57 fe                    vxorpd xmm15, xmm14, xmm14
000000f9    c5 79 2e 3d 6f 00 00 00           vucomisd xmm15, qword ptr [rip + 0x6f]
00000101    0f 8a 00 00 00 00                 jp 0x107
00000107    48 8b 7c 24 18                    mov rdi, qword ptr [rsp + 0x18]
0000010c    4c 8b 54 24 10                    mov r10, qword ptr [rsp + 0x10]
00000111    49 8d 5a 05                       lea rbx, [r10 + 5]
00000115    4c 8b 1c 24                       mov r11, qword ptr [rsp]
00000119    49 89 5b 08                       mov qword ptr [r11 + 8], rbx
0000011d    48 89 fe                          mov rsi, rdi
00000120    e8 db fe ff ff                    call 0
00000125    4c 8b 24 24                       mov r12, qword ptr [rsp]
00000129    4d 8b 6c 24 08                    mov r13, qword ptr [r12 + 8]
0000012e    4c 89 6c 24 08                    mov qword ptr [rsp + 8], r13
00000133    4c 8b 7c 24 08                    mov r15, qword ptr [rsp + 8]
00000138    4c 8b 34 24                       mov r14, qword ptr [rsp]
0000013c    4d 89 7e 08                       mov qword ptr [r14 + 8], r15
00000140    48 8b 5c 24 50                    mov rbx, qword ptr [rsp + 0x50]
00000145    4c 8b 64 24 58                    mov r12, qword ptr [rsp + 0x58]
0000014a    4c 8b 6c 24 60                    mov r13, qword ptr [rsp + 0x60]
0000014f    4c 8b 74 24 68                    mov r14, qword ptr [rsp + 0x68]
00000154    4c 8b 7c 24 70                    mov r15, qword ptr [rsp + 0x70]
00000159    48 81 c4 80 00 00 00              add rsp, 0x80
00000160    48 89 ec                          mov rsp, rbp
00000163    5d                                pop rbp
00000164    c3                                ret
00000165    0f 0b                             ud2
00000167    00 00                             add byte ptr [rax], al
00000169    00 00                             add byte ptr [rax], al
0000016b    00 00                             add byte ptr [rax], al
0000016d    00 00                             add byte ptr [rax], al
0000016f    00 00                             add byte ptr [rax], al
00000171    00 00                             add byte ptr [rax], al
00000173    00 00                             add byte ptr [rax], al
00000175    00 00                             add byte ptr [rax], al
00000177    00 00                             add byte ptr [rax], al
00000179    00 00                             add byte ptr [rax], al
0000017b    00 00                             add byte ptr [rax], al
0000017d    00 00                             add byte ptr [rax], al
0000017f    00                                .byte 0x00

Disassembly of function <function[1]>:

00000180    55                                push rbp
00000181    48 89 e5                          mov rbp, rsp
00000184    4c 8b 57 08                       mov r10, qword ptr [rdi + 8]
00000188    4d 8b 12                          mov r10, qword ptr [r10]
0000018b    49 81 c2 b0 00 00 00              add r10, 0xb0
00000192    49 39 e2                          cmp r10, rsp
00000195    0f 87 b0 01 00 00                 ja 0x34b
0000019b    48 81 ec a0 00 00 00              sub rsp, 0xa0
000001a2    48 89 5c 24 70                    mov qword ptr [rsp + 0x70], rbx
000001a7    4c 89 64 24 78                    mov qword ptr [rsp + 0x78], r12
000001ac    4c 89 ac 24 80 00 00 00           mov qword ptr [rsp + 0x80], r13
000001b4    4c 89 b4 24 88 00 00 00           mov qword ptr [rsp + 0x88], r14
000001bc    4c 89 bc 24 90 00 00 00           mov qword ptr [rsp + 0x90], r15
000001c4    48 89 7c 24 28                    mov qword ptr [rsp + 0x28], rdi
000001c9    4c 8b 7c 24 28                    mov r15, qword ptr [rsp + 0x28]
000001ce    49 8b 77 08                       mov rsi, qword ptr [r15 + 8]
000001d2    48 89 34 24                       mov qword ptr [rsp], rsi
000001d6    48 8b 04 24                       mov rax, qword ptr [rsp]
000001da    48 8b 48 08                       mov rcx, qword ptr [rax + 8]
000001de    48 89 4c 24 58                    mov qword ptr [rsp + 0x58], rcx
000001e3    48 8b 54 24 58                    mov rdx, qword ptr [rsp + 0x58]
000001e8    4c 8d 42 01                       lea r8, [rdx + 1]
000001ec    4c 89 44 24 68                    mov qword ptr [rsp + 0x68], r8
000001f1    4c 8b 4c 24 68                    mov r9, qword ptr [rsp + 0x68]
000001f6    4c 8b 54 24 68                    mov r10, qword ptr [rsp + 0x68]
000001fb    4d 85 ca                          test r10, r9
000001fe    4c 8b 54 24 68                    mov r10, qword ptr [rsp + 0x68]
00000203    4c 89 54 24 60                    mov qword ptr [rsp + 0x60], r10
00000208    0f 8d 0f 00 00 00                 jge 0x21d
0000020e    4c 8b 54 24 60                    mov r10, qword ptr [rsp + 0x60]
00000213    4c 89 54 24 20                    mov qword ptr [rsp + 0x20], r10
00000218    e9 32 00 00 00                    jmp 0x24f
0000021d    48 8b 5c 24 58                    mov rbx, qword ptr [rsp + 0x58]
00000222    4c 8d 63 01                       lea r12, [rbx + 1]
00000226    4c 8b 1c 24                       mov r11, qword ptr [rsp]
0000022a    4d 89 63 08                       mov qword ptr [r11 + 8], r12
0000022e    48 8b 7c 24 28                    mov rdi, qword ptr [rsp + 0x28]
00000233    e8 f6 01 00 00                    call 0x42e
00000238    4c 8b 2c 24                       mov r13, qword ptr [rsp]
0000023c    4d 8b 75 08                       mov r14, qword ptr [r13 + 8]
00000240    4c 89 74 24 50                    mov qword ptr [rsp + 0x50], r14
00000245    4c 8b 74 24 50                    mov r14, qword ptr [rsp + 0x50]
0000024a    4c 89 74 24 20                    mov qword ptr [rsp + 0x20], r14
0000024f    49 bf 71 3d 0a d7 a3 b0 28 c0     movabs r15, 0xc028b0a3d70a3d71
00000259    c4 41 f9 6e ff                    vmovq xmm15, r15
0000025e    48 8b 74 24 20                    mov rsi, qword ptr [rsp + 0x20]
00000263    48 8d 46 04                       lea rax, [rsi + 4]
00000267    48 89 44 24 48                    mov qword ptr [rsp + 0x48], rax
0000026c    c5 79 2e 3d dc 00 00 00           vucomisd xmm15, qword ptr [rip + 0xdc]
00000274    0f 8a 10 00 00 00                 jp 0x28a
0000027a    48 8b 44 24 48                    mov rax, qword ptr [rsp + 0x48]
0000027f    48 89 44 24 40                    mov qword ptr [rsp + 0x40], rax
00000284    0f 84 0f 00 00 00                 je 0x299
0000028a    48 8b 44 24 40                    mov rax, qword ptr [rsp + 0x40]
0000028f    48 89 44 24 08                    mov qword ptr [rsp + 8], rax
00000294    e9 77 00 00 00                    jmp 0x310
00000299    48 8b 4c 24 20                    mov rcx, qword ptr [rsp + 0x20]
0000029e    48 8d 51 04                       lea rdx, [rcx + 4]
000002a2    48 89 54 24 38                    mov qword ptr [rsp + 0x38], rdx
000002a7    4c 8b 44 24 38                    mov r8, qword ptr [rsp + 0x38]
000002ac    4c 8b 4c 24 38                    mov r9, qword ptr [rsp + 0x38]
000002b1    4d 85 c1                          test r9, r8
000002b4    4c 8b 4c 24 38                    mov r9, qword ptr [rsp + 0x38]
000002b9    4c 89 4c 24 30                    mov qword ptr [rsp + 0x30], r9
000002be    0f 8d 0f 00 00 00                 jge 0x2d3
000002c4    4c 8b 4c 24 30                    mov r9, qword ptr [rsp + 0x30]
000002c9    4c 89 4c 24 10                    mov qword ptr [rsp + 0x10], r9
000002ce    e9 33 00 00 00                    jmp 0x306
000002d3    48 8b 7c 24 28                    mov rdi, qword ptr [rsp + 0x28]
000002d8    4c 8b 54 24 20                    mov r10, qword ptr [rsp + 0x20]
000002dd    49 8d 5a 04                       lea rbx, [r10 + 4]
000002e1    4c 8b 1c 24                       mov r11, qword ptr [rsp]
000002e5    49 89 5b 08                       mov qword ptr [r11 + 8], rbx
000002e9    e8 40 01 00 00                    call 0x42e
000002ee    4c 8b 24 24                       mov r12, qword ptr [rsp]
000002f2    4d 8b 6c 24 08                    mov r13, qword ptr [r12 + 8]
000002f7    4c 89 6c 24 18                    mov qword ptr [rsp + 0x18], r13
000002fc    4c 8b 6c 24 18                    mov r13, qword ptr [rsp + 0x18]
00000301    4c 89 6c 24 10                    mov qword ptr [rsp + 0x10], r13
00000306    4c 8b 6c 24 10                    mov r13, qword ptr [rsp + 0x10]
0000030b    4c 89 6c 24 08                    mov qword ptr [rsp + 8], r13
00000310    4c 8b 7c 24 08                    mov r15, qword ptr [rsp + 8]
00000315    4c 8b 34 24                       mov r14, qword ptr [rsp]
00000319    4d 89 7e 08                       mov qword ptr [r14 + 8], r15
0000031d    48 8b 5c 24 70                    mov rbx, qword ptr [rsp + 0x70]
00000322    4c 8b 64 24 78                    mov r12, qword ptr [rsp + 0x78]
00000327    4c 8b ac 24 80 00 00 00           mov r13, qword ptr [rsp + 0x80]
0000032f    4c 8b b4 24 88 00 00 00           mov r14, qword ptr [rsp + 0x88]
00000337    4c 8b bc 24 90 00 00 00           mov r15, qword ptr [rsp + 0x90]
0000033f    48 81 c4 a0 00 00 00              add rsp, 0xa0
00000346    48 89 ec                          mov rsp, rbp
00000349    5d                                pop rbp
0000034a    c3                                ret
0000034b    0f 0b                             ud2
0000034d    00 00                             add byte ptr [rax], al
0000034f    00 00                             add byte ptr [rax], al
00000351    00 00                             add byte ptr [rax], al
00000353    00 00                             add byte ptr [rax], al
00000355    00 00                             add byte ptr [rax], al
00000357    00 00                             add byte ptr [rax], al
00000359    00 00                             add byte ptr [rax], al
0000035b    00 00                             add byte ptr [rax], al
0000035d    00 00                             add byte ptr [rax], al
0000035f    00                                .byte 0x00
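
One possible reading of the function[1] listings above: the two mov instructions at 0000027a and 0000027f, which copy the stack slot at rsp+0x48 into rsp+0x40, sit between the jp and the je. When the comparison is unordered, the jp jumps straight to 0000028a, which reads rsp+0x40 even though nothing on that path has written it. The Python sketch below models only that fragment. The `STALE` marker, the slot dictionary and the function name are illustrative assumptions, not Cranelift code, and this is an interpretation of the disassembly rather than a confirmed diagnosis.

```python
import math

STALE = "stale"  # stands in for whatever the stack slot held before


def value_copied_to_slot_8(lhs, rhs, counter_in):
    """Model addresses 0x25e..0x294 of function[1] in the single-pass listing.

    Returns the value placed in slot rsp+8 on the path where f64.eq is
    false. The code at 0x310 later stores that slot through the pointer
    kept at [rsp], which appears to be the fuel counter.
    """
    slots = {0x20: counter_in, 0x40: STALE}
    slots[0x48] = slots[0x20] + 4          # lea rax, [rsi + 4]; mov [rsp + 0x48], rax

    unordered = math.isnan(lhs) or math.isnan(rhs)  # vucomisd sets PF when unordered
    if not unordered:                      # jp 0x28a is not taken
        slots[0x40] = slots[0x48]          # the two movs at 0x27a and 0x27f
        if lhs == rhs:                     # je 0x299 is taken
            return None                    # "then" path, not modeled here
    # 0x28a: mov rax, [rsp + 0x40]; mov [rsp + 8], rax
    slots[0x08] = slots[0x40]
    return slots[0x08]
```

Under this model, `value_copied_to_slot_8(-12.345, 0.0, 100)` yields 104, while a NaN left-hand side yields the stale marker, which would be consistent with only the NaN variant reporting an unexpected fuel value.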

@alexcrichton
Member Author

Another test that came up in @Robbepop's differential fuzzing of wasmi vs wasmtime:

(module
  (type (;0;) (func (result i32)))
  (global (;0;) (mut i32) i32.const 0)
  (export "xxx" (func 0))
  (func (;0;) (type 0) (result i32)
    block (result i32) ;; label = @1
      block (result i32) ;; label = @2
        block (result i32) ;; label = @3
          block (result i32) ;; label = @4
            block (result i32) ;; label = @5
              block (result i32) ;; label = @6
                i32.const 1
                i32.const 1
                f32.convert_i32_s
                f64.const -nan:0xffffffff80000 (;=NaN;)
                f32.demote_f64
                f32.ne
                br_if 3 (;@3;)
                drop
                i32.const 1
              end
              global.get 0
              i32.xor
              global.set 0
              i32.const 1
            end
            global.get 0
            i32.xor
            global.set 0
            i32.const 0
          end
          global.get 0
          i32.xor
          global.set 0
          i32.const 0
        end
        i32.const 1
        i32.xor
        global.set 0
        i32.const 0
      end
      global.get 0
      i32.xor
      global.set 0
      i32.const 1
    end
    global.get 0
    i32.xor
  )
)

@alexcrichton
Member Author

For the test case above I used rr to record both the "good" backtracking algorithm and "bad" single_pass algorithm at runtime.

The f32.ne above compiles to jp + jne and the jp is a taken branch. The backtracking algorithm (the "good" execution) looks like this:

(rr) stepi
0x00007f02fad63022 in ?? ()
2: x/5i $pc
=> 0x7f02fad63022:      jp     0x7f02fad63048
   0x7f02fad63028:      jne    0x7f02fad63048
   0x7f02fad6302e:      mov    0x60(%rdi),%r11d
   0x7f02fad63032:      mov    %r11,%rsi
   0x7f02fad63035:      xor    $0x1,%esi
(rr) stepi
0x00007f02fad63048 in ?? ()
2: x/5i $pc
=> 0x7f02fad63048:      mov    %rax,%r8
   0x7f02fad6304b:      xor    $0x1,%r8d
   0x7f02fad6304f:      mov    %r8d,0x60(%rdi)
   0x7f02fad63053:      mov    %r8d,0x60(%rdi)
   0x7f02fad63057:      mov    %rbp,%rsp
(rr) print/x $rax
$2 = 0x1

Namely the jp branch is taken, and we're about to start the xor business of the wasm code itself originating from 0x1 in the %eax register.

The single_pass algorithm (the "bad" execution) looks like this:

0x00007fc8cc983062 in ?? ()
2: x/5i $pc
=> 0x7fc8cc983062:      jp     0x7fc8cc983078
   0x7fc8cc983068:      mov    0x30(%rsp),%rax
   0x7fc8cc98306d:      mov    %rax,0x28(%rsp)
   0x7fc8cc983072:      je     0x7fc8cc983086
   0x7fc8cc983078:      mov    0x28(%rsp),%rax
(rr) stepi
0x00007fc8cc983078 in ?? ()
2: x/5i $pc
=> 0x7fc8cc983078:      mov    0x28(%rsp),%rax
   0x7fc8cc98307d:      mov    %rax,(%rsp)
   0x7fc8cc983081:      jmp    0x7fc8cc9830cf
   0x7fc8cc983086:      mov    0x8(%rsp),%rsi
   0x7fc8cc98308b:      mov    0x60(%rsi),%edi
(rr) stepi
0x00007fc8cc98307d in ?? ()
2: x/5i $pc
=> 0x7fc8cc98307d:      mov    %rax,(%rsp)
   0x7fc8cc983081:      jmp    0x7fc8cc9830cf
   0x7fc8cc983086:      mov    0x8(%rsp),%rsi
   0x7fc8cc98308b:      mov    0x60(%rsi),%edi
   0x7fc8cc98308e:      mov    %rdi,0x20(%rsp)
(rr) print/x $eax
$4 = 0x86d64000
(rr) stepi
0x00007fc8cc983081 in ?? ()
2: x/5i $pc
=> 0x7fc8cc983081:      jmp    0x7fc8cc9830cf
   0x7fc8cc983086:      mov    0x8(%rsp),%rsi
   0x7fc8cc98308b:      mov    0x60(%rsi),%edi
   0x7fc8cc98308e:      mov    %rdi,0x20(%rsp)
   0x7fc8cc983093:      mov    0x20(%rsp),%rdx
(rr) stepi
0x00007fc8cc9830cf in ?? ()
2: x/5i $pc
=> 0x7fc8cc9830cf:      mov    (%rsp),%rbx
   0x7fc8cc9830d3:      xor    $0x1,%ebx
   0x7fc8cc9830d6:      mov    %rbx,0x10(%rsp)
   0x7fc8cc9830db:      mov    0x8(%rsp),%r12
   0x7fc8cc9830e0:      mov    0x10(%rsp),%r13
(rr)
0x00007fc8cc9830d3 in ?? ()
2: x/5i $pc
=> 0x7fc8cc9830d3:      xor    $0x1,%ebx
   0x7fc8cc9830d6:      mov    %rbx,0x10(%rsp)
   0x7fc8cc9830db:      mov    0x8(%rsp),%r12
   0x7fc8cc9830e0:      mov    0x10(%rsp),%r13
   0x7fc8cc9830e5:      mov    %r13d,0x60(%r12)
(rr) print/x $ebx
$6 = 0x86d64000

It again looks like the jp branch was (correctly) taken but the jump destination moves a stack-based variable through the %rax register (presumably this was elided with backtracking). The value being moved is not the same and when we reach the first xor instruction we're xor-ing into the wrong value.

So something about the phi nodes may be off? @cfallin does this look familiar at all? (or anything I can do to help dig in?)

@alexcrichton
Member Author

Ok I've done a bit of further digging here using the above module (which notably doesn't need fuel which cleans up IR slightly).

First the two results I get are:

$ wasmtime compile ./foo.wat -C cranelift-regalloc-algorithm=backtracking && wasmtime run --allow-precompiled --invoke xxx foo.cwasm
warning: using `--invoke` with a function that returns values is experimental and may break in the future
1
$ wasmtime compile ./foo.wat -C cranelift-regalloc-algorithm=single_pass && wasmtime run --allow-precompiled --invoke xxx foo.cwasm
warning: using `--invoke` with a function that returns values is experimental and may break in the future
0

aka 1 is correct and 0 is wrong.
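
As a sanity check on why 1 is the expected answer, the module can be translated by hand into Rust. This is only a sketch: the name `xxx_model` and the `global` parameter (standing in for wasm global 0) are made up for illustration, the specific NaN payload is not modeled, and it relies on Rust's `!=` on `f32` agreeing with `f32.ne` for NaN operands (both report "not equal").

```rust
/// Hand-translated model of the exported `xxx` function; `global` stands in
/// for wasm global 0.
fn xxx_model(global: &mut i32) -> i32 {
    // Blocks @4, @5 and @6 are never branch targets, so they flatten into
    // straight-line code; only @3 (the `br_if 3` target) needs a label.
    let block3: i32 = 'label3: {
        let lhs = 1_i32 as f32;    // i32.const 1; f32.convert_i32_s
        let rhs = f64::NAN as f32; // f64.const nan; f32.demote_f64
        if lhs != rhs {
            // f32.ne is true when either operand is NaN, so br_if 3 is taken
            // and carries the earlier `i32.const 1` out as @3's result.
            break 'label3 1;
        }
        // Not reached for this input: the fallthrough path through @6..@4.
        *global ^= 1; // result of @6 xor'd into the global
        *global ^= 1; // result of @5 xor'd into the global
        *global ^= 0; // result of @4 xor'd into the global
        0
    };
    *global = block3 ^ 1; // i32.const 1; i32.xor; global.set 0
    *global ^= 0;         // result of @2 (0) xor'd into the global
    1 ^ *global           // result of @1 (1) xor'd with the global
}
```

Starting from a zeroed global, the taken branch gives `block3 = 1`, the global becomes `1 ^ 1 = 0`, and the function returns `1 ^ 0 = 1`, which matches the backtracking result.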

Looking at the objdump of the single_pass version I see:

      62:       0f 8a 10 00 00 00       jp     78 <wasm[0]::function[0]+0x78>
      68:       48 8b 44 24 30          mov    0x30(%rsp),%rax
      6d:       48 89 44 24 28          mov    %rax,0x28(%rsp)
      72:       0f 84 0e 00 00 00       je     86 <wasm[0]::function[0]+0x86>
      78:       48 8b 44 24 28          mov    0x28(%rsp),%rax
      7d:       48 89 04 24             mov    %rax,(%rsp)
      81:       e9 49 00 00 00          jmp    cf <wasm[0]::function[0]+0xcf>
...
      cf:       48 8b 1c 24             mov    (%rsp),%rbx
      d3:       83 f3 01                xor    $0x1,%ebx

Using the rr trace from above I know that the jp is being taken, which means that the problem lies in 0x28(%rsp): that stack slot isn't initialized correctly. In this listing it is only written by the instructions at 68 and 6d, which sit between the jp and the je at 72, so they are skipped whenever the jp is taken. Those two instructions look like regalloc-inserted moves.

The VCode for this function I generated with:

$ wasmtime compile ./foo.wat -C cranelift-regalloc-algorithm=single_pass --emit-clif clif
$ RUST_LOG=trace cargo run compile -D ../clif/wasm_func_0.clif --target x86_64 --set regalloc_algorithm=single_pass
...

The precise regalloc results are slightly different because wasmtime-the-CLI is 29.0.0 while clif-util is what's in-tree, but the general shape is the same. Notably the VCode looks like this:

VCode {
  Entry block: 0
Block 0([]):
    (original IR block: block0)
    (successor: Block 1([VReg(vreg = 232, class = Int)]))
    (successor: Block 2([]))
  Inst 0: args %v192=%rdi
...
  Inst 9: jp      label1
  Inst 10: jnz     label1; j label2
Block 1([VReg(vreg = 225, class = Int)]):
    (successor: Block 6([VReg(vreg = 225, class = Int)]))
  Inst 11: jmp     label6

I think the problem here is that VCode is lying to regalloc2 in that we no longer have "extended basic blocks". Notably branch-on-float comparisons (and I think only float comparisons) are represented with two MInst jumps (9 and 10 above). This means that a non-terminating instruction is allowed to leave the block, notably Inst 9: jp label1.

In this case I think regalloc2 is inserting code before Inst 10 which is not being executed on the jp label1, hence the "reading undefined stack slot" problem.

In talking with @cfallin it looks like this problem is scoped to the single_pass algorithm because that can insert code before a basic block terminator while backtracking will never insert code before a terminator. That would explain why (a) we've never seen this with backtracking and (b) why single_pass is fuzzing well with regalloc's checker but we're seeing issues here.

So I think the long-and-short of it is that we need to refactor how float comparisons work, notably the OrCondition of float comparisons because we can't have separate MInst instructions representing two jumps because then we're lying to regalloc2 about being in pure-basic-block-form.
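
To make the failure mode concrete, here is a toy simulation in Rust. It is a sketch under stated assumptions, not Cranelift or regalloc2 code: the `Inst` variants (`OneArmedJump`, `TwoTargetJump`, `FusedOrJump`), `insert_before_last`, and `slot1_after` are all hypothetical names invented for this example. The "allocator" here simply places a stack-slot move before the last instruction of a block, and the flags model an unordered float compare (parity set).

```rust
#[derive(Clone, Copy)]
struct Flags {
    parity: bool, // set by an unordered (NaN) float compare
    zero: bool,
}

#[derive(Clone, Copy)]
enum Cond {
    Parity,
    NotZero,
}

impl Cond {
    fn holds(self, flags: Flags) -> bool {
        match self {
            Cond::Parity => flags.parity,
            Cond::NotZero => !flags.zero,
        }
    }
}

// Hypothetical toy instruction set; not Cranelift's real MInst.
#[derive(Clone, Copy)]
enum Inst {
    /// Stands in for a regalloc-inserted move between stack slots.
    MoveSlot { from: usize, to: usize },
    /// Leaves the block if `cond` holds, otherwise falls through.
    OneArmedJump { cond: Cond, target: u32 },
    /// Ordinary two-target terminator.
    TwoTargetJump { cond: Cond, taken: u32, not_taken: u32 },
    /// Single terminator that branches if either condition holds.
    FusedOrJump { cond1: Cond, cond2: Cond, taken: u32, not_taken: u32 },
}

/// Executes one block and returns the label of the successor it jumps to.
fn run_block(insts: &[Inst], flags: Flags, slots: &mut [Option<u64>]) -> u32 {
    for inst in insts {
        match *inst {
            Inst::MoveSlot { from, to } => slots[to] = slots[from],
            Inst::OneArmedJump { cond, target } => {
                if cond.holds(flags) {
                    return target;
                }
            }
            Inst::TwoTargetJump { cond, taken, not_taken } => {
                return if cond.holds(flags) { taken } else { not_taken };
            }
            Inst::FusedOrJump { cond1, cond2, taken, not_taken } => {
                let go = cond1.holds(flags) || cond2.holds(flags);
                return if go { taken } else { not_taken };
            }
        }
    }
    panic!("block has no terminator");
}

/// Mimics an allocator that assumes only the last instruction can leave the block.
fn insert_before_last(insts: &mut Vec<Inst>, extra: Inst) {
    let last = insts.len() - 1;
    insts.insert(last, extra);
}

/// Inserts a slot0 -> slot1 move, runs the block with "unordered" flags, and
/// reports what the successor block would find in slot 1.
fn slot1_after(mut block: Vec<Inst>) -> Option<u64> {
    insert_before_last(&mut block, Inst::MoveSlot { from: 0, to: 1 });
    let unordered = Flags { parity: true, zero: true };
    let mut slots: [Option<u64>; 2] = [Some(1), None];
    let successor = run_block(&block, unordered, &mut slots);
    assert_eq!(successor, 1);
    slots[1]
}
```

With the split sequence `[OneArmedJump, TwoTargetJump]`, the inserted move lands between the two jumps and is skipped when the first jump is taken, so slot 1 stays `None`. With a single `FusedOrJump` terminator the move always runs and slot 1 holds `Some(1)`, even though both blocks jump to the same successor.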

@cfallin
Member

cfallin commented Jan 21, 2025

"while backtracking will never insert code before a terminator"

Slightly more precisely (but we're still safe): it will never insert code before a terminator with multiple targets; this is such a case, because the branch sequence is "one-target branch ; two-target branch" (jmp_if + jmp_cond).

cfallin added a commit to cfallin/wasmtime that referenced this issue Jan 23, 2025
In bytecodealliance#9980, we saw that code compiled with the single-pass register
allocator has incorrect behavior. We eventually narrowed this down to
the fact that the single-pass allocator is inserting code meant to be
at the end of a block, just before its terminator, *between* two
branches that form the terminator sequence. The allocator is correct;
the bug is with Cranelift's x64 backend.

When we produce instructions into a VCode container, we maintain basic
blocks, and we have the invariant (usual for basic block-based IR)
that only the last -- terminator -- instruction is a branch that can
leave the block. Even the conditional branches maintain this
invariant: though VCode is meant to be "almost machine code", we
emit *two-target conditionals* that are semantically like "jcond;
jmp". We then are able to optimize this inline during binary emission
in the `MachBuffer`: the buffer knows about unconditional and
conditional branches and will "chomp" branches off the tail of the
buffer whenever they target the fallthrough block. (We designed the
system this way because it is simpler to think about BBs that are
order-invariant, i.e., not bake the "fallthrough" concept into the
IR.) Thus we have a simpler abstraction but produce optimal terminator
sequences.

Unfortunately, when adding a branch-on-floating-point-compare
lowering, we had the need to branch to a target if either of *two*
conditions were true, and rather than add a new kind of terminator
instruction, we added a "one-armed branch": conditionally branch to
label or fall through. We emitted this in sequence right before the
actual terminator, so semantically it was almost equivalent.

I write "almost" because the register allocator *is* allowed to insert
spills/reloads/moves between any two instructions. Here the distinct
pieces of the terminator sequence matter: the allocator might insert
something just before the last instruction, assuming the basic-block
"single in, single out" invariant means this will always run with the
block. With one-armed branches this is no longer true.

The backtracking allocator (our original RA2 algorithm, and still the
default today) will never insert code at the end of a block when it
has multiple terminators, because it associates such block-start/end
insertions with *edges*; so in such conditions it inserts instructions
into the tops of successor blocks instead. But the single-pass
allocator needs to perform work at the end of every block, so it will
trigger this bug.

This PR removes `JmpIf` and converts the br-of-fcmp lowering to use
`JmpCondOr` instead, which is a pseudoinstruction that does `jcc1;
jcc2; jmp`. This maintains the BB invariant and fixes the bug.

Note that Winch still uses `JmpIf`, so we cannot remove it entirely:
this PR renames it to `WinchJmpIf` instead, and adds a mechanism to
assert failure if it is ever added to `VCode` (rather than emitted
directly, as Winch's macro-assembler does). We could instead write
Winch's `jmp_if` assembler function in terms of `JmpCond` with a
fallthrough label that is immediately bound, and let the MachBuffer
always chomp the jmp; I opted not to regress Winch compiler
performance by doing this. If one day we abstract out the assembler
further, we can remove `WinchJmpIf`.

This is one of two instances of a "one-armed branch"; the other is
s390x's `OneWayCondBr`, used in `br_table` lowerings, which we will
address separately. Once we do, that will address bytecodealliance#9980 entirely.
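
The "two-target conditional plus chomping" idea from the commit message above can be sketched in a few lines of Rust. This is a greatly simplified toy and not `MachBuffer`'s actual algorithm (for instance, it never inverts a condition); `Branch` and `emit_two_target` are hypothetical names used only here.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Branch {
    Cond { target: u32 },
    Uncond { target: u32 },
}

impl Branch {
    fn target(self) -> u32 {
        match self {
            Branch::Cond { target } | Branch::Uncond { target } => target,
        }
    }
}

/// Emits a two-target terminator as "jcond taken; jmp not_taken", then removes
/// trailing branches whose target is the block laid out immediately after.
fn emit_two_target(taken: u32, not_taken: u32, fallthrough: u32) -> Vec<Branch> {
    let mut tail = vec![
        Branch::Cond { target: taken },
        Branch::Uncond { target: not_taken },
    ];
    // A branch to the very next block is a no-op, so it can be dropped. The
    // loop lets the removal cascade from the jmp to the jcond.
    while let Some(last) = tail.last().copied() {
        if last.target() != fallthrough {
            break;
        }
        tail.pop();
    }
    tail
}
```

For example, `emit_two_target(1, 2, 2)` keeps only the conditional branch to block 1, while `emit_two_target(1, 2, 3)` keeps both branches because neither targets the fallthrough block. The IR itself stays order-invariant; layout only matters at emission time.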
github-merge-queue bot pushed a commit that referenced this issue Jan 23, 2025
* Cranelift/x64 backend: do not use one-way branches.

In #9980, we saw that code compiled with the single-pass register
allocator has incorrect behavior. We eventually narrowed this down to
the fact that the single-pass allocator is inserting code meant to be
at the end of a block, just before its terminator, *between* two
branches that form the terminator sequence. The allocator is correct;
the bug is with Cranelift's x64 backend.

When we produce instructions into a VCode container, we maintain basic
blocks, and we have the invariant (usual for basic block-based IR)
that only the last -- terminator -- instruction is a branch that can
leave the block. Even the conditional branches maintain this
invariant: though VCode is meant to be "almost machine code", we
emit *two-target conditionals* that are semantically like "jcond;
jmp". We then are able to optimize this inline during binary emission
in the `MachBuffer`: the buffer knows about unconditional and
conditional branches and will "chomp" branches off the tail of the
buffer whenever they target the fallthrough block. (We designed the
system this way because it is simpler to think about BBs that are
order-invariant, i.e., not bake the "fallthrough" concept into the
IR.) Thus we have a simpler abstraction but produce optimal terminator
sequences.

Unfortunately, when adding a branch-on-floating-point-compare
lowering, we had the need to branch to a target if either of *two*
conditions were true, and rather than add a new kind of terminator
instruction, we added a "one-armed branch": conditionally branch to
label or fall through. We emitted this in sequence right before the
actual terminator, so semantically it was almost equivalent.

I write "almost" because the register allocator *is* allowed to insert
spills/reloads/moves between any two instructions. Here the distinct
pieces of the terminator sequence matter: the allocator might insert
something just before the last instruction, assuming the basic-block
"single in, single out" invariant means this will always run with the
block. With one-armed branches this is no longer true.

The backtracking allocator (our original RA2 algorithm, and still the
default today) will never insert code at the end of a block when it
has multiple terminators, because it associates such block-start/end
insertions with *edges*; so in such conditions it inserts instructions
into the tops of successor blocks instead. But the single-pass
allocator needs to perform work at the end of every block, so it will
trigger this bug.

This PR removes `JmpIf` and converts the br-of-fcmp lowering to use
`JmpCondOr` instead, which is a pseudoinstruction that does `jcc1;
jcc2; jmp`. This maintains the BB invariant and fixes the bug.

Note that Winch still uses `JmpIf`, so we cannot remove it entirely:
this PR renames it to `WinchJmpIf` instead, and adds a mechanism to
assert failure if it is ever added to `VCode` (rather than emitted
directly, as Winch's macro-assembler does). We could instead write
Winch's `jmp_if` assembler function in terms of `JmpCond` with a
fallthrough label that is immediately bound, and let the MachBuffer
always chomp the jmp; I opted not to regress Winch compiler
performance by doing this. If one day we abstract out the assembler
further, we can remove `WinchJmpIf`.

This is one of two instances of a "one-armed branch"; the other is
s390x's `OneWayCondBr`, used in `br_table` lowerings, which we will
address separately. Once we do, that will address #9980 entirely.

* Add test for cascading branch-chomping behavior.

* keep the paperclip happy
cfallin added a commit to cfallin/wasmtime that referenced this issue Jan 23, 2025
This is a followup to bytecodealliance#10086, this time removing the one-armed branch
variant for s390x. This branch was only used as the default-target
branch in the `br_table` lowering. This PR incorporates the branch into
the `JTSequence` pseudo-instruction. Some care is needed to keep the
`ProducesBool` abstraction; it is unwrapped into its `ProducesFlags` and
the `JTSequence` becomes a `ConsumesFlags`, so the compare for the
jump-table bound (for default target) is not part of the pseudoinst.
(This is OK because regalloc-inserted moves never alter flags, by
explicit contract; the same reason allows cmp/branch terminators.)

Along the way I noticed that the comments on `JTSequence` claimed that
`targets` included the default, but this is (no longer?) the case, as
the targets are unwrapped by `jump_table_targets` which peels off the
first (default) separately. Aside from comments, this only affected
pretty-printing; codegen was correct.

With this, we have no more one-armed branches; hence, this fixes bytecodealliance#9980.
@cfallin
Member

cfallin commented Jan 23, 2025

I've confirmed that post-#10086 the above test now shows the same behavior (i.e., "wasm trap: call stack exhausted") under the single-pass allocator as under backtracking. Strictly speaking the underlying bug is still present on s390x until #10087 also merges (so I guess I'll leave this issue open for now) but hopefully we'll see the fuzzbug close as fixed on the next build.

github-merge-queue bot pushed a commit that referenced this issue Jan 23, 2025
* Cranelift/s390x: do not use one-way conditional branches.

This is a followup to #10086, this time removing the one-armed branch
variant for s390x. This branch was only used as the default-target
branch in the `br_table` lowering. This PR incorporates the branch into
the `JTSequence` pseudo-instruction. Some care is needed to keep the
`ProducesBool` abstraction; it is unwrapped into its `ProducesFlags` and
the `JTSequence` becomes a `ConsumesFlags`, so the compare for the
jump-table bound (for default target) is not part of the pseudoinst.
(This is OK because regalloc-inserted moves never alter flags, by
explicit contract; the same reason allows cmp/branch terminators.)

Along the way I noticed that the comments on `JTSequence` claimed that
`targets` included the default, but this is (no longer?) the case, as
the targets are unwrapped by `jump_table_targets` which peels off the
first (default) separately. Aside from comments, this only affected
pretty-printing; codegen was correct.

With this, we have no more one-armed branches; hence, this fixes #9980.

* Review feedback.