Winch: refactor heap address computation #10076

Closed

Conversation

MarinPostma
Contributor

This PR factors heap address computation out of the CodeGen struct so that it can be passed as a callback to the masm. The reason for this refactor is that it avoids an unconditional spill on x64 for atomic operations that have specific requirements on certain registers.

This happens because the address for those instructions sits under the operands on the stack, forcing us to pop the operands into registers, compute the address, and push the operands back onto the stack. This dance makes it almost certain that the required registers are already allocated when needed, thus forcing a spill. By factoring out address computation, we can pass it as a callback to the macro assembler, which can pop the arguments in their natural order, reserving specific registers as necessary.
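
To make the callback idea concrete, here is a minimal sketch (all names and signatures below are hypothetical and simplified; the real Winch types differ): the masm reserves its ISA-specific registers first, pops its operands in their natural order, and only then invokes a caller-supplied closure to compute the heap address.

// Hypothetical, simplified sketch of the callback approach; not the actual Winch API.
#[derive(Clone, Copy, Debug)]
struct Reg(&'static str);

#[derive(Clone, Copy, Debug)]
struct Address(Reg);

trait Masm {
    // The heap address is produced lazily by `compute_addr`, so the
    // implementation can reserve ISA-specific registers and pop its
    // operands before the address is materialized.
    fn atomic_cas<F: FnOnce(&mut Self) -> Address>(&mut self, compute_addr: F);
}

struct X64 {
    reserved: Vec<Reg>, // registers currently reserved by the masm
}

impl Masm for X64 {
    fn atomic_cas<F: FnOnce(&mut Self) -> Address>(&mut self, compute_addr: F) {
        // cmpxchg requires `expected` in rax, so reserve it up front
        // instead of spilling after the operands have already been popped.
        self.reserved.push(Reg("rax"));

        // ... pop `replacement` and `expected` in their natural order here ...

        // Only now compute the effective heap address, with rax already
        // accounted for.
        let addr = compute_addr(&mut *self);
        println!("cmpxchg at {addr:?} with {:?} reserved", self.reserved);
    }
}

fn main() {
    let mut masm = X64 { reserved: Vec::new() };
    // The closure stands in for the heap address computation that CodeGen
    // would hand over in this PR.
    masm.atomic_cas(|_masm| Address(Reg("rcx")));
}

In the actual change the closure corresponds to the address computation currently done in CodeGen; the register-reservation ordering is the part that removes the spill.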

Running a few benchmarks with Sightglass (not all of the test suite is supported yet), there's no performance regression in doing so:

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 19296541.11 ± 5094859.49 (confidence = 99%)

  refacto.so is 1.01x to 1.01x faster than main.so!

  [2361419777 2391240753.01 2441310853] main.so
  [2340893254 2371944211.90 2421142219] refacto.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [371434 422720.96 560161] main.so
  [341377 417196.75 494199] refacto.so

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [905704442 928015775.03 981552142] main.so
  [905318861 932491387.89 998574294] refacto.so

-------------------------------------------------------------------------

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [103974 128183.43 181890] main.so
  [105307 131811.77 177160] refacto.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [21460612 25083573.56 51889605] main.so
  [21764794 25410240.69 53043037] refacto.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [108341768 110171123.02 119094004] main.so
  [107632268 109609358.98 114981054] refacto.so

-------------------------------------------------------------------------

execution :: cycles :: benchmarks/regex/benchmark.wasm

  Δ = 2168975.90 ± 1429157.42 (confidence = 99%)

  refacto.so is 1.00x to 1.02x faster than main.so!

  [230279362 234821937.90 252109774] main.so
  [229225690 232652962.00 243118560] refacto.so

instantiation :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [310116 341486.65 583080] main.so
  [298291 337779.62 385882] refacto.so

compilation :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [227528308 234328619.97 266022424] main.so
  [227071476 232270717.37 262020199] refacto.so

These benchmarks don't show a performance improvement, but that's probably because they don't rely on atomic operations. Looking at the generated asm in the disas tests, we can see that a good number of instructions are shaved off: they correspond to the spills.

@github-actions github-actions bot added the winch Winch issues or pull requests label Jan 22, 2025

@saulecabrera
Member

Thanks for this and thanks for providing benchmarks.

I'm personally not entirely convinced that the code generation improvements are worth the complexity of this change, for multiple reasons; to name a few that come to mind:

The provided benchmarks are probably not representative enough to warrant some of the complexity in the CodeGen module. Even though the code generation output shows that some instructions are improved, and there's no change in compilation time, I think it'd be beneficial to see a more real-world benchmark, one that actually uses the instructions for which this refactor is introduced. I'm aware that there are probably none currently, which makes me think it might be worth waiting until we have some real-world benchmarks we can use to make an informed decision: it'd be beneficial to compare concrete % improvements for runtime performance, and similar numbers for compilation performance.

I fear that instruction-focused optimizations, like in this case, might start compromising the compiler's simplicity design principle; as a rule of thumb, I think it's worth considering new architectural patterns that can be adopted throughout the compiler, but we should probably take a step back when considering changes that have the potential to improve certain instructions at the cost of higher complexity.

That said though, I really appreciate you taking the time to explore this; I decided to explore a slightly different alternative. My goal with this exploration was to achieve a similar result, but with potentially fewer changes required in the CodeGen module.

The high level idea:

  • Refactor emit_compute_heap_address to take an Index; this lifts the responsibility of popping the index out of the function and gives the caller more flexibility to decide when the heap address calculation should take place.
  • Introduce an atomic_cas_clobbers method on the MacroAssembler, which informs the caller which registers are expected to be clobbered by a particular instruction.
  • When preparing the operands for atomic_cas, exclude any clobbers via CodeGenContext::without.

Refer to the details section below for an (incomplete) diff of this idea:

diff --git a/winch/codegen/src/codegen/mod.rs b/winch/codegen/src/codegen/mod.rs
index f2eb2f976..6648bc5c9 100644
--- a/winch/codegen/src/codegen/mod.rs
+++ b/winch/codegen/src/codegen/mod.rs
@@ -647,6 +647,7 @@ where
         &mut self,
         memarg: &MemArg,
         access_size: OperandSize,
+        index: Index,
     ) -> Result<Option<Reg>> {
         let ptr_size: OperandSize = self.env.ptr_type().try_into()?;
         let enable_spectre_mitigation = self.env.heap_access_spectre_mitigation();
@@ -656,7 +657,6 @@ where
 
         let memory_index = MemoryIndex::from_u32(memarg.memory);
         let heap = self.env.resolve_heap(memory_index);
-        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         let offset = bounds::ensure_index_and_offset(
             self.masm,
             index,
@@ -680,6 +680,7 @@ where
             self.emit_fuel_increment()?;
             self.masm.trap(TrapCode::HEAP_OUT_OF_BOUNDS)?;
             self.context.reachable = false;
+            self.context.free_reg(index.as_typed_reg().reg);
             None
         } else if !can_elide_bounds_check {
             // Account for the general case for bounds-checked memories. The
@@ -840,38 +841,30 @@ where
     }
 
     /// Emit checks to ensure that the address at `memarg` is correctly aligned for `size`.
-    fn emit_check_align(&mut self, memarg: &MemArg, size: OperandSize) -> Result<()> {
-        if size.bytes() > 1 {
-            // Peek addr from top of the stack by popping and pushing.
-            let addr = *self
-                .context
-                .stack
-                .peek()
-                .ok_or_else(|| CodeGenError::missing_values_in_stack())?;
-            let tmp = self.context.any_gpr(self.masm)?;
-            self.context.move_val_to_reg(&addr, tmp, self.masm)?;
+    fn emit_check_align(&mut self, memarg: &MemArg, size: OperandSize, index: Index) -> Result<()> {
+        if size.bytes() < 2 {
+            return Ok(());
+        }
 
-            if memarg.offset != 0 {
-                self.masm.add(
-                    writable!(tmp),
-                    tmp,
-                    RegImm::Imm(Imm::I64(memarg.offset)),
-                    size,
-                )?;
-            }
+        let tmp = self.context.any_gpr(self.masm)?;
+        let index_reg = index.as_typed_reg().reg;
 
-            self.masm.and(
+        self.masm
+            .mov(writable!(tmp), RegImm::reg(index_reg), size)?;
+
+        if memarg.offset != 0 {
+            self.masm.add(
                 writable!(tmp),
                 tmp,
-                RegImm::Imm(Imm::I32(size.bytes() - 1)),
+                RegImm::Imm(Imm::I64(memarg.offset)),
                 size,
             )?;
-
-            self.masm.cmp(tmp, RegImm::Imm(Imm::i64(0)), size)?;
-            self.masm.trapif(IntCmpKind::Ne, TRAP_HEAP_MISALIGNED)?;
-            self.context.free_reg(tmp);
         }
 
+        self.masm.cmp(tmp, RegImm::Imm(Imm::i64(0)), size)?;
+        self.masm.trapif(IntCmpKind::Ne, TRAP_HEAP_MISALIGNED)?;
+        self.context.free_reg(tmp);
+
         Ok(())
     }
 
@@ -879,9 +872,10 @@ where
         &mut self,
         memarg: &MemArg,
         access_size: OperandSize,
+        index: Index,
     ) -> Result<Option<Reg>> {
-        self.emit_check_align(memarg, access_size)?;
-        self.emit_compute_heap_address(memarg, access_size)
+        self.emit_check_align(memarg, access_size, index)?;
+        self.emit_compute_heap_address(memarg, access_size, index)
     }
 
     /// Emit a WebAssembly load.
@@ -891,13 +885,16 @@ where
         target_type: WasmValType,
         kind: LoadKind,
         op_kind: MemOpKind,
+        index: Index,
     ) -> Result<()> {
         let maybe_addr = match op_kind {
-            MemOpKind::Atomic => {
-                self.emit_compute_heap_address_align_checked(&arg, kind.derive_operand_size())?
-            }
+            MemOpKind::Atomic => self.emit_compute_heap_address_align_checked(
+                &arg,
+                kind.derive_operand_size(),
+                index,
+            )?,
             MemOpKind::Normal => {
-                self.emit_compute_heap_address(&arg, kind.derive_operand_size())?
+                self.emit_compute_heap_address(&arg, kind.derive_operand_size(), index)?
             }
         };
 
@@ -926,12 +923,13 @@ where
         arg: &MemArg,
         size: OperandSize,
         op_kind: MemOpKind,
+        index: Index,
     ) -> Result<()> {
         let src = self.context.pop_to_reg(self.masm, None)?;
 
         let maybe_addr = match op_kind {
-            MemOpKind::Atomic => self.emit_compute_heap_address_align_checked(&arg, size)?,
-            MemOpKind::Normal => self.emit_compute_heap_address(&arg, size)?,
+            MemOpKind::Atomic => self.emit_compute_heap_address_align_checked(&arg, size, index)?,
+            MemOpKind::Normal => self.emit_compute_heap_address(&arg, size, index)?,
         };
 
         if let Some(addr) = maybe_addr {
@@ -1391,27 +1389,35 @@ where
         size: OperandSize,
         extend: Option<Extend<Zero>>,
     ) -> Result<()> {
-        // Emission for this instruction is a bit trickier. The address for the CAS is the 3rd from
-        // the top of the stack, and we must emit instruction to compute the actual address with
-        // `emit_compute_heap_address_align_checked`, while we still have access to self. However,
-        // some ISAs have requirements with regard to the registers used for some arguments, so we
-        // need to pass the context to the masm. To solve this issue, we pop the two first
-        // arguments from the stack, compute the address, push back the arguments, and hand over
-        // the control to masm. The implementer of `atomic_cas` can expect to find `expected` and
-        // `replacement` at the top the context's stack.
+        let clobbers = M::atomic_cas_clobbers();
 
-        // pop the args
-        let replacement = self.context.pop_to_reg(self.masm, None)?;
-        let expected = self.context.pop_to_reg(self.masm, None)?;
-
-        if let Some(addr) = self.emit_compute_heap_address_align_checked(arg, size)? {
-            // push back the args
-            self.context.stack.push(expected.into());
-            self.context.stack.push(replacement.into());
+        let (replacement, expected, index) =
+            self.context
+                .without::<Result<(TypedReg, TypedReg, TypedReg)>, _, _>(
+                    &clobbers,
+                    self.masm,
+                    |context, masm| {
+                        Ok((
+                            context.pop_to_reg(masm, None)?,
+                            context.pop_to_reg(masm, None)?,
+                            context.pop_to_reg(masm, None)?,
+                        ))
+                    },
+                )??;
 
+        if let Some(addr) =
+            self.emit_compute_heap_address_align_checked(arg, size, Index::from_typed_reg(index))?
+        {
             let src = self.masm.address_at_reg(addr, 0)?;
-            self.masm
-                .atomic_cas(&mut self.context, src, size, UNTRUSTED_FLAGS, extend)?;
+            self.masm.atomic_cas(
+                &mut self.context,
+                src,
+                replacement.into(),
+                expected.into(),
+                size,
+                UNTRUSTED_FLAGS,
+                extend,
+            )?;
 
             self.context.free_reg(addr);
         }
diff --git a/winch/codegen/src/isa/aarch64/masm.rs b/winch/codegen/src/isa/aarch64/masm.rs
index 65b0a4eef..f0276c569 100644
--- a/winch/codegen/src/isa/aarch64/masm.rs
+++ b/winch/codegen/src/isa/aarch64/masm.rs
@@ -27,6 +27,7 @@ use cranelift_codegen::{
     settings, Final, MachBufferFinalized, MachLabel,
 };
 use regalloc2::RegClass;
+use smallvec::smallvec;
 use wasmtime_environ::{PtrSize, WasmValType};
 
 /// Aarch64 MacroAssembler.
@@ -933,12 +934,18 @@ impl Masm for MacroAssembler {
         &mut self,
         _context: &mut CodeGenContext<Emission>,
         _addr: Self::Address,
+        _replacement: Reg,
+        _expected: Reg,
         _size: OperandSize,
         _flags: MemFlags,
         _extend: Option<Extend<Zero>>,
     ) -> Result<()> {
         Err(anyhow!(CodeGenError::unimplemented_masm_instruction()))
     }
+
+    fn atomic_cas_clobbers() -> smallvec::SmallVec<[Reg; 1]> {
+        smallvec![]
+    }
 }
 
 impl MacroAssembler {
diff --git a/winch/codegen/src/isa/x64/masm.rs b/winch/codegen/src/isa/x64/masm.rs
index 2250d7a25..f4f8985fe 100644
--- a/winch/codegen/src/isa/x64/masm.rs
+++ b/winch/codegen/src/isa/x64/masm.rs
@@ -5,6 +5,7 @@ use super::{
     regs::{self, rbp, rsp},
 };
 use anyhow::{anyhow, bail, Result};
+use smallvec::smallvec;
 
 use crate::masm::{
     DivKind, Extend, ExtendKind, ExtractLaneKind, FloatCmpKind, Imm as I, IntCmpKind, LoadKind,
@@ -1544,42 +1545,33 @@ impl Masm for MacroAssembler {
         &mut self,
         context: &mut CodeGenContext<Emission>,
         addr: Self::Address,
+        replacement: Reg,
+        expected: Reg,
         size: OperandSize,
         flags: MemFlags,
         extend: Option<Extend<Zero>>,
     ) -> Result<()> {
-        // `cmpxchg` expects `expected` to be in the `*a*` register.
-        // reserve rax for the expected argument.
-        let rax = context.reg(regs::rax(), self)?;
-
-        let replacement = context.pop_to_reg(self, None)?;
-
-        // mark `rax` as allocatable again.
-        context.free_reg(rax);
-        let expected = context.pop_to_reg(self, Some(regs::rax()))?;
-
-        self.asm.cmpxchg(
-            addr,
-            expected.reg,
-            replacement.reg,
-            writable!(expected.reg),
-            size,
-            flags,
-        );
+        // rax is marked as clobber for `atomic_cas`.
+        let dst = writable!(regs::rax());
+        self.asm.mov_rr(expected, dst, size); // TODO: Verify this size.
+        self.asm
+            .cmpxchg(addr, dst.to_reg(), replacement, dst, size, flags);
 
         if let Some(extend) = extend {
             // We don't need to zero-extend from 32 to 64bits.
             if !(extend.from_bits() == 32 && extend.to_bits() == 64) {
-                self.asm
-                    .movzx_rr(expected.reg.into(), writable!(expected.reg.into()), extend);
+                self.asm.movzx_rr(dst.to_reg().into(), dst, extend);
             }
         }
 
-        context.stack.push(expected.into());
-        context.free_reg(replacement);
+        context.free_reg(expected);
 
         Ok(())
     }
+
+    fn atomic_cas_clobbers() -> smallvec::SmallVec<[Reg; 1]> {
+        smallvec![regs::rax()]
+    }
 }
 
 impl MacroAssembler {
diff --git a/winch/codegen/src/masm.rs b/winch/codegen/src/masm.rs
index 7a96248ea..262ec81f4 100644
--- a/winch/codegen/src/masm.rs
+++ b/winch/codegen/src/masm.rs
@@ -10,6 +10,7 @@ use cranelift_codegen::{
     ir::{Endianness, LibCall, MemFlags, RelSourceLoc, SourceLoc, UserExternalNameRef},
     Final, MachBufferFinalized, MachLabel,
 };
+use smallvec::SmallVec;
 use std::{fmt::Debug, ops::Range};
 use wasmtime_environ::PtrSize;
 
@@ -1439,8 +1440,13 @@ pub(crate) trait MacroAssembler {
         &mut self,
         context: &mut CodeGenContext<Emission>,
         addr: Self::Address,
+        replacement: Reg,
+        expected: Reg,
         size: OperandSize,
         flags: MemFlags,
         extend: Option<Extend<Zero>>,
     ) -> Result<()>;
+
+    /// Registers that will be clobbered by the atomic cas instruction.
+    fn atomic_cas_clobbers() -> SmallVec<[Reg; 1]>;
 }
diff --git a/winch/codegen/src/visitor.rs b/winch/codegen/src/visitor.rs
index ab82afa76..b2c616d89 100644
--- a/winch/codegen/src/visitor.rs
+++ b/winch/codegen/src/visitor.rs
@@ -14,6 +14,7 @@ use crate::masm::{
     SPOffset, ShiftKind, Signed, SplatKind, SplatLoadKind, TruncKind, VectorExtendKind, Zero,
 };
 
+use crate::codegen::bounds::Index;
 use crate::reg::{writable, Reg};
 use crate::stack::{TypedReg, Val};
 use anyhow::{anyhow, bail, ensure, Result};
@@ -1992,47 +1993,57 @@ where
     }
 
     fn visit_i32_load(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I32,
             LoadKind::Operand(OperandSize::S32),
             MemOpKind::Normal,
+            index,
         )
     }
 
     fn visit_i32_load8_s(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I32,
             LoadKind::ScalarExtend(Extend::<Signed>::I32Extend8.into()),
             MemOpKind::Normal,
+            index,
         )
     }
 
     fn visit_i32_load8_u(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I32,
             LoadKind::ScalarExtend(Extend::<Zero>::I32Extend8.into()),
             MemOpKind::Normal,
+            index,
         )
     }
 
     fn visit_i32_load16_s(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I32,
             LoadKind::ScalarExtend(Extend::<Signed>::I32Extend16.into()),
             MemOpKind::Normal,
+            index,
         )
     }
 
     fn visit_i32_load16_u(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I32,
             LoadKind::ScalarExtend(Extend::<Zero>::I32Extend16.into()),
             MemOpKind::Normal,
+            index,
         )
     }
 
@@ -2049,108 +2060,132 @@ where
     }
 
     fn visit_i64_load8_s(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I64,
             LoadKind::ScalarExtend(Extend::<Signed>::I64Extend8.into()),
             MemOpKind::Normal,
+            index,
         )
     }
 
     fn visit_i64_load8_u(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I64,
             LoadKind::ScalarExtend(Extend::<Zero>::I64Extend8.into()),
             MemOpKind::Normal,
+            index,
         )
     }
 
     fn visit_i64_load16_u(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I64,
             LoadKind::ScalarExtend(Extend::<Zero>::I64Extend16.into()),
             MemOpKind::Normal,
+            index,
         )
     }
 
     fn visit_i64_load16_s(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I64,
             LoadKind::ScalarExtend(Extend::<Signed>::I64Extend16.into()),
             MemOpKind::Normal,
+            index,
         )
     }
 
     fn visit_i64_load32_u(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I64,
             LoadKind::ScalarExtend(Extend::<Zero>::I64Extend32.into()),
             MemOpKind::Normal,
+            index,
         )
     }
 
     fn visit_i64_load32_s(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I64,
             LoadKind::ScalarExtend(Extend::<Signed>::I64Extend32.into()),
             MemOpKind::Normal,
+            index,
         )
     }
 
     fn visit_i64_load(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I64,
             LoadKind::Operand(OperandSize::S64),
             MemOpKind::Normal,
+            index,
         )
     }
 
     fn visit_i64_store(&mut self, memarg: MemArg) -> Self::Output {
-        self.emit_wasm_store(&memarg, OperandSize::S64, MemOpKind::Normal)
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+        self.emit_wasm_store(&memarg, OperandSize::S64, MemOpKind::Normal, index)
     }
 
     fn visit_i64_store8(&mut self, memarg: MemArg) -> Self::Output {
-        self.emit_wasm_store(&memarg, OperandSize::S8, MemOpKind::Normal)
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+        self.emit_wasm_store(&memarg, OperandSize::S8, MemOpKind::Normal, index)
     }
 
     fn visit_i64_store16(&mut self, memarg: MemArg) -> Self::Output {
-        self.emit_wasm_store(&memarg, OperandSize::S16, MemOpKind::Normal)
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+        self.emit_wasm_store(&memarg, OperandSize::S16, MemOpKind::Normal, index)
     }
 
     fn visit_i64_store32(&mut self, memarg: MemArg) -> Self::Output {
-        self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Normal)
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+        self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Normal, index)
     }
 
     fn visit_f32_load(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::F32,
             LoadKind::Operand(OperandSize::S32),
             MemOpKind::Normal,
+            index,
         )
     }
 
     fn visit_f32_store(&mut self, memarg: MemArg) -> Self::Output {
-        self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Normal)
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+        self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Normal, index)
     }
 
     fn visit_f64_load(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::F64,
             LoadKind::Operand(OperandSize::S64),
             MemOpKind::Normal,
+            index,
         )
     }
 
     fn visit_f64_store(&mut self, memarg: MemArg) -> Self::Output {
-        self.emit_wasm_store(&memarg, OperandSize::S64, MemOpKind::Normal)
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+        self.emit_wasm_store(&memarg, OperandSize::S64, MemOpKind::Normal, index)
     }
 
     fn visit_i32_trunc_sat_f32_s(&mut self) -> Self::Output {
@@ -2256,94 +2291,115 @@ where
     }
 
     fn visit_i32_atomic_load8_u(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I32,
             LoadKind::ScalarExtend(Extend::<Zero>::I32Extend8.into()),
             MemOpKind::Atomic,
+            index,
         )
     }
 
     fn visit_i32_atomic_load16_u(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I32,
             LoadKind::ScalarExtend(Extend::<Zero>::I32Extend16.into()),
             MemOpKind::Atomic,
+            index,
         )
     }
 
     fn visit_i32_atomic_load(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I32,
             LoadKind::Operand(OperandSize::S32),
             MemOpKind::Atomic,
+            index,
         )
     }
 
     fn visit_i64_atomic_load8_u(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I64,
             LoadKind::ScalarExtend(Extend::<Zero>::I64Extend8.into()),
             MemOpKind::Atomic,
+            index,
         )
     }
 
     fn visit_i64_atomic_load16_u(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I64,
             LoadKind::ScalarExtend(Extend::<Zero>::I64Extend16.into()),
             MemOpKind::Atomic,
+            index,
         )
     }
 
     fn visit_i64_atomic_load32_u(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I64,
             LoadKind::ScalarExtend(Extend::<Zero>::I64Extend32.into()),
             MemOpKind::Atomic,
+            index,
         )
     }
 
     fn visit_i64_atomic_load(&mut self, memarg: MemArg) -> Self::Output {
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
         self.emit_wasm_load(
             &memarg,
             WasmValType::I64,
             LoadKind::Operand(OperandSize::S64),
             MemOpKind::Atomic,
+            index,
         )
     }
 
     fn visit_i32_atomic_store(&mut self, memarg: MemArg) -> Self::Output {
-        self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Atomic)
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+        self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Atomic, index)
     }
 
     fn visit_i64_atomic_store(&mut self, memarg: MemArg) -> Self::Output {
-        self.emit_wasm_store(&memarg, OperandSize::S64, MemOpKind::Atomic)
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+        self.emit_wasm_store(&memarg, OperandSize::S64, MemOpKind::Atomic, index)
     }
 
     fn visit_i32_atomic_store8(&mut self, memarg: MemArg) -> Self::Output {
-        self.emit_wasm_store(&memarg, OperandSize::S8, MemOpKind::Atomic)
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+        self.emit_wasm_store(&memarg, OperandSize::S8, MemOpKind::Atomic, index)
     }
 
     fn visit_i32_atomic_store16(&mut self, memarg: MemArg) -> Self::Output {
-        self.emit_wasm_store(&memarg, OperandSize::S16, MemOpKind::Atomic)
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+        self.emit_wasm_store(&memarg, OperandSize::S16, MemOpKind::Atomic, index)
     }
 
     fn visit_i64_atomic_store8(&mut self, memarg: MemArg) -> Self::Output {
-        self.emit_wasm_store(&memarg, OperandSize::S8, MemOpKind::Atomic)
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+        self.emit_wasm_store(&memarg, OperandSize::S8, MemOpKind::Atomic, index)
     }
 
     fn visit_i64_atomic_store16(&mut self, memarg: MemArg) -> Self::Output {
-        self.emit_wasm_store(&memarg, OperandSize::S16, MemOpKind::Atomic)
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+        self.emit_wasm_store(&memarg, OperandSize::S16, MemOpKind::Atomic, index)
     }
 
     fn visit_i64_atomic_store32(&mut self, memarg: MemArg) -> Self::Output {
-        self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Atomic)
+        let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+        self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Atomic, index)
     }
 
     fn visit_i32_atomic_rmw8_add_u(&mut self, arg: MemArg) -> Self::Output {

This approach isn't perfect either, as we are introducing a net new concept here: instruction requirements, which is something that didn't explicitly exist before. However, I think of this approach as a different version of what we have today, which is passing the CodeGenContext to the MacroAssembler whenever there are ISA-specific requirements.

@MarinPostma
Contributor Author

Your implementation is indeed more elegant than what I came up with. But generally I agree with you: this would only be beneficial for atomic operations, which by themselves are accepted to be somewhat costly. I don't think the cost of added complexity outweighs the benefits. When I started this, I hoped that it would benefit more instructions, but that was a bad intuition.

I'll close this one, but thanks for your feedback! :)
