Winch: refactor heap address computation #10076
Conversation
Thanks for this, and thanks for providing benchmarks. I'm personally not entirely convinced that the improvements in terms of code generation are worth the complexity of this change, for multiple reasons; to name a few that come to mind:

- The provided benchmarks are probably not representative enough to warrant some of the added complexity.
- I fear that instruction-focused optimizations, like in this case, might start compromising the compiler's simplicity design principle. As a rule of thumb, I think it's worth considering new architectural patterns that can be adopted throughout the compiler, but we should probably take a step back when considering changes that have the potential to improve certain instructions at the cost of higher complexity.

That said, I really appreciate you taking the time to explore this; I decided to explore a slightly different alternative. My goal with this exploration was to achieve a similar result, but with potentially fewer changes required. The high-level idea:

- Pop the index register in each memory-access visitor (e.g. `visit_i32_load`) and pass it explicitly to `emit_wasm_load` / `emit_wasm_store`, and from there to `emit_compute_heap_address`, instead of popping it inside the address-computation helper.
- Have each `Masm` implementation declare the registers clobbered by `atomic_cas` (`atomic_cas_clobbers`), so that the generic code can reserve them up front and pop `replacement`, `expected`, and the index in their natural order, avoiding the current pop/compute/push-back dance.

Refer to the details section below for an (incomplete) diff of this idea:

diff --git a/winch/codegen/src/codegen/mod.rs b/winch/codegen/src/codegen/mod.rs
index f2eb2f976..6648bc5c9 100644
--- a/winch/codegen/src/codegen/mod.rs
+++ b/winch/codegen/src/codegen/mod.rs
@@ -647,6 +647,7 @@ where
&mut self,
memarg: &MemArg,
access_size: OperandSize,
+ index: Index,
) -> Result<Option<Reg>> {
let ptr_size: OperandSize = self.env.ptr_type().try_into()?;
let enable_spectre_mitigation = self.env.heap_access_spectre_mitigation();
@@ -656,7 +657,6 @@ where
let memory_index = MemoryIndex::from_u32(memarg.memory);
let heap = self.env.resolve_heap(memory_index);
- let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
let offset = bounds::ensure_index_and_offset(
self.masm,
index,
@@ -680,6 +680,7 @@ where
self.emit_fuel_increment()?;
self.masm.trap(TrapCode::HEAP_OUT_OF_BOUNDS)?;
self.context.reachable = false;
+ self.context.free_reg(index.as_typed_reg().reg);
None
} else if !can_elide_bounds_check {
// Account for the general case for bounds-checked memories. The
@@ -840,38 +841,30 @@ where
}
/// Emit checks to ensure that the address at `memarg` is correctly aligned for `size`.
- fn emit_check_align(&mut self, memarg: &MemArg, size: OperandSize) -> Result<()> {
- if size.bytes() > 1 {
- // Peek addr from top of the stack by popping and pushing.
- let addr = *self
- .context
- .stack
- .peek()
- .ok_or_else(|| CodeGenError::missing_values_in_stack())?;
- let tmp = self.context.any_gpr(self.masm)?;
- self.context.move_val_to_reg(&addr, tmp, self.masm)?;
+ fn emit_check_align(&mut self, memarg: &MemArg, size: OperandSize, index: Index) -> Result<()> {
+ if size.bytes() < 2 {
+ return Ok(());
+ }
- if memarg.offset != 0 {
- self.masm.add(
- writable!(tmp),
- tmp,
- RegImm::Imm(Imm::I64(memarg.offset)),
- size,
- )?;
- }
+ let tmp = self.context.any_gpr(self.masm)?;
+ let index_reg = index.as_typed_reg().reg;
- self.masm.and(
+ self.masm
+ .mov(writable!(tmp), RegImm::reg(index_reg), size)?;
+
+ if memarg.offset != 0 {
+ self.masm.add(
writable!(tmp),
tmp,
- RegImm::Imm(Imm::I32(size.bytes() - 1)),
+ RegImm::Imm(Imm::I64(memarg.offset)),
size,
)?;
-
- self.masm.cmp(tmp, RegImm::Imm(Imm::i64(0)), size)?;
- self.masm.trapif(IntCmpKind::Ne, TRAP_HEAP_MISALIGNED)?;
- self.context.free_reg(tmp);
}
+ self.masm.cmp(tmp, RegImm::Imm(Imm::i64(0)), size)?;
+ self.masm.trapif(IntCmpKind::Ne, TRAP_HEAP_MISALIGNED)?;
+ self.context.free_reg(tmp);
+
Ok(())
}
@@ -879,9 +872,10 @@ where
&mut self,
memarg: &MemArg,
access_size: OperandSize,
+ index: Index,
) -> Result<Option<Reg>> {
- self.emit_check_align(memarg, access_size)?;
- self.emit_compute_heap_address(memarg, access_size)
+ self.emit_check_align(memarg, access_size, index)?;
+ self.emit_compute_heap_address(memarg, access_size, index)
}
/// Emit a WebAssembly load.
@@ -891,13 +885,16 @@ where
target_type: WasmValType,
kind: LoadKind,
op_kind: MemOpKind,
+ index: Index,
) -> Result<()> {
let maybe_addr = match op_kind {
- MemOpKind::Atomic => {
- self.emit_compute_heap_address_align_checked(&arg, kind.derive_operand_size())?
- }
+ MemOpKind::Atomic => self.emit_compute_heap_address_align_checked(
+ &arg,
+ kind.derive_operand_size(),
+ index,
+ )?,
MemOpKind::Normal => {
- self.emit_compute_heap_address(&arg, kind.derive_operand_size())?
+ self.emit_compute_heap_address(&arg, kind.derive_operand_size(), index)?
}
};
@@ -926,12 +923,13 @@ where
arg: &MemArg,
size: OperandSize,
op_kind: MemOpKind,
+ index: Index,
) -> Result<()> {
let src = self.context.pop_to_reg(self.masm, None)?;
let maybe_addr = match op_kind {
- MemOpKind::Atomic => self.emit_compute_heap_address_align_checked(&arg, size)?,
- MemOpKind::Normal => self.emit_compute_heap_address(&arg, size)?,
+ MemOpKind::Atomic => self.emit_compute_heap_address_align_checked(&arg, size, index)?,
+ MemOpKind::Normal => self.emit_compute_heap_address(&arg, size, index)?,
};
if let Some(addr) = maybe_addr {
@@ -1391,27 +1389,35 @@ where
size: OperandSize,
extend: Option<Extend<Zero>>,
) -> Result<()> {
- // Emission for this instruction is a bit trickier. The address for the CAS is the 3rd from
- // the top of the stack, and we must emit instruction to compute the actual address with
- // `emit_compute_heap_address_align_checked`, while we still have access to self. However,
- // some ISAs have requirements with regard to the registers used for some arguments, so we
- // need to pass the context to the masm. To solve this issue, we pop the two first
- // arguments from the stack, compute the address, push back the arguments, and hand over
- // the control to masm. The implementer of `atomic_cas` can expect to find `expected` and
- // `replacement` at the top the context's stack.
+ let clobbers = M::atomic_cas_clobbers();
- // pop the args
- let replacement = self.context.pop_to_reg(self.masm, None)?;
- let expected = self.context.pop_to_reg(self.masm, None)?;
-
- if let Some(addr) = self.emit_compute_heap_address_align_checked(arg, size)? {
- // push back the args
- self.context.stack.push(expected.into());
- self.context.stack.push(replacement.into());
+ let (replacement, expected, index) =
+ self.context
+ .without::<Result<(TypedReg, TypedReg, TypedReg)>, _, _>(
+ &clobbers,
+ self.masm,
+ |context, masm| {
+ Ok((
+ context.pop_to_reg(masm, None)?,
+ context.pop_to_reg(masm, None)?,
+ context.pop_to_reg(masm, None)?,
+ ))
+ },
+ )??;
+ if let Some(addr) =
+ self.emit_compute_heap_address_align_checked(arg, size, Index::from_typed_reg(index))?
+ {
let src = self.masm.address_at_reg(addr, 0)?;
- self.masm
- .atomic_cas(&mut self.context, src, size, UNTRUSTED_FLAGS, extend)?;
+ self.masm.atomic_cas(
+ &mut self.context,
+ src,
+ replacement.into(),
+ expected.into(),
+ size,
+ UNTRUSTED_FLAGS,
+ extend,
+ )?;
self.context.free_reg(addr);
}
diff --git a/winch/codegen/src/isa/aarch64/masm.rs b/winch/codegen/src/isa/aarch64/masm.rs
index 65b0a4eef..f0276c569 100644
--- a/winch/codegen/src/isa/aarch64/masm.rs
+++ b/winch/codegen/src/isa/aarch64/masm.rs
@@ -27,6 +27,7 @@ use cranelift_codegen::{
settings, Final, MachBufferFinalized, MachLabel,
};
use regalloc2::RegClass;
+use smallvec::smallvec;
use wasmtime_environ::{PtrSize, WasmValType};
/// Aarch64 MacroAssembler.
@@ -933,12 +934,18 @@ impl Masm for MacroAssembler {
&mut self,
_context: &mut CodeGenContext<Emission>,
_addr: Self::Address,
+ _replacement: Reg,
+ _expected: Reg,
_size: OperandSize,
_flags: MemFlags,
_extend: Option<Extend<Zero>>,
) -> Result<()> {
Err(anyhow!(CodeGenError::unimplemented_masm_instruction()))
}
+
+ fn atomic_cas_clobbers() -> smallvec::SmallVec<[Reg; 1]> {
+ smallvec![]
+ }
}
impl MacroAssembler {
diff --git a/winch/codegen/src/isa/x64/masm.rs b/winch/codegen/src/isa/x64/masm.rs
index 2250d7a25..f4f8985fe 100644
--- a/winch/codegen/src/isa/x64/masm.rs
+++ b/winch/codegen/src/isa/x64/masm.rs
@@ -5,6 +5,7 @@ use super::{
regs::{self, rbp, rsp},
};
use anyhow::{anyhow, bail, Result};
+use smallvec::smallvec;
use crate::masm::{
DivKind, Extend, ExtendKind, ExtractLaneKind, FloatCmpKind, Imm as I, IntCmpKind, LoadKind,
@@ -1544,42 +1545,33 @@ impl Masm for MacroAssembler {
&mut self,
context: &mut CodeGenContext<Emission>,
addr: Self::Address,
+ replacement: Reg,
+ expected: Reg,
size: OperandSize,
flags: MemFlags,
extend: Option<Extend<Zero>>,
) -> Result<()> {
- // `cmpxchg` expects `expected` to be in the `*a*` register.
- // reserve rax for the expected argument.
- let rax = context.reg(regs::rax(), self)?;
-
- let replacement = context.pop_to_reg(self, None)?;
-
- // mark `rax` as allocatable again.
- context.free_reg(rax);
- let expected = context.pop_to_reg(self, Some(regs::rax()))?;
-
- self.asm.cmpxchg(
- addr,
- expected.reg,
- replacement.reg,
- writable!(expected.reg),
- size,
- flags,
- );
+ // rax is marked as clobber for `atomic_cas`.
+ let dst = writable!(regs::rax());
+ self.asm.mov_rr(expected, dst, size); // TODO: Verify this size.
+ self.asm
+ .cmpxchg(addr, dst.to_reg(), replacement, dst, size, flags);
if let Some(extend) = extend {
// We don't need to zero-extend from 32 to 64bits.
if !(extend.from_bits() == 32 && extend.to_bits() == 64) {
- self.asm
- .movzx_rr(expected.reg.into(), writable!(expected.reg.into()), extend);
+ self.asm.movzx_rr(dst.to_reg().into(), dst, extend);
}
}
- context.stack.push(expected.into());
- context.free_reg(replacement);
+ context.free_reg(expected);
Ok(())
}
+
+ fn atomic_cas_clobbers() -> smallvec::SmallVec<[Reg; 1]> {
+ smallvec![regs::rax()]
+ }
}
impl MacroAssembler {
diff --git a/winch/codegen/src/masm.rs b/winch/codegen/src/masm.rs
index 7a96248ea..262ec81f4 100644
--- a/winch/codegen/src/masm.rs
+++ b/winch/codegen/src/masm.rs
@@ -10,6 +10,7 @@ use cranelift_codegen::{
ir::{Endianness, LibCall, MemFlags, RelSourceLoc, SourceLoc, UserExternalNameRef},
Final, MachBufferFinalized, MachLabel,
};
+use smallvec::SmallVec;
use std::{fmt::Debug, ops::Range};
use wasmtime_environ::PtrSize;
@@ -1439,8 +1440,13 @@ pub(crate) trait MacroAssembler {
&mut self,
context: &mut CodeGenContext<Emission>,
addr: Self::Address,
+ replacement: Reg,
+ expected: Reg,
size: OperandSize,
flags: MemFlags,
extend: Option<Extend<Zero>>,
) -> Result<()>;
+
+ /// Registers that will be clobbered by the atomic cas instruction.
+ fn atomic_cas_clobbers() -> SmallVec<[Reg; 1]>;
}
diff --git a/winch/codegen/src/visitor.rs b/winch/codegen/src/visitor.rs
index ab82afa76..b2c616d89 100644
--- a/winch/codegen/src/visitor.rs
+++ b/winch/codegen/src/visitor.rs
@@ -14,6 +14,7 @@ use crate::masm::{
SPOffset, ShiftKind, Signed, SplatKind, SplatLoadKind, TruncKind, VectorExtendKind, Zero,
};
+use crate::codegen::bounds::Index;
use crate::reg::{writable, Reg};
use crate::stack::{TypedReg, Val};
use anyhow::{anyhow, bail, ensure, Result};
@@ -1992,47 +1993,57 @@ where
}
fn visit_i32_load(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I32,
LoadKind::Operand(OperandSize::S32),
MemOpKind::Normal,
+ index,
)
}
fn visit_i32_load8_s(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I32,
LoadKind::ScalarExtend(Extend::<Signed>::I32Extend8.into()),
MemOpKind::Normal,
+ index,
)
}
fn visit_i32_load8_u(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I32,
LoadKind::ScalarExtend(Extend::<Zero>::I32Extend8.into()),
MemOpKind::Normal,
+ index,
)
}
fn visit_i32_load16_s(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I32,
LoadKind::ScalarExtend(Extend::<Signed>::I32Extend16.into()),
MemOpKind::Normal,
+ index,
)
}
fn visit_i32_load16_u(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I32,
LoadKind::ScalarExtend(Extend::<Zero>::I32Extend16.into()),
MemOpKind::Normal,
+ index,
)
}
@@ -2049,108 +2060,132 @@ where
}
fn visit_i64_load8_s(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I64,
LoadKind::ScalarExtend(Extend::<Signed>::I64Extend8.into()),
MemOpKind::Normal,
+ index,
)
}
fn visit_i64_load8_u(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I64,
LoadKind::ScalarExtend(Extend::<Zero>::I64Extend8.into()),
MemOpKind::Normal,
+ index,
)
}
fn visit_i64_load16_u(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I64,
LoadKind::ScalarExtend(Extend::<Zero>::I64Extend16.into()),
MemOpKind::Normal,
+ index,
)
}
fn visit_i64_load16_s(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I64,
LoadKind::ScalarExtend(Extend::<Signed>::I64Extend16.into()),
MemOpKind::Normal,
+ index,
)
}
fn visit_i64_load32_u(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I64,
LoadKind::ScalarExtend(Extend::<Zero>::I64Extend32.into()),
MemOpKind::Normal,
+ index,
)
}
fn visit_i64_load32_s(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I64,
LoadKind::ScalarExtend(Extend::<Signed>::I64Extend32.into()),
MemOpKind::Normal,
+ index,
)
}
fn visit_i64_load(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I64,
LoadKind::Operand(OperandSize::S64),
MemOpKind::Normal,
+ index,
)
}
fn visit_i64_store(&mut self, memarg: MemArg) -> Self::Output {
- self.emit_wasm_store(&memarg, OperandSize::S64, MemOpKind::Normal)
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+ self.emit_wasm_store(&memarg, OperandSize::S64, MemOpKind::Normal, index)
}
fn visit_i64_store8(&mut self, memarg: MemArg) -> Self::Output {
- self.emit_wasm_store(&memarg, OperandSize::S8, MemOpKind::Normal)
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+ self.emit_wasm_store(&memarg, OperandSize::S8, MemOpKind::Normal, index)
}
fn visit_i64_store16(&mut self, memarg: MemArg) -> Self::Output {
- self.emit_wasm_store(&memarg, OperandSize::S16, MemOpKind::Normal)
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+ self.emit_wasm_store(&memarg, OperandSize::S16, MemOpKind::Normal, index)
}
fn visit_i64_store32(&mut self, memarg: MemArg) -> Self::Output {
- self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Normal)
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+ self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Normal, index)
}
fn visit_f32_load(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::F32,
LoadKind::Operand(OperandSize::S32),
MemOpKind::Normal,
+ index,
)
}
fn visit_f32_store(&mut self, memarg: MemArg) -> Self::Output {
- self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Normal)
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+ self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Normal, index)
}
fn visit_f64_load(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::F64,
LoadKind::Operand(OperandSize::S64),
MemOpKind::Normal,
+ index,
)
}
fn visit_f64_store(&mut self, memarg: MemArg) -> Self::Output {
- self.emit_wasm_store(&memarg, OperandSize::S64, MemOpKind::Normal)
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+ self.emit_wasm_store(&memarg, OperandSize::S64, MemOpKind::Normal, index)
}
fn visit_i32_trunc_sat_f32_s(&mut self) -> Self::Output {
@@ -2256,94 +2291,115 @@ where
}
fn visit_i32_atomic_load8_u(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I32,
LoadKind::ScalarExtend(Extend::<Zero>::I32Extend8.into()),
MemOpKind::Atomic,
+ index,
)
}
fn visit_i32_atomic_load16_u(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I32,
LoadKind::ScalarExtend(Extend::<Zero>::I32Extend16.into()),
MemOpKind::Atomic,
+ index,
)
}
fn visit_i32_atomic_load(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I32,
LoadKind::Operand(OperandSize::S32),
MemOpKind::Atomic,
+ index,
)
}
fn visit_i64_atomic_load8_u(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I64,
LoadKind::ScalarExtend(Extend::<Zero>::I64Extend8.into()),
MemOpKind::Atomic,
+ index,
)
}
fn visit_i64_atomic_load16_u(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I64,
LoadKind::ScalarExtend(Extend::<Zero>::I64Extend16.into()),
MemOpKind::Atomic,
+ index,
)
}
fn visit_i64_atomic_load32_u(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I64,
LoadKind::ScalarExtend(Extend::<Zero>::I64Extend32.into()),
MemOpKind::Atomic,
+ index,
)
}
fn visit_i64_atomic_load(&mut self, memarg: MemArg) -> Self::Output {
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
self.emit_wasm_load(
&memarg,
WasmValType::I64,
LoadKind::Operand(OperandSize::S64),
MemOpKind::Atomic,
+ index,
)
}
fn visit_i32_atomic_store(&mut self, memarg: MemArg) -> Self::Output {
- self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Atomic)
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+ self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Atomic, index)
}
fn visit_i64_atomic_store(&mut self, memarg: MemArg) -> Self::Output {
- self.emit_wasm_store(&memarg, OperandSize::S64, MemOpKind::Atomic)
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+ self.emit_wasm_store(&memarg, OperandSize::S64, MemOpKind::Atomic, index)
}
fn visit_i32_atomic_store8(&mut self, memarg: MemArg) -> Self::Output {
- self.emit_wasm_store(&memarg, OperandSize::S8, MemOpKind::Atomic)
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+ self.emit_wasm_store(&memarg, OperandSize::S8, MemOpKind::Atomic, index)
}
fn visit_i32_atomic_store16(&mut self, memarg: MemArg) -> Self::Output {
- self.emit_wasm_store(&memarg, OperandSize::S16, MemOpKind::Atomic)
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+ self.emit_wasm_store(&memarg, OperandSize::S16, MemOpKind::Atomic, index)
}
fn visit_i64_atomic_store8(&mut self, memarg: MemArg) -> Self::Output {
- self.emit_wasm_store(&memarg, OperandSize::S8, MemOpKind::Atomic)
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+ self.emit_wasm_store(&memarg, OperandSize::S8, MemOpKind::Atomic, index)
}
fn visit_i64_atomic_store16(&mut self, memarg: MemArg) -> Self::Output {
- self.emit_wasm_store(&memarg, OperandSize::S16, MemOpKind::Atomic)
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+ self.emit_wasm_store(&memarg, OperandSize::S16, MemOpKind::Atomic, index)
}
fn visit_i64_atomic_store32(&mut self, memarg: MemArg) -> Self::Output {
- self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Atomic)
+ let index = Index::from_typed_reg(self.context.pop_to_reg(self.masm, None)?);
+ self.emit_wasm_store(&memarg, OperandSize::S32, MemOpKind::Atomic, index)
}
fn visit_i32_atomic_rmw8_add_u(&mut self, arg: MemArg) -> Self::Output {
This approach isn't perfect either, as we are introducing a net new concept here: instruction requirements, which is something that didn't explicitly exist before. However, I think of this approach as a different version of what we have today, which is passing the context to the masm.
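As a rough, standalone illustration of that "declared requirements" idea (the `Reg` type, register names, and trait shape below are simplified stand-ins, not the actual Winch definitions), each backend advertises what an atomic CAS will clobber so the generic code can reserve those registers before it pops any operands:

```rust
// Sketch of "instruction requirements as declared clobbers".
// `Reg` and the trait below are simplified stand-ins for the real Winch types.
#[derive(Clone, Copy, Debug)]
struct Reg(&'static str);

trait Masm {
    /// Registers the atomic CAS lowering will clobber.
    fn atomic_cas_clobbers() -> Vec<Reg>;
}

struct X64;
impl Masm for X64 {
    fn atomic_cas_clobbers() -> Vec<Reg> {
        vec![Reg("rax")] // cmpxchg requires `expected` in rax
    }
}

struct Aarch64;
impl Masm for Aarch64 {
    fn atomic_cas_clobbers() -> Vec<Reg> {
        vec![] // no fixed-register requirement in the sketch above
    }
}

// Generic code: query and reserve the clobbers first, then pop operands in
// their natural order, so no operand can land in a register the lowering is
// about to overwrite.
fn plan_atomic_cas<M: Masm>() -> Vec<Reg> {
    M::atomic_cas_clobbers()
}

fn main() {
    println!("x64 clobbers: {:?}", plan_atomic_cas::<X64>());
    println!("aarch64 clobbers: {:?}", plan_atomic_cas::<Aarch64>());
}
```

The trade-off is the one named above: the requirement becomes data the generic code must know to query, rather than something each backend handles privately.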
Your implementation is indeed more elegant than what I came up with. But generally I agree with you. This would only be beneficial for atomic operations, which by themselves are accepted to be somewhat costly; I don't think the cost of the added complexity outweighs the benefits. When I started this, I hoped that it would benefit more instructions, but that was a bad intuition. I'll close this one, but thanks for your feedback! :)
This PR factors out heap address computation from the `CodeGen` struct, so that it can be passed as a callback to the masm. The reason for this refactor is that it avoids an unconditional spill on x64 for atomic operations that have specific requirements on some registers. This happens because the address for those instructions sits under the operands, forcing us to pop those operands to registers, compute the address, and push the operands back on the stack. This dance makes it almost certain that the required registers will already be allocated when needed, thus forcing a spill. By factoring out address computation, we can pass it as a callback to the macro assembler, which can pop arguments in their natural order, reserving specific registers as necessary.
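To make the shape of that callback concrete, here is a minimal, self-contained sketch (the `Reg`, `Stack`, and `Masm` types and the closure signature are simplified stand-ins for illustration, not the actual Winch API): the masm pops its operands first, reserving any ISA-required register, and only then invokes the caller-provided callback to compute the heap address.

```rust
// Minimal sketch of "heap address computation as a callback".
// `Reg`, `Stack`, and `Masm` are simplified stand-ins, not the real Winch types.
#[derive(Clone, Copy, Debug)]
struct Reg(&'static str);

struct Stack(Vec<Reg>);

impl Stack {
    fn pop(&mut self) -> Reg {
        self.0.pop().expect("operand on the value stack")
    }
}

struct Masm;

impl Masm {
    /// Pop `replacement` and `expected` in their natural order (on x64,
    /// `expected` would be pinned to rax), and only then ask the caller to
    /// materialize the effective heap address.
    fn atomic_cas(
        &mut self,
        stack: &mut Stack,
        compute_addr: impl FnOnce(&mut Masm, &mut Stack) -> Reg,
    ) {
        let replacement = stack.pop();
        let expected = stack.pop();
        // The index operand is still on the stack when the callback runs.
        let addr = compute_addr(self, stack);
        println!(
            "cmpxchg [{:?}], expected={:?}, replacement={:?}",
            addr, expected, replacement
        );
    }
}

fn main() {
    // Wasm operand order for a CAS: [.., index, expected, replacement].
    let mut stack = Stack(vec![Reg("r_index"), Reg("r_expected"), Reg("r_replacement")]);
    let mut masm = Masm;
    masm.atomic_cas(&mut stack, |_masm, stack| {
        // Stand-in for the real bounds-checked address computation:
        // pop the index and treat it as the effective address register.
        stack.pop()
    });
}
```

With this ordering, nothing has to be popped early and pushed back just to expose the address operand, which is what forces the spill in the current x64 `atomic_cas` lowering.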
Running a few benchmarks with Sightglass (not all of the test suite is supported yet) shows no performance regression in doing so:
Those benchmarks do not indicate an improvement in performance, but that's probably because they don't rely on atomic operations. Looking at the generated asm in the `disas` tests, we can see that a good number of instructions are shaved off: they correspond to the spills.