Intrinsicpr #53

Open
wants to merge 23 commits into
base: v3.0-simple
235ab3b
Added support for separate mcode areas
fsfod Feb 9, 2016
c84b176
Initial support for intrinsics on x86/x64 interpreter only
fsfod Dec 4, 2015
e1a1721
Template intrinsics user machine code
fsfod Feb 9, 2016
c0797d3
Extended op_emit to support 2 byte vex opcodes and optionally expand …
fsfod Mar 29, 2016
f21526d
Added support for ymm registers in intrinsics
fsfod Mar 29, 2016
e6fecee
Added support for casting vectors to a pointer when using lj_cconv_ct…
fsfod Mar 29, 2016
de4c0b6
Implement support for opcodes with dynamic registers
fsfod Mar 29, 2016
7c697b0
Extended emit_op to support 4 byte opcodes based on checking a new fl…
fsfod Mar 29, 2016
57ff675
Added support for 4 byte opcode intrinsics
fsfod Mar 29, 2016
f7331e9
Added JIT support for intrinsics. Support for vector registers is NYI.
fsfod Mar 29, 2016
275d0dc
Treat IR_INTRN as potential load with respect to DSE
fsfod Mar 29, 2016
a374e90
Added a flag(s) for opcodes with non memory store side effects and en…
fsfod Mar 29, 2016
239f8ad
Added a JIT flag for AVX1 support
fsfod Mar 29, 2016
befcdc6
Added VEX opcode support for intrinsics
fsfod Mar 29, 2016
748091c
CSE support for intrinsics only enabled for single value returning in…
fsfod Mar 29, 2016
4eaf7f6
Fix store opcodes with dynamic destructive out register not being cor…
fsfod Mar 29, 2016
96f1f83
Wip Intrinsic documentation
fsfod Mar 29, 2016
cb3c483
Don't allow stitching to defeat our black listing of the loop in jit_…
fsfod Sep 19, 2017
ede73f2
Fixed LJ_GC64 builds breaking intrinsic interpreter wrappers in vario…
fsfod Sep 19, 2017
a81dcf6
Fix Intrinsics crashing in the JIT for LJ_GC64 because they were usin…
fsfod Sep 19, 2017
7a82653
Fix wrong function name in some assert_cdef calls for intrinsic CSE t…
fsfod Sep 19, 2017
1e9a1bb
Improve the runtests shell script and add support with testing with d…
fsfod Sep 19, 2017
26808ed
Change intrinsic REX opcode mode X to be silently ignored in 32 build…
fsfod Sep 19, 2017
47 changes: 47 additions & 0 deletions Intrinsics.txt
@@ -0,0 +1,47 @@
Register configuration for the opcode
rM 1-2 dynamic input registers. Non-VEX opcodes are inferred to be destructive to the second input register.
mR
m Single input register/memory operand, with the register part of ModRM used as an extension to the opcode. The last digit of the opcode is stripped from the opcode and used as the second register id.
rR Neither register supports fused load/store; effectively the inverse of the Indirect flag.
? Template intrinsic that has no opcode but instead a blob of user machine code. Created with ffi.intrinsic("intrinsicname", codeptr[, codesize]). The code is
inlined when called in a trace, unless it also has the Called flag set. The code that the passed-in pointer refers to does not need to be kept around either.

_ Optional divider that stops the parser from mistaking mode flags for part of the opcode.

Op Mode Flags:

s(Side effects) Has non-memory side effects and should never be optimized away in the JIT.

S(Memory side effects) Has observable memory side effects, so the JIT needs to emit an IR memory barrier after the intrinsic.

c(Commutative) Opcode is commutative, so its 2 input values can be swapped to allow better code generation in the JIT.

C(Called) Intrinsic should be emitted as a naked function that is called instead of being copied into JITed code. Can be combined with the Indirect flag to specify that
the address passed to ffi.intrinsic is executable memory and shouldn't be copied to the wrapper.

I(Indirect) Set the memory part of ModRM to always be in indirect (fused load/store) mode. The first input parameter must then be the address of the value.

E(Explicit Registers) The parameter names in the FFI definition of the intrinsic explicitly specify the register name/kind for the parameter they are declared on.
See also explicit dynamic registers. Explicit dynamic registers must be declared before fixed registers.

X(Extended register) Sets the REX.W bit when emitting the opcode, which most of the time is used to switch the register size from 32 to 64 bits. If the opcode is VEX
encoded, the VEX.W bit is set instead.

v(Vex) The opcode can optionally be re-encoded in VEX form if the processor supports AVX. This can be set on most SSE opcodes. If they are destructive in SSE
form, they are inferred to be non-destructive in VEX form; otherwise the VVVV register field of the opcode is set to 1111b.

V(Force VEX) Opcode can only be encoded in VEX form. If the processor does not support AVX, the intrinsic is flagged as unsupported, but an error is only raised if you try to access it from a C namespace.

P(Prefix byte) Emit a prefix byte such as lock (0xf0) before the opcode. The prefix value is specified after the op/mode string inside __mcode("opmode", prefix).

U(Immediate byte) Emit a byte after the opcode and its ModRM value. The value of the byte is specified like the prefix byte, but if the opcode also declares a prefix byte, it
is the second number after the op/mode string: __mcode("opmode", prefix, immediate).
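
As an illustration of how the P, X, I and U flags above relate to the emitted bytes, here is a minimal C sketch of the standard x86-64 byte order: legacy prefix, REX, opcode, ModRM, immediate. The InsnSketch struct and the encode_sketch function are hypothetical names made up for this example. They are not the emitter added by this change, and the sketch assumes a one-byte opcode and register numbers 0-7 only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ModRM.mod values: 0 = register-indirect [rm], 3 = register-direct. */
enum { MOD_INDIRECT = 0, MOD_DIRECT = 3 };

/* Hypothetical description of one instruction (illustration only). */
typedef struct {
  uint8_t prefix;   /* P flag: legacy prefix such as 0xf0 (lock); 0 = none. */
  int rex_w;        /* X flag: request 64 bit operand size via REX.W. */
  uint8_t opcode;   /* One-byte opcode; longer opcodes are not handled. */
  int indirect;     /* I flag: r/m operand is a memory address. */
  uint8_t reg, rm;  /* Register numbers 0-7 for ModRM.reg and ModRM.rm. */
  int has_imm;      /* U flag: an immediate byte follows ModRM. */
  uint8_t imm;
} InsnSketch;

/* Writes the encoded bytes to out and returns how many were written. */
static size_t encode_sketch(const InsnSketch *in, uint8_t *out)
{
  size_t n = 0;
  int mod = in->indirect ? MOD_INDIRECT : MOD_DIRECT;
  assert(in->reg < 8 && in->rm < 8);
  /* With mod = 0, rm = 4 selects a SIB byte and rm = 5 a disp32 form. */
  assert(!in->indirect || (in->rm != 4 && in->rm != 5));
  if (in->prefix) out[n++] = in->prefix;  /* Legacy prefix comes first. */
  if (in->rex_w) out[n++] = 0x48;         /* REX is 0100WRXB; here W = 1. */
  out[n++] = in->opcode;
  out[n++] = (uint8_t)((mod << 6) | (in->reg << 3) | in->rm);
  if (in->has_imm) out[n++] = in->imm;    /* Immediate comes last. */
  return n;
}
```

Under these assumptions, prefix 0xf0 with X and I set, opcode 0x01, reg = 1 and rm = 0 encodes as f0 48 01 08, which is lock add [rax], rcx.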


If the intrinsic has more than 1 dynamic output or has fixed registers, it needs a __reglist entry after the __mcode definition, e.g. __reglist(out, int eax).

Dynamic placeholder register names are gpr32, gpr64, xmmf (float), xmm (double) and xmmv (128-bit vector).

Fixed arrays can be passed in place of ffi vectors for intrinsic vector arguments. You can also declare a vector argument as void* using an explicit register: "float vadd(void* xmmv, float4 xmmv)".

The wrapper generated for intrinsics also preserves any callee-saved registers.
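
For the v and V flags described above, the VVVV field lives in the VEX prefix and is stored inverted. The following C sketch builds the second byte of the two-byte VEX prefix (the byte after 0xc5). The function name vex2_payload and the VEX_PP_* constants are hypothetical and chosen for this example; this is not the encoder from this change. The two-byte form is only usable when the X, B and W bits are not needed and the opcode is in the 0F map.

```c
#include <assert.h>
#include <stdint.h>

/* pp field: the legacy prefix implied by the VEX prefix. */
enum { VEX_PP_NONE = 0, VEX_PP_66 = 1, VEX_PP_F3 = 2, VEX_PP_F2 = 3 };

/*
 * Build the byte that follows 0xc5 in a two-byte VEX prefix.
 *   reg_ext: nonzero if ModRM.reg names register 8-15 (stored inverted).
 *   vvvv:    extra source register 0-15, or -1 if the opcode has none.
 *   is256:   nonzero for 256 bit (ymm) operation, zero for 128 bit.
 *   pp:      one of the VEX_PP_* values.
 */
static uint8_t vex2_payload(int reg_ext, int vvvv, int is256, int pp)
{
  uint8_t v;
  assert(vvvv >= -1 && vvvv < 16);
  assert(pp >= 0 && pp <= 3);
  /* An unused VVVV field must be 1111b, the same bits as register 0. */
  v = (vvvv < 0) ? 0xf : (uint8_t)(~vvvv & 0xf);
  return (uint8_t)(((reg_ext ? 0 : 1) << 7) | (v << 3) |
                   ((is256 ? 1 : 0) << 2) | pp);
}
```

With these definitions, vaddps xmm0, xmm1, xmm2 (c5 f0 58 c2) uses vex2_payload(0, 1, 0, VEX_PP_NONE), and vsqrtps xmm0, xmm1 (c5 f8 51 c1), which has no second source, uses vvvv = -1.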
2 changes: 1 addition & 1 deletion src/Makefile
@@ -488,7 +488,7 @@ LJLIB_C= $(LJLIB_O:.o=.c)
LJCORE_O= lj_gc.o lj_err.o lj_char.o lj_bc.o lj_obj.o lj_buf.o \
lj_str.o lj_tab.o lj_func.o lj_udata.o lj_meta.o lj_debug.o \
lj_state.o lj_dispatch.o lj_vmevent.o lj_vmmath.o lj_strscan.o \
lj_strfmt.o lj_strfmt_num.o lj_api.o lj_profile.o \
lj_strfmt.o lj_strfmt_num.o lj_api.o lj_profile.o lj_intrinsic.o\
lj_lex.o lj_parse.o lj_bcread.o lj_bcwrite.o lj_load.o \
lj_ir.o lj_opt_mem.o lj_opt_fold.o lj_opt_narrow.o \
lj_opt_dce.o lj_opt_loop.o lj_opt_split.o lj_opt_sink.o \
13 changes: 13 additions & 0 deletions src/lib_ffi.c
@@ -33,6 +33,8 @@
#include "lj_ff.h"
#include "lj_lib.h"

#include "lj_intrinsic.h"

/* -- C type checks ------------------------------------------------------- */

/* Check first argument for a C type and returns its ID. */
@@ -490,6 +492,16 @@ LJLIB_CF(ffi_cdef)
  return 0;
}

LJLIB_CF(ffi_intrinsic)
{
#if LJ_HASINTRINSICS
  lj_intrinsic_create(L);
  return 1;
#else
  lj_err_callermsg(L, "Intrinsics disabled");
#endif
}

LJLIB_CF(ffi_new) LJLIB_REC(.)
{
  CTState *cts = ctype_cts(L);
@@ -849,6 +861,7 @@ LUALIB_API int luaopen_ffi(lua_State *L)
{
  CTState *cts = lj_ctype_init(L);
  settabV(L, L->top++, (cts->miscmap = lj_tab_new(L, 0, 1)));
  lj_intrinsic_init(L);
  cts->finalizer = ffi_finalizer(L);
  LJ_LIB_REG(L, NULL, ffi_meta);
  /* NOBARRIER: basemt is a GC root. */
1 change: 1 addition & 0 deletions src/lib_jit.c
@@ -661,6 +661,7 @@ static uint32_t jit_cpudetect(lua_State *L)
#if LJ_HASJIT
flags |= ((features[2] >> 0)&1) * JIT_F_SSE3;
flags |= ((features[2] >> 19)&1) * JIT_F_SSE4_1;
flags |= ((features[2] >> 28)&1) * JIT_F_AVX1;
if (vendor[2] == 0x6c65746e) { /* Intel. */
if ((features[0] & 0x0fff0ff0) == 0x000106c0) /* Atom. */
flags |= JIT_F_LEA_AGU;
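
The hunk above tests bit 28 of the CPUID leaf 1 ECX value, which is the AVX feature bit. As a hedged illustration only, the C sketch below shows a stricter variant that also requires OSXSAVE (bit 27) and the XMM and YMM state bits in XCR0, since the operating system has to enable YMM state before AVX instructions can run. The names sketch_cpu_flags and SK_F_* are hypothetical and their values are not the JIT_F_* values from the source; the XCR0 value is taken as a parameter here and would come from XGETBV in real code.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits. */
#define CPUID1_ECX_OSXSAVE  (1u << 27)
#define CPUID1_ECX_AVX      (1u << 28)
/* XCR0 bits 1 and 2: the OS saves and restores XMM and YMM state. */
#define XCR0_AVX_STATE      0x6u

/* Hypothetical flag values for this sketch only. */
enum { SK_F_SSE3 = 1, SK_F_SSE4_1 = 2, SK_F_AVX1 = 4 };

/* ecx is the ECX result of CPUID leaf 1, xcr0 the result of XGETBV(0). */
static uint32_t sketch_cpu_flags(uint32_t ecx, uint64_t xcr0)
{
  uint32_t flags = 0;
  flags |= ((ecx >> 0) & 1) * SK_F_SSE3;
  flags |= ((ecx >> 19) & 1) * SK_F_SSE4_1;
  /* Report AVX only if the CPU has it and the OS has enabled YMM state. */
  if ((ecx & CPUID1_ECX_AVX) && (ecx & CPUID1_ECX_OSXSAVE) &&
      (xcr0 & XCR0_AVX_STATE) == XCR0_AVX_STATE)
    flags |= SK_F_AVX1;
  return flags;
}

/* Usage example: the AVX bit alone is not enough in this variant. */
static void sketch_selftest(void)
{
  assert(sketch_cpu_flags(CPUID1_ECX_AVX, 0x7) == 0);
  assert(sketch_cpu_flags(CPUID1_ECX_AVX | CPUID1_ECX_OSXSAVE, 0x7) == SK_F_AVX1);
}
```

Whether the extra checks are wanted here is a design choice for the author; the sketch only shows what they would look like.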
6 changes: 6 additions & 0 deletions src/lj_arch.h
@@ -553,6 +553,12 @@
#define LJ_64 1
#endif

#if defined(LJ_TARGET_X86ORX64) && LJ_HASJIT && LJ_HASFFI
#define LJ_HASINTRINSICS 1
#else
#define LJ_HASINTRINSICS 0
#endif

#ifndef LJ_TARGET_UNALIGNED
#define LJ_TARGET_UNALIGNED 0
#endif