RISC-V Vector Extension Intrinsic API Reference Manual

1. Preface

These builtins target the RVV 1.0-draft specification. This manual documents the rvv_intrinsics programming model.

2. Design Decisions and Philosophy

Please see rvv-intrinsic-rfc.md

3. None

This chapter is intentionally left empty to keep the chapter numbering aligned with riscv-v-spec.

4. None

This chapter is intentionally left empty to keep the chapter numbering aligned with riscv-v-spec.

5. General Naming Rules

Please see rvv-intrinsic-rfc.md

6. Configuration-Setting and Utility Functions

Instructions

  • vsetvli
  • vsetvl

Set vl and vtype Functions
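
As a rough illustration of what these functions compute, the following scalar C sketch models the vl selection and the strip-mined loop it enables. The names `model_vlmax`, `model_vsetvl` and `model_add_arrays` are hypothetical and are not part of the intrinsic API. The sketch assumes an integer LMUL and picks vl = min(AVL, VLMAX), which is one result the specification permits.

```c
#include <stddef.h>

/* VLMAX = (VLEN / SEW) * LMUL, with VLEN and SEW in bits.
 * Fractional LMUL is left out of this sketch. */
size_t model_vlmax(size_t vlen_bits, size_t sew_bits, size_t lmul) {
    return vlen_bits / sew_bits * lmul;
}

/* One permitted vsetvl result: vl = min(avl, vlmax). */
size_t model_vsetvl(size_t avl, size_t vlmax) {
    return avl < vlmax ? avl : vlmax;
}

/* Strip-mined loop: every pass handles vl elements, the way a
 * vsetvl-driven vector loop would. vlmax must be non-zero.
 * Returns the number of passes taken. */
size_t model_add_arrays(const int *a, const int *b, int *out,
                        size_t n, size_t vlmax) {
    size_t passes = 0;
    for (size_t done = 0; done < n; ++passes) {
        size_t vl = model_vsetvl(n - done, vlmax);
        for (size_t i = 0; i < vl; ++i)
            out[done + i] = a[done + i] + b[done + i];
        done += vl;
    }
    return passes;
}
```

With VLEN=128 and SEW=32 at LMUL=1, VLMAX is 4, so a 10-element array is covered in three passes of 4, 4 and 2 elements.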

Set vl to VLMAX with specific vtype

Reinterpret Cast Conversion Functions

Reinterpret the contents of a value as a different type, without changing any bits and without generating any RVV instructions.
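
A scalar analogue may help show the difference between a reinterpret cast and a value conversion. The helper `model_reinterpret_f32_as_u32` below is hypothetical and only illustrates the idea on a single element; it assumes `float` is an IEEE 754 binary32 value.

```c
#include <stdint.h>
#include <string.h>

/* Copy the 32 bits of a float into a uint32_t unchanged.
 * No value conversion takes place, only the type changes. */
uint32_t model_reinterpret_f32_as_u32(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

Under that assumption, reinterpreting `1.0f` yields the bit pattern `0x3F800000`, whereas the value conversion `(uint32_t)1.0f` yields `1`.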

Vector Initialization Functions

Vector LMUL Extension and Truncation Functions

These utility functions truncate or extend the LMUL of a value while keeping the same SEW. They operate regardless of vl and do not change the content of the vl register.

Vector Insertion Functions

These utility functions insert a smaller-LMUL value into a larger-LMUL value. Fractional LMULs are not supported. The index must be a constant expression less than the ratio between the larger and smaller LMUL.

Vector Extraction Functions

These utility functions extract a smaller-LMUL value from a larger-LMUL value. Fractional LMULs are not supported. The index must be a constant expression less than the ratio between the larger and smaller LMUL.
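
The insertion and extraction semantics can be pictured with plain structs. In this sketch the types `model_m1_t` and `model_m4_t` and both functions are hypothetical stand-ins, assuming VLEN=128 and SEW=32 so that an LMUL=1 value holds four elements. The LMUL ratio is 4, so valid indices are 0 to 3.

```c
#include <stdint.h>

enum { MODEL_ELEMS_PER_REG = 4 };   /* assumes VLEN=128, SEW=32 */

typedef struct { int32_t e[MODEL_ELEMS_PER_REG]; } model_m1_t; /* LMUL=1 */
typedef struct { model_m1_t part[4]; } model_m4_t;             /* LMUL=4 */

/* Extraction: return part `index` (0..3) of the LMUL=4 value. */
model_m1_t model_vget_m4_m1(model_m4_t src, unsigned index) {
    return src.part[index];
}

/* Insertion: return a copy of `dest` with part `index` (0..3) replaced.
 * The other three parts are carried over unchanged. */
model_m4_t model_vset_m1_m4(model_m4_t dest, unsigned index, model_m1_t val) {
    dest.part[index] = val;
    return dest;
}
```

In the intrinsic API the index is a compile-time constant; the model takes it as an ordinary argument only to keep the sketch short.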

Read/Write URW vector CSRs

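/* Vector CSRs accessible through the read/write helpers below. */
enum RVV_CSR {
  RVV_VSTART = 0, /* vstart: vector start position */
  RVV_VXSAT,      /* vxsat: fixed-point saturation flag */
  RVV_VXRM,       /* vxrm: fixed-point rounding mode */
  RVV_VCSR,       /* vcsr: vector control and status register */
};

/* Read or write the selected vector CSR. */
unsigned long __riscv_vread_csr(enum RVV_CSR csr);
void __riscv_vwrite_csr(enum RVV_CSR csr, unsigned long value);

/* Return the vlenb CSR: the vector register length in bytes (VLEN/8). */
unsigned long __riscv_vlenb();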

7. Vector Loads and Stores

7.4. Vector Unit-Stride Operations

Instructions

  • vle<eew>.v
  • vse<eew>.v

7.5. Vector Strided Load/Store Operations

Instructions

  • vlse<eew>.v
  • vsse<eew>.v
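
The addressing rule of a strided load can be sketched in scalar C. `model_vlse32` is a hypothetical name, not an intrinsic. It illustrates that the stride is a byte distance between consecutive elements, not an element count.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a strided 32-bit load: element i is read from
 * base + i * stride_bytes. */
void model_vlse32(int32_t *dest, const void *base,
                  ptrdiff_t stride_bytes, size_t vl) {
    const unsigned char *p = base;
    for (size_t i = 0; i < vl; ++i)
        memcpy(&dest[i], p + (ptrdiff_t)i * stride_bytes, sizeof dest[i]);
}
```

A typical use is reading one column of a row-major matrix: the base points at the first element of the column and the stride is the size of one row in bytes.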

7.6. Vector Indexed Load/Store Operations

Instructions

  • vlxei<eew>.v
  • vsxei<eew>.v
  • vsuxei<eew>.v

7.7. Unit-stride Fault-Only-First Load Operations

Instructions

  • vle<eew>ff.v

Notes

  • The unit-stride fault-only-first load instructions are used to vectorize loops with data-dependent exit conditions (while loops). They execute as regular loads except that they only take a trap on element 0. If an element > 0 raises an exception, that element and all following elements in the destination vector register are not modified, and the vector length vl is reduced to the number of elements processed without a trap.
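
The behaviour described in the note can be modelled in scalar C. In this sketch `model_vle8ff` and `model_strlen` are hypothetical names, and the `readable` parameter is an assumed stand-in for the distance to the first faulting address (for example a page boundary).

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a unit-stride fault-only-first byte load.
 * Returns the new vl, or -1 when element 0 itself would fault,
 * which is the only case that traps. */
long model_vle8ff(uint8_t *dest, const uint8_t *src,
                  size_t readable, size_t vl) {
    if (vl > 0 && readable == 0)
        return -1;                        /* trap on element 0 */
    size_t new_vl = vl < readable ? vl : readable;
    for (size_t i = 0; i < new_vl; ++i)   /* elements >= new_vl stay as is */
        dest[i] = src[i];
    return (long)new_vl;
}

/* strlen built on the model: load a chunk, scan it for the terminator,
 * and advance by the vl that was actually delivered. */
size_t model_strlen(const char *s, size_t readable, size_t vlmax) {
    uint8_t chunk[64];
    size_t len = 0;
    if (vlmax == 0 || vlmax > sizeof chunk)
        vlmax = sizeof chunk;
    for (;;) {
        long vl = model_vle8ff(chunk, (const uint8_t *)s + len,
                               readable - len, vlmax);
        if (vl < 0)
            return (size_t)-1;            /* would trap: no terminator found */
        for (long i = 0; i < vl; ++i)
            if (chunk[i] == 0)
                return len + (size_t)i;
        len += (size_t)vl;
    }
}
```

The loop never reads past the terminator's page because a fault on a later element only shortens vl instead of trapping.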

7.8. Vector Load/Store Segment Operations (Zvlsseg)

7.8.1. Vector Unit-Stride Segment Loads and Stores

Instructions

  • vlsege<eew>.v
  • vssege<eew>.v

7.8.2. Vector Strided Segment Loads and Stores

Instructions

  • vlssege<eew>.v
  • vsssege<eew>.v

7.8.3. Vector Indexed Segment Loads and Stores

Instructions

  • vlxsegei<eew>.v
  • vsxsegei<eew>.v

8. None

This chapter is intentionally left empty to keep the chapter numbering aligned with riscv-v-spec.

9. None

This chapter is intentionally left empty to keep the chapter numbering aligned with riscv-v-spec.

10. None

This chapter is intentionally left empty to keep the chapter numbering aligned with riscv-v-spec.

11. Vector Integer Arithmetic Operations

11.1. Vector Single-Width Integer Add and Subtract

Instructions

  • vadd.{vv,vx,vi}
  • vsub.{vv,vx}
  • vrsub.{vx,vi}
  • vneg.v

11.2. Vector Widening Integer Add/Subtract Operations

Instructions

  • vwaddu.{vv,vx,wv,wx}
  • vwsubu.{vv,vx,wv,wx}
  • vwadd.{vv,vx,wv,wx}
  • vwsub.{vv,vx,wv,wx}
  • vwcvt.x.x.v
  • vwcvtu.x.x.v

11.3. Vector Integer Extension

Instructions

  • vzext.vf{2,4,8}
  • vsext.vf{2,4,8}

11.4. Vector Integer Add-with-Carry / Subtract-with-Borrow Operations

Instructions

  • vadc.{vvm,vxm,vim}
  • vmadc.{vvm,vxm,vim}
  • vsbc.{vvm,vxm}
  • vmsbc.{vvm,vxm}
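
The split between the sum-producing and carry-producing forms can be shown on a single 32-bit element. The functions `model_vadc` and `model_vmadc` are hypothetical scalar models; in the vector instructions the carry-in comes from the mask register and the carry-out is written to a mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* vadc model: a + b + carry_in, truncated to 32 bits. */
uint32_t model_vadc(uint32_t a, uint32_t b, bool carry_in) {
    return a + b + (uint32_t)carry_in;
}

/* vmadc model: the carry-out of the same sum. */
bool model_vmadc(uint32_t a, uint32_t b, bool carry_in) {
    uint64_t wide = (uint64_t)a + b + (uint64_t)carry_in;
    return (wide >> 32) != 0;
}
```

Multi-word addition chains the two: compute the carry-out of the low limbs with the vmadc form, then feed it as the carry-in of the high limbs.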

11.5. Vector Bitwise Logical Operations

Instructions

  • vand.{vv,vx,vi}
  • vxor.{vv,vx,vi}
  • vor.{vv,vx,vi}
  • vnot.v

11.6. Vector Single-Width Bit Shift Operations

Instructions

  • vsll.{vv,vx,vi}
  • vsrl.{vv,vx,vi}
  • vsra.{vv,vx,vi}

Notes

  • A full complement of vector shift instructions is provided, including logical shift left, and logical (zero-extending) and arithmetic (sign-extending) shift right.
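
The three shift flavours can be modelled per element as follows. The function names are hypothetical, and the sketch fixes SEW=32, where only the low 5 bits of the shift amount are used.

```c
#include <stdint.h>

/* Only the low log2(SEW) bits of the shift amount are used. */
uint32_t model_vsll32(uint32_t x, uint32_t sh) { return x << (sh & 31u); }

/* Logical right shift: vacated high bits are filled with zeros. */
uint32_t model_vsrl32(uint32_t x, uint32_t sh) { return x >> (sh & 31u); }

/* Arithmetic right shift: vacated high bits are filled with the sign.
 * Written without relying on how C's >> treats negative values. */
int32_t model_vsra32(int32_t x, uint32_t sh) {
    sh &= 31u;
    if (x >= 0)
        return (int32_t)((uint32_t)x >> sh);
    /* For negative x, -1 - x equals ~x and is non-negative. */
    return -1 - (int32_t)((uint32_t)(-1 - x) >> sh);
}
```

For the value -8 shifted right by 1, the arithmetic form gives -4 while the logical form applied to the same bit pattern gives 0x7FFFFFFC.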

11.7. Vector Narrowing Integer Right Shift Operations

Instructions

  • vnsra.{vv,vx,vi}
  • vnsrl.{vv,vx,vi}
  • vncvt.x.x.w

11.8. Vector Integer Comparison Operations

Instructions

  • vmseq.{vv,vx,vi}
  • vmsne.{vv,vx,vi}
  • vmsltu.{vv,vx,vi}
  • vmslt.{vv,vx,vi}
  • vmsleu.{vv,vx,vi}
  • vmsle.{vv,vx,vi}
  • vmsgtu.{vv,vx,vi}
  • vmsgt.{vv,vx,vi}

11.9. Vector Integer Min/Max Operations

Instructions

  • vminu.{vv,vx}
  • vmin.{vv,vx}
  • vmaxu.{vv,vx}
  • vmax.{vv,vx}

11.10. Vector Single-Width Integer Multiply Operations

Instructions

  • vmul.{vv,vx}
  • vmulh.{vv,vx}
  • vmulhu.{vv,vx}
  • vmulhsu.{vv,vx}

11.11. Vector Integer Divide Operations

Instructions

  • vdivu.{vv,vx}
  • vdiv.{vv,vx}
  • vremu.{vv,vx}
  • vrem.{vv,vx}
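
Vector divide and remainder never trap, so the corner cases have defined results. The scalar models below (hypothetical names, SEW=32) spell out the results for division by zero and signed overflow.

```c
#include <stdint.h>

/* Signed divide: x / 0 gives -1 (all bits set), and the overflow case
 * INT32_MIN / -1 gives INT32_MIN. */
int32_t model_vdiv32(int32_t a, int32_t b) {
    if (b == 0) return -1;
    if (a == INT32_MIN && b == -1) return INT32_MIN;
    return a / b;                 /* truncates toward zero */
}

/* Signed remainder: x % 0 gives x, and INT32_MIN % -1 gives 0. */
int32_t model_vrem32(int32_t a, int32_t b) {
    if (b == 0) return a;
    if (a == INT32_MIN && b == -1) return 0;
    return a % b;                 /* sign follows the dividend */
}

/* Unsigned forms: x / 0 gives all bits set, x % 0 gives x. */
uint32_t model_vdivu32(uint32_t a, uint32_t b) {
    return b == 0 ? UINT32_MAX : a / b;
}

uint32_t model_vremu32(uint32_t a, uint32_t b) {
    return b == 0 ? a : a % b;
}
```

The explicit checks matter in C because both corner cases are undefined behaviour for the built-in `/` and `%` operators.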

11.12. Vector Widening Integer Multiply Operations

Instructions

  • vwmul.{vv,vx}
  • vwmulu.{vv,vx}
  • vwmulsu.{vv,vx}

11.13. Vector Single-Width Integer Multiply-Add Operations

Instructions

  • vmacc.{vv,vx}
  • vnmsac.{vv,vx}
  • vmadd.{vv,vx}
  • vnmsub.{vv,vx}

11.14. Vector Widening Integer Multiply-Add Operations

Instructions

  • vwmaccu.{vv,vx}
  • vwmacc.{vv,vx}
  • vwmaccsu.{vv,vx}
  • vwmaccus.{vv,vx}

11.15. Vector Integer Merge Operations

Instructions

  • vmerge.{vvm,vxm,vim}

11.16. Vector Integer Move Operations

Instructions

  • vmv.v.v
  • vmv.v.x
  • vmv.v.i

12. Vector Fixed-Point Arithmetic Operations

12.1. Vector Single-Width Saturating Add and Subtract

Instructions

  • vsaddu.{vv,vx,vi}
  • vsadd.{vv,vx,vi}
  • vssubu.{vv,vx}
  • vssub.{vv,vx}
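
Saturating addition clamps the result to the representable range instead of wrapping, and records the event in the sticky vxsat flag. The scalar models below use hypothetical names and SEW=8; the `vxsat` pointer is an assumed stand-in for the CSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned saturating add: clamp to UINT8_MAX on overflow. */
uint8_t model_vsaddu8(uint8_t a, uint8_t b, bool *vxsat) {
    unsigned sum = (unsigned)a + b;
    if (sum > UINT8_MAX) {
        *vxsat = true;            /* sticky: set here, never cleared */
        return UINT8_MAX;
    }
    return (uint8_t)sum;
}

/* Signed saturating add: clamp to INT8_MAX or INT8_MIN. */
int8_t model_vsadd8(int8_t a, int8_t b, bool *vxsat) {
    int sum = a + b;
    if (sum > INT8_MAX) { *vxsat = true; return INT8_MAX; }
    if (sum < INT8_MIN) { *vxsat = true; return INT8_MIN; }
    return (int8_t)sum;
}
```

Because the flag is sticky, code that wants to detect saturation in one region would clear vxsat first and read it back afterwards.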

12.2. Vector Single-Width Averaging Add and Subtract

Instructions

  • vaadd.{vv,vx,vi}
  • vasub.{vv,vx}
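
Averaging add computes the exact sum, shifts it right by one bit, and rounds according to the vxrm rounding mode. The sketch below models the unsigned case for SEW=8. The enum and function names are hypothetical; the enum values follow the vxrm encoding (0 = rnu, 1 = rne, 2 = rdn, 3 = rod). The signed forms apply the same rounding rule to a signed sum.

```c
#include <stdint.h>

enum model_vxrm { MODEL_RNU = 0, MODEL_RNE = 1, MODEL_RDN = 2, MODEL_ROD = 3 };

/* Unsigned averaging add: (a + b) >> 1 plus a rounding increment r. */
uint8_t model_avg_add_u8(uint8_t a, uint8_t b, enum model_vxrm rm) {
    unsigned v = (unsigned)a + b;       /* exact 9-bit sum */
    unsigned lsb = v & 1u;              /* the bit that is shifted out */
    unsigned next = (v >> 1) & 1u;      /* lowest bit of the shifted result */
    unsigned r = 0;
    switch (rm) {
    case MODEL_RNU: r = lsb; break;                 /* round to nearest, ties up */
    case MODEL_RNE: r = lsb & next; break;          /* round to nearest, ties to even */
    case MODEL_RDN: r = 0; break;                   /* truncate */
    case MODEL_ROD: r = lsb & (next ^ 1u); break;   /* round to odd (jam) */
    }
    return (uint8_t)((v >> 1) + r);
}
```

For inputs 1 and 2 the exact average is 1.5: rnu and rne give 2, while rdn and rod give 1.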

12.3. Vector Single-Width Fractional Multiply with Rounding and Saturation

Instructions

  • vsmul.{vv,vx}

12.4. Vector Single-Width Scaling Shift Operations

Instructions

  • vssrl.{vv,vx,vi}
  • vssra.{vv,vx,vi}

12.5. Vector Narrowing Fixed-Point Clip Operations

Instructions

  • vnclipu.{wx,wv,wi}
  • vnclip.{wx,wv,wi}

13. Vector Floating-Point Operations

13.2. Vector Single-Width Floating-Point Add/Subtract Operations

Instructions

  • vfadd.{vv,vf}
  • vfsub.{vv,vf}
  • vfrsub.vf

13.3. Vector Widening Floating-Point Add/Subtract Operations

Instructions

  • vfwadd.{vv,vf,wv,wf}
  • vfwsub.{vv,vf,wv,wf}

13.4. Vector Single-Width Floating-Point Multiply/Divide Operations

Instructions

  • vfmul.{vv,vf}
  • vfdiv.{vv,vf}
  • vfrdiv.vf

13.5. Vector Widening Floating-Point Multiply Operations

Instructions

  • vfwmul.{vv,vf}

13.6. Vector Single-Width Floating-Point Fused Multiply-Add Operations

Instructions

  • vfmacc.{vv,vf}
  • vfnmacc.{vv,vf}
  • vfmsac.{vv,vf}
  • vfnmsac.{vv,vf}
  • vfmadd.{vv,vf}
  • vfnmadd.{vv,vf}
  • vfmsub.{vv,vf}
  • vfnmsub.{vv,vf}

13.7. Vector Widening Floating-Point Fused Multiply-Add Operations

Instructions

  • vfwmacc.{vv,vf}
  • vfwnmacc.{vv,vf}
  • vfwmsac.{vv,vf}
  • vfwnmsac.{vv,vf}

13.8. Vector Floating-Point Square-Root Operations

Instructions

  • vfsqrt.v

13.9. Vector Floating-Point Reciprocal Square-Root Estimate Operations

Instructions

  • vfrsqrt7.v

13.10. Vector Floating-Point Reciprocal Estimate Operations

Instructions

  • vfrec7.v

13.11. Vector Floating-Point MIN/MAX Operations

Instructions

  • vfmin.{vv,vf}
  • vfmax.{vv,vf}

13.12. Vector Floating-Point Sign-Injection Operations

Instructions

  • vfsgnj.{vv,vf}
  • vfsgnjn.{vv,vf}
  • vfsgnjx.{vv,vf}
  • vfneg.v
  • vfabs.v

13.13. Vector Floating-Point Compare Operations

Instructions

  • vmfeq.{vv,vf}
  • vmfne.{vv,vf}
  • vmflt.{vv,vf}
  • vmfle.{vv,vf}
  • vmfgt.{vv,vf}
  • vmfge.{vv,vf}

13.14. Vector Floating-Point Classify Operations

Instructions

  • vfclass.v

13.15. Vector Floating-Point Merge Operations

Instructions

  • vfmerge.vfm

13.16. Vector Floating-Point Move Operations

Instructions

  • vfmv.v.f

13.17. Single-Width Floating-Point/Integer Type-Convert Operations

Instructions

  • vfcvt.xu.f.v
  • vfcvt.x.f.v
  • vfcvt.rtz.xu.f.v
  • vfcvt.rtz.x.f.v
  • vfcvt.f.xu.v
  • vfcvt.f.x.v

13.18. Widening Floating-Point/Integer Type-Convert Operations

Instructions

  • vfwcvt.xu.f.v
  • vfwcvt.x.f.v
  • vfwcvt.rtz.xu.f.v
  • vfwcvt.rtz.x.f.v
  • vfwcvt.f.xu.v
  • vfwcvt.f.x.v
  • vfwcvt.f.f.v

13.19. Narrowing Floating-Point/Integer Type-Convert Operations

Instructions

  • vfncvt.xu.f.w
  • vfncvt.x.f.w
  • vfncvt.rtz.xu.f.w
  • vfncvt.rtz.x.f.w
  • vfncvt.f.xu.w
  • vfncvt.f.x.w
  • vfncvt.f.f.w
  • vfncvt.rod.f.f.w

14. Vector Reduction Operations

14.1. Vector Single-Width Integer Reduction Operations

Instructions

  • vredsum.vs
  • vredmaxu.vs
  • vredmax.vs
  • vredminu.vs
  • vredmin.vs
  • vredand.vs
  • vredor.vs
  • vredxor.vs

Notes

  • Reduction intrinsics will generate code using tail undisturbed policy unless vundefined() is passed to the dest argument.
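
The effect of the tail-undisturbed policy on a reduction can be sketched in scalar C. `model_vredsum` is a hypothetical model whose argument order (dest, vector, scalar, vl) is an assumption chosen for illustration. Only element 0 of the destination is written; all other elements keep their previous values.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a sum reduction with tail-undisturbed policy:
 * dest[0] = scalar[0] + vec[0] + ... + vec[vl-1]. */
void model_vredsum(uint32_t *dest, const uint32_t *vec,
                   const uint32_t *scalar, size_t vl) {
    if (vl == 0)
        return;                  /* vl == 0: the destination is not updated */
    uint32_t acc = scalar[0];
    for (size_t i = 0; i < vl; ++i)
        acc += vec[i];           /* wraps modulo 2^32 */
    dest[0] = acc;
    /* dest[1..] is the tail and is left untouched. */
}
```

Passing vundefined() as the dest argument tells the compiler that the old tail contents are not needed, which is why it lifts the tail-undisturbed requirement.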

14.2. Vector Widening Integer Reduction Operations

Instructions

  • vwredsumu.vs
  • vwredsum.vs

Notes

  • Reduction intrinsics will generate code using tail undisturbed policy unless vundefined() is passed to the dest argument.

14.3. Vector Single-Width Floating-Point Reduction Operations

Instructions

  • vfredosum.vs
  • vfredusum.vs
  • vfredmax.vs
  • vfredmin.vs

Notes

  • Reduction intrinsics will generate code using tail undisturbed policy unless vundefined() is passed to the dest argument.

14.4. Vector Widening Floating-Point Reduction Operations

Instructions

  • vfwredosum.vs
  • vfwredusum.vs

Notes

  • Reduction intrinsics will generate code using tail undisturbed policy unless vundefined() is passed to the dest argument.

15. Vector Mask Instructions

15.1. Vector Mask-Register Logical Operations

Instructions

  • vmand.mm
  • vmnand.mm
  • vmandn.mm
  • vmxor.mm
  • vmor.mm
  • vmnor.mm
  • vmorn.mm
  • vmxnor.mm
  • vmmv.m
  • vmclr.m
  • vmset.m
  • vmnot.m

15.2. Vector count population in mask vcpop.m

Instructions

  • vcpop.m

15.3. vfirst find-first-set mask bit

Instructions

  • vfirst.m

15.4. vmsbf.m set-before-first mask bit

Instructions

  • vmsbf.m

15.5. vmsif.m set-including-first mask bit

Instructions

  • vmsif.m

15.6. vmsof.m set-only-first mask bit

Instructions

  • vmsof.m
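
The mask-scanning operations of sections 15.2 to 15.6 are closely related, and a scalar model makes the differences easy to compare. The functions below are hypothetical and represent a mask as an array of `bool`, one entry per element.

```c
#include <stdbool.h>
#include <stddef.h>

/* vcpop model: number of set mask bits among the first vl elements. */
size_t model_vcpop(const bool *m, size_t vl) {
    size_t n = 0;
    for (size_t i = 0; i < vl; ++i)
        n += m[i] ? 1u : 0u;
    return n;
}

/* vfirst model: index of the first set bit, or -1 if none is set. */
long model_vfirst(const bool *m, size_t vl) {
    for (size_t i = 0; i < vl; ++i)
        if (m[i])
            return (long)i;
    return -1;
}

/* vmsbf model: set every bit before the first set bit (all if none). */
void model_vmsbf(bool *d, const bool *m, size_t vl) {
    long first = model_vfirst(m, vl);
    for (size_t i = 0; i < vl; ++i)
        d[i] = first < 0 || (long)i < first;
}

/* vmsif model: as vmsbf, but the first set bit is included. */
void model_vmsif(bool *d, const bool *m, size_t vl) {
    long first = model_vfirst(m, vl);
    for (size_t i = 0; i < vl; ++i)
        d[i] = first < 0 || (long)i <= first;
}

/* vmsof model: set only the first set bit (none if no bit is set). */
void model_vmsof(bool *d, const bool *m, size_t vl) {
    long first = model_vfirst(m, vl);
    for (size_t i = 0; i < vl; ++i)
        d[i] = (long)i == first;
}
```

For the source mask 0 0 1 0 1 (element 0 first), the results are 1 1 0 0 0 for set-before-first, 1 1 1 0 0 for set-including-first, and 0 0 1 0 0 for set-only-first.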

15.8. Vector Iota Operations

Instructions

  • viota.m

15.9. Vector Element Index Operations

Instructions

  • vid.v
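
Both operations produce index-like vectors, which the following scalar models illustrate. The function names are hypothetical and the mask is again represented as an array of `bool`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* viota model: d[i] is the number of set mask bits in m[0..i-1],
 * an exclusive prefix sum over the mask. */
void model_viota(uint32_t *d, const bool *m, size_t vl) {
    uint32_t count = 0;
    for (size_t i = 0; i < vl; ++i) {
        d[i] = count;
        count += m[i] ? 1u : 0u;
    }
}

/* vid model: d[i] is the element index i. */
void model_vid(uint32_t *d, size_t vl) {
    for (size_t i = 0; i < vl; ++i)
        d[i] = (uint32_t)i;
}
```

For the mask 1 0 0 1 1 the iota result is 0 1 1 1 2, which gives each active element its position in a packed output.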

16. Vector Permutation Operations

16.1. Integer Scalar Move Operations

Instructions

  • vmv.s.x
  • vmv.x.s

Notes

  • vmv.s.x intrinsic will generate code using tail undisturbed policy unless vundefined() is passed to the dest argument.

16.2. Floating-Point Scalar Move Operations

Instructions

  • vfmv.f.s
  • vfmv.s.f

Notes

  • vfmv.s.f intrinsic will generate code using tail undisturbed policy unless vundefined() is passed to the dest argument.

16.3. Vector Slide Operations

Instructions

  • vslideup.{vx,vi}
  • vslidedown.{vx,vi}
  • vslide1up.vx
  • vslide1down.vx
  • vfslide1up.vf
  • vfslide1down.vf

Notes

  • Unmasked vslideup and vslidedown intrinsics will generate code using tail undisturbed policy unless vundefined() is passed to the dest argument.
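
The element movement performed by the slide operations can be modelled in scalar C. The function names are hypothetical, and `dest` and `src` are assumed not to overlap, mirroring the register-overlap restriction on vslideup.

```c
#include <stddef.h>
#include <stdint.h>

/* vslideup model: dest[i] = src[i - offset] for offset <= i < vl.
 * Elements below `offset` and at or above `vl` keep their old values. */
void model_vslideup(int32_t *dest, const int32_t *src,
                    size_t offset, size_t vl) {
    for (size_t i = offset; i < vl; ++i)
        dest[i] = src[i - offset];
}

/* vslidedown model: dest[i] = src[i + offset] for i < vl, reading zero
 * when the source index reaches vlmax. */
void model_vslidedown(int32_t *dest, const int32_t *src,
                      size_t offset, size_t vl, size_t vlmax) {
    for (size_t i = 0; i < vl; ++i)
        dest[i] = (i + offset < vlmax) ? src[i + offset] : 0;
}
```

The slide-up model shows why the destination operand matters: the lowest `offset` elements are never written, so their content comes entirely from dest.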

16.4. Vector Register Gather Operations

Instructions

  • vrgather.{vv,vx,vi}

16.5. Vector Compress Operations

Instructions

  • vcompress.vm

Notes

  • vcompress intrinsics will generate code using tail undisturbed policy unless vundefined() is passed to the dest argument.
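
A scalar model shows both the packing behaviour and the role of the dest argument. `model_vcompress` is a hypothetical name, and the mask is represented as an array of `bool`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* vcompress model: pack the elements of src whose mask bit is set into
 * the lowest elements of dest, preserving their order. Elements of dest
 * past the packed ones are left untouched (tail undisturbed).
 * Returns the number of elements written. */
size_t model_vcompress(int32_t *dest, const int32_t *src,
                       const bool *mask, size_t vl) {
    size_t out = 0;
    for (size_t i = 0; i < vl; ++i)
        if (mask[i])
            dest[out++] = src[i];
    return out;
}
```

With source 1 2 3 4 5, mask 1 0 1 0 1 and a destination initially filled with 9, the result is 1 3 5 9 9.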

17. None

This chapter is intentionally left empty to keep the chapter numbering aligned with riscv-v-spec.

18. Divided Element Extension ('Zvediv')

18.3. Vector Integer Dot-Product Operations

Instructions

  • vdotu.vv
  • vdot.vv

Intrinsic functions list

TODO

18.4. Vector Floating-Point Dot Product Operations

Instructions

  • vfdot.vv

Intrinsic functions list

TODO

19. RVV Intrinsic Examples