From e57a159987faf01bae3b83e8fb1ab20c79d4b538 Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Mon, 24 Jul 2023 08:00:06 -0500 Subject: [PATCH 01/17] A set of typographic errors and editorial updates were made. --- src/images/ddt-base.svg | 6 +- src/images/ddt-ext.svg | 6 +- src/images/guest-OS.svg | 6 +- src/images/hypervisor.svg | 6 +- src/images/msi-imsic.svg | 6 +- src/images/non-virt-OS.svg | 6 +- src/images/pdt.svg | 6 +- src/iommu.bib | 8 ++ src/iommu_data_structures.adoc | 74 ++++++++----- src/iommu_debug.adoc | 6 +- src/iommu_extensions.adoc | 5 + src/iommu_hw_guidelines.adoc | 2 + src/iommu_in_memory_queues.adoc | 69 +++++++----- src/iommu_intro.adoc | 36 +++--- src/iommu_preface.adoc | 48 ++++++++ src/iommu_registers.adoc | 187 ++++++++++++++++++++------------ src/iommu_sw_guidelines.adoc | 13 ++- src/riscv-iommu.adoc | 23 ++-- 18 files changed, 316 insertions(+), 197 deletions(-) create mode 100644 src/iommu_extensions.adoc create mode 100644 src/iommu_preface.adoc diff --git a/src/images/ddt-base.svg b/src/images/ddt-base.svg index 10e632e3..03420fca 100644 --- a/src/images/ddt-base.svg +++ b/src/images/ddt-base.svg @@ -1,5 +1 @@ - - - - - + diff --git a/src/images/ddt-ext.svg b/src/images/ddt-ext.svg index 671ca012..ac334791 100644 --- a/src/images/ddt-ext.svg +++ b/src/images/ddt-ext.svg @@ -1,5 +1 @@ - - - - - + diff --git a/src/images/guest-OS.svg b/src/images/guest-OS.svg index 7dff72de..f3f3e0a9 100644 --- a/src/images/guest-OS.svg +++ b/src/images/guest-OS.svg @@ -1,5 +1 @@ - - - - - + diff --git a/src/images/hypervisor.svg b/src/images/hypervisor.svg index eb8a5485..0a7b5e94 100644 --- a/src/images/hypervisor.svg +++ b/src/images/hypervisor.svg @@ -1,5 +1 @@ - - - - - + diff --git a/src/images/msi-imsic.svg b/src/images/msi-imsic.svg index d48df744..8b01415c 100644 --- a/src/images/msi-imsic.svg +++ b/src/images/msi-imsic.svg @@ -1,5 +1 @@ - - - - - + diff --git a/src/images/non-virt-OS.svg b/src/images/non-virt-OS.svg index 
338547a1..4c79ed31 100644 --- a/src/images/non-virt-OS.svg +++ b/src/images/non-virt-OS.svg @@ -1,5 +1 @@ - - - - - + diff --git a/src/images/pdt.svg b/src/images/pdt.svg index bfeaadb2..d4e3543b 100644 --- a/src/images/pdt.svg +++ b/src/images/pdt.svg @@ -1,5 +1 @@ - - - - - + diff --git a/src/iommu.bib b/src/iommu.bib index 55e462c3..49ef1d72 100644 --- a/src/iommu.bib +++ b/src/iommu.bib @@ -17,3 +17,11 @@ @electronic{AIA title = {RISC-V Advanced Interrupt Architecture}, url = {https://github.com/riscv/riscv-aia} } +@electronic{CFI, + title = {RISC-V Shadow Stacks and Landing Pads}, + url = {https://github.com/riscv/riscv-cfi} +} +@electronic{PR243, + title = {Clarification updates to IOMMU v1.0.0}, + url = {https://github.com/riscv-non-isa/riscv-iommu/pull/243/commits} +} diff --git a/src/iommu_data_structures.adoc b/src/iommu_data_structures.adoc index b54bb8a1..765c993d 100644 --- a/src/iommu_data_structures.adoc +++ b/src/iommu_data_structures.adoc @@ -140,6 +140,8 @@ is not prohibited by this specification. The DDT is a 1, 2, or 3-level radix-tree indexed using the device directory index (DDI) bits of the `device_id` to locate a `DC`. +<<< + The following diagrams illustrate the DDT radix-tree. The PPN of the root device-directory-table is held in a memory-mapped register called the device-directory-table pointer (`ddtp`). @@ -150,7 +152,7 @@ next device-directory-table. A valid leaf device-directory-table entry holds the device-context (`DC`). .Three, two and single-level device directory with extended format `DC` -image::ddt-ext.svg[width=800,height=400] +image::ddt-ext.svg[width=800,height=400, align="center"] //["ditaa",shadows=false, separation=false, font=courier, fontsize: 16] //.... // +-------+-------+-------+ +-------+-------+ +-------+ @@ -174,7 +176,7 @@ image::ddt-ext.svg[width=800,height=400] //.... 
.Three, two and single-level device directory with base format `DC` -image::ddt-base.svg[width=800,height=400] +image::ddt-base.svg[width=800,height=400, align="center"] //["ditaa",shadows=false, separation=false, font=courier, fontsize: 16] //.... // +-------+-------+-------+ +-------+-------+ +-------+ @@ -213,6 +215,8 @@ A valid (`V==1`) non-leaf DDT entry provides the PPN of the next level DDT. ], config:{lanes: 2, hspace:1024, fontsize: 16}} .... +<<< + ==== Leaf DDT entry The leaf DDT page is indexed by `DDI[0]` and holds the device-context (`DC`). @@ -312,6 +316,15 @@ Such addresses also cannot be routed within the device when peer-to-peer transactions within the device (e.g. between functions of a device) are supported. +Use of `T2GPA` set to 1 may not be compatible with devices that implement caches +tagged by the translated address returned in response to a PCIe ATS Translation +Request. +==== + +<<< + +[NOTE] +==== Hypervisors that configure `T2GPA` to 1 must ensure through protocol-specific means that translated accesses are routed through the host such that the IOMMU may translate the GPA and then route the transaction based on PA to memory or @@ -319,10 +332,6 @@ to a peer device. For PCIe, for example, the Access Control Service (ACS) must be configured to always redirect peer-to-peer (P2P) requests upstream to the host. -Use of `T2GPA` set to 1 may not be compatible with devices that implement caches -tagged by the translated address returned in response to a PCIe ATS Translation -Request. - As an alternative to setting `T2GPA` to 1, the hypervisor may establish a trust relationship with the device if authentication protocols are supported by the device. For PCIe, for example, the PCIe component measurement and authentication @@ -406,8 +415,7 @@ When `SXL` is 1, the following rules apply: * If the first-stage is not Bare, then a page fault corresponding to the original access type occurs if the `IOVA` has bits beyond bit 31 set to 1. 
* If the second-stage is not Bare, then a guest page fault corresponding to the - original access type occurs if the incoming GPA has bits beyond bit 33 set to - 1. + original access type occurs if the incoming GPA has bits beyond bit 33 set to 1. ===== IO hypervisor guest address translation and protection (`iohgatp`) @@ -437,11 +445,11 @@ encodings are as follows: [[IOHGATP_MODE_ENC]] .Encodings of `iohgatp.MODE` field -[width=75%] -[%header, cols="3,3,20"] +[%autowidth,float="center",align="center"] +[%header, cols="^3,^3,20"] |=== 3+^| `fctl.GXL=0` -|Value | Name | Description +^|Value ^| Name ^| Description | 0 | `Bare` | No translation or protection. | 1-7 | -- | Reserved for standard use. | 8 | `Sv39x4` | Page-based 41-bit virtual addressing (2-bit extension @@ -524,11 +532,11 @@ address. [[IOSATP_MODE_ENC]] .Encodings of `iosatp.MODE` field -[width=75%] -[%header, cols="3,3,20"] +[%autowidth,float="center",align="center"] +[%header, cols="^3,^3,20"] |=== 3+^| `DC.tc.SXL=0` -|Value | Name | Description +^|Value ^| Name ^| Description | 0 | `Bare` | No translation or protection. | 1-7 | -- | Reserved for standard use. | 8 | `Sv39` | Page-based 39-bit virtual addressing. @@ -571,11 +579,11 @@ directly edit the PDT to associate a virtual-address space identified by a first-stage page table with a `process_id`. [[PDTP_MODE_ENC]] -.Encoding of `pdtp.MODE` field -[width=75%] -[%header, cols="3,3,20"] +.Encodings of `pdtp.MODE` field +[%autowidth,float="center",align="center"] +[%header, cols="^3,^3,20"] |=== -|Value | Name | Description +^|Value ^| Name ^| Description | 0 | `Bare` | No first-stage address translation or protection. | 1 | `PD8` | 8-bit process ID enabled. The directory has 1 levels with 256 entries.The bits 19:8 of `process_id` must be 0. @@ -607,11 +615,13 @@ defined by the Advanced Interrupt Architecture specification. The `msiptp.MODE` field is used to select the MSI address translation scheme. 
-.Encoding of `msiptp.MODE` field -[width=75%] -[%header, cols="3,3,20"] +<<< + +.Encodings of `msiptp.MODE` field +[%autowidth,float="center",align="center"] +[%header, cols="^3,^3,20"] |=== -|Value | Name | Description +^|Value ^| Name ^| Description | 0 | `Off` | Recognition of accesses to a virtual interrupt file using MSI address mask and pattern is not performed. @@ -882,6 +892,8 @@ misconfigured" (cause = 267). . `DC.tc.SXL` is 1 and `PC.fsc.MODE` is not one of the supported modes .. `capabilities.Sv32` is 0 and `PC.fsc.MODE` is `Sv32` +<<< + [NOTE] ==== Some `PC` fields hold supervisor physical addresses or @@ -1151,8 +1163,8 @@ file and translating the address using the MSI page table is as follows: process are equivalent to that of a regular RISC-V second-stage PTE with `R`=`W`=`U`=1 and `X`=0. Similar to a second-stage PTE, when checking the `U` bit, the transaction is treated as not requesting supervisor privilege. -. If the transaction is an Untranslated or Translated read-for-execute then stop - and report "Instruction access fault" (cause = 1). +.. If the transaction is an Untranslated or Translated read-for-execute then stop + and report "Instruction access fault" (cause = 1). . MSI address translation process is complete. [NOTE] @@ -1182,6 +1194,8 @@ PTEs atomically. When updating of A and D bits in second-stage PTEs is enabled memory access from the device using the translated address becomes globally visible. +<<< + [NOTE] ==== The A and D bits are never cleared by the IOMMU. If the supervisor software does @@ -1348,6 +1362,8 @@ of "Page Request". When the IOMMU generates the response, the status field of the response depends on the cause of the error. +<<< + The status is set to Response Failure if the following faults are encountered: * `ddtp.iommu_mode` is `Off` @@ -1399,6 +1415,8 @@ the following conditions: * "Page Request" could not be queued due to the page-request queue being full (`pqt == pqh - 1`) or had a overflow (`pqcsr.pqof == 1`). 
+<<< + [[CACHING]] === Caching in-memory data structures @@ -1424,10 +1442,10 @@ more IDs to tag the cached entries to identify a specific entry or a group of entries. .Identifiers used to tag IOATC entries -[width=90%] +[%autowidth,float="center",align="center"] [%header, cols="8,10,10"] |=== -|Data Structure cached |IDs used to tag entries | Invalidation command +^|Data Structure cached ^|IDs used to tag entries ^| Invalidation command |Device Directory Table |`device_id` | <> |Process Directory Table|`device_id`, `process_id` | <> |First-stage page table @@ -1498,8 +1516,8 @@ determined by `fctl.BE` or by `DC.tc.SBE` as follows: [[ENDIAN_CONFIG]] .Endianness of memory access to data structures -[width=75%] -[%header, cols="16,8"] +[%autowidth,float="center",align="center"] +[%header, cols="10,8"] |=== ^|Data Structure ^| Controlled by | Device directory table | `fctl.BE` diff --git a/src/iommu_debug.adoc b/src/iommu_debug.adoc index 25ee2f6b..f33b2b10 100644 --- a/src/iommu_debug.adoc +++ b/src/iommu_debug.adoc @@ -42,10 +42,10 @@ when the process completes (successfully or due to encountering a fault). When the `Go/Busy` bit goes from 1 to 0, a response is valid in the `tr_response` register. -The IOMMU behavior is `UNSPECIFIED` if: +When the `Go/Busy` bit is 1, the IOMMU behavior is `UNSPECIFIED` if: -* The `tr_req_iova` or `tr_req_ctl` are modified when the `Go/Busy` bit is 1. -* IOMMU configurations such as `ddtp.iommu_mode`, etc. are modified. +* The `tr_req_iova` or `tr_req_ctl` are modified. +* IOMMU configurations, such as `ddtp.iommu_mode`, are modified. The time to complete a translation request through this debug interface is `UNSPECIFIED` but is required to be finite. 
If the IOMMU is serving translation diff --git a/src/iommu_extensions.adoc b/src/iommu_extensions.adoc new file mode 100644 index 00000000..5332a466 --- /dev/null +++ b/src/iommu_extensions.adoc @@ -0,0 +1,5 @@ +[[extensions]] + +== IOMMU Extensions + +This chapter specifies the standard extensions to the IOMMU Base Architecture. diff --git a/src/iommu_hw_guidelines.adoc b/src/iommu_hw_guidelines.adoc index 17e2d3c5..bed0915b 100644 --- a/src/iommu_hw_guidelines.adoc +++ b/src/iommu_hw_guidelines.adoc @@ -53,6 +53,8 @@ enabling error detection, logging the detected errors (including their severity, nature, and location), and configuring means to report the error to an error handler. +<<< + Some errors, such as those in the IOATC, may be correctable by reloading the cached in-memory data structures when the error is detected. Such errors are not expected to affect the functioning of the IOMMU. diff --git a/src/iommu_in_memory_queues.adoc b/src/iommu_in_memory_queues.adoc index 6fdf4c4c..074868af 100644 --- a/src/iommu_in_memory_queues.adoc +++ b/src/iommu_in_memory_queues.adoc @@ -62,6 +62,8 @@ tail update are ordered such that the consumer that observes an update to the tail register must also observe all data produced into the queue between the offsets determined by the head and the tail. +<<< + [NOTE] ==== All RISC-V IOMMU implementations are required to support in-memory queues @@ -69,7 +71,6 @@ located in main memory. Supporting in-memory queues in I/O memory is not require but is not prohibited by this specification. ==== - === Command-Queue (CQ) Command queue is used by software to queue commands to be processed by the @@ -118,13 +119,13 @@ determined by `fctl.BE` (<>). The following command opcodes are defined: .IOMMU command opcodes -[width=100%] +[%autowidth,float="center",align="center"] [%header, cols="12,^12,70"] |=== |`opcode` | Encoding ^| Description |`IOTINVAL`| 1 | IOMMU page-table cache invalidation commands. 
|`IOFENCE` | 2 | IOMMU command-queue fence commands. -|`IOTDIR` | 3 | IOMMU directory cache invalidation commands. +|`IODIR` | 3 | IOMMU directory cache invalidation commands. |`ATS` | 4 | IOMMU PCIe cite:[PCI] ATS commands. | Reserved | 5-63 | Reserved for future standard use. | Custom | 64-127 | Designated for custom use. @@ -182,6 +183,8 @@ is Bare) are operated on. When `GV` is 0, the `GSCID` operand is ignored. When `AV` is 0, the `ADDR` operand is ignored. When `PSCV` operand is 0, the `PSCID` operand is ignored. +<<< + `IOTINVAL.VMA` ensures that previous stores made to the first-stage page tables by the harts are observed by the IOMMU before all subsequent implicit reads from IOMMU to the corresponding first-stage page tables. @@ -189,8 +192,8 @@ reads from IOMMU to the corresponding first-stage page tables. [[IVMA]] .`IOTINVAL.VMA` operands and operations -[width=75%] -[%header, cols="2,2,3,20"] +[%autowidth,float="center",align="center"] +[%header, cols="^2,^2,^3,20"] |=== |`GV`|`AV`|`PSCV`| Operation |0 |0 |0 | Invalidates all address-translation cache entries, including @@ -234,8 +237,8 @@ is illegal. [[IGVMA]] .`IOTINVAL.GVMA` operands and operations -[width=75%] -[%header, cols="2,2,20"] +[%autowidth,float="center",align="center"] +[%header, cols="^2,^2,20"] |=== | `GV` | `AV` | Operation | 0 | ignored| Invalidates information cached from any level of the @@ -245,8 +248,8 @@ is illegal. identified by the `GSCID` operand. | 1 | 1 | Invalidates information cached from leaf second-stage page table entries corresponding to the guest-physical-address in - `ADDR` operand, for only for VM address spaces identified - `GSCID` operand. + `ADDR` operand, but only for VM address spaces identified + by the `GSCID` operand. |=== [NOTE] @@ -257,7 +260,12 @@ that maps guest physical addresses to supervisor physical addresses. 
`IOTINVAL.GVMA` need not invalidate the former cache, but it must invalidate entries from the latter cache that match the `IOTINVAL.GVMA` address and `GSCID` operands. +==== +<<< + +[NOTE] +==== More commonly, implementations contain address-translation caches that map guest virtual addresses directly to supervisor physical addresses, removing a level of indirection. For such implementations, any entry whose guest virtual @@ -268,12 +276,13 @@ which is costly, and so a common technique is to invalidate all entries that match the `GSCID` argument, regardless of the address argument. Simpler implementations may ignore the operand of `IOTINVAL.VMA` and/or -`IOTINVAL.GVMA` and always perform a global invalidation of all +`IOTINVAL.GVMA` and perform a global invalidation of all address-translation entries. -==== -[NOTE] -==== +Some implementations may cache an identity-mapped translation for the stage of +address translation operating in `Bare` mode. Since these identity mappings +are invariably correct, an explicit invalidation is unnecessary. + A consequence of this specification is that an implementation may use any translation for an address that was valid at any time since the most recent `IOTINVAL` that subsumes that address. In particular, if a leaf PTE is @@ -378,16 +387,18 @@ may not pass previous posted writes. The ordering guarantees are made for accesses to main-memory. For accesses to I/O memory, the ordering guarantees are implementation and I/O protocol -defined. - -Simpler implementations may unconditionally order all previous memory accesses -globally. +defined. Simpler implementations may unconditionally order all previous memory +accesses globally. ==== -The `AV` command operand indicates if `ADDR[63:2]` operand and `DATA` operands are +The `AV` command operand indicates if `ADDR[63:2]` and `DATA` operands are valid. If `AV`=1, the IOMMU writes `DATA` to memory at a 4-byte aligned address `ADDR[63:2] * 4` as a 4-byte store when the command completes. 
When `AV` is 0, -the `ADDR[63:2]` and `DATA` operands are ignored. +the `ADDR[63:2]` and `DATA` operands are ignored. If the attempt to perform this +write encounters a memory fault, the `cqmf` bit in `cqcsr` <> is set to +signal this condition, and the `cqh` holds the index of the `IOFENCE.C` that +encountered such a memory fault and did not complete. + [NOTE] ==== @@ -608,8 +619,8 @@ The `CAUSE` is a code indicating the cause of the fault/event. [[FAULT_CAUSE]] .Fault record `CAUSE` field encodings -[width=75%] -[%header, cols="4,20,6"] +[%autowidth,float="center",align="center"] +[%header, cols="^4,20,^6"] |=== |CAUSE | Description | Reported if `DTF` is 1? |1 | Instruction access fault | No @@ -655,8 +666,8 @@ value assumed for reporting such faults is 0. The `TTYP` field reports inbound transaction type. .Fault record `TTYP` field encodings -[width=75%] -[%header, cols="3,20"] +[%autowidth,float="center",align="center"] +[%header, cols="^3,20"] |=== |TTYP | Description |0 | None. Fault not caused by an inbound transaction. @@ -673,10 +684,10 @@ The `TTYP` field reports inbound transaction type. |31 - 63| Designated for custom use |=== -If the `TTYP` is a transaction with an IOVA then its reported in `iotval`. If -the `TTYP` is a PCIe message request then the message code is reported in `iotval`. -If `TTYP` is 0, then the value reported in `iotval` and `iotval2` fields is -as defined by the `CAUSE`. +If the `TTYP` is a transaction with an IOVA, the IOVA is reported in `iotval`. If +the `TTYP` is a PCIe message request, the message code of the PCIe message +is reported in `iotval`. If `TTYP` is 0, the values reported in `iotval` and +`iotval2` fields are as defined by the `CAUSE`. [NOTE] ==== @@ -688,6 +699,8 @@ bridge in some implementations may not provide the page offset part of the Likewise, an IOMMU may report the page offset of a GPA in `iotval2` as 0. ==== +<<< + `DID` holds the `device_id` of the transaction. If `PV` is 0, then `PID` and `PRIV` are 0. 
If `PV` is 1, the `PID` holds a `process_id` of the transaction and if the privilege of the transaction was Supervisor then the `PRIV` bit is 1 @@ -724,7 +737,7 @@ fault-queue memory, the IOMMU sets the fault-queue memory access fault (`fqmf`) bit in `fqcsr`. While either error bit is set in `fqcsr`, the IOMMU discards the record that led to the fault and all further fault records. When an error bit in `fqcsr` is 1 or when a new fault record is produced in the fault-queue, -the fault interrupt pending (`fip`) bit is set in the `ipsr` if interrupts from +the fault interrupt pending (`fip`) bit is set in `ipsr` if interrupts from the fault-queue are enabled i.e. `fqcsr.fie` is 1. The IOMMU may identify multiple requests as having detected an identical fault. diff --git a/src/iommu_intro.adoc b/src/iommu_intro.adoc index f7d341a0..6fcbfe2f 100644 --- a/src/iommu_intro.adoc +++ b/src/iommu_intro.adoc @@ -52,8 +52,8 @@ management complexity for DMA. Use of an identical format also allows the same page tables to be used simultaneously by both the CPU MMU and the IOMMU. Although there is no option to disable two-stage address translation, either -stage may be effectivly disabled by configuring the virtual memory scheme for -that stage to be `Bare` i.e. perfom no address translation or memory protection. +stage may be effectively disabled by configuring the virtual memory scheme for +that stage to be `Bare` i.e. perform no address translation or memory protection. The virtual memory scheme employed by the IOMMU may be configured individually per device in the IOMMU. Devices perform DMA using an I/O virtual address (IOVA). @@ -87,7 +87,7 @@ is a VA. Two-stage address translation is in effect. The first-stage translates the VA to a GPA and the second-stage translates the GPA to a SPA. Each stage enforces the configured memory protections. 
Such a configuration would be typically be employed when the device control is passed-through to a virtual -machine and the Guest OS in the VM uses the first-stage addresss translation to +machine and the Guest OS in the VM uses the first-stage address translation to further constrain the memory accessed by such devices and associated privileges and memory protections. Comparing to a RISC-V hart, this configuration is analogous to two-stage address translation being in effect on a RISC-V hart with @@ -190,10 +190,6 @@ in the device context. address space of a process. The PASID value is provided in the PASID TLP prefix of the request. | PBMT | Page-Based Memory Types. -| PPN | Physical Page Number. -| PRI | Page Request Interface - a PCIe protocol that enables - devices to request OS memory manager services to make pages - resident cite:[PCI]. | PC | Process Context. | PCIe | Peripheral Component Interconnect Express bus standard cite:[PCI]. @@ -293,6 +289,10 @@ in <> the OS may configure the IOMMU with a page table to translate the IOVA and thereby limit the addresses that may be accessed to those allowed by the page table. +[[fig:device-isolation]] +.Device isolation in non-virtualized OS +image::non-virt-OS.svg[width=300,height=300, align="center"] + Legacy 32-bit devices cannot access the memory above 4 GiB. The IOMMU, through its address remapping capability, offers a simple mechanism for the device to directly access any address in the system (with appropriate access permission). @@ -313,9 +313,6 @@ When the IOMMU is used by a non-virtualized OS, the first-stage suffices to provide the required address translation and protection function and the second-stage may be set to Bare. -[[fig:device-isolation]] -.Device isolation in non-virtualized OS -image::non-virt-OS.svg[width=300,height=300] //["ditaa",shadows=false, separation=false, fontsize: 16] //.... @@ -360,7 +357,7 @@ and from D2 to VM-2 associated memory. 
[[fig:dma-translation-direct-device-assignment]] .DMA translation to enable direct device assignment -image::hypervisor.svg[width=300,height=300] +image::hypervisor.svg[width=300,height=300, align="center"] //["ditaa",shadows=false, separation=false, fontsize: 16] //.... //+----------------+ +----------------+ @@ -394,7 +391,7 @@ address, the same as supported by regular RISC-V page-based address translation. [[MSI_REDIR]] .MSI address translation to direct guest programmed MSI to IMSIC guest interrupt files -image::msi-imsic.svg[width=500,height=400] +image::msi-imsic.svg[width=500,height=400, align="center"] //["ditaa",shadows=false, separation=false, font=courier, fontsize: 16] //.... // +-----------------------+ @@ -440,6 +437,10 @@ hypervisor. <> illustrates the concept. +[[fig:iommu-for-guest-os]] +.Address translation in IOMMU for Guest OS +image::guest-OS.svg[width=500,height=400, align="center"] + The IOMMU is configured to perform address translation using a first-stage and second-stage page table for device D1. The second-stage is typically used by the hypervisor to translate GPA to SPA and limit the device D1 to memory @@ -452,9 +453,6 @@ The host OS or hypervisor may also retain a device, such as D3, for its own use. The first-stage suffices to provide the required address translation and protection function for device D3 and the second-stage is set to Bare. -[[fig:iommu-for-guest-os]] -.Address translation in IOMMU for Guest OS -image::guest-OS.svg[width=500,height=400] //["ditaa",shadows=false, separation=false, fontsize: 16] //.... //+---------------------------------------------------+ @@ -504,6 +502,8 @@ protocol by the IOMMU. The example shows an endpoint device with a device side ATC (DevATC) that holds translations obtained by the device from IOMMU 0 using the PCIe ATS protocol cite:[PCI]. 
+<<< + When such IO-protocol-to-system-fabric-protocol translation using a Root Port is not required, the devices may interface directly with the system fabric. The second IOMMU instance, IOMMU 1 (associated with the IO Bridge 1), @@ -596,10 +596,8 @@ image::interfaces.svg[width=800] Similar to the RISC-V harts, physical memory attributes (PMA) and physical memory protection (PMP) checks must be completed on all inbound IO transactions even when the IOMMU is in bypass (`Bare` mode). The placement and integration of -the PMA and PMP checkers is a platform choice. - -PMA and PMP checkers reside outside the IOMMU. The example above is showing -them in the IO Bridge. +the PMA and PMP checkers is a platform choice. PMA and PMP checkers reside +outside the IOMMU. The example above is showing them in the IO Bridge. Implicit accesses by the IOMMU itself through the Data Structure interface are checked by the PMA checker. PMAs are tightly tied to a given physical platform’s diff --git a/src/iommu_preface.adoc b/src/iommu_preface.adoc new file mode 100644 index 00000000..3df69ca4 --- /dev/null +++ b/src/iommu_preface.adoc @@ -0,0 +1,48 @@ +== Preface + +[.big]*_Preface to Version 20240901_* + +Chapters 2 through 8 of this document form the RISC-V IOMMU Base Architecture +Specification. Chapter 9 includes the standard extensions to the base +architecture. This release, version 20240901, contains the following versions +of the RISC-V IOMMU Base Architecture specification and standard extensions: + +[%autowidth,float="center",align="center",cols="^,^,^",options="header",] +|=== +| Specification |Version |Status +|*RISC-V IOMMU Base Architecture specification* + + *Quality-of-Service (QoS) Identifiers Extension* + |*1.0* + + *1.0* + |*Ratified* + + *Ratified* +|=== + +The following backward-compatible changes, comprising a set of clarifications +and corrections, have been made since version 1.0.0: + +* A set of typographic errors and editorial updates were made. 
+* Translations cached, if any, in `Bare` mode do not require invalidation. +* Clarified that memory faults encountered by commands also set the `cqmf` flag. +* Values tested by algorithms in SW Guidelines are before modifications made by + the algorithms. +* Included SW guidelines for modifying non-leaf PDT entries. +* Clarified the behavior for in-flight transactions observed at the time of `ddtp` + write operations. +* Clarified the behavior when `IOTINVAL` is invoked with an invalid address. +* Stated that faults leading to UR/CA ATS responses are reported in the Fault Queue. +* Added a detailed description of the `capabilities.PAS` field. +* SW guidelines for changing IOMMU modes and programming `tr_req_ctl` and HPM + counters. +* PCIe ATS Translation Resp. grants execute permission only if requested. +* Clarified the handling of hardware implementations that internally split + 8-byte transactions. +* Shadow stack encodings introduced by Zicfiss are reserved for IOMMU use. +* Listed the fault codes reported for faults detected by Page Request. +* Updated Fig 31 to remove the unused Destination ID field for ATS.PRGR + +These changes were made through PR#243 cite:[PR243]. + +[.big]*_Preface to Version 1.0.0_* + +* Ratified version of the RISC-V IOMMU Architecture Specification. diff --git a/src/iommu_registers.adoc b/src/iommu_registers.adoc index 56305e16..01876826 100644 --- a/src/iommu_registers.adoc +++ b/src/iommu_registers.adoc @@ -14,16 +14,16 @@ performed. 
[NOTE] ==== -The 8 byte IOMMU registers are defined in such a way that software can perform -two individual 4 byte accesses, or hardware can perform two independent 4 byte -transactions resulting from an 8 byte access, to the high and low halves of the -register as long as the register semantics, with regards to side-effects, are +The 8-byte IOMMU registers are defined in such a way that software can perform +two individual 4-byte accesses, or hardware can perform two independent 4-byte +transactions resulting from an 8-byte access, to the high and low halves of the +register, as long as the register semantics, with regard to side-effects, are respected between the two software accesses, or two hardware transactions, respectively. ==== -The IOMMU registers have little-endian byte order (even for systems where -all harts are big-endian-only). +The IOMMU registers have little-endian byte order, even for systems where +all harts are big-endian-only. [NOTE] ==== @@ -40,14 +40,14 @@ the register returns 0 and writes to that offset are ignored. === Register layout .IOMMU Memory-mapped register layout -[width=100%] +[%autowidth,float="center",align="center"] [%header, cols="^3,6,^3, 12, 10"] |=== -|Offset|Name |Size|Description | Is Optional? +|Offset ^|Name |Size ^|Description ^| Is Optional? |0 |`capabilities` |8 |<> | No |8 |`fctl` |4 |<> | No -|12 |_custom_ |4 |_Designated For custom use_ | +|12 |custom |4 |Designated For custom use | |16 |`ddtp` |8 |<> | No |24 |`cqb` |8 |<> | No @@ -68,8 +68,10 @@ the register returns 0 and writes to that offset are ignored. 
CSR >> | if `capabilities.ATS==0` |84 |`ipsr` |4 |<>| No -|88 |`iocntovf` |4 |<> | if `capabilities.HPM==0` -|92 |`iocntinh` |4 |<> | if `capabilities.HPM==0` +|88 |`iocountovf` |4 |<> | if `capabilities.HPM==0` +|92 |`iocountinh` |4 |<> | if `capabilities.HPM==0` |96 |`iohpmcycles` |8 |<> | if `capabilities.HPM==0` |104 |`iohpmctr1-31` |248 |<> | if `capabilities.HPM==0` |352 |`iohpmevt1-31` |248 |<> | if `capabilities.HPM==0` @@ -81,8 +83,8 @@ the register returns 0 and writes to that offset are ignored. response>> | if `capabilities.DBG==0` |624 |Reserved |64 |Reserved for future use (`WPRI`) | -|688 |_custom_ |72 |_Designated for custom use - (`WARL`)_ | +|688 |custom |72 |Designated for custom use + (`WARL`) | |760 |`icvec` |8 |<> | No |768 |`msi_cfg_tbl` |256 |<>, and accesses to in-memory queues are performed @@ -314,7 +318,7 @@ are enabled (i.e. `cqcsr.cqon/cqen == 1`, `fqcsr.fqon/cqen == 1`, or may be used for guest physical addresses as defined in <>. |15:3 |reserved |WPRI | Reserved for standard use. -|31:16 |_custom_ |WPRI | _Designated for custom use._ +|31:16 |custom |WPRI | Designated for custom use. |=== [[DDTP]] @@ -334,12 +338,12 @@ are enabled (i.e. `cqcsr.cqon/cqen == 1`, `fqcsr.fqon/cqen == 1`, or [width=100%] [%header, cols="^1,2,^1,5"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |3:0 |`iommu_mode` |WARL a| The IOMMU may be configured to be in the following modes: [%header, cols="^1,1,3"] !=== - !Value !Name ! Description + ^!Value ^!Name ^! Description !0 ! `Off` ! No inbound memory transactions are allowed by the IOMMU. @@ -403,7 +407,7 @@ the previous value of the `iommu_mode` is not `Off` or `Bare` is `UNSPECIFIED`. To change DDT levels, the IOMMU must first be transitioned to `Bare` or `Off` state. 
-When an IOMMU is transitioned to `Bare` of `Off` state, the IOMMU may retain +When an IOMMU is transitioned to `Bare` or `Off` state, the IOMMU may retain information cached from in-memory data structures such as page tables, DDT, PDT, etc. Software must use suitable invalidation commands to invalidate cached entries. @@ -441,7 +445,7 @@ assume a valid but otherwise `UNSPECIFIED` value. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field ^|Attribute ^| Description |4:0 |`LOG2SZ-1` |WARL a| The `LOG2SZ-1` field holds the number of entries in command-queue as a log to base 2 minus 1. @@ -488,7 +492,7 @@ the IOMMU will fetch the next command. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |31:0 |`index` |RO | Holds the `index` into the command-queue from where the next command will be fetched by the IOMMU. |=== @@ -510,7 +514,7 @@ the software queues the next command for the IOMMU. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |31:0 |`index` |WARL | Holds the `index` into the command-queue where software queues the next command for IOMMU. Only `LOG2SZ-1:0` bits are writable. @@ -544,7 +548,7 @@ assume a valid but otherwise `UNSPECIFIED` value. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |4:0 |`LOG2SZ-1`|WARL a| The `LOG2SZ-1` field holds the number of entries in the fault-queue as a log-to-base-2 minus 1. A value of 0 indicates a queue of 2 @@ -590,7 +594,7 @@ software will fetch the next fault record. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute |Description +|Bits ^|Field |Attribute ^|Description |31:0 |`index` |WARL | Holds the `index` into the fault-queue from which software reads the next fault record. Only `LOG2SZ-1:0` bits are writable. 
@@ -614,7 +618,7 @@ IOMMU queues the next fault record. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |31:0 |`index` |RO | Holds the `index` into the fault-queue where IOMMU writes the next fault record. |=== @@ -648,7 +652,7 @@ assume a valid but otherwise `UNSPECIFIED` value. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |4:0 |`LOG2SZ-1`|WARL | The `LOG2SZ-1` field holds the number of entries in the page-request-queue as a log-to-base-2 minus 1. A value of 0 indicates a queue of 2 entries. @@ -695,7 +699,7 @@ software will fetch the next page-request. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |31:0 |`index` |WARL | Holds the `index` into the page-request-queue from which software reads the next "Page Request" message. Only `LOG2SZ-1:0` bits are writable. @@ -719,11 +723,13 @@ where the IOMMU writes the next page-request. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |31:0 |`index` |RO | Holds the `index` into the page-request-queue where IOMMU writes the next "Page Request" message. |=== +<<< + [[CSR]] === Command-queue CSR (`cqcsr`) @@ -752,7 +758,7 @@ status of the command-queue. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |0 |`cqen` |RW | The command-queue-enable bit enables the command- queue when set to 1. + + @@ -796,7 +802,7 @@ status of the command-queue. sets the `cmd_ill` bit and stops processing from the command-queue. To re-enable command processing software should clear this bit by writing 1. 
-|11 |`fence_w_ip`|RW1C | An IOMMU that supports only wire-signaled-interrupts +|11 |`fence_w_ip`|RW1C | An IOMMU that supports wire-signaled-interrupts sets the `fence_w_ip` bit to indicate completion of an `IOFENCE.C` command. To re-enable interrupts on `IOFENCE.C` completion, @@ -822,8 +828,8 @@ status of the command-queue. + An IOMMU that can complete these operations synchronously may hard-wire this bit to 0. -|27:18 |reserved |WPRI | Reserved for standard use -|31:28 |_custom_ |WPRI | _Designated for custom use._ +|27:18 |reserved |WPRI | Reserved for standard use. +|31:28 |custom |WPRI | Designated for custom use. |=== When `cmd_ill` or `cqmf` is 1 in `cqcsr`, the `cqh` references the command in the @@ -852,6 +858,8 @@ to wait for all previous commands to be committed, if so desired, before turning off the command-queue. ==== +<<< + [[FQCSR]] === Fault queue CSR (`fqcsr`) @@ -879,7 +887,7 @@ status of the fault-queue. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |0 |`fqen`|RW | The fault-queue enable bit enables the fault-queue when set to 1. + + @@ -932,8 +940,8 @@ status of the fault-queue. + An IOMMU that can complete controls synchronously may hard-wire this bit to 0. -|27:18 |reserved |WPRI | Reserved for standard use -|31:28 |_custom_ |WPRI | _Designated for custom use._ +|27:18 |reserved |WPRI | Reserved for standard use. +|31:28 |custom |WPRI | Designated for custom use. |=== [[PQCSR]] @@ -963,11 +971,11 @@ status of the page-request-queue. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |0 |`pqen` |RW | The page-request-enable bit enables the page-request-queue when set to 1. + + - Changing `pqen` from 0 to 1, sets the `pqh` + Changing `pqen` from 0 to 1, sets the `pqt` register and the `pqcsr` bits `pqmf` and `pqof` to 0. 
The page-request-queue may take some time to be active following setting the `pqen` to 1. @@ -1042,7 +1050,7 @@ status of the page-request-queue. An IOMMU that can complete controls synchronously may hard-wire this bit to 0 |27:18 |reserved |WPRI | Reserved for standard use -|31:28 |_custom_ |WPRI | _Designated for custom use._ +|31:28 |custom |WPRI | Designated for custom use. |=== [[IPSR]] @@ -1082,7 +1090,7 @@ interrupt-pending bit. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |0 |`cip` |RW1C a| The command-queue-interrupt-pending bit is set to 1 if `cqcsr.cie` is 1 and any of the following are true: @@ -1111,20 +1119,22 @@ interrupt-pending bit. * `pqcsr.pqmf` is 1. * A new message is produced in the PQ. -|7:4 |reserved |WPRI | Reserved for standard use -|15:8 |_custom_ |WPRI | _Designated for custom use._ +|7:4 |reserved |WPRI | Reserved for standard use. +|15:8 |custom |WPRI | Designated for custom use. |31:16 |reserved |WPRI | Reserved for standard use |=== If a bit in `ipsr` is 1 then a write of 1 to the bit transitions the bit from 1->0. -If the conditions to set that bit are still present (See <>) or if +If the conditions to set that bit are still present (See <>) or if they occur after the bit is cleared then that bit transitions again from 0->1. +<<< + [[OVF]] === Performance-monitoring counter overflow status (`iocountovf`) The performance-monitoring counter overflow status is a 32-bit read-only register that contains shadow copies of the OF bits in the `iohpmevt1-31` -registers - where `iocntovf` bit X corresponds to `iohpmevtX` and bit 0 +registers - where `iocountovf` bit X corresponds to `iohpmevtX` and bit 0 corresponds to the `OF` bit of `iohpmcycles`. This register enables overflow interrupt handler software to quickly and easily @@ -1144,7 +1154,7 @@ determine which counter(s) have overflowed. 
[width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |0 |`CY` |RO | Shadow of `iohpmcycles.OF` |31:1 |`HPM` |RO | Shadow of `iohpmevt[1-31].OF` |=== @@ -1169,7 +1179,7 @@ when set inhibits counting in `iohpmctrX` and bit 0 inhibits counting in [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |0 |`CY` |RW | When set, `iohpmcycles` counter is inhibited from counting. |31:1 |`HPM` |WARL | When bit X is set, then counting of events in @@ -1184,6 +1194,8 @@ inhibit all counters allows a) one or more counters to be atomically programmed with events to count b) one or more counters to be sampled atomically. ==== +<<< + [[CYC]] === Performance-monitoring cycles counter (`iohpmcycles`) This 64-bit register is a free running clock cycle counter. @@ -1202,7 +1214,7 @@ There is no associated `iohpmevt0`. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |62:0 |`counter`|WARL | Cycles counter value. |63 |`OF` |RW | Overflow |=== @@ -1236,7 +1248,7 @@ These registers are 64-bit WARL counter registers. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |63:0 |`counter`|WARL | Event counter value. |=== @@ -1272,7 +1284,7 @@ transactions with a range of IDs to be counted by the counter. [width=100%] [%header, cols="^1,2,^1,5"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |14:0 |`eventID` |WARL a| Indicates the event to count. A value of 0 indicates no events are counted. + Encodings 1 to 16383 are reserved for standard @@ -1443,7 +1455,7 @@ translation-request interface for debug. 
This register is present when [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description | 11:0 |reserved |WPRI | Reserved for standard use | 63:12 |`vpn` |WARL | The IOVA virtual page number |=== @@ -1475,7 +1487,7 @@ translation-request interface for debug. This register is present when [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description | 0 |`Go/Busy` |RW1S | This bit is set to indicate a valid request has been setup in the `tr_req_iova/tr_req_ctl` registers @@ -1504,8 +1516,8 @@ translation-request interface for debug. This register is present when for this translation request. If set to 0 then the `PID` field is not used and a `process_id` is not valid for this translation request. -| 35:33 |reserved |WPRI | Reserved for standard use -| 39:36 a|_custom_ |WPRI a| _Designated for custom use_ +| 35:33 |reserved |WPRI | Reserved for standard use. +| 39:36 a|custom |WPRI a| Designated for custom use. | 63:40 |`DID` |WARL | This field provides the `device_id` for this translation request. |=== @@ -1534,7 +1546,7 @@ This register is present when `capabilities.DBG == 1`. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |0 |`fault` |RO | If the process to translate the IOVA detects a fault then the `fault` field is set to 1. The detected fault may be reported through the @@ -1575,17 +1587,17 @@ This register is present when `capabilities.DBG == 1`. + .Example of encoding of super page size in `PPN` -[width=80%] +[%autowidth,float="center",align="center"] [%header, cols="3,^1,2"] !=== - ! `PPN` !`S`! Size + ^! `PPN` !`S` ^! Size !`yyyy....yyyy yyyy yyyy` !`0`! 4 KiB !`yyyy....yyyy yyyy 0111` !`1`! 64 KiB !`yyyy....yyy0 1111 1111` !`1`! 2 MiB !`yyyy....yy01 1111 1111` !`1`! 
4 MiB !=== -|59:54 |reserved |RO | Reserved for standard use -|63:60 a|_custom_|RO a| _Designated for custom use_ +|59:54 |reserved |RO | Reserved for standard use. +|63:60 a|custom |RO a| Designated for custom use. |=== [NOTE] @@ -1597,6 +1609,45 @@ allowed to report a 4 KiB translation corresponding to the requested size configured in the page tables. ==== +[[IOQOSID]] +=== IOMMU QoS ID (`iommu_qosid`) + +The `iommu_qosid` register fields are defined as follows: + +.`iommu_qosid` register fields + +[wavedrom, , ] +.... +{reg: [ + {bits: 12, name: 'RCID'}, + {bits: 4, name: 'WPRI'}, + {bits: 12, name: 'MCID'}, + {bits: 4, name: 'WPRI'}, +], config:{lanes: 1, hspace:1024}} +.... + +[width=100%] +[%header, cols="^1,2,^1,5"] +|=== +|Bits ^|Field |Attribute ^| Description +|11:0 |`RCID` |WARL | `RCID` for IOMMU-initiated requests. +|15:12 |reserved |WPRI | Reserved for standard use. +|27:16 |`MCID` |WARL | `MCID` for IOMMU-initiated requests. +|31:28 |reserved |WPRI | Reserved for standard use. +|=== + +IOMMU-initiated requests for accessing the following data structures use the +value programmed in the `RCID` and `MCID` fields of the `iommu_qosid` register. + +* Device directory table (`DDT`) +* Fault queue (`FQ`) +* Command queue (`CQ`) +* Page-request queue (`PQ`) +* IOMMU-initiated MSI (Message-signaled interrupts) + +When `ddtp.iommu_mode == Bare`, all device-originated requests are +associated with the QoS IDs configured in the `iommu_qosid` register. + [[ICVEC]] === Interrupt-cause-to-vector register (`icvec`) @@ -1637,7 +1688,7 @@ supported then only bit 0 for each cause could be writable. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description | 3:0 |`civ` |WARL | The command-queue-interrupt-vector (`civ`) is the vector number assigned to the command-queue-interrupt. @@ -1650,8 +1701,8 @@ supported then only bit 0 for each cause could be writable. 
| 15:12 |`piv` |WARL | The page-request-queue-interrupt-vector (`piv`) is the vector number assigned to the page-request-queue-interrupt. -| 31:16 |reserved |WPRI | Reserved for standard use -| 63:32 |_custom_ |WPRI | _Designated for custom use_ +| 31:16 |reserved |WPRI | Reserved for standard use. +| 63:32 |custom |WPRI | Designated for custom use. |=== [[MSI]] @@ -1692,7 +1743,7 @@ and `iotval` set to the value of `msi_addr_x`. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute |Description +|Bits ^|Field |Attribute ^|Description |1:0 | 0 |RO |Fixed to 0 |55:2 |`ADDR`|WARL |Holds the 4-byte aligned MSI address. |63:56 |reserved|WPRI | Reserved for standard use. @@ -1710,7 +1761,7 @@ and `iotval` set to the value of `msi_addr_x`. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute |Description +|Bits ^|Field |Attribute ^|Description |31:0 |`data`| WARL | Holds the MSI data |=== @@ -1726,7 +1777,7 @@ and `iotval` set to the value of `msi_addr_x`. [width=100%] [%header, cols="^1,1,^1,6"] |=== -|Bits |Field |Attribute | Description +|Bits ^|Field |Attribute ^| Description |0 |`M` |RW | When the mask bit `M` is 1, the corresponding interrupt vector is masked and the IOMMU is prohibited from sending the associated message. 
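Reviewer note on the `tr_response` hunk earlier in this file's diff: the "Example of encoding of super page size in `PPN`" table can be read as a NAPOT-style encoding, where the number of trailing 1 bits in `PPN` selects the size when `S` is 1. The following is a minimal sketch of that reading, not text from the specification. The function name `tr_response_page_size` and the constant `PAGE_SHIFT` are hypothetical, and the rule is inferred only from the four rows of the table.

```python
PAGE_SHIFT = 12  # assumed 4 KiB base page, matching the S=0 row of the table


def tr_response_page_size(ppn: int, s: int) -> int:
    """Return the translation size in bytes implied by the PPN and S fields.

    Hypothetical helper: with S=0 the size is 4 KiB; with S=1 the size is
    inferred from the count of trailing 1 bits in PPN, following the
    pattern of the example table (0111 -> 64 KiB, 0 1111 1111 -> 2 MiB).
    """
    if s == 0:
        return 1 << PAGE_SHIFT
    # Count consecutive 1 bits starting at bit 0 of the PPN.
    trailing_ones = 0
    while (ppn >> trailing_ones) & 1:
        trailing_ones += 1
    # n trailing ones followed by a 0 encode a region of 2^(n+1) base pages.
    return 1 << (PAGE_SHIFT + trailing_ones + 1)
```

For instance, `tr_response_page_size(0x1FF, 1)` evaluates to 4 MiB under this reading, which agrees with the last row of the table.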
diff --git a/src/iommu_sw_guidelines.adoc b/src/iommu_sw_guidelines.adoc index 8585bb1e..b3191277 100644 --- a/src/iommu_sw_guidelines.adoc +++ b/src/iommu_sw_guidelines.adoc @@ -144,7 +144,6 @@ If software changes a leaf-level DDT entry (i.e, a device context (`DC`), of device with `device_id = D`) then the following invalidations must be performed: * `IODIR.INVAL_DDT` with `DV=1` and `DID=D` -* If `DC.tc.PDTV==1` then `IODIR.INVAL_PDT` with `DV=1`, `PV=0`, and `DID=D` * If `DC.iohgatp.MODE != Bare` ** `IOTINVAL.VMA` with `GV=1`, `AV=PSCV=0`, and `GSCID=DC.iohgatp.GSCID` @@ -170,7 +169,7 @@ If software changes a leaf-level PDT entry (i.e, a process context (`PC`), for `device_id=D` and `process_id=P`) then the following invalidations must be performed: -* `IODIR.INVAL_PDT` with `DV=1`, `PV=1`, `DID=D` and `PID=P` +* `IODIR.INVAL_PDT` with `DV=1`, `DID=D` and `PID=P` * If `DC.iohgatp.MODE != Bare` ** `IOTINVAL.VMA` with `GV=1`, `AV=0`, `PV=1`, `GSCID=DC.iohgatp.GSCID`, and `PSCID=PC.PSCID` @@ -245,6 +244,8 @@ Between a change to the first-stage PTE and when an invalidation command to invalidate the cached PTE is processed by the IOMMU, the IOMMU may use the old PTE value or the new PTE value. +<<< + ==== Accessed (A)/Dirty (D) bit updates and page promotions When IOMMU supports hardware-managed A and D bit updates, if software clears @@ -302,16 +303,16 @@ the DevATC may be satisfied by the IOMMU from the IOATC, to ensure correct operation software must first invalidate the IOATC before sending invalidations to the DevATC. +<<< + ==== Caching invalid entries This specification does not allow the caching of first/second-stage PTEs whose `V` (valid) bit is clear, non-leaf DDT entries whose `V` (valid) bit is clear, Device-context whose `V` (valid) bit is clear, non-leaf PDT entries whose `V` (valid) bit is clear, Process-context whose `V` (valid) bit is clear, or MSI -PTEs whose `V` bit is clear. 
- -Software need not perform invalidations when changing the `V` bit in these -entries from 0 to 1. +PTEs whose `V` bit is clear. Software need not perform invalidations when +changing the `V` bit in these entries from 0 to 1. === Reconfiguring PMAs diff --git a/src/riscv-iommu.adoc b/src/riscv-iommu.adoc index c0855744..bc20bd97 100644 --- a/src/riscv-iommu.adoc +++ b/src/riscv-iommu.adoc @@ -21,8 +21,9 @@ include::../docs-resources/global-config.adoc[] :lang: en :listing-caption: Listing :sectnums: +:sectnumlevels: 5 :toc: left -:toclevels: 4 +:toclevels: 5 :source-highlighter: pygments ifdef::backend-pdf[] :source-highlighter: coderay @@ -34,17 +35,17 @@ endif::[] :xrefstyle: short // Preamble - Begin -[preface] -== List of figures -list-of::image[hide_empty_section=true, enhanced_rendering=true] +//[preface] +//== List of figures +//list-of::image[hide_empty_section=true, enhanced_rendering=true] -[preface] -== List of tables -list-of::table[hide_empty_section=true, enhanced_rendering=true] +//[preface] +//== List of tables +//list-of::table[hide_empty_section=true, enhanced_rendering=true] -[preface] -== List of listings -list-of::listing[hide_empty_section=true, enhanced_rendering=true] +//[preface] +//== List of listings +//list-of::listing[hide_empty_section=true, enhanced_rendering=true] [WARNING] .This document is link:http://riscv.org/spec-state[Ratified]. 
@@ -67,6 +68,7 @@ include::contributors.adoc[] // Preamble - End :imagesdir: images +include::iommu_preface.adoc[] include::iommu_intro.adoc[] include::iommu_data_structures.adoc[] include::iommu_in_memory_queues.adoc[] @@ -74,4 +76,5 @@ include::iommu_debug.adoc[] include::iommu_registers.adoc[] include::iommu_sw_guidelines.adoc[] include::iommu_hw_guidelines.adoc[] +include::iommu_extensions.adoc[] include::bibliography.adoc[] From 8b6491199ce63884faaa232a4a8e0f2e1c411873 Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Tue, 15 Aug 2023 12:56:42 -0500 Subject: [PATCH 02/17] Clarified that translations cached in IOMMU ATC do not require explicit invalidation when the IOMMU operates in Bare mode. --- src/iommu_in_memory_queues.adoc | 4 ++++ src/iommu_sw_guidelines.adoc | 4 ++-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/src/iommu_in_memory_queues.adoc b/src/iommu_in_memory_queues.adoc index 074868af..5778ea8e 100644 --- a/src/iommu_in_memory_queues.adoc +++ b/src/iommu_in_memory_queues.adoc @@ -283,6 +283,10 @@ Some implementations may cache an identity-mapped translation for the stage of address translation operating in `Bare` mode. Since these identity mappings are invariably correct, an explicit invalidation is unnecessary. +Some implementations may cache an identity-mapped translation for the stage of +address translation operating in `Bare` mode. Since these identity mappings +are invariably correct, an explicit invalidation is unnecessary. + A consequence of this specification is that an implementation may use any translation for an address that was valid at any time since the most recent `IOTINVAL` that subsumes that address. 
In particular, if a leaf PTE is diff --git a/src/iommu_sw_guidelines.adoc b/src/iommu_sw_guidelines.adoc index b3191277..b6398a39 100644 --- a/src/iommu_sw_guidelines.adoc +++ b/src/iommu_sw_guidelines.adoc @@ -149,9 +149,9 @@ device with `device_id = D`) then the following invalidations must be performed: ** `IOTINVAL.VMA` with `GV=1`, `AV=PSCV=0`, and `GSCID=DC.iohgatp.GSCID` ** `IOTINVAL.GVMA` with `GV=1`, `AV=0`, and `GSCID=DC.iohgatp.GSCID` * else -** If `DC.tc.PDTV==1 || DC.tc.PDTV == 0 && DC.fsc.MODE == Bare` +** If `DC.tc.PDTV==1` *** `IOTINVAL.VMA` with `GV=AV=PSCV=0` -** else +** else if `DC.fsc.MODE != Bare` *** `IOTINVAL.VMA` with `GV=AV=0` and `PSCV=1`, and `PSCID=DC.ta.PSCID` If software changes a non-leaf-level DDT entry the following invalidations From 5b3a4801e5c0e2579f0a171e9a735235576e5750 Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Mon, 28 Aug 2023 11:28:49 -0500 Subject: [PATCH 03/17] Clarified that memory faults encountered by commands also set the `cqmf` flag. --- src/iommu_in_memory_queues.adoc | 1 - src/iommu_registers.adoc | 11 ++++++----- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/src/iommu_in_memory_queues.adoc b/src/iommu_in_memory_queues.adoc index 5778ea8e..4a0ab4fc 100644 --- a/src/iommu_in_memory_queues.adoc +++ b/src/iommu_in_memory_queues.adoc @@ -403,7 +403,6 @@ write encounters a memory fault, the `cmd_mf` bit in `cqcsr` <> is set to signal this condition, and the `cqh` holds the index of the `IOFENCE.C` that encountered such a memory fault and did not complete. - [NOTE] ==== Software may configure the `ADDR[63:2]` command operand to specify the address diff --git a/src/iommu_registers.adoc b/src/iommu_registers.adoc index 01876826..6492cb94 100644 --- a/src/iommu_registers.adoc +++ b/src/iommu_registers.adoc @@ -785,11 +785,12 @@ status of the command-queue. generation of interrupts from command-queue when set to 1. 
|7:2 |reserved|WPRI | Reserved for standard use -|8 |`cqmf` |RW1C | If command-queue access leads to a memory fault then - the command-queue-memory-fault bit is set to 1 and - the command-queue stalls until this bit is cleared. - To re-enable command processing, software should - clear this bit by writing 1. +|8 |`cqmf` |RW1C | If command-queue access to fetch a command or a + memory access made by a command leads to a memory + fault, then the command-queue-memory-fault bit is set + to 1, and the command-queue stalls until this bit is + cleared. To re-enable command processing, software + should clear this bit by writing 1. |9 |`cmd_to`|RW1C | If the execution of a command leads to a timeout (e.g. a command to invalidate device ATC may timeout waiting for a completion), then the From 89c53f507790cb89bcdb1ebae98884d4929a4d36 Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Fri, 8 Sep 2023 11:52:50 -0500 Subject: [PATCH 04/17] Clarified that values tested by the algorithm in the SW Guidelines section are those before any modifications made by the algorithm. --- src/iommu_sw_guidelines.adoc | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/iommu_sw_guidelines.adoc b/src/iommu_sw_guidelines.adoc index b6398a39..cfb7e6fb 100644 --- a/src/iommu_sw_guidelines.adoc +++ b/src/iommu_sw_guidelines.adoc @@ -138,6 +138,10 @@ previous read and/or write requests, that have already been processed by the IOMMU, be committed to a global ordering point as part of the `IOFENCE.C` command. +In subsequent sections, when an algorithm step tests values in the in-memory +data structures to determine the type of invalidation operation to perform, the +data values tested are the old values i.e. values before a change is made. 
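Reviewer note: the invalidation sequence for a changed leaf-level DDT entry, as the list reads after PATCH 02/17, can be sketched as a function over the old device-context values. This is only an illustration under stated assumptions. The names `OldDeviceContext` and `dc_change_invalidations` are hypothetical, commands are modeled as plain `(name, operands)` tuples rather than the real command-queue encoding, and `BARE = 0` is assumed as the `Bare` encoding of both `iohgatp.MODE` and `fsc.MODE`.

```python
from dataclasses import dataclass

BARE = 0  # assumed encoding of Bare for iohgatp.MODE and fsc.MODE


@dataclass
class OldDeviceContext:
    """Old (pre-change) DC field values that the algorithm tests."""
    iohgatp_mode: int
    iohgatp_gscid: int
    tc_pdtv: int
    fsc_mode: int
    ta_pscid: int


def dc_change_invalidations(device_id: int, dc: OldDeviceContext) -> list:
    """Return the invalidation commands for a changed leaf DDT entry."""
    # The DDT entry itself is always invalidated.
    cmds = [("IODIR.INVAL_DDT", {"DV": 1, "DID": device_id})]
    if dc.iohgatp_mode != BARE:
        # Second stage active: invalidate by GSCID.
        cmds.append(("IOTINVAL.VMA",
                     {"GV": 1, "AV": 0, "PSCV": 0, "GSCID": dc.iohgatp_gscid}))
        cmds.append(("IOTINVAL.GVMA",
                     {"GV": 1, "AV": 0, "GSCID": dc.iohgatp_gscid}))
    elif dc.tc_pdtv == 1:
        # Process directory in use: invalidate all host first-stage entries.
        cmds.append(("IOTINVAL.VMA", {"GV": 0, "AV": 0, "PSCV": 0}))
    elif dc.fsc_mode != BARE:
        # Single first-stage page table: invalidate by PSCID.
        cmds.append(("IOTINVAL.VMA",
                     {"GV": 0, "AV": 0, "PSCV": 1, "PSCID": dc.ta_pscid}))
    return cmds
```

When both stages are `Bare` and `PDTV` is 0, the sketch emits only `IODIR.INVAL_DDT`, which matches the intent stated in the PATCH 02/17 subject that identity mappings need no explicit invalidation.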
+ [[DC_CHANGE]] ==== Changing device directory table entry If software changes a leaf-level DDT entry (i.e, a device context (`DC`), of From 51fb75257a2c82607e55e430283ef2ef57122f62 Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Fri, 8 Sep 2023 11:53:40 -0500 Subject: [PATCH 05/17] Included SW guidelines for modifying non-leaf PDT entries. --- src/iommu_sw_guidelines.adoc | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/src/iommu_sw_guidelines.adoc b/src/iommu_sw_guidelines.adoc index cfb7e6fb..d646d99c 100644 --- a/src/iommu_sw_guidelines.adoc +++ b/src/iommu_sw_guidelines.adoc @@ -180,6 +180,11 @@ performed: * else ** `IOTINVAL.VMA` with `GV=0`, `AV=0`, `PV=1`, and `PSCID=PC.PSCID` +If software changes a non-leaf-level PDT entry the following invalidations +must be performed: + +* `IODIR.INVAL_DDT` with `DV=1` and `DID=D` + Between a change to the PDT entry and when an invalidation command to invalidate the cached entry is processed by the IOMMU, the IOMMU may use the old value or the new value of the entry. From f8b00bf7b885f79339626eaa7ca5d4fd0d3ede2e Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Tue, 19 Sep 2023 11:29:27 -0500 Subject: [PATCH 06/17] Clarified the behavior for in-flight transactions observed at the time of `ddtp` write operations. --- src/iommu_registers.adoc | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/src/iommu_registers.adoc b/src/iommu_registers.adoc index 6492cb94..69c2c932 100644 --- a/src/iommu_registers.adoc +++ b/src/iommu_registers.adoc @@ -396,11 +396,13 @@ subset of directory-table levels and device-context widths. At a minimum one of the modes must be supported. 
When the `iommu_mode` field value is changed to `Off` the IOMMU guarantees that -in-flight transactions from devices connected to the IOMMU will be processed -with the configurations applicable to the old value of the `iommu_mode` field -and that all transactions and previous requests from devices that have already -been processed by the IOMMU be committed to a global ordering point such that -they can be observed by all RISC-V harts, devices, and IOMMUs in the platform. +in-flight transactions, observed at the time of the write to this field, from devices +connected to the IOMMU will either be processed with the configurations +applicable to the old value of the `iommu_mode` field or be aborted +(<>). It also ensures that all transactions and previous +requests from devices that have already been processed by the IOMMU are committed +to a global ordering point such that they can be observed by all RISC-V harts, +devices, and IOMMUs in the platform. The IOMMU behavior of writing `iommu_mode` to `1LVL`, `2LVL`, or `3LVL`, when the previous value of the `iommu_mode` is not `Off` or `Bare` is `UNSPECIFIED`. From cd6c7983a624515c8c878f9d24de89d632c50ca9 Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Sat, 24 Feb 2024 16:37:30 -0600 Subject: [PATCH 07/17] Clarified the behavior when `IOTINVAL` is invoked with an invalid address. --- src/iommu_in_memory_queues.adoc | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/src/iommu_in_memory_queues.adoc b/src/iommu_in_memory_queues.adoc index 4a0ab4fc..7c8aaa66 100644 --- a/src/iommu_in_memory_queues.adoc +++ b/src/iommu_in_memory_queues.adoc @@ -181,7 +181,17 @@ operand is valid. Setting `PSCV` to 1 is allowed only for `IOTINVAL.VMA`. The the translations associated with the host (i.e. those where the second-stage is Bare) are operated on. When `GV` is 0, the `GSCID` operand is ignored. When `AV` is 0, the `ADDR` operand is ignored. When `PSCV` operand is 0, the -`PSCID` operand is ignored. 
+`PSCID` operand is ignored. When the `AV` operand is set to 1, if the `ADDR` +operand specifies an invalid address, the command may or may not perform any +invalidations. + +[NOTE] +==== +When an invalid address is specified, an implementation may either complete the +command with no effect or may complete the command using an alternate, yet +`UNSPECIFIED`, legal value for the address. Note that entries may generally be +invalidated from the address translation cache at any time. +==== <<< From c343c0491a0cfe279a4ebd1b2437badef9cb6f3c Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Tue, 9 Apr 2024 09:15:57 -0500 Subject: [PATCH 08/17] Stated that faults leading to UR/CA ATS responses are reported in the Fault Queue. --- src/iommu_data_structures.adoc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/iommu_data_structures.adoc b/src/iommu_data_structures.adoc index 765c993d..8ac089cc 100644 --- a/src/iommu_data_structures.adoc +++ b/src/iommu_data_structures.adoc @@ -1271,7 +1271,9 @@ requires read permission to be granted if the execute permission is granted. When a Success response is generated for an ATS translation request, no fault records are reported to software through the fault/event reporting mechanism, even when the response indicates no access was granted or some permissions were -denied. +denied. Conversely, when a UR or CA response is generated for an ATS translation +request, the corresponding fault is reported to software through the fault/event +reporting mechanism. If the translation request has an address determined to be an MSI address using the rules defined by the <> but the MSI PTE is configured in MRIF From 02fee6d59aa7e11bc522e56308ed3c4132191276 Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Tue, 9 Apr 2024 09:16:35 -0500 Subject: [PATCH 09/17] Added a detailed description of the `capabilities.PAS` field. 
--- src/iommu_registers.adoc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/iommu_registers.adoc b/src/iommu_registers.adoc index 69c2c932..4e4fb58e 100644 --- a/src/iommu_registers.adoc +++ b/src/iommu_registers.adoc @@ -238,6 +238,9 @@ must be supported. IOMMU implementations must support the Svnapot standard extension for NAPOT Translation Contiguity. +The physical address space addressable by the IOMMU ranges from 0 to +stem:[2^{capabilities.PAS} - 1]. + [NOTE] ==== Hypervisor may provide an SW emulated IOMMU to allow the guest to manage From 61a352729d174b9417ca8ef9b602f4fed8434036 Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Wed, 17 Apr 2024 13:08:19 -0500 Subject: [PATCH 10/17] Included software guidelines for changing IOMMU modes and provided RV32-specific guidelines for programming tr_req_ctl and HPM counters. --- src/iommu_registers.adoc | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/src/iommu_registers.adoc b/src/iommu_registers.adoc index 4e4fb58e..deec4a48 100644 --- a/src/iommu_registers.adoc +++ b/src/iommu_registers.adoc @@ -405,12 +405,14 @@ applicable to the old value of the `iommu_mode` field or be aborted (<>). It also ensures that all transactions and previous requests from devices that have already been processed by the IOMMU are committed to a global ordering point such that they can be observed by all RISC-V harts, -devices, and IOMMUs in the platform. +devices, and IOMMUs in the platform. Software must not change the `PPN` field +value when transitioning the `iommu_mode` to `Off`. The IOMMU behavior of writing `iommu_mode` to `1LVL`, `2LVL`, or `3LVL`, when the previous value of the `iommu_mode` is not `Off` or `Bare` is `UNSPECIFIED`. To change DDT levels, the IOMMU must first be transitioned to `Bare` or `Off` -state. +state. The behavior resulting from changing the `iommu_mode` to `Bare` when the +previous value of the `iommu_mode` was not `Off` is `UNSPECIFIED`. 
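Reviewer note: the `iommu_mode` transition rules stated above can be sketched as a small predicate that driver software might use as a sanity check before writing `ddtp`. This is a hedged illustration, not part of the patch. The function name `mode_write_is_specified` is hypothetical, and the numeric encodings other than `Off` (0) are assumed from the published mode table rather than taken from this excerpt.

```python
# Assumed iommu_mode encodings; only Off = 0 is visible in this excerpt.
OFF, BARE, LVL1, LVL2, LVL3 = 0, 1, 2, 3, 4


def mode_write_is_specified(old_mode: int, new_mode: int) -> bool:
    """Return True if writing new_mode over old_mode has specified behavior."""
    if new_mode == OFF:
        # Transitioning to Off is always permitted.
        return True
    if new_mode == BARE:
        # Strict reading: Bare is only entered from Off.
        return old_mode == OFF
    if new_mode in (LVL1, LVL2, LVL3):
        # DDT modes are only entered from Off or Bare.
        return old_mode in (OFF, BARE)
    raise ValueError(f"reserved or custom iommu_mode encoding: {new_mode}")
```

Under this reading, changing from `2LVL` to `3LVL` requires passing through `Off` first, since a direct `2LVL` to `Bare` write is itself `UNSPECIFIED`.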
When an IOMMU is transitioned to `Bare` or `Off` state, the IOMMU may retain information cached from in-memory data structures such as page tables, DDT, @@ -1198,6 +1200,11 @@ When the `iohpmcycles` counter is not needed, it is desirable to conditionally inhibit it to reduce energy consumption. Providing a single register to inhibit all counters allows a) one or more counters to be atomically programmed with events to count b) one or more counters to be sampled atomically. + +To initialize an event counter or the cycles counter to a desired value, it +should be first inhibited if it is enabled to count. This measure ensures that +it does not count during the update process. The inhibition should be removed +after the register has been programmed with the desired value. ==== <<< @@ -1528,6 +1535,11 @@ translation-request interface for debug. This register is present when this translation request. |=== +[NOTE] +==== +In RV32, the high half of the register should be written first, followed by the +low half, which includes the `Go/Busy` bit, to initiate a translation. +==== [[TRR_RSP]] === Translation-response (`tr_response`) From 628db6084eb24682263c8df21a9b51cfb8a38505 Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Fri, 19 Apr 2024 06:33:56 -0500 Subject: [PATCH 11/17] Stated that the PCIe specification requires granting execute permission in translation responses only if explicitly requested. --- src/iommu_data_structures.adoc | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/iommu_data_structures.adoc b/src/iommu_data_structures.adoc index 8ac089cc..84898fce 100644 --- a/src/iommu_data_structures.adoc +++ b/src/iommu_data_structures.adoc @@ -1260,13 +1260,14 @@ process-context is 0 then a Success response with R and W bits set to 0 is generated. 
If the translation could be successfully completed but the requested -permissions are not present (Execute requested but no execute permission; +permissions are not present in either stage (Execute requested but no execute permission; no-write not requested and no write permission; no read permission) then a Success response is returned with the denied permission (R, W or X) set to 0 and the other permission bits set to the value determined from the page tables. The X permission is granted only if the R permission is also -granted. Execute-only translations are not compatible with PCIe ATS as PCIe -requires read permission to be granted if the execute permission is granted. +granted and the execute permission was requested. Execute-only translations are +not compatible with PCIe ATS as PCIe requires read permission to be granted +if the execute permission is granted. When a Success response is generated for an ATS translation request, no fault records are reported to software through the fault/event reporting mechanism, From 371eb3d32aa574bc019b12755f8c5700c08a2c16 Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Fri, 19 Apr 2024 08:48:20 -0500 Subject: [PATCH 12/17] Clarified the handling of hardware implementations that internally split 8-byte transactions. --- src/iommu_registers.adoc | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/iommu_registers.adoc b/src/iommu_registers.adoc index deec4a48..0b643ee6 100644 --- a/src/iommu_registers.adoc +++ b/src/iommu_registers.adoc @@ -9,15 +9,15 @@ the size of the access, or if the access spans multiple registers, or if the size of the access is not 4 bytes or 8 bytes, is `UNSPECIFIED`. A 4 byte access to an IOMMU register must be single-copy atomic. Whether an 8 byte access to an IOMMU register is single-copy atomic is `UNSPECIFIED`, and such an access may -appear, internally to the IOMMU, as if two separate 4 byte accesses were -performed. 
+appear, internally to the IOMMU, as if two separate 4 byte accesses -- first to +the high half and second to the low half -- were performed. [NOTE] ==== The 8-byte IOMMU registers are defined in such a way that software can perform two individual 4-byte accesses, or hardware can perform two independent 4-byte transactions resulting from an 8-byte access, to the high and low halves of the -register, as long as the register semantics, with regard to side-effects, are +register, in that order, as long as the register semantics, with regard to side-effects, are respected between the two software accesses, or two hardware transactions, respectively. ==== @@ -363,10 +363,10 @@ are enabled (i.e. `cqcsr.cqon/cqen == 1`, `fqcsr.fqon/cqen == 1`, or !5-13 ! reserved ! Reserved for standard use. !14-15 ! custom ! Designated for custom use. !=== -|4 |`busy` |RO | A write to `ddtp` may require the IOMMU to +|4 |`busy` |RO | A write to `ddtp.iommu_mode` may require the IOMMU to perform many operations that may not occur synchronously to the write. When a write is - observed by the `ddtp`, the `busy` bit is set + observed by the `ddtp.iommu_mode`, the `busy` bit is set to 1. When the `busy` bit is 1, behavior of additional writes to the `ddtp` is `UNSPECIFIED`. Some implementations @@ -377,7 +377,7 @@ are enabled (i.e. `cqcsr.cqon/cqen == 1`, `fqcsr.fqon/cqen == 1`, or + If the `busy` bit reads 0 then the IOMMU has completed the operations associated with the - previous write to `ddtp`. + + previous write to `ddtp.iommu_mode`. + + An IOMMU that can complete these operations synchronously may hard-wire this bit to 0. From 1e3e13efebec01562ed52b788d5eea77a592d6e0 Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Sun, 21 Apr 2024 12:23:53 -0500 Subject: [PATCH 13/17] Noted that shadow stack encodings introduced by Zicfiss are reserved and not usable for IOMMU use. 
--- src/iommu_data_structures.adoc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/iommu_data_structures.adoc b/src/iommu_data_structures.adoc index 84898fce..43a9ff0f 100644 --- a/src/iommu_data_structures.adoc +++ b/src/iommu_data_structures.adoc @@ -1003,7 +1003,9 @@ The process to translate an `IOVA` is as follows: . Translation process is complete When checking the `U` bit in a second-stage PTE, the transaction is treated as -not requesting supervisor privilege. +not requesting supervisor privilege. The `pte.xwr=010` encoding, as specified by +the Zicfiss cite:[CFI] extension for the Shadow Stack page type in single-stage +and VS-stage page tables, remains a reserved encoding for IO transactions. When the translation process reports a fault, and the request is an Untranslated request or a Translated request, the IOMMU requests the IO bridge to abort the From 29147aaab0c873f579c86bcaeb602a388b9c4e61 Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Thu, 25 Apr 2024 10:26:58 -0500 Subject: [PATCH 14/17] Listed the fault codes reported for faults detected by Page Request. --- src/iommu_data_structures.adoc | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/iommu_data_structures.adoc b/src/iommu_data_structures.adoc index 43a9ff0f..4e6dea67 100644 --- a/src/iommu_data_structures.adoc +++ b/src/iommu_data_structures.adoc @@ -1365,13 +1365,14 @@ of "Page Request". a "Page Request Group Response" message to the device. When the IOMMU generates the response, the status field of the response depends -on the cause of the error. +on the cause of the error. If a fault condition prevents locating a valid device +context then the `PRPR` value assumed is 0. 
<<< The status is set to Response Failure if the following faults are encountered: -* `ddtp.iommu_mode` is `Off` +* `ddtp.iommu_mode` is `Off` (cause = 256) * DDT entry load access fault (cause = 257) * DDT entry misconfigured (cause = 259) * DDT entry not valid (cause = 258) @@ -1380,8 +1381,8 @@ The status is set to Response Failure if the following faults are encountered: The status is set to Invalid Request if the following faults are encountered: -* `ddtp.iommu_mode` is `Bare` -* `EN_PRI` is set to 0 +* `ddtp.iommu_mode` is `Bare` (cause = 260) +* `EN_PRI` is set to 0 (cause = 260) The status is set to Success if no other faults were encountered but the "Page Request" could not be queued due to the page-request queue being full From 173d29ea34eca4679cc1c801f4dbbb246604abec Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Sat, 1 Jun 2024 15:42:31 -0500 Subject: [PATCH 15/17] Updated Fig 31 to remove the unused Destination ID field for ATS.PRGR --- src/iommu_in_memory_queues.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/iommu_in_memory_queues.adoc b/src/iommu_in_memory_queues.adoc index 7c8aaa66..a3a2eadc 100644 --- a/src/iommu_in_memory_queues.adoc +++ b/src/iommu_in_memory_queues.adoc @@ -574,7 +574,7 @@ formatted as follows: {bits: 9, name: 'Page Request Group Index'}, {bits: 3, name: '0'}, {bits: 4, name: 'Response Code'}, - {bits: 16, name: 'Destination ID'}, + {bits: 16, name: '0'}, ], config:{lanes: 2, hspace:1024, fontsize:12}} .... From efea91a5a9d1fe83280ca87e751cd30fff17ffb3 Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Wed, 12 Jun 2024 07:51:07 -0500 Subject: [PATCH 16/17] Included a software guideline for IOMMU emulation. 
--- src/iommu_preface.adoc | 1 + src/iommu_sw_guidelines.adoc | 12 ++++++++++++ 2 files changed, 13 insertions(+) diff --git a/src/iommu_preface.adoc b/src/iommu_preface.adoc index 3df69ca4..4efe8945 100644 --- a/src/iommu_preface.adoc +++ b/src/iommu_preface.adoc @@ -40,6 +40,7 @@ and corrections, have been made since version 1.0.0: * Shadow stack encodings introduced by Zicfiss are reserved for IOMMU use. * Listed the fault codes reported for faults detected by Page Request. * Updated Fig 31 to remove the unused Destination ID field for ATS.PRGR +* Included a software guideline for IOMMU emulation. These changes were made through PR#243 cite:[PR243]. diff --git a/src/iommu_sw_guidelines.adoc b/src/iommu_sw_guidelines.adoc index d646d99c..be9b2c69 100644 --- a/src/iommu_sw_guidelines.adoc +++ b/src/iommu_sw_guidelines.adoc @@ -323,6 +323,18 @@ Device-context whose `V` (valid) bit is clear, non-leaf PDT entries whose `V` PTEs whose `V` bit is clear. Software need not perform invalidations when changing the `V` bit in these entries from 0 to 1. +==== Guidelines for emulating an IOMMU + +Certain uses may involve emulating a RISC-V IOMMU. In such cases, the emulator +may require the IOMMU driver to notify the emulator for efficient operation when +updates are made to in-memory data structure entries, including when making such +entries valid. Queueing an appropriate invalidation command when making such +updates is a common way to provide notifications to the emulator. While usually +an invalidation is not required when marking an invalid entry as valid, the +emulator may indicate the need to invoke such invalidation commands for +emulation efficiency purposes through a suitable flag in the device tree or ACPI +table describing such emulated IOMMU instances. 
+ === Reconfiguring PMAs Where platforms support dynamic reconfiguration of PMAs, a machine-mode driver From 0076b5f14cd15fbb9e6e39880543b910cc2931b8 Mon Sep 17 00:00:00 2001 From: Ved Shanbhogue Date: Sat, 13 Jul 2024 16:01:12 -0500 Subject: [PATCH 17/17] Include QoS ID standard extension --- src/iommu.bib | 40 +++++++++++++++ src/iommu_data_structures.adoc | 22 +++++++- src/iommu_extensions.adoc | 92 +++++++++++++++++++++++++++++++++- src/iommu_registers.adoc | 11 ++-- 4 files changed, 159 insertions(+), 6 deletions(-) diff --git a/src/iommu.bib b/src/iommu.bib index 49ef1d72..291ad131 100644 --- a/src/iommu.bib +++ b/src/iommu.bib @@ -25,3 +25,43 @@ @electronic{PR243 title = {Clarification updates to IOMMU v1.0.0}, url = {https://github.com/riscv-non-isa/riscv-iommu/pull/243/commits} } +@electronic{CBQRI, + title = {RISC-V Capacity and Bandwidth QoS Register Interface}, + url = {https://github.com/riscv-non-isa/riscv-cbqri} +} +@article{PTCAMP, + author = {Du Bois, Kristof and Eyerman, Stijn and Eeckhout, Lieven}, + title = {Per-Thread Cycle Accounting in Multicore Processors}, + year = {2013}, + issue_date = {January 2013}, + publisher = {Association for Computing Machinery}, + address = {New York, NY, USA}, + volume = {9}, + number = {4}, + issn = {1544-3566}, + url = {https://doi.org/10.1145/2400682.2400688}, + doi = {10.1145/2400682.2400688}, + journal = {ACM Trans. Archit. 
Code Optim.}, + month = {jan}, + articleno = {29}, + numpages = {22}, +} +@inproceedings{HERACLES, + author = {Lo, David and Cheng, Liqun and Govindaraju, Rama and Ranganathan, Parthasarathy and Kozyrakis, Christos}, + title = {Heracles: Improving Resource Efficiency at Scale}, + year = {2015}, + isbn = {9781450334020}, + publisher = {Association for Computing Machinery}, + address = {New York, NY, USA}, + url = {https://doi.org/10.1145/2749469.2749475}, + doi = {10.1145/2749469.2749475}, + booktitle = {Proceedings of the 42nd Annual International Symposium on Computer Architecture}, + pages = {450–462}, + numpages = {13}, + location = {Portland, Oregon}, + series = {ISCA '15} +} +@electronic{SSQOSID, + title = {RISC-V Quality-of-Service (QoS) Identifiers}, + url = {https://github.com/riscv/riscv-ssqosid} +} diff --git a/src/iommu_data_structures.adoc b/src/iommu_data_structures.adoc index 4e6dea67..9fcc1f2a 100644 --- a/src/iommu_data_structures.adoc +++ b/src/iommu_data_structures.adoc @@ -484,6 +484,7 @@ the PTEs from the first page table or the second page table. These are the only expected behaviors. ==== +[[DC_TA]] ===== Translation attributes (`ta`) .Translation attributes (`ta`) field @@ -492,7 +493,9 @@ expected behaviors. {reg: [ {bits: 12, name: 'reserved'}, {bits: 20, name: 'PSCID'}, - {bits: 32, name: 'reserved'}, + {bits: 8, name: 'reserved'}, + {bits: 12, name: 'RCID'}, + {bits: 12, name: 'MCID'}, ], config:{lanes: 2, hspace: 1024, fontsize: 16}} .... @@ -502,6 +505,21 @@ fences on a per-address-space basis. The `PSCID` field in `ta` is used as the address-space ID if `DC.tc.PDTV` is 0 and the `iosatp.MODE` field is not `Bare`. When `DC.tc.PDTV` is 1, the `PSCID` field in `ta` is ignored. +The `RCID` and `MCID` fields are added by the QoS ID extension. If +`capabilities.QOSID` is 0, these bits are reserved and must be set to 0. 
+IOMMU-initiated requests for accessing the following data structures use the +value configured in the `RCID` and `MCID` fields of `DC.ta`. + +* Process directory table (`PDT`) +* Second-stage page table +* First-stage page table +* MSI page table +* Memory-resident interrupt file (`MRIF`) + +The `RCID` and `MCID` configured in `DC.ta` are provided to the IO bridge on +successful address translations. The IO bridge should associate these QoS IDs +with device-initiated requests. + ===== First-Stage context (`fsc`) If `DC.tc.PDTV` is 0, the `DC.fsc` field holds the `iosatp` that provides the controls for first-stage address translation and protection. @@ -716,6 +734,8 @@ misconfigured" (cause = 259). . `DC.tc.SBE` value is not a legal value. If `fctl.BE` is writable then `DC.tc.SBE` may be 0 or 1. If `fctl.BE` is not writable then `DC.tc.SBE` must be the same as `fctl.BE`. +. `capabilities.QOSID` is 1 and `DC.ta.RCID` or `DC.ta.MCID` values + are wider than that supported by the IOMMU. [NOTE] ==== diff --git a/src/iommu_extensions.adoc b/src/iommu_extensions.adoc index 5332a466..301d3b84 100644 --- a/src/iommu_extensions.adoc +++ b/src/iommu_extensions.adoc @@ -2,4 +2,94 @@ == IOMMU Extensions -This chapter specifies the standard extensions to the IOMMU Base Architecture. +This chapter specifies the following standard extensions to the IOMMU Base +Architecture: + +[%autowidth,float="center",align="center",cols="^,^,^",options="header",] +|=== +| Specification |Version |Status +| <> + |*1.0* + |*Ratified* +|=== + +[[QOSID]] +=== Quality-of-Service (QoS) Identifiers Extension, Version 1.0 + +Quality of Service (QoS) is defined as the minimal end-to-end performance +guaranteed in advance by a service level agreement (SLA) to a workload. +Performance metrics might include measures such as instructions per cycle (IPC), +latency of service, etc. 
+ +When multiple workloads execute concurrently on modern processors -- equipped +with large core counts, multiple cache hierarchies, and multiple memory +controllers -- the performance of any given workload becomes less +deterministic, or even non-deterministic, due to shared resource contention +cite:[PTCAMP]. + +To manage performance variability, system software needs resource allocation +and monitoring capabilities. These capabilities allow for the reservation of +resources like cache and bandwidth, thus meeting individual performance targets +while minimizing interference cite:[HERACLES]. For resource management, hardware +should provide monitoring features that allow system software to profile +workload resource consumption and allocate resources accordingly. + +To facilitate this, the QoS Identifiers ISA extension (Ssqosid) cite:[SSQOSID] +introduces the `srmcfg` register, which configures a hart with two identifiers: +a Resource Control ID (`RCID`) and a Monitoring Counter ID (`MCID`). These +identifiers accompany each request issued by the hart to shared resource +controllers. + +These identifiers are crucial for the RISC-V Capacity and Bandwidth Controller +QoS Register Interface cite:[CBQRI], which provides methods for setting resource +usage limits and monitoring resource consumption. The `RCID` controls resource +allocations, while the `MCID` is used for tracking resource usage. + +The IOMMU QoS ID extension provides a method to associate QoS IDs with requests +to access resources by the IOMMU, as well as with devices governed by it. This +complements the Ssqosid extension that provides a method to associate QoS IDs +with requests originated by the RISC-V harts. Associating QoS IDs with device +and IOMMU originated requests is required for effective monitoring and +allocation of shared resources. 
+ +The IOMMU `capabilities` register (<>) is extended with a `QOSID` field +which enumerates support for associating QoS IDs with requests made through the +IOMMU. When `capabilities.QOSID` is 1, the memory-mapped register layout is +extended to add a register named `iommu_qosid` (<>). This register is +used to configure the Quality of Service (QoS) IDs associated with +IOMMU-originated requests. The `ta` field of the device context (<>) is +extended with two fields, `RCID` and `MCID`, to configure the QoS IDs to +associate with requests originated by the devices. + +==== Reset Behavior + +If the reset value for `ddtp.iommu_mode` field is `Bare`, then the +`iommu_qosid.RCID` field must have a reset value of 0. + +[NOTE] +==== +At reset, it is required that the `RCID` field of `iommu_qosid` is set to 0 if +the IOMMU is in `Bare` mode, as typically the resource controllers in the +SoC default to a reset behavior of associating all capacity or bandwidth to the +`RCID` value of 0. When the reset value of the `ddtp.iommu_mode` is not `Bare`, +the `iommu_qosid` register should be initialized by software before changing +the mode to allow DMA. +==== + +==== Sizing QoS Identifiers + +The size (or width) of `RCID` and `MCID`, as fields in registers or in data +structures, supported by the IOMMU must be at least as large as that supported +by any RISC-V application processor hart in the system. + +==== IOMMU ATC Capacity Allocation and Monitoring + +Some IOMMUs might support capacity allocation and usage monitoring in the IOMMU +address translation cache (IOATC) by implementing the capacity controller +register interface. + +Additionally, some IOMMUs might support multiple IOATCs, each potentially having +different capacities. In scenarios where multiple IOATCs are implemented, such +as an IOATC for each supported page size, the IOMMU can implement a capacity +controller register interface for each IOATC to facilitate individual capacity +allocation. 
diff --git a/src/iommu_registers.adoc b/src/iommu_registers.adoc index 0b643ee6..986c6406 100644 --- a/src/iommu_registers.adoc +++ b/src/iommu_registers.adoc @@ -81,7 +81,8 @@ the register returns 0 and writes to that offset are ignored. control>> | if `capabilities.DBG==0` |616 |`tr_response` |8 |<> | if `capabilities.DBG==0` -|624 |Reserved |64 |Reserved for future use +|624 |`iommu_qosid` |4 |<> | if `capabilities.QOSID==0` +|628 |Reserved |60 |Reserved for future use (`WPRI`) | |688 |custom |72 |Designated for custom use (`WARL`) | @@ -153,7 +154,8 @@ the IOMMU. At reset, the register shall contain the IOMMU supported features. {bits: 1, name: 'PD8'}, {bits: 1, name: 'PD17'}, {bits: 1, name: 'PD20'}, - {bits: 15, name: 'reserved'}, + {bits: 1, name: 'QOSID'}, + {bits: 14, name: 'reserved'}, {bits: 8, name: 'custom'}, ], config:{lanes: 8, hspace:1024}} .... @@ -223,7 +225,8 @@ the IOMMU. At reset, the register shall contain the IOMMU supported features. |38 |`PD8` |RO | One level PDT with 8-bit process_id supported. |39 |`PD17` |RO | Two level PDT with 17-bit process_id supported. |40 |`PD20` |RO | Three level PDT with 20-bit process_id supported. -|55:41 | reserved |RO | Reserved for standard use. +|41 |`QOSID` |RO | Associating QoS IDs with requests is supported. +|55:42 | reserved |RO | Reserved for standard use. |63:56 |custom |RO | Designated for custom use. |=== @@ -1647,7 +1650,7 @@ The `iommu_qosid` register fields are defined as follows: [width=100%] [%header, cols="^1,2,^1,5"] |=== -|Bits ^|Field |Attribute ^| Description +|Bits |Field |Attribute | Description |11:0 |`RCID` |WARL | `RCID` for IOMMU-initiated requests. |15:12 |reserved |WPRI | Reserved for standard use. |27:16 |`MCID` |WARL | `MCID` for IOMMU-initiated requests.