Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feedback on chapter 2 and chapter 3 #79

Open
ved-rivos opened this issue Nov 19, 2024 · 44 comments
Open

Feedback on chapter 2 and chapter 3 #79

ved-rivos opened this issue Nov 19, 2024 · 44 comments
Assignees

Comments

@ved-rivos
Copy link

  1. The distinction between a fast channel vs. a normal channel is unclear.
    Do fast and normal channels co-exist between same pair of producers
    and consumers? If so what is the criteria for selecting between the two.
  2. Section 2.2/2.3 - explain more about the caching side effects and what is
    being avoided.
  3. For P2A channels - what is the distinction between a notification and a
    request? Do notifications not require an Ack?
  4. Replace "cache line" with "cache block"
  5. Explain rationale for the head and tail to use the -2 notation? As the
    effective size of the buffer is M-2, there is already a need to special case
    the wrap decision.
  6. Section 2.3.2 - explain the cache maintenance intended to be done to the
    head and tail slots.
  7. What are the semantics of the head and tail. Are they expected to have a
    wrap indicator or is the convention that tail = head-1 indicates a full queue?
  8. Section 2.3.3 - the non-normative comment about having as many slots as
    number of application processors indicates that this queue is expected to be
    concurrently operated on by multiple application processors? If so then in the
    P2A direction: how is the de-multiplixing of requests/notifications/acks
    performed and which application processor fields the interrupts?
  9. section 2.3.3 - the privileged architecture does not have a concept of PMA
    entries. Each location in memory has a PMA. Please uplevel this note to
    explain the intent of the note and call out what type of optimization
    are intended.
  10. Section 2.3.3 : typo "shared shared" -> "shared"
  11. Chapter 2: Which entity is responsible for initializing the head and tail pointer
    and the queue memory. What coordination mechanisms are used to re-init the
    queue (for instance on a kexec).
  12. Section 3.1 - is non-posted request a better term to complement posted request?
  13. section 3.2 - is the message length variable within a slot or can a message
    occupy multiple slots?
  14. Section 3.2.2 - what is the intent of flag[3]. The earlier sections indicate that
    the producer must explicitly ring a doorbell. Is "enabled/disabled" meant to indicate
    that doorbells must not be used for a queue or does it indicate that they are not
    used for this specific message?
  15. Section 3.2.2 - what is the intent of the Token. Are the sequence numbers required
    to be monotonic? What are the sequence numbers initialized to? Are there any error
    checks for missing sequence numbers? How are sequence numbers synchronized
    (for instance after a kexec).
  16. section 3.2.2 - are there any error checks done on data len? What is handling for
    a zero length message as the header does not have any operation code?
  17. When multipart acknowledgements are used, how are the multiple parts identified?
    Can acknowledgements for two different messages be interleaved? Does the STATUS
    field of each multipart message need to be identical and if not what are the rules for
    the overall STATUS?
@pathakraul pathakraul self-assigned this Nov 21, 2024
@dhaval-rivos
Copy link

Additional comments:

  1. Now that we are extending RPMI to MM as well, in the introduction we should clarify that it is not just for PuC.
  2. Figure 2 should be updated to reflect req-fwd API implementation
  3. Nit: figure2 should have "System Monitoring And Control" instead of just system control.
  4. How do we define Ownership of the channel (I think it is mentioned in one of the diagrams). Is it of any significance?
  5. Messaging Protocol: Typo, response which is "s/send/sent" back as an RPMI acknowledgement
  6. Since we are dealing with PuC, should spec cover a scenario where request times out and any corrective action is required or that is left to the platform FW? There is RPMI_ERR_TIMEOUT but for that Ack has to come which may not happen in some scenarios?
  7. Messaging Protocol: Should this be renamed to /sRPMI_ERR_ALREADY/RPMI_ERR_INPROGRESS?
  8. MPXY: "The SBI MPXY channel must correspond to a single RPMI service group". This means one channel can not be mapped to multiple groups. Can we also clarify other way around that it is possible to have multiple channels that support a service group type. i.e. there can be 2 channels for MM service group.
  9. RPMI service group: Typo "A RPMI service group may be partially implement".."service group may implement..."
  10. We do not need this now I believe: any specific use case for this? MM_ENABLE_NOTIFICATION
  11. MM_SHMEM_ADDR_LOW: In a typical MM implementation NS buffer is allocated by edk2 and then passed into MM as a HOB. If that implementation is done then this attr may not be necessary. But there are some challenges with that implementation that is under discussion. So if eventually implementation does not support this attribute, is there a way to inform caller? Just send 0 in the address?
  12. REQFWD_ENABLE_NOTIFICATION this may not be necessary?

@pathakraul
Copy link
Collaborator

Hi @ved-rivos and ARC team, thanks for the feedback.

Provided the answers below.

  1. The distinction between a fast channel vs. a normal channel is unclear.
    Do fast and normal channels co-exist between same pair of producers
    and consumers? If so what is the criteria for selecting between the two.

A normal channel is a shared memory queue which is shared among multiple RPMI clients and normal channel transmits a RPMI message which has RPMI message header.

A fast channel is a shared memory without any message format which transmits raw data usually few bytes (though its not a limitation). A service group if it requires fastchannel, then discovery of the fastchannel details, semantics, which services can send the data over fastchannel are defined by that particular service group.

Yes, a service group supporting fastchannel can use both normal RPMI message via normal channel fastchannel. A service group has to define, if fastchannel is supported and implemented, then which services in that service group will only use fastchannel to send the data.

For example, the CPPC service group defines the fastchannel support (Section: 4.5.1. CPPC Fast-channel) and also provide the mechanism to discover its support (Section: 4.5.7). It also mentions where AP should write the desired performance level based on the fastchannel availability.

  1. Section 2.2/2.3 - explain more about the caching side effects and what is
    being avoided.

Noted, will update.

  1. For P2A channels - what is the distinction between a notification and a
    request? Do notifications not require an Ack?

Notification is a RPMI message with its own message format different from P2A Request message which is a request from Platform(PuC) to Application Processor. Request message is meant to carry a command which needs to be serviced by the service provider like PuC. Notification message is only to report events

Acknowledgement for notification message is not necessary because events which are carried by notification message are only meant for logging or reporting purposes. AP can choose to subscribe to these notification messages for any service group and even if it has subscribed to them it's upto AP and implementation of that service group to decide what to do with the events.

  1. Replace "cache line" with "cache block"

Noted, will update

  1. Explain rationale for the head and tail to use the -2 notation? As the
    effective size of the buffer is M-2, there is already a need to special case
    the wrap decision.

For implementation the indices for the message slots only matters which represents the actual queue. The text in this section and the notation (slot index - 2) is just to represent that effectively from all slots perspective(total shared memory slots as in the image) which includes head and tail slots then head and tail stores the message slots indices which are (slot index - 2).

We can add a non-normative text and elaborate this.

  1. Section 2.3.2 - explain the cache maintenance intended to be done to the
    head and tail slots.

The cache maintenance will involve using cache flush/invalidate operation whenever the head and tail are updated.
Noted, will update.

  1. What are the semantics of the head and tail. Are they expected to have a
    wrap indicator or is the convention that tail = head-1 indicates a full queue?

RPMI spec defines the role of Head for dequeuing and Tail for enqueuing. It also states that ownership of head and tail is exclusive and only one entity is allowed to write to any of head or tail according to their role as producer or consumer. A queue has a message transmission direction associated which implicitly defines who will be the producer and who will be the consumer.

There are special wrap indicators defined to highlight the wrapping of the queue. Checking empty and full can be done by only head and tail. Currently, there are no queue empty/full conditions explicitly defined in the specification because these are obvious based on above semantics. We certainly don’t mind defining these conditions.

  1. Section 2.3.3 - the non-normative comment about having as many slots as
    number of application processors indicates that this queue is expected to be
    concurrently operated on by multiple application processors? If so then in the
    P2A direction: how is the de-multiplixing of requests/notifications/acks
    performed and which application processor fields the interrupts?

Multiple application processors share the same RPMI queues and the messages in the queues are not bound to a particular application processor. Now it is up to the software running on the application processors about how to synchronize the access to the queues.

Regarding interrupt to the application processors, there is only one interrupt defined for RPMI transport in P2A direction. This interrupt can be wired interrupt or MSI. If P2A interrupt is wired interrupt then it is managed through PLIC or APLIC which allows application processors to select the target processor. If P2A interrupt is MSI then the application processor can configure IMSIC coordinates (MSI address + data) of target processor using BASE service group.

  1. section 2.3.3 - the privileged architecture does not have a concept of PMA
    entries. Each location in memory has a PMA. Please uplevel this note to
    explain the intent of the note and call out what type of optimization
    are intended.

The RPMI shared memory may or may not be cache-coherent for application processor and it may also be some on-chip RAM. This means the platform must set up PMA for the RPMI shared memory so that both PuC and application processors can coherently access it.

The number of PMA regions supported for a RISC-V implementation is generally limited and non-contiguous RPMI shared memory required multiple PMA regions hence the note in the Section 2.3.3 suggests.

We can explicitly mention that defining these attributes is the part of platform/implementation.

  1. Section 2.3.3 : typo "shared shared" -> "shared"

Noted, will update.

  1. Chapter 2: Which entity is responsible for initializing the head and tail pointer
    and the queue memory. What coordination mechanisms are used to re-init the
    queue (for instance on a kexec).

The shared memory queues (including the head and tail slots) are initialized by the implementation (aka PuC). From the AP perspective the shared memory is initialized and ready to use and details of queues as mentioned in Section 2.3.4 are provided to the AP.

Upon kexec, the application processors can simply resume from the current head and tail values without any special coordination.

Please note that kexec is a Linux feature and we can have any OS or firmware running on the application processors.

  1. Section 3.1 - is non-posted request a better term to complement posted request?

A normal message communication usually involves sending a message and then expecting a response hence the term “normal request”. Now in limited cases, we may not have any response so we choose the term “posted request” for such cases.

There was a good consensus for the above terms among folks who have already reviewed the spec and a lot of software is already written using these terms. Unless absolutely required, we would like to continue with these terms.

  1. section 3.2 - is the message length variable within a slot or can a message
    occupy multiple slots?

Message header is fixed, message data length is variable which makes message length variable. But message length cannot be greater than the Slot Size.

And a message cannot occupy multiple slots. The multipart message is different from this which i have explained in point 17.

  1. Section 3.2.2 - what is the intent of flag[3]. The earlier sections indicate that
    the producer must explicitly ring a doorbell. Is "enabled/disabled" meant to indicate
    that doorbells must not be used for a queue or does it indicate that they are not
    used for this specific message?

Transport doorbell are optional. Spec mentions that even if doorbells are supported on either side they can ignore and do the polling. Thats why the flag[3] is provided to let the entity which is servicing the request each time to either trigger the doorbell or not.

Yes, this can be used for a particular message or for the entire lifecycle. For example if AP and PuC are capable for MSIs and AP has configured MSI details via defined service in Section 4.1.10 then this bit can always be enabled so that PuC always send MSI for each response. But AP can selectively disable it so that PuC in that case does not trigger a doorbell.

RPMI spec can elaborate this more for this flag and we can update this.

  1. Section 3.2.2 - what is the intent of the Token. Are the sequence numbers required
    to be monotonic? What are the sequence numbers initialized to? Are there any error
    checks for missing sequence numbers? How are sequence numbers synchronized
    (for instance after a kexec).

The message token helps application processors track the origin of the request when it sees a response.
This is very useful when multiple application processors are sharing the same queues. For example, two different processors may queue the same type ( ServiceGroup ID + Service ID) of requests so when responses of both requests are received the token helps differentiate which response belongs to which request.

It is generally recommended to have monotonically increasing token numbers and the token number can be initialized to any value and there is no constraint here.

PuC does not check on the token number since this is only for application processors. For Kexec, the application processors must start from a new value of token so no special handling required.

  1. section 3.2.2 - are there any error checks done on data len? What is handling for
    a zero length message as the header does not have any operation code?

A RPMI message is valid even if DATALEN field is 0 for example the service defined in Section 4.1.5 which does not have any request data and the Request is identified from ServiceGroupID and ServiceID.

Implementations can perform the datalen checks. For example, the datalen field value received in an acknowledgement must not be greater than whats RPMI spec has defined for that service acknowledgement. More error checks can also be done by implementations. These checks are not defined by the RPMI specifications.

  1. When multipart acknowledgements are used, how are the multiple parts identified?
    Can acknowledgements for two different messages be interleaved? Does the STATUS
    field of each multipart message need to be identical and if not what are the rules for
    the overall STATUS?

A multipart acknowledgement means that the data which is supposed to be sent in acknowledgement is greater and cannot be accommodated in a single message. In such cases there are extra fields defined in that service REMAINING and RETURNED to show that some data is remaining and requester must again
send another request message with appropriate START_INDEX to get remaining data. One such example is in Section 4.4.6. All services where such a case may arise has defined the Acknowledgement layout in such a manner.

In this case the multiple request messages made to get the complete data are considered as separate messages. Now since these are considered as different messages the STATUS field is also taken independent. AP in such cases must consider all STATUS together after all transactions are complete and make decisions accordingly.

@pathakraul
Copy link
Collaborator

@dhaval-rivos Thanks for your feedback, will reply to your queries asap

@pathakraul
Copy link
Collaborator

Additional comments:

  1. Now that we are extending RPMI to MM as well, in the introduction we should clarify that it is not just for PuC.

In Chapter 1. we have already expanded the scope of RPMI and who can implement it and also making it implicit that not every service group defined in the RPMI spec is meant for PuC.

  1. Figure 2 should be updated to reflect req-fwd API implementation

This image is not meant to encompass everything defined in this spec and rather only present an overview.
We can put an image for this in the service group section itself. Like we have done for Chapter 5.

  1. Nit: figure2 should have "System Monitoring And Control" instead of just system control.

Noted. Will update

  1. How do we define Ownership of the channel (I think it is mentioned in one of the diagrams). Is it of any significance?

I will remove that text from figure-2. But actually it just meant to highlight that a RPMI transport in SEE has SEE as owner which is implicit.

  1. Messaging Protocol: Typo, response which is "s/send/sent" back as an RPMI acknowledgement

Noted, will update.

  1. Since we are dealing with PuC, should spec cover a scenario where request times out and any corrective action is required or that is left to the platform FW? There is RPMI_ERR_TIMEOUT but for that Ack has to come which may not happen in some scenarios?

There are two scenarios possible -

One is that ACK not getting to the application processor can happen either due to PuC not able to send ACK or issue with the transport. In such cases the type of corrective action will depend on the service being called and the RPMI clients.

Similarly, a service can also return ACK with RPMI_ERR_TIMEOUT and corrective action is again depends on that service called on RPMI client side. For example a voltage change request may timeout when its implementation at PuC side is not able to service the request in time due to slow PSU or VRM. In such case the corrective action depends on RPMI client if it wants to retry again asap, or retry after sometime or any other measure.

  1. Messaging Protocol: Should this be renamed to /sRPMI_ERR_ALREADY/RPMI_ERR_INPROGRESS?

Yes, this is little subjective and was also discussed with others and we decided to go with ALREADY as we found it more generic and appropriate for conditions which are already in progress or already happened. INPROGRESS was not chosen since it made more sense to use it only for in progress actions.

  1. MPXY: "The SBI MPXY channel must correspond to a single RPMI service group". This means one channel can not be mapped to multiple groups. Can we also clarify other way around that it is possible to have multiple channels that support a service group type. i.e. there can be 2 channels for MM service group.

The specification only mandates to assign one MPXY channel to have one RPMI service group which means that two service groups cannot share one MPXY channel. But the implementation can have multiple MPXY channels with same service group mapped either in a single RPMI transport instance or across multiple RPMI transport instances in an implementation.

  1. RPMI service group: Typo "A RPMI service group may be partially implement".."service group may implement..."

Noted, will update.

  1. We do not need this now I believe: any specific use case for this? MM_ENABLE_NOTIFICATION

If there are no events supported then it can simply return RPMI_ERR_NOT_SUPPORTED. But every service group has to follow requirements 1 and 2 and 3 as stated in Chapter 4 starting and need to implement the ENABLE_NOTIFICATION service.

  1. MM_SHMEM_ADDR_LOW: In a typical MM implementation NS buffer is allocated by edk2 and then passed into MM as a HOB. If that implementation is done then this attr may not be necessary. But there are some challenges with that implementation that is under discussion. So if eventually implementation does not support this attribute, is there a way to inform caller? Just send 0 in the address?

The discussion in the meeting on ReqFwd and MM concluded that the memory will already be allocated to avoid passing memory physical address every time which may require PMP mapping/unmapping and sanitization of pased memory addresses every time and its allocation is mandatory which led to adding these attributes. Passing 0 will reflect that its not present which will defy that. In case of EDK2 even if these shared memory details are made static then EDK2 non secure domain client can ignore these attributes instead of passing them as 0 from the MM service provider in secure domain.

  1. REQFWD_ENABLE_NOTIFICATION this may not be necessary?

Same as mentioned for question 10.

@ved-rivos
Copy link
Author

Currently, there are no queue empty/full conditions explicitly defined in the specification because these are obvious based on above semantics. We certainly don’t mind defining these conditions.

There are at least two ways to implement the head and tail for a circular buffer:

  1. In a scheme where both the head and tail pointers are augmented with a wrap bit, the full capacity of the circular buffer can be utilized without reserving a slot to differentiate between full and empty states. The wrap bit toggles each time the head or tail pointer wraps around from the last slot back to the first slot. This additional bit allows the system to distinguish between the buffer being full and empty, even when the head and tail pointers point to the same slot. Specifically, the buffer is empty when the head and tail indices are equal, and their wrap bits are also the same. Conversely, the buffer is full when the head and tail indices are equal, but their wrap bits differ. This approach eliminates the need to reserve a slot, maximizing the usable capacity of the buffer while maintaining simple and efficient enqueue and dequeue operations.
  2. In the traditional circular buffer scheme, one slot is intentionally left unused to differentiate between the full and empty states. The head pointer indicates the next slot to dequeue from, and the tail pointer indicates the next slot to enqueue into. The buffer is considered empty when head == tail, as there are no elements to dequeue. It is considered full when ((tail + 1) % N) == head, where N is the total number of slots, meaning advancing the tail would overwrite the head. This reserved slot simplifies the logic for distinguishing between full and empty states but slightly reduces the effective capacity of the buffer by one slot. Despite this trade-off, the simplicity and robustness of this scheme make it widely used.

The updated specification should be normative about which of the two schemes is intended. Based on the latest update it seems like the second scheme is the intent. If so that should be in normative text.

@ved-rivos
Copy link
Author

From the AP perspective the shared memory is initialized and ready to use and details of queues as mentioned in Section 2.3.4 are provided to the AP.

The section 2.2.4 does not mention the details about the initialization. If the intent is for the PuC to initialize the queue head and tail then it should be mentioned. kexec is a linux concept but similar mechanisms such as vmphu may apply to other OSs. If an OS is expected to continue from wherever the head and tail happen to be then such an OS may receive notifications that it has not subscribed to or may get ACK for requests it has never made. This should be noted and based on the comments in this thread seems like they are expected to be benignly ignored.

@ved-rivos
Copy link
Author

ved-rivos commented Dec 10, 2024

RPMI spec can elaborate this more for this flag and we can update this.

It is better to rename the flag as "doorbell interrupt is requested". The description of the flag states "inform the platform micro-controller to ring the doorbell". Is this flag not applicable to requests sent in P2A direction? The treatment of the flag[3] being set to 1 when doorbell interrupts are not enabled or not supported is undefined. Further, based on this description this flag seems to be not applicable in an ACK.

@ved-rivos
Copy link
Author

ved-rivos commented Dec 10, 2024

The message token helps application processors track the origin of the request when it sees a response.

The specification does not state anywhere that the TOKEN in a request must be echoed back in an ACK for the purposes of such correlation. Further describing this as a "sequence number" is misleading. A better description should be used as these are primarily message tags and not used to identify any form of sequence of messages.

Please provide some explanation about the use of a TOKEN in a notification. Also when a channel is shared by multiple application processor harts then what method is used to determine which hart should process the notification message and whether events for multiple harts be packed into a single notification message.

@pathakraul
Copy link
Collaborator

Currently, there are no queue empty/full conditions explicitly defined in the specification because these are obvious based on above semantics. We certainly don’t mind defining these conditions.

There are at least two ways to implement the head and tail for a circular buffer:

  1. In a scheme where both the head and tail pointers are augmented with a wrap bit, the full capacity of the circular buffer can be utilized without reserving a slot to differentiate between full and empty states. The wrap bit toggles each time the head or tail pointer wraps around from the last slot back to the first slot. This additional bit allows the system to distinguish between the buffer being full and empty, even when the head and tail pointers point to the same slot. Specifically, the buffer is empty when the head and tail indices are equal, and their wrap bits are also the same. Conversely, the buffer is full when the head and tail indices are equal, but their wrap bits differ. This approach eliminates the need to reserve a slot, maximizing the usable capacity of the buffer while maintaining simple and efficient enqueue and dequeue operations.
  2. In the traditional circular buffer scheme, one slot is intentionally left unused to differentiate between the full and empty states. The head pointer indicates the next slot to dequeue from, and the tail pointer indicates the next slot to enqueue into. The buffer is considered empty when head == tail, as there are no elements to dequeue. It is considered full when ((tail + 1) % N) == head, where N is the total number of slots, meaning advancing the tail would overwrite the head. This reserved slot simplifies the logic for distinguishing between full and empty states but slightly reduces the effective capacity of the buffer by one slot. Despite this trade-off, the simplicity and robustness of this scheme make it widely used.

The updated specification should be normative about which of the two schemes is intended. Based on the latest update it seems like the second scheme is the intent. If so that should be in normative text.

Noted, will make the conditions as normative text

@pathakraul
Copy link
Collaborator

From the AP perspective the shared memory is initialized and ready to use and details of queues as mentioned in Section 2.3.4 are provided to the AP.

The section 2.2.4 does not mention the details about the initialization. If the intent is for the PuC to initialize the queue head and tail then it should be mentioned. kexec is a linux concept but similar mechanisms such as vmphu may apply to other OSs. If an OS is expected to continue from wherever the head and tail happen to be then such an OS may receive notifications that it has not subscribed to or may get ACK for requests it has never made. This should be noted and based on the comments in this thread seems like they are expected to be benignly ignored.

Noted, Can this be a non-normative text explaining this condition?

@pathakraul
Copy link
Collaborator

RPMI spec can elaborate this more for this flag and we can update this.

It is better to rename the flag as "doorbell interrupt is requested". The description of the flag states "inform the platform micro-controller to ring the doorbell". Is this flag not applicable to requests sent in P2A direction? The treatment of the flag[3] being set to 1 when doorbell interrupts are not enabled or not supported is undefined. Further, based on this description this flag seems to be not applicable in an ACK.

This flag is applicable to Normal Requests from both AP or PuC.

A sender if sets this bit without having doorbell interrupts enabled or without support will result in undefined behavior. We can mention this as a normative text.

@pathakraul
Copy link
Collaborator

The message token helps application processors track the origin of the request when it sees a response.

The specification does not state anywhere that the TOKEN in a request must be echoed back in an ACK for the purposes of such correlation. Further describing this as a "sequence number" is misleading. A better description should be used as these are primarily message tags and not used to identify any form of sequence of messages.

Specification does mention that TOKEN needs to be preserved, this is mentioned in Section 3.2.2 just below the RPMI Message Header table.

Its true that these are not used as sequence number among a set of message sent over a period. Specification can clarify this in a non-normative text since this is not any requirement but just explanation what TOKEN represents.

Please provide some explanation about the use of a TOKEN in a notification. Also when a channel is shared by multiple application processor harts then what method is used to determine which hart should process the notification message and whether events for multiple harts be packed into a single notification message.

Though TOKEN in a notification message does not serve a purpose but since RPMI has a common message header it has become part of notification message. I agree and specification can mention this wrt notification message.

Noted, will update

@ved-rivos
Copy link
Author

From the AP perspective the shared memory is initialized and ready to use and details of queues as mentioned in Section 2.3.4 are provided to the AP.

The section 2.2.4 does not mention the details about the initialization. If the intent is for the PuC to initialize the queue head and tail then it should be mentioned. kexec is a linux concept but similar mechanisms such as vmphu may apply to other OSs. If an OS is expected to continue from wherever the head and tail happen to be then such an OS may receive notifications that it has not subscribed to or may get ACK for requests it has never made. This should be noted and based on the comments in this thread seems like they are expected to be benignly ignored.

Noted, Can this be a non-normative text explaining this condition?

Yes. Non-normative text explaining this condition would suffice.

@ved-rivos
Copy link
Author

ved-rivos commented Dec 15, 2024

Its true that these are not used as sequence number among a set of message sent over a period. Specification can clarify this in a non-normative text since this is not any requirement but just explanation what TOKEN represents.

The use of term "sequence number" is misleading. Since these are not sequence numbers, the specification should not use that term. For instance, one use of the token would be to fill it with the hart_id.

@ved-rivos
Copy link
Author

Few further questions on the RPMI spec:
1)

If the P2A doorbell is a MSI then the application processors must configure the MSI on the platform microcontroller side using the RPMI service defined by the BASE service group.

This implies there is some hidden state - "MSI is configured" - associated with the P2A doorbell. The BASE_SET_MSI message allows setting of this hidden state but there is no way to unset this state.

In the shared memory region allocated by the platform for fast-channels, the Performance Request fastchannels must be grouped together for all the application processors. Similarly, the Performance Feedback fast-channels must be grouped together.

This poses an undue constraint on design of the platform. When the platform supports multiple chiplets or multiple sockets, the desire to group them all together may not be acceptable.

This service is used to get the details of the shared memory region containing all the fast-channels, attributes of the fast-channel and the details of the doorbell.

Besides the question 2, the syntax and semantics of the doorbell - DB_ADDR, DB_SETMASK, DB_PRESERVEMASK - are not specified in this specification.

@pathakraul
Copy link
Collaborator

Few further questions on the RPMI spec: 1)

If the P2A doorbell is a MSI then the application processors must configure the MSI on the platform microcontroller side using the RPMI service defined by the BASE service group.

This implies there is some hidden state - "MSI is configured" - associated with the P2A doorbell. The BASE_SET_MSI message allows setting of this hidden state but there is no way to unset this state.

This configuration is required to configure the MSI details if the MSI based doorbell is preferred by the APs. Now, irrespective of MSI support and configuration of details using this service, the requests made by the AP to PuC should also have the flags[3] bit set so that PuC will use the MSI doorbell. If that flag is not set then irrespective of the configured MSI details the MSI will not be used. This is the rationale for current details.

I agree that this service for MSI details configuration must also be complete which will require support to disable the details. I will update this service to mention that if this service is called again with MSI_ADDR = 0 and MSI_DATA = 0 then earlier MSI configured details will render unconfigured.

In the shared memory region allocated by the platform for fast-channels, the Performance Request fastchannels must be grouped together for all the application processors. Similarly, the Performance Feedback fast-channels must be grouped together.

This poses an undue constraint on design of the platform. When the platform supports multiple chiplets or multiple sockets, the desire to group them all together may not be acceptable.

We decided to group the fastchannels according to the Read and Write done by the AP and PuC. The Performance Request fastchannels are Write by the AP and Read by the PuC. Similarly the Performance Feedback fastchannels are Write by the PuC and Read by the AP.

Another possibility we discussed earlier to group both fastchannels per AP. Which means that performance request and performance feedback will be adjacent to each other. But, in this approach, depending on the system there was a likelihood of performance issues such as false sharing.

Also we couldn't leave this layout open without any requirement and decided to have current mentioned approach.

What is your suggestion?

This service is used to get the details of the shared memory region containing all the fast-channels, attributes of the fast-channel and the details of the doorbell.

Besides the question 2, the syntax and semantics of the doorbell - DB_ADDR, DB_SETMASK, DB_PRESERVEMASK - are not specified in this specification.

Noted, will add details for these fields, what they represent and how they should be used by the implementations.

@pathakraul
Copy link
Collaborator

@avpatel @lftan , Please check and add more details if required or i missed something.

@pathakraul
Copy link
Collaborator

Its true that these are not used as sequence number among a set of message sent over a period. Specification can clarify this in a non-normative text since this is not any requirement but just explanation what TOKEN represents.

The use of term "sequence number" is misleading. Since these are not sequence numbers, the specification should not use that term. For instance, one use of the token would be to fill it with the hart_id.

Noted, will remove the use of "sequence number" for TOKEN and simply mention that its a integer tag/number

@avpatel
Copy link
Contributor

avpatel commented Dec 18, 2024

In the shared memory region allocated by the platform for fast-channels, the Performance Request fastchannels must be grouped together for all the application processors. Similarly, the Performance Feedback fast-channels must be grouped together.

This poses an undue constraint on design of the platform. When the platform supports multiple chiplets or multiple sockets, the desire to group them all together may not be acceptable.

The current approach is much more suitable and convenient for multiple chiplets scenario because a single fastchannel region minimizes the required PMA regions and PMP regions on AP side. In fact, the multi-chiplet scenario is already modeled in our QEMU based PoC.

If a platform still wants separate shared memory for each chiplet then platform can easily create separate RPMI shared memory transports for each chiplet where the per-chiplet RPMI shared memory transport will only provide services local to that chiplet cluster. This is also modeled in the past and works fine with the current approach.

Now grouping the Request channels together and Feedback channels together was considering a platform where the fastchannel region is not cache-coherent and requires explicit cache-maintainence. In such scenario, if Request channel and Feedback channel of HART are clubbed together then it becomes complicated to do cache-maintainence since both have different direction of data movement.

@ved-rivos
Copy link
Author

I think there are certain assumptions here such as the fast channel being in memory that needs cache maintenance. The fast channel registers may be just memory mapped IO registers. The design already supports the discovery of the offset from the base of the region for each hart. That suffices and there need not be a mandate that they need to be grouped in any form. There may not be a PMA or PMP constraint depending on the platform or the constraints may be different for different platforms. The platforms can choose the most optimal layout. Since the offset of each register in the channel per hart can be discovered using GET_FAST_CHANNEL_OFFSET, the consumers of RPMI need not care about the order in which they are grouped.

@ved-rivos
Copy link
Author

I will update this service to mention that if this service is called again with MSI_ADDR = 0 and MSI_DATA = 0 then earlier MSI configured details will render unconfigured.

Why would address value of 0 and data value of 0 not be a valid MSI address/data pair? MSI addresses are required to be naturally aligned to 4 bytes. The bits 1 of the MSI low address may be defined as reserved/ignored and bit 0 as an enable.

@avpatel
Copy link
Contributor

avpatel commented Dec 18, 2024

I think there are certain assumptions here such as the fast channel being in memory that needs cache maintenance. The fast channel registers may be just memory mapped IO registers. The design already supports the discovery of the offset from the base of the region for each hart. That suffices and there need not be a mandate that they need to be grouped in any form. There may not be a PMA or PMP constraint depending on the platform or the constraints may be different for different platforms. The platforms can choose the most optimal layout. Since the offset of each register in the channel per hart can be discovered using GET_FAST_CHANNEL_OFFSET, the consumers of RPMI need not care about the order in which they are grouped.

I agree.

From RPMI spec perspective, a single fastchannel region and the individual harts having offsets within this region should be sufficient.

@pathakraul
Copy link
Collaborator

I will update this service to mention that if this service is called again with MSI_ADDR = 0 and MSI_DATA = 0 then earlier MSI configured details will render unconfigured.

Why would address value of 0 and data value of 0 not be a valid MSI address/data pair? MSI addresses are required to be naturally aligned to 4 bytes. The bits 1 of the MSI low address may be defined as reserved/ignored and bit 0 as an enable.

Yes, I agree, having 0 from spec perspective which can be a valid address is not right.

@ved-rivos
Copy link
Author

From RPMI spec perspective, a single fastchannel region and the individual harts having offsets within this region should be sufficient.

Yes, the region could describe all of addressable memory. Effectively the region is providing a some base for the offsets.

@ved-rivos
Copy link
Author

In section 4.6, a description of what the VOLTAGE_MIN_SEL/VOLTAGE_MAX_SEL mean would be helpful. Its not clear in the specification how the multi-linear format should be interpreted.

@pathakraul
Copy link
Collaborator

In section 4.6, a description of what the VOLTAGE_MIN_SEL/VOLTAGE_MAX_SEL mean would be helpful. Its not clear in the specification how the multi-linear format should be interpreted.

Noted, will add more details

@ved-rivos
Copy link
Author

Usually clocks and voltages are related i.e.a voltage domain would expect the voltage be appropriate for the current selected clock as part of dynamic voltage/frequency scaling. Is that comprehended in some form? How are the voltage domain and clock IDs expected to be correlated?

@ved-rivos
Copy link
Author

Section 4.8.1 - the power state values 15:0 seem to encode CONTEXT state in them - only two states are defined: ON with context preserved and OFF with context is lost. Is that the intent? Or was the encoding meant to just state OFF with the bit 16 providing an indication whether context is lost or preserved?

@pathakraul
Copy link
Collaborator

pathakraul commented Dec 18, 2024

Usually clocks and voltages are related i.e.a voltage domain would expect the voltage be appropriate for the current selected clock as part of dynamic voltage/frequency scaling. Is that comprehended in some form? How are the voltage domain and clock IDs expected to be correlated?

No, the voltage and clock domains are considered separate from RPMI perspective even though there are dependencies among them in implementation and these are left for the platform/implementation and the drivers. Here the voltage and clocks services are mainly meant for the system clocks or system regulators apart from the CPU. For example, UART, Accelerator, etc.

For DVFS like you asked, we have defined separate service groups CPPC (ACPI CPPC) and PERFORMANCE which are meant for CPU and Device performance scaling. These groups works on a defined performance scale(semantics are separate for CPPC and PERFORMANCE) and platform/implementation is responsible for translating the performance scale into the hardware operating points which may not be just voltage and frequency but will have more hardware control knobs.

@ved-rivos
Copy link
Author

Yes, I agree, having 0 from spec perspective which can be a valid address is not right.

You could use data value of 0 as implying the MSI is not valid as the interrupt ID of 0 is not a valid interrupt ID.

@avpatel
Copy link
Contributor

avpatel commented Dec 18, 2024

Yes, I agree, having 0 from spec perspective which can be a valid address is not right.

You could use data value of 0 as implying the MSI is not valid as the interrupt ID of 0 is not a valid interrupt ID.

We should not assume anything about MSI controller in a platform. Its better to have explicit mechanism to enable/disable MSI in Base service group.

@ved-rivos
Copy link
Author

We should not assume anything about MSI controller in a platform. Its better to have explicit mechanism to enable/disable MSI in Base service group.

Agree.

@pathakraul
Copy link
Collaborator

Yes, I agree, having 0 from spec perspective which can be a valid address is not right.

You could use data value of 0 as implying the MSI is not valid as the interrupt ID of 0 is not a valid interrupt ID.

We should not assume anything about MSI controller in a platform. Its better to have explicit mechanism to enable/disable MSI in Base service group.

I also inclined towards having explicit service for enable/disable the MSI. Configure MSI with addr and data will complement that

@lftan
Copy link
Collaborator

lftan commented Dec 19, 2024

Section 4.8.1 - the power state values 15:0 seem to encode CONTEXT state in them - only two states are defined: ON with context preserved and OFF with context is lost. Is that the intent? Or was the encoding meant to just state OFF with the bit 16 providing an indication whether context is lost or preserved?

Yes, this is intentional. For the defined states ON and OFF, when in the ON state, we expect CONTEXT to be preserved. However, in the OFF state, CONTEXT is lost.

For vendor-specific states, vendors can define their own CONTEXT requirements.

@ved-rivos
Copy link
Author

Yes, this is intentional. For the defined states ON and OFF, when in the ON state, we expect CONTEXT to be preserved. However, in the OFF state, CONTEXT is lost.

Its not clear why this overloading is required. The protocol correctly defines a power state and context field in the interface. There does not seem to be a need to overload the context value using a power state value. If the PuC does not support particular combination of power state and context - e.g., off without losing context - then it can deny that and this is also quite well specified in the protocol. What motivates creating a power state encoding that overrides context?

@ved-rivos
Copy link
Author

A few comments on section 4.9:

  1. Is the intent of set_limit to allow for devices to change their perf level autonomously? Explain if there is an interaction between set_limit and set_level - e.g., if a limit has been set using set_limit command and subsequently a set_level command is issued, is the limit checked?
  2. The limit and level change notifications imply that they could change asynchronously. How are these asynchronous changes related to the explicitly set limits and levels?
  3. section 4.9.7 - in the pseudo-code was the rate_limit supposed to be transition_latency?
  4. Is there any intent for the level numbers to be indicative of performance levels?

@lftan
Copy link
Collaborator

lftan commented Dec 23, 2024

Yes, this is intentional. For the defined states ON and OFF, when in the ON state, we expect CONTEXT to be preserved. However, in the OFF state, CONTEXT is lost.

Its not clear why this overloading is required. The protocol correctly defines a power state and context field in the interface. There does not seem to be a need to overload the context value using a power state value. If the PuC does not support particular combination of power state and context - e.g., off without losing context - then it can deny that and this is also quite well specified in the protocol. What motivates creating a power state encoding that overrides context?

These are only typical use cases. Will remove the context overloading for the ON/OFF states, leaving it to the PuC firmware to reject unsupported cases.

@lftan
Copy link
Collaborator

lftan commented Dec 23, 2024

A few comments on section 4.9:

  1. Is the intent of set_limit to allow for devices to change their perf level autonomously? Explain if there is an interaction between set_limit and set_level - e.g., if a limit has been set using set_limit command and subsequently a set_level command is issued, is the limit checked?
  2. The limit and level change notifications imply that they could change asynchronously. How are these asynchronous changes related to the explicitly set limits and levels?
  3. section 4.9.7 - in the pseudo-code was the rate_limit supposed to be transition_latency?
  4. Is there any intent for the level numbers to be indicative of performance levels?
  1. The set_limit function enables users to define the minimum and maximum performance levels. Subsequent calls to set_level will validate that the specified performance level remains within this defined range. Will update this to spec.

  2. If the current performance level falls outside the newly set minimum and maximum range, the firmware will adjust the current performance level accordingly and send a notification (e.g PERF_LEVEL_CHANGE, PERF_POWER_CHANGE) to the AP, provided notifications are enabled. WIll add more description to the spec.

  3. Good catch, will fix it.

  4. We are having a separate discussion in rpmi: performance: Add memory region service #81 to update the performance level details (these changes are not yet included in the PR). A 'performance level index' will be added to Section 4.9.1. This index is an integer, with its implementation-specific values defined by the platform.

lftan pushed a commit to lftan/riscv-rpmi that referenced this issue Dec 23, 2024
Update based on feedback from ARC review.
riscv-non-isa#79

Signed-off-by: Ley Foon Tan <[email protected]>
@ved-rivos
Copy link
Author

Section 4.4.9 - explain relation between EXIT_LATENCY and WAKEUP_LATENCY
Section 4.4.9 - explain what is provided in MIN_RESIDENCY and how it relates to WAKEUP_LATENCY
section 4.4.10 - Once a hart stop or suspend had been requested and acknowledged, can it be aborted? There seems to be no mechanism available to clear the hidden state associated with inferring a stop/suspend on subsequent WFI.
Section 4.11 - it seems prudent to reserve some FLAGS[3:0] code points for future standard extensions such as GHESv3

@pathakraul
Copy link
Collaborator

The message token helps application processors track the origin of the request when it sees a response.

The specification does not state anywhere that the TOKEN in a request must be echoed back in an ACK for the purposes of such correlation. Further describing this as a "sequence number" is misleading. A better description should be used as these are primarily message tags and not used to identify any form of sequence of messages.

Please provide some explanation about the use of a TOKEN in a notification.

Also when a channel is shared by multiple application processor harts then what method is used to determine which hart should process the notification message and whether events for multiple harts be packed into a single notification message.

Hi @ved-rivos On the above question, the events are service groups specific and multiple events can be packed in a single notification message. Upon the arrival of a notification message, its really up to the implementation how it wants to consume the events in that message and its upto the implementation if it wants to have specific AP or any AP to consume the events. Are you expecting that spec should have a note about this?

@pathakraul
Copy link
Collaborator

In section 4.6, a description of what the VOLTAGE_MIN_SEL/VOLTAGE_MAX_SEL mean would be helpful. Its not clear in the specification how the multi-linear format should be interpreted.

Noted, will update

lftan pushed a commit to lftan/riscv-rpmi that referenced this issue Jan 6, 2025
- Fix rate_limit to transition_latency
- Add more description for PERF_SET_LIMIT

riscv-non-isa#79 (comment)

Signed-off-by: Ley Foon Tan <[email protected]>
@pathakraul
Copy link
Collaborator

Section 4.4.9 - explain relation between EXIT_LATENCY and WAKEUP_LATENCY Section 4.4.9 - explain what is provided in MIN_RESIDENCY and how it relates to WAKEUP_LATENCY section 4.4.10

  • Once a hart stop or suspend had been requested and acknowledged, can it be aborted? There seems to be no mechanism available to clear the hidden state associated with inferring a stop/suspend on subsequent WFI.

Hi @ved-rivos, On the above question, there are no hidden states and all the states including the intermediate states *_PENDING which are supported already by the SBI HSM are followed here also. Regarding "abort", once a suspend is issued from the OS via SBI call and also acknowledged by the platform microcontroller, it will be parked in the M-Mode and will only return once resume via wakeup event. The provision of abort can be done in the future which will require its support in the OS and the M-mode firmware too.

lftan pushed a commit to lftan/riscv-rpmi that referenced this issue Jan 7, 2025
- Fix rate_limit to transition_latency
- Add more description for PERF_SET_LIMIT

riscv-non-isa#79 (comment)

Signed-off-by: Ley Foon Tan <[email protected]>
lftan pushed a commit to lftan/riscv-rpmi that referenced this issue Jan 7, 2025
- Fix rate_limit to transition_latency
- Add more descriptions for PERF_SET_LIMIT

riscv-non-isa#79 (comment)

Signed-off-by: Ley Foon Tan <[email protected]>
lftan pushed a commit to lftan/riscv-rpmi that referenced this issue Jan 8, 2025
- Fix rate_limit to transition_latency
- Add more descriptions for PERF_SET_LIMIT

riscv-non-isa#79 (comment)

Signed-off-by: Ley Foon Tan <[email protected]>
@ved-rivos
Copy link
Author

Hi @ved-rivos, On the above question, there are no hidden states and all the states including the intermediate states *_PENDING which are supported already by the SBI HSM are followed here also.

For instance, the HSM_START_STOP specification states:
"This service returns successful if the platform microcontroller has successfully acknowledged that the
target hart can be stopped. The hart upon successful acknowledgement can perform the final context
saving if required and must enter into a quiesced state such as WFI which can be detected and allow the
platform microcontroller to proceed to stop the hart. The mechanism to detect the hart quiesced state by
the platform microcontroller is platform specific."

This implies that there is a piece of state in the microcontroller that says "detect next WFI invocation and
proceed to stop the hart". This state once set cannot be unset or be queried.

Regarding "abort", once a suspend is issued from the OS via SBI call and also acknowledged by the platform microcontroller, it will be parked in the M-Mode and will only return once resume via wakeup event.

If the intention is that services such as SYSSYSUP or HART_STATE_MANAGEMENT are only supported over M-mode
transport then that should be stated in the specification.

@pathakraul
Copy link
Collaborator

Hi @ved-rivos, On the above question, there are no hidden states and all the states including the intermediate states *_PENDING which are supported already by the SBI HSM are followed here also.

For instance, the HSM_START_STOP specification states: "This service returns successful if the platform microcontroller has successfully acknowledged that the target hart can be stopped. The hart upon successful acknowledgement can perform the final context saving if required and must enter into a quiesced state such as WFI which can be detected and allow the platform microcontroller to proceed to stop the hart. The mechanism to detect the hart quiesced state by the platform microcontroller is platform specific."

This implies that there is a piece of state in the microcontroller that says "detect next WFI invocation and proceed to stop the hart". This state once set cannot be unset or be queried.

Regarding "abort", once a suspend is issued from the OS via SBI call and also acknowledged by the platform microcontroller, it will be parked in the M-Mode and will only return once resume via wakeup event.

If the intention is that services such as SYSSYSUP or HART_STATE_MANAGEMENT are only supported over M-mode transport then that should be stated in the specification.

Hi @ved-rivos Please check the "Table 7. RPMI Service Groups" in the Chapter 4 Service groups. This list mentions about the priv modes too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants