Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zephyr: enable HIFI_SHARING if DP scheduler is enabled with HIFI #9644

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kv2019i
Copy link
Collaborator

@kv2019i kv2019i commented Nov 8, 2024

If data processing (DP) scheduler is enabled on Xtensa HiFi platforms, CONFIG_HIFI_SHARING should be enabled as well. Otherwise modules running via DP scheduler cannot use HiFi extensions (as the HiFi register state is not saved and restored on context switches).

If data processing (DP) scheduler is enabled on Xtensa HiFi
platforms, CONFIG_HIFI_SHARING should be enabled as well. Otherwise
modules running via DP scheduler cannot use HiFi extensions (as
the HiFi register state is not saved and restored on context
switches).

Signed-off-by: Kai Vehmanen <[email protected]>
@lgirdwood
Copy link
Member

@wszypelt @lrudyX good to merge ?

@kv2019i
Copy link
Collaborator Author

kv2019i commented Nov 8, 2024

@wszypelt @lgirdwood It seems the fail is a System.BadImageFormatException on TGL only, so could be an issue with test infra and not the PR. In general, this is a rather invasive change, so regressions are also possible.

@@ -49,6 +49,7 @@ config ZEPHYR_DP_SCHEDULER
depends on ZEPHYR_SOF_MODULE
depends on ACE
depends on PIPELINE_2_0
imply XTENSA_HIFI_SHARING if XTENSA_HIFI
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why imply and not select?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh imply leaves it possible to turn it off https://docs.zephyrproject.org/latest/build/kconfig/tips.html#the-imply-statement . I guess this is up to debata, but sharing does have a perf impact and if you don't you hifi in DP modules, then you strictly speaking don't need this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i yeah, but disabling it is rather dangerous, isn't it. I'm wondering if we could automate this. Compile-time checking could be done at some primitive level, I suppose, like if DP modules are enabled and they use HiFi, but that's too broad - would trigger almost always. We probably want to only turn on HiFi context saving when a DP thread is started, that uses HiFi. Can we add that to the DP scheduler as a DP HiFi use-count?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh but with loadable modules (potentially out of tree), this won't be enough, right? There was discussion with the Zephyr PR on the option to make the context save on-demand. I.e. take an exception on hifi use and mark a bit for the thread context. This would add some cost to the common case (e.g. typical LL execution with no preemption of threads between LL tasks).

I think this is now a product policy call and basicly with this PR, you have to go out of your way to disable this on SOF. But given this was disabled before, this construct using "imply" at least allows to revert to that older policy in case anyone needs. If we'd use select, that would lock the policy with no per-target way out.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i would it be valid to write

select XTENSA_HIFI_SHARING if DP and XTENSA_HIFI
imply XTENSA_HIFI_SHARING if XTENSA_HIFI

? But well, yes, there's always a way to break things

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh Yes, that could be done as well, the second imply would have to be under some other Kconfig option that is picked by all SOF targets.

That would make it impossible to use the current workarounds:

  • run the DP tasks on a dedicated core (in use)
  • do not use HIFI in DP tasks

This certainly should not be the default, but whether it's allowed at all, that's a good question.

@lgirdwood
Copy link
Member

SOFCI TEST

@lgirdwood
Copy link
Member

@wszypelt @lgirdwood It seems the fail is a System.BadImageFormatException on TGL only, so could be an issue with test infra and not the PR. In general, this is a rather invasive change, so regressions are also possible.

Lets run CI again to be sure.

@lgirdwood
Copy link
Member

@wszypelt @lgirdwood It seems the fail is a System.BadImageFormatException on TGL only, so could be an issue with test infra and not the PR. In general, this is a rather invasive change, so regressions are also possible.

Lets run CI again to be sure.

Results as expected so looks fine. @wszypelt @lrudyX good to merge (CI still pending) ?

@wszypelt
Copy link

@lgirdwood @kv2019i first there was a problem with one of the machines, after fix
there is a issue IPC error 6 in updownmixer for PTL, I still need some time to check it out

@marcinszkudlinski
Copy link
Contributor

marcinszkudlinski commented Nov 13, 2024

According to @wszypelt findings

This change in fact is enabling HiFi preemption, developed by Zephyr some time ago. I was under impression it had been enabled in SOF right after Zephyr change was merged, but looks like it had not.

Such change at kernel level may of course cause a random and hard to track crashes because of races etc. and the result above suggest we do have such issue :(

Required test - start a pipeline with an LL module using HiFI and a DP module (on the same core. also using hifi) with a long processing time (10ms). This would enforce HiFI preemption. Run such test several times.

So at the moment DON"T MERGE IT, we need to prove that HiFI preemption is a problem and go back to Zephyr team for investigation

@kv2019i kv2019i added the DNM Do Not Merge tag label Nov 13, 2024
@lgirdwood lgirdwood added this to the v2.12 milestone Nov 13, 2024
@lgirdwood
Copy link
Member

@rljordan @nashif fyi - looks like we need to deep dive Zephyr HiFi context switching as 1st step.

@lgirdwood
Copy link
Member

@kv2019i @marcinszkudlinski any updates ?

Have we tried rerunning CI and downgrade HIFI version to see if this changes result ? i.e it could be a HiFi4 only issue that would shine the light on any new Hifi4 logic/registers.

Also HiFi has audio processing instructions (which would probably glitch/distort) and also some general purpose instructions/registers (e.g. loops, loads, stores) that may more likely be the culprit here if we are seeing unexpected IPC results.

One other thing is that I dont think the HiFi version Kconfig has impact outside of modules today (since it only for intrinsics), i.e. general purpose code would still use HiFi version that's baked into CC.

@marcinszkudlinski
Copy link
Contributor

@lgirdwood @kv2019i
we DO need hifi preemption
@wszypelt pls provide test

Debugging won't be easy :(

@wszypelt
Copy link

@kv2019i @marcinszkudlinski sure thing, we are during creating new test, but we still need some time

@kv2019i kv2019i modified the milestones: v2.12, v2.13 Dec 13, 2024
@kv2019i
Copy link
Collaborator Author

kv2019i commented Dec 13, 2024

Feature cutoff for v2.12, moving this to v2.13.

@lgirdwood
Copy link
Member

@wszypelt any update on your test ? Btw, I would like to run your test with GCC binaries too since GCC does not use HiFi registers and would help confirm if stability is due to HiFi context or something else.

@lgirdwood lgirdwood added the P2 Critical bugs or normal features label Dec 13, 2024
@wszypelt
Copy link

@lgirdwood Unfortunately, reproduction is not possible at the moment, Marcin is currently working on a test module that will help us with this.

@marcinszkudlinski
Copy link
Contributor

marcinszkudlinski commented Dec 19, 2024

I verified that the preemption feature most likely works fine.
However, HiFI preemption has been developed without any optimalization, as Zephyr doc says:
During each thread context switch, the kernel saves the outgoing
thread's HiFi registers and loads the incoming thread's HiFi registers,
regardless of whether the thread utilizes them or not."

As I checked, for MTL and ace10_LX7HiFi4_2022_10 core it adds 134 CPU cycles per each context switch (that means - 268cycles per preemption) what may have impact on performance. And - it requires 208 bytes of extra space on each stack, it may result with stack overflows.
Still investigating if any of the above caused problems.

There's a possible optimalization, implemented by Cadence for FreeRTOS - do "on demand" or "lazy preemption" - meaning generate an exception at the moment HiFi is to be used by a thread, than do a HiFI context switch only if it is really needed. HiFi state storage area also may in this case be located on each thread's stack but in an extra space in thread context, allocated only for threads that declare they may/will use HiFi.
https://github.com/FreeRTOS/FreeRTOS-Kernel-Partner-Supported-Ports/tree/main/Cadence/Xtensa

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DNM Do Not Merge tag P2 Critical bugs or normal features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants