[BUG] compare-linux-win CI build check fails due to zephyr_version.h difference #9797

Closed
kv2019i opened this issue Jan 27, 2025 · 15 comments
Labels
bug Something isn't working as expected Known PR Failures Issues seen in SOF github pull-request checks P3 Low-impact bugs or features

Comments

@kv2019i
Collaborator

kv2019i commented Jan 27, 2025

Status update:

Starting with daily build 795, https://github.com/thesofproject/sof/actions/runs/12940934767
the Linux + mnft (manifest) builds started to lose git tags in the git describe output that goes into their BUILD_VERSION. For instance, v4.0.0-2813-g42701fdb2729 became 42701fdb2729. The Windows builds kept the tag, and so did the Linux "zmain" (Zephyr main branch) build.
This correlates with an Ubuntu VM upgrade; see below.
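The symptom is easy to check mechanically. The following is a minimal Python sketch, not part of SOF or Zephyr: the helper name parse_describe is hypothetical, and it assumes the usual TAG-COUNT-gHASH format that git describe prints when a tag is reachable, versus the bare abbreviated hash that git describe --always falls back to when none is.

```python
import re
from typing import NamedTuple, Optional


class Describe(NamedTuple):
    tag: Optional[str]  # nearest reachable tag, or None for a bare hash
    distance: int       # commits since that tag (0 when there is no tag)
    abbrev_hash: str    # abbreviated commit id


# TAG-COUNT-gHASH; tags may contain dashes, so the tag part is matched greedily.
_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<hash>[0-9a-f]+)$")
# What `git describe --always` prints when no tag is reachable.
_BARE_HASH_RE = re.compile(r"^[0-9a-f]{7,40}$")


def parse_describe(text: str) -> Describe:
    """Split git describe output into tag, distance and abbreviated hash."""
    text = text.strip()
    m = _DESCRIBE_RE.match(text)
    if m:
        return Describe(m["tag"], int(m["count"]), m["hash"])
    if _BARE_HASH_RE.match(text):
        return Describe(None, 0, text)
    raise ValueError(f"unrecognized git describe output: {text!r}")
```

With the strings from this issue, parse_describe("v4.0.0-2813-g42701fdb2729").tag is "v4.0.0", while parse_describe("42701fdb2729").tag is None, which is the regression seen on the Linux + mnft builds. An exact tag match printed without --long (just "v4.0.0") and a "-dirty" suffix are both rejected by this sketch.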


Describe the bug
Starting today, 27 January 2025, multiple pull requests show a build reproducibility failure between the Linux and Windows builds, with the following
error:

https://github.com/thesofproject/sof/actions/runs/12947318306/job/36114439673?pr=9794

 Files linux-build  imx8 imx8x imx8m imx8ulp/build-sof-staging/sof-info/imx8/stripped-zephyr.elf and windows-build  imx8 imx8x imx8m imx8ulp/build-sof-staging/sof-info/imx8/stripped-zephyr.elf differ
Files linux-build  imx8 imx8x imx8m imx8ulp/build-sof-staging/sof-info/imx8/zephyr.lst and windows-build  imx8 imx8x imx8m imx8ulp/build-sof-staging/sof-info/imx8/zephyr.lst differ
Files linux-build  imx8 imx8x imx8m imx8ulp/build-sof-staging/sof-info/imx8/zephyr_version.h and windows-build  imx8 imx8x imx8m imx8ulp/build-sof-staging/sof-info/imx8/zephyr_version.h differ
Files linux-build  imx8 imx8x imx8m imx8ulp/build-sof-staging/sof-info/imx8m/stripped-zephyr.elf and windows-build  imx8 imx8x imx8m imx8ulp/build-sof-staging/sof-info/imx8m/stripped-zephyr.elf differ
Files linux-build  imx8 imx8x imx8m imx8ulp/build-sof-staging/sof-info/imx8m/zephyr.lst and windows-build  imx8 imx8x imx8m imx8ulp/build-sof-staging/sof-info/imx8m/zephyr.lst differ
Files linux-build  imx8 imx8x imx8m imx8ulp/build-sof-staging/sof-info/imx8m/zephyr_version.h and windows-build  imx8 imx8x imx8m imx8ulp/build-sof-staging/sof-info/imx8m/zephyr_version.h differ
Files linux-build  imx8 imx8x imx8m imx8ulp/build-sof-staging/sof-info/imx8ulp/stripped-zephyr.elf and windows-build  imx8 imx8x imx8m imx8ulp/build-sof-staging/sof-info/imx8ulp/stripped-zephyr.elf differ

This is shown in multiple open PRs:

To add to the mystery, I can't see the failure in any of the pull requests that failed last week.

To Reproduce
Submit a pull request to the SOF repository.

Reproduction Rate
100%

Expected behavior
Linux and Windows host builds should create identical binaries.
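The expected behavior boils down to a recursive byte-for-byte comparison of the two staging trees. Below is a minimal Python sketch of such a check, assuming both artifact trees have been downloaded locally; the helper differing_files is hypothetical and is not the actual compare-linux-win implementation. Files present on only one side are ignored here.

```python
import filecmp
import os
from typing import List


def differing_files(left: str, right: str) -> List[str]:
    """Relative paths of files present in both trees whose contents differ.

    shallow=False makes filecmp compare contents every time instead of
    returning early when the (type, size, mtime) signatures happen to match.
    Files that exist on only one side are not reported.
    """
    diffs: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(left):
        rel_dir = os.path.relpath(dirpath, left)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            other = os.path.join(right, rel_path)
            if os.path.isfile(other) and not filecmp.cmp(
                os.path.join(dirpath, name), other, shallow=False
            ):
                diffs.append(rel_path)
    return sorted(diffs)
```

Pointed at two hypothetical local copies of build-sof-staging, this would list paths such as sof-info/mtl/zephyr_version.h for the failure in this issue.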

Impact
The failure is seen on every CI run and can mask other errors.

@kv2019i kv2019i added bug Something isn't working as expected Known PR Failures Issues seen in SOF github pull-request checks labels Jan 27, 2025
@kv2019i
Collaborator Author

kv2019i commented Jan 27, 2025

@thesofproject/nxp Does this ring any bells? I checked multiple PRs merged last week, but I couldn't find one where this failure had already been seen before merging. I could have missed some.

@dbaluta
Collaborator

dbaluta commented Jan 27, 2025

@LaurentiuM1234 @marc-hb didn't we have this issue a couple of months ago? I will need to check.

@dbaluta
Collaborator

dbaluta commented Jan 27, 2025

@kv2019i it looks like this is not limited to imx builds.

 Files linux-build  tgl tgl-h/build-sof-staging/sof-info/tgl-h/stripped-zephyr.elf and windows-build  tgl tgl-h/build-sof-staging/sof-info/tgl-h/stripped-zephyr.elf differ
Files linux-build  tgl tgl-h/build-sof-staging/sof-info/tgl-h/zephyr.lst and windows-build  tgl tgl-h/build-sof-staging/sof-info/tgl-h/zephyr.lst differ
Files linux-build  tgl tgl-h/build-sof-staging/sof-info/tgl-h/zephyr_version.h and windows-build  tgl tgl-h/build-sof-staging/sof-info/tgl-h/zephyr_version.h differ
Files linux-build -d mtl/build-sof-staging/sof-info/mtl/boot.mod and windows-build -d mtl/build-sof-staging/sof-info/mtl/boot.mod differ
Files linux-build -d mtl/build-sof-staging/sof-info/mtl/stripped-main.elf and windows-build -d mtl/build-sof-staging/sof-info/mtl/stripped-main.elf differ
Files linux-build -d mtl/build-sof-staging/sof-info/mtl/stripped-zephyr.elf and windows-build -d mtl/build-sof-staging/sof-info/mtl/stripped-zephyr.elf differ
Files linux-build -d mtl/build-sof-staging/sof-info/mtl/zephyr.lst and windows-build -d mtl/build-sof-staging/sof-info/mtl/zephyr.lst differ
Files linux-build -d mtl/build-sof-staging/sof-info/mtl/zephyr_version.h and windows-build -d mtl/build-sof-staging/sof-info/mtl/zephyr_version.h differ

@kv2019i kv2019i changed the title [BUG] compare-linux-win CI build check fails to binary difference in imx8/imx8 zephyr.elf [BUG] compare-linux-win CI build check fails to binary difference in zephyr.elf for multiple platforms Jan 27, 2025
@dbaluta
Collaborator

dbaluta commented Jan 27, 2025

@kv2019i A relevant issue in the past, #9034, was solved by opting out of adding the source code to the assembly listing file.

@LaurentiuM1234
Contributor

FYI, this test also failed a while ago when trying to add a build job for imx95 (#9546). If I recall correctly, the compiler on one of the OSes would add some comments that the other did not. From a superficial look at the two .lst files from #9794, though, this seems to be a different issue.

@marc-hb
Collaborator

marc-hb commented Jan 27, 2025

Files linux-build -d mtl/build-sof-staging/sof-info/mtl/zephyr_version.h and windows-build -d mtl/build-sof-staging/sof-info/mtl/zephyr_version.h differ

This is the easiest bit to focus on first.

You can download artefacts from every PR page; please compare these.

Relevant issue in the past #9034 was solved by opting out on adding the source code on the assembly listing file.

zephyr_version.h looks like a very different problem this time.

@marc-hb
Collaborator

marc-hb commented Jan 27, 2025

You can download artefacts from every PR page,

Actually, daily tests are more convenient for this because there is zero PR interference.

It started failing in daily build 795, 4 days ago: https://github.com/thesofproject/sof/actions/runs/12940934767

please compare these.

Don't wait because artefacts expire after some time.

@marc-hb
Collaborator

marc-hb commented Jan 27, 2025

There's been a progressive rollout of a new Windows image (version 20250113.1.0 -> 20250120.2.0) around the same time, but it does not correlate with the failure. EDIT: it's unrelated. The problem is on the Linux side.

For instance, the IMX build got the newer Windows image in 794 but 794 was still all green:
https://github.com/thesofproject/sof/actions/runs/12898787138/job/35966473504

As you can tell from the version numbers, minor upgrades like these happen all the time.

The first step is really to download and compare zephyr_version.h.

@marc-hb marc-hb changed the title [BUG] compare-linux-win CI build check fails to binary difference in zephyr.elf for multiple platforms [BUG] compare-linux-win CI build check fails to binary difference in zephyr_version.h for multiple platforms Jan 27, 2025
@marc-hb marc-hb changed the title [BUG] compare-linux-win CI build check fails to binary difference in zephyr_version.h for multiple platforms [BUG] compare-linux-win CI build check fails due to zephyr_version.h difference Jan 27, 2025
@lyakh lyakh mentioned this issue Jan 28, 2025
@abonislawski abonislawski added the P3 Low-impact bugs or features label Jan 28, 2025
@kv2019i
Collaborator Author

kv2019i commented Jan 28, 2025

It would seem the builds really are different now. I wonder if this is somehow related to the recent cold/llext change; the .cold section at least looks different. E.g. here for GitHub build 796 (https://github.com/thesofproject/sof/actions/runs/12940934767#artifacts)

--- linux/build-sof-staging/sof-info/mtl/stripped-zephyr.elf.lst        2025-01-28 14:20:56.645624296 +0200
+++ windows/build-sof-staging/sof-info/mtl/stripped-zephyr.elf.lst      2025-01-28 14:21:12.225625128 +0200
@@ -83,13 +83,13 @@
  a10484e0 0c38a088 20c02000 89094603 00a83b98  .8.. . ...F...;.
  a10484f0 0b0c18a0 8820c020 0089090c 021df0    ..... . ....... 
 Contents of section .cold:
- a1049000 006002a0 609708a0 949808a0 40200000  .`..`.......@ ..
- a1049010 f8870aa0 789708a0 dc4104a0 2c0e08a0  ....x....A..,...
+ a1049000 006002a0 6c9708a0 a09808a0 40200000  .`..l.......@ ..
+ a1049010 10880aa0 849708a0 dc4104a0 2c0e08a0  .........A..,...

The debug sections were already different before, so I wonder whether this just pushes sections out to different addresses (and the use of cold sections merely triggers it).
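One way to test this hypothesis is to decode the .cold words and subtract them. A minimal Python sketch, assuming full 16-byte objdump -s lines and a little-endian target (objdump prints raw bytes in memory order); le_words and deltas are hypothetical helpers:

```python
from typing import List


def le_words(objdump_line: str) -> List[int]:
    """Decode the four 32-bit words of one full `objdump -s` hexdump line.

    objdump -s prints raw bytes in memory order, so on a little-endian
    target the group "609708a0" is the value 0xa0089760.
    """
    fields = objdump_line.split()
    # fields[0] is the address; fields[1:5] are the hex groups.
    # Any trailing ASCII column is ignored.
    return [int.from_bytes(bytes.fromhex(group), "little") for group in fields[1:5]]


def deltas(linux_line: str, windows_line: str) -> List[int]:
    """Windows value minus Linux value, word by word."""
    return [ww - lw for lw, ww in zip(le_words(linux_line), le_words(windows_line))]


# The .cold lines from the diff above (address column + four hex groups).
linux = [
    "a1049000 006002a0 609708a0 949808a0 40200000",
    "a1049010 f8870aa0 789708a0 dc4104a0 2c0e08a0",
]
windows = [
    "a1049000 006002a0 6c9708a0 a09808a0 40200000",
    "a1049010 10880aa0 849708a0 dc4104a0 2c0e08a0",
]

for lin, win in zip(linux, windows):
    print([hex(d) for d in deltas(lin, win)])
# ['0x0', '0xc', '0xc', '0x0']
# ['0x18', '0xc', '0x0', '0x0']
```

The changed values look like pointers into the 0xa008xxxx and 0xa00axxxx regions, shifted by 0xc or 0x18 (one or two multiples of 12 bytes). That pattern is consistent with objects moving to different addresses rather than with different code, but it does not by itself say what moved them.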

@kv2019i
Collaborator Author

kv2019i commented Jan 28, 2025

#795 daily build has a clean zephyr_version.h, but #796 seems to have this delta:

--- linux/build-sof-staging/sof-info/mtl/zephyr_version.h       2025-01-24 00:52:46.000000000 +0200
+++ windows/build-sof-staging/sof-info/mtl/zephyr_version.h     2025-01-24 00:53:50.000000000 +0200
@@ -19,7 +19,7 @@
 #define KERNEL_VERSION_EXTENDED_STRING  "4.0.99+0"
 #define KERNEL_VERSION_TWEAK_STRING     "4.0.99+0"
 
-#define BUILD_VERSION 42701fdb2729
+#define BUILD_VERSION v4.0.0-2813-g42701fdb2729

That will indeed impact the binary.
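Since zephyr_version.h is the easiest file to focus on, comparing macro values directly is more readable than a raw binary diff. A minimal Python sketch, with hypothetical helper names, that only handles object-like "#define NAME value" lines (function-like macros are skipped):

```python
import re
from typing import Dict, Optional, Tuple

# Object-like macros only: "#define NAME value" (value may be empty).
# Function-like macros such as "#define F(x) x" do not match and are skipped.
_DEFINE_RE = re.compile(r"^\s*#\s*define\s+(\w+)(?:\s+(.*?))?\s*$")


def parse_defines(header_text: str) -> Dict[str, str]:
    """Map macro name to its (whitespace-trimmed) replacement text."""
    defines: Dict[str, str] = {}
    for line in header_text.splitlines():
        m = _DEFINE_RE.match(line)
        if m:
            defines[m.group(1)] = m.group(2) or ""
    return defines


def diff_defines(a: str, b: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (value_in_a, value_in_b)} for every macro that differs.

    A macro missing from one side is reported with None on that side.
    """
    da, db = parse_defines(a), parse_defines(b)
    return {
        name: (da.get(name), db.get(name))
        for name in sorted(da.keys() | db.keys())
        if da.get(name) != db.get(name)
    }
```

Fed the Linux and Windows copies of the header, this should report only BUILD_VERSION, mapped to ("42701fdb2729", "v4.0.0-2813-g42701fdb2729").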

@marc-hb
Collaborator

marc-hb commented Jan 28, 2025

Thanks @kv2019i. Indeed, the difference can be observed in the build log:

[7/445] Generating include/generated/zephyr/version.h
-- Zephyr version: 4.0.99 (D:/a/sof/sof/workspace/zephyr), build: v4.0.0-2813-g42701fdb2729

versus

[9/447] Generating include/generated/zephyr/version.h
-- Zephyr version: 4.0.99 (/zep_workspace/zephyr), build: 42701fdb2729
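Because the build string already appears in the CMake output, a mismatch like this could be caught from the two build logs before any binary is compared. A minimal sketch, assuming the log line format shown above stays stable (it is Zephyr's status output and may change between versions); build_string and same_build are hypothetical helpers:

```python
import re
from typing import Optional

# Matches the CMake status line printed when zephyr/version.h is generated, e.g.
# "-- Zephyr version: 4.0.99 (/zep_workspace/zephyr), build: 42701fdb2729"
_VERSION_LINE_RE = re.compile(
    r"Zephyr version: (?P<version>\S+) \((?P<path>.*)\), build: (?P<build>\S+)"
)


def build_string(log_text: str) -> Optional[str]:
    """Return the 'build:' value from a build log, or None if the line is absent."""
    m = _VERSION_LINE_RE.search(log_text)
    return m["build"] if m else None


def same_build(linux_log: str, windows_log: str) -> bool:
    """True when both logs report the same, non-missing build string."""
    a, b = build_string(linux_log), build_string(windows_log)
    return a is not None and a == b
```

The workspace path legitimately differs between hosts (D:/a/sof/sof/workspace/zephyr versus /zep_workspace/zephyr), so the sketch compares only the build value.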

A long time ago, I added magic west and git tricks to speed things up (a lot). They worked fine for a very long time, but maybe some subtle git version difference or other environmental difference just broke them. I'll take a quick look.

EDIT: The Linux + mnft (manifest) builds started to lose git tags in the git describe output that goes in their BUILD_VERSION. The Windows builds kept it and the Linux "zmain" (Zephyr main branch) kept it too.

@marc-hb
Collaborator

marc-hb commented Jan 28, 2025

Around the time of daily build 795, there was a progressive rollout of Ubuntu 22 VM version 20250120.2, up from 20250105.1.

This upgrade seems to correlate 100% with the loss of the git tag in git describe.

The newer Ubuntu VM upgraded git from version 2.47.1 to version 2.48.1.

Meanwhile, git on Windows did not change and stayed at 2.47.1.windows.1.

https://github.com/actions/runner-images/blob/ubuntu22/20250105.1/images/ubuntu/Ubuntu2204-Readme.md
https://github.com/actions/runner-images/blob/ubuntu22/20250120.2/images/ubuntu/Ubuntu2204-Readme.md
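Host-tool drift between runner images was the trigger here. A hypothetical guard, assuming each CI job records the output of git --version, could normalize the Linux and Windows version strings and flag a mismatch; the helper names are illustrative only, and comparing only major.minor is an assumed policy, not something SOF does:

```python
import re
from typing import Tuple


def git_version_tuple(version_output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from `git --version` output.

    Platform suffixes such as ".windows.1" are ignored.
    """
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if m is None:
        raise ValueError(f"cannot parse git version from {version_output!r}")
    major, minor, patch = (int(g) for g in m.groups())
    return major, minor, patch


def same_git_release(linux_output: str, windows_output: str) -> bool:
    """Compare major.minor only; patch releases are assumed harmless here."""
    return git_version_tuple(linux_output)[:2] == git_version_tuple(windows_output)[:2]
```

For the runner images in this issue, same_git_release("git version 2.48.1", "git version 2.47.1.windows.1") returns False. Whether such a mismatch should fail the job or only print a warning is a policy choice, since version drift is not always a bug.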

marc-hb added a commit to marc-hb/sof that referenced this issue Jan 28, 2025
Fixes the git describe/tag performance hack added in
commit 2328478 (".github/zephyr.yml: fix tags missing from `git -C
zephyr/ describe`") which worked for an amazingly long time (1.5 year)
but apparently ran its course. Git version 2.48 apparently does not like
it anymore. Replace it with something slower but simpler and safer.

Should fix build reproducibility issue thesofproject#9797, much more details there.

Also fixes commit 4bc6488 (".github/zephyr: de-hardcode the name of
the zephyr remote")

Signed-off-by: Marc Herbert <[email protected]>
marc-hb added a commit to marc-hb/sof that referenced this issue Jan 28, 2025
More progress towards trimming down pull-request.yml

Important benefit and "secret" agenda: remove the constantly failing
sof-docs from daily builds, which will make them green again.

Green daily builds are important to quickly spot regressions like for
instance thesofproject#9797

Signed-off-by: Marc Herbert <[email protected]>
@marc-hb
Collaborator

marc-hb commented Jan 28, 2025

Fix submitted. This should also turn the daily tests back to green.

kv2019i pushed a commit that referenced this issue Jan 30, 2025
Fixes the git describe/tag performance hack added in
commit 2328478 (".github/zephyr.yml: fix tags missing from `git -C
zephyr/ describe`") which worked for an amazingly long time (1.5 year)
but apparently ran its course. Git version 2.48 apparently does not like
it anymore. Replace it with something slower but simpler and safer.

Should fix build reproducibility issue #9797, much more details there.

Also fixes commit 4bc6488 (".github/zephyr: de-hardcode the name of
the zephyr remote")

Signed-off-by: Marc Herbert <[email protected]>
kv2019i pushed a commit that referenced this issue Jan 30, 2025
More progress towards trimming down pull-request.yml

Important benefit and "secret" agenda: remove the constantly failing
sof-docs from daily builds, which will make them green again.

Green daily builds are important to quickly spot regressions like for
instance #9797

Signed-off-by: Marc Herbert <[email protected]>
@kv2019i
Collaborator Author

kv2019i commented Jan 30, 2025

Closed via #9801. Thanks @marc-hb!

@kv2019i kv2019i closed this as completed Jan 30, 2025
@marc-hb
Collaborator

marc-hb commented Jan 31, 2025

After dropping sof-docs from them, daily tests are back to green!

https://github.com/thesofproject/sof/actions/workflows/daily-tests.yml



5 participants