Improve CI stability #1843
What is the "timer sync issue"? Can you add a reference to the original bug report, please?
Ah, #1834 has some background.
Yes, sorry, this was just an attempt to verify whether this parameter solves the flaky CI tests. The issue seems to appear only on macOS runners, so it is likely a problem with hvf and QEMU. It is pretty annoying, as quite often we get a kernel panic at boot... 😞 We have never seen it in any environment other than the macOS GitHub runners, so we actually need to open a PR to check whether the issue is solved.
a49ba5b to 73b619d
Codecov Report
@@            Coverage Diff            @@
##             main    #1843    +/-   ##
=========================================
+ Coverage   75.33%   75.34%   +0.01%
=========================================
  Files          67       67
  Lines        6814     6818       +4
=========================================
+ Hits         5133     5137       +4
  Misses       1311     1311
  Partials      370      370
bb42655 to c0f8fc2
24a8842 to 9b441ee
Still a draft, as it builds on top of rancher-sandbox/ele-testhelpers#38. That PR should be merged first, and then go.mod in this project needs to be updated again.
@@ -157,7 +157,6 @@ jobs:
 - if: ${{ env.ARCH == 'x86_64' }}
 name: Run VM script dependencies
 run: |
-brew update; brew upgrade qemu
Yep, this was added since there was an update to qemu which was not picked up by our runner, which broke the build... Probably not needed now!
Yes, I guess we were suffering from macOS runner upgrades. I remember there used to be a very old QEMU version when we first started using it; now I see it comes with QEMU v8 by default.
This looks great! 🚀
9b441ee to aeffeec
@@ -6,19 +6,20 @@ SCRIPT=$(realpath -s "${0}")
 SCRIPTS_PATH=$(dirname "${SCRIPT}")
 TESTS_PATH=$(realpath -s "${SCRIPTS_PATH}/../tests")

+: "${ELMNTL_PREFIX:=}"
I added a prefix to the files so that, at some point, we could consider running multiple tests in parallel on the same machine without overlapping files.
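As a rough illustration of how such a prefix can keep runs apart, the sketch below derives per-run file names from ELMNTL_PREFIX. The helper name vm_file and the file name vm.pid are assumptions made for this example; they are not taken from run_vm.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build per-run file paths from an optional prefix.
# An empty prefix (the default) leaves the file names unchanged.
: "${ELMNTL_PREFIX:=}"

# vm_file is an illustrative helper, not part of run_vm.sh.
# $1: directory holding the VM files, $2: base file name.
vm_file() {
  printf '%s/%s%s\n' "$1" "${ELMNTL_PREFIX}" "$2"
}
```

Two runs started with different prefixes, for example ELMNTL_PREFIX=a- and ELMNTL_PREFIX=b-, would then write to a-vm.pid and b-vm.pid instead of sharing one file.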
@@ -67,12 +68,12 @@ function start {
 ;;
 esac

-[ "hvf" == "${ELMNTL_ACCEL}" ] && accel_arg="-accel ${ELMNTL_ACCEL}" && firmware_arg="-bios ${ELMNTL_FIRMWARE} ${firmware_arg}"
+[ "hvf" == "${ELMNTL_ACCEL}" ] && accel_arg="-accel ${ELMNTL_ACCEL}" && firmware_arg="-bios ${ELMNTL_FIRMWARE} ${firmware_arg}" && cpu_arg="-cpu max,-pdpe1gb"
I found this -pdpe1gb flag surfing the net... Apparently it disables huge pages (pdpe1gb is the CPU feature for 1 GiB pages), which for some reason are not supported on macOS 🤷🏽‍♂️ I do believe this is the real change that fixes the flaky tests.
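The selection logic can be sketched as a small function. The function name cpu_arg_for is hypothetical, and printing nothing for other accelerators is an assumption; the diff only shows what happens in the hvf case.

```shell
#!/usr/bin/env bash
# Sketch: pick the QEMU -cpu argument based on the accelerator in use.
# With hvf, "max" enables every feature the accelerator supports and
# "-pdpe1gb" then removes the 1 GiB page feature from the guest CPU.
cpu_arg_for() {
  if [ "hvf" = "$1" ]; then
    echo "-cpu max,-pdpe1gb"
  else
    # Assumption: other accelerators keep whatever CPU model was already set.
    echo ""
  fi
}
```

The result would be spliced into the qemu command line in the same way as accel_arg and firmware_arg.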
-$(GINKGO) run $(GINKGO_ARGS) ./tests/installer
-$(GINKGO) run $(GINKGO_ARGS) ./tests/smoke
+VM_PID=$$(scripts/run_vm.sh vmpid) go run $(GINKGO) $(GINKGO_ARGS) ./tests/installer
+VM_PID=$$(scripts/run_vm.sh vmpid) go run $(GINKGO) $(GINKGO_ARGS) ./tests/smoke
There are probably more elegant ways to pass the VM pid to the test code, but so far I opted for this simple solution. It is not strictly required and can be omitted on local runs.
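For readers unfamiliar with the pattern, here is a minimal sketch of handing a pid to a child process through the environment. The helper pid_from_file and the pid file location are hypothetical; the real lookup is whatever scripts/run_vm.sh vmpid does.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "vmpid"-style lookup: print the pid stored in a
# pid file, or print nothing when the file does not exist.
pid_from_file() {
  if [ -f "$1" ]; then
    cat "$1"
  fi
}

# Usage sketch: the variable is set only for the single command that follows.
#   VM_PID=$(pid_from_file /path/to/vm.pid) some-test-command
```

Because a missing pid file yields an empty value, the test command still runs when no pid is available, which matches the note that the variable can be omitted on local runs.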
Adds several improvements:
* Check VM pid on 'EventuallyConnects', so it does not wait for a command to succeed if the underlying VM crashed and is not running.
* Does not use the '-daemonize' flag of qemu; now it simply runs in the background and stdout and stderr are redirected to the vmstdout file.
* Does not install QEMU; the runner already has a recent QEMU version installed. This saves several minutes on each macOS job.
* Fixes some of the stability issues on macOS by disabling hugepages on the kernel. This is not supported on macOS.
Signed-off-by: David Cassany <[email protected]>
aeffeec to f973c2b
This is a brute-force attempt to check whether adding the simple no_timer_check kernel parameter solves the test flakiness. [EDIT]
Adds several improvements:
* Check VM pid on 'EventuallyConnects', so it does not wait for a command to succeed if the underlying VM crashed and is not running.
* Does not use the '-daemonize' flag of qemu; now it simply runs in the background and stdout and stderr are redirected to the vmstdout file.
* Does not install QEMU; the runner already has a recent QEMU version installed. This saves several minutes on each macOS job.
* Fixes some of the stability issues on macOS by disabling hugepages on the kernel. This is not supported on macOS.
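The first two items can be illustrated with a small shell sketch: start a process in the background with its output redirected to a file, then retry a check while giving up early if that process has died. This is not the project's implementation (the real check lives in the Go test code); the function names start_bg and wait_or_abort and their arguments are assumptions made for this example.

```shell
#!/usr/bin/env bash
# start_bg: run a command in the background, send stdout and stderr to a
# log file, and record the pid in a pid file.
# $1: log file, $2: pid file, remaining arguments: the command to run.
start_bg() {
  local log_file="$1"
  local pid_file="$2"
  shift 2
  "$@" >"${log_file}" 2>&1 &
  echo "$!" >"${pid_file}"
}

# wait_or_abort: retry a check command until it succeeds.
# $1: pid to watch, $2: number of attempts, remaining arguments: the check.
# Returns 0 when the check succeeds, 1 when attempts run out, and 2 as soon
# as the watched process is no longer running.
wait_or_abort() {
  local watched_pid="$1"
  local attempts="$2"
  shift 2
  while [ "${attempts}" -gt 0 ]; do
    kill -0 "${watched_pid}" 2>/dev/null || return 2
    "$@" && return 0
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}
```

With a crashed VM, the loop returns on the next attempt instead of waiting for every remaining retry to time out, which is the behavior the first item describes.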