Kernel Version Testing Framework improvements #1224

Open
4 of 14 tasks
FedeDP opened this issue Jul 24, 2023 · 19 comments
Labels
kind/feature New feature or request
Milestone

Comments

@FedeDP
Contributor

FedeDP commented Jul 24, 2023

See #1191

These are some improvements that need to land for the kernel version testing framework.

Needed before "v1":

v2 stuff

  • Switch to the drivers_test executable instead of scap-open, to also verify that the drivers behave correctly
  • Terraform for nodes deployment
  • Cache the ignite root somehow (i.e. only rebuild the ignite root used for the VMs when changes to the dockerfiles are made); this would greatly reduce test duration
  • attach the matrixes markdown to the github release page for new driver releases (new(ci): add a release-body CI for drivers releases. #1238)
  • upstream our ignite patch from https://github.com/therealbobo/ignite (the upstream project is archived)
  • avoid using any weaveworks docker images as weaveworks is shutting down:
    • weaveworks/ignite-kernel:5.14.16
    • weaveworks/ubuntu-kernel:5.14.16

Future ideas

  • Automatically fetch needed info (kernel images, modules and so on) from kernel-crawler
  • Automatically build the input test matrix (i.e. the list of images to be tested) from the weekly kernel-crawler output (for example, add one image per crawled distro each week, enlarging our input test matrix)
  • make ignite concurrent (right now it does not support concurrent runs at all, preventing us from adding kernel tests to the PR CI)
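
The weekly matrix-growth idea above could be sketched roughly as follows. This is only an illustration: the shape of the crawler data (a mapping from distro name to a list of kernel release strings) and the `extend_matrix` helper are assumptions, not the real kernel-crawler output format.

```python
def extend_matrix(matrix, crawled):
    """Return a new matrix with at most one new kernel added per distro.

    matrix:  dict mapping distro -> list of kernel releases already tested
    crawled: dict mapping distro -> list of kernel releases seen this week
    Both shapes are hypothetical; the real kernel-crawler output differs.
    """
    # Copy the lists so the caller's matrix is left untouched.
    updated = {distro: list(kernels) for distro, kernels in matrix.items()}
    for distro, releases in sorted(crawled.items()):
        tested = updated.setdefault(distro, [])
        # Pick the first crawled release that is not in the matrix yet.
        for release in releases:
            if release not in tested:
                tested.append(release)
                break
    return updated


if __name__ == "__main__":
    matrix = {"ubuntu": ["5.15.0-70-generic"]}
    crawled = {
        "ubuntu": ["5.15.0-70-generic", "5.15.0-71-generic", "5.15.0-72-generic"],
        "fedora": ["6.2.9-300.fc38.x86_64"],
    }
    print(extend_matrix(matrix, crawled))
```

Running it once per week would grow each distro's entry by one kernel at most, which keeps the matrix growth predictable.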
@FedeDP FedeDP added the kind/feature New feature or request label Jul 24, 2023
@FedeDP
Contributor Author

FedeDP commented Jul 27, 2023

Copy all images used by the matrix (ie: https://github.com/falcosecurity/kernel-testing/blob/main/ansible-playbooks/group_vars/all/vars.yml#L18) under the falcosecurity dockerhub repo

The coolest thing we could do is to add a CI job to the kernel-testing repo that automatically pushes images to ghcr, if needed, after a new release.
Right now this is a bit hard, because we have no access to the arm64 node used for kernel-testing (its self-hosted runner is linked to the libs repo), so we cannot build and push arm64 images natively.
And pushing 6 "big" images using QEMU is going to take hours and hours.

@incertum
Contributor

Ideas for v3:

  • What is the response to a failed test? Since the CI tests use the optimal compiler version, a failure means the distributed artifact is not working. Do we try a different compiler version (likely more relevant for the bpf drivers)? Something else?
  • More for us developers and maintainers: the localhost VM tests focus more on testing different compiler versions, in addition to looping through a few kernels. Historically, this has been valuable for spotting possible regressions, in particular in the bpf drivers. It's related to the suggestion above.

@alacuku
Member

alacuku commented Aug 1, 2023

Related to the CI that pushes the images, it would be nice to cache those images on the runner for both docker and ignite. That would speed up the testing process.

@FedeDP
Contributor Author

FedeDP commented Aug 1, 2023

I think that it would actually just work ™️ if we use the same nodes to push images and run the tests, right?

@alacuku
Member

alacuku commented Aug 1, 2023

For the docker images, the answer is yes, but we need to remove the ones cached by ignite and import the new ones.

@FedeDP
Contributor Author

FedeDP commented Aug 1, 2023

First drivers release with matrixes attached: https://github.com/falcosecurity/libs/releases/tag/5.1.0%2Bdriver

@Andreagit97 Andreagit97 added this to the TBD milestone Sep 4, 2023
@FedeDP
Contributor Author

FedeDP commented Sep 6, 2023

Since ignite has been archived, we:

@poiana
Contributor

poiana commented Dec 5, 2023

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@FedeDP
Contributor Author

FedeDP commented Dec 5, 2023

/remove-lifecycle stale

@FedeDP
Contributor Author

FedeDP commented Feb 7, 2024

So, falcosecurity/kernel-testing#70 and falcosecurity/kernel-testing#74 were merged and we now have:

I am currently:

Then, we will need to either fork ignite and improve it to suit our needs, switch to flintlock, or find something else. Moreover, we also rely on weaveworks/ignite-kernel:5.14.16 as the kernel image for builders; given that weaveworks is shutting down (https://news.ycombinator.com/item?id=39262650), we should probably either copy those images under falcosecurity or just use one of our own kernel images.

@FedeDP
Contributor Author

FedeDP commented Feb 8, 2024

Cache the ignite root somehow (i.e. only rebuild the ignite root used for the VMs when changes to the dockerfiles are made); this would greatly reduce test duration

Idea would be to let the kernel-testing repo access the cncf nodes, then:

  • the images would be built on the cncf nodes
  • main and release CI would avoid setting CLEANUP env, so that main and $tag images are already cached on the nodes
  • moreover, we could introduce a new playbook that creates the ignite roots plus one that cleans them, and call them in the release CI, like: ansible-playbook cleanup-roots.yml && ansible-playbook generate-roots.yml. We first clean up the existing roots, then generate the new ones. After this, main.yml should avoid deleting/generating the roots each time.
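
One possible way to decide when the roots actually need regenerating is to fingerprint the dockerfiles and compare the result against a fingerprint stored on the node from the previous run. The following is a minimal sketch under stated assumptions: the dockerfiles live under a single directory, the stamp file is kept outside that directory, and `dockerfiles_digest`, `roots_are_stale` and `playbooks_to_run` are hypothetical helpers. Only the playbook names come from the proposal above.

```python
import hashlib
from pathlib import Path


def dockerfiles_digest(root):
    """Hash the relative path and content of every file under `root`."""
    digest = hashlib.sha256()
    base = Path(root)
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        digest.update(path.relative_to(base).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def roots_are_stale(root, stamp_file):
    """True when the stored digest is missing or differs from the current one."""
    stamp = Path(stamp_file)
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != dockerfiles_digest(root)


def playbooks_to_run(root, stamp_file):
    """Return the playbooks a CI run would execute, in order (hypothetical)."""
    if roots_are_stale(root, stamp_file):
        return ["cleanup-roots.yml", "generate-roots.yml", "main.yml"]
    return ["main.yml"]
```

After a successful regeneration, the CI would write `dockerfiles_digest(root)` into the stamp file, so later runs skip straight to main.yml until a dockerfile changes.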

@poiana
Contributor

poiana commented May 8, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@FedeDP
Contributor Author

FedeDP commented May 8, 2024

/remove-lifecycle stale

@FedeDP
Contributor Author

FedeDP commented May 8, 2024

For caching, we could try to leverage actions/cache somehow; the cache limit for GitHub Actions is 10GB, which should possibly be enough: https://github.com/actions/cache?tab=readme-ov-file#cache-limits
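
Whether 10GB is enough could be checked up front by measuring the directories we would want to cache. A rough sketch, assuming the limit is counted in GiB and that the on-disk size is a fair (likely pessimistic, since it ignores any compression) estimate; `tree_size` and `fits_in_cache` are hypothetical helpers, not part of actions/cache:

```python
from pathlib import Path

# 10GB as mentioned above; treating it as GiB is an assumption.
CACHE_LIMIT_BYTES = 10 * 1024**3


def tree_size(root):
    """Total size in bytes of all regular files under `root`."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def fits_in_cache(roots, limit=CACHE_LIMIT_BYTES):
    """True when the combined size of `roots` stays within `limit`."""
    return sum(tree_size(r) for r in roots) <= limit
```

Calling `fits_in_cache` with the list of ignite root directories would give a quick yes/no before investing in the CI changes.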

@incertum
Contributor

incertum commented May 8, 2024

Just a quick additional note: @FedeDP I'll get back to trying to also integrate the vagrant test VM loop at the end of June, as we previously discussed, just FYI. I'll ping you to get access to the servers then.

@poiana
Contributor

poiana commented Aug 6, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@Andreagit97
Member

/remove-lifecycle stale

@poiana
Contributor

poiana commented Nov 4, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@FedeDP
Contributor Author

FedeDP commented Nov 4, 2024

/remove-lifecycle stale

5 participants