Add KaaS robustness feature tests #714
There is a positive and a negative test case.
…axMutatingRequestInflight" and "minRequestTimeout".
For reference, here are the successful sonobuoy test logs: cat results/plugins/scs-kaas-conformance/sonobuoy_results.yaml | yq
[Displaying results...]
To make the tests pass on your K8s cluster, you need to apply the following configurations:
Location: Apply via kubectl
For reference, I used a self-configured kubeadm cluster to develop these tests.
Impressive! I'm not sure I am competent to review it, but I will give it a shot. About these preconditions, wouldn't it be good to put them into a 'Testing and implementation notes' supplement? This can happen within this same PR.
Impressive again! Just for increased safety, could you please also test on moin once we have the necessary permissions?
I talked with @tonifinger about including the configurations. We came to the same conclusion. Also, my guess is that there will be more configuration snippets from the other tested features in other PRs.
Sure, I can do that.
		t.Errorf("Required setting %s not found in API server configuration", setting)
	}
}
if !foundSettings["EventRateLimit"] {
Can this check also be carried out in the conditional statement in line 67? I think the error can already be logged there.
Yes, you are right, I changed that.
config, err := clientset.CoreV1().ConfigMaps(loc.namespace).Get(context.Background(), loc.name, metav1.GetOptions{})
if err == nil {
	if data, ok := config.Data[loc.key]; ok {
		if strings.Contains(data, "eventratelimit.admission.k8s.io") {
How is this different from the test in Test_scs_0215_requestLimits()?
Isn't this check already handled on line L67?
On second thought, if this does the same check, I think it would be better to handle it only here in Test_scs_0215_minRequestTimeout(), as this is the test function related to "EventRateLimit".
No, these are different tests. The first test checks if the EventRateLimit is enabled in the API server command line flags, while the second test specifically looks for EventRateLimit configuration in ConfigMaps. The second test is more thorough as it searches multiple locations for the actual configuration details.
The first test only verifies the admission plugin is enabled, while the second test verifies the configuration exists and is properly set up.
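The distinction drawn above can be sketched as two independent checks. This is a minimal illustration, not the PR's code: pluginEnabled and configPresent are hypothetical helper names, and the sample arguments and ConfigMap data in main are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// pluginEnabled reports whether the API server command line enables the
// given admission plugin via --enable-admission-plugins.
func pluginEnabled(args []string, plugin string) bool {
	const prefix = "--enable-admission-plugins="
	for _, arg := range args {
		if !strings.HasPrefix(arg, prefix) {
			continue
		}
		for _, p := range strings.Split(strings.TrimPrefix(arg, prefix), ",") {
			if strings.TrimSpace(p) == plugin {
				return true
			}
		}
	}
	return false
}

// configPresent reports whether any ConfigMap value carries an
// EventRateLimit configuration, recognised by its API group.
func configPresent(data map[string]string) bool {
	for _, v := range data {
		if strings.Contains(v, "eventratelimit.admission.k8s.io") {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical inputs: API server args and the Data field of a ConfigMap.
	args := []string{"kube-apiserver", "--enable-admission-plugins=NodeRestriction,EventRateLimit"}
	data := map[string]string{
		"config.yaml": "apiVersion: eventratelimit.admission.k8s.io/v1alpha1\nkind: Configuration\n",
	}
	fmt.Println("plugin enabled:", pluginEnabled(args, "EventRateLimit"))
	fmt.Println("config present:", configPresent(data))
}
```

A cluster can pass the first check and fail the second, which is why the two tests are kept separate.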
}

if isKindCluster(clientset) {
	t.Skip("Running on kind cluster - skipping APF test")
This must raise an error as well. Otherwise it will go unnoticed by scs-test-runner.py if someone runs the test suite against a kind cluster.
I thought about the "skip tests if kind cluster" topic again. I lean toward removing those skip statements. The tests should fail if the cluster cannot support the tested features; that is the whole purpose of the tests.
What do you think?
I agree. We should not allow any tests to be skipped.
Will change that.
}

if isKindCluster(clientset) {
	t.Skip("Running on kind cluster - skipping rate limit values test")
Same as above; this applies to all other t.Skip-related conditionals.
changed
			for k, v := range expectedValues {
				if !strings.Contains(config, fmt.Sprintf("%s: %s", k, v)) {
					allFound = false
					break
				}
			}
			if allFound {
				return
			}
		}
	}

	t.Error("Recommended rate limit values (qps: 5000, burst: 20000) not found")
In the standards, these values are described as RECOMMENDED and furthermore “SHOULD be adapted to the needs of the environment and the expected load”.
We should therefore not regard the values described in the standard as fixed values. Rather, we should check whether we meet them as minimum requirements.
See: ../scs-0215-v1-robustness-features.md#kube-api-rate-limiting-1
Yes, you are right. I overlooked that the values are only recommended. For that reason I am excluding the test for now. A test for this could be added in the future if needed. The overall check for the presence of event rate limits is still there.
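If such a test is added later, the reviewer's suggestion of treating the recommended values as minimums could look roughly like the sketch below. It is an assumption-laden illustration: extractInt and meetsMinimums are hypothetical helpers, the configuration is assumed to contain plain "key: integer" lines, and whether a lower bound is the right reading of the standard is still open.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// extractInt finds the first "<key>: <integer>" line in config.
// An optional leading "- " (YAML list item) is tolerated.
func extractInt(config, key string) (int, bool) {
	re := regexp.MustCompile(`(?m)^\s*(?:-\s*)?` + regexp.QuoteMeta(key) + `:\s*(\d+)\s*$`)
	m := re.FindStringSubmatch(config)
	if m == nil {
		return 0, false
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return n, true
}

// meetsMinimums returns one message per key that is missing or below its
// minimum. Keys are visited in sorted order so the result is deterministic.
func meetsMinimums(config string, minimums map[string]int) []string {
	keys := make([]string, 0, len(minimums))
	for k := range minimums {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var problems []string
	for _, k := range keys {
		got, ok := extractInt(config, k)
		switch {
		case !ok:
			problems = append(problems, fmt.Sprintf("%s: not found", k))
		case got < minimums[k]:
			problems = append(problems, fmt.Sprintf("%s: %d is below minimum %d", k, got, minimums[k]))
		}
	}
	return problems
}

func main() {
	// Hypothetical EventRateLimit configuration excerpt.
	config := "limits:\n- type: Server\n  qps: 6000\n  burst: 15000\n"
	minimums := map[string]int{"qps": 5000, "burst": 20000}
	for _, p := range meetsMinimums(config, minimums) {
		fmt.Println(p)
	}
}
```

Compared with the strings.Contains check on exact "key: value" pairs, this accepts any value at or above the recommendation.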
}

if isKindCluster(clientset) {
	t.Skip("Running on kind cluster - skipping etcd backup test")
Same as above.
changed
	LabelSelector: "component=etcd",
})
if err != nil || len(pods.Items) == 0 {
	t.Skip("No etcd pods found")
Same as above: this must throw an error as well. We currently don't consider anyone using something other than etcd for K8s.
If there is a need to use something other than etcd, the standard itself needs to be updated first.
changed
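One way to express the requested behaviour (fail instead of skip when no etcd pods are found) is to move the precondition into a helper that returns an error, which the test then passes to t.Fatal. The sketch below is hypothetical and not taken from the PR: requireEtcdPods is an invented name, and it takes pod names rather than the client-go pod list so that it runs without a cluster.

```go
package main

import (
	"errors"
	"fmt"
)

// requireEtcdPods turns the "no etcd pods" situation into an error.
// A failed list call and an empty result are reported separately so the
// test output shows which of the two happened.
func requireEtcdPods(podNames []string, listErr error) error {
	if listErr != nil {
		return fmt.Errorf("listing etcd pods: %w", listErr)
	}
	if len(podNames) == 0 {
		return errors.New("no etcd pods found (label selector component=etcd)")
	}
	return nil
}

func main() {
	// In a test this would read: if err := requireEtcdPods(...); err != nil { t.Fatal(err) }
	fmt.Println(requireEtcdPods(nil, nil))
	fmt.Println(requireEtcdPods([]string{"etcd-control-plane"}, nil))
	fmt.Println(requireEtcdPods(nil, errors.New("connection refused")))
}
```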
This PR adds tests for the K8s cluster robustness features defined in the SCS standard scs-0215-v1-robustness-features.
Here is a detailed listing of what is tested:
SCS-0215-v1 Robustness Features Test Coverage

1. API Server Rate Limiting
- Test_scs_0215_requestLimits
  - Verifies basic request limit configurations
  - Checks API server configuration for required settings
- Test_scs_0215_minRequestTimeout
  - Validates min-request-timeout setting
  - Checks configuration in API server args
- Test_scs_0215_eventRateLimit
  - Confirms EventRateLimit admission controller configuration
  - Verifies plugin is enabled in API server
- Test_scs_0215_apiPriorityAndFairness
  - Checks APF feature gate enablement
  - Validates API server configuration for priority and fairness
- Test_scs_0215_rateLimitValues
  - Verifies specific rate limit values
  - Checks recommended settings:
    - QPS: 5000
    - Burst: 20000
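As a rough illustration of the "checks API server configuration for required settings" items above, the sketch below scans a kube-apiserver argument list for required flags. The helper names are hypothetical, and the list of required flags in main is an assumption chosen for the example; the PR's actual list may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFlags turns ["--name=value", ...] into a name -> value map.
// Arguments that do not start with "--" are ignored.
func parseFlags(args []string) map[string]string {
	flags := make(map[string]string)
	for _, arg := range args {
		if !strings.HasPrefix(arg, "--") {
			continue
		}
		name, value, _ := strings.Cut(strings.TrimPrefix(arg, "--"), "=")
		flags[name] = value
	}
	return flags
}

// missingFlags returns the required flag names that are absent from args,
// in the order given by required.
func missingFlags(args, required []string) []string {
	flags := parseFlags(args)
	var missing []string
	for _, name := range required {
		if _, ok := flags[name]; !ok {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Hypothetical API server command line.
	args := []string{"kube-apiserver", "--max-requests-inflight=400", "--min-request-timeout=1800"}
	// Assumed set of required flags for this example.
	required := []string{"max-requests-inflight", "max-mutating-requests-inflight", "min-request-timeout"}
	fmt.Println(missingFlags(args, required)) // prints [max-mutating-requests-inflight]
}
```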

2. etcd Management
- Test_scs_0215_etcdCompaction
  - Validates compaction configuration:
    - Mode: periodic
    - Retention: 8h
- Test_scs_0215_etcdBackup
  - Verifies backup CronJobs setup
  - Checks backup configuration:
    - Hourly backups
    - Daily backups
    - Proper paths and schedules
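The compaction check listed above could be sketched as follows, assuming the settings are passed to etcd as the --auto-compaction-mode and --auto-compaction-retention command line flags. flagValue and checkCompaction are hypothetical names, and this sketch only accepts retention values written as Go duration strings such as 8h.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// flagValue returns the value of "--<name>=<value>" from args, if present.
func flagValue(args []string, name string) (string, bool) {
	prefix := "--" + name + "="
	for _, arg := range args {
		if strings.HasPrefix(arg, prefix) {
			return strings.TrimPrefix(arg, prefix), true
		}
	}
	return "", false
}

// checkCompaction verifies the compaction mode and retention in an etcd
// argument list. Retention values are compared as parsed durations, so
// "8h" and "480m" are treated as equal.
func checkCompaction(args []string, wantMode, wantRetention string) error {
	want, err := time.ParseDuration(wantRetention)
	if err != nil {
		return fmt.Errorf("invalid expected retention %q: %w", wantRetention, err)
	}
	mode, ok := flagValue(args, "auto-compaction-mode")
	if !ok {
		return fmt.Errorf("--auto-compaction-mode is not set")
	}
	if mode != wantMode {
		return fmt.Errorf("compaction mode is %q, want %q", mode, wantMode)
	}
	raw, ok := flagValue(args, "auto-compaction-retention")
	if !ok {
		return fmt.Errorf("--auto-compaction-retention is not set")
	}
	got, err := time.ParseDuration(raw)
	if err != nil {
		return fmt.Errorf("cannot parse retention %q: %w", raw, err)
	}
	if got != want {
		return fmt.Errorf("compaction retention is %s, want %s", got, want)
	}
	return nil
}

func main() {
	// Hypothetical etcd command line.
	args := []string{"etcd", "--auto-compaction-mode=periodic", "--auto-compaction-retention=8h"}
	fmt.Println(checkCompaction(args, "periodic", "8h"))
}
```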

3. Certificate Management
- Test_scs_0215_certificateRotation
  - Check_Certificate_Rotation_Configuration:
    - Verifies kubelet certificate rotation settings
    - Validates serverTLSBootstrap and rotateCertificates
  - Check_Certificate_Controller:
    - Confirms cert-manager deployment
    - Validates certificate controller functionality
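For the kubelet part of the certificate rotation check, a minimal sketch might decode the kubelet configuration and require both fields to be true. How the PR obtains the configuration is not shown here; this example simply assumes it is available as JSON, and checkRotation is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// kubeletConfig holds only the two KubeletConfiguration fields of interest.
type kubeletConfig struct {
	RotateCertificates bool `json:"rotateCertificates"`
	ServerTLSBootstrap bool `json:"serverTLSBootstrap"`
}

// checkRotation returns an error naming every rotation setting that is
// not enabled in the given JSON-encoded kubelet configuration.
func checkRotation(raw []byte) error {
	var cfg kubeletConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return fmt.Errorf("decoding kubelet configuration: %w", err)
	}
	var errs []error
	if !cfg.RotateCertificates {
		errs = append(errs, errors.New("rotateCertificates is not enabled"))
	}
	if !cfg.ServerTLSBootstrap {
		errs = append(errs, errors.New("serverTLSBootstrap is not enabled"))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	// Hypothetical configuration excerpt.
	raw := []byte(`{"rotateCertificates": true, "serverTLSBootstrap": false}`)
	fmt.Println(checkRotation(raw))
}
```

errors.Join requires Go 1.20 or newer; on older toolchains the messages could be collected into a single fmt.Errorf instead.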