bug: StatefulSets are not reevaluated after rescheduling #67
Context
Noe caches the Pods it has already processed to avoid reprocessing
them unless there are changes. We do this by storing the
NamespacedName of the Pod, which includes the namespace and the Pod
name.
For Deployments, rescheduling a Pod actually changes its name, because
the name suffix is generated. For StatefulSets, the name stays the same
no matter how many reschedules happen.
Our caching approach prevents us from reevaluating the placement of
StatefulSet Pods if they were previously scheduled, and we've seen
this allocate StatefulSet Pods to nodes whose architectures were not
supported by the Pod.
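
To make the failure mode concrete, here is a minimal sketch of a cache keyed by namespace and Pod name. It is not Noe's actual implementation: the `NamespacedName` struct is a stand-in for the real key type, `processedCache` and `seen` are hypothetical names, and change detection is left out for brevity.

```go
package main

// NamespacedName is a stand-in for the key described above:
// the namespace plus the Pod name.
type NamespacedName struct {
	Namespace string
	Name      string
}

// processedCache is a hypothetical set of Pods that were already processed.
type processedCache map[NamespacedName]struct{}

// seen reports whether the key is already cached and records it if not.
func (c processedCache) seen(key NamespacedName) bool {
	if _, ok := c[key]; ok {
		return true
	}
	c[key] = struct{}{}
	return false
}
```

A rescheduled Deployment Pod comes back under a new generated name, so its key misses the cache and the Pod is evaluated again. A rescheduled StatefulSet Pod such as `db-0` comes back under the same key, hits the cache, and is never reevaluated.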
Bugfix approach
The intention of the cache is to reduce load on registries and on Noe
itself. In our case, StatefulSets have so far been much less common than
Deployments, so we chose to bypass the cache for StatefulSets.
This is a stopgap measure to prevent misscheduling of current
workloads until we devise a better way to avoid reprocessing Pods
that also supports StatefulSets.
What we change
We add a skipCache(*v1.Pod) function that checks whether the Pod is
owned by a StatefulSet. If so, the caching mechanism is skipped for
this workload.
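
A rough sketch of what such a check could look like follows. To keep it free of dependencies, `Pod` and `OwnerReference` are simplified stand-ins for the Kubernetes `v1.Pod` and `metav1.OwnerReference` types, and the body is an assumption about the logic rather than the code in this PR.

```go
package main

// OwnerReference is a simplified stand-in for metav1.OwnerReference.
type OwnerReference struct {
	Kind string
	Name string
}

// Pod is a simplified stand-in for v1.Pod, reduced to the fields used here.
type Pod struct {
	Namespace       string
	Name            string
	OwnerReferences []OwnerReference
}

// skipCache reports whether the cache should be bypassed for this Pod.
// It returns true when any owner reference has kind "StatefulSet".
func skipCache(pod *Pod) bool {
	if pod == nil {
		return false
	}
	for _, ref := range pod.OwnerReferences {
		if ref.Kind == "StatefulSet" {
			return true
		}
	}
	return false
}
```

Deployment Pods are owned by a ReplicaSet rather than a StatefulSet, so under this check they keep using the cache as before.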