-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
design-proposal: Feature lifecycle #251
Conversation
Skipping CI for Draft Pull Request. |
/sig api |
@EdDev: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
## Implementation Phases | ||
N/A | ||
|
||
## Open Issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional possible open issues for discussion:
- Can Alpha features get introduced in z versions?
- CI & e2e restrictions:
- Run e2e tests of Alpha features in their own separate job on the latest k8s version, non gating.
- Monitor Alpha features e2e execution results and use the data as a gating (e.g. 95% success rate) to pass it to Beta stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you inline these questions? (and preferably answer them)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to hear some feedback and opinions on these points before in-lining the answers.
Actually, once we do have a partial agreement, I will remove it from here and add it in a more appropriate location above.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alpha in zstreams: No. zstreams are just for stability. Exceptions confirm the norm.
Blocking CI for Alpha: No. Alpha fetaures should not gate main. But it sounds like Beta features should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
This is the first opportunity to evaluate a new feature. | ||
The proposal needs to include motivation, goals, implementation details | ||
and phases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we could point to the design template we already have in this repo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few English suggestions and technical questions. Thank you very much for taking the time to codify what it means to add a feature to KubeVirt.
- As a Kubevirt cluster operator, I would like to experiment with | ||
an undergraduate feature. | ||
- As a Kubevirt cluster operator, I would like to evaluate an | ||
undergraduate feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the big difference between "evaluate" and "experiment"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Experiment: Testing something in a controlled env, usually something new, to see if it makes sense.
Evaluation: Trying something out in a closer to reality scenario, possibly with real users.
The first is Alpha, the seconds is Beta.
If you can suggest something clearer, it will be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to inline your text in the use cases. It makes them clearer and distinguishable.
- As a Kubevirt cluster operator, I would like to keep using a feature | ||
that got graduated after I used it during the evaluation period. | ||
- As a Kubevirt cluster operator, I would like to know that | ||
an undergraduate feature is planned to be discontinued. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an undergraduate feature is planned to be discontinued. | |
a beta feature is planned to be discontinued. |
Can we use the common "beta" term? I am familiar with "undergraduate" only in the context of academy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW beta is also inprecise, as an alpha feature could also be discontinued.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-graduate == undergraduate == Alpha& Beta
### Deprecation | ||
One reason for features to go through the Alpha and Beta stages, | ||
is the opportunity to examine its usefulness and adoption by users. | ||
Same goes with major releases that intentionally break backward |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the logic between the former sentence and this one. Do both have the same reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Edy is referring to the fact that in SEMVER a new major version permits API breakages i.e. moving from v1 to v2
If this is the case, then I'd reference to semver here specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, both try to explain that the stages we pass through are for checking if we really need it.
Both feature stages and the semver major version bumps allow this checking to take place. In the sense that if it is found unneeded, it can be dropped.
I tried to add a small fix to the second sentence and mentioned semver.
## Implementation Phases | ||
N/A | ||
|
||
## Open Issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you inline these questions? (and preferably answer them)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/hold
Just to make sure that all discussion converges.
## Summary | ||
|
||
Kubevirt requires a process with a clear policy on how features | ||
are introduced, evaluated and finally graduated or dropped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it include deprecation?
are introduced, evaluated, graduated or dropped, and deprecated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is referred to as the dropped
part.
maintainers. | ||
|
||
## Implementation Phases | ||
N/A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have two items here:
- Enhancement proposals to include the graduation stage Alpha/Beta/GA, similar to kep.yaml i.e. https://github.com/kubernetes/enhancements/pull/3957/files#diff-f3e0c3877ff3b24f0e8de858c08ffed555efdba17357275f04ac38c3cabf947d
- Add a checklist to the enhcamenet proposal template
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can add a required section in the template and describe the needed data required.
I am not sure what you meant regarding the checklist.
Makes also sense to reference from the proposal template to this document, assuring it is clear what is expected from contributors.
Another item can be to make this information visible to users, with focus on consuming alpha & beta features.
I am adding the ones I understand for now.
@EdDev gentle ping |
I had to finish other tasks, but now I'm focus on it. Looking to finish going over all the feedback and respond today. |
2150894
to
37c1a75
Compare
change: Answered comments. |
Done. |
3. Release as Alpha (experimental). | ||
4. Release as Beta (pre-release for evaluation). | ||
5. Release as GA (graduation). | ||
6. Discontinuation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it makes sense to me, but it's better be part of the text. I think that before discontinuation, there should be a stage of deprecation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the detailed review @dankenigsberg .
- As a KubeVirt cluster operator, I would like to evaluate an | ||
undergraduate feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
3. Release as Alpha (experimental). | ||
4. Release as Beta (pre-release for evaluation). | ||
5. Release as GA (graduation). | ||
6. Discontinuation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The details on how to discontinue a feature is describe in detail later in this document.
At this location, a more abstract view of the feature stages is given, I think that adding details about "how" is less useful.
…Gate as Beta from 1.2.0 With the feature lifecycle discussion [1] heading towards consensus this change retrospectively moves the CommonInstancetypesDeploymentGate to Beta from 1.2.0 with an eye on moving to GA and removal of the FG by 1.4.0. This is done retrospectively as the FG itself is now enabled by default by HCO with KubeVirt 1.2 and as such is already meeting the criteria of the beta phase defined by the feature lifecycle document. As this is purely documentation the impact of this change is zero and just helps us move the process forward in the coming 1.3 and 1.4 releases. [1] kubevirt/community#251 Signed-off-by: Lee Yarwood <[email protected]>
+1 Maybe somewhere we want to note: It does not have to be perfect, but better than before. @EdDev thanks for taking this journey. Please look at converging the opern conversations, and then it seems like there is much support for this proposal. I do think that we should look at some things async
|
|----------------|-----------------|--------|----------------------------| | ||
| Alpha | 1 to 2 releases | YES | Between **minor** releases | | ||
| Beta | 1 to 3 releases | YES | Between **minor** releases | | ||
| GA | - | NO | Between **major** releases | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some features may need to keep their feature gates. The default, however, would need to be set to the relevant value. If you look at k8s, there are many features that are already GA'ed but continue to have a feature gate.
This is needed for the cluster admin to control which functionality is provided in the cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, once K8S declares a feature as GA, the FG is hard-coded enabled, with no option to disable it.
The only reason the fields still exists, is to allow downgrades and keep the DB intact and not introduce in the future a new FG with the same name.
Ref: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#feature-stages
A General Availability (GA) feature is also referred to as a stable feature. It means:
- The feature is always enabled; you cannot disable it.
- The corresponding feature gate is no longer needed.
- Stable versions of features will appear in released software for many subsequent versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CPUManager can be disabled afaik - just one example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes... I missed the The feature is always enabled; you cannot disable it.
Not sure why some of the features can still be disabled that way.
I think the main issue for me is that, unlike k8s features, most of KubeVirt's features don't have a secondary config option to disable it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a blocker from my side but an area of concern. We should be able to provide a way for admins to disable a functionality. One of the well known requests is for the admin to disable all VFIO related functionality, which isn't a single feature per se.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the concerns raised here, but I feel we are in a deadlock.
The proposal is based on the need to stop queuing FGs indefinitely and with the understanding that the current observation of what a FG is in Kubevirt [1] is well known and understood.
What you are attempting to argue here is that the current FG definition needs to change, together with it API (that I honestly to not understand how you can change).
To me its sounds like another proposal.
It makes sense to defer things to a follow up discussion, but in this case, I do not understand how we can do it without breaking this whole thing.
The maintainers of the project have the power to use exceptions to keep FGs forever and redefining what it means in parallel. I may not agree with what you are trying to do, but at least when the FG is redefined, this document can be adjusted accordingly.
I will need help to see how this suggestion can be done and at the same time keep all the rest of this proposal relevant to the goals set.
[1] https://kubevirt.io/user-guide/operations/activating_feature_gates/#activating-feature-gates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given all of the above I'd suggest replacing NO with Optional and moving this discussion into a new document/PR specific to FGs in KubeVirt to unblock this.
@lyarwood @vladikr IIUIC then the friction arises from the question around todays FGs, is this correct?
Assumgin Yes to my question above, then a few notes:
- This proposal is about our future desired state.
- Existing FGs should be untouched by this proposal, but can be covered by follow up work. AKA we should not block on the current state
- FGs are about declaring a feature as stable and always available.
- NOT every feature requires a configuration. We can simply say that some feature (like LiveMigration) is turned on by default without configuration (enable/disable). Optionally we can say that we want to add configuration to enable/disable it, or configure some detail.
IOW
- Removing - or having a FG unrevokably enabled - means that we as KubeVirt consider this feature to be useable for all.
- If we think that the feature should not always be used, then can add a configuration.
Decision diagram wise it would look like this:
flowchart TD
Start --> QCFG{Requires configuration?}
QCFG -- Yes --> AddConfig[Add configuration]
QCFG -- No --> QDEFAULT
AddConfig --> QDEFAULT
QDEFAULT{Should be default?}
QDEFAULT -- Yes --> MakeDefault[Make default] --> QOPTOUT
QDEFAULT -- No --> QOPTIN
QOPTIN{Requires Opt-In?} -- Yes --> AddOptIn[Add Opt-In] --> END
QOPTOUT{Requires Opt-Out?} -- Yes --> AddOptOut[Add Opt-Out] --> END
QOPTIN & QOPTOUT -- No --> END
END
This should all be decided before GA. Configuration
and Opt-In
/Opt-Out
can however even be added post-GA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the concerns raised here, but I feel we are in a deadlock.
Yup apologies by attempting to break the deadlock here I've just made it worse. /o\
My intent with the Optional
suggestion was to defer making any concrete decisions on GA FGs until we also talk about reworking the current API but I see that's not helpful in moving this proposal forward.
Given the already included warning about FGs not replacing cluster configurables for features admins might need to configure and/or disable I'm personally fine keeping the requirement to drop new feature FGs as they graduate to GA for now.
@lyarwood @vladikr IIUIC then the friction arises from the question around todays FGs, is this correct?
The behaviour and IMHO abuse (using FGs to control feature configuration and not just visibility in the cluster) of the current implementation yes.
Assumgin Yes to my question above, then a few notes:
- This proposal is about our future desired state.
The future feature lifecycle given the current FG implementation yes.
- Existing FGs should be untouched by this proposal, but can be covered by follow up work. AKA we should not block on the current state
Yes that's my understanding.
- FGs are about declaring a feature as stable and always available.
They don't define a feature as stable or always available, quiet the opposite in fact given the current proposal.
- NOT every feature requires a configuration. We can simply say that some feature (like LiveMigration) is turned on by default without configuration (enable/disable). Optionally we can say that we want to add configuration to enable/disable it, or configure some detail.
IOW
- Removing - or having a FG unrevokably enabled - means that we as KubeVirt consider this feature to be useable for all.
- If we think that the feature should not always be used, then can add a configuration.
Yup agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't define a feature as stable or always available, quiet the opposite in fact given the current proposal.
Yes, bad wording on my side in that regards. You are right :)
Thus in general agremeent, that's good, thanks for the quick feedback.
Let's wait for Vlaidk, and others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up on @vladikr 's concerns.
The main concern (Vladik, kepe me honest) is that there is a technical redundancy to toggle a feature.
A FG is technically allowing an admin to enable or disable a feature.
A enable/disable config item is also allowing an admin to enable or disable a feature.
Thus the question is: If we have the FG, why should we introduce a config toogle for the same functionality?
On the technical side there is not much that I have to say, maybe besides the fact that there are possibly slightly different nuances between "not present" and "disabled".
However, semantically I'd take the compromise of this technical redundancy in favor of a clear separation between
a) signaling a feature is stable (enabled by default, no FG)
b) admin is permitted to disable/enable certain features using a configuration
Thus even given the concerns, I'm advocating to move forward and keep this separation between feature life-cycle and feature configuration.
given a very good reasoning and an agreement from 2/3 of the project | ||
maintainers (also known as "approvers"). | ||
|
||
## Implementation Phases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also need to align this effort with an actual release cycle as we are trying to do with the new and slightly overlapping enhancement process?
IOW saying that new features landing in 1.4 will need to adhere to this lifecycle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's revisit the timeline once we have agreement on the proposal.
After all to me 1.4 so far looks like a reasonable goal.
|----------------|-----------------|--------|----------------------------| | ||
| Alpha | 1 to 2 releases | YES | Between **minor** releases | | ||
| Beta | 1 to 3 releases | YES | Between **minor** releases | | ||
| GA | - | NO | Between **major** releases | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given all of the above I'd suggest replacing NO
with Optional
and moving this discussion into a new document/PR specific to FGs in KubeVirt to unblock this. Happy to help draft something in the coming weeks if folks agree.
@EdDev is the following flow correct: flowchart LR
Start --> Design
Design -- Accepted --> Implementation
Design -- Rejected --> END
Implementation --> QAlpha
QAlpha{Requires Alpha?}
QAlpha -- Yes --> Alpha
QAlpha -- No --> QBeta
QBeta{Requires Beta?}
QBeta -- Yes --> Beta
QBeta -- No --> GA
GA --> Maintenance
Maintenance --> QDeprecation
QDeprecation{Should be deprecated?}
QDeprecation -- Yes --> Deprecation --> Removal
Maintenance & Removal --> END
END
|
Nice. How about something like this: flowchart LR
Start --> Design
Design -- Accepted --> Implementation
Design -- Rejected --> END
Implementation --> QGA
QGA{Requires evaluation?}
QGA -- Yes --> QAlpha
QGA -- No --> GA
QAlpha{Requires Alpha?}
QAlpha -- Yes --> Alpha
QAlpha -- No --> Beta
Alpha -- Successful --> Beta
Alpha -- Aborted --> Deprecation
Beta -- Successful --> GA
Beta -- Aborted --> Deprecation
GA --> Maintenance
Maintenance -- Exception --> Deprecation
Deprecation --> Removal
Removal --> END
Maintenance --> Maintenance
END
|
Nice! Fixed some arrows and added new major api: flowchart LR
Start --> Design
Design -- Accepted --> Implementation
Design -- Rejected --> END
Implementation --> QGA
QGA{Requires evaluation?}
QGA -- Yes --> QAlpha
Beta -- Successful --> GA
QGA -- No --> GA
QAlpha{Requires Alpha?}
QAlpha -- Yes --> Alpha
Alpha -- Successful --> Beta
QAlpha -- No --> Beta
Alpha -- Aborted --> Deprecation
Beta -- Aborted --> Deprecation
GA --> Maintenance
Maintenance -- Exception --> Deprecation
Deprecation --> Removal
Removal --> END
Maintenance --> Maintenance
Maintenance -- New Major API --> Removal
END
|
Approvers, where do we stand? /lgtm |
Looks like no one else is objecting besides me, so let's proceed with this. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vladikr The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Aren't you an approver @fabiand ? |
I am. But I was looking for a consensus and thus seeked for somebody else
to approve
|
@fabiand , do you think we have enough agreement to take this in? (we have other members & approvers lgtm it already) |
I was only looking for another approver to chime in. Vladik did so, thus: /unhold |
Folks, this is dope! :) @EdDev now that you have became a writer (the doc is so detailed and long) - How do you feel about 3 things
IOW we should make sure that people start to learn about this. |
…Gate as Beta from 1.2.0 With the feature lifecycle discussion [1] heading towards consensus this change retrospectively moves the CommonInstancetypesDeploymentGate to Beta from 1.2.0 with an eye on moving to GA and removal of the FG by 1.4.0. This is done retrospectively as the FG itself is now enabled by default by HCO with KubeVirt 1.2 and as such is already meeting the criteria of the beta phase defined by the feature lifecycle document. As this is purely documentation the impact of this change is zero and just helps us move the process forward in the coming 1.3 and 1.4 releases. [1] kubevirt/community#251 Signed-off-by: Lee Yarwood <[email protected]>
Kubevirt requires a process with a clear policy on how features are introduced, evaluated and finally graduated or dropped.
This proposal defines the steps and policies to follow in order to manage a feature (lifecycle) in Kubevirt.
This work is attempting to formalize various discussions held at different occasions in the last few years.
Mostly triggered by the observation that we collect unused features and frag them for long periods with the need to maintain them in the codebase and documentation.
A previous draft version of this content has been started by @iholder1011.
It is a bit different in the format and focus, but it tried to solve the same challenge.
There has also been a fruitful discussion in the new sig-api, on several API stability topics2, some with strong relation to the topics discussed in this proposal.
We should continue the discussion and evolve the proposal in order to agree on a formal policy.
I hope to see the big picture agreed on first, letting the details follow up immediately after (not necessarily in this proposal).
The feature lifecycle can be summarized with the following flowchart:
Footnotes
https://github.com/kubevirt/community/commit/ced467b79c034e719b2fc177b4125125b3871b12 ↩
https://github.com/kubevirt/community/pull/221 ↩