-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support SEV_SNP instance type configuration #1410
base: main
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: bgartzi The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Welcome @bgartzi! |
Hi @bgartzi. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
✅ Deploy Preview for kubernetes-sigs-cluster-api-gcp ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
/ok-to-test |
8269aa6
to
5cbf79b
Compare
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass on the API design, left some comments
api/v1beta1/gcpmachine_types.go
Outdated
// ConfidentialVMTechSEV sets AMD SEV as the VM instance's confidential computing technology of choice. | ||
ConfidentialVMTechSEV ConfidentialVMTechnology = "sev" | ||
// ConfidentialVMTechSEVSNP sets AMD SEV-SNP as the VM instance's confidential computing technology of choice. | ||
ConfidentialVMTechSEVSNP ConfidentialVMTechnology = "sev-snp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to use the fully typed out version of this, so
ConfidentialVMTechnologySEV
and ConfidentialVMTechnologySEVSNP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adressed in the last pushed version.
api/v1beta1/gcpmachine_types.go
Outdated
@@ -339,10 +352,18 @@ type GCPMachineSpec struct { | |||
// ConfidentialCompute Defines whether the instance should have confidential compute enabled. | |||
// If enabled OnHostMaintenance is required to be set to "Terminate". | |||
// If omitted, the platform chooses a default, which is subject to change over time, currently that default is false. | |||
// If ConfidentialInstanceType is configured, even if ConfidentialCompute is Disabled, a confidential compute instance will be configured. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this condition, and considering what we stated in the API above, I'm not sure this is the best approach we could take.
Looking deeper at what we are trying to set here, it seems like we trying to build up the underlying GCP
compute.ConfidentialInstanceConfig
field in the compute/instance GCP SDK API.
Which is defined here: https://pkg.go.dev/google.golang.org/api/compute/v1#ConfidentialInstanceConfig
So we are trying to set these two fields:
type ConfidentialInstanceConfig struct {
// ConfidentialInstanceType: Defines the type of technology used by the
// confidential instance.
//
// Possible values:
// "CONFIDENTIAL_INSTANCE_TYPE_UNSPECIFIED" - No type specified. Do not use
// this value.
// "SEV" - AMD Secure Encrypted Virtualization.
// "SEV_SNP" - AMD Secure Encrypted Virtualization - Secure Nested Paging.
// "TDX" - Intel Trust Domain eXtension.
ConfidentialInstanceType [string](https://pkg.go.dev/builtin#string) `json:"confidentialInstanceType,omitempty"`
// EnableConfidentialCompute: Defines whether the instance should have
// confidential compute enabled.
EnableConfidentialCompute [bool](https://pkg.go.dev/builtin#bool) `json:"enableConfidentialCompute,omitempty"`
[...]
}
and since I don't see any mention on the ConfidentialInstanceType
taking precedence over EnableConfidentialCompute
, I assume the latter takes the overall precedence and flicks the switch on whether or not the whole feature is enabled or not.
As such I think we should do the same.
- ConfidentialCompute = enabled, ConfidentialInstanceType not set => enable, default confidential instance type
- ConfidentialCompute = disabled, ConfidentialInstanceType not set => disable
- ConfidentialCompute = disabled, ConfidentialInstanceType set, fail webhooks validation (we could also have a discriminated union here I think, but it might break the existing API, so we could leverage the WH)
- ConfidentialCompute = unknown, ConfidentialInstanceType not set => disable
- ConfidentialCompute = unknown, ConfidentialInstanceType set => fail webhooks validation (as the default for unknown is disabled).
cc. @JoelSpeed for API expertise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Dam, if we already have an api that controls this enablement, having a second field that implicitly enables the feature is not good. It should be a validation error for the instance type to exist when confidential compute is disabled, as mentioned by Dam
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with both of you. That would make more sense than the implementation I provided in this draft.
However, even if not documented explicitly that way, that's how the compute api behaves. I know after trying. I know it's a weak excuse, but I thought there might have been some reason for that, as it makes the former EnableConfidentialComputing
somewhat redundant.
That's why I decided to propose this as the first draft; an API that behaved as the compute api did, and decide if the true-to-the-backend or behavior we all would expect was the correct one.
Now, if we all agree on the proposed implementation instead, I will provide the changes, I hope that soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @bgartzi thanks for providing details on how the GCP compute API behaves.
I still think that our API should behave the way we all expect it to in this case, even if the GCP API doesn't. Our won't be surprising as users will be prompted by webhook failures that inform them why it didn't go through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just updated the API design and the webhook implementation (+tests) according to the design agreed in the comments above.
// confidentialInstanceType determines the required type of confidential computing technology. | ||
// confidentialInstanceType will precede confidentialCompute. That is, if confidentialCompute is "Disabled" but a valid confidentialInstanceType is specified, a confidential instance will be configured. | ||
// If confidentialInstanceType isn't set and confidentialCompute is "Enabled" the platform will set the default, which is subject to change over time. Currently the default is "sev" for "c2d", "c3d", and "n2d" machineTypes. For the other machine cases, a valid confidentialInstanceType must be specified. | ||
// +kubebuilder:validation:Enum=sev;sev-snp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kube conventions state that enum values should be PascalCase. What does sev and sev-snp actually mean? These aren't particularly expressive and might be hard for a consumer of this API to understand which value they may want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes, I also agree with you on this: those names aren't expressive at all. They are acronyms that stand for AMD's "Secure Encrypted Virtualization" [0] and "Secure Encrypted Virtualization - Secure Nested Paging" [1].
However, using the short acronyms instead seem to be a standard. One example that comes in handy is the compute API documentation and supported values for the confidential instance type field [2] (as referenced in the comment above). Even after reading the GCloud documentation[3], users might expect to find SEV/SEVSNP? I'm not personally sure.
Checking on how acronyms are written in pascalcase, would Sev
/SevSnp
+ an expansion on the documentation work for you? Or would you rather go for the whole SecureEncryptedVirtualization
/Secure EncryptedVirtualizationSecureNestedPaging
combo?
[0] https://www.amd.com/es/developer/sev.html
[1] https://www.amd.com/content/dam/amd/en/documents/epyc-business-docs/white-papers/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
[2] https://pkg.go.dev/google.golang.org/api/compute/v1#ConfidentialInstanceConfig
[3] https://cloud.google.com/confidential-computing/confidential-vm/docs/confidential-vm-overview
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about renaming the field secureInstanceEncryptionType
and having values of Virtualization
and VirtualizationWithNestedPaging
? They would be fairly descriptive and flow right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding secureInstanceEncryptionType
, I think it would be harder from users coming from the GCloud documentation to find the proper field name, and vice versa.
In terms of the supported values, the problem I see on mentioning virtualization is that most of the proposed confidential computing technologies are based on virtualization (probably all except for intel's sgx?). As these are AMD's solutions, intel has their own (TDX) and other architectures as ARM (CCA), also do (although it hasn't made it to GCP yet). Yes, Virtualization
is only referred from SEV's name currently. But I think calling it Virtualization
could also bring some confusion, as it could refer to AMD's SEV as to any other confidential computing solution.
Although cryptic, these names are basically trademarks, so I think it's way to make sure they will exclude each other. I also have to admit I'm for sure biased as those are the ways I see these technologies are referenced in documentation/news and that I don't have many new ideas to propose at this moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking about this a little bit more, what do you think about keeping confidentialInstanceType
and admitting the following values:
EncryptedVirtualizationSev
EncryptedNestedPagingSevsnp
And possibly upcoming ones:
TrustedDomainTdx
A bit too repetitive perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't addressed this in the last force-pushed version of the patch. I will be happy to do so when we reach an agreement.
api/v1beta1/gcpmachine_webhook.go
Outdated
@@ -108,15 +108,32 @@ func (m *GCPMachine) Default() { | |||
clusterlog.Info("default", "name", m.Name) | |||
} | |||
|
|||
func targetConfidentialType(tech *ConfidentialVMTechnology) (ConfidentialVMTechnology, error) { | |||
if tech == nil || tech != nil && *tech == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bgartzi for adding SEV_SNP support.
I think this can be shorten to: if tech == nil || *tech == "" {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @uril! Well noted. I will apply this on the next proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't address it (explicitly) as the API design change made update this piece of code. However, I took it into account in the new conditions I had to write, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@damdo, @JoelSpeed, @uril thanks a lot for the quick review! I will try to update the PR as soon as I confirm some of the questions.
api/v1beta1/gcpmachine_types.go
Outdated
@@ -339,10 +352,18 @@ type GCPMachineSpec struct { | |||
// ConfidentialCompute Defines whether the instance should have confidential compute enabled. | |||
// If enabled OnHostMaintenance is required to be set to "Terminate". | |||
// If omitted, the platform chooses a default, which is subject to change over time, currently that default is false. | |||
// If ConfidentialInstanceType is configured, even if ConfidentialCompute is Disabled, a confidential compute instance will be configured. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with both of you. That would make more sense than the implementation I provided in this draft.
However, even if not documented explicitly that way, that's how the compute api behaves. I know after trying. I know it's a weak excuse, but I thought there might have been some reason for that, as it makes the former EnableConfidentialComputing
somewhat redundant.
That's why I decided to propose this as the first draft; an API that behaved as the compute api did, and decide if the true-to-the-backend or behavior we all would expect was the correct one.
Now, if we all agree on the proposed implementation instead, I will provide the changes, I hope that soon.
// confidentialInstanceType determines the required type of confidential computing technology. | ||
// confidentialInstanceType will precede confidentialCompute. That is, if confidentialCompute is "Disabled" but a valid confidentialInstanceType is specified, a confidential instance will be configured. | ||
// If confidentialInstanceType isn't set and confidentialCompute is "Enabled" the platform will set the default, which is subject to change over time. Currently the default is "sev" for "c2d", "c3d", and "n2d" machineTypes. For the other machine cases, a valid confidentialInstanceType must be specified. | ||
// +kubebuilder:validation:Enum=sev;sev-snp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes, I also agree with you on this: those names aren't expressive at all. They are acronyms that stand for AMD's "Secure Encrypted Virtualization" [0] and "Secure Encrypted Virtualization - Secure Nested Paging" [1].
However, using the short acronyms instead seem to be a standard. One example that comes in handy is the compute API documentation and supported values for the confidential instance type field [2] (as referenced in the comment above). Even after reading the GCloud documentation[3], users might expect to find SEV/SEVSNP? I'm not personally sure.
Checking on how acronyms are written in pascalcase, would Sev
/SevSnp
+ an expansion on the documentation work for you? Or would you rather go for the whole SecureEncryptedVirtualization
/Secure EncryptedVirtualizationSecureNestedPaging
combo?
[0] https://www.amd.com/es/developer/sev.html
[1] https://www.amd.com/content/dam/amd/en/documents/epyc-business-docs/white-papers/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
[2] https://pkg.go.dev/google.golang.org/api/compute/v1#ConfidentialInstanceConfig
[3] https://cloud.google.com/confidential-computing/confidential-vm/docs/confidential-vm-overview
api/v1beta1/gcpmachine_types.go
Outdated
// ConfidentialVMTechSEV sets AMD SEV as the VM instance's confidential computing technology of choice. | ||
ConfidentialVMTechSEV ConfidentialVMTechnology = "sev" | ||
// ConfidentialVMTechSEVSNP sets AMD SEV-SNP as the VM instance's confidential computing technology of choice. | ||
ConfidentialVMTechSEVSNP ConfidentialVMTechnology = "sev-snp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted.
api/v1beta1/gcpmachine_webhook.go
Outdated
@@ -108,15 +108,32 @@ func (m *GCPMachine) Default() { | |||
clusterlog.Info("default", "name", m.Name) | |||
} | |||
|
|||
func targetConfidentialType(tech *ConfidentialVMTechnology) (ConfidentialVMTechnology, error) { | |||
if tech == nil || tech != nil && *tech == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @uril! Well noted. I will apply this on the next proposal.
Only SEV machines could be configured by using the former confidentialCompute Enabled/Disabled. GCP allows now to also configure the confidential instance type as well by using the appropriate parameter, see [0]. This commit introduces confidentialInstanceType, which lets users choose between sev or sev-snp as their confidential computing technology of choice. The reason confidentialInstanceType will preceed confidentialCompute is to mimic GCP's behavior, and ensuring backwards compatibility. Meanwhile, add c3d as a machine that supports AMD SEV. [0] https://cloud.google.com/confidential-computing/confidential-vm/docs/create-a-confidential-vm-instance#rest
/kind feature
What this PR does / why we need it:
Only SEV machines could be configured by using the former
confidentialCompute Enabled/Disabled. GCP allows now to also configure
the confidential instance type as well by using the appropriate
parameter, see [0].
This commit introduces confidentialInstanceType, which lets users choose
between sev or sev-snp as their confidential computing technology of
choice.
Meanwhile, add c3d as a machine that supports AMD SEV.
[0] https://cloud.google.com/confidential-computing/confidential-vm/docs/create-a-confidential-vm-instance#rest
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #1402
Special notes for your reviewer:
confidentialInstanceType
overridesconfidentialCompute
. This was due to these reasons:Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
TODOs:
Release note: