Support SEV_SNP instance type configuration #1410

Open
wants to merge 1 commit into
base: main

bgartzi

@bgartzi bgartzi commented Jan 23, 2025

/kind feature

What this PR does / why we need it:
Previously, only SEV machines could be configured, using the existing
confidentialCompute Enabled/Disabled setting. GCP now also allows
configuring the confidential instance type through a dedicated
parameter, see [0].

This commit introduces confidentialInstanceType, which lets users choose
between sev or sev-snp as their confidential computing technology of
choice.

Meanwhile, add c3d as a machine that supports AMD SEV.

[0] https://cloud.google.com/confidential-computing/confidential-vm/docs/create-a-confidential-vm-instance#rest

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1402

Special notes for your reviewer:
confidentialInstanceType overrides confidentialCompute, for these reasons:

  • Backwards compatibility.
  • Imitating the gcp compute API behavior.

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Add confidentialInstanceType to the API

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Jan 23, 2025

linux-foundation-easycla bot commented Jan 23, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: bgartzi / name: Beñat Gartzia (ff8cf8e)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Jan 23, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bgartzi
Once this PR has been reviewed and has the lgtm label, please assign chrischdi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Welcome @bgartzi!

It looks like this is your first PR to kubernetes-sigs/cluster-api-provider-gcp 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-provider-gcp has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot requested review from damdo and dims January 23, 2025 09:24
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 23, 2025
@k8s-ci-robot
Contributor

Hi @bgartzi. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 23, 2025

netlify bot commented Jan 23, 2025

Deploy Preview for kubernetes-sigs-cluster-api-gcp ready!

Name Link
🔨 Latest commit ff8cf8e
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-gcp/deploys/679ba30bfa9b0700084c6dd0
😎 Deploy Preview https://deploy-preview-1410--kubernetes-sigs-cluster-api-gcp.netlify.app

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jan 23, 2025
@damdo
Member

damdo commented Jan 23, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 23, 2025
@bgartzi bgartzi force-pushed the gcp-sev_snp branch 2 times, most recently from 8269aa6 to 5cbf79b Compare January 23, 2025 15:46
@bgartzi
Author

bgartzi commented Jan 23, 2025

/retest

Member

@damdo damdo left a comment

First pass on the API design, left some comments

Comment on lines 144 to 147
// ConfidentialVMTechSEV sets AMD SEV as the VM instance's confidential computing technology of choice.
ConfidentialVMTechSEV ConfidentialVMTechnology = "sev"
// ConfidentialVMTechSEVSNP sets AMD SEV-SNP as the VM instance's confidential computing technology of choice.
ConfidentialVMTechSEVSNP ConfidentialVMTechnology = "sev-snp"
Member

I think it is better to use the fully typed out version of this, so
ConfidentialVMTechnologySEV and ConfidentialVMTechnologySEVSNP

Author

Noted.

Author

Addressed in the last pushed version.

@@ -339,10 +352,18 @@ type GCPMachineSpec struct {
// ConfidentialCompute Defines whether the instance should have confidential compute enabled.
// If enabled OnHostMaintenance is required to be set to "Terminate".
// If omitted, the platform chooses a default, which is subject to change over time, currently that default is false.
// If ConfidentialInstanceType is configured, even if ConfidentialCompute is Disabled, a confidential compute instance will be configured.
Member

Looking at this condition, and considering what we stated in the API above, I'm not sure this is the best approach we could take.

Looking deeper at what we are trying to set here, it seems like we are trying to build up the underlying GCP
compute.ConfidentialInstanceConfig field in the compute/instance GCP SDK API.

Which is defined here: https://pkg.go.dev/google.golang.org/api/compute/v1#ConfidentialInstanceConfig

So we are trying to set these two fields:

type ConfidentialInstanceConfig struct {
	// ConfidentialInstanceType: Defines the type of technology used by the
	// confidential instance.
	//
	// Possible values:
	//   "CONFIDENTIAL_INSTANCE_TYPE_UNSPECIFIED" - No type specified. Do not use
	// this value.
	//   "SEV" - AMD Secure Encrypted Virtualization.
	//   "SEV_SNP" - AMD Secure Encrypted Virtualization - Secure Nested Paging.
	//   "TDX" - Intel Trust Domain eXtension.
	ConfidentialInstanceType string `json:"confidentialInstanceType,omitempty"`
	// EnableConfidentialCompute: Defines whether the instance should have
	// confidential compute enabled.
	EnableConfidentialCompute bool `json:"enableConfidentialCompute,omitempty"`
	[...]
}

Since I don't see any mention of ConfidentialInstanceType taking precedence over EnableConfidentialCompute, I assume the latter takes overall precedence and controls whether the whole feature is enabled.

As such I think we should do the same.

  • ConfidentialCompute = enabled, ConfidentialInstanceType not set => enable, default confidential instance type
  • ConfidentialCompute = disabled, ConfidentialInstanceType not set => disable
  • ConfidentialCompute = disabled, ConfidentialInstanceType set => fail webhook validation (we could also have a discriminated union here I think, but it might break the existing API, so we could leverage the webhook)
  • ConfidentialCompute = unknown, ConfidentialInstanceType not set => disable
  • ConfidentialCompute = unknown, ConfidentialInstanceType set => fail webhooks validation (as the default for unknown is disabled).

cc. @JoelSpeed for API expertise.
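The combinations listed above can be sketched as a small validation helper. This is only an illustration of the proposed rules, not the webhook code from this PR: `validateConfidential`, `errTypeWithoutCompute`, and the policy constant names are assumptions introduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// These type names follow the ones discussed in this PR; the constant
// names and values below are assumptions made for this sketch.
type ConfidentialComputePolicy string
type ConfidentialVMTechnology string

const (
	ConfidentialComputePolicyEnabled  ConfidentialComputePolicy = "Enabled"
	ConfidentialComputePolicyDisabled ConfidentialComputePolicy = "Disabled"
)

// errTypeWithoutCompute is a hypothetical error for the rejected combinations.
var errTypeWithoutCompute = errors.New(
	"confidentialInstanceType requires confidentialCompute to be Enabled")

// validateConfidential is a hypothetical helper. It reports whether a
// confidential instance should be created, and returns an error when an
// instance type is set while confidential compute is disabled or unset.
func validateConfidential(compute *ConfidentialComputePolicy, instanceType *ConfidentialVMTechnology) (bool, error) {
	enabled := compute != nil && *compute == ConfidentialComputePolicyEnabled
	typeSet := instanceType != nil && *instanceType != ""
	if typeSet && !enabled {
		return false, errTypeWithoutCompute
	}
	return enabled, nil
}

func main() {
	enabled := ConfidentialComputePolicyEnabled
	sevSNP := ConfidentialVMTechnology("sev-snp")
	// Enabled plus a type is accepted; a type without Enabled is rejected.
	fmt.Println(validateConfidential(&enabled, &sevSNP))
	fmt.Println(validateConfidential(nil, &sevSNP))
}
```

Treating an unset ConfidentialCompute the same as Disabled matches the note above that the default for unknown is disabled.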

I agree with Dam, if we already have an api that controls this enablement, having a second field that implicitly enables the feature is not good. It should be a validation error for the instance type to exist when confidential compute is disabled, as mentioned by Dam

Author

Yes, I agree with both of you. That would make more sense than the implementation I provided in this draft.
However, even if it is not explicitly documented, that's how the compute API behaves; I confirmed it by trying. I know it's a weak excuse, but I thought there might be some reason for it, as it makes the former EnableConfidentialCompute somewhat redundant.
That's why I proposed this as the first draft: an API that behaves as the compute API does, so we could then decide whether the true-to-the-backend behavior or the behavior we would all expect is the right one.
If we all agree on the proposed implementation instead, I will provide the changes, hopefully soon.

Member

Hey @bgartzi thanks for providing details on how the GCP compute API behaves.
I still think that our API should behave the way we all expect it to in this case, even if the GCP API doesn't. Ours won't be surprising, as users will be prompted by webhook failures that explain why the request didn't go through.

Author

I just updated the API design and the webhook implementation (+tests) according to the design agreed in the comments above.

// confidentialInstanceType determines the required type of confidential computing technology.
// confidentialInstanceType will precede confidentialCompute. That is, if confidentialCompute is "Disabled" but a valid confidentialInstanceType is specified, a confidential instance will be configured.
// If confidentialInstanceType isn't set and confidentialCompute is "Enabled" the platform will set the default, which is subject to change over time. Currently the default is "sev" for "c2d", "c3d", and "n2d" machineTypes. For the other machine cases, a valid confidentialInstanceType must be specified.
// +kubebuilder:validation:Enum=sev;sev-snp;

Kube conventions state that enum values should be PascalCase. What do sev and sev-snp actually mean? These aren't particularly expressive, and a consumer of this API might find it hard to understand which value they want.

Author

And yes, I also agree with you on this: those names aren't expressive at all. They are acronyms that stand for AMD's "Secure Encrypted Virtualization" [0] and "Secure Encrypted Virtualization - Secure Nested Paging" [1].

However, using the short acronyms instead seems to be a standard. One example that comes in handy is the compute API documentation and the supported values for the confidential instance type field [2] (as referenced in the comment above). Even after reading the GCloud documentation [3], users might expect to find SEV/SEVSNP? I'm not personally sure.

Checking how acronyms are written in PascalCase, would Sev/SevSnp plus an expansion in the documentation work for you? Or would you rather go for the whole SecureEncryptedVirtualization/SecureEncryptedVirtualizationSecureNestedPaging combo?

[0] https://www.amd.com/es/developer/sev.html
[1] https://www.amd.com/content/dam/amd/en/documents/epyc-business-docs/white-papers/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
[2] https://pkg.go.dev/google.golang.org/api/compute/v1#ConfidentialInstanceConfig
[3] https://cloud.google.com/confidential-computing/confidential-vm/docs/confidential-vm-overview
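Whatever spelling the API settles on, the chosen value still has to be translated to the strings the compute API accepts ("SEV" and "SEV_SNP", as listed in [2]). A minimal sketch of that translation follows, assuming the lowercase values from the current draft and the fully typed-out constant names suggested earlier in this review; `computeInstanceType` is a hypothetical helper, not code from this PR.

```go
package main

import "fmt"

// ConfidentialVMTechnology and its constants follow the names discussed in
// this review; the string values are the ones from the current draft.
type ConfidentialVMTechnology string

const (
	ConfidentialVMTechnologySEV    ConfidentialVMTechnology = "sev"
	ConfidentialVMTechnologySEVSNP ConfidentialVMTechnology = "sev-snp"
)

// computeInstanceType is a hypothetical helper that maps the API value to
// the ConfidentialInstanceType string expected by the GCP compute API.
func computeInstanceType(tech ConfidentialVMTechnology) (string, error) {
	switch tech {
	case ConfidentialVMTechnologySEV:
		return "SEV", nil
	case ConfidentialVMTechnologySEVSNP:
		return "SEV_SNP", nil
	default:
		return "", fmt.Errorf("unsupported confidential instance type %q", tech)
	}
}

func main() {
	fmt.Println(computeInstanceType(ConfidentialVMTechnologySEVSNP))
	fmt.Println(computeInstanceType("tdx"))
}
```

With a mapping like this, renaming the user-facing enum values only changes the constants, not the values sent to GCP.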

What about renaming the field secureInstanceEncryptionType and having values of Virtualization and VirtualizationWithNestedPaging? They would be fairly descriptive and flow right?

Author

Regarding secureInstanceEncryptionType, I think it would be harder for users coming from the GCloud documentation to find the proper field name, and vice versa.
In terms of the supported values, the problem I see with mentioning virtualization is that most confidential computing technologies are based on virtualization (probably all except Intel's SGX?). These are AMD's solutions; Intel has its own (TDX), and so do other architectures such as ARM (CCA), although the latter hasn't made it to GCP yet. Yes, Virtualization currently appears only in SEV's name. But I think calling the value Virtualization could cause confusion, as it could refer to AMD's SEV as well as to any other confidential computing solution.
Although cryptic, these names are basically trademarks, so I think using them is a way to make sure the values exclude each other. I also have to admit that I'm surely biased, as this is how I see these technologies referenced in documentation and news, and that I don't have many new ideas to propose at the moment.

Author

After thinking about this a little bit more, what do you think about keeping confidentialInstanceType and admitting the following values:

  • EncryptedVirtualizationSev
  • EncryptedNestedPagingSevsnp

And possibly upcoming ones:

  • TrustedDomainTdx

A bit too repetitive perhaps?

Author

I haven't addressed this in the last force-pushed version of the patch. I will be happy to do so when we reach an agreement.

@@ -108,15 +108,32 @@ func (m *GCPMachine) Default() {
clusterlog.Info("default", "name", m.Name)
}

func targetConfidentialType(tech *ConfidentialVMTechnology) (ConfidentialVMTechnology, error) {
if tech == nil || tech != nil && *tech == "" {

Thanks @bgartzi for adding SEV_SNP support.

I think this can be shortened to: if tech == nil || *tech == "" {
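The two conditions are equivalent because `&&` binds tighter than `||` and `||` short-circuits: the right-hand side is only evaluated when tech is non-nil, so the extra `tech != nil` check is redundant. A small sketch that compares both forms; the function names `isUnsetVerbose` and `isUnset` are made up for illustration.

```go
package main

import "fmt"

type ConfidentialVMTechnology string

// isUnsetVerbose uses the condition from the diff under review.
func isUnsetVerbose(tech *ConfidentialVMTechnology) bool {
	return tech == nil || tech != nil && *tech == ""
}

// isUnset uses the shortened condition. The dereference is safe because
// || only evaluates its right-hand side when tech is not nil.
func isUnset(tech *ConfidentialVMTechnology) bool {
	return tech == nil || *tech == ""
}

func main() {
	empty := ConfidentialVMTechnology("")
	sev := ConfidentialVMTechnology("sev")
	// Compare both forms for a nil pointer, an empty value, and a set value.
	for _, tech := range []*ConfidentialVMTechnology{nil, &empty, &sev} {
		fmt.Println(isUnsetVerbose(tech) == isUnset(tech))
	}
}
```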

Author

Thanks @uril! Well noted. I will apply this on the next proposal.

Author

I couldn't address it explicitly, as the API design change made me rewrite this piece of code. However, I took it into account in the new conditions I had to write, thanks!

Author

@bgartzi bgartzi left a comment

@damdo, @JoelSpeed, @uril thanks a lot for the quick review! I will try to update the PR as soon as I confirm some of the questions.

Previously, only SEV machines could be configured, using the existing
confidentialCompute Enabled/Disabled setting. GCP now also allows
configuring the confidential instance type through a dedicated
parameter, see [0].

This commit introduces confidentialInstanceType, which lets users choose
between sev or sev-snp as their confidential computing technology of
choice.

confidentialInstanceType takes precedence over confidentialCompute in
order to mimic GCP's behavior and to ensure backwards compatibility.

Meanwhile, add c3d as a machine that supports AMD SEV.

[0] https://cloud.google.com/confidential-computing/confidential-vm/docs/create-a-confidential-vm-instance#rest
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support confidential computing instance type configuration
5 participants