Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SNP-style virtual attestations, restoring code update tests #6770

Open
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

eddyashton
Copy link
Member

We previously had a vestigial virtual attestation reusing some of the terminology and fields of SGX attestations. This didn't provide any distinctions between nodes or apply checks during node joining, so wasn't usefully testing code upgrade flows.

This has been replaced with a new scheme based on SNP attestations. A virtual node now has a measurement (the sha256 of the enclave library, calculated by the host at startup) and a host data/security policy value (currently a single default string for security policy, with host data the sha256 of that as it is for SNP). This introduces many duplicated tables, and associated duplicated governance, because we don't want to risk collisions across platforms.

The beneficial outcome is that we can now test code update flows (ie - change the "permitted nodes" of a service at run-time, confirm that old nodes can no longer join) close to how they run on SNP. We can also test some of the effects of fiddling with these tables (eg - omitting security policies, setting invalid host data) outside of SNP, though there's the caveat that these are all touching separate governance actions and tables.

There's no endorsements for virtual attestations, to avoid creating/maintaining any fake hardware keys, but this means there are still join paths on SNP that virtual doesn't test. I've tried to avoid too many renames/refactors of existing fields, but the existing PAL is extremely porous and inconsistent, so some of the names/concepts are unclear (ie - "host_data" is an SNP concept, "security_policy" is what we/ACI put there, but the names aren't consistently split and the digesting/decoding is haphazard).

I'll add some comments describing the changes I remember, when it's not last-thing-on-a-Friday.

@eddyashton eddyashton requested a review from a team as a code owner January 17, 2025 16:34
tests/governance.py Show resolved Hide resolved
tests/infra/utils.py Show resolved Hide resolved
tests/code_update.py Outdated Show resolved Hide resolved
tests/code_update.py Outdated Show resolved Hide resolved
tests/infra/consortium.py Show resolved Hide resolved
Comment on lines +356 to +364
if new_host_data is not None:
old_host_data, old_security_policy = (
infra.utils.get_host_data_and_security_policy(args.enclave_platform)
)

if old_host_data != new_host_data:
network.consortium.remove_host_data(
primary, args.enclave_platform, old_host_data
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hopefully predicting what will be necessary if/when a version-transition includes a host-data update. Currently it doesn't, on either SNP or Virtual. But if it does, we will (probably?) want to cycle it like we cycle measurements (I hope?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come that's not used? It seems like code updates on Virtual should go through this exact path, as we code upgrade from one binary to the next.

src/node/gov/handlers/service_state.h Show resolved Hide resolved
@@ -1004,6 +1004,25 @@ const actions = new Map([
},
),
],
[
Copy link
Member

@achamayou achamayou Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eddyashton I think we ought to split those actions in their own virtual_node.js file (similar to test/test_actions.js), going under CCF/samples/constitutions/sandbox/ or something similarly distinct from default. default should remain a good default choice for a production service, where these actions admittedly don't actively hurt but are also quite unnecessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to say this would be awkward because we need these actions in most of our tests, where virtual attestations are now applied for all joins. But that's actually wrong - we only need the governance when we want to change the measurements, which should only be for code_update and lts_compat.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A footgun caveat while it's on my mind - the node that starts the network always adds its own attestation, so you could still easily accidentally permit virtual nodes if you (somehow) ran a virtual node as the first in the network.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but this along with the constitution, membership etc is part of what the consortium must review before signing off the transition to open.

If we're worried about accidents, we could decide that running a Start node in virtual mode requires passing --unsafe on the CLI, and will otherwise refuse to start.

// Hard-coded here and repeated in the relevant tests. Can be made dynamic
// (eg - from an env var or file) when the tests are able to run SNP nodes
// with distinct policies
char const* policy = std::getenv("CCF_VIRTUAL_SECURITY_POLICY");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we take a digest of the .so instead, and eventually the binary itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The digest of the .so is used as the measurement, so it is determined at build-time. The policy is (I think?) a run-time decision, to be captured in the host_data field? In practice we're using the default security policy (from the next line) most of the time, and only bother setting this env var for a single "what happens if we have a bad policy" test.

To be clear, virtual nodes will:

  • populate the host_data table on startup (with the digest of this "policy")
  • check that the host_data of joiners is in that table, and refuse them if it is not
  • have governance to control contents of this table

So we're mimicking what the SNP attestation does, where we do an equality check for host_data values, which are calculated as the digest of some policy, but policy is not processed/applied within CCF (in fact in SNP, we allow the policy itself to be omitted/stubbed out, and only manage the host_data fields directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The digest of the .so is used as the measurement

Ok, I think that's not right, I think it should be host_data. A code upgrade in SNP is a host_data change, not a measurement change. measurement should be a fingerprint of the image, perhaps a digest of uname -a or some such.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants