Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Entropy check workflow in ResetDevice #4155

Merged
merged 6 commits into from
Jan 2, 2025
Merged

Conversation

andrewkozlik
Copy link
Contributor

This PR implements an automated workflow that allows the host to verify that when Trezor generates the seed, it correctly includes the external entropy from the host. The host performs a randomized test asking Trezor to generate several seeds, checking that they were generated correctly and using the last one as the final seed.

Documentation:

TODO:

  • Device tests.

Each iteration of the entropy check adds about 2 seconds to the ResetDevice workflow on Trezor T with SLIP-39. A lot of that time is taken up by the seed derivation, so we probably can't do much better.

@onvej-sl please review the security of the proposed workflow.
@matejcik please do a high-level review of the workflow, namely the protobuf messages and how they are handled. Let's discuss what's the best way to pass the staged seed to the get_public_key() handler. Currently I use the storage cache (APP_STAGED_MNEMONIC_SECRET).
Let's leave the detailed code review until after we are happy with the security and design of the workflow.

Sorry, something went wrong.

@andrewkozlik andrewkozlik added core Trezor Core firmware. Runs on Trezor Model T and T2B1. trezorlib Python library and the command line trezorctl tool. R&D Research and development team related labels Sep 5, 2024
@andrewkozlik andrewkozlik self-assigned this Sep 5, 2024
@andrewkozlik andrewkozlik force-pushed the andrewkozlik/display_random branch from 78fe225 to ddb3640 Compare September 5, 2024 13:38
Copy link

github-actions bot commented Sep 5, 2024

core UI changes device test click test persistence test
T2T1 Model T test(screens) main(screens) test(screens) main(screens) test(screens) main(screens)
T3B1 Safe 3 test(screens) main(screens) test(screens) main(screens) test(screens) main(screens)
T3T1 Safe 5 test(screens) main(screens) test(screens) main(screens) test(screens) main(screens)
All main(screens)

@onvej-sl
Copy link
Contributor

The protocol seems secure with two reservations highlighted in bold in the following text.

My first question was: What kind of probability distribution the suite should use to generate entropy_check_count?

First, let's look at the problem from the perspective of a fake Trezor. Assume the attacker knows the probability distribution used to generate entropy_check_count. The attacker wants to maximize the success of the attack, where the attacker's strategy is to generate the seed honestly n-1 times and dishonestly on the n-th attempt. The success probability of this attack is Pr[entropy_check_count = n]. The attacker has the highest success rate if they choose n such that Pr[entropy_check_count = n] is maximal, in other words, n should be the mode of the entropy_check_count distribution.

Now, let's examine the problem from the suite's perspective. Assume that the suite wants the attack probability to be at most e. The suite seeks a distribution that generates on average the smallest possible values of entropy_check_count. In other words, it looks for a distribution with a mode at most e and the smallest expected value.

It can be shown that if e = 1/i for natural i, the desired distribution is the uniform dicrete distribution over the set 1..n, which is not surprising.

It follows that if the attack's success probability needs to be at most e, then entropy_check_count should be a uniform discrete distribution over 1..ceil(1/e), and the average number of rounds will be (ceil(1/e) + 1)/2.

My second question was: Does the protocol protects against an attacker who honestly generates the seed but generates addresses dishonestly?

The answer is that it can only protect those addresses for which the passphrase and all parts of the path up to the last hardened index are known at the time this protocol runs. This typically means the passphrase, purpose, coin_type, and account.

@matejcik
Copy link
Contributor

matejcik commented Sep 30, 2024

@matejcik please do a high-level review of the workflow, namely the protobuf messages and how they are handled. Let's discuss what's the best way to pass the staged seed to the get_public_key() handler. Currently I use the storage cache (APP_STAGED_MNEMONIC_SECRET).

high-level flow looks fine to me

instead of messing with the root sources for get_seed() and get_secret(), i'd pass an optional keychain: Keychain | None to get_public_key handler, and have the reset device flow construct a Keychain instance with the local copy of the seed.
(there's also probably no reason to go through workflow_handlers, you can directly import the get_public_key app)

@andrewkozlik andrewkozlik added the translations Put this label on a PR to run tests in all languages label Oct 23, 2024
@andrewkozlik andrewkozlik force-pushed the andrewkozlik/display_random branch from ddb3640 to ce803a5 Compare October 23, 2024 12:27
@andrewkozlik
Copy link
Contributor Author

Rebased on top of main.

there's also probably no reason to go through workflow_handlers, you can directly import the get_public_key app

Done in 2a5d8f2.

have the reset device flow construct a Keychain instance with the local copy of the seed

Done in 0fed6cc, but I have to say I am not entirely convinced it's for the better.

If at some point we wanted to use CardanoGetPublicKey, SolanaGetPublicKey or TezosGetPublicKey in the entropy check flow, or some other messages, then I think the implementation which involves constructing a Keychain instance with the local copy of the seed would start getting messy, because each handler would need to be modified and each message would need to be addressed independently, not to mention that the listed messages use keychain decorators, which would need to be circumvented somehow. On the other hand, I doubt that we really need any other messages other than plain GetPublicKey, because it can be used to access nearly any BIP 32 node, and AFAICS, the host can translate between the XPUB format and the special Solana or Tezos format. Cardano is a different story though, because it uses a special seed derivation, which is why I used the word nearly. In the implementation where Keychain is passed to get_public_key() this would again need special handling. Currently as it's written I didn't bother with replicating the Cardano logic from derive_and_store_roots(), because CardanoGetPublicKey is not supported.

In terms of implementation complexity at this moment I don't see much difference between the two options. In terms extensibility and avoiding code duplication in the future, I think the original solution which messes with the root source for get_secret() is better, because, for example, adding support for CardanoGetPublicKey is simpler. That being said, I think both implementations are acceptable.

@andrewkozlik andrewkozlik marked this pull request as ready for review October 23, 2024 12:51
@andrewkozlik andrewkozlik requested a review from prusnak as a code owner October 23, 2024 12:51
@andrewkozlik
Copy link
Contributor Author

Let's get this review moving, please.

@matejcik
Copy link
Contributor

squashed & rebased your work, with minimal changes so far, so that we can get the CI going. i intend to make some changes for your review that will be easier to do this way

@matejcik matejcik removed the translations Put this label on a PR to run tests in all languages label Dec 20, 2024
Copy link
Contributor

@matejcik matejcik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK on functionality.

I have more comments on Python library side, but I tried to implement my own suggestions and decided to spin it off into a separate PR with somewhat larger refactorings. Good to be merged in this PR as-is.

While reading the code in detail, I realized that there is a slightly better way to do the protobuf data structures:

  1. Trezor should not be returning Success when the entropy check is not finished. That's simply misleading. Instead, I introduced a special message EntropyCheckReady for this state.
  2. With that, I decided to fold ResetDeviceContinue and ResetDeviceFinish into a single message EntropyCheckContinue(finish: bool), which seems slightly tidier: this way, the overall number of MessageTypes does not need to grow.

Implementing these changes leads to a more readable docs section in the .proto files, which now e.g. allows us to document that we support GetPublicKey in the check phase (and we can extend that if we add more messages).
Modification in firmware is very simple, and the same thing is trivial in trezorlib too, so even if Connect already started to work on this, they should be able to adapt.

@andrewkozlik please look over the branch and comment, we can merge into this PR if you're OK with it.

common/protob/messages-management.proto Outdated Show resolved Hide resolved
legacy/firmware/protob/Makefile Outdated Show resolved Hide resolved
core/src/apps/management/reset_device/__init__.py Outdated Show resolved Hide resolved
common/protob/messages.proto Outdated Show resolved Hide resolved
@matejcik
Copy link
Contributor

entropy is broken for all tests for reset_device -- i assume that is expected, due to (probably??) changed order of storage writes?

@matejcik
Copy link
Contributor

we'll also need some tests that validate expected_responses for the old and the new flow

@andrewkozlik
Copy link
Contributor Author

Merged @matejcik's fixups and updated docs/common/message-workflows.md.

@andrewkozlik andrewkozlik added the translations Put this label on a PR to run tests in all languages label Dec 27, 2024
@andrewkozlik
Copy link
Contributor Author

entropy is broken for all tests for reset_device -- i assume that is expected, due to (probably??) changed order of storage writes?

Correct. This is because set_slip39_iteration_exponent() and set_backup_type() are now called before random.bytes(), instead of after, which makes for cleaner code. (Writing encrypted entries to storage makes use of the random number generator.)

Copy link
Contributor

@matejcik matejcik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spun off a separate issue #4464 for the outstanding items, let's merge this

@andrewkozlik andrewkozlik force-pushed the andrewkozlik/display_random branch from f78335c to 38557d4 Compare January 2, 2025 11:21
@andrewkozlik andrewkozlik merged commit 57868ad into main Jan 2, 2025
138 of 139 checks passed
@andrewkozlik andrewkozlik deleted the andrewkozlik/display_random branch January 2, 2025 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Trezor Core firmware. Runs on Trezor Model T and T2B1. R&D Research and development team related translations Put this label on a PR to run tests in all languages trezorlib Python library and the command line trezorctl tool.
Projects
Status: Approved
Development

Successfully merging this pull request may close these issues.

None yet

3 participants