Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insufficient entropy for fingerprint #15

Open
oberrich opened this issue Jul 4, 2024 · 8 comments
Open

Insufficient entropy for fingerprint #15

oberrich opened this issue Jul 4, 2024 · 8 comments

Comments

@oberrich
Copy link

oberrich commented Jul 4, 2024

I've compared both the original and this implementation and while it's true that there's a standard way of getting a random number, the process id and the thread id, this does not provide the same amount of entropy.
Operating systems often use bit-hacks and smart encodings of identifiers for efficient lookup.
If you observe the pids/tids on your consumer Windows machine for example you will see that all pids are of the form 2*n where 150 < n < 20'000.
Servers have even less range for n as there's less Software running and they are less random and more "lab condition".
To make the seed more random this implementation calls rand twice.

When we now view this in the context of distributed systems* it becomes clear that pids and tids aren't really random and the only meaningful source of entropy comes from the two rand calls, which suddenly sounds a lot gloomier - as it should.

I think the amount of entropy here is insufficient compared to the original implementation and we should come up with a better way to seed this.

For reference, the original implementation hashes

> Object.keys(window).toString().length
< 4980

characters and uses 32 rand calls to build the seed.

See also: CWE-331: Insufficient Entropy and CWE-339: Small Seed Space in PRNG.


* imagine an infrastructure like Netflix, where you have 1000 identically configured servers, running the exact same microservice and think what happens when they - at the same time, with the same seed - try to generate an ID and 5% of the servers happen to have the same pid/tid.

@mplanchard
Copy link
Owner

Yeah this is an interesting point. We don't really have anything like window to use as a base, and it's possible that the reference implementation has updated since I last looked at it.

There was some discussion on this here: paralleldrive/cuid2#27

I'm not seeing where the reference implementation uses 32 rand calls to build the seed, unless you're talking about the original cuid 1. Current version is here: https://github.com/paralleldrive/cuid2/blob/main/src/index.js

Do you have suggestions as to a better approach? Perhaps just a larger random number to include in the fingerprint?

@oberrich
Copy link
Author

oberrich commented Aug 13, 2024

Could be a good idea to just mix in something like an __rdtsc() on x86 or GetTickCount64() on Windows and combine to a hash. Can just layer those on top of the PID/TID stuff. Another thing that can be taken advantage of possibly is ASLR, maybe take ImageBase or something.

@mplanchard
Copy link
Owner

I don't have any windows machines so it is hard for me to test that kind of thing.

I'd be happy to accept a PR adding some stuff to the fingerprint using conditional compilation on windows targets though. In the meantime, I might just throw a UUID in there, which is of course kind of a funny thing to do in an alternative "universal identifier" crate, but it would at least be sure that each host has essentially a unique fingerprint

@mplanchard
Copy link
Owner

Looking back at the implementation, are you sure about the implementation having insufficient entropy?

It is the combination of:

  • Two random 128-bit numbers
  • Process ID
  • Thread ID

The combination of two random 128-bit numbers gives us 256 random bits plus process/thread ID. By comparison a UUID v4 has 122 random bits.

To get overlap, you'd need for two processes with the same process and thread ID to also generate the same two random 128-bit numbers.

The likelihood of generating the same two random 128-bit numbers is 1/(2^128)*1/(2^128) = 1/1e77. If you created a fingerprint every microsecond, it would still take 3e63 years before you'd expect to see a duplicate.

I'm not totally convinced that this is insufficient entropy.

@oberrich
Copy link
Author

oberrich commented Aug 15, 2024 via email

@mplanchard
Copy link
Owner

According to the docs, the Standard distribution (used by default for rand::gen()) generates numbers uniformly distributed across all possible integer values, "provided the Rng is well behaved."

We're using ThreadRng, which says:

ThreadRng uses the same PRNG as StdRng for security and performance and is automatically seeded from OsRng.

Unlike StdRng, ThreadRng uses the ReseedingRng wrapper to reseed the PRNG from fresh entropy every 64 kiB of random data as well as after a fork on Unix (though not quite immediately; see documentation of ReseedingRng). Note that the reseeding is done as an extra precaution against side-channel attacks and mis-use (e.g. if somehow weak entropy were supplied initially). The PRNG algorithms used are assumed to be secure.

OsRng will be as good as the operating system's provided entropy. As noted above, the ThreadRng also resseds. The docs on the Reseeding Rng say:

Reseeding after a fixed number of generated bytes is never strictly necessary. Cryptographic PRNGs don’t have a limited number of bytes they can output, or at least not a limit reachable in any practical way. There is no such thing as ‘running out of entropy’.

Occasionally reseeding can be seen as some form of ‘security in depth’. Even if in the future a cryptographic weakness is found in the CSPRNG being used, or a flaw in the implementation, occasionally reseeding should make exploiting it much more difficult or even impossible.

It reseeds after generating 64 kb of data or after a fork, which means that we should be doubly good in the case of running in multiple threads.


I think the likelihood of PIDs and TIDs colliding is real and the range of numbers it produces is not really providing a lot of value.

I think this is probably true.

I do think adding some architecture (cpuid, rdtsc) or OS specific entropy (Uptime, Drive IDs) could make more sense.

I agree with this! It gets a little tricky when thinking of dockerized hosts, but I'd be glad to add some system-specific components to the fingerprint. It's a little difficult for me to test thoroughly, because I only have linux machines. Open to PRs in that vein with conditional compilation for specific targets, and I'll look into efficient ways to get some useful info on Linux that isn't totally useless in a dockerized context.

@oberrich
Copy link
Author

I agree with this! It gets a little tricky when thinking of dockerized hosts, but I'd be glad to add some system-specific components to the fingerprint. It's a little difficult for me to test thoroughly, because I only have linux machines. Open to PRs in that vein with conditional compilation for specific targets, and I'll look into efficient ways to get some useful info on Linux that isn't totally useless in a dockerized context.

I've implemented some stuff (cpuid via raw-cpuid crate, computer name, volume serials of logical hard drives) for Windows/x86_64. Source Code

For now it just feeds everything into a Hasher because this is really convenient for error handling and adding new stuff that contributes to the hash.

@oberrich
Copy link
Author

Added encoding of KSHARED_USER_DATA, this includes information such as OS version (major.minor.build), boot id, active processor count, number of physical pages, system timings and some more stuff.
For x86/x86_64 it now encodes the TSC (_rdtsc).

I think this should provide enough entropy for Windows, not really sure about Linux.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants