Insufficient entropy for fingerprint #15
Comments
Yeah this is an interesting point. We don't really have anything like There was some discussion on this here: paralleldrive/cuid2#27

I'm not seeing where the reference implementation uses 32 rand calls to build the seed, unless you're talking about the original cuid 1. Current version is here: https://github.com/paralleldrive/cuid2/blob/main/src/index.js

Do you have suggestions as to a better approach? Perhaps just a larger random number to include in the fingerprint?
Could be a good idea to just mix in something like an
I don't have any Windows machines, so it is hard for me to test that kind of thing. I'd be happy to accept a PR adding some stuff to the fingerprint using conditional compilation on Windows targets, though. In the meantime, I might just throw a UUID in there, which is of course kind of a funny thing to do in an alternative "universal identifier" crate, but it would at least ensure that each host has an essentially unique fingerprint.
Looking back at the implementation, are you sure about the implementation having insufficient entropy? It is the combination of:
The combination of two random 128-bit numbers gives us 256 random bits plus process/thread ID. By comparison, a UUID v4 has 122 random bits. To get overlap, you'd need two processes with the same process and thread ID to also generate the same two random 128-bit numbers. The likelihood of generating the same two random 128-bit numbers is 1/(2^128) * 1/(2^128) = 1/(2^256) ≈ 1/1e77. If you created a fingerprint every microsecond, it would still take about 3e63 years before you'd expect to see a duplicate. I'm not totally convinced that this is insufficient entropy.
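The arithmetic above can be checked with a few lines of Rust. This is only a sketch: the helper names `match_probability` and `years_to_generate` are made up for illustration, and it assumes the random bits are uniform and independent. The 3e63-year figure corresponds to generating 2^256 values at one per microsecond; a birthday-style estimate (a duplicate among any pair, after roughly 2^128 values) is printed as well and is still far beyond any practical time span.

```rust
/// Seconds in a Julian year (365.25 days).
const SECONDS_PER_YEAR: f64 = 365.25 * 24.0 * 3600.0;

/// Probability that two independent draws of `bits` uniform random bits are equal.
fn match_probability(bits: i32) -> f64 {
    2f64.powi(-bits)
}

/// Years needed to generate 2^bits values at `per_second` values per second.
fn years_to_generate(bits: i32, per_second: f64) -> f64 {
    2f64.powi(bits) / per_second / SECONDS_PER_YEAR
}

fn main() {
    // Two 128-bit numbers both matching: 1/2^256.
    println!("P(match of 256 bits) = {:e}", match_probability(256));
    // One fingerprint per microsecond is 1e6 per second.
    println!("years for 2^256 values: {:e}", years_to_generate(256, 1e6));
    // Birthday-style estimate: a duplicate among any pair after about 2^128 values.
    println!("years for 2^128 values: {:e}", years_to_generate(128, 1e6));
}
```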
I'm not sure at all how the Rust implementation of random number generation works. If it can be made deterministic by calls to srand or something similar, that could be problematic, but the rng calls are probably the saving grace here. I think the likelihood of PIDs and TIDs colliding is real, and the range of numbers they produce does not really provide a lot of value. I do think adding some architecture-specific (cpuid, rdtsc) or OS-specific entropy (uptime, drive IDs) could make more sense. Using PID/TID feels a bit hacky, although it's convenient as abstracted by the std crate.
According to the docs, the We're using
It reseeds after generating 64 KiB of data or after a fork, which means that we should be doubly good in the case of running in multiple threads.
I think this is probably true.
I agree with this! It gets a little tricky when thinking of dockerized hosts, but I'd be glad to add some system-specific components to the fingerprint. It's a little difficult for me to test thoroughly, because I only have Linux machines. I'm open to PRs in that vein with conditional compilation for specific targets, and I'll look into efficient ways to get some useful info on Linux that isn't totally useless in a dockerized context.
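For illustration, conditional compilation per target could look roughly like the sketch below. The helper `os_specific_component` is hypothetical and not part of the crate. Reading the `COMPUTERNAME` environment variable on Windows and `/etc/machine-id` on Linux are assumptions picked for the example; `/etc/machine-id` is often missing or identical across containers built from the same image, so it may not help much in a dockerized context.

```rust
/// Hypothetical helper: one host-specific string, if the platform offers one.
#[cfg(target_os = "windows")]
fn os_specific_component() -> Option<String> {
    // Assumption: the computer name is exposed via this environment variable.
    std::env::var("COMPUTERNAME").ok().filter(|s| !s.is_empty())
}

#[cfg(target_os = "linux")]
fn os_specific_component() -> Option<String> {
    // Assumption: a systemd-style machine id exists; None if the file is absent.
    std::fs::read_to_string("/etc/machine-id")
        .ok()
        .map(|s| s.trim().to_owned())
        .filter(|s| !s.is_empty())
}

#[cfg(not(any(target_os = "windows", target_os = "linux")))]
fn os_specific_component() -> Option<String> {
    // Other targets contribute nothing in this sketch.
    None
}

fn main() {
    match os_specific_component() {
        Some(value) => println!("host component: {value}"),
        None => println!("no host-specific component available"),
    }
}
```

Returning `Option<String>` keeps a missing value from being an error: the fingerprint can simply skip components that a platform does not provide.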
I've implemented some stuff (cpuid via raw-cpuid crate, computer name, volume serials of logical hard drives) for Windows/x86_64. Source Code

For now it just feeds everything into a Hasher because this is really convenient for error handling and adding new stuff that contributes to the hash.
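A rough sketch of the "feed everything into a Hasher" idea, using only the standard library. This is not the linked implementation: `host_fingerprint` is a hypothetical name, `DefaultHasher` stands in for whatever hasher is actually used, and the environment variables in `main` are placeholder inputs instead of cpuid or volume serials.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hashes PID, thread ID and every available component.
/// Components that could not be read (None) are skipped, so a failed lookup
/// never becomes an error.
fn host_fingerprint(components: &[Option<String>]) -> u64 {
    let mut hasher = DefaultHasher::new();
    std::process::id().hash(&mut hasher);
    std::thread::current().id().hash(&mut hasher);
    for component in components.iter().flatten() {
        component.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    // Placeholder inputs (assumptions); either variable may be unset.
    let components = [
        std::env::var("COMPUTERNAME").ok(),
        std::env::var("HOSTNAME").ok(),
    ];
    println!("{:016x}", host_fingerprint(&components));
}
```

Adding another source of host information then only means appending one more `Option<String>` to the slice.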
Added encoding of I think this should provide enough entropy for Windows, not really sure about Linux.
I've compared both the original and this implementation and while it's true that there's a standard way of getting a random number, the process id and the thread id, this does not provide the same amount of entropy.
Operating systems often use bit-hacks and smart encodings of identifiers for efficient lookup.
If you observe the pids/tids on your consumer Windows machine, for example, you will see that all pids are of the form `2*n` where `150 < n < 20'000`. Servers have even less range for `n`, as there's less software running and they are less random and more "lab condition".

To make the seed more random, this implementation calls `rand` twice.

When we now view this in the context of distributed systems*, it becomes clear that pids and tids aren't really random and the only meaningful source of entropy comes from the two `rand` calls, which suddenly sounds a lot gloomier, as it should.

I think the amount of entropy here is insufficient compared to the original implementation and we should come up with a better way to seed this.
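To put a number on the pid range described above: if `n` were uniformly distributed over `150 < n < 20'000`, a pid would contribute at most about 14 bits. The sketch below computes that upper bound. The helper name `uniform_entropy_bits` is made up for illustration, and the uniform assumption is generous, since real pids cluster and therefore carry less.

```rust
/// Entropy in bits of a uniform choice among `count` equally likely values.
fn uniform_entropy_bits(count: u64) -> f64 {
    (count as f64).log2()
}

fn main() {
    // 150 < n < 20_000 leaves n in 151..=19_999.
    let pid_values: u64 = 19_999 - 151 + 1;
    println!(
        "{} possible pids, at most {:.1} bits",
        pid_values,
        uniform_entropy_bits(pid_values)
    );
    // For comparison: two 128-bit random numbers carry 256 bits.
}
```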
For reference, the original implementation hashes characters and uses 32 `rand` calls to build the seed.

See also: CWE-331: Insufficient Entropy and CWE-339: Small Seed Space in PRNG.
* imagine an infrastructure like Netflix, where you have 1000 identically configured servers, running the exact same microservice and think what happens when they - at the same time, with the same seed - try to generate an ID and 5% of the servers happen to have the same pid/tid.