Comparison between multiple 128-bit and 256-bit non-cryptographic hashes #257
Comments
Some requests to other repos to also provide 256-bit versions of hashes that already have a 128-bit variant: |
There are multiple use cases which benefit from having more bits. One of them is to use the produced hash as a bitmap, from which bit fields are extracted for various usages. Bloom filters are a good example, because they require many bit fields, and can require more than 64 bits in certain scenarios. Another case is checksumming: assuming perfect collision properties (which is not always the case, see this recent study), the probability of 2 objects generating the same hash is bounded by the birthday paradox. The only scenario where 256-bit seems useful is cryptographic applications: here, the larger bit space makes it impossible to "break" a hash (that is, to intentionally generate a collision) with a brute-force attack. But for that to be true, one needs a cryptographic-quality hash, that is, one where no known method can "improve" the probability of collision compared to brute force. This comes at a corresponding speed cost. And that's it. I don't see any scenario that would actually require a non-cryptographic 256-bit hash. I've read the thread where you mention that your needs are:
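To make the "hash as a bitmap" idea concrete, here is a minimal Python sketch of a Bloom filter that takes all of its bit positions from a single 128-bit hash. It is an illustration under stated assumptions, not code from xxHash: `hash128`, `bit_fields` and `BloomFilter` are hypothetical names, and BLAKE2b truncated to 16 bytes merely stands in for whichever 128-bit hash one would really use.

```python
import hashlib

def hash128(data: bytes) -> int:
    # Stand-in 128-bit hash (assumption): BLAKE2b truncated to 16 bytes.
    # Any good 128-bit hash could be substituted here.
    return int.from_bytes(hashlib.blake2b(data, digest_size=16).digest(), "little")

def bit_fields(h: int, total_bits: int, field_bits: int) -> list:
    # Split a total_bits-wide hash into consecutive, non-overlapping fields.
    mask = (1 << field_bits) - 1
    return [(h >> shift) & mask
            for shift in range(0, total_bits - field_bits + 1, field_bits)]

class BloomFilter:
    def __init__(self, log2_size: int = 20, num_fields: int = 6):
        if log2_size < 3 or log2_size * num_fields > 128:
            raise ValueError("fields do not fit in one 128-bit hash")
        self.log2_size = log2_size
        self.num_fields = num_fields
        self.bits = bytearray((1 << log2_size) // 8)

    def _positions(self, item: bytes) -> list:
        return bit_fields(hash128(item), 128, self.log2_size)[: self.num_fields]

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))
```

With 20-bit fields, a 64-bit hash yields only three positions while a 128-bit hash yields six, which is the kind of situation where more than 64 bits is useful.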
These quantities are suitable for a 128-bit hash. The chances of collisions are way too low to matter in practice (smaller than 1 / 2^64). |
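The birthday-bound estimate behind this kind of statement can be sketched as follows. This is a rough approximation that assumes a perfectly uniform hash; `collision_probability` is a hypothetical helper name, and the item counts in the usage lines are illustrative only, since the quantities from the referenced thread are not reproduced here.

```python
import math

def collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate probability of at least one collision among num_items
    values drawn uniformly from a hash_bits-wide space (birthday bound)."""
    pairs = num_items * (num_items - 1) // 2
    expected_collisions = pairs / (1 << hash_bits)
    # 1 - exp(-x), computed so that tiny probabilities are not rounded to 0.
    return -math.expm1(-expected_collisions)

# Illustrative item counts (assumptions, not figures from the thread).
for n in (2**32, 10**12):
    for bits in (64, 128, 256):
        print(f"n={n:.3e} bits={bits}: p ~ {collision_probability(n, bits):.3e}")
```

For about 2^32 items and a 128-bit hash, the estimate is roughly 2^-65, whereas the same number of items with a 64-bit hash already gives a collision probability of several tens of percent.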
@Cyan4973 Bloom filters (for text) and ultra-fast non-secure checksumming are indeed among my intended goals. |
A 256-bit variant is necessarily going to be slower than a 128-bit one, due to the need to mix bits over a wider set of accumulators. So this speed cost can only be justified if it brings some advantage. For checksumming, I don't think there is any justifiable scenario where 256-bit offers any uniqueness advantage over 128-bit. Even if a scenario must deal with a thousand billion unique elements, 128-bit is still good enough. For generation of "any number of bit fields", one must find a case where more than 128 bits are needed. Even in this case, a quick workaround is to rehash the initial 128-bit hash to produce another 128 bits to pick from. |
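The rehash workaround mentioned above could look like the following Python sketch. It is only an illustration: `hash128_bytes` and `extended_bits` are hypothetical names, and BLAKE2b truncated to 16 bytes is an assumed stand-in for a real 128-bit non-cryptographic hash.

```python
import hashlib

def hash128_bytes(data: bytes) -> bytes:
    # Stand-in 128-bit hash (assumption): BLAKE2b truncated to 16 bytes.
    return hashlib.blake2b(data, digest_size=16).digest()

def extended_bits(data: bytes, rounds: int = 2) -> int:
    """Return rounds * 128 bits: the 128-bit hash of data, followed by the
    hash of that digest, and so on."""
    digest = hash128_bytes(data)
    out = 0
    for i in range(rounds):
        out |= int.from_bytes(digest, "little") << (128 * i)
        digest = hash128_bytes(digest)  # rehash the previous 128 bits
    return out

# Usage: 256 bits of bit-field material derived from one input.
wide = extended_bits(b"example input", rounds=2)
low_128 = wide & ((1 << 128) - 1)
high_128 = wide >> 128
```

Note that this supplies more bit fields to pick from but no extra uniqueness: two inputs that collide on the first 128 bits also collide on every derived block.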
256-bit or not, having benchmarks for the 128-bit variant alongside the 64-bit one would be nice. FWIW, using the processor-level AES-128 instructions might be competitive: two rounds are sufficient to produce a good hash that is not cryptographic. (I got the idea from here: https://openjdk.java.net/jeps/8201462) For anyone wanting a 256-bit hash, using processor-intrinsic AES instructions might be faster than the large number of multiplies, shifts, and rotates that would be needed for an accumulator 256 bits wide. This would not necessarily be a full official AES hash, but a truncated process with fewer rounds that satisfies the mixing and randomness properties without being cryptographic. |
The benchmark will be updated to feature 128-bit variants. |
@Cyan4973 List of 128-bit hashes
|
There are many 64-bit variants in the list. Note that, if you are interested in a quick speed comparison between any of these variants, you can also do it directly by plugging them into the benchmark program at https://github.com/Cyan4973/xxHash/tree/dev/tests/bench . |
@Cyan4973 I am referring to the 128-bit variants on the list. The MRX list is broken, as there is no documentation saying which 6 out of 10 have 128-bit variants. |
Did you run any tests for SpookyHash (V2)? |
@Cyan4973 - There are definitely more applications for hashes with a large number of bits for streaming algorithms. I am using multiple 128b hashes for several things :). |
@Alain2019 SpookyHash is listed in https://github.com/Cyan4973/xxHash/wiki/Performance-comparison |
Thanks a lot. A small correction: SpookyHash is a 128-bit hash, which can be used as a 64-bit or even 32-bit hash. |
Good point. |
This issue has morphed into a request to benchmark a lot of (sometimes experimental) hash algorithms for comparison purposes. I believe there's a misunderstanding in the purpose of this repository. It is not to provide a full comparison exercise of every hash algorithm available. |
@Cyan4973, in that case, would it be possible to start a Telegram channel, a Discord or IRC group, or even a mailing list for such purposes? If so, who should we invite into the community? All ideas are welcome. I know that there is an organization already doing it for cryptographic hashes, but we are targeting "fast" hashes. |
Well, that's a difficult question. The reference you provide on speed evaluation of cryptographic hashes is excellent. @rurban maintains an interesting evaluation of hashes. It's more concentrated on quality, which is fair, as otherwise it would be too easy to evaluate some fast "hash" with terrible quality. But it also collects a few elements about speed. So it could be a nice starting point. Let's be clear, it's a lot of work. GitHub may simplify a few important issues, such as creating and maintaining a public web page, but it still requires a lot of effort to create something, even more so to keep it up to date. |
List of 256-bit hashes
Main benchmark goal: