This is an improved version of the RANLUX generator, called Ranlux++ (or RANLUXPP) as originally proposed by Alexei Sibidanov.
There are two papers on it:
- A revision of the subtract-with-borrow random number generators
- Review of High-Quality Random Number Generators
The original RANLUX RNG is a well-established generator for Monte Carlo simulations in particle physics. Its quality is backed by the theory of dynamical systems. Its main drawback is that it is very slow, because many values (typically 389) are computed but skipped. The key insight behind the new version is that RANLUX is equivalent to an LCG (linear congruential generator) with a 576-bit constant A: X_{i + 1} = A * X_{i} mod m.
This can be computed efficiently using long arithmetic with the help of SSE or AVX2 instructions. More importantly, skipping can be done in a single step: one only needs to precompute A^skip mod m. This new version is called Ranlux++ or RANLUXPP. With these improvements, Ranlux++ runs at 1.4 GiB/s on my laptop with a Core i7-8650U CPU @ 1.90GHz. The original assembly implementation by Alexei Sibidanov can be checked here.
Alexei Sibidanov's implementation relies heavily on assembly language to multiply two 576-bit integers and is available only for x86-64 CPUs with AVX2 support.
The next development is a portable implementation of Ranlux++ (not using assembly language) for ROOT, the computing library developed by CERN (European Organization for Nuclear Research, the site of the Large Hadron Collider, LHC). This work was presented by Jonas Hahnfeld and Lorenzo Moneta in this article. The portable implementation reaches performance very close to the original assembly version.
My contribution is a standalone version of Ranlux++ written in C. It combines Alexei Sibidanov's code with the ROOT implementation, converted from C++ to C.
- Compile it with
make
- Test speed to generate 1e9 64-bit values:
./ranluxpp-test -n 1e9
- Write out the first 1e9 64-bit values to stdout:
./ranluxpp-test -w -n 1e9 | pv > /dev/null
- Generate an endless stream of 64-bit values:
./ranluxpp-test -w -n 0 | practrand-RNG_test stdin64 -tlmax 64G -multithreaded
- Use
./ranluxpp-test --help
to get info on usage
- Verify output (using p=2048 and seed=1):
./ranluxpp-test -w -n 0 | head -c 7200 | sha256sum
Hash: 67a0ef6f442519d7f674c721a3ebeb175005fc9fe6b17b359b39725dfa393819
- Compare output of
./ranluxpp-test -w -n 0 | head -c 7200
with the ranluxpp_p_2048_seed_1.output file
- Use
make clean
to clean unnecessary files
- Use
make arbitrary_precison_computation
to compile a test program that does 576-bit arithmetic directly. It is based on the GMP library (GNU Multiple Precision Arithmetic Library).
- On the i7-8650U CPU, the time to generate 1e9 64-bit values is 7.7 seconds.