Add Alphabetic distribution #1587

Conversation
The rejection sampling equivalent performs slightly worse when I ran the benchmarks from `benches/`.

Rejection sampling:

```
random_bool/standard    time:   [1.1139 ns 1.1224 ns 1.1321 ns]
                        change: [+0.8541% +1.6869% +2.5284%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 114 outliers among 1000 measurements (11.40%)
  30 (3.00%) high mild
  84 (8.40%) high severe
```

`rng.random_range`:

```
random_bool/standard    time:   [1.1012 ns 1.1069 ns 1.1136 ns]
                        change: [-0.3371% +0.2525% +0.8818%] (p = 0.42 > 0.05)
                        No change in performance detected.
Found 80 outliers among 1000 measurements (8.00%)
  40 (4.00%) high mild
  40 (4.00%) high severe
```
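For context, a minimal sketch of what the two compared strategies might look like for sampling one alphabetic byte. These are hypothetical helpers written against rand 0.9's `Rng::random_range`, not the PR's actual code, and the letter ranges assume ASCII `A-Z` and `a-z`:

```rust
use rand::Rng;

// Rejection sampling: draw bytes spanning both letter blocks and retry
// whenever the draw lands in the punctuation gap between 'Z' and 'a'.
fn alphabetic_rejection<R: Rng + ?Sized>(rng: &mut R) -> u8 {
    loop {
        let b = rng.random_range(b'A'..=b'z');
        if b.is_ascii_alphabetic() {
            return b;
        }
    }
}

// random_range approach: draw an index over the 52 letters and map it
// onto the two contiguous ASCII blocks, so no retry loop is needed.
fn alphabetic_range<R: Rng + ?Sized>(rng: &mut R) -> u8 {
    let i = rng.random_range(0..52u8);
    if i < 26 {
        b'A' + i
    } else {
        b'a' + (i - 26)
    }
}
```

The rejection variant draws from a 58-value range containing 52 letters, so on average roughly 6 of every 58 draws are rejected and retried.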
Force-pushed from cddbae0 to be743d5
What is the motivation for including this in rand? Why do we want a distribution over exactly `a-zA-Z` and not, say, just over `A-Z`?
Had the need for an alphabetic distribution in a project of mine.
This is fine, but
Acceptable motivation for a small addition to the library. (It's always hard to tell exactly how many uses something has, however.)
I'll work on that.
Understandable. I hope it's useful enough, considering.
Force-pushed from 353d7b5 to f86462e
The benchmarks perform the same but the memory usage should be lower.
Forgot to remove debugging calls.
This still only barely got a faster benchmark, despite being one of the quickest solutions in theory.
Force-pushed from fc672b9 to baca091
Decided to test my code in a project of mine. The code in question just generates 1 million markdown lines one by one, each with a random heading level (1-6) and 128 random alphabetic characters. What I found out was that the performance without the unsafe version was noticeably worse.

I'm not the biggest fan of unsafe Rust, but considering the difference, and 32% is a massive deal...

This is the documentation for `String::as_mut_vec`: the caller has to keep the contents valid UTF-8. In our tested distribution we can guarantee that, since it only ever produces ASCII bytes.

PS: It's also a bit quicker when you add a `reserve_exact`. So in the end, if we compare:
```rust
string.reserve_exact(len);
string.extend(self.sample_iter(rng).take(len).map(|c| c as char));
```

to
```rust
unsafe {
    let v = string.as_mut_vec();
    v.reserve_exact(len);
    v.extend(self.sample_iter(rng).take(len));
}
```
Yes, this is likely why I went with the unsafe code originally. It feels like something the optimizer should be able to solve, but without an ASCII-byte type that may be a hard ask. It turns out there is an unstable addition to the standard library for this purpose, thus I feel we should use that later, which would imply changing the sampling type. So proceed with your original unsafe code for now (with a brief comment).
Will do.
Thanks for mentioning that. Maybe in the future these two parts could be updated to use that, but since it's still unstable, it'll be a while until that happens.
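For reference, a self-contained sketch of how the unsafe variant compared above can be wrapped. This is a hypothetical free function, not the PR's trait implementation; `Alphanumeric` is used in the example only because it is an existing ASCII-only `Distribution<u8>` in rand 0.9:

```rust
use rand::distr::{Alphanumeric, Distribution};
use rand::Rng;

/// Append `len` sampled bytes to `string` through its byte vector.
///
/// SAFETY: sound only because `dist` yields ASCII bytes exclusively,
/// so the `String` stays valid UTF-8 (the contract of `as_mut_vec`).
fn append_ascii<D, R>(dist: D, rng: &mut R, string: &mut String, len: usize)
where
    D: Distribution<u8>,
    R: Rng + ?Sized,
{
    unsafe {
        let v = string.as_mut_vec();
        v.reserve_exact(len);
        v.extend(dist.sample_iter(rng).take(len));
    }
}

fn main() {
    let mut s = String::new();
    append_ascii(Alphanumeric, &mut rand::rng(), &mut s, 128);
    assert_eq!(s.len(), 128);
    assert!(s.bytes().all(|b| b.is_ascii_alphanumeric()));
}
```

The safe variant shown earlier goes through a `char` conversion and per-character `String::push`, which appears to be the overhead the unsafe path avoids.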
```rust
// See [#1590](https://github.com/rust-random/rand/issues/1590).
unsafe {
    let v = string.as_mut_vec();
    v.reserve_exact(len);
```
I believe we don't need to use `reserve` when using `extend`.
> I believe we don't need to use `reserve` when using `extend`.
Thought the same, but on my 1 million markdown headings project from earlier it yields noticeably better performance.
With `reserve_exact`:

```
generate_markdown  0.54s user 0.30s system 94% cpu 0.883 total
generate_markdown  0.53s user 0.32s system 92% cpu 0.925 total
generate_markdown  0.53s user 0.21s system 99% cpu 0.747 total
```
Without `reserve_exact`:

```
generate_markdown  0.79s user 0.32s system 95% cpu 1.167 total
generate_markdown  0.79s user 0.31s system 93% cpu 1.181 total
generate_markdown  0.77s user 0.33s system 94% cpu 1.164 total
```
And it's not like we'd need more than we use when the method itself takes `len` as an argument.
When opening `std`, the definition is this:
```rust
fn extend<I: IntoIterator<Item = char>>(&mut self, iter: I) {
    let iterator = iter.into_iter();
    let (lower_bound, _) = iterator.size_hint();
    self.reserve(lower_bound);
    iterator.for_each(move |c| self.push(c));
}
```
Yet using a `reserve_exact` before it yields much better performance when just generating an exact string like so: `Alphabetic.sample_string(rng, 128)`.

Honestly, I'm unsure why it's that much more performant, but it's been consistently like that. Feel free to also test it out if you want: `taboc/utils/generate_markdown/`.
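One plausible explanation, offered as an assumption rather than something verified against rand's internals: if the sampling iterator reports the default `(0, None)` size hint, then `take(len)` reports a lower bound of 0, so the `self.reserve(lower_bound)` inside `extend` reserves nothing and the string grows through repeated reallocations unless you reserve up front. A small sketch of how the hint propagates:

```rust
fn main() {
    // `from_fn` keeps the default size hint of (0, None), standing in for a
    // sampling iterator whose length isn't known up front.
    let sampler = std::iter::from_fn(|| Some(b'a'));
    let taken = sampler.take(128);

    // The lower bound stays 0, so extend's internal reserve(lower_bound)
    // would not pre-allocate anything for this iterator.
    assert_eq!(taken.size_hint(), (0, Some(128)));

    // Reserving explicitly avoids growing the buffer while pushing.
    let mut s = String::new();
    s.reserve_exact(128);
    s.extend(taken.map(|b| b as char));
    assert_eq!(s.len(), 128);
}
```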
- `CHANGELOG.md` entry

Summary

Added an `Alphabetic` distribution similar to `Alphanumeric`, with the same derives and a similar implementation:
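A rough sketch of what such a distribution might look like, assuming it mirrors `Alphanumeric` and uses the `random_range` approach described under Details below (illustrative only, not the PR's exact code):

```rust
use rand::distr::Distribution;
use rand::Rng;

/// A distribution over the ASCII letters `A-Z` and `a-z`.
#[derive(Debug, Clone, Copy, Default)]
pub struct Alphabetic;

impl Distribution<u8> for Alphabetic {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> u8 {
        // Draw an index over the 52 letters and map it onto the two blocks.
        let i = rng.random_range(0..52u8);
        if i < 26 {
            b'A' + i
        } else {
            b'a' + (i - 26)
        }
    }
}

fn main() {
    // Build a 16-letter string through safe code via `char`.
    let s: String = Alphabetic
        .sample_iter(&mut rand::rng())
        .take(16)
        .map(char::from)
        .collect();
    assert!(s.chars().all(|c| c.is_ascii_alphabetic()));
}
```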
Note

There's no documentation in `src/lib.rs` or `src/rng.rs`. Saying this because there's documentation about `Alphanumeric` there. The reason is that `Alphabetic` and `Alphanumeric` are pretty similar already, so I didn't see much value in adding more generic examples compared to the type-level documentation.

Motivation
Had the need for alphabetic random generation and figured it'd be helpful to have it integrated directly into the library for easier use.
Details
Decided to use `rng.random_range` for the `Distribution<u8>` implementation due to it showing slightly better performance in benchmarks (69cc117). Here are the results from three benchmark runs on my system:

Rejection sampling
`rng.random_range`
Specs

Kernel: 6.13.2-zen1-1-zen
Note

Did all the benches with a lot of my desktop apps in the background. That shouldn't be a problem because I didn't open new apps or use the background apps much during the runs.
Also, I'm a first-time contributor, so if I missed some implementation details or additional required documentation, feel free to tell me about it.