Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the exponential decay process in C++. #285

Merged
merged 2 commits into from
Mar 21, 2024
Merged

Conversation

plietar
Copy link
Member

@plietar plietar commented Mar 11, 2024

The existing process written in R needs to copy the contents of each variable from C++ using v$get_values(), then after scaling the vector it would copy the result back into C++ using v$queue_update.

The amount of data copied and the time it took was pretty significant. 6 double variables, one for each kind of immunity, need to be updated in full at each time step, each as big as the population size.

Moving this into C++ removes the need for any copy at all, besides the multication loop. Values are read out of a reference to the vector held by the DoubleVariable, the result of the multication is moved to the queue, and finally individual moves the vector in the queue into the DoubleVariable. The speedup from this change for a 1M population size is around 10%.

An alternative optimization I considered was to compute the exponential decay lazily, recording only the timestep and value at which the immunity was last updated and using the closed form expression of the exponential decay. This would avoid the need to have mass updates of the immunity variables at every time step. Unfortunately in my testing this ends up being slower than even the current implementation, with all the time being spent in calculating the current value. This would also be a much more intrusive change, since every use of the immunity variables needs to be modified to take the last update timestep, the current timestep and the decay rate into consideration.

@plietar plietar requested a review from giovannic March 11, 2024 11:02
The existing process written in R needs to copy the contents of each
variable from C++ using `v$get_values()`, then after scaling the vector
it would copy the result back into C++ using `v$queue_update`.

The amount of data copied and the time it took was pretty significant.
6 double variables, one for each kind of immunity, need to be updated in
full at each time step, each as big as the population size.

Moving this into C++ removes the need for any copy at all, besides the
multication loop. Values are read out of a reference to the vector held
by the DoubleVariable, the result of the multication is moved to the
queue, and finally individual moves the vector in the queue into the
DoubleVariable. The speedup from this change for a 1M population size is
around 10%.

An alternative optimization I considered was to compute the exponential
decay lazily, recording only the timestep and value at which the
immunity was last updated and using the closed form expression of the
exponential decay.  This would avoid the need to have mass updates of
the immunity variables at every time step. Unfortunately in my testing
this ends up being slower than even the current implementation, with all
the time being spent in calculating the current value. This would also
be a much more intrusive change, since every use of the immunity
variables needs to be modified to take the last update timestep, the
current timestep and the decay rate into consideration.
@plietar plietar force-pushed the faster-exponential branch from 5fae7bb to 92800d6 Compare March 11, 2024 11:03
@plietar
Copy link
Member Author

plietar commented Mar 11, 2024

The broken tests are because of the new individual version, will be fixed by mrc-ide/individual#192

Copy link
Member

@giovannic giovannic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great stuff!

Though not necessary, it would be good to see the performance improvement compared to a naive copy, modify, move implementation (which wouldn't require all the iterator code).

Another alternative to potentially consider: Rounding off low immunities to zero and excluding them from the update? Milage would likely depend on the threshold for low immunity and the transmission level.

* to create an empty vector, use `reserve(N)` to pre-allocate the vector and
* then call `push_back` with each new value. The second way would be to create
* a zero-initialised vector of size N and then use `operator[]` to fill in the
* values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about copy, modify, move?

@giovannic
Copy link
Member

It's very surprising that gcc doesn't optimise the second approach. Is that because of the default optimisation level set by R packaging? Or is this generally the best way to do it?

@plietar
Copy link
Member Author

plietar commented Mar 12, 2024

Is that because of the default optimisation level set by R packaging?

Not really, I get similar results even at -O3 -march=alderlake.

See the disassembly at https://godbolt.org/z/9jWKbTev7. There's a lot going on, but even just at a glance the copy_modify and zero_initialized functions each start with a call to memmove or memset, respectively. reserve_pushback doesn't have any such call, but it doesn't have any unrolling and vectorization. The only multiplication is a single vmulsd instruction, which only multiplies one double value per call, whereas the other functions use multiple calls to vmulpd ymm, which does four multiplications at once (along with some vmulpd xmm and vmulsd calls to round things off).

Clang is even more aggressive about unrolling, and generates 4 vmulpd ymm instructions for loop iteration, and generally produces an easier to read assembly. Still doesn't optimize out the memmove and memset calls though.

Here are the microbenchmarks cited in the comment https://gist.github.com/plietar/72fc5160f79e19379418ce391db1624a

plot-gcc-o2
plot-gcc-o3-alderlake

@plietar
Copy link
Member Author

plietar commented Mar 12, 2024

To be honest I am a little on the fence about whether to use the iterator stuff or not. It was about 2% speedup on the overall simulation compared to the zero-initialization code, which is small but not insignificant. Given that all the complexity is contained within that one function and doesn't leak out, I think I'm inclined to keep it.

@plietar plietar merged commit b3376d3 into dev Mar 21, 2024
4 checks passed
@giovannic giovannic deleted the faster-exponential branch March 21, 2024 14:07
plietar added a commit to plietar/malariasimulation that referenced this pull request May 16, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 16, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 16, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 16, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 17, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 17, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 20, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 21, 2024
plietar added a commit to plietar/malariasimulation that referenced this pull request May 22, 2024
@giovannic giovannic mentioned this pull request Sep 11, 2024
Merged
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants