Rewrite the exponential decay process in C++. #285
Conversation
The existing process written in R needs to copy the contents of each variable out of C++ using `v$get_values()`, and then, after scaling the vector, copy the result back into C++ using `v$queue_update()`. The amount of data copied, and the time it takes, is significant: six double variables, one for each kind of immunity, need to be updated in full at each time step, each as large as the population size.

Moving this into C++ removes the need for any copy at all, besides the multiplication loop. Values are read out of a reference to the vector held by the DoubleVariable, the result of the multiplication is moved into the queue, and finally individual moves the vector in the queue into the DoubleVariable. The speedup from this change for a 1M population size is around 10%.

An alternative optimization I considered was to compute the exponential decay lazily, recording only the timestep and value at which the immunity was last updated and using the closed-form expression of the exponential decay, i.e. the last value scaled by `exp(-rate * elapsed)`. This would avoid the need for mass updates of the immunity variables at every time step. Unfortunately, in my testing this ends up being slower than even the current implementation, with all the time being spent calculating the current value. It would also be a much more intrusive change, since every use of the immunity variables would need to be modified to take the last update timestep, the current timestep, and the decay rate into consideration.
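A minimal sketch of the shape of such a process, assuming the `individual` package's DoubleVariable exposes its values by reference and accepts a moved vector via `queue_update` (the struct below is a hypothetical stand-in; exact signatures are assumptions, not the real API):

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical stand-in for individual's DoubleVariable: the real class
// queues updates and applies them at the end of the timestep.
struct DoubleVariable {
    std::vector<double> values;
    const std::vector<double>& get_values() const { return values; }
    void queue_update(std::vector<double>&& v) { values = std::move(v); }
};

// Scale every immunity value by a constant per-timestep decay factor.
// Reading through a reference and moving the result into the queue avoids
// the two full copies the R implementation needed.
void exponential_decay_process(DoubleVariable& variable, double rate) {
    const double factor = std::exp(-rate); // one-step decay factor
    const std::vector<double>& values = variable.get_values();
    std::vector<double> result;
    result.reserve(values.size());
    for (double v : values) {
        result.push_back(v * factor);
    }
    variable.queue_update(std::move(result));
}
```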
Force-pushed from 5fae7bb to 92800d6
The broken tests are because of the new individual version; they will be fixed by mrc-ide/individual#192.
This is great stuff!
Though not necessary, it would be good to see the performance improvement compared to a naive copy, modify, move implementation (which wouldn't require all the iterator code).
Another alternative to potentially consider: rounding off low immunities to zero and excluding them from the update. Mileage would likely depend on the threshold for low immunity and the transmission level.
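A hypothetical sketch of that idea (the threshold constant and function name are invented for illustration, and where to set the cutoff would depend on the transmission level, as noted above):

```cpp
#include <vector>

// Hypothetical cutoff below which immunity is treated as gone.
constexpr double low_immunity_threshold = 1e-6;

void decay_with_cutoff(std::vector<double>& values, double factor) {
    for (double& v : values) {
        // Zeroed values stay at zero, so they never need scaling again.
        v = (v < low_immunity_threshold) ? 0.0 : v * factor;
    }
}
```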
* to create an empty vector, use `reserve(N)` to pre-allocate the vector and
* then call `push_back` with each new value. The second way would be to create
* a zero-initialised vector of size N and then use `operator[]` to fill in the
* values.
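For reference, the two approaches that comment contrasts look roughly like this (a sketch; `values` and `factor` stand in for the real inputs):

```cpp
#include <vector>

std::vector<double> scaled_reserve(const std::vector<double>& values, double factor) {
    // First way: empty vector, pre-allocated, filled with push_back.
    std::vector<double> result;
    result.reserve(values.size());
    for (double v : values) {
        result.push_back(v * factor);
    }
    return result;
}

std::vector<double> scaled_indexed(const std::vector<double>& values, double factor) {
    // Second way: zero-initialised vector of size N, filled via operator[].
    std::vector<double> result(values.size());
    for (std::size_t i = 0; i < values.size(); i++) {
        result[i] = values[i] * factor;
    }
    return result;
}
```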
What about copy, modify, move?
It's very surprising that gcc doesn't optimise the second approach. Is that because of the default optimisation level set by R packaging? Or is this generally the best way to do it?
Not really, I get similar results even at higher optimisation levels. See the disassembly at https://godbolt.org/z/9jWKbTev7. There's a lot going on, but even just at a glance, Clang is even more aggressive about unrolling. Here are the microbenchmarks cited in the comment: https://gist.github.com/plietar/72fc5160f79e19379418ce391db1624a
To be honest, I am a little on the fence about whether to use the iterator approach or not. It was about a 2% speedup on the overall simulation compared to the zero-initialization code, which is small but not insignificant. Given that all the complexity is contained within that one function and doesn't leak out, I'm inclined to keep it.
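For comparison, the naive copy, modify, move implementation mentioned earlier would look something like this (a sketch; in the real process the result would be handed to `queue_update`):

```cpp
#include <vector>

// Copy the whole vector, scale it in place, and return it so the caller
// can move it into the update queue. Simple, but the copy-construction
// touches every element once before the scaling loop touches it again.
std::vector<double> copy_modify_move(const std::vector<double>& values, double factor) {
    std::vector<double> result = values; // copy
    for (double& v : result) {
        v *= factor;                     // modify
    }
    return result;                       // move (NRVO / implicit move)
}
```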
This reverts commit b3376d3.