We will never merge this (it will go into actor-math): 55 make polynomial operations parallel #58
c3ab96b
to
a5a65ea
Compare
8158bb0
to
fce0696
Compare
eb786ab
to
4322bde
Compare
4322bde
to
07bf357
Compare
CMakeLists.txt
Outdated
@@ -68,7 +68,7 @@ target_include_directories(${CMAKE_WORKSPACE_NAME}_${CURRENT_PROJECT_NAME} INTER
target_link_libraries(${CMAKE_WORKSPACE_NAME}_${CURRENT_PROJECT_NAME} INTERFACE
${CMAKE_WORKSPACE_NAME}::algebra
${CMAKE_WORKSPACE_NAME}::multiprecision

pthread
Per https://cmake.org/cmake/help/latest/module/FindThreads.html, it would be more idiomatic to use find_package(Threads REQUIRED) previously in the file, and then link against Threads::Threads here instead of pthread.
Removed it, looks like it was not needed any more.
// We need the lambda to be mutable, to be able to modify iterators captured by value.
[this](std::size_t begin, std::size_t end) {
Not sure what the comment is trying to say, but there is no mutable specifier :-).
Yes, this comment was a leftover from an older version of code.
for (std::size_t i = 0; i < a.size(); ++i) {
    a[i] = a[i] * sconst;
}
nil::crypto3::parallel_foreach(a.begin(), a.end(), [&sconst](value_type& v){v *= sconst.data;});
The convention of calling the data element a_i really helps make things clearer in other expressions, so I prefer it to just v. Also, consider leaving the loop body on a line by itself; I think it increases readability.
Done.
std::size_t m = 1; // invariant: m = 2^{s-1}
field_value_type w_m;
Why do we move the declaration of w_m outside of the loop?
Moved back.
static ThreadPool instance0(0, pool_size);
static ThreadPool instance1(1, pool_size);
Both thread pools get initialized simultaneously upon first execution of get_instance, but we may only ever need one of them. Is that a problem?
I was thinking about creating 2 separate functions, but then I was afraid that the function for instance1 could be called from the threads of instance0 at the same time, and I'm not sure how safe that is during the initialization of instance1.
In practice (except the tests), 2 pools will always be used.
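For reference, one possible way to get lazy, per-pool construction is to give each pool its own function-local static in a separate branch. Since C++11, a function-local static is initialized the first time control passes through its declaration, and that initialization is thread-safe. The sketch below is only an illustration under assumptions, not the code in this PR: `Pool`, `PoolLevel`, and `constructed_count` are hypothetical stand-ins for the real ThreadPool.

```cpp
#include <cassert>

// Hypothetical names for illustration only; not the PR's ThreadPool API.
enum class PoolLevel { LOW, HIGH };

int constructed_count = 0;  // counts constructor calls so laziness is observable

struct Pool {
    explicit Pool(PoolLevel level_) : level(level_) { ++constructed_count; }
    PoolLevel level;
};

// Each static lives in its own branch, so it is constructed only when
// that branch is reached for the first time.
Pool& get_instance(PoolLevel level) {
    if (level == PoolLevel::LOW) {
        static Pool low(PoolLevel::LOW);
        return low;
    }
    static Pool high(PoolLevel::HIGH);
    return high;
}
```

If both pools are always used in practice, as stated above, the eager version costs little and the difference mainly matters for tests.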
// Pool #0 will take care of the lowest level of operations, like polynomial operations.
// We want the minimal size of element_per_cpu to be 65536, otherwise the cores are not loaded.
if (pool_number == 0 && element_per_cpu < 65536) {
Since pool numbers bear special significance, I suggest using an enum instead of the magic constants 0 and 1.
Done.
std::size_t element_per_cpu = elements_count / cpu_usage;

// Pool #0 will take care of the lowest level of operations, like polynomial operations.
// We want the minimal size of element_per_cpu to be 65536, otherwise the cores are not loaded.
Please define a constant for the magic number.
Done.
boost::asio::thread_pool pool;
std::size_t pool_size;

// Each pool with know it's number.
typo: "will know its number"
Changed.
@@ -1293,4 +1296,75 @@ BOOST_AUTO_TEST_CASE(polynomial_dfs_zero_one_test) {
BOOST_CHECK((small_poly - one * small_poly).is_zero());
}

BOOST_AUTO_TEST_CASE(polynomial_dfs_addition_perf_test, *boost::unit_test::disabled()) {
I suppose we need some basic functional tests that are not disabled.
Everything is already tested, there are lots of tests in this file that cover the new code.
These disabled tests will probably be moved to a separate "benchmark" sometime later.
I ended up adding one to check the pools. We will add more tests in the future, once everything works together.
// Here we can parallelize on the both cycles with 'k' and 'm', because for each value of k and m
// the ranges of array 'a' used do not intersect. Think of these 2 cycles as 1.
"Cycle" usually refers to something like CPU cycles, so to avoid confusion, let's call these "loops".
Done.
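To illustrate why the two nested loops can be treated as one, here is a small sketch. It assumes the usual radix-2 butterfly layout, where one stage with half-block size m pairs a[k + j] with a[k + j + m] for block starts k = 0, 2m, 4m, ... and offsets j < m; the PR's actual loop is not quoted here, so this layout is an assumption. The function name `touches_per_index` and the flat index `t` are hypothetical. The sketch flattens both loops into a single index and counts how often each array slot is touched within one stage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For an n-element array (n a power of two) and half-block size m (a power
// of two, m < n), returns how many times each index is touched by one stage.
std::vector<int> touches_per_index(std::size_t n, std::size_t m) {
    std::vector<int> touched(n, 0);
    // One flat loop over all n/2 butterflies replaces the nested k and j loops.
    for (std::size_t t = 0; t < n / 2; ++t) {
        const std::size_t k = (t / m) * 2 * m;  // start of the block
        const std::size_t j = t % m;            // offset inside the block
        ++touched[k + j];
        ++touched[k + j + m];
    }
    return touched;
}
```

Under these assumptions every slot is touched exactly once per stage, so the flat range [0, n/2) can be split across workers without two workers writing the same element.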
51f3225
to
cdea4d6
Compare
cdea4d6
to
63fae20
Compare
LGTM, I left some non-essential suggestions.
std::size_t element_per_cpu = elements_count / cpu_usage;

// Pool #0 will take care of the lowest level of operations, like polynomial operations.
// We want the minimal size of element_per_cpu to be 65536, otherwise the cores are not loaded.
if (pool_id == PoolID::LOW_LEVEL_POOL_ID && element_per_cpu < POOL_0_MIN_CHUNK_SIZE) {
    cpu_usage = elements_count / POOL_0_MIN_CHUNK_SIZE + elements_count % POOL_0_MIN_CHUNK_SIZE ? 1 : 0;
    element_per_cpu = elements_count / cpu_usage;
element_per_cpu could be declared const and initialized once after the if.
(Also, elements_per_cpu might be a better name.)
Done.
Done.
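A minimal sketch of what the suggestion could look like, under stated assumptions: `ChunkPlan`, `plan_chunks`, and `is_low_level_pool` are hypothetical names, and `cpu_usage` is taken as a parameter because its original computation is not shown in the hunk. One detail worth noting: in C++ the conditional operator binds more loosely than `+`, so in the hunk as quoted the whole sum acts as the condition. The sketch parenthesizes the rounding term so that the expression is a ceiling division.

```cpp
#include <cassert>
#include <cstddef>

// Same value as the constant in the PR; declared constexpr as suggested.
constexpr std::size_t POOL_0_MIN_CHUNK_SIZE = 65536;

// Hypothetical result type for illustration.
struct ChunkPlan {
    std::size_t cpu_usage;
    std::size_t elements_per_cpu;
};

ChunkPlan plan_chunks(std::size_t elements_count, std::size_t cpu_usage, bool is_low_level_pool) {
    if (cpu_usage == 0) {
        cpu_usage = 1;  // guard against division by zero
    }
    if (is_low_level_pool && elements_count / cpu_usage < POOL_0_MIN_CHUNK_SIZE) {
        // Ceiling division: the parentheses keep ?: from swallowing the sum.
        cpu_usage = elements_count / POOL_0_MIN_CHUNK_SIZE
                    + (elements_count % POOL_0_MIN_CHUNK_SIZE ? 1 : 0);
        if (cpu_usage == 0) {
            cpu_usage = 1;  // empty input still needs one (empty) chunk
        }
    }
    // Declared const and initialized once, after the if.
    const std::size_t elements_per_cpu = elements_count / cpu_usage;
    return {cpu_usage, elements_per_cpu};
}
```

For example, 100000 elements on 8 cores in the low-level pool would be split into 2 chunks of 50000 rather than 8 chunks of 12500.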
template<class ReturnType>
inline std::future<ReturnType> post(std::function<ReturnType()> task) {
    auto packaged_task = std::make_shared<std::packaged_task<ReturnType()>>(std::move(task));
Could we use a unique_ptr here?
element_per_cpu = elements_count / cpu_usage;
}

std::size_t begin = 0;
Is this variable unused?
Changed this code, it was supposed to be used.
}

std::size_t begin = 0;
for (int i = 0; i < cpu_usage; i++) {
Not a problem, but it may be more logical to declare i as size_t.
Done.
std::size_t pool_size;

PoolID pool_id;

// For pool #0 we have experimentally found that operations over chunks of <65536 elements
// do not load the cores. In case we have smaller chunks, it's better to load less cores.
const std::size_t POOL_0_MIN_CHUNK_SIZE = 65536;
pool_size and pool_id can probably be const, and the constant can probably be constexpr.
Done.
LOW_LEVEL_POOL_ID,
HIGH_LEVEL_POOL_ID
Since PoolID is the name of the enum, the _POOL_ID suffix seems redundant (and maybe we can call the enum PoolLevel or PoolTier and just have two values LOW and HIGH).
Changed.
};

/** Returns a thread pool, based on the pool_id. pool with LOW_LEVEL_POOL_ID is normally used for low-level operations, like polynomial
 * operations and fft. Any code that uses these operations and needs to be parallel will submit it's tasks to pool with HIGH_LEVEL_POOL_ID.
typo: "its tasks"
Changed.
/** Returns a thread pool, based on the pool_id. pool with LOW_LEVEL_POOL_ID is normally used for low-level operations, like polynomial
 * operations and fft. Any code that uses these operations and needs to be parallel will submit it's tasks to pool with HIGH_LEVEL_POOL_ID.
 * Submission of higher level tasks to low level pool will immediately result to a deadlock.
typo: "result in a deadlock"
Changed.
}

// Waits for all the tasks to complete.
inline void join() {
I suspect those methods are automatically inline by virtue of being defined inside the class definition (but no harm in restating that, of course).
I need to write it, to keep the library header-only...
template< class InputIt, class OutputIt, class UnaryOperation >
void parallel_transform(InputIt first1, InputIt last1,
                        OutputIt d_first, UnaryOperation unary_op,
                        ThreadPool::PoolID pool_id = ThreadPool::PoolID::LOW_LEVEL_POOL_ID) {
Since it is very important to choose the right pool, we might not want a default value here, so the choice of the pool is always explicit.
Nope, let's keep these defaults. Almost always we want the lower pool when doing transforms, and in case someone makes a wrong choice, he will immediately get a deadlock and notice that something's wrong anyway.
8828e2d
to
270f7e0
Compare
270f7e0
to
f56af41
Compare
No description provided.