
We will never merge this (it will go into actor-math): 55 make polynomial operations parallel #58

Closed
wants to merge 9 commits into from

Conversation

martun
Contributor

@martun martun commented Sep 8, 2023

No description provided.

@martun martun self-assigned this Sep 8, 2023
@martun martun linked an issue Sep 8, 2023 that may be closed by this pull request
@martun martun marked this pull request as draft September 8, 2023 13:50
@martun martun changed the title 55 make polynomial operations parallel We will never merge this: 55 make polynomial operations parallel Oct 13, 2023
@martun martun force-pushed the 55-make-polynomial-operations-parallel branch from c3ab96b to a5a65ea Compare January 25, 2024 11:17

github-actions bot commented Jan 25, 2024

Test Results

36 files, 36 suites, took 4s ⏱️
99 tests: 95 passed ✔️, 4 skipped 💤, 0 failed
496 runs: 480 passed ✔️, 16 skipped 💤, 0 failed

Results for commit f56af41.

♻️ This comment has been updated with latest results.

@martun martun force-pushed the 55-make-polynomial-operations-parallel branch from 8158bb0 to fce0696 Compare January 29, 2024 08:04
@martun martun force-pushed the 55-make-polynomial-operations-parallel branch 3 times, most recently from eb786ab to 4322bde Compare January 29, 2024 15:30
@martun martun force-pushed the 55-make-polynomial-operations-parallel branch from 4322bde to 07bf357 Compare January 29, 2024 15:39
@martun martun changed the title We will never merge this: 55 make polynomial operations parallel We will never merge this (it will go into actor-math): 55 make polynomial operations parallel Jan 29, 2024
CMakeLists.txt Outdated
@@ -68,7 +68,7 @@ target_include_directories(${CMAKE_WORKSPACE_NAME}_${CURRENT_PROJECT_NAME} INTER
target_link_libraries(${CMAKE_WORKSPACE_NAME}_${CURRENT_PROJECT_NAME} INTERFACE
${CMAKE_WORKSPACE_NAME}::algebra
${CMAKE_WORKSPACE_NAME}::multiprecision

pthread

Per https://cmake.org/cmake/help/latest/module/FindThreads.html, it would be more idiomatic to use find_package(Threads REQUIRED) previously in the file, and then link against Threads::Threads here instead of pthread.

Contributor Author

Removed it, looks like it was not needed any more.

Comment on lines 70 to 71
// We need the lambda to be mutable, to be able to modify iterators captured by value.
[this](std::size_t begin, std::size_t end) {

Not sure what the comment is trying to say, but there is no mutable specifier :-).

Contributor Author

Yes, this comment was a leftover from an older version of code.

for (std::size_t i = 0; i < a.size(); ++i) {
a[i] = a[i] * sconst;
}
nil::crypto3::parallel_foreach(a.begin(), a.end(), [&sconst](value_type& v){v *= sconst.data;});

The convention of calling the data element a_i really helps make things clearer in other expressions, so I prefer it to just v. Also, consider leaving the loop body on a line by itself — I think it increases readability.

Contributor Author

Done.
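For reference, `parallel_foreach` here is nil::crypto3's own helper. A minimal stand-in built directly on `std::thread` (ignoring the PR's thread-pool plumbing) might look like the sketch below; the chunking policy is an assumption, not the library's actual code, which submits chunks to its pools instead of spawning threads per call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <thread>
#include <vector>

// Hypothetical stand-in for nil::crypto3::parallel_foreach: split
// [first, last) into one contiguous chunk per worker and apply f to
// every element of that chunk on its own thread.
template<class RandomIt, class UnaryFunc>
void parallel_foreach(RandomIt first, RandomIt last, UnaryFunc f,
                      std::size_t workers = std::thread::hardware_concurrency()) {
    const std::size_t n = static_cast<std::size_t>(std::distance(first, last));
    if (n == 0) return;
    if (workers == 0) workers = 1;
    if (workers > n) workers = n;
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division

    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        threads.emplace_back([first, begin, end, &f] {
            for (std::size_t i = begin; i < end; ++i)
                f(first[i]);
        });
    }
    for (auto& t : threads) t.join();
}
```

With a helper of this shape, the scalar multiplication above reads naturally as `parallel_foreach(a.begin(), a.end(), [&sconst](value_type& a_i) { a_i *= sconst.data; });`, matching the reviewer's preferred `a_i` naming.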


std::size_t m = 1; // invariant: m = 2^{s-1}
field_value_type w_m;

Why do we move the declaration of w_m outside of the loop?

Contributor Author

Moved back.

Comment on lines 46 to 47
static ThreadPool instance0(0, pool_size);
static ThreadPool instance1(1, pool_size);

Both thread pools get initialized simultaneously upon first execution of get_instance, but we may only ever need one of them. Is that a problem?

Contributor Author

I was thinking about creating 2 separate functions, but I was afraid that the function for instance1 could be called simultaneously from the threads of instance0, and I'm not sure how safe that is during the initialization of instance1.

In practice (except the tests), 2 pools will always be used.
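For context, the concern here is about initializing function-local statics from multiple threads. Since C++11, initialization of a function-local static is itself thread-safe (the compiler emits the necessary synchronization), so two separate getters would also be safe, each pool being constructed lazily on its own first use. A minimal sketch with a trivial stand-in class and hypothetical getter names:

```cpp
#include <cassert>
#include <cstddef>

// Trivial stand-in for the PR's ThreadPool; only the singleton
// pattern is illustrated here.
class ThreadPool {
public:
    // C++11 "magic statics": each pool is constructed exactly once, in a
    // thread-safe way, the first time its getter runs -- even if that
    // call comes from a worker thread of the other pool.
    static ThreadPool& low_level()  { static ThreadPool pool(0); return pool; }
    static ThreadPool& high_level() { static ThreadPool pool(1); return pool; }

    std::size_t number() const { return pool_number; }

private:
    explicit ThreadPool(std::size_t number) : pool_number(number) {}
    std::size_t pool_number;
};
```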


// Pool #0 will take care of the lowest level of operations, like polynomial operations.
// We want the minimal size of element_per_cpu to be 65536, otherwise the cores are not loaded.
if (pool_number == 0 && element_per_cpu < 65536) {

Since pool numbers bear special significance, I suggest using an enum instead of the magic constants 0 and 1.

Contributor Author

Done.

std::size_t element_per_cpu = elements_count / cpu_usage;

// Pool #0 will take care of the lowest level of operations, like polynomial operations.
// We want the minimal size of element_per_cpu to be 65536, otherwise the cores are not loaded.

Please define a constant for the magic number.

Contributor Author

Done.
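A sketch of how the two suggestions combine (an enum instead of the magic 0/1, and a named constant instead of 65536); the identifiers below are illustrative, not necessarily the PR's final ones, and `max_cpus` is assumed to be at least 1:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Illustrative names, not the PR's final identifiers.
enum class PoolLevel { LOW, HIGH };

// For the low-level pool, chunks smaller than this reportedly fail to
// load the cores, so fewer cores are used instead.
constexpr std::size_t kMinChunkSize = 65536;

inline std::size_t choose_cpu_usage(PoolLevel level,
                                    std::size_t elements_count,
                                    std::size_t max_cpus) {
    if (level == PoolLevel::LOW && elements_count / max_cpus < kMinChunkSize) {
        // Shrink the core count so each core gets at least kMinChunkSize
        // elements (but always use at least one core).
        return std::max<std::size_t>(
            1, std::min(max_cpus, elements_count / kMinChunkSize));
    }
    return max_cpus;
}
```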

boost::asio::thread_pool pool;
std::size_t pool_size;

// Each pool with know it's number.

typo: "will know its number"

Contributor Author

Changed.

@@ -1293,4 +1296,75 @@ BOOST_AUTO_TEST_CASE(polynomial_dfs_zero_one_test) {
BOOST_CHECK((small_poly - one * small_poly).is_zero());
}

BOOST_AUTO_TEST_CASE(polynomial_dfs_addition_perf_test, *boost::unit_test::disabled()) {

I suppose we need some basic functional tests that are not disabled.

Contributor Author
@martun martun Jan 30, 2024

Everything is already tested, there are lots of tests in this file that cover the new code.

These disabled tests will probably be moved to a separate "benchmark" sometime later.

I ended up adding one to check the pools. We will add more tests in the future, once everything works together.

Comment on lines 84 to 85
// Here we can parallelize on the both cycles with 'k' and 'm', because for each value of k and m
// the ranges of array 'a' used do not intersect. Think of these 2 cycles as 1.

"Cycle" usually refers to something like CPU cycles, so to avoid confusion, let's call these "loops".

Contributor Author

Done.

@martun martun requested a review from avm January 30, 2024 07:51
@martun martun force-pushed the 55-make-polynomial-operations-parallel branch from 51f3225 to cdea4d6 Compare January 30, 2024 09:51
@martun martun force-pushed the 55-make-polynomial-operations-parallel branch from cdea4d6 to 63fae20 Compare January 30, 2024 10:03
@avm avm left a comment

LGTM, I left some non-essential suggestions.

Comment on lines 89 to 95
std::size_t element_per_cpu = elements_count / cpu_usage;

// Pool #0 will take care of the lowest level of operations, like polynomial operations.
// We want the minimal size of element_per_cpu to be 65536, otherwise the cores are not loaded.
if (pool_id == PoolID::LOW_LEVEL_POOL_ID && element_per_cpu < POOL_0_MIN_CHUNK_SIZE) {
cpu_usage = elements_count / POOL_0_MIN_CHUNK_SIZE + elements_count % POOL_0_MIN_CHUNK_SIZE ? 1 : 0;
element_per_cpu = elements_count / cpu_usage;

element_per_cpu could be declared const and initialized once after the if.
(Also, elements_per_cpu might be a better name.)

Contributor Author

Done.
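One detail worth noting in the quoted diff: because `?:` binds looser than `+`, the expression `elements_count / X + elements_count % X ? 1 : 0` parses as `(elements_count / X + elements_count % X) ? 1 : 0`, which yields 1 for any nonzero count rather than the intended ceiling division. A small helper with the parentheses in the right place (the helper name is mine, not the PR's):

```cpp
#include <cassert>
#include <cstddef>

// Ceiling division with the parentheses the quoted line is missing:
// without them, `a / b + a % b ? 1 : 0` is `(a / b + a % b) ? 1 : 0`.
// Assumes b > 0.
inline std::size_t ceil_div(std::size_t a, std::size_t b) {
    return a / b + (a % b ? 1 : 0);
}
```

With this helper, the reviewer's single-initialization suggestion becomes straightforward: compute `cpu_usage = ceil_div(elements_count, POOL_0_MIN_CHUNK_SIZE);` inside the `if`, then initialize `const std::size_t elements_per_cpu = elements_count / cpu_usage;` exactly once after it.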


template<class ReturnType>
inline std::future<ReturnType> post(std::function<ReturnType()> task) {
auto packaged_task = std::make_shared<std::packaged_task<ReturnType()>>(std::move(task));

Could we use a unique_ptr here?
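For context on why `shared_ptr` shows up in this pattern: `std::packaged_task` is move-only, so a lambda capturing it by value is move-only too, and any path that stores the handler in a copy-requiring wrapper such as `std::function` rejects it. `shared_ptr` keeps the lambda copyable; with C++23's `std::move_only_function`, or an executor that accepts move-only handlers (recent Boost.Asio versions do), a `unique_ptr` capture can work. A self-contained sketch with a hypothetical copy-requiring executor type:

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <utility>

// Hypothetical executor type that, like std::function, requires its
// handlers to be copyable.
using Executor = std::function<void(std::function<void()>)>;

template<class ReturnType>
std::future<ReturnType> post_task(Executor& ex, std::function<ReturnType()> task) {
    // shared_ptr keeps the capturing lambda copyable; capturing a
    // unique_ptr instead would make it move-only, and the conversion to
    // std::function<void()> below would not compile.
    auto packaged = std::make_shared<std::packaged_task<ReturnType()>>(std::move(task));
    std::future<ReturnType> fut = packaged->get_future();
    ex([packaged] { (*packaged)(); });
    return fut;
}
```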

element_per_cpu = elements_count / cpu_usage;
}

std::size_t begin = 0;

Is this variable unused?

Contributor Author

Changed this code; it was supposed to be used.

}

std::size_t begin = 0;
for (int i = 0; i < cpu_usage; i++) {

Not a problem, but it may be more logical to declare i as size_t.

Contributor Author

Done.

Comment on lines 117 to 123
std::size_t pool_size;

PoolID pool_id;

// For pool #0 we have experimentally found that operations over chunks of <65536 elements
// do not load the cores. In case we have smaller chunks, it's better to load less cores.
const std::size_t POOL_0_MIN_CHUNK_SIZE = 65536;

pool_size and pool_id can probably be const, and the constant can probably be constexpr.

Contributor Author

Done.

Comment on lines 46 to 47
LOW_LEVEL_POOL_ID,
HIGH_LEVEL_POOL_ID

Since PoolID is the name of the enum, the _POOL_ID suffix seems redundant (and maybe we can call the enum PoolLevel or PoolTier and just have two values LOW and HIGH).

Contributor Author

Changed.

};

/** Returns a thread pool, based on the pool_id. pool with LOW_LEVEL_POOL_ID is normally used for low-level operations, like polynomial
* operations and fft. Any code that uses these operations and needs to be parallel will submit it's tasks to pool with HIGH_LEVEL_POOL_ID.

typo: "its tasks"

Contributor Author

Changed.


/** Returns a thread pool, based on the pool_id. pool with LOW_LEVEL_POOL_ID is normally used for low-level operations, like polynomial
* operations and fft. Any code that uses these operations and needs to be parallel will submit it's tasks to pool with HIGH_LEVEL_POOL_ID.
* Submission of higher level tasks to low level pool will immediately result to a deadlock.

typo: "result in a deadlock"

Contributor Author

Changed.

}

// Waits for all the tasks to complete.
inline void join() {

I suspect those methods are automatically inline by virtue of being defined inside the class definition (but no harm in restating that, of course).

Contributor Author

I need to write it, to keep the library header-only...
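To unpack this exchange: member functions defined inside the class body are implicitly inline, so the keyword is redundant (though harmless) there; it is free functions defined in a header that genuinely need `inline` to stay header-only without one-definition-rule violations. A small illustration with hypothetical names:

```cpp
#include <cassert>

struct Pool {
    // Implicitly inline: defined inside the class definition, so this is
    // already safe to place in a header without the keyword.
    int join() { return 0; }

    // The explicit keyword here is redundant, but not wrong.
    inline int stop() { return 1; }
};

// A free function defined in a header DOES need `inline`; otherwise,
// including the header from two .cpp files breaks the one-definition rule.
inline int join_all() { return Pool{}.join(); }
```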

template< class InputIt, class OutputIt, class UnaryOperation >
void parallel_transform(InputIt first1, InputIt last1,
OutputIt d_first, UnaryOperation unary_op,
ThreadPool::PoolID pool_id = ThreadPool::PoolID::LOW_LEVEL_POOL_ID) {

Since it is very important to choose the right pool, we might not want a default value here, so the choice of the pool is always explicit.

Contributor Author

Nope, let's keep these defaults. We almost always want the lower pool when doing transforms, and if someone makes the wrong choice, they will immediately get a deadlock and notice that something's wrong anyway.

@martun martun force-pushed the 55-make-polynomial-operations-parallel branch from 8828e2d to 270f7e0 Compare January 30, 2024 11:20
@martun martun force-pushed the 55-make-polynomial-operations-parallel branch from 270f7e0 to f56af41 Compare January 30, 2024 11:21
@martun martun closed this Feb 27, 2024

Successfully merging this pull request may close these issues.

Make polynomial operations parallel