The need for dynamic load balancing in global adaptivity #129

IshaanDesai · 2024-09-16T11:11:49Z

In global adaptivity, there is very often only one particular region of interest on the macro scale, so only the micro simulations in that region are active. From a performance perspective this is highly inefficient. Because the micro simulations are distributed statically across processors, some processors solve a large number of active micro simulations while other processors are idle. In a recent study where we scaled the two-scale-heat-conduction case to 128 micro simulations, we observed the effect of this load imbalance.

A dynamic load balancing technique that redistributes the micro simulations across processors would improve performance and scalability.

IshaanDesai · 2025-01-17T14:40:51Z

The dynamic load balancing is envisioned to work in the following steps:

1. Each rank of the Micro Manager accesses the complete macro mesh. Even though the entire mesh is accessed, each rank creates only a subset of the micro simulations.
2. Initially, the Micro Manager distributes the total number of micro simulations as evenly as possible among all available ranks.
3. When load balancing is triggered, an allgather collects the number of active simulations on every rank, so that each rank knows the global number of active simulations.
4. The global number of active simulations is divided by the number of ranks to determine how many active simulations each rank should hold for a balanced load.
5. A second allgather collects the global IDs of the active simulations and the rank on which each of them currently resides. The IDs and rank locations are needed to build a communication map for redistributing the load.
6. Using this information, a communication map is created that specifies the rank to which each active simulation is sent. The logic for this is already implemented in the existing GlobalAdaptivityCalculator class.
7. If an active simulation is moved to a different rank, all inactive simulations associated with it on its current rank are also moved.
8. When a simulation is moved to a new rank, the old rank writes zeros as the results of the micro simulation it no longer hosts. Every rank writes results only for the micro simulations it currently hosts.
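A minimal Python sketch of steps 2, 4 and 6, assuming the allgather of step 5 has already produced, on every rank, a list of the global IDs of active simulations per rank (`active_by_rank`). The function names are hypothetical and not part of the Micro Manager API, and the greedy surplus-to-free-slot pairing is just one simple way to build a communication map, not necessarily the logic in `GlobalAdaptivityCalculator`.

```python
from typing import Dict, List, Tuple


def initial_partition(n_sims: int, n_ranks: int) -> List[range]:
    """Step 2: split global IDs 0..n_sims-1 into contiguous, nearly equal chunks."""
    base, extra = divmod(n_sims, n_ranks)
    parts, start = [], 0
    for r in range(n_ranks):
        size = base + (1 if r < extra else 0)  # the first `extra` ranks get one more
        parts.append(range(start, start + size))
        start += size
    return parts


def target_counts(n_active_global: int, n_ranks: int) -> List[int]:
    """Step 4: balanced number of active simulations per rank."""
    base, extra = divmod(n_active_global, n_ranks)
    return [base + (1 if r < extra else 0) for r in range(n_ranks)]


def communication_map(active_by_rank: List[List[int]]) -> Dict[int, Tuple[int, int]]:
    """Step 6: map the global ID of each active simulation that must move to (src, dst).

    active_by_rank[r] stands in for the result of the step-5 allgather
    (hypothetical layout): the global IDs of the active simulations on rank r.
    """
    n_ranks = len(active_by_rank)
    targets = target_counts(sum(len(ids) for ids in active_by_rank), n_ranks)
    surplus: List[Tuple[int, int]] = []  # (src rank, gid) that overloaded ranks give away
    free_slots: List[int] = []  # one entry per missing active simulation on an underloaded rank
    for r, ids in enumerate(active_by_rank):
        excess = len(ids) - targets[r]
        if excess > 0:
            surplus.extend((r, gid) for gid in ids[-excess:])
        elif excess < 0:
            free_slots.extend([r] * -excess)
    # The targets sum to the global count, so len(surplus) == len(free_slots).
    return {gid: (src, dst) for (src, gid), dst in zip(surplus, free_slots)}
```

For example, `communication_map([[0, 1, 2, 3, 4, 5], [], [20, 21]])` yields `{3: (0, 1), 4: (0, 1), 5: (0, 1)}`: rank 0 holds six of the eight active simulations, and three of them are sent to the idle rank 1. Only surplus simulations are moved, but `target_counts` hands the remainder to the lowest ranks regardless of their current load, so a load-aware remainder assignment could save a few transfers.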
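Steps 7 and 8 can be sketched in the same spirit. The dictionaries below are hypothetical stand-ins for the Micro Manager's internal bookkeeping. Sending each inactive simulation to the same destination rank as its associated active simulation is an assumption, since the steps above only say that it is moved. Writing one row per macro vertex assumes each rank writes data on the complete macro mesh, as in step 1.

```python
from typing import Dict, List, Tuple


def extend_with_inactive(
    moves: Dict[int, Tuple[int, int]],
    associated_active: Dict[int, int],
    location: Dict[int, int],
) -> Dict[int, Tuple[int, int]]:
    """Step 7: inactive simulations on the source rank move together with their active one.

    moves:             active gid -> (src rank, dst rank)
    associated_active: inactive gid -> gid of the active simulation it is associated with
    location:          gid -> rank currently hosting that simulation
    Assumption: an inactive simulation follows its active one to the same destination rank.
    """
    extended = dict(moves)
    for inactive_gid, active_gid in associated_active.items():
        if active_gid not in moves:
            continue
        src, dst = moves[active_gid]
        if location.get(inactive_gid) == src:  # only inactive simulations on the current rank
            extended[inactive_gid] = (src, dst)
    return extended


def output_for_rank(
    hosted_results: Dict[int, List[float]], n_global: int, n_components: int
) -> List[List[float]]:
    """Step 8: one row per macro vertex; simulations not hosted on this rank are written as zeros."""
    out = [[0.0] * n_components for _ in range(n_global)]
    for gid, values in hosted_results.items():
        out[gid] = list(values)
    return out
```

Here an inactive simulation that sits on a different rank than its associated active simulation is left in place, matching the "on its current rank" condition in step 7.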

IshaanDesai self-assigned this Sep 16, 2024

IshaanDesai added the new-feature label (Adding a new feature) Sep 16, 2024

IshaanDesai linked a pull request Jan 2, 2025 that will close this issue: Add feature: dynamic load balancing #141 (Draft, 5 tasks)
