- Problem definition
- Single core solution
- Parallel implementation
- Summary
- Further development
The problem is a specific version of the range-searching problem, which in the most general case is defined as:
The range searching problem most generally consists of preprocessing a set $S$ of objects, in order to determine which objects from $S$ intersect with a query object, called a range.
In our case we have an input distribution of points in 3D and we are asked to compute the local density at the node points of a regular 3D grid with grid number $N$ (that is, $N^3$ node points).
In our case the set of objects is
$S = \{x_i \mid x_i \in \mathbb{R}^3,\; i = 1, \dots, n\}$, with $x_{ij} \in [0,1] \;\; \forall \; i,j$,
while the query object is a sphere of radius $R$ centred at each node point.
Count, for every node point, the number of points that lie within a radius $R$ of it. This requires comparing every node point against every input point.
Pros:
- Simple to implement
- No cache misses
Cons:
- $\Theta(N^3 n)$ complexity
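The brute-force count can be sketched as below. This is a minimal pure-Python illustration, not a tuned implementation. The function name `brute_force_density`, the nested-list layout of $D$, and the node placement (node $(i, j, k)$ at $(ih, jh, kh)$ with $h = 1/(N-1)$) are assumptions made for the example.

```python
def brute_force_density(points, N, R):
    """Count, for every node of an N x N x N grid on [0, 1]^3, the points within R."""
    # Assumed layout: node (i, j, k) sits at (i*h, j*h, k*h); requires N >= 2.
    h = 1.0 / (N - 1)
    R2 = R * R
    D = [[[0] * N for _ in range(N)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                cx, cy, cz = i * h, j * h, k * h
                # Inner loop over all n points: N^3 * n distance checks in total.
                for x, y, z in points:
                    if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= R2:
                        D[i][j][k] += 1
    return D


points = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (0.5, 0.5, 0.6)]
D = brute_force_density(points, N=3, R=0.2)
print(D[0][0][0], D[1][1][1], D[2][2][2])  # prints: 1 2 0
```

Comparing squared distances avoids a square root per check, which matters when the inner loop runs $N^3 n$ times.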
Use a k-d tree to preprocess the data; with this approach, querying every node point has a complexity of $\Theta(N^3 \log n)$.
Pros:
- $\Theta(N^3 \log n)$ complexity
Cons:
- Lots of cache misses ($\Theta(\log n)$ per node)
- Difficult to implement
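A k-d tree with a range-count query could look roughly like the sketch below. It is a plain recursive version written for clarity, with nodes stored as tuples. The names `build_kdtree`, `count_within` and `kdtree_density`, and the node placement at multiples of $h = 1/(N-1)$, are assumptions for the example. The actual per-query cost depends on how many subtrees the sphere overlaps.

```python
def build_kdtree(points, depth=0):
    """Build a k-d tree as nested tuples (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % 3
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    # Left subtree: coordinates <= pts[mid][axis]; right subtree: >= pts[mid][axis].
    return (pts[mid], axis,
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))


def count_within(node, centre, R):
    """Count the points in the tree that lie within distance R of centre."""
    if node is None:
        return 0
    point, axis, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(point, centre))
    count = 1 if d2 <= R * R else 0
    diff = centre[axis] - point[axis]
    # Descend only into the half-spaces that the query sphere can reach.
    if diff - R <= 0:
        count += count_within(left, centre, R)
    if diff + R >= 0:
        count += count_within(right, centre, R)
    return count


def kdtree_density(points, N, R):
    """Query the tree once per node of the N x N x N grid (assumed spacing 1/(N-1))."""
    h = 1.0 / (N - 1)
    tree = build_kdtree(list(points))
    return [[[count_within(tree, (i * h, j * h, k * h), R)
              for k in range(N)] for j in range(N)] for i in range(N)]
```

Sorting at every level keeps the code short at the price of an $O(n \log^2 n)$ build; a median-selection step would be the usual replacement if build time matters.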
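This method does the opposite: instead of counting the number of points around each node point, it loops over the points and increments the density of every node point within $R$ of each one.
At first glance this method is the same as the brute-force one; however, for each point we only need to update the part of the density matrix $D$ that lies within $R$ of it.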
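Note that, with a node spacing of about $1/N$, a sphere of radius $R$ covers on the order of $(NR)^3$ node points.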
Pros:
- $\Theta((NR)^3 n)$ complexity, lower than brute force and, for low $R$, better than k-d trees
- No cache misses if programmed correctly
Cons:
- Harder to implement than brute force but simpler than kd-trees
- Needs to keep the whole $D$ matrix in memory
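The restricted search can be sketched as follows: for each point, only the nodes inside the bounding box of its sphere are visited. The function name `restricted_density`, the nested-list layout of $D$, and the node placement at multiples of $h = 1/(N-1)$ are assumptions for the example.

```python
import math


def restricted_density(points, N, R):
    """For each point, update only the nodes inside the bounding box of its sphere."""
    # Assumed layout: node (i, j, k) sits at (i*h, j*h, k*h); requires N >= 2.
    h = 1.0 / (N - 1)
    R2 = R * R
    D = [[[0] * N for _ in range(N)] for _ in range(N)]
    for x, y, z in points:
        # Conservative index bounds (floor below, ceil above), clipped to the grid.
        # The exact distance test below discards the extra nodes.
        lo = [max(0, math.floor((c - R) / h)) for c in (x, y, z)]
        hi = [min(N - 1, math.ceil((c + R) / h)) for c in (x, y, z)]
        for i in range(lo[0], hi[0] + 1):
            dx2 = (i * h - x) ** 2
            for j in range(lo[1], hi[1] + 1):
                dxy2 = dx2 + (j * h - y) ** 2
                # Innermost loop walks k, the contiguous index of D[i][j].
                for k in range(lo[2], hi[2] + 1):
                    if dxy2 + (k * h - z) ** 2 <= R2:
                        D[i][j][k] += 1
    return D
```

Each point touches roughly $(2R/h)^3$ nodes instead of all $N^3$, which is where the $(NR)^3$ factor comes from.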
The above analysis assumes that all the
Here we have two possible approaches:
This method has a clear advantage, which is the reduced memory usage; furthermore, it improves the performance of all algorithms because each core handles only about $n/p$ points and $N^3/p$ node points.
1. Read the $x_i$ from the file ($t_{seek} + n \cdot t_{read}$)
2. Split the $x_i$ (can be done at the same time as 1.)
3. Communicate the splits to all cores ($n \cdot t_{comm}$)
4. Let each core compute its density matrix $D$, complexity:
   - Brute-force $\Theta(N^3 n / p^2)$
   - K-d tree $\Theta(N^3 \log(n/p) / p)$
   - Restrict search space $\Theta((NR)^3 n / p)$
5. Print $D$ to the file ($p \cdot t_{seek} + N^3 \cdot t_{write}$)
- Space complexity:
  - Brute-force $\Theta(\text{batch} \cdot n/p)$
  - K-d tree $\Theta(\text{batch} \cdot n/p)$
  - Restrict search space $\Theta(n/p + N^3/p)$
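The splitting step of this first approach can be illustrated in a single process, without any real communication. The sketch below cuts the grid into $p$ slabs along $x$ and gives each slab the points that can reach its nodes. The halo of width $R$ around each slab is an assumption of this example (points near a slab boundary are duplicated), as are the function names and the node placement at multiples of $h = 1/(N-1)$.

```python
def slab_bounds(N, p):
    """Split node indices 0..N-1 along x into p contiguous slabs [start, stop)."""
    return [(s * N // p, (s + 1) * N // p) for s in range(p)]


def split_points(points, N, R, p):
    """Give each slab every point that could lie within R of one of its nodes."""
    h = 1.0 / (N - 1)
    halo = R + 1e-12  # tiny slack guards against rounding at the slab boundary
    buckets = []
    for start, stop in slab_bounds(N, p):
        x_min, x_max = start * h - halo, (stop - 1) * h + halo
        buckets.append([pt for pt in points if x_min <= pt[0] <= x_max])
    return buckets


def slab_density(bucket, start, stop, N, R):
    """Brute-force counts for the nodes with x-index in [start, stop)."""
    h = 1.0 / (N - 1)
    R2 = R * R
    return [[[sum(1 for x, y, z in bucket
                  if (x - i * h) ** 2 + (y - j * h) ** 2 + (z - k * h) ** 2 <= R2)
              for k in range(N)] for j in range(N)] for i in range(start, stop)]


def decomposed_density(points, N, R, p):
    """Simulate p cores one after another and concatenate their slabs of D."""
    D = []
    for (start, stop), bucket in zip(slab_bounds(N, p), split_points(points, N, R, p)):
        D.extend(slab_density(bucket, start, stop, N, R))
    return D
```

Because the slabs of $D$ are disjoint, the results are concatenated rather than summed; in a real run each core would write its own slab to the file.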
This method is simpler than the first; however, it needs a copy of the full $D$ matrix on every core.
1. Read the $x_i$ from the file ($t_{seek} + n \cdot t_{read}$)
2. Let each core compute its density matrix $D$, complexity:
   - Brute-force $\Theta(N^3 n / p)$
   - K-d tree $\Theta(N^3 \log(n/p))$
   - Restrict search space $\Theta((NR)^3 n / p)$
3. Sum all the density matrices ($N^3 \log p \cdot t_{comm} + N^3 \log p \cdot t_{sum}$)
4. Print $D$ to the file ($p \cdot t_{seek} + N^3 \cdot t_{write}$)
- Space complexity: $\Theta(n/p + pN^3)$
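The second approach can likewise be simulated in one process: the points are dealt out to $p$ workers, each fills its own full-size $D$, and the copies are summed pairwise in about $\log_2 p$ rounds. This is only a sketch; the function names, the flat-list storage of $D$, the round-robin split, and the node placement at multiples of $h = 1/(N-1)$ are assumptions for the example, and a real run would use a message-passing reduction instead of the in-memory loop.

```python
def partial_density(chunk, N, R):
    """Brute-force counts for one chunk of points, as a flat list of N^3 entries."""
    h = 1.0 / (N - 1)
    R2 = R * R
    D = [0] * (N ** 3)
    for i in range(N):
        for j in range(N):
            for k in range(N):
                D[(i * N + j) * N + k] = sum(
                    1 for x, y, z in chunk
                    if (x - i * h) ** 2 + (y - j * h) ** 2 + (z - k * h) ** 2 <= R2
                )
    return D


def tree_sum(matrices):
    """Sum equally sized flat matrices pairwise, in ceil(log2 p) rounds."""
    mats = [list(m) for m in matrices]
    step = 1
    while step < len(mats):
        # In this round, worker i receives and adds the matrix of worker i + step.
        for i in range(0, len(mats) - step, 2 * step):
            a, b = mats[i], mats[i + step]
            for idx in range(len(a)):
                a[idx] += b[idx]
        step *= 2
    return mats[0]


def replicated_density(points, N, R, p):
    """Deal the points round-robin to p workers, then reduce their D copies."""
    chunks = [points[w::p] for w in range(p)]
    return tree_sum(partial_density(chunk, N, R) for chunk in chunks)
```

Every round moves and adds whole $N^3$-sized matrices, which matches the $N^3 \log p$ communication and summation terms in the list above.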
The first method that I will implement is restrict search space, with the following assumptions:
- The memory can contain the matrix $D$ and all $n$ points
- A single core can saturate all the bandwidth of the disk read and write
Below I list some features to add to this algorithm, in order of difficulty:
- Allow for $D$ matrices bigger than the available memory
- Allow for $n$ points bigger than the available memory
- Implement the k-d tree algorithm