Add diffusion model stats
Liam Berrisford committed Nov 15, 2024
1 parent 6ba943e commit b54cead
Showing 3 changed files with 109 additions and 4 deletions.
8 changes: 5 additions & 3 deletions course/lessons/03_improving_performance_with_cupy.md
Original file line number Diff line number Diff line change
@@ -3,7 +3,7 @@ hide:
- toc
---

# Leveraging GPUs
# Example Project Overview

To highlight the difference between NumPy and CuPy, a 3D temperature diffusion model is used to demonstrate the performance gains that can be achieved for computationally intensive tasks.

@@ -79,6 +79,8 @@ The command above will create an interactive HTML file, that will have each time

<center>[View Plot in Separate Tab](../_static/temperature_slice.html)</center>

When run within your own space, the file produced will be `output/original_temperature_2d_interactive.html`.
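
For reference, an animated slice plot of this kind can be built with Plotly roughly as follows. This is only a sketch with made-up stand-in data and a hypothetical output filename, not the course's actual plotting code.

``` python
import numpy as np
import plotly.express as px

# Sketch only: random stand-in data shaped (time, y, x) for a single depth.
temps = np.random.random((10, 50, 50))

fig = px.imshow(
    temps,
    animation_frame=0,                    # one animation frame per timestep
    color_continuous_scale="Viridis",
    labels={"color": "Temperature"},
)
fig.write_html("temperature_slice_sketch.html")  # illustrative filename
```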

### Visualise Cube - Interactive HTML file

Visualises the 3D temperature cube in an interactive HTML file, allowing a time series to be viewed across multiple depths.
@@ -87,16 +89,16 @@ Visualizing a 3D temperature slice in an interactive HTML file, allowing for a t
poetry run visualise_cube --num_depths 5 --num_time_steps 3
```


The command above will create an interactive HTML file that visualises the first 5 depths for 3 time steps. For the above command, the output produced will be:

<div class="responsive-container">
--8<-- "course/_static/temperature_cube.html"
</div>


<center>[View Plot in Separate Tab](../_static/temperature_cube.html)</center>

When run within your own space, the file produced will be `output/original_temperature_3d_interactive.html`.

## Summarising Data

Calculates and prints summary statistics for the temperature data in a specified NetCDF file: its mean, max, min, and standard deviation. It also provides information about the dataset's dimensions and coordinates.
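
As a rough illustration, this kind of summary could be produced with xarray along the following lines. The file path and variable name (`output/original_temperatures.nc`, `temperature`) are assumptions for the sketch, not the course's actual names.

``` python
import xarray as xr

# Illustrative sketch only -- file path and variable name are assumptions.
ds = xr.open_dataset("output/original_temperatures.nc")
temp = ds["temperature"]

print(f"Mean: {float(temp.mean()):.3f}")
print(f"Max:  {float(temp.max()):.3f}")
print(f"Min:  {float(temp.min()):.3f}")
print(f"Std:  {float(temp.std()):.3f}")
print("Dimensions:", dict(ds.sizes))
print("Coordinates:", list(ds.coords))
```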
102 changes: 102 additions & 0 deletions course/lessons/04_running_models.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
# Leveraging GPUs

## Pseudocode

The pseudocode that implements the diffusion loop is:

``` plaintext
1. For each timestep from 1 to num_timesteps:
2. Copy the current temperature values to a temporary array (temp_copy)
3. Initialize arrays for neighbor sums and neighbor counts with zeros
4. For each valid cell (ignoring boundaries):
5. Calculate the sum of neighboring cells:
- Add the value of the front neighbor if valid
- Add the value of the back neighbor if valid
- Add the value of the left neighbor if valid
- Add the value of the right neighbor if valid
- Add the value of the top neighbor if valid
- Add the value of the bottom neighbor if valid
6. Count the number of valid neighbors for each direction
7. Update the cell's temperature:
- New temperature = current temperature + diffusion coefficient * (neighbor_sum - 6 * current temperature) / neighbor_count
8. Ensure invalid points (NaN) remain unchanged
9. Update the main temperature array with the new values
```
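
To make the loop concrete, below is a minimal NumPy sketch of a single timestep following this pseudocode. The names (`diffusion_step`, `_shifted`, `diffusion_coefficient`) are illustrative and not the course's actual implementation, and boundary handling is simplified so edge cells simply see fewer valid neighbours.

``` python
import numpy as np

def _shifted(arr, dz, dy, dx):
    """View of `arr` shifted by (dz, dy, dx); out-of-range neighbours read as 0."""
    nz, ny, nx = arr.shape
    padded = np.pad(arr, 1, mode="constant", constant_values=0)
    return padded[1 + dz:1 + dz + nz, 1 + dy:1 + dy + ny, 1 + dx:1 + dx + nx]

def diffusion_step(temp, diffusion_coefficient=0.1):
    temp_copy = temp.copy()                       # step 2: work on a copy
    valid = ~np.isnan(temp_copy)
    filled = np.where(valid, temp_copy, 0.0)      # NaN cells contribute 0

    neighbor_sum = np.zeros_like(filled)          # step 3
    neighbor_count = np.zeros_like(filled)
    for dz, dy, dx in [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                       (0, -1, 0), (0, 0, 1), (0, 0, -1)]:
        neighbor_sum += _shifted(filled, dz, dy, dx)                  # steps 4-5
        neighbor_count += _shifted(valid.astype(float), dz, dy, dx)   # step 6

    # Step 7: diffusion update; avoid division by zero for isolated cells.
    update = diffusion_coefficient * (neighbor_sum - 6 * temp_copy) / np.maximum(neighbor_count, 1)
    # Step 8: invalid (NaN) points remain unchanged.
    new_temp = np.where(valid & (neighbor_count > 0), temp_copy + update, temp_copy)
    return new_temp                               # step 9
```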

## Running with NumPy

``` bash
poetry run diffusion_numpy --num_timesteps 100
```

The above command will run the 3D diffusion model using the NumPy version of the code for 100 timesteps. Once execution has finished, a report of the time taken will be printed. When running on an AMD EPYC 7552 48-Core Processor, the execution outputs:

``` plaintext
NumPy model completed in 489.2647 seconds. Average time per timestep: 4.8926 seconds.
```
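
For reference, a timing report of this form is typically produced by wrapping the timestep loop with a wall-clock timer, roughly as sketched below. The update here is a stand-in, not the actual diffusion step.

``` python
import time
import numpy as np

num_timesteps = 100
temp = np.random.random((100, 100, 100))    # stand-in data

start = time.perf_counter()
for _ in range(num_timesteps):
    temp = temp * 0.99                      # stand-in for one diffusion timestep
elapsed = time.perf_counter() - start

print(f"NumPy model completed in {elapsed:.4f} seconds. "
      f"Average time per timestep: {elapsed / num_timesteps:.4f} seconds.")
```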

You can visualise the model outputs produced with:

``` bash
poetry run visualise_slice --target_depth 0 --animation_speed 100 --data_file predicted_temperatures_numpy.nc
```

Note that the file `predicted_temperatures_numpy.nc` is generated during the execution of the `diffusion_numpy` script above. The visualisation command will then generate a new interactive HTML file, `output/predicted_temperature_2d_interactive.html`.


## Running With CuPy

As the same code has also been written in CuPy, you can experiment with the difference between CPU and GPU code with the following:

``` bash
poetry run diffusion_cupy --num_timesteps 100
```

The above command will run the 3D diffusion model using the CuPy version of the code for 100 timesteps. Once execution has finished, a report of the time taken will be printed. When running on an NVIDIA A40 GPU, the execution outputs:

``` plaintext
CuPy model completed in 171.9884 seconds. Average time per timestep: 1.7199 seconds.
```
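
One caveat when timing GPU code yourself: CuPy kernels launch asynchronously, so a fair measurement synchronises the device before stopping the clock. A minimal sketch (again with a stand-in update, not the actual model):

``` python
import time
import cupy as cp

temp = cp.random.random((100, 100, 100))    # stand-in data on the GPU

cp.cuda.Device().synchronize()              # ensure earlier work has finished
start = time.perf_counter()
for _ in range(100):
    temp = temp * 0.99                      # stand-in for one diffusion timestep
cp.cuda.Device().synchronize()              # wait for all kernels to complete
elapsed = time.perf_counter() - start

print(f"CuPy model completed in {elapsed:.4f} seconds.")
```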

You can visualise the model outputs produced with:

``` bash
poetry run visualise_slice --target_depth 0 --animation_speed 100 --data_file predicted_temperatures_cupy.nc
```

Note that the file `predicted_temperatures_cupy.nc` is generated during the execution of the `diffusion_cupy` script above. The visualisation command will then generate a new interactive HTML file, `output/predicted_temperature_2d_interactive.html`.
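
As a sketch of what this "direct move" from NumPy to CuPy looks like in practice, the same array code can typically be written once and run against either module. The toy update below is illustrative only, not the course's model code.

``` python
import numpy as np
import cupy as cp

def step(xp, temp, d=0.1):
    """Toy neighbour update written once against the array module `xp`."""
    return temp + d * (xp.roll(temp, 1, axis=0) + xp.roll(temp, -1, axis=0) - 2 * temp)

temp_cpu = np.random.random((64, 64, 64))
temp_gpu = cp.asarray(temp_cpu)             # one host-to-device copy up front

temp_cpu = step(np, temp_cpu)               # runs on the CPU
temp_gpu = step(cp, temp_gpu)               # identical maths, runs on the GPU

result = cp.asnumpy(temp_gpu)               # copy back to host only when needed
```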

## Performance Comparison: CPU vs GPU

### Overall Speedup
- **CPU runtime**: 489 seconds
- **GPU runtime**: 171.9884 seconds
- **Speedup factor**:
\[
\text{Speedup} = \frac{\text{CPU time}}{\text{GPU time}} = \frac{489}{171.9884} \approx 2.84
\]
The GPU completed the task approximately 2.84 times faster than the CPU.

### Per-Timestep Speedup
- **CPU average timestep**: 4.9 seconds
- **GPU average timestep**: 1.7199 seconds
- **Speedup factor per timestep**:
\[
\text{Speedup per timestep} = \frac{\text{CPU timestep}}{\text{GPU timestep}} = \frac{4.9}{1.7199} \approx 2.85
\]
On a per-timestep basis, the GPU is about 2.85 times faster.

### Efficiency Observation
- The consistent speedup factor (both overall and per timestep) suggests that the GPU effectively parallelizes computations without significant overhead from data transfer or kernel launches.

### Implications
- **Computational Efficiency**:
Using a GPU provides substantial performance gains, especially for tasks with repetitive, parallelizable computations such as numerical modeling or simulations.
- **Observed Speedup** (~2.84x improvement) suggests:
- The task is well-suited for GPU acceleration.
- Full potential of the GPU might not yet be realized due to:
- Limited parallelism in the workload.
- Overheads from memory transfers between CPU and GPU.
- Suboptimal use of GPU-specific optimizations.

The GPU's performance significantly outpaces the CPU for this task, reducing runtime by approximately 65%. Of note is that this approach is simply a direct move from NumPy to CuPy, which represents a minimal amount of effort. Further optimization of the GPU code could enhance performance and exploit its full potential, for example by reducing known time-intensive operations for GPUs such as data transfer between host and device.
3 changes: 2 additions & 1 deletion mkdocs.yml
Original file line number Diff line number Diff line change
@@ -27,6 +27,7 @@ nav:
- Setup: lessons/00_setup.md
- Spack: lessons/01_spack.md
- NumPy and CuPy: lessons/02_numpy_and_cupy.md
- Leveraging GPUs: lessons/03_improving_performance_with_cupy.md
- Example Project Overview: lessons/03_improving_performance_with_cupy.md
- Leveraging GPUs: lessons/04_running_models.md
- Tips and Tricks: tips_and_tricks.md
- Spack Cheat Sheet: spack_cheat_sheet.md
