Enable double caching: CPU cache and GPU cache #123

Merged
merged 7 commits into from
Jan 28, 2025


@huiyuxie (Member) commented Jan 25, 2025

Fix #59.

With the GPU-only cache, the indicator functions operate directly on GPU arrays, which triggers scalar indexing and makes them slow. As a hotfix, this PR adds a second cache on the CPU. However, not all data needs to be stored on both the GPU and the CPU, so the two should be distinguished in the future.
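The double-cache idea can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `DoubleCache`, `mirror_on_host`, `to_host`, `to_device`, and `indicator_max` are illustrative names, not the project's API. The "device" arrays are simulated with NumPy so the sketch runs without a GPU. Host buffers are allocated once up front, and only entries flagged with `mirror_on_host` get a CPU copy, reflecting the note that not all data needs to live on both sides.

```python
import numpy as np


class DoubleCache:
    """Hypothetical double cache: every entry has a 'device' array, and
    selected entries also get a preallocated host (CPU) mirror.

    ASSUMPTION: the device side is simulated with NumPy arrays; a real
    implementation would hold GPU arrays here.
    """

    def __init__(self):
        self.device = {}  # name -> simulated device array
        self.host = {}    # name -> host mirror (only for flagged entries)

    def allocate(self, name, shape, dtype=np.float64, mirror_on_host=False):
        self.device[name] = np.zeros(shape, dtype=dtype)
        if mirror_on_host:
            # Allocate the CPU buffer once, not on every call.
            self.host[name] = np.empty(shape, dtype=dtype)

    def to_host(self, name):
        # Copy device data into the existing host buffer (no new allocation).
        np.copyto(self.host[name], self.device[name])
        return self.host[name]

    def to_device(self, name):
        # Copy host results back into the existing device buffer.
        np.copyto(self.device[name], self.host[name])
        return self.device[name]


def indicator_max(cache, name):
    # Hypothetical indicator with an element-by-element loop: harmless on
    # the host mirror, but the same access pattern on a GPU array would be
    # scalar indexing.
    u = cache.to_host(name)
    best = u.flat[0]
    for x in u.flat:
        if x > best:
            best = x
    return best
```

In a real GPU setting, `to_host` would presumably be a device-to-host transfer into the preallocated buffer, so the scalar loop in `indicator_max` never touches device memory.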

The latest benchmark results for the Euler equations with shock capturing are posted below.

@huiyuxie added the refactoring label Jan 25, 2025

@huiyuxie (Member, Author) commented:

1D benchmark

[ Info: Time for reset_du! and volume_integral! on GPU
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  35.300 μs … 361.000 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     42.200 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   52.005 μs ±  22.623 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄▆██▇▆▄▃▃▂▁▂▄▄▃▂▃▃▂▂▁▁  ▁▁ ▁▁▁              ▁▁▁ ▁  ▁         ▂
  █████████████████████████████████▇███████▇▆███████████▇▆▄▆▄▅ █
  35.3 μs       Histogram: log(frequency) by time       128 μs <

 Memory estimate: 3.94 KiB, allocs estimate: 40.

2D benchmark

[ Info: Time for reset_du! and volume_integral! on GPU
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  197.100 μs …   5.657 ms  ┊ GC (min … max): 0.00% … 23.73%
 Time  (median):     262.100 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   282.205 μs ± 163.339 μs  ┊ GC (mean ± σ):  2.99% ±  5.30%

   ▄▇█▆▆▃▁▂▂▄▄█▇▂
  ▄████████████████▆▆▆▅▅▄▄▄▄▄▄▃▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  197 μs           Histogram: frequency by time          553 μs <

 Memory estimate: 258.50 KiB, allocs estimate: 42.

3D benchmark

[ Info: Time for reset_du! and volume_integral! on GPU
BenchmarkTools.Trial: 3373 samples with 1 evaluation.
 Range (min … max):  1.183 ms …   5.663 ms  ┊ GC (min … max): 0.00% … 67.39%
 Time  (median):     1.434 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.478 ms ± 296.514 μs  ┊ GC (mean ± σ):  1.62% ±  5.84%

    ▁▅▇██▅▃▂
  ▃▅████████▇▄▄▃▃▃▂▂▂▂▂▂▂▂▂▁▂▁▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▂▁▁▁▂▂ ▃
  1.18 ms         Histogram: frequency by time        3.57 ms <

 Memory estimate: 642.64 KiB, allocs estimate: 43.

These timings are still larger than those obtained from profiling; something other than the GPU kernels (e.g., allocating the CPU arrays) is dominating the overall runtime.
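One way to test whether host-side allocation, rather than kernel work, dominates is to time the same copy with and without a fresh allocation per call. The Python sketch below is hypothetical: NumPy arrays stand in for the CPU cache, and `copy_fresh`, `copy_into`, and `time_calls` are illustrative helpers, not part of the project. Timings depend on the machine, so no numbers are claimed here.

```python
import time

import numpy as np


def copy_fresh(src):
    # Allocates a brand-new host array on every call.
    return np.array(src, copy=True)


def copy_into(dst, src):
    # Reuses a preallocated host buffer; no allocation per call.
    np.copyto(dst, src)
    return dst


def time_calls(fn, n=1000):
    # Average wall-clock time per call, in seconds.
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    src = np.random.rand(256, 256)
    buf = np.empty_like(src)
    print("fresh allocation:", time_calls(lambda: copy_fresh(src)))
    print("reused buffer:   ", time_calls(lambda: copy_into(buf, src)))
```

If the reused-buffer variant is consistently faster, that points to allocation overhead as a plausible contributor; in the Julia code itself, the allocation counts reported by BenchmarkTools serve a similar purpose.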

@huiyuxie merged commit 7e54d52 into main Jan 28, 2025
7 checks passed
Development

Successfully merging this pull request may close these issues.

Scalar indexing on GPU arrays caused by indicator functions