Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BENCH: set up benchmarks for the core transforms. #22

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

DWesl
Copy link
Contributor

@DWesl DWesl commented Aug 31, 2024

I was poking at the idea of modernizing the Fortran code, and there's a chance it could affect performance. There's a decent chance the compilers see if-then-else the same way as a block of gotos, but if not the more structured code might be easier to optimize.

Benchmarks based on airspeed velocity. Run directions and results below.

$ python -m asv run
· Creating environments
· Discovering benchmarks..
·· Uninstalling from virtualenv-py3.9.
·· Building 182d0348 <patch-2> for virtualenv-py3.9....
·· Installing 182d0348 <patch-2> into virtualenv-py3.9....
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] · For pyspharm commit 182d0348 <patch-2>:
[ 0.00%] ·· Benchmarking virtualenv-py3.9
[25.00%] ··· Running (benchmarks.TimeSuite.time_grdtospec--).
[50.00%] ··· Running (benchmarks.TimeSuite.time_spectogrd--).
[75.00%] ··· benchmarks.TimeSuite.time_grdtospec                                                                       ok
[75.00%] ··· ======== ============ ============
             --                 method
             -------- -------------------------
              ntrunc    computed      stored
             ======== ============ ============
                21      97.2±4μs     72.6±7μs
                42      366±30μs     286±30μs
                63      939±80μs    713±300μs
                84     3.28±0.7ms   2.96±0.5ms
               127      8.89±1ms     6.35±2ms
               168      33.2±1ms     35.0±8ms
               252      90.8±5ms     102±10ms
             ======== ============ ============

[100.00%] ··· benchmarks.TimeSuite.time_spectogrd                                                                     ok
[100.00%] ··· ======== ============ ============
              --                 method
              -------- -------------------------
               ntrunc    computed      stored
              ======== ============ ============
                 21     115±100μs    85.2±20μs
                 42      249±40μs     184±20μs
                 63      759±90μs     448±20μs
                 84     1.47±0.2ms   1.00±0.1ms
                127     5.67±0.3ms   4.63±0.2ms
                168      10.6±1ms     11.2±2ms
                252      30.9±3ms    27.7±10ms
              ======== ============ ============

DWesl added 4 commits August 31, 2024 10:27
I was poking at the idea of modernizing the Fortran code, and there's a chance it could affect performance.
There's a decent chance the compilers see if-then-else the same way as a block of gotos,
but if not the more structured code might be easier to optimize.
@DWesl
Copy link
Contributor Author

DWesl commented Sep 7, 2024

Re-run of the benchmarks after passing complex coefficients to spectogrd. The 20% variation in the untouched grdtospec benchmark is apparently normal, so this would be useful primarily for finding large changes from one commit to the next, rather than tracking changes over time.

[75.00%] ··· benchmarks.TimeSuite.time_grdtospec                                                                      ok
[75.00%] ··· ======== ============= =============
             --                  method
             -------- ---------------------------
              ntrunc     computed       stored
             ======== ============= =============
                21       78.9±8μs      62.9±2μs
                42       296±9μs       228±2μs
                63      664±200μs     586±100μs
                84     2.88±0.04ms   2.70±0.04ms
               127      6.40±0.5ms    5.36±0.4ms
               168      28.1±0.7ms    27.2±0.5ms
               252       72.0±2ms      73.1±4ms
             ======== ============= =============

[100.00%] ··· benchmarks.TimeSuite.time_spectogrd                                                                    ok
[100.00%] ··· ======== ============ =============
              --                 method
              -------- --------------------------
               ntrunc    computed       stored
              ======== ============ =============
                 21      64.8±2μs     47.5±0.9μs
                 42      214±3μs       135±8μs
                 63      529±80μs      348±10μs
                 84     1.21±0.3ms     752±80μs
                127     4.73±0.3ms   3.87±0.02ms
                168     8.86±0.9ms    8.07±0.2ms
                252      27.1±1ms      23.4±1ms
              ======== ============ =============

@jswhit
Copy link
Owner

jswhit commented Sep 21, 2024

Thanks for all your recent contributions @DWesl! Not sure how I feel about touching the ancient NCAR fortran code though - seems like a slippery slope. The only reason I've found to use this instead of more modern libs (like SHTns) is the ability of SPHEREPACK to use a regularly spaced latitude grid (including the poles) without doubling the number of latitudes.

@DWesl
Copy link
Contributor Author

DWesl commented Sep 23, 2024

Not sure how I feel about touching the ancient NCAR fortran code though - seems like a slippery slope.

My first thought was to wrap the files not exposed to python in Fortran 90 modules, but that broke the Python binding, so I'm inclined to agree with you.

The only reason I've found to use this instead of more modern libs (like SHTns) is the ability of SPHEREPACK to use a regularly spaced latitude grid (including the poles) without doubling the number of latitudes.

I think there's one other, which uses Clenshaw-Curtis quadrature rather than Gaussian for that integration, but it seems it upscales the grid internally as well.

I think there's a paper some while back showing exact transforms on a regular lat-lon grid can't happen without 2N latitudes, but frequently what we can get with N+1 latitudes is fine. (There's other papers investigating similar things with how well Clenshaw-Curtis quadrature deals with polynomials beyond the degree for which it is exact, compared to Gaussian quadrature, and extending that to certain rational functions). As partial evidence for that, the fast Legendre transform in the IFS is also approximate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants