ecmwf-ifs · samhatfield · Jul 8, 2024 · Jul 8, 2024 · Jul 9, 2024 · Jul 9, 2024
diff --git a/.gitignore b/.gitignore
@@ -8,4 +8,5 @@ build/*
 install/*
 env.sh
 *.DS_Store
-
+docs/site
+docs/content_processed
diff --git a/docs/content/api.md b/docs/content/api.md
@@ -0,0 +1,41 @@
+---
+title: ecTrans API
+---
+
+@warning
+Page under construction.
+@endwarning
+
+## General notes
+
+@note
+ecTrans is a legacy code with an accumulated 30 years of history. Over this time certain
+features enabled through optional arguments will have fallen out of use. We are currently reviewing
+all options to identify those that can be safely deleted, but this takes time. In the meantime, we
+have tagged below all options we deem to be "potentially deprecatable".
+@endnote
+
+### Variable names
+
+ecTrans _in principle_ follows the coding standard and conventions outlined in section 1.5 of the
+[IFS Documentation - Part VI: Technical and Computational
+Procedures](https://www.ecmwf.int/en/elibrary/81372-ifs-documentation-cy48r1-part-vi-technical-and-computational-procedures).
+Following these standards, all variable names must begin with a one- or two-character prefix
+denoting their scope (module level, dummy argument, local variable, loop index, or parameter) and
+type. These are outlined in Table 1.2. Dummy arguments have the following prefixes:
+
+- `K` - integer
+- `P` - real (single or double precision)
+- `LD` - logical
+- `CD` - character
+- `YD` - derived type
+
+### `KIND` parameters
+
+As with the IFS, integer and real variables in ecTrans always have an explicit `KIND` specification.
+These are defined in the
+[`PARKIND1` module](https://github.com/ecmwf-ifs/fiat/blob/main/src/parkind/parkind1.F90), which is
+part of the FIAT library (a dependency of ecTrans). To understand the subroutines described here, only two must be considered:
+
+- `INTEGER, PARAMETER :: JPIM = SELECTED_INT_KIND(9)` (i.e. 4-byte integer)
+- `INTEGER, PARAMETER :: JPRD = SELECTED_REAL_KIND(13,300)` (i.e. 8-byte float)
diff --git a/docs/content/benchmarking.md b/docs/content/benchmarking.md
@@ -0,0 +1,122 @@
+---
+title: Benchmarking ecTrans
+---
+
+@warning
+Page under construction.
+@endwarning
+
+A ["benchmark driver"
+program](https://sites.ecmwf.int/docs/ectrans/sourcefile/ectrans-benchmark.f90.html) is bundled
+with ecTrans. This program repeats a pair of inverse and direct spectral transforms a specified
+number of times and collects timing statistics to assess the overall performance of ecTrans. It is
+designed to mimic the use of ecTrans from within the IFS atmospheric model, in which inverse and
+direct spectral transforms are carried out on every model timestep. The benchmark program also
+includes a simple error check for verifying that the transforms are numerically correct. This
+latter feature is in fact used for the ecTrans CTest suite.
+
+Here we describe how to write a benchmark suite for ecTrans.
+
+## Installing ecTrans
+
+First follow the [instructions for installing ecTrans](installation.html) on your system. Verify
+that the benchmark programs (one each for single and double precision) exist in your build's bin
+directory. You should see
+
+```bash
+ectrans-benchmark-cpu-sp  ectrans-benchmark-cpu-dp
+```
+
+Here we assume you have only enabled the `CPU` feature of ecTrans (which is on by default). If you
+also enabled the `GPU` feature, you'll also see GPU versions of these two programs. We'll just focus
+on CPUs here.
+
+## Using the benchmark program
+
+The benchmark program has many arguments for running ecTrans in different configurations. You can
+see the full set by running one of the benchmark programs with the `--help` option:
+
+```
+NAME    ectrans-benchmark-cpu-sp
+
+DESCRIPTION
+        This program tests ecTrans by transforming fields back and forth between spectral
+        space and grid-point space (single-precision version)
+
+USAGE
+        ectrans-benchmark-cpu-sp [options]
+
+OPTIONS
+    -h, --help          Print this message
+    -v                  Run with verbose output
+    -t, --truncation T  Run with this triangular spectral truncation (default = 79)
+    -g, --grid GRID     Run with this grid. Possible values: O<N>, F<N>
+                        If not specified, O<N> is used with N=truncation+1 (cubic relation)
+    -n, --niter NITER   Run for this many inverse/direct transform iterations (default = 10)
+    -f, --nfld NFLD     Number of scalar fields (default = 1)
+    -l, --nlev NLEV     Number of vertical levels (default = 1)
+    --vordiv            Also transform vorticity-divergence to wind
+    --scders            Compute scalar derivatives (default off)
+    --uvders            Compute uv East-West derivatives (default off). Only when also --vordiv is given
+    --flt               Run with fast Legendre transforms (default off)
+    --nproma NPROMA     Run with NPROMA (default no blocking: NPROMA=ngptot)
+    --norms             Calculate and print spectral norms of transformed fields
+                        The computation of spectral norms will skew overall timings
+    --meminfo           Show diagnostic information from FIAT's ec_meminfo subroutine on memory usage, thread-binding etc.
+    --nprtrv            Size of V set in spectral decomposition
+    --nprtrw            Size of W set in spectral decomposition
+    -c, --check VALUE   The multiplier of the machine epsilon used as a tolerance for correctness checking
+
+DEBUGGING
+    --dump-values       Output gridpoint fields in unformatted binary file
+```
+
+Some of these options (e.g. `--nprtrv`) require a detailed understanding of how fields are
+distributed across MPI tasks, so we won't describe them in detail here. The most important arguments
+are the following:
+- `-t, --truncation T`: this sets the overall resolution of the benchmark. The truncation T refers
+  to the highest zonal and total wavenumber that can be kept in spectral space. By default, a
+  suitable grid point resolution (i.e. a suitable number of latitudes on the octahedral grid) will
+  be chosen to match this truncation. This single number then determines the overall problem size
+  of the spectral transform. The higher this number, the larger the problem size. As of August
+  2024, the "HRES" (high-resolution, deterministic) forecast of ECMWF uses a spectral truncation of
+  1279, combined with an octahedral grid of 2560 latitudes, which gives a grid point resolution of
+  approximately 8 km.
+- `-n, --niter NITER`: this determines how many iterations of the inverse/direct transform pair to
+  perform. The more iterations you perform, the more reliable the timing statistics you gather.
+  Note that two additional iterations are always performed at the start. This is because (at least
+  for the GPU version of ecTrans) the first two iterations include some initialisation costs which
+  shouldn't be included in any timing statistics.
+- `-l, --nlev NLEV`: this determines the number of vertical levels for three-dimensional fields
+  such as U and V wind (or vorticity and divergence). ecTrans can operate on a batch of vertical
+  levels with a single call and this determines the size of this batch (though by default, fields
+  are distributed across MPI tasks on the vertical dimension at some stages in the spectral
+  transform).
+- `--vordiv --scders --uvders`: these options enable some auxiliary code paths when calling the
+  inverse transform. `--vordiv` calculates grid point vorticity and divergence, `--scders`
+  calculates derivatives of scalar fields in grid point space, and `--uvders` calculates gradients
+  of the U and V wind in grid point space. For testing code changes, it's good to include these
+  options so as many code paths as possible are verified.
+- `--norms`: this option enables error norms, which are printed aggregated over all fields at the
+  end of the benchmark. The errors are computed in spectral space with respect to the initial
+  values of the fields. This is useful for checking that the benchmark is numerically
+  correct.
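+
+As a rough sketch of how these options fit together, the following script derives the default grid
+from the truncation and prints a benchmark command line. The path `./bin/ectrans-benchmark-cpu-dp`
+and the specific option values are assumptions chosen for illustration. The `O<N>` grid with
+`N = T + 1` and `2N` latitudes matches the default stated in the help text and the HRES numbers
+quoted above.
+
+```bash
+# Hypothetical settings; adjust the path to your build directory and the sizes to your system
+BENCH=./bin/ectrans-benchmark-cpu-dp
+T=1279                 # spectral truncation
+N=$((T + 1))           # default octahedral grid number (cubic relation)
+NLAT=$((2 * N))        # number of latitudes on the O<N> grid
+CMD="${BENCH} --truncation ${T} --grid O${N} --niter 20 --nlev 10 --vordiv --scders --uvders --norms"
+echo "grid O${N} has ${NLAT} latitudes"
+echo "${CMD}"          # the script only prints the command; run it yourself once it looks right
+```
+
+A run at this resolution is likely to need a lot of memory, so a smaller truncation such as the
+default of 79 is a more practical starting point.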
+
+
+
+
+
+
+
+
+
+When inspecting the program, you will notice that it is significantly more complex than, say, the
+example program described in our [usage guide](usage.html). This additional complexity comes not
+just from the instrumentation code for the timings, but notably also from the infrastructure to
+permit transforms of distributed fields. As explained in the [introduction](introduction.html),
+ecTrans can operate on fields distributed across MPI tasks, and the dimension across which fields
+are split is different for spectral space and grid point space. As such, the benchmark program
+includes infrastructure for specifying which elements of the relevant decomposed dimension belong
+to which MPI task.
+
+
diff --git a/docs/content/gpu.md b/docs/content/gpu.md
@@ -0,0 +1,7 @@
+---
+title: GPU offloading
+---
+
+@warning
+Page under construction.
+@endwarning
diff --git a/docs/content/img/spherical_harmonic.png b/docs/content/img/spherical_harmonic.png
diff --git a/docs/content/index.md b/docs/content/index.md
@@ -0,0 +1,67 @@
+---
+title: User Guide
+ordered_subpage: introduction.md
+ordered_subpage: installation.md
+ordered_subpage: usage.md
+ordered_subpage: benchmarking.md
+ordered_subpage: api.md
+ordered_subpage: transi.md
+ordered_subpage: license.md
+copy_subdir: img
+---
+
+@warning
+Page under construction.
+@endwarning
+
+## Introduction
+
+ecTrans is a high-performance numerical library for transforming meteorological fields between
+global grid-point space representation and a spectral representation based on spherical harmonics.
+It is a fundamental part of the
+[European Centre for Medium-Range Weather Forecasts'](https://www.ecmwf.int/)
+[Integrated Forecasting System (IFS)](https://www.ecmwf.int/en/forecasts/documentation-and-support/changes-ecmwf-model),
+a global numerical weather prediction suite. Indeed, ecTrans was previously part of the IFS source
+code itself. It therefore benefits from over 30 years of development and optimisation. In 2022,
+ecTrans was split out from the IFS source code and released as its own project, becoming the first
+open-source component of the IFS.
+
+ecTrans is engineered to work efficiently running on many hundreds, or even thousands, of compute
+nodes. This is achieved through a significant optimisation of the constituent compute kernels,
+making use of the FFTW library for the Fourier transform in the longitudinal direction and BLAS
+GEMMs for the Legendre transforms in the latitudinal direction. However, given that transformed
+fields are distributed across compute tasks, great care has been taken to ensure that parallelism
+can be fully exploited at all stages in the algorithm. This is achieved through data exchange steps
+interleaved between the Fourier and Legendre transforms, which are implemented using the Message
+Passing Interface (MPI).
+
+The result is an algorithm which stresses a high-performance computing system both on the
+node level _and_ the network level, serving as an excellent overall benchmark and a target for
+optimisation of IFS execution speed.
+
+## The spectral transform
+
+ecTrans transforms a batch of meteorological fields from a grid point space representation
+\( X_k(\lambda_i, \phi_j) \), where \( \lambda_i \) is the \( i^{\text{th}} \) longitude,
+\( \phi_j \) is the \( j^{\text{th}} \) latitude, and \( k \) is the index which ranges over the
+batch of fields, to a spectral space representation \( X_{m,n,k} \), where \( m \) is the zonal
+wavenumber, and \( n \) is the total wavenumber. This constitutes a direct spectral transform.
+ecTrans can also carry out the inverse spectral transform.
+
+The inverse spectral transform (spectral space to grid point space) is accomplished in two
+computational steps. Firstly, an inverse Legendre transform is performed in the
+latitudinal direction,
+
+\[
+X_{m,k}(\phi_j) = \sum_{n=|m|}^{N} X_{m,n,k} P_{m,n}(\sin(\phi_j)).
+\]
+
+Then, an inverse Fourier transform is performed in the longitudinal direction,
+
+\[
+X_k(\lambda_i, \phi_j) = \sum_{m=-N}^{N} X_{m,k}(\phi_j) e^{im\lambda_i}.
+\]
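+
+Substituting the first step into the second gives the inverse transform as a single double sum.
+This is only a restatement of the two steps above, assuming the same truncation \( N \) in both
+sums, and is not a description of how ecTrans organises the computation:
+
+\[
+X_k(\lambda_i, \phi_j) = \sum_{m=-N}^{N} \sum_{n=|m|}^{N} X_{m,n,k} P_{m,n}(\sin(\phi_j)) e^{im\lambda_i}.
+\]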
+
+## Parallelizing a spectral transform
+
+## Basic usage of ecTrans
diff --git a/docs/content/installation.md b/docs/content/installation.md
@@ -0,0 +1,131 @@
+---
+title: Installing ecTrans
+---
+
+ecTrans relies on CMake for building and for unit testing. It follows standard CMake procedures for
+building out-of-source, but below we describe this explicitly for newcomers to CMake.
+
+## Requirements
+
+ecTrans has the following requirements:
+
+- [CMake](https://cmake.org/) version 3.12 or later
+- [ecBuild](https://github.com/ecmwf/ecbuild.git) (a collection of ECMWF-specific CMake macros)
+- A Fortran compiler with OpenMP support and a C compiler. Officially we support:
+    - Classic Intel (i.e. ifort and icc)
+    - GNU
+    - NVHPC
+    - Cray
+- [FIAT: the Fortran IFS and Arpege Toolkit](https://github.com/ecmwf-ifs/fiat)
+- [FFTW](https://www.fftw.org/)
+- A library containing standard BLAS routines such as [LAPACK](https://www.netlib.org/lapack/)
+
+Note that you will also need an MPI library if you want to run ecTrans with distributed memory
+parallelism, but this is not an explicit dependency of ecTrans. Instead, MPI functionality is
+provided through a wrapper library in FIAT, and it is FIAT that must be built with MPI
+support. In any case, ecTrans can still be tested without MPI, using only shared memory
+parallelism.
+
+For all of these except FIAT and ecBuild we cannot give general instructions, as the procedure
+depends entirely on your system and software environment. Most modern high-performance computing
+systems should have these installed already, so we will assume this is the case for you as well.
+
+ecBuild can simply be cloned from GitHub like so:
+
+```bash
+git clone https://github.com/ecmwf/ecbuild.git --branch 3.8.2 --single-branch
+```
+
+It does not require a build or installation step. Simply export a variable `ecbuild_DIR` pointing to
+the cloned repository. 
+
+@note
+We are always willing to add other compilers to our list, so please
+[raise an issue](https://github.com/ecmwf-ifs/ectrans/issues) if you are encountering
+difficulties with a particular compiler.
+@endnote
+
+## (Optional) Prepare env and toolchain files
+
+Preparing these files is not strictly necessary, but is recommended to ease reproducibility. An env
+file (called env.sh below) should contain all of the steps required to make the dependencies of FIAT
+and ecTrans available on the path, and otherwise prepare the environment for their building and
+execution. For example, the env file on a system that uses modules might load the modules for CMake,
+the compiler suite, FFTW, and the BLAS library. The toolchain file (called toolchain.cmake below) 
+contains definitions of CMake variables which will be needed to build FIAT and ecTrans correctly.
+The most important are the compilers. The toolchain file should set the values of
+`CMAKE_C_COMPILER`, `CMAKE_CXX_COMPILER`, and `CMAKE_Fortran_COMPILER`. The toolchain file for
+NVHPC will look like this, for example:
+
+```cmake
+set(CMAKE_C_COMPILER nvc)
+set(CMAKE_CXX_COMPILER nvc++)
+set(CMAKE_Fortran_COMPILER nvfortran)
+```
+
+To get CMake to always detect ecBuild, add this to your env file:
+
+```bash
+export ecbuild_DIR=<path to ecBuild>
+```
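+
+Putting this together, a minimal env file might look like the sketch below. It assumes a system
+using Environment Modules and that ecBuild was cloned into the directory from which the file is
+sourced. The module names are hypothetical and will differ on your system.
+
+```bash
+# env.sh (sketch): the module names below are hypothetical placeholders
+if command -v module >/dev/null 2>&1; then
+    module load cmake gcc fftw openblas
+fi
+# Assumes ecBuild was cloned into the directory from which this file is sourced
+export ecbuild_DIR="$PWD/ecbuild"
+```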
+
+We will assume both the env file (env.sh) and the toolchain file (toolchain.cmake) are in the
+current directory. Now source the env file:
+
+```bash
+source env.sh
+```
+
+## Building FIAT
+
+First clone the FIAT repository (the command below checks out version 1.4.1):
+
+```bash
+git clone https://github.com/ecmwf-ifs/fiat.git -b 1.4.1
+```
+
+Then run the configure step for FIAT (you can leave out `-DCMAKE_TOOLCHAIN_FILE` if you don't want
+to bother with toolchain files):
+
+```bash
+cmake -S fiat -B fiat/build -DCMAKE_TOOLCHAIN_FILE=$PWD/toolchain.cmake
+```
+
+Now run the build step (adjusting the level of multithreading as required with `-j`):
+
+```bash
+cmake --build fiat/build -j 4
+```
+
+## Building ecTrans
+
+Clone the ecTrans repository (the command below checks out version 1.3.2):
+
+```bash
+git clone https://github.com/ecmwf-ifs/ectrans.git -b 1.3.2
+```
+
+Then run the configure step, making sure to pass the location of the FIAT build directory to CMake:
+
+```bash
+cmake -S ectrans -B ectrans/build -DCMAKE_TOOLCHAIN_FILE=$PWD/toolchain.cmake \
+  -Dfiat_ROOT=$PWD/fiat/build
+```
+
+Now run the build step:
+
+```bash
+cmake --build ectrans/build -j 4
+```
+
+You should now find in the ecTrans build directory the ecTrans libraries in the lib subdirectory
+(single- and double-precision versions) and the standalone ecTrans benchmarking binary in the bin
+subdirectory. We strongly recommend you run the full CTest suite to check everything's installed
+correctly:
+
+```bash
+cd ectrans/build
+ctest
+```
+If any of the tests fail, please let us know by
+[raising an issue](https://github.com/ecmwf-ifs/ectrans/issues)!