Failures in ensemble GETKF linear observer: obs types, ens members, memory #455

Open
metdyn (Collaborator) opened this issue Nov 12, 2024 · 0 comments

Finding 1:

When testing observations for GETKF with the linear observer, the following obs types cause the code (fv3jedi_letkf.x) to hang for a 2-member C12 case (Lev=72), on either 1 node (120 cores) or 2 nodes. [JEDI version: 08/31/24] A timeout-based isolation sketch follows the list.

Obs types that fail with the LGETKF code:
   - iasi_metop-b
   - iasi_metop-c
   - cris-fsr_n20
   - cris-fsr_npp
   - gmi_gpm
   - gps
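
For reference, below is a minimal isolation sketch of the kind used to narrow this down: run fv3jedi_letkf.x once per obs type under a wall-clock timeout so a hang surfaces as a non-zero exit instead of a stuck job. It assumes a per-obs-type YAML config already exists for each instrument; the file names, task count, and timeout value are placeholders, not the configs used for the runs reported here.

```bash
#!/usr/bin/env bash
# Hypothetical isolation loop: one fv3jedi_letkf.x run per obs type, capped by
# a timeout so a hanging obs type shows up as a failure rather than a stall.
set -u

OBS_TYPES="iasi_metop-b iasi_metop-c cris-fsr_n20 cris-fsr_npp gmi_gpm gps"

for obs in ${OBS_TYPES}; do
    cfg="getkf_linear_obs_${obs}.yaml"   # placeholder per-obs-type config
    log="getkf_linear_obs_${obs}.log"
    echo "=== testing ${obs} ==="
    # 30-minute cap; adjust to the expected single-obs-type runtime.
    if timeout 30m srun -n 120 ./fv3jedi_letkf.x "${cfg}" > "${log}" 2>&1; then
        echo "${obs}: completed"
    else
        echo "${obs}: failed or hung (see ${log})"
    fi
done
```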

Finding 2:

Even after excluding the obs types listed above, I have difficulty running GETKF with the linear observer for 32 members in the C90 case (LM=72).

------------------------------------------------------------------------------------------
C90 + L72 + GETKF + Linear_obs + SCU17 + letkf.x [08/31/24] + 120 cores/node
------------------------------------------------------------------------------------------
#ob_tot   #memb  #node  timing (s)  per_task_mem (GB)  final_tot_mem (GB)
------------------------------------------------------------------------------------------
ob_1_3       3      6     698.        1.35 / 3.49         1322.34
ob_1_6       3      6     775.96      1.51 / 1.89         1799.64
ob_1_12      3      6     1202.10     3.55 / 3.84         2828.68
ob_1_18      3      6     Fail
ob_1_18      3     12     Fail        4.21  (stops at "starting ensemble member 1" / OOPS_STATS GETKFSolver calculate hofx)
ob_1_24                   NA

ob_1_3      32     12     11887.78    1.57 / 2.79         3538.89
ob_1_6      32     12     12146.14    2.52 / 3.25         4755.65
ob_1_12     32     12     Fail
ob_1_18
------------------------------------------------------------------------------------------

Obs types that work with the LGETKF code:
   1       - sondes
   2       - amsua_aqua
   3       - amsua_metop-b
   4       - amsua_metop-c
   5       - amsua_n15
   6       - amsua_n18
   7       - amsua_n19
   8       - amsr2_gcom-w1
   9       - atms_n20
  10       - atms_npp
  11       - avhrr3_metop-b
  12       - avhrr3_n18
  13       - avhrr3_n19
  14       - scatwind
  15       - sfcship
  16       - sfc
  17       - mhs_metop-b
  18       - mhs_metop-c
  19       - mhs_n19
  20       - mls55_aura
  21       - omi_aura
  22       - ompsnm_npp
  23       - pibal
  24       - ssmis_f17
  25       - airs_aqua

Reasoning:

For contrast, a standalone HofX calculation is shown in the table below. The pattern seems to be:

  • The HofX calculation is memory intensive. An SCU17 node has about 480 GB of memory, so I should populate only half of each node to leave more memory per task (a half-node job-layout sketch follows the table).
  • For the ensemble LGETKF, memory per task does not scale with the number of members, which is the expected behavior.
------------------------------------------------------------------------------------------
C90 + L72 + HofX + SCU17 + fv3-jedi.x [08/31/24] + 24 cores
------------------------------------------------------------------------------------------
#ob_tot   #node              timing (s)  per_task_mem (GB)  final_tot_mem (GB)
------------------------------------------------------------------------------------------
ob_1_33   1 node [24 cores]  356.45      5.91 / 7.36         154.75
------------------------------------------------------------------------------------------
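
Below is a minimal sketch of the half-node job layout suggested above, assuming the cluster uses SLURM and that fv3jedi_letkf.x takes a single YAML config on the command line. The config name, task counts, and wall time are placeholders, and partition/account directives a real SCU17 job would need are omitted.

```bash
#!/usr/bin/env bash
# Sketch of a half-populated node layout. Requesting 60 tasks on each
# 120-core SCU17 node leaves roughly 480 GB / 60 ~= 8 GB per task instead
# of ~4 GB when the node is fully packed.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=60
#SBATCH --cpus-per-task=2     # each task owns two cores' share of the node memory
#SBATCH --time=02:00:00

# 2 nodes x 60 tasks = 120 MPI tasks total; config file name is a placeholder.
srun -n 120 ./fv3jedi_letkf.x getkf_linear_obs.yaml
```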