-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GFDL to main (2023-04-06) #1599
GFDL to main (2023-04-06) #1599
Conversation
Created the new module remapping_attic to hold older versions of remapping code that are no longer used by MOM6. The subroutines is PosSumErrSignificant, remapByProjection, remapByDeltaZ and integrateReconOnInterval were moved to remapping_attic, where they can be tested by calling remapping_attic_unit_tests. The hard-coded old_algorithm logical module variable and the code it wraps were also eliminated. Also added a schematic description of the units of the real variables in the various routines in MOM_remapping and corrected some spelling errors. All answers are bitwise identical.
Moved interpolate_column and reintegrate_column (without changing anything) from MOM_diag_vkernels.F90 to MOM_remapping.F90 and incorporated the tests that had been in diag_vkernels_unit_tests into remapping_unit_tests. The entire MOM_diag_vkernels.F90 file was then removed. All answers are bitwise identical, although the module for two public routines was changed and a third was eliminated.
Remove missing_value arguments to interpolate_column and reintegrate_column, instead using 0 for the values in vanished cells. This change helps to address github.com/mom-ocean/issues/769. Also added comments schematically describing some of the argument units. Because 0 was already being used for the missing value (except in unit tests), all solutions are bitwise identical.
Added the new subroutine check_remapped_values with the duplicative error checking code in remapping_core_h and remapping_core_w, both to reduce code volume and promote code coverage, and to make the substance of these two routines easier to follow. All answers are bitwise identical.
- Adds `.gitlab/pipeline-ci-tool.sh` to enact most of the stages of the gitlab CI pipeline - enables interactive/command-line reproduction of the pipeline - `.gitlab/pipeline-ci-tool.sh` is documented in .gitlab/README.md with instructions on how to use at the command line and what each function is doing - All commands formerly in .gitlab-ci.yml are now one-line invocations of `.gitlab/pipeline-ci-tool.sh` so .gitlab-ci.yml is now considerably smaller and easier to read with statements like `.gitlab/pipeline-ci-tool.sh mrs-compile debug_gnu` or `.gitlab/pipeline-ci-tool.sh check-params gnu` - Previously, all results were compared again the stored regression answers. This meant that any error (e.g. layout) would show up as a fail of all types. We use the regression answers to check the repro-symmetric mode and then compare everything else to repro-symmetric or other results as appropriate. This allows us to distinguish between types of errors. The GH actions are doing it this way, and we originally did this in the first forms of the pipeline, but in the last re-factor I lazily switched to using the regression answers for everything.
- The subroutine categorize_axes cannot find the axes in ice restart files and gives warnings WARNING from PE 0: categorize_axes: Failed to identify x- and y- axes in the axis list (xaxis_1, yaxis_1, Time) of a variable being read from INPUT/ice_model.res.nc - This leads to an incorrect initializations and a subsequent sat.vap.press.overflow crash when using infra/FMS2 - Same experiment runs fine with infra/FMS1 - After the fix the infra/FMS1 and infra/FMS2 answers are bitwise identical
- The subroutine categorize_axes cannot find the axes in ice restart files and gives warnings WARNING from PE 0: categorize_axes: Failed to identify x- and y- axes in the axis list (xaxis_1, yaxis_1, Time) of a variable being read from I NPUT/ice_model.res.nc - This leads to an incorrect initializations and a subsequent sat.vap.press.overflow crash when using infra/FMS2 - Same experiment runs fine with infra/FMS1 - After the fix the infra/FMS1 and infra/FMS2 answers are bitwise identical
Added a line initializing the string Cartesian to a blank string in categorize_axes, so that it not be uninitialized when it is used a few lines later.
Set the interpolation weights inside of interpolate_column to explicitly be the complement of one another, thereby saving an extra division at each point and reducing the number of variables that need to be stored, in preparation for the creation of a separate subroutine to find interface positions. This commit is mathematically equivalent to what was there before, and the extensive unit testing of interpolate_column is still passing, but it changes the value of some interpolated interface diagnostics at the level of roundoff (but not the MOM6 solutions themselves, as they do not depend on interpolate_column yet).
This patch introduces a new autoconf macro, AX_FC_CHECK_C_LIB, which confirms if the Fortran compiler can be linked to the netCDF C library. As with other netCDF tests, the nc-config tool is used if necessary (and available). This resolves some recent issues on platforms where netCDF and netCDF-Fortran are installed in separate locations, with different library directories (-L). It also resolves some false assumptions in configure.ac which presumed equivalent access by the configured C and Fortran compilers. Previously, we would test if the C compiler could be linked to netCDF, and then assume that the Fortran compiler shared the same relationship. We now use the Fortran compiler for both C and Fortran tests. This patch fixes many issues observed on MacOS systems, including some persistent problems on the GitHub Actions MacOS tests. For example, we can now use the default GCC 12 compilers, rather than forcing a rollback to GCC 11.
This patch fixes some issues with testing of C bindings in Fortran. Specifically, some tests are using a C compiler which may be unconfigured, causing unexpected errors. The autoconf script now uses the Fortran compiler to test these bindings, rather than using the C compiler to test for their existence. A new macro (AX_FC_CHECK_BIND_C) was added to run these tests. This achieves the actual goal (test of Fortran binding) on top of the original goal (availability of C function), while ensuring that the actual compiler of interest (FC) is used in the test. Two C-based tests are still present in the script for testing the size of jmp_buf and sigjmp_buf. The C compiler is now configured with the AX_MPI macro, and is only used to determine the size of these structs.
* Setup OBC segments for COBALT/OBGC tracers - These are updates required to setup OBC segments for OBGC tracers. - Since COBALT package has more than 50 tracers using the MOM6 table mechanism for setting up OBC segments is not feasible. Rather, this update delegates such setup to mechanims used in ocean_BGS tracers leaving MOM6 mechanism for native tracers intact. - Fixed issues caught by MOM6 githubCI * Add capability to change obc segment update period - COBALT tracers do not need as frequent segment bc updates and can use a larger update period to speed up the model. This commit introduces a new parameter DT_OBC_SEG_UPDATE_OBGC that can be adjusted for obc segment update period. - This commit applies the change only to BGC tracers but can easily be changed to apply for all. * Insert missing US%T_to_sec - The unit conversion factor was missing causing a crash in a newer test. * Updates from Andrew Ross - Avoid low initial values in the tracer reservoirs * Per Andrew Ross review * corrected indentation per review * Avoid using module vars per review request - Reviewer asked to avoid using module variables with "save" attributes. - This commit hides the module variables inside the existing OBC type. * Coding style corrections per review * Modification per review: do_not_log if .not.associated(CS%OBC) Co-authored-by: Robert Hallberg <[email protected]>
In this PR an option is added to use ice viscosity computed from the observed surface velocity, computed by the model and use a constant value (for debugging purposes). A new (char) parameter "ICE_VISCOSITY_COMPUTE" is introduced; its values can be "MODEL" (the ice viscosity computed by the model); "OBS" the ice viscosity is computed at the preprocessing step and read from a file (its name is defined by the parameter "ICE_STIFFNESS_FILE") into a variable with a name defined by "A_GLEN_VARNAME" parameter; "CONSANT" is a constant value defined by a parameter "A_GLEN". These changes are in MOM_ice_shelf_dynamics.F90. Minor changes are done to MOM_ice_shelf_initialize.F90 to correct units, scales.
Update from main
Added calls to get_param to set 12 input variable names in files via runtime parameters, including TIDEAMP_VARNAME, TEMP_COORD_VAR, SALT_COORD_VAR, THICKNESS_IC_VAR, INTERFACE_IC_RESCALE, TEMP_IC_VAR, SALT_IC_VAR, BASIN_VAR, TIDAL_DISSIPATION_VAR, ROUGHNESS_VARNAME, TIDEAMP_VARNAME and KH_BG_2D_VARNAME. Also added two new runtime parameters, THICKNESS_IC_RESCALE and INTERFACE_IC_RESCALE, to allow input thickness and interface height fields to be rescaled. A number of spelling errors in comments or output messages in the files that were being modified as a part of this commit, including changes in the documentation that appears in MOM_parameter_doc files. All answers are bitwise identical, but there are new entries and minor changes in many MOM_parameter_doc files.
Added calls to get_param to set 4 more input variable names in files via runtime, including U_IC_VAR, V_IC_VAR, OPEN_DY_CU_VAR and OPEN_DX_CV_VAR. Also added or amended comments describing internal variables to describe their units more consistently in MOM_shared_initialization. All answers are bitwise identical, but there may be new entries in some MOM_parameter_doc files.
Corrected a bug in converting depths read from an input file from units of cm to m when the ER03 version of tidal mixing is used. This commit will change answers when INT_TIDE_DISSIPATION = True, USE_CVMix_TIDAL = True, and TIDAL_ENERGY_TYPE = "ER03". There are no such configurations in the MOM6-examples pipeline tests, and it is not clear whether or where such a configuration has ever been used. This bug was introduced into dev/gfdl on Nov. 19, 2018 as a part of PR mom-ocean#883 in commit 967e470, which was supposed to be a refactoring of this portion of the code without changing answers, but introduced this bug. This commit should restore solutions with impacted configurations to what they would have been before that earlier commit.
This patch removes the `build_{grid,data}.py` scripts from .testing's tc4, along with the setup of the Python infrastructure used in the .testing Makefile and GitHub Actions CI. The Python scripts have been replaced with equivalent Fortran programs which generate identical netCDF output. A new rule (`preproc`) has been added to the .testing top Makefile for generating the model input files. The netCDF compiler dependenices are configured with autoconf, currently duplicating the macros in `ac/configure.ac`. (NOTE: It may be possible to share these with a common macro in ac/m4. The configure script and Makefile are currently generated from `configure.ac` and `autoreconf`. In the future, we could simply pre-generate `configure` and add it to the repository. This patch was motivated by the inability to install recent netCDF-Python packages on systems with older gcc compilers, including our main production machine. We could have possibly resolved this by adding compiler configuration to pip, or perhaps reported the issue to the netCDF-python project for them to resolve. But the costs of relying on all this Python infrastructure is starting to exceed the benefits, and I would recommend we excise it from our test suite.
GitLab CI includes the internal testing suite (.testing) and included an explicit setup of the Python environment (`make work/local-env`). The rule has since been removed, and the command now fails. This patch removes those steps, since we no longer use Python in the tests. It also slightly reworks the reporting of test output. Instead of re-running `make test`, it uses the `make test.summary` rule to report the final result.
Added a new logical argument to interpolate_column to specify whether the interpolated interface values outside of massless layers should be masked to zero. Also refactored the code in interpolate_column to separate out the determination of the interface position from the interpolation and the masking to facilitate the extension of this code to use higher order interpolation in planned subsequent changes. All answers are bitwise identical, although there is a new mandatory argument for a public interface.
Added ALE_remap_interface_vals and ALE_remap_vertex_vals to handle the interpolation of variables that are at the interfaces atop tracer cells or above the corners of the tracers cells from one grid to another. Because these are not yet used (but have been tested in calls that will be added with the next commit) all answers are bitwise identical, but there are two new publicly visible routines.
Added REMAP_AUXILIARY_VARS to control whether to remap the accelerations that are used in the predictor step of the split RK2 time stepping scheme. Also added the new routines remap_dyn_split_rk2_aux_vars, remap_OBC_fields and remap_vertvisc_aux_vars to do the remapping, and code to call these routines when REMAP_AUXILIARY_VARS is true. By default, REMAP_AUXILIARY_VARS is false, and all answers are bitwise identical, but the entire MOM6-examples regression suite has been run with this set to true, and they do appear to give physically plausible answers in all cases, partially addressing the issue noted at github.com//issues/203. New entries are added to the MOM_parameter_doc files, and there are three new publicly visible routines, but by default answers do not change.
* Adds the option to set the diffusivity KHTH as horizontally varying * Can be enabled via READ_KHTH = True, filename is provided by user via KHTH_FILE * Will return error if user sets both READ_KHTH = True and KHTH > 0
* full file path is now set as INPUTDIR/KHTH_FILE, where both INPUTDIR and KHTH_FILE are runtime parameters
thickness diffusivity --> isopycnal height diffusivity
Corrected the units written to the output files for 4 diagnostics (CAu_Stokes, CAv_Stokes, area_shelf_h and sfc_mass_flux) and added missing units arguments to the get_param calls for some (mostly unlogged) parameters. The logged calls where units are added include those for EKE_MAX, NDIFF_DRHO_TOL, NDIFF_X_TOL, and IMPULSE_SOURCE_TIME, while some unnecessary carriage returns were removed in the descriptions of some of these and closely related parameters. Also added units to the comment describing the AGlen argument to initialize_ice_AGlen. All answers are bitwise identical, but there can be minor changes in the metadata of some files, and some MOM_parameter_doc and available_diags files might exhibit minor changes.
Added a missing scale factor in the DENSE_WATER_EAST_SPONGE_SALT get_param call in dense_water_initialize_sponges, and added comments describing the local variables (and their units) throughout the dense_water_initialization module. The variable set by DENSE_WATER_SILL_HEIGHT was unused and it probably was always intended to be DENSE_WATER_SILL_DEPTH, which it now is. Units arguments were also added to two of the unlogged get_param calls in this module. Without this change, this test case would not reproduce with dimensional rescaling due to a scale factor that was omitted when salinity was being rescaled on May 3, 2022, which became a part of PR #122 to dev/gfdl, but answers should not change when dimensional rescaling of salinities is not used. All answers and output in the MOM6-examples test suite are bitwise identical.
Removed meaningless units arguments from 31 get_param calls for integer, character, or logical parameters. All answers are bitwise identical, but some entries in the various parameter_doc files are changed.
now back to business (after almost 48 hours of travel). I ran one of UFS coupled case on wcoss2 (production machine) but found out that results are different from current baseline. Differences start from "v Predictor accel [uv]_accel_bt", see line 1705 in the "err" files. I put run log, restart and output files on GAEA at /lustre/f2/dev/ncep/Jiande.Wang/MOM6-update/20230406/M6-20230406. Files with "BM" are from current baseline runs, while "PR" are from this PR runs. Since this PR code works fine in at least two of our R&D machines, I suspect something related to precision on wcoss2. |
@jiandewang I looked at this with @Hallberg-NOAA and we believe that a likely source is the mixed layer restratification. The diffs in the barotropic solvers are in the halos of the data domain, rather than the compute domain, and are probably due to fixes in the checksum calculations, which had the incorrect halo widths. The first compute-domain divergence is in Old: MOM6/src/parameterizations/lateral/MOM_mixed_layer_restrat.F90 Lines 390 to 394 in 1e54bed
New: MOM6/src/parameterizations/lateral/MOM_mixed_layer_restrat.F90 Lines 388 to 389 in fc823f5
MOM6/src/parameterizations/lateral/MOM_mixed_layer_restrat.F90 Lines 824 to 826 in fc823f5
It's not clear why this could cause a regression, since it's only a refactor, but it's possible that an aggressive compiler could have reordered the multiplication, or moving into a separate function suppressed some undesired optimization. There is also enough going on in this file that some kind of regression could have happened. If I get you a modified version of this file without the code change, would you be able to test it? (Unfortunately it's not as simple as reverting the file, since there was an API change.) If you have the compiler version and flags used on WCOSS-2, that might also help troubleshoot the problem on our end. |
@marshallward and @Hallberg-NOAA thanks for the quick response, yes I will be able to test if you have a modified file for me. I suspected this commit is the cause but had code conflict when doing revert. Note we have all different setting of MOM in UFS, all of them had this issue except our super coarse resolution (5x5 ocean), in that setting, mixed layer is turned off. I will provide compiling flag shortly |
@marshallward I put compiling output file at /lustre/f2/dev/ncep/Jiande.Wang/MOM6-update/20230406/M6-20230406, see "compiling-output" line 798 for an example of flags being used |
Compiler was Intel Fortran 19.1.3, flags are |
We are also seeing a change in answers for the CESM tests. |
@gustavo-marques Do you know if the diff is in MOM_mixed_layer_restrat? Also it seems like WCOSS-2 is AMD, is your machine AMD or Intel? |
@marshallward I am 99% sure the modified code you provided to me solved the problem. Now I got identical results on wcoss2, I am repeating testing on R&D machines but there is a long waiting job queue at this moment, will update once jobs are done. |
@jiandewang Thanks, that is good news although I think we need to understand what might have caused the problem. I wonder if it is related to the AMD chipset. @gustavo-marques I can provide the same modified mixed layer restratification file if you are ready to test it. |
@marshallward, I have not looked close enough to find out if the difference is coming from MOM_mixed_layer_restrat. |
I tested this on C5 (our AMD cluster) but did not detect any regressions. So it's not a simple matter of Intel vs AMD architecture. We are using a different compiler from WCOSS (Intel 2022 vs 2019) but unfortunately 2022.02 is all we have available on C5. We could get into specifics (C5 is EPYC 7H12) but I doubt that is the reason. It was suggested Monday that there are differences in some layer-remapped diagnostics due to time-space average swapping, but it seems the consensus was that these answer changes are at the lowest bits and are acceptable. (Anyone please correct me if that is wrong.) Unfortunately I don't think we have sufficient access to explain this regression, but for now we will revert |
Due to some machines reporting a regression in the mixed layer restratification code, this patch reverts the calculation of the growth time in a separate function. Most of the content related to comments and parameter setup have been retained, even if those parameters are no longer used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the Reversion of MOM_mixed_layer_restrat growth_time, it passed UFS regression test on wcoss2 and two other R&D machines
The reversion of |
@gustavo-marques @sanAkel Any update on this PR? |
@marshallward I plan to test this today/tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answers remain same, tests passed!
@marshallward thanks for your patience.
Sorry for the delay. Our tests are passing (no changes in the ocean.stats). |
Thanks to all for testing and working through the revisions. |
This pull request contains several new features and bugfixes, alongside a very large number of commits related to standardization of dimensional analysis. Despite the size, these should have no impact on any experiments beyond adjustments to configuration parameter files.
Note that this patch introduces the new netCDF I/O layer to handle simple operations which do not require - or simply do not work with - the FMS I/O layer. Reviewers should carefully verify any experiments which depend on datasets, particularly anything which may be outside of our standard test suite.
Features
Bugfixes
Refactor
Build/Test
Admin
Contributors: