Archive additional metadata and other information #121

Merged

andrew-c-ross
Contributor

@charliestock and I have been talking with the data portal team about metadata to include in the latest files that will be served on the CEFI data portal. We largely settled on adding an attribute to the published netcdf files pointing to the path of the data on /archive, and then storing additional metadata and other information in /archive. This PR adds some of that extra metadata to a new metadata.out/ directory that gets included in the ascii tar file. The main additions here are archiving the model XML file itself and the commit hash for each git submodule in the source code.

The new git_submodule_status file that gets archived will look something like this; it is the result of running git submodule status --recursive:

 f703b82972701a4e32a46d4d44a15f9fc2debb27 src/FMS (2024.03)
 8f96707a90132ca119d81ed84e5a62ca0ff3ed96 src/Icepack (Icepack1.1.0-139-g8f96707)
 3fb64c40e21d453fad18ad19de69c6c30a846a64 src/MOM6 (remotes/origin/dev_cefi_backup-62-g3fb64c40e)
 9423197f894112edfcb1502245f7d7b873d551f9 src/MOM6/pkg/CVMix-src (9423197)
 29e64d652786e1d076a05128c920f394202bfe10 src/MOM6/pkg/GSW-Fortran (29e64d6)
 022bd89d1bfe03dc259afc24a6e19c334e47eb4b src/SIS2 (fix_esm_dust_flux_2016.05.26-920-g022bd89)
 ff2cce7820bbe4d23b7cc82d15ed57f80fc7a57d src/atmos_null (xanadu_esm4_20190304-10-gff2cce7)
 e84cf65f1622c31f3143a793c11378f050398907 src/coupler (2024.03)
 1d19a57e0be2ba1b668e2f0dd7a7290733b21cfe src/ice_param (2020.01-alpha1-19-g1d19a57)
 fd6478fecd8891aebcba904fe4b34a415fe30e73 src/icebergs (remotes/origin/dev/master-134-gfd6478f)
 58cee106df801910d3372321cf8bb105e362aa48 src/land_null (2020.01-alpha1-4-g58cee10)
 840b65c40675e2d06bf40405ad3f12dec7f35923 src/libyaml (0.2.5-8-g840b65c)
 43655806c942ebb8a0750d5c70bb5259e658ae1a src/mkmf (2023.01-8-g4365580)
 0de1ab230a12999eb139763f40e56920a204d639 src/ocean_BGC (2022.03-78-g0de1ab2)
 385222c469942f0562b4c70b926dfdd8173138e7 src/ocean_BGC/mocsy (v2.3.3-6-g385222c)
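
For anyone who later wants to compare the archived commits between two runs, a file in this format can be read back with a few lines of Python. This is only a minimal sketch and not part of the PR: the names `SubmoduleStatus` and `parse_submodule_status` are hypothetical, and it assumes submodule paths contain no spaces.

```python
import re
from typing import List, NamedTuple, Optional


class SubmoduleStatus(NamedTuple):
    state: str               # ' ' or '' in sync, '-' uninitialized, '+' differs from index, 'U' conflicts
    commit: str              # full commit hash
    path: str                # submodule path relative to the superproject
    describe: Optional[str]  # text inside the trailing parentheses, if present


# One line of `git submodule status --recursive` output:
# optional state prefix, commit hash, path, optional "(describe)" suffix.
# Assumption: paths do not contain whitespace.
_LINE = re.compile(r"^([ +\-U]?)([0-9a-f]{40,64})\s+(\S+)(?:\s+\((.*)\))?\s*$")


def parse_submodule_status(text: str) -> List[SubmoduleStatus]:
    """Parse the contents of an archived git_submodule_status file."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized submodule status line: {line!r}")
        entries.append(SubmoduleStatus(*match.groups()))
    return entries
```

Calling `parse_submodule_status` on the text of two archived files and comparing the `commit` field per `path` would show which components changed between runs.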

This PR could also be a good place to discuss if there are other files that should be added to the archive.

@andrew-c-ross
Contributor Author

Also, let me know if this should go here or in the regional-mom6-xml repo

@yichengt900
Contributor

Also, let me know if this should go here or in the regional-mom6-xml repo

@andrew-c-ross, thank you for adding the extra metadata to the archive. I will review it more closely later, but at first glance, I didn’t notice any major issues. I believe it would be beneficial to apply these changes to our internal repository as well, as it would allow other developing regions to benefit from the extra metadata in the archives.

@chiaweh2

chiaweh2 commented Dec 5, 2024

Thanks @andrew-c-ross @yichengt900! I will add a global attribute, cefi_archive_version, that provides the absolute path in /archive to every netCDF file.

@yichengt900 added the enhancement label Dec 6, 2024
Contributor

@yichengt900 yichengt900 left a comment

@andrew-c-ross, the section on metadata and additional information looks good to me. I think it covers almost everything. I only found a few small typos. I will leave this PR open a little longer so others have time to review and provide any feedback.

@andrew-c-ross
Contributor Author

Thanks, I should spellcheck before I copy and paste 🤦

@theresa-morrison
Contributor

I like this idea. Saving the commit hash for each module seems especially helpful for reproducing old simulations, since defaults can change over time. I think moving the parameter doc files into the metadata folder also makes sense.

Should we use this as an opportunity to rename extra.results? I would propose either stats or model.stats based on what goes into those folders. There should be no velocity truncations, so that file would not be included in stats most of the time.

Is the field_table worth saving in the metadata, or is it generic enough that we don't want to include it?

@andrew-c-ross
Contributor Author

Should we use this as an opportunity to rename extra.results? I would propose either stats or model.stats based on what goes into those folders. There should be no velocity truncations, so that file would not be included in stats most of the time.

Looking into this, I wonder if we even need the extra.results or if it is a remnant of a very old version of FRE. This is the regex FRE uses to pick up ascii files:

set -r patternGrepAscii = '\<out\>|\<results\>|\<log\>|\<timestats\>|\<stats\>|\<velocity_truncations\>'

It seems like this could catch the stats and velocity truncation files.

Is the field_table worth saving in the metadata, or is it generic enough that we don't want to include it?

My thinking was we are archiving the XML now, and currently we always include the full field table in the XML rather than linking a file. But the actual field table file could easily be added if we wanted it.

@theresa-morrison
Contributor

If the field table is in the xml that should be good enough.

I would suggest we test if extra.results is needed with new versions of FRE, and remove it if it is not.

@yichengt900
Contributor

Good catch, @andrew-c-ross. I can confirm that the new FRE will now archive stats files as well as velocity_truncation files (if you name them like U.velocity_truncations).

@andrew-c-ross
Contributor Author

(if you name them like U.velocity_truncations)

I think they are currently named like U_velocity_truncations. Does that not get caught? It looks like the name can be changed with the U_TRUNC_FILE and V_TRUNC_FILE MOM_input options?

@yichengt900
Contributor

@andrew-c-ross, yes, we can always change file names in the MOM_input options. The issue is that the new FRE treats "velocity_truncation" as a file extension, so unfortunately, it cannot recognize U_velocity_truncations or V_velocity_truncations.
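
The difference between the two naming styles can be illustrated with the pattern quoted earlier in this thread. The sketch below is an illustration only: it assumes the pattern is applied to the file name with grep-style word boundaries (rendered here as Python's `\b`), which the thread does not confirm, and `is_picked_up` and the file name `ocean.stats` are hypothetical. Because an underscore counts as a word character, there is no word boundary between `U_` and `velocity_truncations`, whereas a dot does create one.

```python
import re

# The FRE ASCII pattern quoted in this thread, with grep's \< and \>
# word anchors written as Python's \b (equivalent for these literal words).
PATTERN = re.compile(
    r"\bout\b|\bresults\b|\blog\b|\btimestats\b|\bstats\b|\bvelocity_truncations\b"
)


def is_picked_up(filename: str) -> bool:
    """Return True if the pattern matches anywhere in the file name."""
    return PATTERN.search(filename) is not None


if __name__ == "__main__":
    # "ocean.stats" is a hypothetical example name; the other two come from the thread.
    for name in ["ocean.stats", "U.velocity_truncations", "U_velocity_truncations"]:
        print(f"{name}: {is_picked_up(name)}")
```

Under these assumptions, `U.velocity_truncations` matches and `U_velocity_truncations` does not, which is consistent with the behavior described above.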

Contributor

@yichengt900 yichengt900 left a comment

@andrew-c-ross, I have tested the new changes and can confirm that they successfully archive the metadata and additional information as proposed. Approved. I'll wait a bit to see if there is any further feedback. Thank you again for this contribution!

@yichengt900
Contributor

I will merge this PR for now. If any metadata-related issues arise or concerns are raised later, we can always revisit them. We'll also add these changes to our private XML repository later.

@yichengt900 yichengt900 merged commit 858bfb6 into NOAA-GFDL:main Dec 11, 2024
13 of 15 checks passed
4 participants