Add fetch job and update stage_ic to work with fetched ICs #3141

Open
wants to merge 45 commits into
base: develop


@DavidGrumm-NOAA DavidGrumm-NOAA commented Dec 5, 2024

Description

Most jobs require the initial conditions to be available on local disk. The existing “stage_ic” task copies/stages these initial conditions into the experiment's COM directory. This PR adds a “fetch” task that extends that functionality by copying initial conditions from HPSS (on HPSS-accessible machines) into COM.
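
As a rough illustration of the HPSS-to-COM copy described above, here is a minimal sketch. It assumes the `htar` command-line tool is available on the machine; the function names and any tarball or COM paths passed to them are hypothetical and are not taken from this PR's fetch.py.

```python
import subprocess
from pathlib import Path


def build_htar_extract_cmd(tarball, members=()):
    """Return an htar command that extracts `members` (or everything) from `tarball`."""
    return ["htar", "-x", "-v", "-f", str(tarball), *members]


def fetch_to_com(tarball, com_dir, members=(), dry_run=True):
    """Extract an HPSS tarball into `com_dir` (hypothetical helper).

    With dry_run=True (the default) nothing is executed and only the
    command that would be run is returned.
    """
    cmd = build_htar_extract_cmd(tarball, members)
    if not dry_run:
        # htar extracts relative to the current working directory,
        # so run it from inside the destination directory.
        Path(com_dir).mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, cwd=com_dir, check=True)
    return cmd
```

Keeping the command construction separate from its execution makes the logic easy to check on machines without HPSS access.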

Resolves #2988

Type of change

  • New feature (adds functionality)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? YES
  • Does this change require an update to any of the following submodules? NO

How has this been tested?

  • Testing on Hera is in progress

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • [] I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@DavidGrumm-NOAA
Author

I am in the process of testing.

@DavidGrumm-NOAA
Author

DavidGrumm-NOAA commented Dec 7, 2024

To test my code, I ran create_experiment with the short YAML case C48_ATM.yaml (which created /scratch1/NCEPDEV/global/David.Grumm/G_WF_2988/testroot_1/EXPDIR and COMROOT) using:

HPC_ACCOUNT="fv3-cpu" MY_TESTROOT="/scratch1/NCEPDEV/global/David.Grumm/G_WF_2988/testroot_1" RUNTESTS=${MY_TESTROOT} pslot="1306a_2988" ./create_experiment.py --yaml ../ci/cases/pr/C48_ATM.yaml

… which completed without error or warning messages.

From within that EXPDIR I ran rocotorun:
rocotorun -w ./1306a_2988/1306a_2988.xml -d ./1306a_2988/1306a_2988.db

… which completed without error or warning messages. There was also no output to stdout, which I did not expect, as I had placed a few diagnostic prints in my code. I verified that I am on my current branch.

Running rocotostat gives me:

CYCLE          TASK                 JOBID                       STATE       EXIT STATUS  TRIES  DURATION
=========================================================================================================
202103231200   gfs_stage_ic         druby://10.184.8.62:37937   SUBMITTING  -            0      0.0
202103231200   gfs_fcst_seg0        -                           -           -            -      -
202103231200   gfs_atmos_prod_f000  -                           -           -            -      -
etc.

… and this appears unchanged (at least for the 4 hours since I ran rocotorun).

I have 2 questions:

  • Am I not running my code, since the diagnostic prints do not appear? (I was able to run similarly modified versions of exglobal_fetch.py and fetch.py earlier when I ran fetch.sh, but I was running that script from the same directory. It may simply be a matter of manually resetting my PATH.)

  • Rocotostat indicates that a job has been submitted (presumably with another version of the code).
    Shouldn’t I expect it to progress?

@DavidHuber-NOAA
Contributor

Rocoto is not a fully automated system. For each invocation of rocotorun, it checks the status of running jobs and updates its database accordingly, checks to see if any new jobs have met their prerequisites, and submits jobs to the queue that are ready to run. rocotostat simply reads the database and reports its contents. Thus, rocotorun must be run every few minutes to continuously submit jobs. Conveniently, in your EXPDIR, there is a file with extension .crontab. Copy the contents of this file to your crontab (which you can edit via the crontab -e command). Now your experiment should run continuously. For more details, read the docs.
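
To make the "run rocotorun every few minutes" idea concrete, here is a small sketch that builds a crontab entry of that kind. The `.crontab` file generated in EXPDIR remains the authoritative source; the helper name, the 5-minute interval, and the `/path/to/EXPDIR` paths are assumptions for illustration, and the entry assumes `rocotorun` is on the PATH seen by cron.

```python
def crontab_line(xml_path, db_path, every_minutes=5):
    """Build a crontab entry that invokes rocotorun every `every_minutes` minutes.

    Hypothetical helper: use absolute paths, since cron does not start
    in the experiment directory.
    """
    if not 1 <= every_minutes <= 59:
        raise ValueError("every_minutes must be between 1 and 59")
    # "*/N" in the minute field means "every N minutes".
    schedule = f"*/{every_minutes} * * * *"
    return f"{schedule} rocotorun -w {xml_path} -d {db_path}"


print(crontab_line("/path/to/EXPDIR/1306a_2988/1306a_2988.xml",
                   "/path/to/EXPDIR/1306a_2988/1306a_2988.db"))
```

Each firing of the entry performs one status check and submission pass, and rocotostat then reports the updated database.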

Contributor

@DavidHuber-NOAA DavidHuber-NOAA left a comment

Just some whitespace cleanup (most of it mine).

ush/python/pygfs/task/fetch.py Outdated
@DavidGrumm-NOAA
Author

I updated the crontab.

@DavidGrumm-NOAA
Author

DavidGrumm-NOAA commented Dec 10, 2024

Removed extraneous white space from fetch.py and recommitted; still testing.

@DavidGrumm-NOAA
Author

DavidGrumm-NOAA commented Dec 16, 2024

I moved the fetch options to be in the run_options dict.

@DavidGrumm-NOAA
Author

DavidGrumm-NOAA commented Jan 23, 2025 via email

ci/cases/yamls/gfs_defaults_ci.yaml Outdated
parm/config/gfs/config.base Outdated
parm/config/gfs/config.fetch Outdated
workflow/rocoto/gfs_tasks.py Outdated
parm/fetch/gfs_S2SW_cold_forecast-only.yaml.j2 Outdated
…dd ci/cases/yamls/gfs_defaults_ci.yaml parm/config/gfs/config.base
Contributor

These host files have extra whitespace in them. Could you please remove that? There should be no difference between these and develop.

Author

Are you referring to (for that file) a newline at the end? I added that because I've read that UNIX tools expect a newline at the end of a file. I can remove it from that file and the others so that they agree with the develop versions.

Contributor

Yes, the new line character is expected, but most editors put this in automatically. If, after removing these blank lines, we see artifacts indicating that there isn't a new line character at the end of the file, we can diagnose the issue in emacs.

Author

Ok, thanks for the explanation. I will remove them.

Author

Removed extraneous newlines.

Contributor

Try adding this to your .emacs file:

(setq require-final-newline 1)

Then open, edit (add, then delete a character), and save the file. Lastly, run xxd on one of the files and verify that the last hex byte is 0A. This indicates that a new line was written at the end of the file.
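
The same check can be scripted across many files. Below is a minimal sketch that tests whether a file's last byte is 0A and whether it ends in extra blank lines; the function names are hypothetical, and files are read whole, which is fine for small config files.

```python
def ends_with_newline(path):
    """Return True if the last byte of the file is 0x0A (a newline)."""
    with open(path, "rb") as f:
        data = f.read()
    return data.endswith(b"\n")


def has_extra_blank_lines(path):
    """Return True if the file ends with one or more blank lines
    beyond the single expected trailing newline."""
    with open(path, "rb") as f:
        data = f.read()
    return data.endswith(b"\n\n")
```

A file in the desired state returns True from `ends_with_newline` and False from `has_extra_blank_lines`.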

@DavidHuber-NOAA DavidHuber-NOAA added the CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules label Jan 24, 2025
@emcbot emcbot added CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress and removed CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules labels Jan 24, 2025
…_2988' of github.com:DavidGrumm-NOAA/Global_Workflow_2988 into stage_ic_2988
@DavidGrumm-NOAA
Author

DavidGrumm-NOAA commented Jan 24, 2025 via email

@DavidGrumm-NOAA
Author

DavidGrumm-NOAA commented Jan 24, 2025 via email

@emcbot emcbot added CI-Hercules-Passed **Bot use only** CI testing on Hercules for this PR has completed successfully and removed CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress labels Jan 25, 2025
@emcbot

emcbot commented Jan 25, 2025

CI Passed on Hercules in Build# 1
Built and ran in directory /work2/noaa/global/CI/HERCULES/3141


Experiment C48_ATM_939ce24d Completed 1 Cycles: *SUCCESS* at Fri Jan 24 15:54:49 CST 2025
Experiment C48mx500_3DVarAOWCDA_939ce24d Completed 2 Cycles: *SUCCESS* at Fri Jan 24 16:12:59 CST 2025
Experiment C48mx500_hybAOWCDA_939ce24d Completed 2 Cycles: *SUCCESS* at Fri Jan 24 16:13:09 CST 2025
Experiment C96_S2SWA_gefs_replay_ics_939ce24d Completed 1 Cycles: *SUCCESS* at Fri Jan 24 16:37:21 CST 2025
Experiment C96C48_hybatmDA_939ce24d Completed 3 Cycles: *SUCCESS* at Fri Jan 24 17:01:39 CST 2025
Experiment C96_atm3DVar_939ce24d Completed 3 Cycles: *SUCCESS* at Fri Jan 24 17:13:42 CST 2025
Experiment C48_S2SW_939ce24d Completed 1 Cycles: *SUCCESS* at Fri Jan 24 17:19:35 CST 2025
Experiment C48_S2SWA_gefs_939ce24d Completed 1 Cycles: *SUCCESS* at Fri Jan 24 18:50:57 CST 2025

Labels
CI-Hercules-Passed **Bot use only** CI testing on Hercules for this PR has completed successfully feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stage initial conditions stored on HPSS
4 participants