Inconsistent behavior with Max Snapshots Per File with Restarts #6795
Comments
I don't believe we've seen this odd behavior when Max Snapshots Per File equals a day's worth of data (so 8 snaps for a 3-hourly stream). I do wonder if this is specific to |
we discussed making this default 👀 👀 Restart:
force_new_file: true |
@mahf708 I see this when I have
which overrides this behavior (consistent with my experience too). |
Yep! I was gonna say the same exact thing!
Also, sorry I just saw this in your op :/ |
Even without forcing a new file, the output infrastructure should see that the last output file is full, so it should NOT try to add a new slice to that one. Definitely a bug. |
Guess something's gone wrong here?
  // Check if the prev run wrote any output file (it may have not, if the restart was written
  // before the 1st output step). If there is a file, check if there's still room in it.
  const auto& last_output_filename = get_attribute<std::string>(rhist_file,"GLOBAL","last_output_filename");
  m_resume_output_file = last_output_filename!="" and not restart_pl.get("force_new_file",false);
  if (m_resume_output_file) {
    m_output_file_specs.storage.num_snapshots_in_file = scorpio::get_attribute<int>(rhist_file,"GLOBAL","last_output_file_num_snaps");
    if (m_output_file_specs.storage.snapshot_fits(m_output_control.next_write_ts)) {
      // The setup_file call will not register any new variable (the file is in Append mode,
      // so all dims/vars must already be in the file). However, it will register decompositions,
      // since those are a property of the run, not of the file.
      m_output_file_specs.filename = last_output_filename;
      m_output_file_specs.is_open = true;
      setup_file(m_output_file_specs,m_output_control);
    } else {
      m_output_file_specs.close();
    }
  } |
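For readers following along, the decision the snippet above is meant to make can be sketched in isolation. This is only a simplified illustration, not the EAMxx implementation: `FileStorage`, `has_room`, and `should_resume_file` are hypothetical names, and the real `snapshot_fits` takes a timestamp and may apply other storage rules that are ignored here.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the output file storage info (not the EAMxx type).
struct FileStorage {
  int max_snapshots_in_file;  // value of "Max Snapshots Per File"
  int num_snapshots_in_file;  // snapshots already in the last output file,
                              // as read back from the rhist metadata

  // True if at least one more snapshot can be appended to the file.
  bool has_room() const {
    return num_snapshots_in_file < max_snapshots_in_file;
  }
};

// Decide whether a restarted run should reopen the previous output file.
// Assumption: an empty filename means the previous run wrote no output file.
bool should_resume_file(const std::string& last_output_filename,
                        bool force_new_file,
                        const FileStorage& storage) {
  if (last_output_filename.empty() || force_new_file) {
    return false;  // nothing to resume, or the user asked for a new file
  }
  return storage.has_room();  // a full file must not be reopened
}
```

Under these assumptions, a stream with Max Snapshots Per File: 1 whose last file already holds one snapshot should start a new file on restart. If the snapshot count stored in the rhist metadata is wrong (for example too low), a check like this would wrongly report room in a full file.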
I guess. But nothing stands out... I would have to reproduce manually, then inundate the src code with print statements and see... I'll get to it hopefully this week. Unless someone else feels like taking this on |
quick look at Peter B's case shows that in the metadata |
So probably would just need print statements here. I have a bunch of meetings tomorrow, but if I get a good break I will take a look. |
This commit fixes an issue during restarts that occurs with averaged-type output. The restart history file (rhist) metadata was set up incorrectly, which could lead EAMxx to reopen files that already held the maximum number of snapshots and continue filling them at the restart step. Fixes #6795
In EAMxx, the YAML directive “Max Snapshots Per File” seems to trip up on restarts and does not behave as expected.
For example, in a DPxx run that is 6 hours in duration, I have an output stream with
Max Snapshots Per File: 1
with hourly averaged output. I have observed the following:
I have observed similar behavior in multiple DPxx simulations on both pm-cpu and pm-gpu. In a large production run with daily restarts, I have an hourly output stream with
Max Snapshots Per File: 24
In this case it is putting all my data into ONE file.
I have performed one global ne30 test and noticed similar unexpected treatment of Max Snapshots Per File with restarts. Thus, this does not appear to be a DPxx-specific problem but a general one.
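As a rough sanity check on what to expect from these settings, the number of output files can be estimated from the snapshot count. This is a minimal sketch, assuming every file is filled to the limit before a new one is opened and that restarts do not change the count; `expected_num_files` is a hypothetical helper, not part of EAMxx.

```cpp
#include <cassert>

// Expected number of output files, i.e. ceil(num_snapshots / max_per_file).
// Precondition: num_snapshots >= 0 and max_snapshots_per_file > 0.
int expected_num_files(int num_snapshots, int max_snapshots_per_file) {
  assert(num_snapshots >= 0 && max_snapshots_per_file > 0);
  // Integer ceiling division.
  return (num_snapshots + max_snapshots_per_file - 1) / max_snapshots_per_file;
}
```

With these assumptions, six hourly snapshots with a limit of 1 give `expected_num_files(6, 1) == 6`, and two days of hourly snapshots with a limit of 24 give `expected_num_files(48, 24) == 2`, so a single file holding everything is not the expected outcome.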
For a quick reproducer of EXP01, run the DYCOMS-RF01 case:
https://github.com/E3SM-Project/scmlib/blob/master/DPxx_SCREAM_SCRIPTS/run_dpxx_scream_DYCOMSrf01.csh
Set it to run for 3 hours (search for
stop_n
) with one restart, and point it to the following YAML: