Meeting Notes
The joint CIME development team has bi-weekly telecons to coordinate development. Most recent notes are at the top of this page.
Testing dashboard: http://my.cdash.org/index.php?project=CIME
Discussed possible future developments.
See https://github.com/ESMCI/cime/issues/3886
See https://github.com/ESMCI/cime/wiki/CIME-Development-Plans
Collaborate on moving to paramgen in buildnml. E3SM could move data models to paramgen. Could maybe move ELM if a collaborator can be found.
Making shared libs actually shared between cases (E3SM). Shared libs built with cmake.
XML handling: different formats for different files lead to complexity. Many thousands of lines to handle.
Replacing test scheduler with CWL.
Can type create_test suitename, and CWL with a CWL engine could ship the tests off to other machines.
We talked about how to use black in development. Decision to encourage use of .git/hooks. Running pre-commit install will set the hook up for you.
Testing PR, testing with containers using github actions. There is a weekly limit for free testing but JimE hasn't hit it yet. Testing is currently nobatch but slurm and pbs are available. Would just need one slurm and one pbs case for coverage.
Developers guide: we will use the github wiki, remove developer stuff from github.io, and point to files in the repo as appropriate.
Now possible to get rid of config_compilers.xml entirely.
Jason's upcoming CIME reorg would break existing cases if the code version those cases pointed to updated cime. We are ok with that. We never guarantee you can keep running existing cases after updating CIME. (It's just worked out that way.)
We want to move E3SM provenance code out of cime proper and into cime_config. There is some stuff that could be generalized but a lot of it is E3SM-specific, so it is easier to just move it all right now.
How to more routinely close issues?
Documentation has tons of old XML files re: stub models, data models. Remove them
Rebuilding docs? Still have to do it manually. JasonB is working on a github action to do it.
github cime has tons of personal branches on it. We should use our forks for development.
Should we rename "master" to "main"? After the pythonic PR.
When do we do the directory rename to make CIME more pythonic? JasonB should have a PR ready in a week or 2.
At last meeting, JasonB went over CCS reorg and what it means for versioning.
What does conda, etc mean for containers?
We are thinking of conda install for users, not developers.
The main problem: CIME CCS currently only requires a basic python install. If we start requiring more, then conda or containers become a way to manage those dependencies.
Example: some of the non-BFB tests E3SM added require additional python libraries. User has to load "cime_env" conda environment before running create_test. But if you're not running those tests, you don't need to.
Problem: if the machine file is also messing with python environment through "module load" etc., this will typically conflict with conda in weird ways.
Problem: if we add python dependencies deep in CCS, then all users, developers will have to use the conda or container for development/use.
Conclusion: Bill thinks the new python dependencies can be isolated.
Now that CIME pieces have been split up, need to fix licenses.
We (chris) will clean up the licenses as follows:
https://github.com/ESMCI/cime - Just NCAR, SNL
https://github.com/ESCOMP/CESM_CPL7andDataComps Just NCAR, ANL
https://github.com/ESCOMP/CESM_share Just NCAR
Rob will clean up as follows:
https://github.com/E3SM-Project/E3SM
driver-mct - Add NCAR, ANL license
Add NCAR license to
components/data_comps
components/xcpl_comps
components/stub_comps
Also make documentation changes: change the master documentation to remove data models and coupler. Change the "What is CIME" description and README.
First meeting in a while.
mappy testing of cime is better. Wesley working on ERIO test. Will be leaving E3SM soon.
Would still like to increase testing of PRs on multiple machines. Current PR testing is pylint, doctest and compiles of Fortran but no submissions. Using github actions.
JasonB's reorg will help with testing.
Sometimes it's a really obscure config that breaks. Hard to protect against that.
We discussed the E3SM cmake macro PR to replace config_compilers.xml https://github.com/E3SM-Project/E3SM/pull/4537
PR 3990 will introduce the ability to bring in important code via manage_externals in new locations but won't delete the old locations. Then another PR will remove the old code, followed by a CESM alpha tag to use the rearranged CIME.
E3SM has already moved code to within E3SM. Only src/externals/mct is still used for now.
Can .gitmodules be used in parallel with manage_externals?
Maybe but in E3SM a git clone --recursive would go in to cime and clone all the extra code that E3SM doesn't need.
The fundamental XML variables currently in cime/src/driver/mct/cime_config will be moved to cime/config
CESM should be ready to split off the Fortran parts end of this month. Chris Fischer has done a lot of work. He found some issues in moving driver and share directories. We'll meet next week to go over what he has done.
Discussion of https://github.com/ESMCI/cime/issues/3886
Going to all cmake build? Would be a problem for CESM. JimF doesn't like config_compilers and its interface. Cmake cache format is better. Make cmake cache a consistent thing in repo? Maybe can translate from cmake cache back to Make include? In CESM, only CDEPS and PIO use cmake. CIME currently supports a cmake or make path through the build. Strengthen that?
IDEs, python and CIME: Need someone who uses IDEs to help with this.
New CIME features: less building. Already there. Can tell CIME a suite needs only one executable. create_newcase also supports pointing to a pre-built executable.
Scan a suite and figure out how many things to build?
Old issues. Should we age them out?
Frivolous tickets.
We should each check the issues we authored to see if they should stay open.
Introduce Jason B as a developer
No progress yet on either side on splitting up Fortran parts of cime.
We need a better way to discuss/plan big changes to CIME, such as pythonification. Could use github "Projects" to track work once we decide what to do.
JimF will work with JasonB to hand off some existing issues.
CESM has done some thinking on splitting up CIME. Motivated by https://github.com/ESMCI/cime/issues/3780
Can't build cime in standalone with CIME_MODEL=e3sm because root CMakefile is in E3SM/components
Revisiting what we had discussed in Sept, the new ESMCI/cime would use manage_externals to create a buildable/testable cime. cime/scripts would stay, and so would cime/tools, cime/src/CMake and cime/src/build_scripts. Could move the latter to cime/scripts.
What is cime/src/CMake ?
Testing: Need a deep-dive on CIME testing situation. Can we autotest in CESM and E3SM mode? Couldn't run a Fortran job in current automatic testing that uses Github Actions.
Discussion of how to break CIME in to separate repos.
Most of what we collaborate on is under scripts. Also cprnc.
What to do about things like cprnc that we want to collaborate on but don't live under scripts? Do we have a separate repo for those? After some discussion, we think it would be best to move all of the shared stuff under scripts (even though this isn't an ideal location for it).
Steve's suggestion of overall organization... there is general approval of this:
- We have a cime_ccs repository with the shared stuff
- E3SM inlines the other cime stuff and submodules in cime_ccs
- ESMCI/cime only exists for the sake of standalone testing: it will retain frozen versions of driver and data models
What do we need to do with cime/config? Move it to individual models. Top-level cime will contain a config that's used for standalone testing.
We want to keep scripts in the directory cime/scripts: even though it may be possible to move it up a level, that would be disruptive to users.
Rob suggests that the new repo just be called ESMCI/CaseControlSystem.
Talked about how to allow volatile env_run vars.
How to split up the shared physics code? Separate out driver, data models, flux. E3SM should subtree out to simplify its workflow.
What can we share in cime/src/share? Need to be aware of pieces that affect science. shr_const, shr_tfreeze.
Questions about rules for "CAM" string in config_compsets.xml. New version of config_components.xml sets rules. Not sure what controls it in older versions.
Making XML and Python Dictionary test suite styles more explicit. Yes we should do that.
Splitting up CIME Fortran pieces into more separate things?
If, in future, cime/src is only used by E3SM, why would the PR traffic for that have to go to CIME? If we get to that point, then yes should move all the MCT pieces under cime/src to E3SM.
(We met other times since Jan 21 but did not take notes).
Agenda: Config compilers changes (JimF)
Lot of concern from CESM scientists about changes to things like flux calculations which are going in without careful checking on CESM science.
Flux calculation is very model-science-specific. We will need to separate it out to one or more files. Per model.
Data models: Can they really be shared? Also model-science specific. It's a selling point of CIME. Do we want to give it up? Data model science differences could maybe be handled by more namelist options, components.
CIME development. Still trying to get people to follow procedure for making mods to CIME. Making cime a submodule in E3SM would help force e3sm developers to do it.
modifying env_run: Want to be able to queue-up changes in env_run for the next run. How to do that? Will a real database help? Could introduce a "Wizard" mode where the particular protection in question is turned off. A cime process is active while case.run is running. Needs to be consistent with what is in the xml files.
Could possibly re-read the XML between runs of a long resubmit job
Segregate variables into ones that are safe to change? Maybe but resubmit isn't one that would qualify.
Queueing up changes: problem is lots of other variables that are defined off of xml vars. Only way to really force it is to exit the python (what is currently required for the workflow).
Bill had question about build-libs. See https://github.com/ESMCI/cime/issues/3286
Want more autotesting. Can we use the Sandia Auto-Test tool? Get some help from CISL on setting up continuous integration on Cheyenne. Will at least need a Jenkins instance running there or able to connect without OTP.
JimF still thinking about how to mimic a specific machine so we don't have to actually test on it. VM? Container? What happens as CIME user base grows?
"dryrun" capability. Would do things like print submit command from machine X even though you are developing on machine Y. Could compare output with what your PR would print and flag differences. Would need to establish a baseline of these good dryrun outputs on all supported machines.
Does "supported" mean all CIME developers have access to it? May not be possible with DOD, NOAA users. They would have to supply dryrun output. Or describe machine and see if we can get on one close-enough.
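The dryrun comparison discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name compare_dryrun and the sample submit commands are hypothetical, not existing CIME code or real machine output. It relies only on Python's standard difflib module.

```python
import difflib


def compare_dryrun(baseline, candidate):
    """Return unified-diff lines between a stored dryrun baseline and new output."""
    return list(
        difflib.unified_diff(
            baseline.splitlines(),
            candidate.splitlines(),
            fromfile="baseline",
            tofile="candidate",
            lineterm="",
        )
    )


# Hypothetical dryrun output: the submit command a machine would have used.
baseline = "sbatch --time 01:00:00 -p regular .case.run"
candidate = "sbatch --time 01:00:00 -p debug .case.run"

diff = compare_dryrun(baseline, candidate)
if diff:
    # A PR check could flag this as a change in behavior on that machine.
    print("dryrun output differs from baseline:")
    print("\n".join(diff))
```

An empty list means the PR would print the same commands as the baseline; any non-empty diff would need review or a new baseline from someone with access to that machine.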
Current testing: TravisCI runs on every PR. Only some basic python and Cmake tests.
TODO: drop python2.7 on TravisCI and add env=E3SM.
Jenkins then runs the full scripts_regression_tests on a set of machines.
TODO: JimF will make sure compiler/mpilib env variables are set so they show up in dashboard.
Decision: Drop python2 support entirely. use #!/usr/bin/env python3 in all our scripts.
We should at least require python3.6. Cheyenne default is 3.4. Help with "string.format"
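A minimal sketch of what a start-up guard for the python3 decision might look like, assuming a minimum of 3.6 as suggested above. The names MIN_PYTHON and check_python_version are hypothetical and do not come from CIME.

```python
#!/usr/bin/env python3
"""Sketch of a start-up guard for a minimum Python version (assumed to be 3.6)."""
import sys

MIN_PYTHON = (3, 6)  # assumption taken from the note above, not a CIME constant


def check_python_version(version_info=None, minimum=MIN_PYTHON):
    """Return an error message if version_info is older than minimum, else None."""
    if version_info is None:
        version_info = sys.version_info
    found = tuple(version_info[:2])
    if found < minimum:
        return "Python {}.{} or newer is required; found {}.{}".format(
            minimum[0], minimum[1], found[0], found[1]
        )
    return None


# Exit early with a readable message instead of failing later on newer syntax.
message = check_python_version()
if message is not None:
    sys.exit(message)
```

The guard itself uses str.format rather than newer syntax so that an old interpreter can still parse the file and print the message.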
Welcome new Sandia developer Wesley
We went through some issues. single submit: has been partially implemented.
Need a different term than "namelist generation" since that machinery is for more than Fortran namelists. "input generation".
Can we build less?
Can we move to having not all branches in main CIME repo? Use forks.
CIME repo is taking a long time to download for some people. (Need more data; we deleted some branches but that must not have been it).
UFS is going to be a CIME user.
We need to add a Code of Conduct.
After this: we will meet bi-weekly.
First meeting in a while.
More testing. It's very easy for us to break each other. We need to run some test coverage tools to make sure we are testing all our functions. Add more tests.
TODO: JimE will redo the build names on the dashboard to help identify compilers/queue systems/ e3sm/cesm.
JimF wants to be able to test machine configs without being on those machines: Need a more sophisticated test environment? A mock filesystem? Some way to output the result of case.setup on summit and compare against a baseline to make sure we don't break things.
New guy at Sandia will be helping JimF out with CIME.
Issues for development: Whenever someone plans new development in CIME, open an issue describing it. We will ask all CIME developers to check the issues before they start development so they don't conflict or duplicate effort. This will hopefully help prevent things like the recent overlap on DOCN development.
Meeting schedule: we will switch to every-other-week but will have a meeting next week (Oct 30th). Switch after that.
We installed "stale" which will, if activated, automatically mark old issues as stale and then later close them. Need to add a yaml file to turn on.
Still need a replacement for Waffle. Bill found a tool https://github.com/philschatz/project-bot that will automatically add new issues to a project. This is missing from default Github projects. Zenhub is free but has lots of features we probably won't use.
We discussed https://github.com/ESMCI/cime/issues/3175. Aside from breaking backwards compatibility, that particular change fails quietly if you still have the wrong case for FALSE. Need to check that the code is checking the schema.
Other work: JimE is working on workflow. Adding to maint-5.6 and then master. CIME has a workflow driver that uses queue dependency mechanisms. Can now use cylc instead. Some features: can manage ensemble with one cylc GUI interface. GUI is almost impossible to install on HPC systems. CLI works.
Other tools: Easyflow, one that sounds-like "roku".
JimF: Continuing CMake.
https://github.com/E3SM-Project/E3SM/pull/3043
Some notes from telecon:
It's not a standard use of cmake: still use perl for lots of stuff. Not sure if that will ever change: CMake would be an awkward "language" for doing a lot of that (e.g., components' configure step).
Each component just has a single line in its CMakeLists.txt. Even that wouldn't be needed, but Jim F wanted the build for each component to appear in its own subdirectory of the build directory, and it seems like this requires an add_subdirectory call in cmake.
Many of the benefits (listed in the top PR comment) come from having a single cmake invocation (e.g., the build speedup via exposing more build parallelism, and being able to invoke cmake & make directly - though note that the cmake command is huge, so you need to copy & paste that).
For CESM, we have slightly different requirements - particularly, can't rely on all components having a CMake based build. And we may want to add CMake to our components incrementally, too, as opposed to needing to do it all at once. To address these two needs, Jim E may look into what it would take to allow a mix of CMake and non-CMake-based builds. One approach would be to go back to what Jim F had initially, when he was incrementally moving the e3sm components to CMake (i.e., still invoking each component's buildlib separately, and that component would then call either make or cmake). But a better approach (allowing keeping the benefits of an all-cmake-based system, and allowing the cesm and e3sm builds to use essentially the same system) may be to keep the idea of having a single CMake call drive the whole build, and use CMake's ExternalProject functionality where we need to support non-cmake-based builds of components.
Bill looked a bit into ZenHub. It looks like this would be a good replacement for waffle: seems to do everything we did in waffle (and more, if we want more).
Another option would be GitHub projects. The problem with that is that, if we want a project that encompasses all of our issues, new issues would NOT get added automatically to that project. Workarounds could be:
- When we do our weekly meetings, we have a link to a search for open issues not in the main project, and then we add each of those issues to the project
- Bill came across https://github.com/philschatz/project-bot, which is a bot using the GitHub api that lets you automatically add new issues to a project board, and then do some other neat automation tricks.
In config_batch and env_batch there was a section that listed jobs to run. Currently it's just case.run, case.test, case.st_archive. Jim is expanding that list to add pre and post-processing jobs. That doesn't really fit into "config_batch.xml", so it goes in a new file, "config_workflow.xml".
Some postprocessing scripts have complicated prereq. Add a function call as an option in prereq? Or you always put a job in the queue, will determine if it could run. Second option is better. The preproc script will have all of python and all of case info to determine if it can run.
The postprocessing job won't block other jobs (unless they depend on it).
Ninja is a new thing from Google that is a backend to Cmake. Much faster than Make, better at exposing parallelism.
CMake vs. Make. Cmake is a Makefile generator. Can also call shell commands. You can call Make from CMake but then it's clunky. Can we have Cmake in all of CIME? Yes, but if active components are still calling Make, we lose the benefits.
Making parts of CIME available: some people want to use the check_input_data download feature and nothing else. Mariana: NUOPC mediator users want to use just that part of CIME. So will probably pull that out to a separate repo and bring it in with CESM's "manage_externals". People also want to use aspects of the case control system. With enhancements for initialization.
But splitting up the python parts of CIME will be hard and then hard to keep it all in sync. Could instead make it a python package. Need to find out more about what exactly people do/don't want from CIME. Could make CIME/scripts a python package.
Testing GPTL: it should be possible to replace the code in src/util/timing with latest GPTL (and Pat's changes) and just build.
special functions: For now, E3SM can just define HAVE_INTRINSICS in its config files. Will wait to add it to all intel compilers.
Need a waffle replacement.
github projects doesn't have as much automation. Won't add every new issue. waffle would move an issue to "in-progress" if you pushed a branch with the issue number in it.
went over issues in waffle for last time :(
CIME and CMake. We are building PIO with Cmake right now.
JimF: First iteration for atmosphere will be CMake but still use the perl. As long as perl is fast, can use it as a library. We want to eventually remove the perl.
Why not build an A-case in CIME with all CMake first? Sometimes it's good to do the hardest case first. CAM/EAM is hardest. Build EAM like PIO is built. It just uses CMake.
We want to make sure a Cmake build, like our Makefile build, can work without knowing details of the prognostic models.
We will need to support both build systems for a while to do this incrementally. build.py could be altered to support both build systems.
Get an Aquaplanet case building first.
Adding a python netcdf library to CIME https://github.com/ESMCI/cime/issues/2935 Instead of adding python modules to CIME, require it in the python environment you are using.
Normal default python environments, you can't count on scipy being there. If you activate a python environment as part of your workflow in running CIME, you can have anything you want. Normally for running python programs: activate your python env, run your python. CIME hasn't had to do that because it only requires bare-bones python.
Current workflow: add module commands to get "python" in your path, run create_newcase. New (optional) workflow: activate python environment, run create_newcase.
Catch exception if "import scipy" doesn't work and give clear error message to user pointing them to docs on how to set their environment.
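The "catch the import failure and explain" idea above might look something like this. It is a sketch only: require_module and the wording of the message are hypothetical, and the docs pointer is a placeholder rather than a real CIME URL.

```python
import importlib


def require_module(name, docs_hint="the CIME documentation on Python environments"):
    """Import an optional dependency, or raise an error that tells the user what to do."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise RuntimeError(
            "Python module '{}' is required for this feature but could not be "
            "imported. Activate a Python environment that provides it; "
            "see {}.".format(name, docs_hint)
        ) from exc


# A standard-library module imports fine...
json_module = require_module("json")

# ...while a missing one produces the readable message instead of a bare traceback.
try:
    require_module("module_that_does_not_exist_xyz")
except RuntimeError as err:
    print(err)
```

In practice the call would be require_module("scipy") placed only in the code path that needs it, so users who never run those features are not forced to install anything extra.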
Can ordinary users get the right python environment on their laptops/workstations? They are going to have to learn eventually. We can help with documentation.
https://github.com/E3SM-Project/e3sm-unified - the conda package for the python environment for e3sm diagnostics tools
The netcdf.io python file FATES needs is just one file. Joe and Xylar will help build a CIME python environment.
The SIAC PR: "present" in CIME is true for a component if a data OR full model is in use. It's false for a stub. JimE is doing testing for CESM. Gautam needs a better platform to run scripts_regression_tests for E3SM. Rob will help.
"git log" for CIME files on the E3SM side are kind of a mess because of squash merges. Need to look at history from within a ESMCI/cime clone.
- Gautam's proposed change to compsets to support an IAC component.
With current CIME, every single compset would have to be changed to at least add SIAC.
Steve will provide updated case.py to instead fill in missing stub components.
CIME will specify that a compset now must have an IAC component. If none is specified in the compset name, assume stub (SIAC).
That way existing compsets don't have to be modified unless they want to specifically test the ability to process "SIAC" in the compset name or add the real IAC (GCAM).
Most of existing SIAC PR can remain.
The compclass of every item is parsed, and stubs are pushed in for the ones that are missing.
Still need a stub libiac in all builds for building the shared coupler.
Might need to change on E3SM config_files.xml is done.
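The stub-filling approach described above can be sketched as follows. This is not the actual case.py change: the component-class lookup, the list of required classes, and the sample compset name are all simplified assumptions for illustration, and real compset names can carry additional fields that this sketch ignores.

```python
# Hypothetical table of active (non-stub, non-data) components and their classes.
ACTIVE_COMPONENT_CLASS = {"CAM": "ATM", "CLM": "LND", "GCAM": "IAC"}

# Component classes every compset is assumed to need, in long-name order.
REQUIRED_CLASSES = ["ATM", "LND", "ICE", "OCN", "ROF", "GLC", "WAV", "IAC"]


def component_class(component):
    """Map a compset element such as 'DATM%NYF' or 'SLND' to its component class."""
    base = component.split("%")[0]
    if base in ACTIVE_COMPONENT_CLASS:
        return ACTIVE_COMPONENT_CLASS[base]
    # Assumed naming convention: S/D/X prefix + class for stub/data/dead models.
    if base[:1] in ("S", "D", "X") and base[1:] in REQUIRED_CLASSES:
        return base[1:]
    raise ValueError("unknown component: {}".format(component))


def fill_missing_stubs(compset_longname):
    """Return the long name with a stub pushed in for every missing class."""
    parts = compset_longname.split("_")
    time_period, components = parts[0], parts[1:]
    by_class = {component_class(comp): comp for comp in components}
    filled = [by_class.get(cls, "S" + cls) for cls in REQUIRED_CLASSES]
    return "_".join([time_period] + filled)


# An existing compset with no IAC entry gets SIAC appended.
old_name = "2000_DATM%NYF_SLND_DICE%SSMI_DOCN%DOM_DROF%NYF_SGLC_SWAV"
print(fill_missing_stubs(old_name))
```

With this approach existing compset definitions stay untouched, and only compsets that want the real IAC component need to name it explicitly.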
- Process for non-backwards-compatible changes
- Bill Sacks suggestions:
- Try hard to avoid these
- Try to stage / combine these so that we rarely have backwards-incompatible changes more often than X (where X is something like 3 or 6 months?)
cime 5.8 was a non-backwards-compatible change. What was before that? It's all about communication. As soon as a backwards-incompatible change is contemplated, communicate that along with the development schedule. Then we can plan when/how to add it.
- Branch tag naming convention Everyone look at the issue and comment. https://github.com/ESMCI/cime/issues/3056
Do we have too many branches in ESMCI/cime? No.
- Revived CMake effort. JimF will be starting to work on this at 50%. Will include replacing all the perl in the E3SM versions of cam and clm. We will start a "cam build school" to go through the perl and figure out what to throw out/reimplement. Will focus on E3SM's version of the perl.
Wanted to add DEBUG as a cpp variable but some Fortran files have it as a fortran variable. Will need to change the Fortran.
"-DDEBUG" used to be in Makefile a while ago. Went away at some point.
Need to do a merge of maint-5.6 to master.
Looking at new capability for user-mods in a compset
For CLM, in config_component.xml, can define CLM_USER_MODS block. Can set regexp for what compsets should have a usermods dir added to the compset.
CESM compset for ocean has two "%" modifiers POP2%ECO%ABIO
CIME now supports multiple ways to modify a base compset string. Multiple ways to achieve the same total set of mods. Some of these pre-date CIME (like CAM use_cases).
For future consideration: do we want to support all of these?
Need to think about bringing in new drivers. drivers/moab, drivers/nuopc. Some changes needed to CCS to make the multi-driver selection actually work.
We could continue to do all work on forks but CIME Case Control System on master needs to be fixed to actually work with multiple drivers and then stay fixed (tested). Can best do that by actually putting a second or third driver on master.
E3SM will go ahead with adding drivers/moab, mostly a copy of drivers/mct. Further PRs/issues will appear in E3SM. CIME will only see it when JimF does a subtree split. Similarly, daily nuopc development occurs on the cmeps-nuopc "fork".
In course of making NUOPC driver, several big changes have been made to data and dead models. A lot of duplicated code was cleaned up. There were also some changes to mct driver. Should all be BFB. Also made interfaces to define fields at runtime.
Namelists are generated and regenerated a few times.
- create_newcase
- no nmls
- case.setup
- nmls-1
- case.build
- nmls-2
- case.submit
- nmls-3
When baseline creation is requested as part of create_test, nmls-1 is what is saved in the baseline. When blessing a diff with baselines, it's usually enough to just copy over the new history/restart files to the baseline. But you can't copy the namelist because what's left in the RUNDIR of the test is nmls-3. Have to actually run create_newcase and case.setup to get nmls-1.
Build process for CICE and POP might change namelist during build.
Some test types WILL modify nmls-3 compared to nmls-2.
Conclusion: we could look at tests making a CaseDoc.postsetup directory to save the results we want to bless. Or leave as is.
First meeting in a while. Skipped several while working on CESM2 release.
What development is coming next? Cmake. Any way to do it incrementally? Component-by-component? each one built with Cmake. Top level stays as-is. Good first step would be to build an A or X case. Entirely CIME-contained.
Other development this summer: CMEPS driver/coupler. CMEPS doesn't need stub models. May need to change compset naming scheme or how CIME handles it. Separate out forcing part of compset: the year.
New sequential fields mod to handle merging. New datatypes for mapping type.
Unified atmospheric modeling: forecast capability in CIME.
E3SM released with CIME 5.5.0 on April 23rd. CESM2 will freeze on May 11 and use a 5.6.x tag.
See CESM2 milestone for issues that need to be addressed.
E3SM's next big development is to convert main model buildnml scripts to python. Jim will see if CLM/CTSM can collab on the giant CLMBuildNamelist.pm
We looked at issue 2521 about more testing. All agree that we need more. Hard to test batch config changes without a batch system to interact with.
Decided to not advance to CIME 6.0 just because it's included in a CESM/E3SM release. Will just update the minor version number.
Decided that E3SM will add a template that generates a bash-based submit script. Emphasize that it won't work with the rest of CIME.
Looked at documentation and how to organize it a little better.
Wrote documentation.
Cancelled because same week as SEWG meeting.
Went over new issues in Waffle board
Compsets and backwards compatibility: Can only be done at the individual compset level.
There are no requirements for backwards compatibility for master going forward.
CESM has moved to making "B1850" a compset that is always latest physics and latest settings.
Compset has physics and forcing in one thing. Need to separate that to support lots of different initial conditions.
Don't always have to specify stub components. Assumed if you don't specify.
Tagging and releases. Use tags like
cime6.0 -> start a maint branch
(continuing on master)
cime6.1.0 (What is this?)
cesm/cime6.1.0-alpha01
ctsm/cime6.1.0
cesm/cime6.1.0-alpha02
cime6.2.0
cesm/cime6.2.0
cesm/cime6.2.1
ctsm/cime6.1.
cesm/cime6.2
OR, just increment the minor number.
cime6.0.0 -> main branch (cime 6.0.1, 6.0.2, 6.0.3)
cime6.1
cime6.2
cime6.3
cime6.4
cime6.5 -> maint branch. 6.5.1, 6.5.2
cime6.28
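One consequence of the "just increment the minor number" scheme is that tags must be compared numerically, not as strings, since cime6.28 comes after cime6.5. A small sketch of such a comparison is below; the regular expression and the parse_tag helper are assumptions for illustration and handle only the plain cimeX.Y and cimeX.Y.Z forms, not the prefixed or alpha tags.

```python
import re

# Assumed tag shape: "cime" + major.minor with an optional patch number.
TAG_PATTERN = re.compile(r"^cime(\d+)\.(\d+)(?:\.(\d+))?$")


def parse_tag(tag):
    """Turn 'cime6.5.1' into (6, 5, 1); a missing patch number counts as 0."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError("unrecognized tag: {}".format(tag))
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


tags = ["cime6.5", "cime6.28", "cime6.0.1", "cime6.5.1", "cime6.1"]

# Numeric ordering puts cime6.28 last; a plain string sort would not.
print(sorted(tags, key=parse_tag))
```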
Clean up branches. Move to a fork-only development model?
Agenda: answer changes in recent merge to E3SM? Freezing for release. version numbering. PR for 1850 components.
Release tag: All agree, tag number will be higher than 5.4. Figure out rules for going back to semantic versioning via github issue discussion.
Request that CESM stop relying on tags so heavily, so we don't need to make so many arbitrary cime alpha tags just because that's what CESM is using in one of its tags. With CESM's move to git, this is possible: CESM can point to a cime sha-1 rather than a tag.
Freezing for release. Probably after next week. Need to look at xml caching again.
Have one release or maint branch that both CESM and E3SM pull from.
Both E3SM and CESM are looking at a spring release.
Currently, last chance to merge CIME to E3SM for its release is March 19.
Lots of changes to CIME recently. XML optimization, Fortran clean-up.
A py3 error prevented the E3SM-latest CIME test from passing. That has been fixed but needs to be merged.
Looking at timing, lots of time being spent loading modules.
We went through the waffle board and looked at recently opened issues.
Todo: does E3SM have any hardcoded aerosol concentrations in component models being sent to coupler?
Todo: document .cime/config
Open source standard documents:
Codes of conduct: https://www.contributor-covenant.org/version/1/4/code-of-conduct.html http://citizencodeofconduct.org/
We looked at the waffle board. Not a lot of "in-progress" activity.
Issues with CESM2 milestones. Many still needed. Only a couple already done/removed.
Build system: JimF will document how it works. Need this for CMake. E3SM is still using "v1" of build system. Meaning E3SM is using v1 schema and the python code to interpret it.
Tests now documented in 2 places: docs and config_tests.xml. Put pointer to docs in config_tests.xml and remove duplicate documentation.
Looked at waffle board.
Commits have slowed down.
E3SM is still planning for an April, 2018 release.
(no call on Nov 1, 2017) We looked at the latest update from ACME. Still lots of changes. We should be more careful to have any non-config changes hit ESMCI first.
(need to add notes)
Discussed the PR for merging E3SM cime changes back to ESMCI. Most of the conflicts are the short term archiver. Will do that as a separate PR.
The data model share code: Mariana will make a new src/share/streams dir to hold shared data model code used by other models.
Documentation of model-specific testing and other model-specific CIME things within the CIME User's Guide: do it as an Appendix to the CIME User's Guide instead of pointers to external web pages.
We looked at waffle board.
Python 3 update: We will develop in 2.7 and run extra tests to maintain this py3 compatibility. We'll keep python 2.7 compatibility for another year. Developers can mostly work as before. Tests will catch if we break python3 compatibility.
Other things coming up: reorg of dead models for better connection with NUOPC driver. Multiple glc models - allowing Antarctica AND Greenland in one run. Paleo cases may need more than 2. How to introduce this into the coupler? Ideas: a uniform grid and one master glc model,
We looked at E3SM advisory panel slide on CIME v2-v3 plans. Will update our long-term plans on this wiki.
Looked briefly at PRs and waffle board.
Without case name in filename, test comparisons with baselines won't work. It's possible that comparisons within a test (like restarts) won't work.
CESM has case name conventions for production runs.
We looked at waffle board.
CIME version naming. Since v1,v2,v3 and v4 were never in a released model or versions we want to support, we should renumber our versions with 1.0 starting in the first version of CIME released with the model. To keep the existing tags, add "rel". so cimerel1.0.0, CIMErel1.0.0, cimeREL1.0.0, and then we'll stop the old tag naming scheme and start with that one.
Python3 conversion: won't be easy. Have to import the right std libraries which have changed their name. To support both, have to detect you're in python3 and change names. There's a library ("six") that helps but that then becomes a dependency.
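The "detect you're in python3 and change names" approach mentioned above, done without the six dependency, might look like this. The two modules shown are just examples of standard libraries that were renamed between Python 2 and 3; which imports CIME would actually need is not stated in these notes.

```python
import sys

# Pick the right standard-library names for the running interpreter.
if sys.version_info[0] >= 3:
    from configparser import ConfigParser
    from urllib.request import urlopen
else:
    # Python 2 names for the same functionality.
    from ConfigParser import ConfigParser
    from urllib2 import urlopen

# The rest of the code uses the common names and does not care which branch ran.
parser = ConfigParser()
parser.add_section("main")
parser.set("main", "machine", "example")
print(parser.get("main", "machine"))
```

Every renamed module needs a block like this, which is the maintenance cost that motivates either adopting six or dropping Python 2 outright.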
JimF: Once we decide to convert, we just go to Python3 and not support Python2. When Googling for python help, it's always python2. Only a few things are better in 3.
Mariana: NOAA wants to use python3. They say there's stuff you can only do in Python3.
Homework: Look at http://www.asmeurer.com/python3-presentation/slides.html#1.
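As a rough illustration of the renamed-standard-library issue in the Python3 conversion notes above, the sketch below detects the interpreter version and imports under one common name. The two modules shown (configparser and urllib) are just familiar examples of renamed libraries, not a list of what CIME actually imports, and make_parser is a hypothetical helper.

```python
import sys

# Detect the major interpreter version once, at import time.
PY3 = sys.version_info[0] >= 3

if PY3:
    # Python 3 names
    import configparser
    from urllib.request import urlopen
else:
    # Python 2 names for the same functionality
    import ConfigParser as configparser
    from urllib2 import urlopen


def make_parser():
    """Return a config parser regardless of which Python is running."""
    return configparser.ConfigParser()
```

The "six" library wraps this kind of branching (for example, from six.moves import configparser), at the cost of the extra dependency noted above.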
We looked at waffle board.
(no meeting on Aug 30 or Sept 6)
What to do when a PR requires careful review and everyone is busy? Good to get PRs done quickly.
If there's no one available to review, PR just stays open. Anyone who must review can add themselves and put hold on it until they have time.
To avoid working on a PR that might not get reviewed, we'll check the "Ready" tasks each week. Went over "Ready" tasks on waffle board.
agenda: update our long term plans
SOM: slab ocean has been put in docn. If you want to run with POP grid, can't do it because that grid has a hole in it (Greenland). "f09_f09" type grid aliases are going away because they say nothing about mask. So CESM will, for CAM grids, now specify mask in alias "f09_f09_mg17" uses the POP gx1v7 mask mapped to f09. Upcoming PR will allow you to create "f09_f09" with aquaplanet docn.
Gnu debugging mode. Bill has been trying to trap nans. Recent versions of CESM have nan checks in the code and this is not playing well with gnu debugging. See https://github.com/ESMCI/cime/issues/1763. ACME has nan checks but maybe hasn't run that code with gnu debugging.
Looked at waffle board.
Along with an optional coupler, ACME wants an optional economic model (GCAM). We'll also do that with an option to create_newcase instead of modifying the compset longname.
Talked about how an optional coupler is being enabled in CIME. It's an option to create_newcase in a branch Mariana is working on. That NUOPC branch won't be merged for a while. Want to add the optional coupler code earlier.
Looked at waffle board.
agenda: PESetupHist, finidat
Mariana had added PESetupHist, was confusing and not used (to her knowledge) so removed it.
PFS test will keep a copy of env_mach_pes.xml with a rundate on it.
Any settings for the CLM namelist in CAM are there because CAM needed them for its standalone testing and scripts. When sea ice namelist was pythonized, CAM had to go through Case Control System so couldn't do that anymore. CAM went to aquaplanet for its standalone testing/scripts. Still sets some driver namelists. CSEG is not supporting CAM test/script system.
skip preview namelists: when done on case.submit, still calls it once. We should change it so that it indeed never calls it. Some components were depending on preview_namelists being called at a certain time. JimF: we struggled with this for 2 years. We don't know the assumptions in all the components about when it's called.
Looked at waffle board.
When you create a Single Column Model, code in cime_comp_mod.F90 checks to see if column is over ocean, so don't need to initialize land. Would rather have a utility in python that does same thing. Pre-configure model. Would need python to start reading init data in netcdf to figure things out. Michael and python cprnc: unidata's netcdf library for python has everything (for netcdf4).
Documentation: Working on testing documentation. Adding compsets, grids needs to be completely rewritten.
(June 21 meeting cancelled (CESM workshop), June 28 cancelled (conflicts))
Looked at waffle board.
Discussed how to add a new driver. Extend compset longname. Add an argument to create_newcase. Very hard to change compset naming convention. This will continue to bite us.
For now: we'll add an argument to create_newcase and an extra line in config_files.xml
(May 31 and Jun 7 (ACME) were cancelled.)
Looked at Waffle board.
Looked at waffle board.
Agenda: version numbering, CESM2 freeze, multiple couplers.
CESM2 freeze has been pushed by 2 weeks.
Latest tag is cime3.5.0-alpha15. We'll stay in alpha. Start beta when we have all the features we think we need.
cime 5.3.0 will be the version in the CESM2 release.
Things still needed by CESM: In their config: grids, mapping files. Data models: fixes for spinup. Multi-coupler. A performance gain for multi-instance which is used in data assimilation. Rafaelle had a way of doing it with minimum changes.
New user mods: you can use user mod directories that change behavior for many compsets or just one. Very powerful. Allows setting up dependencies across components.
github issue labels: go back to something like CESM-development had. Need more "types". Can get rid of "fixed in python". Renames: machine -> machine specific; driver_cpl -> mct driver; scripts -> userinterface; share -> src_share; tests -> SystemTests. New: UnitTests. tools: keep for cime/tools.
python: User-interface: CIMElib: auxtools:
A technical writer is working on CIME User's Guide. Part 1 and 2. Will spend about 2 weeks (40 hours). We will then review.
Agenda: version numbering, cime adjective.
We decided to use "CIME-driven".
Looked over old issues on the Waffle board and assigned urgent new ones.
Agenda: version numbering. Cime adjective, development priorities.
New science requirements for CIME: why now? Example: no one realized that nitrogen deposition required changes to CIME (more fields being passed). Example 2: scientists didn't communicate that they wanted aquaplanet in data ocn. "customer" not clearly defining requirements.
Why is the backlog staying high? More users means more issues. Both added (Jason, Erich) and lost (Michael) developer time. JimF: focus on velocity of fixing issues. "velocity" = issues closed per week. But we're not really doing scrum. Some issues are harder than others but no extra credit for closing those.
Are we changing too fast? Maybe. Testing still can't catch everything. Also still "old" parts of CIME in the component models doing things like single instead of double dash. Redo single dash as warning instead of error.
Can't break scientists' workflow anymore. ACME now has run_acme script on master with CIME itself. Branch that updates cime could also update run_acme. Need to test run_acme on same machines scientists use.
Long term archiving completely removed. No DOUT_L, no lt-archive batch job. See last week's notes.
ERP test: was broken on ESMCI (but not in the ACME master). Still trying to confirm it's fixed.
CESM and ACME will do more testing of CIME within fully coupled cases.
No decision yet on which adjective to use in documentation.
(new day/time)
GLC changes PR. CESM needs this for next alpha tag. Time critical. Rob will review.
Upcoming PR from John Truesdell: Fixes bug in coupler diagnostics. When sea-ice melts, water flux needs to be added. Were seeing inconsistency in water conservation between POP and CPL. POP was right, CPL was missing this term. John will add issue describing problem.
Also coming up: Aquaplanet functionality moving to DOCN. Should be backwards compatible. Will simplify use of aquaplanet. Currently CAM-based aquaplanet uses CAM decomp, CAM datatypes. Will also bring in a slab ocean capability. aquaplanet currently used analytic function SSTs.
Unit_testing: we'll move it and clean it up.
Long-term archiving: should we support it? A long-lived branch is adding tons of complexity. External tools can do better. Current lt-archive is in bash and only recognizes 2 layers of directories in short-term. Short-term is adding more. queue dependency issue: depends on success of short. Can't restart. Might run at same time as short-term.
decision: we will remove bash lt_archive.sh script. Can't pythonize/test/support it.
user_nl files. Long term: need to better define what these files do. What are the rules? Possibly convert to a different language. Short term: need it working "good enough" for release. People have written scripts on top of current user_nl. usermods also alter these. user_nl should allow fortran namelist standard at a minimum. We can add if we support it.
We will move our meeting times to Wednesday 1pm MT/ 2pm CT.
(cancelled)
Upcoming data model cleanup from Mariana. Rob will review for ACME.
Proposed --script-root option to create_newcase is fine. JimF will implement.
Documentation: we'll combine Parts 2 and 3 using the Part 2 title until we think of a better one.
Github issue cleanup: everyone needs more time to go through the old issues and see if they're still relevant.
Mariana was absent.
We went through the Waffle board concentrating on old issues that have never been assigned. Rob will send email asking authors to look at them.
We discussed PR with ACME changes to 5.2. JimE needs to look at it. Should go in early next week.
Data models: lots of hard-wired parameters. Indexes that had to be lined up. Easy to mess up and get garbage. Mariana is fixing.
Mariana added sections to UG about manage_case.
RESUBMIT and ARCHIVE. Archiver will copy to archive all restart/history, will copy back last restart even if not resubmitting. So run directory is always ready to run. The copy back isn't working for MPAS.
Archiver still has lots of fragility. It should work in a short 5 day run with no history. It didn't use to. Calling it with a specific year: probably won't work. Hard to tell which history files are safe to move.
Ask Alice: does CSEG run diagnostics on data in short term archive?
Agenda: protecting Titan. Finding and recovering useful old use modes. case.run doing too much.
We are slowly recovering Titan ability with cime5.2. Multiple compiler support may be broken in 5.2 with the old config_compilers format. CSEG is using new format AND testing multiple compilers. ACME doesn't test multiple compilers. Users noticed.
We can make an X case that mimics the Titan layout. JimF has a machine-neutral way of testing the aprun format with a doctest. Will assert the current aprun command. Now in scripts_regression on the ACME side.
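The machine-neutral doctest idea above could look roughly like the sketch below. The helper build_aprun_command and the executable name ./model.exe are hypothetical, chosen only for illustration, and are not the function in scripts_regression. The flags -n (total processing elements), -N (processing elements per node) and -d (depth, i.e. CPUs per element) are standard aprun options.

```python
import doctest


def build_aprun_command(total_tasks, tasks_per_node, threads_per_task, executable):
    """Assemble an aprun command line (hypothetical helper, not CIME's API).

    >>> build_aprun_command(32, 16, 1, "./model.exe")
    'aprun -n 32 -N 16 -d 1 ./model.exe'
    >>> build_aprun_command(8, 4, 4, "./model.exe")
    'aprun -n 8 -N 4 -d 4 ./model.exe'
    """
    parts = [
        "aprun",
        "-n", str(total_tasks),       # total number of processing elements
        "-N", str(tasks_per_node),    # processing elements per node
        "-d", str(threads_per_task),  # depth: CPUs (threads) per element
        executable,
    ]
    return " ".join(parts)


if __name__ == "__main__":
    # Runs on any machine: no aprun binary or batch system is needed.
    doctest.testmod()
```

Because the doctest only asserts the generated string, a change that alters the Titan-style command line fails on a laptop as readily as on Titan.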
Short term archiving: It's definitely limiting to always run short-term archive in a separate job. But there are options for getting it through NERSC queues by using "transfer queue" at NERSC. We will add back the ability to run archiving with the batch run as an option.
We made changes for MPAS in the python scripts. Need another pass through config_archive.xml. Too much of an assumed naming convention in the code. Need to just key off a string that is specified per model.
Short-term archiving with RESUBMIT not working yet with MPAS. Rob needs to investigate more.
Need to shorten the CIME update time from ESMCI to ACME. Something to try: test ACME master against ESMCI master using a temporary clone of ESMCI/cime in place of ACME/cime.
Calling preview_namelists: Yes, it's done twice in a submit. The first one is to account for changes between building and submitting. The last one, in case.run, is in case people edit namelists while the job is waiting in the queue. We can add an option to not do that and just tell CIME the namelists will stay untouched after case.submit. Can we also check to see if an env file was changed? Maybe. Need to understand more about which env files can affect the namelists.
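One possible way to check whether an env file was changed between case.submit and case.run is to compare content hashes, as in the sketch below. This is only an illustration under assumptions: the function names are hypothetical, it treats every env_*.xml file in the case directory as relevant, and it does not address the open question of which env files actually affect the namelists.

```python
import glob
import hashlib
import os


def snapshot_env_files(caseroot):
    """Map each env_*.xml file name in caseroot to a SHA-256 digest."""
    digests = {}
    for path in sorted(glob.glob(os.path.join(caseroot, "env_*.xml"))):
        with open(path, "rb") as handle:
            digests[os.path.basename(path)] = hashlib.sha256(handle.read()).hexdigest()
    return digests


def changed_env_files(before, after):
    """Return sorted names of files added, removed, or modified between snapshots."""
    names = set(before) | set(after)
    return sorted(name for name in names if before.get(name) != after.get(name))
```

A snapshot taken at submit time could be stored with the case and compared at run time; an empty result would mean the second namelist generation could be skipped.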
JimE is working on setting array elements in a Fortran namelist. Had to redo some low-level data structures.
To be added to coupler: passing ozone from atm to land. Also nitrogen deposition.
Discussion on adding robustness. Using st-archiver, can make an "interim restart directory" for restarts that are created during a run. To start out from them, need to have a consistent collection. st-archiver will create directories with consistent sets. Situation: run for 1 year, write restarts every month. Say you die in month 6. Call short-term archiver, it will create restart directories for all 5 complete sets. Really useful for backing up to a restart prior to the current period.
Can test the recover part without a true fail. Can test fail detection by just running on Titan.
Archive script can be run at command line.
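The "consistent sets" idea from the interim restart discussion above can be sketched as grouping restart files by date stamp and keeping only the dates for which every component wrote a file. The file name pattern (case.component.r.YYYY-MM-DD-SSSSS.nc) and the function name are assumptions made for illustration; this is not how the st-archiver is actually implemented.

```python
import re
from collections import defaultdict

# Assumed restart name pattern: <case>.<component>.r.<YYYY-MM-DD-SSSSS>.nc
RESTART_RE = re.compile(
    r"^(?P<case>.+)\.(?P<comp>[a-z0-9]+)\.r\.(?P<date>\d{4}-\d{2}-\d{2}-\d{5})\.nc$"
)


def consistent_restart_sets(filenames, components):
    """Group restart files by date and keep only the dates where every
    required component has written a file."""
    by_date = defaultdict(dict)
    for name in filenames:
        match = RESTART_RE.match(name)
        if match and match.group("comp") in components:
            by_date[match.group("date")][match.group("comp")] = name
    required = set(components)
    return {
        date: sorted(files.values())
        for date, files in sorted(by_date.items())
        if set(files) == required
    }
```

In the month-6 failure scenario, a partially written sixth set would be dropped because at least one component file is missing, leaving the five complete sets.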
CESM and CIME updating. CSEG only makes changes in ESMCI. Lag between making changes and exposing them to CESM is a couple of days at most. CESM actually points to fork of ESMCI/cime in CESM-Development. CESM-Development/cime is a mirror of ESMCI/cime.
Looked at the waffle board and the backlog.
Looked at waffle board. Closed some old issues.
We looked at new tools to make web pages showing all the namelist variables that are in xml for each model. (Also, all namelist variables for a model should be in xml.) These tools will be checked in. It's a standalone page so can be dropped in anywhere. Not clear if github pages will make it inherit any of the style. We'll just put the driver and data model namelists with the CIME github pages.
We looked at readthedocs.org to host our Sphinx-generated documentation. It has a different skin, different URL. But it will do the generation of the html and keep it out of our repo.
Talked about licensing, copyright. CIME5 will have copyrights for UCAR, Sandia, Argonne. License will be BSD 3-clause. License: GPTL needs at least the LGPL. Would prefer BSD 3-clause.
How to make changes to documentation. Make a PR? Ask in Slack? Just push directly to gh-pages unless you really want someone to review it.
"full" "active" "prognostic" all mean the same thing. Let's use just one in documentation. DECISION: "active".
The 5.2 series of tags ends with the old directory structure. Switch to 5.3
JSON vs. XML. MOM6 developer asked about JSON. It's similar in concept to XML. Cleaner. More python-like. Has more types. Doesn't have a native analog to attributes. At one point, JimF wanted to convert all our XML to JSON.
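To make the "no native analog to attributes" point concrete, the sketch below converts one XML element to JSON. The XML entry is illustrative only and not copied from a real CIME file, and the "@attributes" and "#text" keys are a convention picked for this example; JSON itself prescribes none.

```python
import json
import xml.etree.ElementTree as ET

# Illustrative entry only; not copied from a real CIME XML file.
XML_TEXT = '<entry id="STOP_N" type="integer">5</entry>'


def element_to_dict(element):
    """Convert one XML element to a JSON-friendly dict.

    JSON has no attribute concept, so attributes are stored under an
    "@attributes" key and the text under "#text" (a convention chosen here).
    """
    return {
        "tag": element.tag,
        "@attributes": dict(element.attrib),
        "#text": element.text,
    }


entry = ET.fromstring(XML_TEXT)
as_json = json.dumps(element_to_dict(entry), sort_keys=True)
print(as_json)
```

Any conversion of existing XML files would have to settle on one such convention and apply it everywhere, which is part of the cost of switching.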
License for an official release. Use BSD 3-clause? Current license is the UCAR one, very permissive and very similar to the MPAS one. Just need to add a Copyright in the top LICENSE.TXT
Possible banner for every file:
! Copyright and license information can be found in the LICENSE.TXT file
! distributed with this code, or at http://github.com/ESMCI/cime/LICENSE.TXT
!
!|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Current big picture for CIME: Mostly doing documentation, getting models up to latest CIME, hardening. No big development pushes. CMDV is working on CMake build system. ACME working on ne4 compsets.
Ideas for future development: All seq_fields_mod fields be set at runtime. Better naming convention. Code generation for the fields. Better way of specifying pe layouts. config_pes still too repetitive or not picking up right layout.
Agenda: github labels, CIME directory structure,
We discussed proposal for restructuring CIME source directories. See https://github.com/ESMCI/cime/issues/1178 What does "share" mean? It's used by more than one component, which implies it must be built early. Issue: some driver-specific code has wound up in files that are in "shr". Separating that out is more work.
Some whole files can be moved. Will do that in a second pass after moving directories.
Other proposal: CIME/utils/python -> CIME/scripts/lib. Still need CIME/scripts/lib/CIME without major search/replace. CIME/utils/python/tests -> CIME/scripts/tests/
remove CIME/scripts/Testing
CIME/cime_config/buildlib*. Andy will move to src/buildscripts/. Jim is rewriting in python.
Remove cime/tests
later: CIME/scripts/Tools -> CIME/scripts/case_support or case_tools. Leave CIME/tools.
Who will do this? Andy
Documentation: JimE will write about case.build for User's guide.
(temporary day/time change)
Build system for CLM and CIME pfunit unit tests didn't use new CIME infrastructure. Was reading XML in a non-CIME way. Changes to config_compilers broke it. Unit tests weren't being run by developers so missed it for a while. Now unit tests will read the XML correctly. Added unit tests to scripts_regression but depends on pfunit being installed. Currently only yellowstone tries to run pfunit tests. Will try to expand that list.
ACME update to 5.2: our component buildnml scripts are still perl. Logical comparisons with lowercase vs. upper case returned by xmlquery is a problem.
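The lowercase vs. uppercase comparison problem above is usually avoided by normalizing the string before comparing. The sketch below shows the idea in Python; the function name xml_logical and the set of accepted spellings are assumptions for illustration. In the perl buildnml scripts the same normalization could be done with lc() before the comparison.

```python
def xml_logical(value):
    """Interpret a logical string such as 'TRUE', 'true', or '.false.'."""
    # Drop surrounding whitespace and Fortran-style dots, then fold case.
    normalized = str(value).strip().strip(".").lower()
    if normalized in ("true", "t"):
        return True
    if normalized in ("false", "f"):
        return False
    raise ValueError("not a logical value: {!r}".format(value))
```

Comparing the normalized result instead of the raw string means a script keeps working whichever case the query tool returns.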
Workflow: ACME will make changes to cime/cime_config/acme over in the ACME side. Any other changes will go through ESMCI first. CESM will point to an "allactive" outside of CIME and make its changes there.
FindNetcdf/FindPnetcdf CMake routines had diverged between PIO2/cmake and externals/cmake. PIO2 one was chosen and it broke PIO build in a few places.
New cprnc-python will be made part of cime.
Only the PR assignee should integrate a branch to master.
Using Jupyter for CIME tutorials? Get JimE in touch with CMDV-Software folks.
Next meeting will be 3:30pm MT, 2/22/17
We looked at documentation with sphinx. For now, we will maintain .rst files in a cime/doc directory, build html with sphinx, and push that to github pages. All done on the gh-pages branch of cime. wiki content will be deleted. For now, push directly to the gh-pages branch. Don't forget to rebuild the html (you must install sphinx).
We will avoid stepping on each other by communicating what we are working on.
We looked at the waffle board.
Discussion on test coverage. Not all of the CIME data and stub models are being tested in cime_developer. Rob will look at coverage.
We went through issue backlog at: https://github.com/ESMCI/cime/issues and assigned people to work on issues.
We looked at the waffle board and made sure it was up-to-date: https://waffle.io/ESMCI/cime
JimE is focusing on the port to NCAR's new Cheyenne system.
Will clean up the github issue tags and make them more relevant to the parts of CIME, then start using them.
We affirmed that documentation on the wiki should be done in .rest format with multiple files. Need to revisit how to convert this via Sphinx to nicely formatted docs and explore what effect multiple filenames has. Agreed it was ok to not have section numbers in the wiki if necessary.
Decision: keep meeting notes.
Clarify threading tests: PET will force a run to have 2 threads at least.
ERP: will halve the tasks and threads. If your initial number of threads is not >1, threads won't change but tasks will still change.
CESM mostly uses ERP instead of PEM.
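The ERP rule described above (halve tasks and threads, but leave threads alone when there is only one) can be written as a small function. This is a sketch of the rule as stated in these notes, not the ERP test's code; the function name, the use of integer division for odd counts, and the floor of one task are assumptions.

```python
def erp_restart_layout(ntasks, nthreads):
    """Return (ntasks, nthreads) for the restart leg of an ERP-style test."""
    # Tasks are always halved; never go below a single task (assumption).
    new_tasks = max(1, ntasks // 2)
    # Threads are halved only when the run is actually threaded.
    new_threads = nthreads // 2 if nthreads > 1 else nthreads
    return new_tasks, new_threads
```

For example, a 64-task, 2-thread case would restart on 32 tasks with 1 thread, while a 64-task, 1-thread case would restart on 32 tasks with its thread count unchanged.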
CCSM_CO2_PPMV: The background CO2 concentration. Any compset that has CAM has a CO2 coupling scenario. CAM controls the CO2 concentration and that value should be ignored. But there was at least one case where CAM would use CCSM_CO2_PPMV. New CAM version in CESM is always ignoring CCSM_CO2_PPMV.
There is a set of scenarios for how CO2 is shared. CO2A - CAM sends it. CO2B - surface components send it but only used in CAM diagnostically. CO2C - surface sends and CAM uses it.
POP and CLM can do scenarios where it ignores CAM's CO2 and uses a different background. To recognize that, want to get rid of CCSM_CO2_PPMV (and its many values) and replace it with CO2_TYPE_CONSTANT_VALUE. CAM will always get its value from the namelist. Latest CAM completely ignores CCSM_CO2_PPMV.
Starting github issues: we'll do it for everything except machine fixes. How to set priority? Can use "size" in waffle board. Or order them in waffle board.
Discussion of issue #692.
Latest CESM CAM/CLM has script called "buildcpp". If you have a CPP dependency, buildcpp will be called by buildlib. Might be called by buildnml if doing a namelist comparison. Eliminates need to always call preview_namelists during build. All components need to have that before removing call to preview_namelists. buildlib used to have to call preview_namelists to evaluate cpp and filepaths. Now buildlibs can call the cpp part directly.
Need to cleanup the backlog. Look for issues that are out-of-date.
Overall guidance. Need to communicate big development pushes. Namelist generation: speeding it up, cleaning up. Documentation: Mariana will write create_newcase, JimE will write porting, Rob will clean-up outline. We'll use github issues to announce new documentation.
Next week: look at issue tags.