Ability to run tier 2 tests locally (#481)

* Tier 2 test support for user yaml, added documentation
* Support for tier 2 tests from command line
* Added documentation for tier2 tests and platform choice
* removed leftover print
* Added capability to use tier2 overrides from suite_tests
* string format fix
* create_experiment now uses tier2 overrides if file is present
* Simplified tier test selection, removed print calls from debug
* fixed call to tier2 enum
* fixes to suite tests
* removed unused function
GEOS-ESM · Jan 3, 2025 · ba8ea75
1 parent ac67008

Showing 5 changed files with 165 additions and 8 deletions.
diff --git a/docs/code_tests/suite_tests.md b/docs/code_tests/suite_tests.md
@@ -45,6 +45,7 @@ One recommended override, per the example below, is to request a local copy of `
     ```
 This works because, unless otherwise specified, `sbatch` automatically inherits all current environment variables, which you have already configured in step 2 above.
 If you prefer, you can create a dedicated `sbatch` script to wrap the `swell t1test ...` command, to be extra sure that the environment is exactly as it should be.
+By default, tier tests run on the `nccs_discover_sles15` platform. Alternative platforms can be specified with the `-p` flag, as with other swell commands.
 
 4. Repeat (2) for other tests you would like to run. Currently, we recommend running the following tests:
     - `3dvar`
@@ -56,4 +57,37 @@ Debug as necessary.
 
 ## Tier 2 tests
 
-Coming soon!
+Swell tier 2 tests are run in a similar way to tier 1 tests. Overrides from `~/.swell/swell-test.yaml` are read and used in the test. Tier 2 tests generally involve building JEDI before running each test suite, a time-consuming and computationally expensive process. Building involves cloning the Git repositories for JEDI, which must be done on a Discover login node, as compute nodes do not have internet access. In addition, access to the private Git repositories needed to build JEDI requires the user to be a member of the JCSDA-internal organization on GitHub. A `~/.git-credentials` file containing GitHub access token information must be created (see the [jedi bundle documentation](https://github.com/GEOS-ESM/jedi_bundle/blob/develop/docs/git_credentials.md)).
+
+The recommended way to run tier 2 tests on NCCS Discover is as follows:
+
+1. (Optional but recommended) Create a file called `~/.swell/swell-test.yaml`.
+As with tier 1 tests, setting the root directory controls where test outputs will be stored (replace the example with a real file path):
+
+    ```yaml
+    test_root: /path/to/tier2/test/outputs
+    ```
+(If unset, the test function will create a temporary directory that is deleted by the operating system when the `sbatch` job concludes.
+You can still run the tests without this, but you won't be able to study the outputs.) Any other overrides in this file are passed to the test.
+By default, tier 2 tests build JEDI at the beginning of the job, unless a path to an existing JEDI build is specified in the user's `~/.swell/swell-test.yaml` using the following lines:
+    ```yaml
+    jedi_build_method: use_existing
+    existing_jedi_build_directory: /path/to/jedi/build/directory
+    existing_jedi_source_directory: /path/to/jedi/source/directory
+    ```
+2. Ensure your SWELL interactive environment is set up correctly (see step 2 under tier 1 tests).
+
+3. Start the tier 2 test on a login node. (A login node is needed for internet access to clone the JEDI repositories; please do this sparingly to conserve NCCS resources. If you have a successful JEDI build and want to test it further, use the overrides in `~/.swell/swell-test.yaml` to avoid rebuilding JEDI and using login nodes.)
+    ```sh
+    swell t2test <suite> -p <platform>
+    # The platform is specified with -p; it defaults to nccs_discover_sles15.
+    ```
+This works because, unless otherwise specified, `sbatch` automatically inherits all current environment variables, which you have already configured in step 2 above.
+
+4. Repeat (3) for other tests you would like to run. Currently, we recommend running the following tests:
+    - `3dvar`
+    - `hofx`
+    - `ufo_testing`
+    - `convert_ncdiags`
+    - `3dfgat_atmos`
+    - `build_jedi`
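
The documentation above says a tier 2 run builds JEDI unless `~/.swell/swell-test.yaml` points at an existing build. A minimal sketch of that decision, mirroring the condition this commit adds to `run_suite`, might look like the following. The helper name `needs_jedi_build` and the constant `REQUIRED_KEYS` are hypothetical and do not exist in swell; the config is passed in as a plain dictionary rather than read from disk.

```python
from typing import Any, Mapping

# Keys that must accompany `jedi_build_method: use_existing`
# (hypothetical constant, named here only for illustration).
REQUIRED_KEYS = (
    "existing_jedi_build_directory",
    "existing_jedi_source_directory",
)


def needs_jedi_build(test_config: Mapping[str, Any]) -> bool:
    """Return True when a tier 2 run would have to build JEDI first."""
    uses_existing = test_config.get("jedi_build_method") == "use_existing"
    has_paths = all(key in test_config for key in REQUIRED_KEYS)
    return not (uses_existing and has_paths)


complete = {
    "jedi_build_method": "use_existing",
    "existing_jedi_build_directory": "/path/to/jedi/build/directory",
    "existing_jedi_source_directory": "/path/to/jedi/source/directory",
}
# Same settings, but with one key name misspelled.
misspelled = {
    "jedi_build_method": "use_existing",
    "existing_jedi_build_directory": "/path/to/jedi/build/directory",
    "existing_jedi_source_directory_typo": "/path/to/jedi/source/directory",
}

print(needs_jedi_build(complete))
print(needs_jedi_build(misspelled))
```

One consequence of this all-or-nothing check is that a single misspelled key does not raise an error; the run quietly falls back to a full JEDI build.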
diff --git a/src/swell/deployment/prepare_config_and_suite/prepare_config_and_suite.py b/src/swell/deployment/prepare_config_and_suite/prepare_config_and_suite.py
@@ -18,6 +18,7 @@
 from swell.deployment.prepare_config_and_suite.question_and_answer_defaults import GetAnswerDefaults
 from swell.utilities.logger import Logger
 from swell.utilities.jinja2 import template_string_jinja2
+from swell.utilities.dictionary import update_dict
 
 
 # --------------------------------------------------------------------------------------------------
@@ -298,6 +299,14 @@ def override_with_external(self) -> None:
             with open(test_file, 'r') as ymlfile:
                 override_dict = yaml.safe_load(ymlfile)
 
+        # Update overrides with tier2 suite test file if available
+        tier2_test_file = os.path.join(get_swell_path(), 'test', 'suite_tests',
+                                       self.suite + '-tier2.yaml')
+        if os.path.exists(tier2_test_file):
+            with open(tier2_test_file, 'r') as ymlfile:
+                tier2_override_dict = yaml.safe_load(ymlfile)
+            override_dict = update_dict(override_dict, tier2_override_dict)
+
         # Now append with any user provided override
         if self.override is not None:
 

diff --git a/src/swell/swell.py b/src/swell/swell.py
@@ -16,7 +16,7 @@
 from swell.deployment.launch_experiment import launch_experiment
 from swell.tasks.base.task_base import task_wrapper, get_tasks
 from swell.test.test_driver import test_wrapper, valid_tests
-from swell.test.suite_tests.suite_tests import run_suite
+from swell.test.suite_tests.suite_tests import run_suite, TestSuite
 from swell.utilities.suite_utils import get_suites
 from swell.utilities.welcome_message import write_welcome_message
 from swell.utilities.scripts.utility_driver import get_utilities, utility_wrapper
@@ -244,15 +244,42 @@ def test(test: str) -> None:
 
 
 @swell_driver.command()
+@click.option('-p', '--platform', 'platform', type=click.Choice(get_platforms()),
+              default="nccs_discover_sles15", help=platform_help)
 @click.argument('suite', type=click.Choice(("hofx", "3dvar", "ufo_testing")))
-def t1test(suite: Literal["hofx", "3dvar", "ufo_testing"]) -> None:
+def t1test(
+    suite: Literal["hofx", "3dvar", "ufo_testing"],
+    platform: Optional[str] = "nccs_discover_sles15"
+) -> None:
     """
     Run a particular swell suite from the tier 1 tests.
 
     Arguments:
         suite (str): Name of the suite to run (e.g., hofx, 3dvar, ufo_testing)
     """
-    run_suite(suite)
+    run_suite(suite, platform, TestSuite.TIER1)
+
+
+# --------------------------------------------------------------------------------------------------
+
+
+@swell_driver.command()
+@click.option('-p', '--platform', 'platform', type=click.Choice(get_platforms()),
+              default="nccs_discover_sles15", help=platform_help)
+@click.argument('suite', type=click.Choice(("hofx", "3dvar", "ufo_testing",
+                                            "convert_ncdiags", "3dfgat_atmos", "build_jedi")))
+def t2test(
+    suite: Literal["hofx", "3dvar", "ufo_testing",
+                   "convert_ncdiags", "3dfgat_atmos", "build_jedi"],
+    platform: Optional[str] = "nccs_discover_sles15"
+) -> None:
+    """
+    Run a particular swell suite from the tier 2 tests.
+
+    Arguments:
+        suite (str): Name of the suite to run (e.g., hofx, 3dvar, ufo_testing)
+    """
+    run_suite(suite, platform, TestSuite.TIER2)
 
 
 # --------------------------------------------------------------------------------------------------

diff --git a/src/swell/test/suite_tests/build_jedi-tier1.yaml b/src/swell/test/suite_tests/build_jedi-tier1.yaml
@@ -0,0 +1,6 @@
+jedi_build_method: create
+bundles:
+- fv3-jedi
+- soca
+- iodaconv
+- ufo
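
This commit layers override files on top of each other with `update_dict` (tier 1 suite overrides, then `<suite>-tier2.yaml`, then user overrides). The sketch below illustrates that layering under the assumption that the merge is recursive, so nested keys absent from the later layer survive. `merge_overrides` is a hypothetical stand-in, not the implementation in `swell.utilities.dictionary`, and the keys in the example dictionaries are made up.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_overrides(base: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which `update` is layered recursively over `base`."""
    merged = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_overrides(merged[key], value)
        else:
            # Scalars, lists, and new keys: the later layer wins.
            merged[key] = deepcopy(value)
    return merged


# Made-up override layers, for illustration only.
tier1 = {"cycle_times": ["T00"], "model": {"resolution": "low", "window": "PT6H"}}
tier2 = {"model": {"resolution": "high"}, "extra_checks": True}

layered = merge_overrides(tier1, tier2)
print(layered)
```

If the real `update_dict` replaces nested dictionaries wholesale instead of merging them, a tier 2 file would need to restate every nested key it wants to keep.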
diff --git a/src/swell/test/suite_tests/suite_tests.py b/src/swell/test/suite_tests/suite_tests.py
@@ -5,19 +5,67 @@
 from pathlib import Path
 from datetime import datetime
 from importlib import resources
+from enum import Enum
 
 from swell.deployment.create_experiment import create_experiment_directory
 from swell.deployment.launch_experiment import launch_experiment
 from swell.utilities.dictionary import update_dict
 
 
-def run_suite(suite: str):
+class TestSuite(Enum):
+    TIER1 = "tier1"
+    TIER2 = "tier2"
+
+
+def build_jedi_for_tier2(test_dir: Path, experiment_id_root: str, platform: str, test_config: dict):
+    suite_overrides_file = (resources.files("swell") /
+                            "test" /
+                            "suite_tests" /
+                            "build_jedi-tier1.yaml")
+
+    with suite_overrides_file.open("r") as f:
+        suite_overrides = yaml.safe_load(f)
+
+    experiment_id = experiment_id_root + "build_jedi"
+
+    override = {
+        "experiment_id": experiment_id,
+        "experiment_root": str(test_dir),
+        **suite_overrides
+    }
+
+    if "override" in test_config:
+        override = update_dict(override, test_config['override'])
+
+    experiment_dir = test_dir / experiment_id
+    experiment_dir.mkdir(parents=True, exist_ok=True)
+    override_yml = experiment_dir / "override.yaml"
+
+    with open(override_yml, "w") as f:
+        yaml.dump(override, f)
+
+    create_experiment_directory(
+        "build_jedi", "defaults", platform,
+        str(override_yml), False, None
+    )
+
+    suite_path = str(experiment_dir / f"{experiment_id}-suite")
+    log_path = str(experiment_dir / "log")
+
+    launch_experiment(suite_path, True, log_path)
+
+    return experiment_dir
+
+
+def run_suite(suite: str, platform: str, test_tier: TestSuite):
     # Add a random int to the experiment_id to mitigate errors from workflows
     # created at (roughly) the same time.
     ii = random.randint(0, 99)
-    experiment_id = f"t{datetime.now().strftime('%Y%jT%H%M')}r{ii:02d}{suite}"
 
-    # Get test directory from `~/.swell/swell-test.yml`
+    experiment_id_root = f"t{datetime.now().strftime('%Y%jT%H%M')}r{ii:02d}"
+    experiment_id = f"{experiment_id_root}{suite}"
+
+    # Get test directory from `~/.swell/swell-test.yaml`
     test_config = {
         "test_root": Path(tempfile.TemporaryDirectory().name)
     }
@@ -47,6 +95,22 @@ def run_suite(suite: str):
     with suite_overrides_file.open("r") as f:
         suite_overrides = yaml.safe_load(f)
 
+    # If it exists, update suite overrides from (suite)-tier2.yaml
+    if test_tier == TestSuite.TIER2:
+        tier2_suite_overrides_file = (resources.files("swell") /
+                                      "test" /
+                                      "suite_tests" /
+                                      f"{suite}-tier2.yaml")
+        if Path(tier2_suite_overrides_file).exists():
+            with open(tier2_suite_overrides_file, 'r') as f:
+                tier2_suite_overrides = yaml.safe_load(f)
+            print("Updating suite with tier 2 overrides " +
+                  f"from: {tier2_suite_overrides_file}")
+            suite_overrides = update_dict(suite_overrides, tier2_suite_overrides)
+        else:
+            print(f"Could not find tier 2 override file for {suite}," +
+                  " defaulting to tier 1 overrides")
+
     override = {
         "experiment_id": experiment_id,
         "experiment_root": str(testdir),
@@ -71,12 +135,29 @@ def run_suite(suite: str):
     experiment_dir = testdir / experiment_id
     experiment_dir.mkdir(parents=True, exist_ok=True)
 
+    # Build JEDI for tier 2 tests if existing build is not specified in user yaml
+    if test_tier == TestSuite.TIER2:
+        if not ("jedi_build_method" in test_config
+           and test_config["jedi_build_method"] == "use_existing"
+           and 'existing_jedi_source_directory' in test_config
+           and 'existing_jedi_build_directory' in test_config):
+            jedi_dir = build_jedi_for_tier2(testdir, experiment_id_root, platform, test_config)
+
+            tier2_override = {"jedi_build_method": "use_existing",
+                              "existing_jedi_source_directory": f"{jedi_dir}/jedi_bundle/source",
+                              "existing_jedi_build_directory": f"{jedi_dir}/jedi_bundle/build"}
+
+            override = update_dict(override, tier2_override)
+
+            if suite == "build_jedi":
+                return None
+
     override_yml = experiment_dir / "override.yaml"
     with open(override_yml, "w") as f:
         yaml.dump(override, f)
 
     create_experiment_directory(
-        suite, "defaults", "nccs_discover_sles15",
+        suite, "defaults", platform,
         str(override_yml), False, None
     )
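
The experiment ID scheme in `run_suite` can be sketched in isolation. The format string below is taken from the diff; the function name `make_experiment_id` and its injectable `now` and `rng` parameters are hypothetical additions that make the behavior reproducible for testing.

```python
import random
from datetime import datetime
from typing import Optional, Tuple


def make_experiment_id(
    suite: str,
    now: Optional[datetime] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Return (experiment_id_root, experiment_id) for a suite test."""
    now = now or datetime.now()
    rng = rng or random.Random()
    # Two-digit random suffix to reduce collisions between workflows
    # created within the same minute.
    ii = rng.randint(0, 99)
    # %Y = four-digit year, %j = zero-padded day of year, then hour and minute.
    root = f"t{now.strftime('%Y%jT%H%M')}r{ii:02d}"
    return root, f"{root}{suite}"


root, experiment_id = make_experiment_id(
    "hofx", now=datetime(2025, 1, 3, 14, 5), rng=random.Random(0)
)
# The JEDI build experiment in a tier 2 run shares the same root.
build_id = root + "build_jedi"
print(experiment_id, build_id)
```

Because the suite experiment and the `build_jedi` experiment share one root, the two directories created by a single tier 2 run sort next to each other under `test_root`.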