Skip to content

Commit

Permalink
[NASA] Update DaCe to v0.15.1 RC (#41)
Browse files Browse the repository at this point in the history
* NASA commits sync (#31)

* Initialize GeosDycoreWrapper with bdt (timestep)

* Use GEOS version of constants

* 1. Add qcld to the list of tracers beings advected
2. Made GEOS specific changes to thresholds in saturation adjustment

* Accumulate diss_est

* Allow GEOS_WRAPPER to process device data

* Add clear to collector for 3rd party use. GEOS pass down timings to caller

* Make kernel analysis run a copy stencil to compute local bandwith
Parametrize tool with backend, output format

* Move constant on a env var
Add saturation adjustement threshold to const

* lint

* More linting

* Remove unused if leading to empty code block

* Restrict dace to 0.14.1 due to a parsing bug

* Add guard for bdt==0
Fix bad merge for bdt with GEOS_Wrapper

* Remove unused code

* Fix theroritical timings
Lint

* Fixed a bug where pkz was being calculated twice, and the second calc was wrong

* Downgrade DaCe to 0.14.0 pending array aliasing fix

* Set default cache path for orchestrated DaCe to respect GT_CACHE_* env

* Remove previous per stencil override of default_build_folder

* Revert "Set default cache path for orchestrated DaCe to respect GT_CACHE_* env"

This reverts commit 4fc5b4d.

* Revert "Remove previous per stencil override of default_build_folder"

This reverts commit 2245027.

* Read cache_root in default dace backend

* Document faulty behavior with GT_CACHE_DIR_NAME

* Fix bad requirements syntax

* Check for the string value of CONST_VERSION directly instead of enum

* Protect constant selection more rigorusly.
Clean abort on unknown constant given

* Log constants selection

* Refactor NQ to constants.py

* Replace all logger with pace_log
Introduce PACE_LOGLEVEL to control log level from outside

* Code guidelines clean up

* Devops/GitHub actions on (#15)

* Linting on PR

* Run main unit test

* Update python to available 3.8.12

* Remove cd to pace

* Lint: git submodule recursive

* Typo

* Add openmpi to the image

* Linting

* Fix unit tests (remove dxa, dya rely on halo ex)

* typo

* Change name of jobs

* Distributed compilation on orchestrated backend for NxN layouts (#14)

* Adapt orchestration distribute compile for NxN layout

* Remove debug code

* Add a more descriptive string base postfix for cache naming
Identify the code path for all cases
Consistent reload post-compile
Create a central space for all caches generation logic
No more original layout check required

* Add a test on caches relocatability

* Verbose todo

* Linting on PR

* Run main unit test

* Update python to available 3.8.12

* Remove cd to pace

* Lint: git submodule recursive

* Typo

* Add openmpi to the image

* Linting

* Fix unit tests (remove dxa, dya rely on halo ex)

* typo

* Change name of jobs

* Missing enum

* Lint imports

* Fix unit tests

* Deactivate relocability test due to Python crash
Logged as issyue 16

* Typo

* Raise for 1,X and X,1 layouts which requires a new descriptor

* Added ak, bk for 137 levels in eta.py

* Add floating point precision to GEOS bridge init

* lint

* Add device PCI bus id (for MPS debug)

* Typo + lint

* Try to detect MPS reading the "log" pipe

* Lint

* Clean up

* Log info GEOS bridge (#18)

* Add floating point precision to GEOS bridge init

* lint

* Add device PCI bus id (for MPS debug)

* Typo + lint

* Try to detect MPS reading the "log" pipe

* Lint

* Clean up

* Update geos/develop to grab NOAA PR9 results (#21)

* Verbose choice of block/grid size

* added build script for c5

* updated repo to NOAA

* GEOS integration (#9)

* Initialize GeosDycoreWrapper with bdt (timestep)

* Use GEOS version of constants

* 1. Add qcld to the list of tracers beings advected
2. Made GEOS specific changes to thresholds in saturation adjustment

* Accumulate diss_est

* Allow GEOS_WRAPPER to process device data

* Add clear to collector for 3rd party use. GEOS pass down timings to caller

* Make kernel analysis run a copy stencil to compute local bandwith
Parametrize tool with backend, output format

* Move constant on a env var
Add saturation adjustement threshold to const

* Remove unused if leading to empty code block

* Restrict dace to 0.14.1 due to a parsing bug

* Add guard for bdt==0
Fix bad merge for bdt with GEOS_Wrapper

* Remove unused code

* Fix theroritical timings

* Fixed a bug where pkz was being calculated twice, and the second calc was wrong

* Downgrade DaCe to 0.14.0 pending array aliasing fix

* Set default cache path for orchestrated DaCe to respect GT_CACHE_* env

* Remove previous per stencil override of default_build_folder

* Revert "Set default cache path for orchestrated DaCe to respect GT_CACHE_* env"

* Revert "Remove previous per stencil override of default_build_folder"

* Read cache_root in default dace backend

* Document faulty behavior with GT_CACHE_DIR_NAME

* Fix bad requirements syntax

* Check for the string value of CONST_VERSION directly instead of enum

* Protect constant selection more rigorusly.
Clean abort on unknown constant given

* Log constants selection

* Refactor NQ to constants.py

* Fix or explain inlined import

* Verbose runtime error when bad dt_atmos

* Verbose warm up

* re-initialize heat_source and diss_est each call, add do_skeb check to accumulation

---------

Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>

---------

Co-authored-by: Rusty Benson <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>
Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>

* [NOAA:Update] Bring back #15 & doubly periodic domain (#25)

* Feature/dp driver (#13)

* initial commit

* adding test config

* adding the rest of driver and util code

* updating history.md

* move u_max to dycore config

* uncomment assert

* added comment explaining the copy of grid type to dycore config

* Turn main unit test  & lint on PR, logger clean up [NASA:Update]  (#15)

* Initialize GeosDycoreWrapper with bdt (timestep)

* Use GEOS version of constants

* 1. Add qcld to the list of tracers beings advected
2. Made GEOS specific changes to thresholds in saturation adjustment

* Accumulate diss_est

* Allow GEOS_WRAPPER to process device data

* Add clear to collector for 3rd party use. GEOS pass down timings to caller

* Make kernel analysis run a copy stencil to compute local bandwith
Parametrize tool with backend, output format

* Move constant on a env var
Add saturation adjustement threshold to const

* Restrict dace to 0.14.1 due to a parsing bug

* Add guard for bdt==0

* Fix theroritical timings

* Fixed a bug where pkz was being calculated twice, and the second calc was wrong

* Downgrade DaCe to 0.14.0 pending array aliasing fix

* Set default cache path for orchestrated DaCe to respect GT_CACHE_* env

* Remove previous per stencil override of default_build_folder

* Revert "Set default cache path for orchestrated DaCe to respect GT_CACHE_* env"

* Read cache_root in default dace backend

* Document faulty behavior with GT_CACHE_DIR_NAME

* Check for the string value of CONST_VERSION directly instead of enum

* Protect constant selection more rigorusly.
Clean abort on unknown constant given

* Log constants selection

* Refactor NQ to constants.py

* Introduce PACE_LOGLEVEL to control log level from outside

* Code guidelines clean up

* Devops/GitHub actions on (#15)

* Linting on PR

* Run main unit test

* Update python to available 3.8.12

* Fix unit tests (remove dxa, dya rely on halo ex)

* Update HISTORY.md

* Adapt log_level in driver.run

* Verbose the PACE_CONSTANTS

* Doc log level hierarchical nature

---------

Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Purnendu Chakraborty <[email protected]>

* Lint

---------

Co-authored-by: Oliver Elbert <[email protected]>
Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Purnendu Chakraborty <[email protected]>

* Update gt4py, dace, cleanup (#19)

* Update gt4py to top of master on June 21

* Update DaCe to 0.14.2
Workaround aliasing issue in FiniteVolumeTransport

* Fix to gt4py storage

* Downgrade to dace 0.14.1

* DaCe to 0.14.4
Orchestrating NonHydrostaticPressureGradient
Adptating code to newer gt4py

* Regenerate constraints.txt

* Default constants to GFS
Fix snapshot for GPU runs
Lint on ETA
Fix log level

* Remove `daint_venv` submodule

* Adding dace as a submodule
Removing buildenv as a submodule

* Update gt4py to latest master

* Skip ConstantPropagation during `Simplify`

* Remove buidlenv

* Update requirements_dev.txt

* Add editable util to requirements_dev.txt

* lint

* scipy for tests is now needed

* Pin `DaCe` to pace-fixes-0 merge

* Remove logging setup in test_translate

* Make cupy import robust to device not being available

* Fix to GEOS bridge MPS detection

* Up gt4py to August 14th EOD:
  - Hip/ROCm
  - New allocators

* DaCE module: swap SSH for HTTPS (#26)

* GEOS GridTools stencils build override (#27)

* Stencil build override for GEOS

* Deactivate warnings if PACE_LOGLEVEL is > WARNING

* Better log level

* Bad merge (again)

* NASA fork sync. (#37) (#30)

* Initialize GeosDycoreWrapper with bdt (timestep)

* Use GEOS version of constants

* 1. Add qcld to the list of tracers beings advected
2. Made GEOS specific changes to thresholds in saturation adjustment

* Accumulate diss_est

* Allow GEOS_WRAPPER to process device data

* Add clear to collector for 3rd party use. GEOS pass down timings to caller

* Make kernel analysis run a copy stencil to compute local bandwith
Parametrize tool with backend, output format

* Move constant on a env var
Add saturation adjustement threshold to const

* lint

* More linting

* Remove unused if leading to empty code block

* Restrict dace to 0.14.1 due to a parsing bug

* Add guard for bdt==0
Fix bad merge for bdt with GEOS_Wrapper

* Remove unused code

* Fix theroritical timings
Lint

* Fixed a bug where pkz was being calculated twice, and the second calc was wrong

* Downgrade DaCe to 0.14.0 pending array aliasing fix

* Set default cache path for orchestrated DaCe to respect GT_CACHE_* env

* Remove previous per stencil override of default_build_folder

* Revert "Set default cache path for orchestrated DaCe to respect GT_CACHE_* env"

This reverts commit 4fc5b4d.

* Revert "Remove previous per stencil override of default_build_folder"

This reverts commit 2245027.

* Read cache_root in default dace backend

* Document faulty behavior with GT_CACHE_DIR_NAME

* Fix bad requirements syntax

* Check for the string value of CONST_VERSION directly instead of enum

* Protect constant selection more rigorusly.
Clean abort on unknown constant given

* Log constants selection

* Refactor NQ to constants.py

* Replace all logger with pace_log
Introduce PACE_LOGLEVEL to control log level from outside

* Code guidelines clean up

* Devops/GitHub actions on (#15)

* Linting on PR

* Run main unit test

* Update python to available 3.8.12

* Remove cd to pace

* Lint: git submodule recursive

* Typo

* Add openmpi to the image

* Linting

* Fix unit tests (remove dxa, dya rely on halo ex)

* typo

* Change name of jobs

* Distributed compilation on orchestrated backend for NxN layouts (#14)

* Adapt orchestration distribute compile for NxN layout

* Remove debug code

* Add a more descriptive string base postfix for cache naming
Identify the code path for all cases
Consistent reload post-compile
Create a central space for all caches generation logic
No more original layout check required

* Add a test on caches relocatability

* Verbose todo

* Linting on PR

* Run main unit test

* Update python to available 3.8.12

* Remove cd to pace

* Lint: git submodule recursive

* Typo

* Add openmpi to the image

* Linting

* Fix unit tests (remove dxa, dya rely on halo ex)

* typo

* Change name of jobs

* Missing enum

* Lint imports

* Fix unit tests

* Deactivate relocability test due to Python crash
Logged as issyue 16

* Typo

* Raise for 1,X and X,1 layouts which requires a new descriptor

* Added ak, bk for 137 levels in eta.py

* Add floating point precision to GEOS bridge init

* lint

* Add device PCI bus id (for MPS debug)

* Typo + lint

* Try to detect MPS reading the "log" pipe

* Lint

* Clean up

* Log info GEOS bridge (#18)

* Add floating point precision to GEOS bridge init

* lint

* Add device PCI bus id (for MPS debug)

* Typo + lint

* Try to detect MPS reading the "log" pipe

* Lint

* Clean up

* Update geos/develop to grab NOAA PR9 results (#21)

* Verbose choice of block/grid size

* added build script for c5

* updated repo to NOAA

* GEOS integration (#9)

* Initialize GeosDycoreWrapper with bdt (timestep)

* Use GEOS version of constants

* 1. Add qcld to the list of tracers beings advected
2. Made GEOS specific changes to thresholds in saturation adjustment

* Accumulate diss_est

* Allow GEOS_WRAPPER to process device data

* Add clear to collector for 3rd party use. GEOS pass down timings to caller

* Make kernel analysis run a copy stencil to compute local bandwith
Parametrize tool with backend, output format

* Move constant on a env var
Add saturation adjustement threshold to const

* Remove unused if leading to empty code block

* Restrict dace to 0.14.1 due to a parsing bug

* Add guard for bdt==0
Fix bad merge for bdt with GEOS_Wrapper

* Remove unused code

* Fix theroritical timings

* Fixed a bug where pkz was being calculated twice, and the second calc was wrong

* Downgrade DaCe to 0.14.0 pending array aliasing fix

* Set default cache path for orchestrated DaCe to respect GT_CACHE_* env

* Remove previous per stencil override of default_build_folder

* Revert "Set default cache path for orchestrated DaCe to respect GT_CACHE_* env"

* Revert "Remove previous per stencil override of default_build_folder"

* Read cache_root in default dace backend

* Document faulty behavior with GT_CACHE_DIR_NAME

* Fix bad requirements syntax

* Check for the string value of CONST_VERSION directly instead of enum

* Protect constant selection more rigorusly.
Clean abort on unknown constant given

* Log constants selection

* Refactor NQ to constants.py

* Fix or explain inlined import

* Verbose runtime error when bad dt_atmos

* Verbose warm up

* re-initialize heat_source and diss_est each call, add do_skeb check to accumulation

---------




---------






* [NOAA:Update] Bring back #15 & doubly periodic domain (#25)

* Feature/dp driver (#13)

* initial commit

* adding test config

* adding the rest of driver and util code

* updating history.md

* move u_max to dycore config

* uncomment assert

* added comment explaining the copy of grid type to dycore config

* Turn main unit test  & lint on PR, logger clean up [NASA:Update]  (#15)

* Initialize GeosDycoreWrapper with bdt (timestep)

* Use GEOS version of constants

* 1. Add qcld to the list of tracers beings advected
2. Made GEOS specific changes to thresholds in saturation adjustment

* Accumulate diss_est

* Allow GEOS_WRAPPER to process device data

* Add clear to collector for 3rd party use. GEOS pass down timings to caller

* Make kernel analysis run a copy stencil to compute local bandwith
Parametrize tool with backend, output format

* Move constant on a env var
Add saturation adjustement threshold to const

* Restrict dace to 0.14.1 due to a parsing bug

* Add guard for bdt==0

* Fix theroritical timings

* Fixed a bug where pkz was being calculated twice, and the second calc was wrong

* Downgrade DaCe to 0.14.0 pending array aliasing fix

* Set default cache path for orchestrated DaCe to respect GT_CACHE_* env

* Remove previous per stencil override of default_build_folder

* Revert "Set default cache path for orchestrated DaCe to respect GT_CACHE_* env"

* Read cache_root in default dace backend

* Document faulty behavior with GT_CACHE_DIR_NAME

* Check for the string value of CONST_VERSION directly instead of enum

* Protect constant selection more rigorusly.
Clean abort on unknown constant given

* Log constants selection

* Refactor NQ to constants.py

* Introduce PACE_LOGLEVEL to control log level from outside

* Code guidelines clean up

* Devops/GitHub actions on (#15)

* Linting on PR

* Run main unit test

* Update python to available 3.8.12

* Fix unit tests (remove dxa, dya rely on halo ex)

* Update HISTORY.md

* Adapt log_level in driver.run

* Verbose the PACE_CONSTANTS

* Doc log level hierarchical nature

---------




* Lint

---------





* Update gt4py, dace, cleanup (#19)

* Update gt4py to top of master on June 21

* Update DaCe to 0.14.2
Workaround aliasing issue in FiniteVolumeTransport

* Fix to gt4py storage

* Downgrade to dace 0.14.1

* DaCe to 0.14.4
Orchestrating NonHydrostaticPressureGradient
Adptating code to newer gt4py

* Regenerate constraints.txt

* Default constants to GFS
Fix snapshot for GPU runs
Lint on ETA
Fix log level

* Remove `daint_venv` submodule

* Adding dace as a submodule
Removing buildenv as a submodule

* Update gt4py to latest master

* Skip ConstantPropagation during `Simplify`

* Remove buidlenv

* Update requirements_dev.txt

* Add editable util to requirements_dev.txt

* lint

* scipy for tests is now needed

* Pin `DaCe` to pace-fixes-0 merge

* Remove logging setup in test_translate

* Make cupy import robust to device not being available

* Fix to GEOS bridge MPS detection

* Up gt4py to August 14th EOD:
  - Hip/ROCm
  - New allocators

* DaCE module: swap SSH for HTTPS (#26)

* GEOS GridTools stencils build override (#27)

* Stencil build override for GEOS

* Deactivate warnings if PACE_LOGLEVEL is > WARNING

* Better log level

* Bad merge (again)

---------

Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Rusty Benson <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>

---------

Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Rusty Benson <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>

* Update DaCe to 0.15.1 RC (#35)

- Update: DaCe to 0.15.1 RC and GT4Py to latest main 
- Minor: orchestration build logging 
- Minor: dead code clean up

---------

Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Purnendu Chakraborty <[email protected]>
Co-authored-by: Rusty Benson <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>
  • Loading branch information
6 people authored Dec 7, 2023
1 parent 771ca7c commit 4b26398
Show file tree
Hide file tree
Showing 5 changed files with 15 additions and 25 deletions.
25 changes: 9 additions & 16 deletions dsl/pace/dsl/dace/orchestration.py
Original file line number Diff line number Diff line change
Expand Up @@ -175,9 +175,8 @@ def _build_sdfg(
memory_pooled += arr.total_size * arr.dtype.bytes
arr.lifetime = dace.AllocationLifetime.Scope
memory_pooled = float(memory_pooled) / (1024 * 1024)
DaCeProgress.log(
DaCeProgress.default_prefix(config),
f"Pooled {memory_pooled} mb",
pace_log.debug(
f"{DaCeProgress.default_prefix(config)} Pooled {memory_pooled} mb",
)

# Set of debug tools inserted in the SDFG when dace.conf "syncdebug"
Expand All @@ -195,12 +194,10 @@ def _build_sdfg(

# Printing analysis of the compiled SDFG
with DaCeProgress(config, "Build finished. Running memory static analysis"):
DaCeProgress.log(
DaCeProgress.default_prefix(config),
report_memory_static_analysis(
sdfg, memory_static_analysis(sdfg), False
),
report = report_memory_static_analysis(
sdfg, memory_static_analysis(sdfg), False
)
pace_log.info(f"{DaCeProgress.default_prefix(config)} {report}")

# Compilation done.
# On Build: all ranks sync, then exit.
Expand All @@ -211,17 +208,13 @@ def _build_sdfg(
# a true multi-machine sync, outside of our own communicator class.
if config.get_orchestrate() == DaCeOrchestration.Build:
MPI.COMM_WORLD.Barrier() # Protect against early exist which kill SLURM jobs
DaCeProgress.log(
DaCeProgress.default_prefix(config),
"Build only, exiting.",
)
pace_log.info(f"{DaCeProgress.default_prefix(config)} Build only, exiting.")
exit(0)
elif config.get_orchestrate() == DaCeOrchestration.BuildAndRun:
if not is_compiling:
DaCeProgress.log(
DaCeProgress.default_prefix(config),
"Rank is not compiling. "
"Waiting for compilation to end on all other ranks...",
pace_log.info(
f"{DaCeProgress.default_prefix(config)} Rank is not compiling."
"Waiting for compilation to end on all other ranks..."
)
MPI.COMM_WORLD.Barrier()

Expand Down
8 changes: 2 additions & 6 deletions dsl/pace/dsl/dace/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,21 +26,17 @@ def __init__(self, config: DaceConfig, label: str):
self.prefix = f"[{config.get_orchestrate()}]"
self.label = label

@classmethod
def log(cls, prefix: str, message: str):
pace_log.debug(f"{prefix} {message}")

@classmethod
def default_prefix(cls, config: DaceConfig) -> str:
return f"[{config.get_orchestrate()}]"

def __enter__(self):
DaCeProgress.log(self.prefix, f"{self.label}...")
pace_log.debug(self.prefix, f"{self.label}...")
self.start = time.time()

def __exit__(self, _type, _val, _traceback):
elapsed = time.time() - self.start
DaCeProgress.log(self.prefix, f"{self.label}...{elapsed}s.")
pace_log.debug(self.prefix, f"{self.label}...{elapsed}s.")


def _is_ref(sd: dace.sdfg.SDFG, aname: str):
Expand Down
2 changes: 1 addition & 1 deletion external/dace
Submodule dace updated 189 files
2 changes: 1 addition & 1 deletion external/gt4py
Submodule gt4py updated 161 files
3 changes: 2 additions & 1 deletion fv3core/tests/savepoint/translate/translate_a2b_ord4.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ def __call__(
vort,
delpc,
dt,
grid_type,
grid_type: int,
):
# this function is kept because it has a translate test, if its
# structure is changed significantly from __call__ of DivergenceDamping
Expand Down Expand Up @@ -80,5 +80,6 @@ def compute_from_storage(self, inputs):
nord_col,
nord_col,
)
inputs["grid_type"] = 0
self.compute_obj(divdamp, **inputs)
return inputs

0 comments on commit 4b26398

Please sign in to comment.