Get zstd compression in netcdf on wcoss2 operation #2319
Comments
@BrianCurtis-NOAA I have added zstd-enabled netcdf and hdf5 combinations on acorn. Please test them: module use /lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/modulefiles/mpi/intel/19.1.3.304/cray-mpich/8.1.9 This has reproduced the UFS compression tests previously done by Dusan. |
@Hang-Lei-NOAA Please point us at your version of modulefiles/ufs_acorn.intel.lua you used for testing. Thanks. |
Hi, All, Please copy both |
Thanks. Where is 'zstd' module loaded? |
@DusanJovic-NOAA Please update the ufs_common.lua file again. |
@Hang-Lei-NOAA @edwardhartnett since acorn is still not available, is there a way we can move this forward? Several UFS applications are waiting to run experiments with this feature. Thank you! |
@junwang-noaa We have to wait for a while. Acorn will be back soon.
|
@Hang-Lei-NOAA I've used the modulefiles from your runs with Dusan a while ago, and the tests that use ZSTANDARD_LEVEL=5 have passed. Do we need to run the full suite with these tests, or should this be enough to move things forward? |
Yes, please!
|
Is there more testing that will be done, or is the UFS team confident this works? |
all failed due to wallclock
Last bit before it stops running, seems early in the process:
None of these tests use IDEFLATE=1 or ZSTANDARD_LEVEL=5, but maybe they need it with these lib changes? |
@Hang-Lei-NOAA Did you build ESMF with netcdf/zstd? The PIO issue seems to appear in this ESMF build too. Can you check how ESMF is built with netcdf/zlib? Since the atm-land test does not use compression at all (both ideflate and zstandard_level = 0), it should not be affected, but now we see the PIO issue.
|
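One way to check whether the netCDF build on the module path reports zstd support is to ask `nc-config`. The following is a minimal sketch, assuming the loaded netcdf-c is recent enough that `nc-config` accepts `--has-zstd` and prints `yes` or `no`; the function name `netcdf_has_zstd` is hypothetical, and the function returns `None` whenever the answer cannot be determined. It says nothing about how ESMF or PIO were built.

```python
import subprocess


def netcdf_has_zstd(run=subprocess.run, nc_config="nc-config"):
    """Return True/False as reported by `nc-config --has-zstd`, or None if unknown.

    Assumption: the nc-config on PATH supports the --has-zstd option.
    `run` is injectable so the function can be exercised without netCDF installed.
    """
    try:
        result = run(
            [nc_config, "--has-zstd"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # nc-config is not on PATH (no netcdf module loaded, for example).
        return None
    if result.returncode != 0:
        return None
    answer = result.stdout.strip().lower()
    if answer in ("yes", "no"):
        return answer == "yes"
    # Older nc-config versions may print usage text instead of yes/no.
    return None


if __name__ == "__main__":
    print("netCDF reports zstd support:", netcdf_has_zstd())
```

Note that netcdf-c applies zstd through an HDF5 filter plugin, so even a build that reports `yes` needs `HDF5_PLUGIN_PATH` to point at the plugin directory at run time.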
@junwang-noaa This ESMF build used the system-installed esmf-C, which is not built with zstd.
Log quoted from Jun Wang's message of Fri, Aug 23, 2024, 12:55 PM:
20240821 211108.437 WARNING PET150 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF Unable to open existing file: INPUT/oro_data.tile1.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
20240821 211108.438 WARNING PET150 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF Unable to open existing file: INPUT/oro_data.tile2.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
20240821 211108.438 WARNING PET150 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF Unable to open existing file: INPUT/oro_data.tile3.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
20240821 211108.439 WARNING PET150 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF Unable to open existing file: INPUT/oro_data.tile4.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
20240821 211108.440 WARNING PET150 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF Unable to open existing file: INPUT/oro_data.tile5.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
20240821 211108.440 WARNING PET150 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF Unable to open existing file: INPUT/oro_data.tile6.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
20240821 211108.440 ERROR PET150 ESMCI_PIO_Handler.C:617 ESMCI::PIO_Handler::arrayReadOne Unable to read from file - file not open
20240821 211108.440 ERROR PET150 ESMCI_IO_Handler.C:405 ESMCI::IO_Handler::arrayRead() Unable to read from file - Internal subroutine call returned Error
20240821 211108.440 ERROR PET150 ESMCI_IO.C:382 ESMCI::IO::read() Unable to read from file - Internal subroutine call returned Error
20240821 211108.440 ERROR PET150 ESMCI_IO.C:282 ESMCI::IO::read() Unable to read from file - Internal subroutine call returned Error
20240821 211108.440 ERROR PET150 ESMCI_IO_F.C:210 c_esmc_ioread() Unable to read from file - Internal subroutine call returned Error
20240821 211108.440 ERROR PET150 ESMF_IO.F90:397 ESMF_IOAddArray() Unable to read from file - Internal subroutine call returned Error
20240821 211108.440 ERROR PET150 ESMF_FieldBundle.F90:14436 ESMF_FieldBundleRead() Unable to read from file - Internal subroutine call returned Error
|
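When a PET log contains many warnings like the ones quoted in this thread, it can help to group the files that failed to open by the reason reported. The following is a rough sketch, assuming the warning layout matches the lines quoted in this thread; the names `OPEN_FAILURE`, `failed_opens`, and `SAMPLE` are hypothetical, and the sample text is copied from the log in this thread.

```python
import re
from collections import defaultdict

# Matches "Unable to open existing file" warnings in the layout quoted in this thread.
# The reason may start on the next line, so whitespace (including newlines) is allowed.
OPEN_FAILURE = re.compile(
    r"(?P<pet>PET\d+)\s+\S+\s+\S+\s+Unable to open existing file:\s*"
    r"(?P<path>[^,\s]+),\s*\(PIO/PNetCDF error\s*=\s*(?P<reason>[^)]*)\)"
)


def failed_opens(log_text):
    """Return {reason: [file paths]} for every open failure found in log_text."""
    by_reason = defaultdict(list)
    for match in OPEN_FAILURE.finditer(log_text):
        # Collapse line wraps inside the reason into single spaces.
        reason = " ".join(match.group("reason").split())
        by_reason[reason].append(match.group("path"))
    return dict(by_reason)


# Two warnings taken from the thread, one wrapped and one on a single line.
SAMPLE = """\
20240821 211108.437 WARNING PET150 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF Unable to open existing file: INPUT/oro_data.tile1.nc, (PIO/PNetCDF error =
NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
20240821 211108.438 WARNING PET150 ESMCI_PIO_Handler.C:1404 ESMCI::PIO_Handler::openOneTileF Unable to open existing file: INPUT/oro_data.tile2.nc, (PIO/PNetCDF error = NetCDF: Attempt to use feature that was not turned on when netCDF was built.)
20240821 211108.440 ERROR PET150 ESMCI_PIO_Handler.C:617 ESMCI::PIO_Handler::arrayReadOne Unable to read from file - file not open
"""

if __name__ == "__main__":
    for reason, paths in failed_opens(SAMPLE).items():
        print(f"{len(paths)} file(s): {reason}")
        for path in paths:
            print("  ", path)
```

The follow-on ERROR lines are ignored on purpose, since they only report that the earlier open failed.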
@Hang-Lei-NOAA Can you check if the esmf library is loaded correctly in Brian's testing?
|
@BrianCurtis-NOAA That is my fault. I thought you were skipping the atmlnd test on acorn, so when you asked me to review the modulefiles last time, we kept the system esmf-C library.
@junwang-noaa From your email, you want the atmlnd case fixed, so I have further revised the modulefile to use all of my installed libraries.
Brian, please use this for tests: /lfs/h1/emc/nceplibs/noscrub/Hang.Lei/works/clibs/modulefiles/ufs_acorn.intel.lua.brian
Modulefile path quoted from Jun Wang's message of Fri, Aug 23, 2024, 1:28 PM:
/lfs/h1/emc/nems/noscrub/brian.curtis/git/BrianCurtis-NOAA/ufs-weather-model/netcdf_zstd/modulefiles/ufs_acorn.intel.lua
|
You're correct, they're ignored on WCOSS2. This testing is specifically for WCOSS2, since spack-stack is the official source for Acorn. I would say then that this is a successful test on Acorn for netcdf and zstd. @junwang-noaa do you agree? |
@Hang-Lei-NOAA I am not asking you to fix the atmlnd case. These tests currently work on Acorn in the develop branch with the spack-stack library (please see the links below); they are skipped on wcoss2 and NOAA cloud. They failed on acorn when Brian tested the model with the new acorn modulefile that includes the zstd netcdf library updates. https://github.com/ufs-community/ufs-weather-model/blob/develop/modulefiles/ufs_acorn.intel.lua https://github.com/ufs-community/ufs-weather-model/blob/develop/tests/rt.conf#L307 |
@junwang-noaa my acorn modulefile is modified from the wcoss2 modulefile, so it acts more like WCOSS2 than Acorn. |
I didn't want to hack too much of the rt system to make it run the wcoss2 tests only as well. So it runs the Acorn tests. |
My new modified modulefiles (skipping GDIT libraries) also pass all of these tests.
|
@Hang-Lei-NOAA would you please post the RT test log from your modified new module files? |
@junwang-noaa I will rerun the case soon and post the log here in an hour or so. Because of the disk quota limit I hit while adding an extra lib to spack-stack 1.6.0 last week, I cleaned my space, including the UFS tests.
|
Hang's modified modulefile (with .brian at the end) shows the same as before, but since those tests are not run on WCOSS2, we should be able to proceed. @junwang-noaa are you OK with this? |
The UFS model develop branch runs on acorn, but it loads the ufs_wcoss2.intel.lua file at run time. |
Yes, for clarification, I'm using a modulefile modified from the WCOSS2 one but running the Acorn RT tests. For future testing, it might be helpful to have a setup that makes rt.sh think it's running WCOSS2 tests. |
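The idea of running only the WCOSS2 test set on Acorn could be prototyped outside rt.sh by filtering rt.conf for a chosen machine name. The following is a minimal sketch under loud assumptions: it assumes `RUN` lines are pipe-separated with the test name in the second column and a machine field in the third, where an empty field means all machines, `- a b` excludes the listed machines, and `+ a b` restricts to them. The test names in `SAMPLE` and the helpers `runs_on` and `select_tests` are hypothetical and are not part of rt.sh.

```python
def runs_on(machine_field, machine):
    """Decide whether a test runs on `machine`, given its machine field.

    Assumed field syntax: "" (all machines), "- a b" (all except a, b),
    "+ a b" (only a, b).
    """
    field = machine_field.strip()
    if not field:
        return True
    sign, *names = field.split()
    if sign == "+":
        return machine in names
    if sign == "-":
        return machine not in names
    raise ValueError(f"unrecognized machine field: {machine_field!r}")


def select_tests(conf_text, machine):
    """Return the names of RUN entries that would run on `machine`."""
    selected = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        cols = [col.strip() for col in line.split("|")]
        if len(cols) < 3 or cols[0] != "RUN":
            continue
        if runs_on(cols[2], machine):
            selected.append(cols[1])
    return selected


# Hypothetical entries in the assumed layout.
SAMPLE = """\
# comment line
RUN | example_atm    |                    | baseline |
RUN | example_atmlnd | - wcoss2 noaacloud | baseline |
"""

if __name__ == "__main__":
    # Selecting with the name "wcoss2" while running on Acorn mimics the WCOSS2 test set.
    print(select_tests(SAMPLE, "wcoss2"))
    print(select_tests(SAMPLE, "acorn"))
```

Passing `wcoss2` as the machine name while physically running on Acorn gives the list of tests that WCOSS2 would run, which is the behavior the comment above asks for.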
Description
zstd compression in netcdf has been tested before. We now need to have it on wcoss2.
We will fully test on our end, then get zstd onto the operational machines and deliver the whole package.