Issue reading c180 GEOS-IT mass flux in c180 GCHP simulation #2862
Comments
Well, since this is ExtData related I'm adding @bena-nasa. But I'll also mention @tclune and @atrayano since they know this area well too. Note that MAPL 2.26 is fairly old by now, so perhaps this has been fixed? Though if so, the fix might have been in ExtData2, not ExtData1.
@yuanjianz You already tried what I was going to suggest: changing the "H" in the fourth column to "N", which gets you around the issue. As you say, if they are on the same grid there is no need to regrid. My first guess is that something in @tclune's implementation of the flux regridder is getting confused, since it should not be regridding at all other than with a trivial identity regridder.
It suddenly came to me that @lizziel did several Transport-Tracer simulations with raw c180 GEOS-IT at C180 resolution on Discover. Did you encounter this problem?
Given that things seemed to work for @lizziel my primary suspicion is that the code is failing to detect that the two grids are in fact identical. But even then, I would have expected the code to work, just wasting resources computing what amounts to an identity transform.
@yuanjianz Probably not important, but I am confused by the fact that the linked version of MAPL above does not actually produce the error message given in the traceback. ("no such property" does not occur in NewRegridderManager.F90) It will be important for us to confirm the precise version of MAPL to track this down.
I am looking into this now, first trying to figure out why a subroutine in GriddedIO.F90 is returning false when checking if file grid and run grid are the same. That should be true for the GEOS-IT input file if running at C180. Regarding NX and NY, I did C180 runs with NX=6, NY=30. I did not play with other combinations. Could it be that file resolution divided by NX must be even?
@lizziel the "even" requirement should only be on the model grid. But lots of subtle things about how MAPL invents decompositions may come into play here.
@tclune I double checked, and I can confirm that the MAPL version and the link were correct. I did some extra tests, and the failure seems to be related to the total number of cores I request. My previous failed tests used 1176 and 864 cores. However, my recent test with 600 cores at C180 succeeded. @lizziel's suggestion might be correct. In both the 1176-core (14x14x6) and 864-core (12x12x6) tests, 180 divided by NX is not an even number (180/14 is not an integer and 180/12=15 is odd), while the 600-core test (10x10x6, 180/10=18) turned out successful.
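The arithmetic behind the three tests above can be tabulated with a short Python sketch. It only checks the "resolution divided by NX is even" hypothesis against the three layouts mentioned in this thread; it does not establish the cause of the failure, and the function and variable names are made up for illustration.

```python
RESOLUTION = 180  # cells per cube-face edge for C180


def check_nx(resolution, nx):
    """Summarize how `resolution` divides across `nx` domains per face edge."""
    quotient, remainder = divmod(resolution, nx)
    divides = remainder == 0
    return {
        "total_cores": 6 * nx * nx,  # nx x nx layout on each of the 6 faces
        "cells_per_domain": resolution / nx,
        "divides_evenly": divides,
        "quotient_is_even": divides and quotient % 2 == 0,
    }


# The three square per-face layouts reported in the thread.
results = {nx: check_nx(RESOLUTION, nx) for nx in (10, 12, 14)}

for nx, info in results.items():
    print(f"NX={nx}: {info}")
```

Only NX=10 gives an even quotient, which is consistent with the reported outcomes but is a correlation over three data points rather than a confirmed rule.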
Regarding the unequal grids, here is the line that results in false: Line 1099 in b76ba1a
I printed out the two grids and get this: filegrid:
output_grid:
This is MAPL 2.26. "no such property" is in |
I was able to run GCHP at C180 using C180 mass fluxes with the following combos: Like @yuanjianz, 864 cores with NX=12 failed for me.
Interesting - will have to investigate further. Hopefully we can reproduce on our end.
@lizziel so sounds like you ran this on Discover? Maybe? If so, where are these GEOS-IT files on Discover or your input file with the path?
My runs with 180 cores are on discover at My test runs with other numbers of cores are at Harvard.
@bena-nasa since GriddedIO.F90 failed to detect that the two grids are identical, do you see any potential errors that could be introduced by setting the regridding method to N to bypass the error? (N seems to be bilinear regridding)
I think I see what the problem is. Your application creates a grid; you said that at 864 cores you used nx=12, ny=72, so each face is decomposed on a 12x12 layout. When the grid is made from the file, it has to pick a layout of its own, so the two grids are not the same grid, and the code tries to find a regridder, which in this case is the flux regridder. There is no flux regridding that can do what is essentially a redistribution of the data, so it fails. If you set the regridding to any other regrid method that goes through ESMF, it will just work, since ESMF can always regrid between these two grids, which are really just the same grid with different layouts. So yes, you would get past this error, and the regridder it spawned would be the identity for all practical purposes. The only solution to this is that we essentially do not allow the user to choose the decomposition: the code ALWAYS chooses a grid decomposition based on the core count, we make sure we are always using the same algorithm, and you would do this for the application grid rather than look at a file. If the user has freedom, there is no solution, since they can always choose something other than what an algorithm would choose, and when we make the grid from the file, it has to just pick a layout.
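The explanation above can be pictured with a toy Python model. This is not MAPL code: the class, the function names, and in particular the 6x24 file-grid layout are hypothetical, chosen only to show how two grids with the same resolution can still compare as different when the layout is part of the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CubedSphereGrid:
    im_world: int     # cells per cube-face edge (180 for C180)
    nx: int           # domains per face along x
    ny_per_face: int  # domains per face along y


def choose_regridder(file_grid, run_grid, method):
    """Toy selection logic: identical grids need no regridding at all."""
    if file_grid == run_grid:  # compares resolution AND layout
        return "identity"
    if method == "flux":
        # Same resolution but different layout is a pure redistribution,
        # which this toy flux path (like the one described) cannot do.
        raise LookupError("no flux regridder for these two grids")
    return "esmf"  # a general regridder handles any pair of grids


run_grid = CubedSphereGrid(im_world=180, nx=12, ny_per_face=12)
# Hypothetical layout picked when the grid is built from the file.
file_grid = CubedSphereGrid(im_world=180, nx=6, ny_per_face=24)

print(file_grid == run_grid)                              # layouts differ
print(choose_regridder(file_grid, run_grid, "bilinear"))  # falls back to ESMF
```

In this model, making both grids use the same layout-selection algorithm is what turns the first branch into the common case.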
Thanks @bena-nasa. For nearly all GCHP users the domain decomposition is chosen automatically anyway as a pre-run step I added to the run script. So making this built-in would not be a problem. However, it would be nice to have an override option for testing (e.g. to run at two different domain decomps and check bit-for-bit reproducibility) as well as for domain decomposition research.
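A minimal Python sketch of an automatic decomposition with an override option might look like the following. The selection rule (the smallest per-face NX that is at least the square root of the per-face core count) is an assumption made for illustration; it happens to reproduce the NX/NY pairs quoted in this thread, but it is not taken from the GCHP run script or from MAPL, and `choose_layout` is a hypothetical name.

```python
import math


def choose_layout(total_cores, override=None):
    """Return (NX, NY) for a cubed-sphere run, where NY spans all 6 faces."""
    if override is not None:
        nx, ny = override
        # An override is accepted only if it uses exactly the cores requested.
        if nx * ny != total_cores or ny % 6 != 0:
            raise ValueError("override must satisfy NX*NY == cores, NY % 6 == 0")
        return nx, ny

    if total_cores % 6 != 0:
        raise ValueError("core count must be a multiple of 6 (one per face)")
    per_face = total_cores // 6

    # Assumed rule: smallest divisor of per_face that is >= sqrt(per_face).
    start = math.isqrt(per_face)
    if start * start < per_face:
        start += 1
    for nx in range(start, per_face + 1):
        if per_face % nx == 0:
            return nx, 6 * (per_face // nx)


for cores in (180, 600, 864, 1176):
    print(cores, choose_layout(cores))

# Override for testing, e.g. a bit-for-bit check across two decompositions.
print(choose_layout(864, override=(6, 144)))
```

Keeping the override as an explicit argument leaves the automatic path as the default while still allowing reproducibility tests at a second decomposition.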
Tagging @sdeastham |
I am using MAPL 2.26.0 to run GCHP at C180 with GEOS-IT meteorology. Since what I am reading and running are the same resolution, I supposed there should not be a regridding issue. Surprisingly, I got the error message below:
I looked through the issues and found #2118 and #1124. From my understanding, HFLUX essentially requires the resolution to be divisible by Nx and Ny after decomposition.
In my case, however:
Related entries in my ExtData.rc are:
As a workaround, I replaced "H" with "N" to bypass the regridding manually. That works as well.