Issue using REGRID_METHOD_CONSERVE_HFLUX reading c180 GEOS-IT data #2118

Open
lizziel opened this issue May 1, 2023 · 20 comments
Labels
🪲 Bug Something isn't working ⌛ Long Term Long term issues

@lizziel
Contributor

lizziel commented May 1, 2023

I am using MAPL 2.26.0 to run GCHP at C24 with GEOS-IT meteorology. I am encountering an error in MAPL that occurs on a C180 input file during MAPL_ExtDataPrefetch. MAPL is searching for a prototype to make a new regridder for the file and is not able to find one.

pe=00009 FAIL at line=00147    NewRegridderManager.F90                  <no such property>
pe=00009 FAIL at line=00092    NewRegridderManager.F90                  <status=1>
pe=00009 FAIL at line=01011    GriddedIO.F90                            <status=1>
pe=00009 FAIL at line=04705    ExtDataGridCompMod.F90                   <status=1>
pe=00009 FAIL at line=01490    ExtDataGridCompMod.F90                   <status=1>
pe=00009 FAIL at line=01807    MAPL_Generic.F90                         <status=1>
pe=00009 FAIL at line=01337    MAPL_CapGridComp.F90                     <status=1>
pe=00009 FAIL at line=01300    MAPL_CapGridComp.F90                     <status=1>
pe=00009 FAIL at line=01260    MAPL_CapGridComp.F90                     <status=1>
pe=00009 FAIL at line=00837    MAPL_CapGridComp.F90                     <status=1>
pe=00009 FAIL at line=00977    MAPL_CapGridComp.F90                     <status=1>
pe=00009 FAIL at line=00301    MAPL_Cap.F90                             <status=1>
pe=00009 FAIL at line=00258    MAPL_Cap.F90                             <status=1>
pe=00009 FAIL at line=00192    MAPL_Cap.F90                             <status=1>
pe=00009 FAIL at line=00169    MAPL_Cap.F90                             <status=1>
pe=00009 FAIL at line=00031    GCHPctm.F90                              <status=1>

The file is a GEOS-IT C180 file:
/home/dao_ops/d5294_geosit_jan18/run/.../archive/diag/Y2019/M07/d5294_geosit_jan18.ctm_tavg_1hr_glo_C180x180x6_v72.2019-07-01T0030Z.nc4

The regrid method is 'H', which corresponds to REGRID_METHOD_CONSERVE_HFLUX.

I am using ExtData and not ExtData2G.

Any ideas on what the problem is? I am digging through the traceback now but welcome any thoughts.

@mathomp4 mathomp4 added the 🪲 Bug Something isn't working label May 1, 2023
@mathomp4
Member

mathomp4 commented May 1, 2023

I've assigned @bena-nasa because he is the expert here!

@bena-nasa
Collaborator

@lizziel I'm not sure what's going on. I'll try to reproduce with my standalone tester for ExtData/History.

@lizziel
Contributor Author

lizziel commented May 2, 2023

Thanks @bena-nasa. My ExtData.rc entry is this:

MFXC;MFYC Pa_m+2_s-1    N H F0;003000 none  0.6666666 MFXC;MFYC  ./MetDir/Y%y4/M%m2/d5294_geosit_jan18.ctm_tavg_1hr_glo_C180x180x6_v72.%y4-%m2-%d2T%h2%n2Z.nc4 2017-01-01T00:30:00P01:00
CXC;CYC   1             N H F0;003000 none  none      CX;CY      ./MetDir/Y%y4/M%m2/d5294_geosit_jan18.ctm_tavg_1hr_glo_C180x180x6_v72.%y4-%m2-%d2T%h2%n2Z.nc4 2017-01-01T00:30:00P01:00

@bena-nasa
Collaborator

bena-nasa commented May 2, 2023

@lizziel
I just pulled and built v2.26.0 of MAPL, and ingested those same GEOS-IT files via this ExtData.rc file:

Ext_AllowExtrap: .true.
Prefetch: .true.

PrimaryExports%%
MFXC;MFYC NA N H F0;003000 none none MFXC;MFYC d5294_geosit_jan18.ctm_tavg_1hr_glo_C180x180x6_v72.%y4-%m2-%d2T%h2%n2Z.nc4 2017-01-01T00:30:00P01:00
%%

If I run my driver at a resolution that divides 180 evenly, like c90, it works, but if I run at c24, c48, etc., it fails. I think the problem is that 180 is not divisible by 24, so it can't generate the flux regridder for that case.

@tclune wrote all this so he might have something more to say. Probably could use a more descriptive error.

@lizziel
Contributor Author

lizziel commented May 2, 2023

Aha! Now that you mention it I recall @LiamBindle mentioning this limitation when he ran using GEOS-FP mass fluxes, or maybe I am confusing that with stretched grid limitations. Regardless, thanks!

@tclune, this isn't a huge problem for us, but is good to know. Maybe it should be added as a comment somewhere?

@lizziel
Contributor Author

lizziel commented May 2, 2023

And maybe the error handling could be expanded to give a message about why it is failing.

@tclune
Collaborator

tclune commented May 2, 2023

  1. The HorizontalFluxRegridder is known to be incorrect at this time. There is a branch awaiting testing by Seb (been there for a while now.)

  2. We can add a somewhat better error message, but it will still be rather vague, as there is no way to blame any particular regridder for failing to work. All that we can do is state more clearly with:

_FAIL('No regridder prototypes support the requested spec')

(This should go just before the _RETURN(...), which will then be redundant and could be removed.)

  3. The Flux regridder should have satisfied the use case if I understand correctly. The only requirement is that the output grid be coarser by an integer factor:
    supports = all(mod(counts_in(1:2), counts_out(1:2)) == 0) .or. all(mod(counts_out, counts_in) == 0)
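
For anyone who wants to try a pair of resolutions before launching a run, the `supports` expression above can be mirrored outside of MAPL. This is a minimal Python sketch, not MAPL code: the function name `hflux_supports` is hypothetical, and it assumes `counts_in` and `counts_out` hold the two horizontal cell counts of the input and output domains.

```python
def hflux_supports(counts_in, counts_out):
    """Hypothetical mirror of the Fortran `supports` test quoted above.

    Assumes the first two entries of each sequence are the horizontal
    cell counts of the input and output domains.
    """
    ci, co = counts_in[:2], counts_out[:2]
    # Input counts are integer multiples of output counts (coarsening).
    coarsening = all(i % o == 0 for i, o in zip(ci, co))
    # Output counts are integer multiples of input counts (refining).
    refining = all(o % i == 0 for i, o in zip(ci, co))
    return coarsening or refining


print(hflux_supports((180, 180), (90, 90)))  # True: 180 = 2 * 90
print(hflux_supports((180, 180), (24, 24)))  # False: 180 / 24 = 7.5
```

Note that the expression accepts an integer factor in either direction, so the sketch does the same.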

@lizziel
Contributor Author

lizziel commented May 2, 2023

@sdeastham, is the branch that needs testing on your radar? I wonder if I should stick to winds until the flux regridder issues are sorted.

I'll update the GCHP docs to warn users about the limitation and what error message (or traceback) to look out for.

It failed for my use cases because I ran at c24 and c48, neither of which is coarser than c180 by an integer factor. Those resolutions are fine with GEOS-FP mass fluxes, which are c720, but GEOS-IT is more limiting at c180.
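
The difference between the two products comes down to divisors. As a rough illustration (plain Python, not part of GCHP or MAPL; `coarser_targets` and the candidate list are hypothetical), the snippet below lists which target cube sizes are coarser than the input by an integer factor. It looks only at the global cube size, so it is a necessary check rather than a complete one.

```python
def coarser_targets(n_in, candidates):
    """Return the candidate cube sizes that evenly divide n_in and are coarser."""
    return [n for n in candidates if n < n_in and n_in % n == 0]


# Illustrative list of cubed-sphere sizes; not an official set of supported grids.
candidates = [24, 30, 48, 90, 180, 360]

print(coarser_targets(180, candidates))  # [30, 90]
print(coarser_targets(720, candidates))  # [24, 30, 48, 90, 180, 360]
```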

@sdeastham
Contributor

@lizziel - it's on my radar. I wouldn't wait, to be honest; the tests I've been performing have so far been with the "buggier" flux regridder, but it's still a vast improvement over using winds (errors in absolute surface pressure change at each time step fall by a factor of five at C30).

@tclune
Collaborator

tclune commented May 2, 2023

@lizziel The branch in question does not change the divisibility requirement. Seb and I were unable to come up with a generalization, and you're probably better off using an ordinary regridding method for non-divisible cases.

@lizziel
Contributor Author

lizziel commented May 2, 2023

Okay, sounds good. Strangely, I am still getting the same error in a c90 run. I am briefly switching to winds just so I can diagnose and fix the other ExtData issues (missing files on discover, etc.) and then will swing back to getting mass flux regridding working.

@lizziel
Contributor Author

lizziel commented May 15, 2023

Regridding mass fluxes now works with the following two adjustments: (1) switching from C24 to C90, and (2) switching from 96 processors to 216 processors.

It would be great if you could incorporate the grid resolution and processor constraints into the error handling somehow, even if just an expanded message that points people to comments in the code detailing what the constraints are. That message could be triggered only if the regrid type is 9 (which is HFLUX regridding).
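
One possible reading of why the processor count mattered, which the thread does not confirm, is that the divisibility test is applied to each process's local domain rather than to the global grid. The sketch below is plain Python under loud assumptions: each cube face is split into a square layout (96 cores as 4 x 4 per face, 216 cores as 6 x 6), and cells are divided as evenly as possible with any remainder going to the first chunks. Both `split_evenly` and `local_domains_nest` are hypothetical helpers, not MAPL routines.

```python
def split_evenly(n_cells, n_parts):
    """Split n_cells into n_parts chunks; the first chunks absorb any remainder.

    This decomposition rule is an assumption made for illustration.
    """
    base, rem = divmod(n_cells, n_parts)
    return [base + 1 if p < rem else base for p in range(n_parts)]


def local_domains_nest(n_in, n_out, n_parts):
    """True if every local input chunk is an integer multiple of the local output chunk."""
    local_in = split_evenly(n_in, n_parts)
    local_out = split_evenly(n_out, n_parts)
    return all(o > 0 and i % o == 0 for i, o in zip(local_in, local_out))


# Assumed square per-face layouts: 96 cores -> 4 per side, 216 cores -> 6 per side.
for cores in (96, 216):
    per_side = round((cores // 6) ** 0.5)
    print(cores, per_side, local_domains_nest(180, 90, per_side))
# 96 4 False
# 216 6 True
```

Under these assumptions, c90 on 4 processes per side gives local sizes of 23, 23, 22, 22, which do not divide the c180 local size of 45, while 6 per side gives 15 against 30.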

@tclune

@stale

stale bot commented Jul 14, 2023

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days, it will be closed. You can add the "long term" tag to prevent the Stale bot from closing this issue.

@stale stale bot added the ❄️ Stale This issue has been marked stale label Jul 14, 2023
@mathomp4
Member

I'll long term this until @tclune can look at the last message from @lizziel

@mathomp4 mathomp4 added the ⌛ Long Term Long term issues label Jul 14, 2023
@stale stale bot removed the ❄️ Stale This issue has been marked stale label Jul 14, 2023
@lizziel
Contributor Author

lizziel commented Jul 17, 2023

I think this issue can be closed by #2056 once it is merged. @tclune mentioned in the PR that he added logic to do a better job giving a helpful message for this issue. However, I don't see the update for it there so I am not sure if it is pushed yet.

@tclune
Collaborator

tclune commented Jul 17, 2023

@tclune mentioned in the PR that he added logic to do a better job giving a helpful message for this issue. However, I don't see the update for it there so I am not sure if it is pushed yet.

I remember saying that. But don't remember doing it. If I said it in the past tense, then presumably I did ...

@tclune
Collaborator

tclune commented Jul 17, 2023

This and the previous branch have this:

print*,__FILE__,__LINE__,'I cannot create this regridder. types are <',&
& grid_type_in,',',grid_type_out,'>'

It would be better as an _ASSERT but appears to capture the essence of what I was saying. So the question becomes: what error message were you seeing, @lizziel?

@tclune
Collaborator

tclune commented Jul 17, 2023

I wonder if I put the fix in the wrong branch. "2nd try" branch has this:

supports = all(mod(counts_in(1:2), counts_out(1:2)) == 0) .or. all(mod(counts_out, counts_in) == 0)
_ASSERT(supports, "HFlux regridder requires local domains to be properly nested.")

While 3rd try just returns at that point. I'll copy the line over.

@lizziel
Contributor Author

lizziel commented Jul 18, 2023

Perfect, that's what I was looking for!

@mathomp4
Member

@lizziel I'm hoping to test #2056 tomorrow. It should be (trivially) zero-diff since I don't even know how to trigger this :)
