Missing values and MAPL #252
Comments
@mpagowski Let me ping @bena-nasa and @atrayano to answer this. I'm sure we handle it "correctly" but I'm not sure of the specifics. |
Are these files that are read via ExtData? Please let me know what version of MAPL you are using. The short answer is that even though MAPL can respect the missing value when doing, say, spatial interpolations, it doesn't matter if the application code that ultimately uses the data doesn't respect it.

MAPL has an internal "MAPL undefined" constant, MAPL_UNDEF, that we use at various places in the MAPL code to protect against operations on undefined points. It is set to a value of 1.0e15.

I don't know what version of MAPL you are using, but in newer versions of the generation 2 ExtData we check the missing value defined in the file. If it does not match the "MAPL undefined" value, then when we read the array from the NetCDF file we set the points that have the file missing value to the "MAPL undefined" one. Then any point in our code base that respects this will be aware of it. For example, when regridding from the grid of the original file to the application grid, we respect this.

Of course, if the code or component (outside of MAPL) that ultimately uses these arrays does not protect against doing operations at points that are "MAPL undefined", we cannot control that; users can do what they want with arrays in their code.

In general our emissions files do not have missing values, because how would you handle that in all the components that may use them? It would be a nightmare. Rather, if there are no emissions, they are simply 0, since GOCART is a huge code base and does not check for any sort of missing value when doing array-level operations. Not to mention that protecting every array operation against a missing value would probably destroy performance.

So the answer is that, depending on the version of ExtData you are using, we may respect the file-defined missing value and set points that are "missing" to our own internal missing value. This is respected at points in the MAPL code base, but when you get to the application code, all bets are off, since we have no control over how the code developer chose to use arrays.

So the ultimate answer is that if your input files have missing values, even if MAPL respects them, GOCART does not. At that point we have 3 options.
1. Reprocess those files so that, rather than having missing values, they are just set to, say, 0 at those points.
2. If the above is not an option, but you are using a version of ExtData that respects the file missing value, then after the fact in the application code the arrays would have to be intercepted so that any MAPL_UNDEFs can be set to 0 or protected against.
3. If you are using an older version of MAPL where ExtData did not respect the file-supplied missing value (it simply made no accommodation for it), we would need to come up with a custom solution.

To me, of all 3, option 1 seems by far the easiest: it would be a very trivial Python script to read in and re-write the file, replacing anything that has a missing value with 0. Heck, maybe even NCO or some other utility could do this. |
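The reprocessing step suggested above could look roughly like the following. This is a minimal sketch, not a tested tool: it assumes the third-party `netCDF4` and `numpy` packages are available and that the files flag missing points through a `_FillValue` or `missing_value` attribute (which `netCDF4` turns into masked arrays by default). The function names `zero_fill` and `rewrite_with_zeros` are made up for illustration.

```python
import shutil

import numpy as np


def zero_fill(data):
    """Return a plain ndarray with masked (missing) points replaced by 0."""
    return np.ma.filled(data, 0.0)


def rewrite_with_zeros(src_path, dst_path):
    """Copy src_path to dst_path, then zero out missing points in the copy.

    Hypothetical helper: the original file is left untouched.
    """
    from netCDF4 import Dataset  # third-party; imported lazily

    shutil.copyfile(src_path, dst_path)
    with Dataset(dst_path, "r+") as ds:
        for name, var in ds.variables.items():
            # Only touch floating point data variables.
            if getattr(var.dtype, "kind", "") != "f":
                continue
            # Skip coordinate variables such as lat/lon/time.
            if name in ds.dimensions:
                continue
            # var[...] is a masked array where the file marks points missing.
            var[...] = zero_fill(var[...])
```

It would be called once per input file in the preprocessing step, for example `rewrite_with_zeros("original.nc", "zero_filled.nc")` (file names here are placeholders). Note that this throws away the distinction between "no fire" and "retrieval obscured".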
Thanks, that answers my question. It is about NOAA's ExtData for wildfires,
which come at 0.1deg x 0.1deg resolution
and contain numerous missing values, both in places where wildfires
cannot exist and where they exist but retrievals are obscured. I can see that
the safest way to deal with this is to convert all of those to 0s, though that
may not be strictly correct,
as it will decrease emissions where wildfires are burning. Any comments
from NOAA participants/others would be welcome.
…On Mon, Aug 21, 2023 at 8:07 AM Ben Auer ***@***.***> wrote:
Are these files that are read via ExtData? Please let me know what version
of MAPL you are using.
MAPL has an internal "MAPL undefined" constant MAPL_UNDEF, that we use at
various places in the code to protect against operations. It is set to a
value of 1.0e15
I don't know what version of MAPL you are using, but in newer versions of
the generation 2 ExtData we check the missing value defined in the file. If
it does not match the "MAPL undefined" value, then when we read the array
from the NetCDF file we set the points that have the file missing value to
the "MAPL undefined" one. Then any point in our code base that respects this
will be aware of it. For example, when regridding from the lat-lon grid of
the original file to the application grid, we respect this.
If the target point has inputs from any points that are the "MAPL
undefined" value, we do not include them when computing the target
value. If all were "MAPL undefined", then the target cell is "MAPL
undefined".
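The read-and-regrid behavior described here can be sketched roughly as follows. This is a pure-Python illustration, not MAPL's actual implementation: the function names, the explicit per-cell weight lists, and the renormalization by the remaining weights are all assumptions made for the example. Only the value of MAPL_UNDEF (1.0e15) comes from this thread.

```python
import math

MAPL_UNDEF = 1.0e15  # value quoted in this thread


def is_undef(value, undef=MAPL_UNDEF):
    # Compare with a tolerance, since the data may be single precision.
    return math.isclose(value, undef, rel_tol=1e-6)


def remap_file_missing(values, file_missing):
    """Replace the file's own missing value with MAPL_UNDEF (hypothetical helper)."""
    return [
        MAPL_UNDEF if math.isclose(v, file_missing, rel_tol=1e-6) else v
        for v in values
    ]


def regrid_cell(source_values, weights):
    """Weighted average for one target cell, skipping undefined inputs.

    Assumption: the weights of the remaining inputs are renormalized.
    """
    total = 0.0
    weight_sum = 0.0
    for value, weight in zip(source_values, weights):
        if is_undef(value):
            continue  # undefined inputs do not contribute
        total += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return MAPL_UNDEF  # every input was undefined
    return total / weight_sum
```

With a file missing value of -9999, the inputs `[2.0, -9999.0, 4.0]` and weights `[0.25, 0.5, 0.25]` would give a target value of 3.0 under these assumptions, because the middle point is skipped.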
Of course if the code or component (outside of MAPL) that ultimately uses
these does not protect against doing operations at points that are "MAPL
undefined", we cannot control that, the user can do what they want with
arrays in their code.
In general our emissions files do not have missing values, because how would
you handle that in all the components that may use them? Rather, if there are
no emissions, they are simply 0, since GOCART is a huge code base and does not
check for any sort of missing value when doing array-level operations.
So the answer is that, depending on the version of ExtData you are using,
we may respect the file-defined missing value and set points that are
"missing" to our own internal missing value. This is respected at points in
the MAPL code base, but when you get to the application code, all bets are
off, since we have no control over how the code developer chose to use
arrays.
So the ultimate answer is that if your input files have missing values, even
if MAPL respects them, GOCART does not.
At that point we have 3 options.
1. Reprocess those files so that, rather than having missing values, they
are just set to, say, 0 at those points.
2. If the above is not an option, but you are using a version of
ExtData that respects the file missing value, then after the fact in the
application code the arrays would have to be intercepted so that any
MAPL_UNDEFs can be set to 0 or protected against.
3. If you are using an older version of MAPL where ExtData did not
respect the file-supplied missing value (it simply made no
accommodation for it), we would need to come up with a custom solution.
To me, of all 3, option 1 seems by far the easiest: it would be a very
trivial Python script to read in and re-write the file, replacing anything
that has a missing value with 0.
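For option 2, the interception in application code might look something like the sketch below. It is written in Python only to illustrate the idea (the application code in question is not Python), and `zero_undefined` is a hypothetical helper name. The tolerance matters because 1.0e15 is not exactly representable in single precision, so an exact equality test against the double-precision constant can miss undefined points.

```python
import math

MAPL_UNDEF = 1.0e15  # value quoted in this thread


def zero_undefined(field, undef=MAPL_UNDEF, rel_tol=1e-6):
    """Return a copy of field with MAPL_UNDEF points set to 0.

    A relative tolerance is used because 1.0e15 stored in single
    precision becomes 999999986991104.0, which is not equal to the
    double-precision 1.0e15.
    """
    return [
        0.0 if math.isclose(value, undef, rel_tol=rel_tol) else value
        for value in field
    ]
```

This would have to be applied to every imported field before any arithmetic touches it, which is why reprocessing the files up front is the simpler route.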
|
It doesn't matter what is most "correct". What matters is where this data is used: is it going to be ingested and used by an application (I assume GOCART) that does operations on floating point arrays filled from this data? At that point you simply cannot have "missing values" unless EVERY array-level operation that may use this data somehow knows to protect/not use points that are missing. That's not realistic, nor is it how GOCART (assuming that is the use case) is implemented. |
Yes, it is GOCART, and it is currently failing, so the data needs to be
preprocessed (i.e. set to 0s) until a solution to distinguish zero
emissions (like ocean) from obscured retrievals is found.
…On Mon, Aug 21, 2023 at 9:04 AM Ben Auer ***@***.***> wrote:
It doesn't matter what is most "correct". What matters is where this
data is used: is it going to be ingested and used by an application (I assume
GOCART) that does operations on floating point arrays filled from this
data? At that point you simply cannot have "missing values" unless EVERY
array-level operation somehow knows to protect/not use points that are
missing; that's just not realistic.
|
Yes, sorry, but by far and away the simplest (and really only) solution would be to take the existing files and make new versions that have the "undef" points replaced with 0, and just use those. That should be easy enough to do. GOCART simply gets arrays that represent emissions. What would it even mean to have "undefined" emissions? Either a cell has something or it doesn't, in which case it's 0; no emissions seems perfectly logical. We are doing floating point math on arrays, so it needs valid, full arrays. I'm not sure how any code could use files that have missing values unless it had a special accommodation for that at the Fortran or C array level, which would be bad for performance and vectorization. |
Yes, that is true but we don't control production of the files
…On Mon, Aug 21, 2023 at 11:55 AM Ben Auer ***@***.***> wrote:
Yes, sorry, but by far and away the simplest solution would be to take the
existing files and make new versions that have the "undef" points replaced
with 0 and just use those. Should be easy enough to do that.
|
@mpagowski That is partially true. Yes NESDIS controls the creation of the native files but we can have a preprocessor in the workflow to "fix" the files. |
Yes, we are currently fixing the files for our NRT runs. But the problem
with converting all "missing" to 0s remains
…On Mon, Aug 21, 2023 at 12:38 PM Barry Baker ***@***.***> wrote:
@mpagowski <https://github.com/mpagowski> That is partially true. Yes
NESDIS controls the creation of the native files but we can have a
preprocessor in the workflow to "fix" the files.
|
Ok, sounds like you can fix this in your workflow. That is, there exists a spot in the workflow where you can take the file(s) as produced by NESDIS and make new file(s) from the originals that have the missing values replaced by 0 using some sort of utility; those are then the files that are fed to GOCART. What is the "problem with converting all "missing" to 0s"? Are you asking how one can do this? |
No, we are set. We are already converting all missing values to 0s and letting
NESDIS deal with missing values in places where retrievals are obscured.
…On Tue, Aug 22, 2023 at 7:05 AM Ben Auer ***@***.***> wrote:
Ok, sounds like you can fix this in your workflow. That is, you can take
the file(s) as produced by NESDIS and make new file(s) from the originals that
have the missing values replaced by 0; those are then the files that are fed to
GOCART. What is the "problem with converting all "missing" to 0s"? Are you
asking how one can do this?
|
NOAA's GBBEPx wildfire files (at 0.1deg x 0.1deg resolution) have missing values. How does MAPL treat such values in interpolations?