fix: handle empty HRRR data files via linear imputation #245
Makes sense
    )
    for j in range(wind_farm_ct)
]
for i, _ in tqdm(enumerate(dts))
Do we need the `enumerate` here? It seems we just need the length of `dts`.
`enumerate()` vs. `range(len())`, which is more pythonic?
`for i in tqdm(dts)`?
@jon-hagg I think we will need the index rather than the element of `dts` here.
`tqdm(dts, total=len(dts))` would work, but I'm not sure that's better than `enumerate`/`range`. It's unclear to me whether any of them is the most pythonic, but I think `enumerate` is fine.
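To make the trade-off in this thread concrete, here is a minimal stdlib-only sketch. The `dts` list is a hypothetical stand-in for the real timestamps. It shows that `enumerate` and `range(len(...))` yield the same indices, but only the `range` object has a length. That matters for a progress bar: as far as I know, `tqdm` infers `total` from `len(iterable)` when it can, so wrapping a bare `enumerate` object would show a running count without a total unless `total=` is passed explicitly.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the `dts` sequence discussed in the review.
dts = [datetime(2020, 1, 1) + timedelta(hours=h) for h in range(4)]

# Both spellings produce the same indices.
indices_via_enumerate = [i for i, _ in enumerate(dts)]
indices_via_range = list(range(len(dts)))

# An enumerate object is a plain iterator and has no length,
# whereas a range object does.
enumerate_has_len = hasattr(enumerate(dts), "__len__")
range_has_len = hasattr(range(len(dts)), "__len__")
```

If only the index is needed, `range(len(dts))` keeps the length visible to whatever wraps it; `enumerate` is the more natural choice once the element is used as well.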
We could also refactor how we build `wind_speed_data` (as a dataframe instead of a numpy array), and then we could build `wind_power_data` using `apply` calls instead of list comprehensions.
See #221 (comment).
Aha. Maybe I'll leave this alone for now then.
EDIT: Or maybe the pandas version can be made a little more transparent by instantiating the dataframe with `index=` and `columns=`...
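A rough sketch of what the dataframe-based variant could look like follows. It is not the code in this PR: the farm names, the timestamps, and `toy_power_curve` are all hypothetical, and pandas' built-in `interpolate` stands in for the project's own imputation routine. The point is only that explicit `index=` and `columns=` make the shape of the data self-describing.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps and wind farm labels (not from the PR).
dts = pd.to_datetime(
    ["2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 02:00"]
)
wind_farm_ids = ["farm_a", "farm_b"]

# Rows are timestamps, columns are wind farms; NaN marks an empty file.
wind_speed_data = pd.DataFrame(
    [[4.0, 6.0], [np.nan, 7.0], [8.0, 9.0]],
    index=dts,
    columns=wind_farm_ids,
    dtype=float,
)


def toy_power_curve(speed):
    """Hypothetical normalized power curve: cubic ramp capped at 1.0."""
    return np.minimum((speed / 12.0) ** 3, 1.0)


# Fill gaps along the time axis, then convert each column to power.
wind_power_data = wind_speed_data.interpolate(method="linear").apply(
    toy_power_curve
)
```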
Done. I tried to strike a good balance between compactness and readability.
Good catch. Thanks!
6817900 to ab47e24
ab47e24 to 535b4d9
Pull Request doc
Purpose
When generating wind power profiles from HRRR data, gracefully handle any missing data via linear interpolation. Closes #244.
What the code is doing
The `impute` module is moved from `prereise.gather.winddata.rap.impute` to `prereise.gather.winddata.impute`, and a new linear interpolation method is added which should perform well on small data gaps.
Within `prereise.gather.winddata.hrrr.calculations`, `calculate_pout` is refactored to first build an array of all wind speed magnitudes obtained from the NOAA grib files (filling in NA when files are empty), then impute missing values as necessary, and finally convert wind speeds to wind powers.
Testing
Unit tests still pass, and this has been tested end-to-end when generating 2020 wind power profiles for the HIFLD grid (see #227 (comment)). When downloading the 2020 data, there were four files which downloaded empty, even after several attempts, suggesting that the data are missing from the NOAA server.
Usage Example/Visuals
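As an illustration of the three-step flow described under "What the code is doing" (build wind speeds with gaps, impute, convert to power), here is a minimal pure-Python sketch. It is not the prereise implementation: `linear_impute` and `toy_power_curve` are hypothetical names, the power curve is invented, and filling leading or trailing gaps with the nearest valid value is an assumption made only for this sketch.

```python
import math


def linear_impute(values):
    """Fill NaN gaps by linear interpolation between the nearest valid neighbors.

    Assumption for this sketch: gaps at either end are filled with the
    nearest valid value, since there is nothing to interpolate between.
    """
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not valid:
        raise ValueError("cannot impute: no valid values")
    filled = list(values)
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        left = max((k for k in valid if k < i), default=None)
        right = min((k for k in valid if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def toy_power_curve(speed, rated_speed=12.0):
    """Hypothetical normalized power curve: cubic ramp capped at 1.0."""
    return min((speed / rated_speed) ** 3, 1.0)


# Step 1: hourly wind speeds for one farm; NaN marks an empty data file.
wind_speeds = [6.0, float("nan"), 10.0, 12.0]
# Step 2: impute the missing hour from its neighbors.
imputed_speeds = linear_impute(wind_speeds)
# Step 3: convert wind speeds to (normalized) wind power.
wind_power = [toy_power_curve(s) for s in imputed_speeds]
```

Imputing on wind speed before applying the power curve, rather than interpolating power directly, matches the order described above; because the curve is nonlinear, the two orders would generally give different results.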
Time estimate
15-30 minutes.