Add more flexibility for loading EIA/EPA data when building HIFLD grid #246
Labels: feature request, hifld
Describe the workflow you want to enable
Currently, we load EIA and EPA data for a specific year from CSVs on our blob storage. These were either downloaded as zip files (EPA AMPD) or created manually from xlsx files (EIA Form 860). I wish there were more user-accessible flexibility, both in being able to download data for a given year (or a given month, since EIA publishes monthly Form 860M releases in between the annual Form 860 releases) and in being able to obtain data from different sources (e.g. from a local copy of Catalyst Cooperative's PUDL database, or from their web API).
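The year-or-month choice could be captured by a single parameter. Below is a minimal sketch that assumes nothing about prereise internals. `EIARelease` and `resolve_eia_release` are hypothetical names. The helper only decides which release to request (annual Form 860 vs. monthly Form 860M) and does not download anything.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EIARelease:
    """Identifies which EIA generator inventory release to load (hypothetical)."""

    form: str  # "860" for the annual release, "860M" for a monthly release
    year: int
    month: Optional[int] = None


def resolve_eia_release(year: int, month: Optional[int] = None) -> EIARelease:
    """Hypothetical helper: choose the annual Form 860 when only a year is
    given, or the monthly Form 860M when a month is also given."""
    if month is None:
        return EIARelease(form="860", year=year)
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return EIARelease(form="860M", year=year, month=month)
```

For example, `resolve_eia_release(2021, 6)` selects the June 2021 Form 860M, while `resolve_eia_release(2020)` selects the annual 2020 Form 860.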
Describe your proposed implementation
Additional functions could be added to `prereise.gather.griddata.hifld.data_access.load` to read data from different sources. Additional parameters could be added to the highest-level functions within `prereise.gather.griddata.hifld.data_process` to specify which data sources to read from at the start of processing. These same parameters could also be added to `prereise.gather.griddata.hifld.orchestration.create_csvs` and then passed through to the `data_process` functions.

Additional context
Catalyst Cooperative currently makes a subset of the data we need available via a Datasette interface (which can be read directly with pandas). The full dataset is not available there; it is available as a SQLite database.
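Below is a minimal sketch of a source-selectable loader, assuming pandas is available. `load_table` and `datasette_csv_url` are hypothetical names, not existing prereise functions, and `report_year` is an assumed column name. The Datasette branch assumes Datasette's CSV export convention (`<database URL>/<table>.csv`, with `_stream=on` to request all rows), which should be checked against the actual instance. PUDL table names are left as a parameter rather than guessed.

```python
import sqlite3
from contextlib import closing
from typing import Optional

import pandas as pd


def datasette_csv_url(base_url: str, table: str) -> str:
    """Build a CSV export URL for one table of a Datasette database.

    base_url should point at the database itself. The ".csv" suffix and
    "_stream=on" follow Datasette's CSV export convention (an assumption).
    """
    return f"{base_url.rstrip('/')}/{table}.csv?_stream=on"


def load_table(
    source: str, table: str, location: str, year: Optional[int] = None
) -> pd.DataFrame:
    """Hypothetical loader that reads one table from a selectable source.

    source: "csv" (a CSV export, e.g. on blob storage), "sqlite" (a local
        copy of the PUDL database) or "datasette" (a Datasette web interface).
    location: CSV path/URL, SQLite file path, or Datasette database URL.
    year: if given, keep only rows whose assumed "report_year" column matches.
    """
    if source == "csv":
        df = pd.read_csv(location)
    elif source == "sqlite":
        # closing() ensures the connection is closed once the query is read.
        with closing(sqlite3.connect(location)) as conn:
            df = pd.read_sql(f'SELECT * FROM "{table}"', conn)
    elif source == "datasette":
        df = pd.read_csv(datasette_csv_url(location, table))
    else:
        raise ValueError(f"Unknown data source: {source!r}")
    if year is not None:
        df = df[df["report_year"] == year].reset_index(drop=True)
    return df
```

Filtering by year after loading keeps the three branches uniform. For a large local PUDL database, pushing the filter into the SQL query (with a bound parameter) would avoid reading whole tables into memory.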