Test InMAP for multiday run with 1-km CMAQ input #9

Open
4 of 5 tasks
pmartien opened this issue Aug 29, 2022 · 50 comments
@pmartien

pmartien commented Aug 29, 2022

Steps to Close

  • @bkoo-git at BAAQMD team to supply needed modeling files
  • @yuzhou-wang and @bujinb at UW team to run multi-day InMAP test with 1-km CMAQ inputs
  • @yuzhou-wang and @bujinb at UW team to report back via this GitHub issue: any issues? Any follow-up needed?
  • @bkoo-git at BAAQMD team to process the WRF and CMAQ data for InMAP for the whole 2018 using the tool/script provided by the UW team and send the processed file to the UW team
  • @yuzhou-wang and @bujinb at UW team to test the InMAP data processed by the BAAQMD team and report back if there's any issue
@yuzhou-wang

We have finished the test of the multi-day InMAP with 1-km CMAQ, and have sent the scripts.

@pmartien
Author

Thanks, @bkoo-git, @yuzhou-wang. Any issues to report? What are next steps?

@bkoo-git
Collaborator

I've finished testing the scripts prepared by @yuzhou-wang on our cluster machine. I believe the next step will be building InMAP using the preprocessed WRF/CMAQ data. @yuzhou-wang, any guidance?

@bkoo-git
Collaborator

bkoo-git commented Sep 9, 2022

Runtime for preprocessing the 1-km WRF and CMAQ data for InMAP for the whole year of 2018:

  • The first step was to extract the required variables from daily WRF, CMAQ and MCIP files and combine them into a single daily file. This took ~6 hours per month of data on our cluster. To speed things up, all 12 months were processed simultaneously, with each month running on a separate cluster node;
  • The second step was to convert these daily files into InMAP meteorology and baseline chemistry input data. This step took 3.6 hours for the whole 2018 period on our cluster.
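
As a rough illustration of the runtimes quoted above, the sketch below estimates elapsed time for the two steps. It assumes a month is an indivisible unit of work and that the second step runs serially after all months finish; the function name is made up for this example.

```python
# Figures quoted in the comment above.
HOURS_PER_MONTH_STEP1 = 6.0   # approximate: one month of daily files on one node
MONTHS = 12
HOURS_STEP2 = 3.6             # whole-year conversion to InMAP inputs


def wall_clock_hours(nodes: int) -> float:
    """Estimated elapsed time when step 1 is spread over `nodes` nodes.

    Assumes months are indivisible work units and that step 2 runs
    serially after every month has finished.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    rounds = -(-MONTHS // nodes)  # ceiling division
    return rounds * HOURS_PER_MONTH_STEP1 + HOURS_STEP2


serial = wall_clock_hours(1)     # all months one after another
parallel = wall_clock_hours(12)  # one month per node, as described above
print(f"1 node: {serial:.1f} h, 12 nodes: {parallel:.1f} h")
```

The total compute spent is about the same either way; only the elapsed time changes.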

@bkoo-git bkoo-git reopened this Sep 9, 2022
@pmartien
Author

pmartien commented Sep 14, 2022

Thanks @bkoo-git for the status update and for the questions about next steps! @yuzhou-wang, @bujinb: can you provide a status update? Do you see any issues with what @bkoo-git provided?
I'm trying to encourage more discussion and updates via GitHub so we can facilitate quick turnaround on simple blockers to our collective progress. If this doesn't work, I will call for more frequent project Zoom meetings, which I think will be less efficient & productive. :-)

@pmartien
Author

pmartien commented Oct 3, 2022

Hi InMAP-SFAB team,
Any progress to report? Updates for the group?
Thanks!

@yuzhou-wang

I'm still working on testing the 1-km data, and will provide feedback by the end of this week.

@pmartien
Author

pmartien commented Oct 3, 2022

Great. Thank you @yuzhou-wang ! I look forward to your feedback!

@bujinb

bujinb commented Oct 5, 2022

I have run InMAP and ISRM on Google Cloud. I'm currently working on running multiple InMAP runs in parallel on Compute Engine, but am facing issues.

@bkoo-git
Collaborator

bkoo-git commented Oct 5, 2022

@bujinb Could you please clarify the issues? Is there anything wrong with the InMAP data file I processed, or are you having issues with running InMAP on the cloud?

@bujinb

bujinb commented Oct 6, 2022

@bkoo-git I am primarily working on running InMAP on the cloud, and I'm having issues with running multiple InMAP runs in parallel there. @yuzhou-wang is working with the CMAQ outputs.

@pmartien
Author

pmartien commented Oct 6, 2022

Thanks, @yuzhou-wang, @bujinb! So it sounds like the processed files we handed off to you are okay, but setting up multiple runs on the Google Cloud processors is an issue? Is the issue specific to InMAP, or does it affect any process run on multiple processors? Thanks for posting updates on GitHub!

@bujinb

bujinb commented Oct 6, 2022

@pmartien The Google engineer thinks it is an InMAP issue, but Chris has used Kubernetes to run InMAP in parallel before, so it might not be an InMAP issue. We are trying to have regular meetings with Chris for help. We'll try to update on GitHub as much as we can. Thanks!

@pmartien
Author

pmartien commented Oct 6, 2022

@bujinb: Got it! Let us know if there's any way we can be helpful.

@yuzhou-wang

I tested the new InMAP several times, but ran into the same problem each time: it generated infinite concentrations using the emissions in San Francisco. I'm still trying to find the reason. I will post updates when I find the reason or solve the problem.

@pmartien
Author

Thanks, @yuzhou-wang ! Let us know if there is any indication that the files we provided are causing/contributing to this problem. Are you only seeing the problem when submitting multiple InMAP runs? Or does it also occur with a single run?
Thanks again.

@yuzhou-wang

@pmartien I tried both single and multiple InMAP runs, and used both one-day and whole-year InMAP data; all the tests generate infinite numbers. I looked into the InMAP data and found that there seems to be a problem with the calculation of dry deposition. Tracking this further back to the wrfcmaq data, there are missing values in three wrfcmaq variables (rain water mixing ratio, cloud water mixing ratio, cloud fraction). I'm not sure whether the problem is caused by the WRF data itself or by my calculation (getting wrfcmaq data from WRF and CMAQ). I'll look into the WRF data and try to find the reason for the problem in the following days.
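
A check along these lines could flag the affected fields before they reach InMAP. This is only a sketch: the field names (`QRAIN`, `QCLOUD`, `CLDFRA`, `T`) are the usual WRF names and are assumed here, the values are toy data, and the real preprocessor's arrays and fill values may differ.

```python
import math


def count_missing(values, fill_value=None):
    """Count entries that are NaN, infinite, or equal to a fill value."""
    missing = 0
    for v in values:
        if math.isnan(v) or math.isinf(v):
            missing += 1
        elif fill_value is not None and v == fill_value:
            missing += 1
    return missing


def report_missing(fields, fill_value=None):
    """Return {name: missing_count} for every field with at least one bad value."""
    report = {}
    for name, values in fields.items():
        n = count_missing(values, fill_value)
        if n:
            report[name] = n
    return report


# Toy, flattened stand-ins for the three fields mentioned above plus one clean field.
sample = {
    "QRAIN": [0.0, 1.2e-5, float("nan"), 3.0e-6],
    "QCLOUD": [0.0, 0.0, 2.1e-4, float("nan")],
    "CLDFRA": [0.1, float("nan"), 1.0, 0.4],
    "T": [288.1, 287.5, 286.9, 285.0],
}
print(report_missing(sample))
```

Running such a scan on each daily file would show whether the gaps already exist in the raw WRF output or appear only after the merge step.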

@yuzhou-wang

I have figured out the problem: there is a mismatch between the WRF layers and the wrfcmaq layers. The wrfcmaq vertical layers should start from 0 (ground level), but they started from -1 due to a small error in the Python preprocessing code. I revised the Python code and generated a new one-day InMAP, and it runs correctly. So I guess we need to redo the whole-year InMAP preprocessing using the revised code. I'll run more tests to make sure that the revised code generates correct results. I'll send @bkoo-git the updated InMAP preprocessing code and a detailed guide to running the new InMAP this week.
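
The preprocessor code itself is not shown in this thread, so the following is only a guess at how an index starting at -1 can go unnoticed in Python: a negative index does not raise an error, it wraps around to the last element. All names and values below are hypothetical.

```python
# Hypothetical layer labels, ordered from the ground (index 0) upward.
layers = ["k0_ground", "k1", "k2", "k3_top"]


def select_layers_buggy(data, n):
    """Off-by-one: the first index is -1, which Python reads as the LAST element."""
    return [data[k - 1] for k in range(n)]


def select_layers_fixed(data, n):
    """Indexing starts at 0, the ground-level layer."""
    return [data[k] for k in range(n)]


def check_layer_alignment(selected, expected):
    """Fail loudly instead of letting a shifted column reach the model."""
    if selected != expected:
        raise ValueError(f"layer mismatch: got {selected}, expected {expected}")


print(select_layers_buggy(layers, 3))  # begins with the top layer
print(select_layers_fixed(layers, 3))  # begins with the ground layer
```

An explicit alignment check like `check_layer_alignment` in the preprocessing script would catch this kind of shift early.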

@bkoo-git
Collaborator

@yuzhou-wang Thanks for fixing the error! I'll re-process the wrfcmaq data once I receive the updated code.

@pmartien
Author

Thanks @yuzhou-wang, @bujinb for isolating this problem! And for keeping us updated on github; super helpful!

@bujinb

bujinb commented Oct 13, 2022

We are still working on running multiple InMAP runs in parallel on Google Cloud. Chris's Kubernetes cluster can run 1250 InMAP runs at the same time, and we are hoping we can do the same or better. I heard from Yuzhou that your cluster is fast, so I was wondering if we can run InMAP in parallel on your cluster. @bkoo-git, can we schedule a quick meeting?
Thanks
Bujin

@bkoo-git
Collaborator

Status update:

  1. @yuzhou-wang fixed the error and sent me the updated preprocessor code. I will re-do the InMAP preprocessing with the updated code and send her the new InMAP data for verification next week.
  2. @bujinb and I arranged a meeting on Monday (Oct. 17, 2 PM) to discuss running InMAP in parallel on the District cluster. Jeff Matsuoka will join the meeting. Let us know if anyone else wants to join.

@pmartien
Author

Great work, all. Thanks for the updates @bkoo-git!

@yuzhou-wang

@bkoo-git @bujinb I'd also like to join the meeting about running InMAP in parallel. Can you send me the link? Thanks!

@bujinb

bujinb commented Oct 17, 2022

@bkoo-git, @yuzhou-wang, Jeff and I had our meeting on running InMAP on your local cluster. It seems like running InMAP on the cloud will be the faster way to generate the new ISRM, as we can potentially run a thousand InMAP runs in parallel once we learn how to utilize Kubernetes. We have sent instructions for running InMAP on a local computer (in the Google Drive). @bkoo-git, please update us when you try running it on your cluster.
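
Assuming the ISRM is built from many independent InMAP runs, one simple way to divide the work between a local cluster and cloud workers is to split the run indices into near-equal batches. The sketch below shows only that bookkeeping; `make_batches` is a made-up helper, and launching InMAP itself is out of scope here.

```python
def make_batches(n_runs, n_workers):
    """Split run indices 0..n_runs-1 into contiguous, near-equal batches.

    Returns at most n_workers ranges; workers with nothing to do are omitted.
    """
    if n_runs < 0 or n_workers < 1:
        raise ValueError("n_runs must be >= 0 and n_workers >= 1")
    base, extra = divmod(n_runs, n_workers)
    batches = []
    start = 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        if size == 0:
            break
        batches.append(range(start, start + size))
        start += size
    return batches


# Example: 10 runs spread over 3 workers.
for worker_id, batch in enumerate(make_batches(10, 3)):
    print(worker_id, list(batch))
```

Each batch could then be handed to one cluster node or one Kubernetes job, whichever is available.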

@bkoo-git
Collaborator

Status update:

  1. CMAQ and WRF data for 2018 were re-processed with the updated preprocessor code, and the new InMAP data file produced was sent to @yuzhou-wang for verification.
  2. Jeff and I will work on running InMAP on the District cluster with the instructions prepared by @yuzhou-wang.

@pmartien
Author

Thanks for the status updates, @bujinb and @bkoo-git. This sounds like good progress.

@yuzhou-wang

I've tested the new InMAP using the year-2016 EGU emissions in San Francisco. The concentration estimates at 1-km resolution are 2 to 3 times higher than the estimates from the national InMAP, but still look reasonable. I'll run more tests using other emission files.

@bujinb

bujinb commented Nov 7, 2022

@bkoo-git In case you need help running InMAP on your local cluster, @yuzhou-wang and I are available.
Update on running InMAP on the cloud: we are still troubleshooting our attempts at running InMAP on Kubernetes Engine.

@bkoo-git
Collaborator

bkoo-git commented Nov 7, 2022

Thanks, @bujinb! I did test the 2005 NEI test case (from the InMAP release page) on the District cluster, and the results look reasonable. However, I believe a better test would be to reproduce the results of the Bay Area test case @yuzhou-wang did using the InMAP data file created from the 2018 CMAQ/WRF data. I've asked @yuzhou-wang for the input files she used for her test, and received the files today. I will try to replicate her test case on our cluster this week and report back to you guys~

@pmartien
Author

pmartien commented Nov 7, 2022

Thanks again all for the status update. Much appreciated!

@bkoo-git
Collaborator

A quick update:
I successfully ran @yuzhou-wang's SF test case on the District cluster and verified that my results and hers are identical. She said the test run took 1.5 hours on her lab computer. It took 44 minutes on the District machine (soma). As I believe the InMAP code is not threaded, I think the runtime difference simply reflects the clock speed difference between the processors used. Also, note that the test case doesn't include the full set of emissions used in our 2018 base case CMAQ simulation.
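
For reference, the runtime difference quoted above works out to roughly a factor of two. The arithmetic, using only the two numbers from this comment:

```python
lab_minutes = 1.5 * 60    # 1.5 hours on the lab computer
cluster_minutes = 44.0    # District machine (soma)

speedup = lab_minutes / cluster_minutes
time_saved_fraction = 1.0 - cluster_minutes / lab_minutes
print(f"speedup: {speedup:.2f}x, time saved: {time_saved_fraction:.0%}")
```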

@pmartien
Author

Great news, @bkoo-git. What should our next steps be? Should we meet to discuss?

@bkoo-git
Collaborator

I think now might be a good time for another meeting to get everyone on the same page and discuss the next step.
I wonder if the InMAP results (if all emissions are included) would naturally match our annual CMAQ results, since we built the InMAP baseline chemistry input data using the full 2018 CMAQ outputs. I'd like to hear from the InMAP developers on this.
If we still need to evaluate how well InMAP replicates the annual CMAQ results, we'd need to develop InMAP emission inputs that are consistent with the 2018 CMAQ emissions inputs, re-run InMAP, and compare the InMAP results with our CMAQ results.

@yuzhou-wang

I think this comparison is important. Although the new InMAP was built on the CMAQ outputs, the annual predictions can still be slightly different since InMAP is linear.

We can discuss the emission inputs needed by InMAP. @bkoo-git Do you have emission inputs that are aligned to the CMAQ grids (1 km or 4 km)?
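
To illustrate what "linear" means in the comment above: in a linear source-receptor relationship, the concentration from several sources is the sum of the concentrations from each source alone, and doubling emissions doubles the response. A nonlinear chemistry model need not behave this way. The matrix below is made up purely for illustration and has nothing to do with the actual ISRM values.

```python
def concentrations(sr_matrix, emissions):
    """Receptor concentrations as a linear combination of source emissions.

    sr_matrix[i][j] is the concentration at receptor j per unit emission
    from source i.
    """
    n_receptors = len(sr_matrix[0])
    return [
        sum(emissions[i] * sr_matrix[i][j] for i in range(len(emissions)))
        for j in range(n_receptors)
    ]


# Made-up matrix with 2 sources and 3 receptors.
SR = [
    [0.50, 0.20, 0.05],
    [0.10, 0.40, 0.30],
]

only_a = concentrations(SR, [2.0, 0.0])
only_b = concentrations(SR, [0.0, 3.0])
both = concentrations(SR, [2.0, 3.0])

# Superposition: the combined run equals the sum of the single-source runs.
for a, b, ab in zip(only_a, only_b, both):
    assert abs((a + b) - ab) < 1e-12
```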

@bkoo-git
Collaborator

We discussed the emissions input formats in Issue #2 and determined that SMOKE-formatted files (such as ORL or FF10) would be the easiest way if we want to retain the source info (e.g., SCC). @yuzhou-wang, do you have a sample test case that uses SMOKE-formatted emissions input files?

@yuzhou-wang

I've made a comparison between the national InMAP and the new InMAP, using all the NEI 2016 point emissions in the Bay Area. I've attached the comparison slides. I compared the results at both 1-km and 10-km spatial resolutions. It seems that the domain-wide mean of the total PM2.5 predictions from the new InMAP is around 2-3 times the national InMAP predictions. The biggest difference is in SO2, for which the new InMAP has much higher concentration predictions than the national InMAP. I've also looked at the total SO2 emissions in California, and found that SO2 emissions dropped by an order of magnitude from 2005 to 2018 (160 ton/year to 20 ton/year). This large change in SO2 emissions may have changed the sensitivity of PSO4 to SO2 emissions.

The good thing is that, in the 1-km resolution comparisons, the predictions from the new InMAP seem more precise. The new InMAP also seems to capture the emission sources well.

We plan to make more comparisons, of the new InMAP to CMAQ and of the new InMAP to monitored concentrations, to see how well the new InMAP performs.
comparison_inmap.pptx
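
The slides are not reproduced here, but comparing at two resolutions could be done by block-averaging the fine grid onto the coarse one, as sketched below. This assumes the grids are aligned and the coarse cell size is an integer multiple of the fine one (a factor of 10 for 1 km to 10 km); the toy data uses a factor of 2.

```python
def block_mean(grid, factor):
    """Average a 2-D grid over non-overlapping factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            total = sum(
                grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            )
            out_row.append(total / factor ** 2)
        out.append(out_row)
    return out


def domain_mean(grid):
    """Unweighted mean over all cells of a 2-D grid."""
    values = [v for row in grid for v in row]
    return sum(values) / len(values)


# Toy fine-resolution field with one sharp peak (7.0).
fine = [
    [1.0, 3.0, 0.0, 0.0],
    [5.0, 7.0, 0.0, 4.0],
    [2.0, 2.0, 1.0, 1.0],
    [2.0, 2.0, 1.0, 1.0],
]
coarse = block_mean(fine, 2)
print(coarse, domain_mean(fine), domain_mean(coarse))
```

Block averaging keeps the domain mean unchanged but smooths peaks, which is one reason the two resolutions can tell different stories near large sources.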

@yuzhou-wang

@bkoo-git Could you send me a sample of SMOKE-formatted emissions input files? I'd like to make some test runs using that format. I don't have a sample SMOKE-formatted emissions input handy.

@bkoo-git
Collaborator

Thanks @yuzhou-wang for sharing your comparison results. @stephenreid65 can provide you with sample SMOKE-formatted emissions input files.
I have a question: Can you use different emissions input formats in a single run? For example, can you list a SMOKE-formatted emissions input for a source category and a shapefile for another category in the same TOML?

@yuzhou-wang

@bkoo-git I'm not sure about that. I'll make some test runs including both shapefile and SMOKE-formatted emissions. I guess the default InMAP configuration only takes shapefiles. We may need some preprocessing to convert the SMOKE-formatted emissions to shapefiles.

@bkoo-git
Collaborator

I was asking because not all emissions are generated by SMOKE. Sea spray emissions are internally generated by CMAQ at runtime: they can be made available via diagnostic outputs in a netCDF format, which could be converted to a shapefile, but formatting them into a SMOKE inventory file wouldn't be desirable.

@bkoo-git
Collaborator

@yuzhou-wang If we have to convert the SMOKE-formatted emissions to shapefiles, wouldn't we lose source info in the process? Then, what's the purpose of using SMOKE-formatted emissions? I notice that your test case emission inputs don't retain source info like SCC. What's the reason why we want to keep source info like SCC in the emissions input?

@pmartien
Author

pmartien commented Nov 16, 2022

Thanks for sharing the comparison slide deck, @yuzhou-wang. That's very interesting. @bkoo-git, are we seeing high PSO4 levels in CMAQ runs?

@bkoo-git
Collaborator

Annual average PSO4 predicted by CMAQ can be high near high SO2-emitting sources, but the max was ~10 μg/m3. Peak PSO4 predicted by InMAP appears to be much higher than what CMAQ predicted even though the InMAP run includes point source emissions only.

@stephenreid65
Collaborator

@yuzhou-wang, I can provide SMOKE-ready emissions inputs, but they would basically be CSV files with annual emissions by county or facility. I think you would need something gridded, so would our spatial surrogates also be required? We don't have emissions in shapefile format right now.
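
For context on why surrogates would be needed: a county-level annual total has no location within the county, so it is usually spread over grid cells in proportion to a surrogate weight (population, road length, and so on). A minimal sketch of that allocation follows; the county total, cell indices, and weights are all invented for the example.

```python
def allocate_to_grid(county_total, surrogate):
    """Distribute a county's annual emissions across grid cells.

    `surrogate` maps a cell id to a non-negative weight; weights are
    normalized here, so they do not need to sum to 1.
    """
    weight_sum = sum(surrogate.values())
    if weight_sum <= 0:
        raise ValueError("surrogate weights must sum to a positive value")
    return {cell: county_total * w / weight_sum for cell, w in surrogate.items()}


# Hypothetical county total (tons/year) and surrogate weights for three cells.
gridded = allocate_to_grid(120.0, {(10, 4): 0.5, (10, 5): 0.25, (11, 4): 0.25})
print(gridded)
```

Facility-level records with coordinates would not need this step, since each can be placed directly in its grid cell.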

@yuzhou-wang

@bkoo-git @stephenreid65 I guess that, since the comparison is mostly to make sure that the new InMAP provides reasonable predictions, we may not need emissions with detailed source info. I think we can use a combined emission file if it's available. Or, if you have CMAQ predictions from a single source, I can also run the new InMAP using that single emission file. Do you have any suggestions on that?

@bkoo-git
Collaborator

We have discovered that the VOC mappings in the wrfcmaq2inmap preprocessor weren't updated for the SAPRC07 chemical mechanism, which was used in our Bay Area CMAQ modeling, so many VOC species were dropped during processing. So, we need to redo the preprocessing. Since we are running a new 2018 base case CMAQ simulation at the moment, I propose preparing the InMAP input data using the new simulation outputs. The new simulation will also generate additional diagnostic outputs for sea spray emissions, which can be used later for evaluating InMAP. Meanwhile, I will work with @yuzhou-wang to fix the VOC mappings in the preprocessor. Let me know if you have any comments/suggestions/questions.
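
One way to keep this from happening silently again would be to have the preprocessor report species it cannot map instead of dropping them. The sketch below is not the wrfcmaq2inmap code: the mapping table is a placeholder and the species names are examples only.

```python
# Placeholder mapping from mechanism species to a lumped "VOC" group.
# The entries are examples; the real preprocessor table is not shown in this thread.
VOC_MAP = {"ALK1": "VOC", "ALK2": "VOC", "ARO1": "VOC"}


def split_mapped(species, mapping):
    """Separate species that have a mapping from those that do not."""
    mapped = [s for s in species if s in mapping]
    unmapped = [s for s in species if s not in mapping]
    return mapped, unmapped


def require_full_mapping(species, mapping):
    """Raise instead of silently dropping species that have no mapping."""
    _, unmapped = split_mapped(species, mapping)
    if unmapped:
        raise KeyError("no mapping for: " + ", ".join(sorted(unmapped)))


mapped, unmapped = split_mapped(["ALK1", "OLE1", "ARO1", "TERP"], VOC_MAP)
print("mapped:", mapped, "unmapped:", unmapped)
```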

@bujinb

bujinb commented Nov 29, 2022

We have successfully built the Kubernetes setup needed for running InMAP on the cloud in parallel to make a new ISRM, but are still in the process of testing the command. Meanwhile, I have run the new InMAP at several locations and made example test results. Please provide suggestions.
example test results inmap.pptx

@pmartien
Author

pmartien commented Dec 3, 2022

Hi @bujinb, @yuzhou-wang, and all. Thanks for the update and for sharing these test runs. Following up on earlier comments on this issue, I think it may be a good time to schedule a meeting to discuss next steps. I'll follow up with an email with some suggested dates.

@bujinb

bujinb commented Jan 11, 2023

We have successfully run and made a small (16 grid cells) ISRM for testing purposes on Google Cloud. Now we are testing bigger runs with more grid cells to get an idea of how long the process will take and how much it will cost.
Before we have the meeting in 2 weeks, do you have any suggestions for the example test results I posted above? We will shift to 2020 census data soon.
Thanks Bujin
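
A back-of-the-envelope estimate of time and cost for a larger ISRM could look like the sketch below. Every number in the example call is a placeholder, not a measurement from the test runs, and the model assumes identical, independent runs billed per run-hour.

```python
def estimate_campaign(n_runs, minutes_per_run, parallel_runs, dollars_per_run_hour):
    """Rough wall-clock hours and cost, assuming identical, independent runs."""
    if n_runs < 0 or parallel_runs < 1:
        raise ValueError("n_runs must be >= 0 and parallel_runs >= 1")
    waves = -(-n_runs // parallel_runs)  # ceiling division
    wall_hours = waves * minutes_per_run / 60.0
    cost = n_runs * minutes_per_run / 60.0 * dollars_per_run_hour
    return wall_hours, cost


# Placeholder inputs: replace with timings and prices from the actual test runs.
wall_hours, cost = estimate_campaign(
    n_runs=1000,
    minutes_per_run=45,
    parallel_runs=250,
    dollars_per_run_hour=0.10,
)
print(f"about {wall_hours:.1f} h elapsed, about ${cost:.2f}")
```

Under these assumptions, more parallelism shortens the elapsed time but leaves the cost unchanged.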


6 participants