
EzPAI: Python scripts for extracting Plant Area Index from zenith-looking Digital Cover Photography

Background and need:

EzPAI was developed to meet our need to monitor tree canopy changes with Digital Cover Photography (DCP). DCP cameras were installed at 40+ USDA soil moisture monitoring stations that were set up to help with calibration and validation efforts of NASA's Soil Moisture Active Passive (SMAP) mission over forests [1]. This particular SMAP Validation Experiment extended from 2019 through 2022, but only a few stations have a continuous 4-year record (most have 3 years). The experiment was originally planned to be conducted and completed during 2019, but a combination of COVID and SMAP satellite dropouts delayed the field campaign until 2022.

Naming convention: the .csv and hist*.jpg output files are named using the input folder name. File organization differed between the SMAPVEX19-22 MA and MB sites.

For MA, stations are named 420 etc., and all the input data are organized into a single folder. The .csv timestamps should be identical to the exif/image-overlay ones (perhaps ±1 hour at times).

For MB, it is more complicated, because (1) a different convention was used and (2) many MB stations reset their timestamps to default settings in 2022. For (1): folders are named by site and by when a field person manually downloaded all the data to a memory card, so a station could have up to 6 folders (each named by the intermittent collection date, YY-mm-dd). For example, in the example given here, the field person did a download on 2020-6-29 of data going back to the beginning of the experiment (~May 2019). For (2): at ~2 stations, there could also be an inconsistency between the collection date (i.e., the year) and the year given in the exif/jpg timestamps, presumably because the station was set to 2021 rather than 2022. In those cases we treated the year in the folder name as truth rather than the one from the station setting (station setting = the info in the exif and image overlay). Hence, there can be an inconsistency between the exif/overlay information and the date given in the .csv, because the .csv presents the result after the timestamp adjustment. The date in the .csv should therefore be considered correct.

Rationale for making this tool:

DCP was used instead of digital hemispherical photography (DHP) due to the much greater cost of DHP. Our DCP cameras were wildlife cameras that cost only about $100 each [2]. Thus, we were able to install them at nearly every soil moisture station. The cameras were set to output JPG imagery at high resolution (2304 x 1728 pixels) every hour, so data volume could exceed 10k photos per camera. Although [3] already reported on how to extract PAI from DCP back in 2012, there were no free tools to extract PAI/LAI from DCP until about 2022, when a tool for this kind of processing was released for the R programming language [4]; it did not appear in our literature search ahead of developing EzPAI. Hence, we developed our own tool, which like [4] is based on [3]. EzPAI only uses the OpenCV library in addition to relatively basic Python libraries such as numpy, imageio, pandas, matplotlib and scikit-image. One goal was to use as few specialized libraries as possible, to allow others to check/modify the scripts to their needs more easily, and to keep the overhead low for transitioning EzPAI to other platforms such as a C# Windows application (currently we have no plans to do this, due to time constraints). Although Python is slow, we were still able to process all our camera data with reasonable effort/time using EzPAI.

Running this tool

  1. First, install the environment: conda env create -f paiproc.yml
  2. python 0_hourscreen.py -i MB520_2020-6-29_MillbrookSchool-a_testinput
  3. python 1_blurscreen.py -i MB520_2020-6-29_MillbrookSchool-a_testinput
  4. python 2_getPAI.py -i MB520_2020-6-29_MillbrookSchool-a_testinput, where -i is the folder containing the jpg images. The tool should also work with any other image format compatible with the imageio library (not tested).

The result of step 4 is a csv file starting with '2_getPAI_' and some overview imagery prefixed with 'hist_', summarizing the important processing steps. Beyond that, users will most likely need to postprocess the data for QA/QC.

Explanation:

0_hourscreen.py

Example: python 0_hourscreen.py -i MB520_2020-6-29_MillbrookSchool-a_testinput

This script filters camera imagery by time of day and exports valid data to a file starting with '0_hourscreen_'. The timestamp source is set by the -c option; by default it is obtained from exif (-c 2), but create time and modified time can also be used. The script accomplishes two goals: (1) it allows users to skip images that don't fall inside the hours of the day they're interested in; (2) it writes the image date and time into a csv alongside the image name, from which it can be adjusted. In practice we did not need this kind of screening, as the blur screening script and the postprocess filtering described later usually take care of such images; the step is still needed, however, because the .csv output is a required input to 1_blurscreen.py. For the 21 images in the example, it took 1.2 seconds (i7 Dell Precision 7560 laptop).

csv content: the first column has the timestamp and the second column has the file name.
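For illustration, here is a minimal sketch of this kind of hour screening. It is not the actual 0_hourscreen.py; the function name, hour window, output file name, and the use of Pillow for exif access are assumptions for the example.

```python
# Hedged sketch: read each JPG's timestamp (EXIF DateTime via Pillow, falling
# back to the file's modified time), keep images inside an hour window, and
# write "timestamp, filename" rows to a csv, mirroring the 0_hourscreen output.
import csv
import glob
import os
from datetime import datetime

from PIL import Image  # assumption: Pillow is available in the environment

def hour_screen(folder, hmin=8, hmax=16, out="0_hourscreen_sketch.csv"):
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.jpg"))):
        stamp = Image.open(path).getexif().get(306)  # EXIF tag 306 = DateTime
        if stamp:                                    # "YYYY:MM:DD HH:MM:SS"
            dt = datetime.strptime(stamp, "%Y:%m:%d %H:%M:%S")
        else:                                        # fall back to modified time
            dt = datetime.fromtimestamp(os.path.getmtime(path))
        if hmin <= dt.hour <= hmax:
            rows.append((dt.strftime("%Y-%m-%d %H:%M:%S"), os.path.basename(path)))
    with open(out, "w", newline="") as f:
        csv.writer(f).writerows(rows)

hour_screen("MB520_2020-6-29_MillbrookSchool-a_testinput")
```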

1_blurscreen.py

Example: python 1_blurscreen.py -i MB520_2020-6-29_MillbrookSchool-a_testinput

This script is a second filter: it screens the images listed in the '0_hourscreen_' file for blur. The output is written to a file starting with '1_blurscreen_'. Information on the default thresholds is provided in the script. The blur detection approach is based on the variance of the Laplacian, i.e., if the variance (or max value) is lower than a threshold, the image is deemed blurry. For the 19 images remaining in the 0_hourscreen csv, it took 7.6 seconds (i7 Dell Precision 7560 laptop).

csv content: same as 0_hourscreen, plus the variance (b1) and maximum value (b2) of the Laplacian filter result. The thresholds are only used for screening out blurry imagery; only non-blurry images are listed.
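As an illustration of the variance-of-Laplacian test, here is a minimal sketch; the threshold values below are illustrative, not EzPAI's defaults.

```python
# Hedged sketch of blur detection via the variance of the Laplacian: a sharp
# image has strong second-derivative responses at edges, so a low variance
# (b1) or low maximum (b2) of the filtered image flags it as blurry.
import cv2

def is_blurry(path, var_thr=100.0, max_thr=50.0):  # illustrative thresholds
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    lap = cv2.Laplacian(gray, cv2.CV_64F)
    b1, b2 = lap.var(), lap.max()  # the b1/b2 columns in the csv
    return b1 < var_thr or b2 < max_thr
```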

2_getPAI.py

Example: python 2_getPAI.py -i MB520_2020-6-29_MillbrookSchool-a_testinput

This script calculates all the items needed to obtain PAI, following the approach of Ryu et al. (2012) [3]. The output is given in a file starting with '2_getPAI_'. Information on the default settings is available in the script. These settings were used over all our images to get the initial results for PAI and other plant structural parameters. For the 17 images remaining in the 1_blurscreen csv, it took 13.1 seconds (i7 Dell Precision 7560 laptop).

csv content:

  • lmxb, lmxc and rmxb, rmxc are the locations and counts for the canopy (prefix l) and sky (prefix r) peaks.
  • rb_l and rb_r are the locations of the bins used in the canopy/sky discrimination step, based on the Rosin method.
  • 'sky' gives the mean blue sky index value of the sky pixels. Here, sky pixels were determined using a different threshold than in the canopy/sky partitioning (skythr = 0.75 vs tmthrc, tmthri values of 0.25 or 0.5).
  • minpixarea gives the size of the smallest patches considered as large gaps, as a % of the image
  • GF, CC, CP and PAI are the canopy structural parameters Gap Fraction, Crown Cover, Crown Porosity and Plant Area Index (see the sketch after this list for how they relate).
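For orientation, here is a hedged sketch of how these quantities relate in the DCP approach of [3]; the function and its pixel-count inputs are illustrative, and the actual code may differ in details.

```python
# Sketch of the DCP canopy parameters following [3]: gap fraction from all
# sky pixels, crown cover from large gaps only, crown porosity from the
# within-crown gaps, and PAI via Beer's law with extinction coefficient k.
import numpy as np

def dcp_parameters(n_total, n_sky, n_large_gap, k=0.65):
    gf = n_sky / n_total              # GF: fraction of sky (gap) pixels
    cc = 1.0 - n_large_gap / n_total  # CC: 1 - large-gap fraction
    cp = 1.0 - (1.0 - gf) / cc        # CP: porosity within the crowns
    pai = -cc * np.log(cp) / k        # PAI from Beer's law
    return gf, cc, cp, pai
```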

hist_ image content:

[Example 'hist_' overview image]

Disclaimer: most data are not this clean, and histograms don't look this nice! Even so, EzPAI can still frequently obtain reasonable values.

The upper left figure shows the image histogram. It also draws lines from the origin to rmxb and from the last bin to lmxb. These are used in the Rosin method to determine the bin location corresponding to the maximum curvature (i.e., the longest perpendicular distance from the drawn lines). The Rosin bins rb_l and rb_r are marked with a red and a black x, respectively. The figure also shows the canopy vs sky delineating bin, located between the Rosin bins, with a vertical dashed line and an annotation at the top of the figure. Here the sky is cloudy, and the delineating bin is at 0.25 of the distance between the two Rosin bins.
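A rough sketch of this Rosin step follows; it is simplified, and EzPAI's actual implementation may differ.

```python
# Sketch of the Rosin method: draw a line from a histogram peak to the
# histogram's far end, then pick the bin with the longest perpendicular
# distance to that line (the point of maximum curvature).
import numpy as np

def rosin_bin(hist, peak_bin, end_bin):
    x = np.arange(peak_bin, end_bin + 1, dtype=float)
    y = hist[peak_bin:end_bin + 1].astype(float)
    # line through (x1, y1) and (x2, y2) written as a*x + b*y + c = 0
    x1, y1, x2, y2 = x[0], y[0], x[-1], y[-1]
    a, b, c = y2 - y1, x1 - x2, x2 * y1 - x1 * y2
    dist = np.abs(a * x + b * y + c) / np.hypot(a, b)
    return peak_bin + int(np.argmax(dist))
```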

The upper right figure shows the result of partitioning the image into canopy and sky pixels. This partitioning is based on the rb_l and rb_r values, and can also be modified by the tmthri and tmthrc values explained below.

The lower right figure shows the input image for reference, after the camera metadata pixels have been removed from the bottom.

The lower left figure shows the large gaps identified in the image. Keep in mind that this functionality is based on a simple contour-finding approach, and the contours will often be larger than one would expect from visual inspection of the upper and lower right figures. This will bias results for canopy structural parameters, but results can still be expected to have reasonable magnitudes and consistency over time.

0_run_ctrl.py

Example: python 0_run_ctrl.py -i . -p MB

A convenience script for running the first three scripts in sequence using default values for each script, for each folder having the prefix (i.e., starting with the string) given by -p. It also outputs the duration of each step.

  • There could be some value in skipping the blur detection, as it adds almost 50% processing time. It may not be needed, given that blurry imagery may also be filtered out in other pre- or post-processing steps; getPAI can be changed to read imagery from the hourscreen step output instead.
  • There can also be value in skipping hour screening; however, given that it is fast and provides users with a csv of timestamps, it is not worth skipping. One may instead want to modify the code to remove any screening and consider all available imagery, to avoid omitting useful data when the timestamps are wrong. Timestamps can then be updated in the .csv as needed.
  • It is recommended to update timestamps ahead of the getPAI step, because timestamps are written onto the small overview images that summarize the PAI extraction process (the 'hist_' jpg), which can be useful for understanding or tweaking settings.

Please also note that many cameras in Millbrook experienced a timestamp reset. We were able to recover the timestamps because we had the date and time of the field visit as a reference to determine the offset applicable to all affected images; different stations had different offsets. We used an alternative workflow for this processing (script not provided here): essentially, there was no screening for month or hour, and 0_hourscreen was only used to export results into the csv. We then looked up the appropriate time offset for year/day/hour by comparing the site visit date to the image timestamp (either create or modify time, but usually the one in exif) and applied the offset. We then fed that .csv file into the above workflow, starting with 1_blurscreen and so forth.

Important parameters:

cloudythr: the threshold on the blue sky index value [3] that decides whether the image is mainly diffuse light (cloudy, < thr) or not (clear, > thr). This is somewhat qualitative, and depends on the camera used and on judgement; it should not take much time to obtain a reasonable estimate by trial and error. This parameter is quite important, because the threshold can impact how image pixels are categorized into canopy and sky if tmthri and tmthrc have different values, as in our default code. Cloudy vs not-cloudy also matters for postprocessing, because ideally one would want to calibrate all imagery to diffuse light conditions.

k, the extinction coefficient in Beer's law [3]: this is somewhat flexible, and different valid rationales can be used to set its value; it would usually be in the 0.4-1.0 range. In our case, we did not have leaf angle distribution measurements, so we assumed the G function that Ryu et al. used in their 2012 paper over broadleaf forest ('erectophile'). Our camera field of view was estimated at 40 degrees. We calculated k = G(40)/cos(40) and obtained k = 0.65. Other manuscripts have used k = 0.5 for conifers or 0.6 for broadleaves, or derived their k value using somewhat different logic. It is probably not worth overthinking how to set k exactly, as long as it is within a reasonable range.
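A quick arithmetic check of that choice; the G(40) value here is simply back-computed from the stated k, so it is an assumption rather than a measured value.

```python
# k = G(theta) / cos(theta) at theta = 40 degrees; with G(40) ~ 0.5
# this reproduces the k = 0.65 used above.
import math

theta = math.radians(40.0)
G = 0.5                   # assumed G-function value at 40 deg
k = G / math.cos(theta)   # 0.5 / 0.766 ~ 0.65
print(round(k, 2))        # -> 0.65
```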

skipbotpix: DCP images usually have some part of the frame dedicated to metadata. In our imagery this was at the bottom, so this setting removes the bottom pixels. Feel free to change the code to remove the strip however needed, or do the cropping in a prior step.
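For example, a minimal sketch of the crop; the skipbotpix value and file name here are illustrative, not the defaults.

```python
# Drop the metadata strip at the bottom of the frame before any
# histogram/partitioning work.
import imageio.v3 as iio

skipbotpix = 100                        # illustrative value, not the default
img = iio.imread("example.jpg")         # hypothetical file name
img = img[: img.shape[0] - skipbotpix]  # remove the bottom rows of pixels
```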

fcval: the threshold for findContours to identify large gaps between crowns [3]. For our image processing, we qualitatively determined that 10000 yielded reasonable results in our dense forest. We also added a way to report the smallest gap size as a percentage of the image; it was about 0.3% for this setting. We also checked a few other values for our imagery, with 50k and 100k corresponding to about 1.4% and 3%, respectively. This is for our image size of 2304 x (1728 - skipbotpix); for imagery with more (fewer) pixels than ours, fcval needs to be increased (decreased) to keep the size as a percentage of the image the same.
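A hedged sketch of that scaling; the helper below is hypothetical, not part of EzPAI, and the cropped height assumes an illustrative 100-pixel skipbotpix.

```python
# Scale fcval with image area so the minimum large-gap size stays the same
# percentage of the frame (numbers from the text: 10000 px is ~0.3% of a
# 2304 x ~1628 px frame, assuming a 100 px skipbotpix crop).
def scale_fcval(fcval_ref, w_ref, h_ref, w_new, h_new):
    return fcval_ref * (w_new * h_new) / (w_ref * h_ref)

print(round(scale_fcval(10000, 2304, 1628, 4000, 3000)))  # -> ~32000 px
```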

Suggested postprocessing

First, one may want to re-screen data for QA/QC. We found the following to be helpful in screening out poorly processed data (a pandas sketch of these filters follows the list):

  • remove images having rb_r < 32: the Rosin bin for maximum curvature on the 'canopy' side should be greater than some bin; from our data we picked 32. For canopy, rb_r < 32 can correspond to a local minimum followed by another, smaller vegetation peak. A limitation of EzPAI is that it uses very basic peak finding, which is workable but could be improved. Otherwise sky can be overestimated.
  • remove images having rb_l < 128: the Rosin bin for maximum curvature on the 'sky' side shouldn't be less than some bin; from our data we picked 128. Otherwise sky can be overestimated.
  • remove images having delta <= 18: if rb_r and rb_l are too close, canopy can't be accurately discriminated from sky. Otherwise sky can be overestimated.
  • remove images having CP < 0.023: very small CP values give unreliable estimates, because PAI varies with ln(CP) (ln(0.023) = -3.77 vs only -2.30 for CP = 0.1), so PAI becomes greatly overestimated. Small CP values can stem from both DCP image quality and EzPAI approach limitations. Otherwise the PAI change between hourly observations can be unreliable and jumpy.
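A pandas sketch of the four screens above; the column names follow the csv description, delta is assumed to be the distance between the Rosin bins, and the file name is hypothetical.

```python
# Apply the four QA/QC screens to a '2_getPAI_' output csv.
import pandas as pd

df = pd.read_csv("2_getPAI_MB520.csv")  # hypothetical file name
delta = df["rb_l"] - df["rb_r"]         # assumed definition of delta
keep = (
    (df["rb_r"] >= 32)
    & (df["rb_l"] >= 128)
    & (delta > 18)
    & (df["CP"] >= 0.023)
)
df = df[keep]
```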

Second, after data have been screened, one should consider calibrating all data to 'diffuse' light conditions (a sketch follows the list):

  • for each station, do a linear regression between cloudy (blue sky index < cloudythr) and clear images. The Pearson correlation coefficient R is usually > 0.8.
  • use the regression relation to adjust the clear results for CC and GF to those of cloudy
  • recalculate CP and PAI
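A heavily hedged sketch of this calibration; the day-matching logic, column names and cloudythr value are assumptions for illustration, not EzPAI's exact workflow.

```python
# Per station: regress same-day cloudy vs clear means of a column (CC, GF),
# map clear values onto the cloudy relation, then recompute CP and PAI.
import numpy as np
import pandas as pd

def calibrate_to_diffuse(df, col, cloudythr=0.55):  # illustrative threshold
    df = df.copy()
    df["date"] = pd.to_datetime(df["timestamp"]).dt.date
    cloudy = df[df["sky"] < cloudythr].groupby("date")[col].mean()
    clear = df[df["sky"] >= cloudythr].groupby("date")[col].mean()
    pairs = pd.concat([cloudy, clear], axis=1, keys=["cloudy", "clear"]).dropna()
    slope, icept = np.polyfit(pairs["clear"], pairs["cloudy"], 1)
    mask = df["sky"] >= cloudythr
    df.loc[mask, col] = slope * df.loc[mask, col] + icept  # adjust clear to cloudy
    return df

df = calibrate_to_diffuse(df, "CC")
df = calibrate_to_diffuse(df, "GF")
df["CP"] = 1.0 - (1.0 - df["GF"]) / df["CC"]     # recompute crown porosity
df["PAI"] = -df["CC"] * np.log(df["CP"]) / 0.65  # and PAI with k = 0.65
```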

Some background on the need for calibrating to 'diffuse' light conditions: we noted a bias in CC/GF/PAI values depending on whether the sky is cloudy or not. This bias is expected a priori and is attributable to illumination differences [3]. Reference [3] addressed it by changing the thresholds for canopy/sky discrimination depending on whether the sky was cloudy or not. Our getPAI script also has this functionality, but we used different threshold values that were more appropriate for our setup than those provided in [3] (see the tmthri, tmthrc values in the getPAI script). Even then, we still observed differences ('jumps') in same-day CC/CP/PAI values that depended on whether the sky was cloudy or not. We found that the above data screening and calibration greatly improved the consistency of the data without eliminating all that much of it. Obtaining 'calibrated PAI' from the csv outputs was a much better option, given the large amount of time and uncertainty involved in experimenting with different EzPAI settings and re-running the scripts each time.

Results for example site:

A LICOR measurement over a 200 m x 200 m region near the station on 7/19/2022 gave PAI = 4.16. The 2022 time series from EzPAI default settings + "Suggested postprocessing" gives 4.07 for the average (see image below). Please note that the image also gives two standard deviation values: one of PAI for the summer (Jun-Sep), $\sigma_S$, and another based on the daily variance (the plotted result), $\sigma_{h\text{-}d}$. Both serve an important purpose: the former helps identify whether summer PAI was highly variable or not; the latter is useful for establishing the uncertainty of the PAI detection, because it is only based on same-day variations rather than gradual impacts (such as insect infestation) and recovery effects. The two may also be quickly compared to assess whether there could have been any unexpected events such as insect damage, for example if $\frac{\sigma_S}{\sigma_{h\text{-}d}} > 2$.

[PAI time series plot for the example site]

Preliminary results:

A full analysis/write-up of the PAI extraction using this tool is still in progress [6]. But we have already compared our calibrated EzPAI results (i.e., the average of the 2019-2022 average summer PAI values at each station) to LICOR-2200 in situ data collected over 200 m x 200 m areas near the cameras during spring and summer 2022 (N = 20) (see [7]). Results of this comparison show R = 0.89, RMSD = 0.93, MD = -0.54, and ubRMSD = 0.75, indicating good correspondence between EzPAI results and in situ data. Other preliminary results showed that postprocessed PAI results are temporally stable, having within- and between-year variations of PAI (i.e., sdPAI/PAI) of < 5% at most stations. We are currently working on estimating LAI from our PAI values, and then plan a more detailed comparison between our dense LAI time series and those obtained from remote sensing in a second future manuscript.

Future work:

We will probably not maintain this code, but may make a Windows application based on it to make DCP processing even more accessible.

References:

[1] Colliander, A., Cosh, M. H., Kelly, V. R., Kraatz, S., Bourgeau-Chavez, L., Siqueira, P., ... & Yueh, S. H. (2020). SMAP detects soil moisture under temperate forest canopies. Geophysical Research Letters, 47(19), e2020GL089697.

[2] Moultrie WCT-00125 TimelapseCam. We are not 100% sure about the field of view on WCT-00125, it was not in the manual. The manual for WCT-00126 gives FOV as 40 degrees, which sounds about right for WCT-00125 also.

[3] Ryu, Y., Verfaillie, J., Macfarlane, C., Kobayashi, H., Sonnentag, O., Vargas, R., ... & Baldocchi, D. D. (2012). Continuous observation of tree leaf area index at ecosystem scale using upward-pointing digital cameras. Remote Sensing of Environment, 126, 116-125.

[4] Chianucci, F., Ferrara, C., & Puletti, N. (2022). coveR: an R package for processing digital cover photography images to retrieve forest canopy attributes. Trees, 36(6), 1933-1942.

[5] Macfarlane, C., Coote, M., White, D. A., & Adams, M. A. (2000). Photographic exposure affects indirect estimation of leaf area in plantations of Eucalyptus globulus Labill. Agricultural and Forest Meteorology, 100(2-3), 155-168.

[6] Kraatz, S., Cosh, M., Kelly, V., Bourgeau-Chavez, L., Cook, C., Walker, V., Siqueira, P., Colliander, A. (2024). SMAPVEX19-22 Digital Cover Photography (DCP): a new dataset and processing tool, and new insights on sky bias and DCP capability. In Review.

[7] Cook, C. L. , Bourgeau-Chavez, L., Miller, M. E., Vander Bilt, D., Kraatz, S., Cosh, M.H., Colliander, A. 2024. Comparison of In Situ and Remotely Sensed Leaf Area Index of Northeastern American Deciduous, Mixed, and Coniferous Forests for SMAPVEX19-22. In Review.
