Re-run canopy cover #572

max-zilla · 2019-05-09T17:57:36Z

Discussions between @ZongyangLi and me yielded several updates to canopy cover:

add thresholding so images below quality threshold are not processed (avoid bad values e.g. May 7)
on fieldmosaic step, set "NoData" value separate from the 0 that denotes soil in soil masked images
rerun RGB images for 05/07 and RGB_mask
regenerate fieldmosaic using the NoData fix to generate a % of the visible area for partial plot scans
resubmit canopycover to bety with fixed values

max-zilla · 2019-05-09T18:16:24Z

Check whether GeoTIFF supports NoData or null values; if not, can we encode -99 as a standard NoData value?

max-zilla · 2019-05-09T18:24:09Z

Options: an alpha band for opacity, a synthetic 0/1 band representing soil, or a categorized band with multiple settings (NoData, Soil, Mask).

Can we assign a NoData value to the VRT before translating to GeoTIFF? It's possible that the source file not having NoData is what results in (0,0,0).

dlebauer · 2019-05-09T18:50:25Z

Add documentation and tests, and check whether there is an OGC standard for encoding missing data.

See also https://aggateway.atlassian.net/wiki/spaces/SG/pages/258670684/AgGateway+Post-Image+Collection+Specification+PICS for ideas

max-zilla · 2019-05-09T19:06:34Z

-a_nodata NoData works for gdal_translate.

max-zilla · 2019-05-16T16:51:01Z

After a lot of experimentation, gdal_translate seems to conflate NoData and (0,0,0) pixel values during the VRT -> TIF conversion, regardless of the NoData settings given to the gdalbuildvrt command or the parameters given to gdal_translate.

I set that aside in order to get the fundamental process working, and am using the -addalpha flag in gdalbuildvrt to make the fullfield mask an RGBA image instead of RGB, with alpha=255 where photos exist in the image and alpha=0 where no data exists (between rows). This leaves the 0s for soil removed from the photos intact. I just modified and tested the cc algorithm on a small fieldmosaic of 5 images:

...got a cc value of 98.84% for the whole image. Deploying the test on an actual fullfield date next; then I can rerun all cc data over the weekend if it looks correct.
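A minimal sketch of the alpha-aware calculation described above, not the extractor's actual code. It assumes the mosaic is available as a flat sequence of (r, g, b, a) tuples, that alpha=0 marks NoData, and that (0,0,0) inside the scanned area is masked-out soil. The function name `canopy_cover` and the sample pixels are hypothetical.

```python
def canopy_cover(pixels):
    """Return percent canopy cover over the scanned area of an RGBA mosaic.

    pixels: iterable of (r, g, b, a) tuples.
    Assumed conventions: alpha == 0 is NoData (no photo there), any other
    alpha is scanned area, and (0, 0, 0) in the scanned area is soil that
    the RGB mask removed.
    """
    scanned = 0
    plant = 0
    for r, g, b, a in pixels:
        if a == 0:
            continue  # NoData: excluded from both numerator and denominator
        scanned += 1
        if (r, g, b) != (0, 0, 0):
            plant += 1  # non-black pixel survived the soil mask
    if scanned == 0:
        return None  # nothing scanned, so cover is undefined
    return 100.0 * plant / scanned


# Hypothetical six-pixel mosaic: two NoData gaps, one soil pixel, three plant pixels.
mosaic = [
    (0, 0, 0, 0),
    (0, 0, 0, 0),
    (0, 0, 0, 255),
    (34, 120, 40, 255),
    (28, 98, 33, 255),
    (40, 130, 52, 255),
]
print(canopy_cover(mosaic))  # 75.0
```

Without the alpha check, the two NoData pixels would count as soil and the same mosaic would report 50% instead of 75%.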

max-zilla · 2019-05-23T15:30:24Z

Currently still running but spot checks are looking good:

range 52 (top) and 51 (bottom), columns 9-14

          9 10 11 12 13 14
         -----------------
ROW 52 - 29 36 91 89 75 29
ROW 51 - 87 87 90 84 86 87

These percentages are much closer to what one would expect. I've applied a NoData maximum of 75% (larger than it was before) to push partialplot scans through the pipeline, so things like Column 16 will be omitted in those cases.
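The 75% NoData maximum could be applied per plot roughly as follows. This is a sketch under assumptions: the plot's alpha band is given as a flat sequence of values, alpha=0 means NoData, and a plot sitting exactly at the limit is still processed. The names `nodata_fraction` and `should_process` are hypothetical.

```python
MAX_NODATA_FRACTION = 0.75  # the 75% ceiling mentioned above


def nodata_fraction(alpha_values):
    """Fraction of pixels in a plot whose alpha is 0 (assumed to mean NoData)."""
    alpha_values = list(alpha_values)
    if not alpha_values:
        return 1.0  # an empty plot is treated as entirely NoData
    missing = sum(1 for a in alpha_values if a == 0)
    return missing / len(alpha_values)


def should_process(alpha_values, max_nodata=MAX_NODATA_FRACTION):
    """True when enough of the plot was scanned to report canopy cover."""
    return nodata_fraction(alpha_values) <= max_nodata


print(should_process([0, 0, 0, 255]))     # True  (75% NoData, at the limit)
print(should_process([0, 0, 0, 0, 255]))  # False (80% NoData)
```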

Expect it to finish Friday or Saturday.

max-zilla · 2019-05-23T16:01:41Z

Some more QA tidbits...

On 2018-07-01, 3 of the 5 scans run were full field. The same scan with 'shade' in the name was run twice, plus a third scan with 'sun' in the name.

shade  average CC, all plots - 86%
shade2 average CC, all plots - 84%
sun    average CC, all plots - 97%

The sunlit scan is consistently around 10% higher.

The two shade scans were fairly consistent on average, but a small handful of plots (19/766) had differences above 10% for the shade scans. This is likely attributable to the sensitivity in our rgb_mask algorithm:

Shade2 Mask - 57.8%

Shade1 Mask - 81%, this was also a 2-pass partialplot scan instead of 1-pass (more coverage)

Sun Mask - 85%, nice and bright, more pixels retained here.
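The plot-by-plot comparison between the two shade scans can be sketched as below. It assumes each scan's results are held in a dict mapping plot name to canopy cover percent; the function name `divergent_plots`, the plot names, and the sample values are hypothetical and are not taken from the real data.

```python
def divergent_plots(scan_a, scan_b, threshold=10.0):
    """Return plots present in both scans whose CC differs by more than threshold.

    scan_a, scan_b: dicts of plot name -> canopy cover in percent.
    threshold: absolute difference in percentage points.
    """
    shared = scan_a.keys() & scan_b.keys()  # only plots measured in both scans
    return sorted(p for p in shared if abs(scan_a[p] - scan_b[p]) > threshold)


# Hypothetical values for illustration only.
shade1 = {"plot_a": 86.0, "plot_b": 81.0, "plot_c": 90.0}
shade2 = {"plot_a": 84.0, "plot_b": 57.8, "plot_d": 70.0}
print(divergent_plots(shade1, shade2))  # ['plot_b']
```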

I would argue that this doesn't merit further delays for more reprocessing, but it'll be important for data consumers to understand this kind of phenomenon when we have multiple differing CC values per day.

Maybe the simplest suggestion is to just use the maximum observation for a given day. I doubt over-estimation will be a common problem except in rare cases where e.g. a reflectance test panel or something is on the dirt field and reads as > 0 canopy cover.

dlebauer · 2019-05-23T16:11:14Z

We shouldn't use a 'max' per day as a workaround for an algorithm that doesn't function as expected.

The best way to fix the problem is to fix the algorithm. But that may take a while to fix.

Otherwise, if the data are known to be in error, e.g. if the algorithm can't handle sunlit scans, then we shouldn't include that data in the database.

In the end, if we have three measurements from a day, then having a single one that under-estimates by 10% is a small issue.

max-zilla · 2019-05-23T16:15:11Z

That sounds good. To be clear, there are 2 shady scans and 1 sunny - the sunny seems to correctly report the higher value, and the 2 shady scans seem to be under-estimating. Adjusting the RGB mask thresholds could address this case, but it could have other repercussions. Not sure if I'd go so far as to call it an error.

dlebauer · 2019-05-23T16:47:06Z

Sorry, I got that backward. If it doesn't rise to the level of an error, I think adding this caveat to the documentation (README) under known limitations would be okay.

dlebauer · 2019-05-23T18:33:12Z

@ZongyangLi and @abby621 can these exceptions be added as test cases to the extractor?

max-zilla · 2019-05-30T15:58:24Z

I started uploading the CSVs to bety and noticed a small number of files from May were being omitted from the field mosaics so I paused the upload. Closer examination revealed the omitted mask images had a different TIF header than the majority:

Band 1 Block=2472x1 Type=Float32, ColorInterp=Gray
Band 2 Block=2472x1 Type=Float32, ColorInterp=Undefined
Band 3 Block=2472x1 Type=Float32, ColorInterp=Undefined

The data type was Float32 and the RGB color bands aren't properly indicated (the data itself is fine). But these headers meant GDAL rejected the files during VRT creation because they differed from the expected header:

Band 1 Block=2472x1 Type=Byte, ColorInterp=Red
Band 2 Block=2472x1 Type=Byte, ColorInterp=Green
Band 3 Block=2472x1 Type=Byte, ColorInterp=Blue
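One way to find the affected files is to compare each file's band lines against the expected header. The sketch below parses text in the format shown above (as printed by gdalinfo) and assumes that text has already been captured as a string; the names `band_headers` and `needs_fix` are hypothetical.

```python
import re

# Matches band lines such as "Band 1 Block=2472x1 Type=Byte, ColorInterp=Red".
BAND_LINE = re.compile(r"Band (\d+) Block=\S+ Type=(\w+), ColorInterp=(\w+)")

# Header that the majority of mask images carry.
EXPECTED = [("Byte", "Red"), ("Byte", "Green"), ("Byte", "Blue")]


def band_headers(gdalinfo_text):
    """Extract (data type, color interpretation) for each band, in band order."""
    found = [
        (int(m.group(1)), m.group(2), m.group(3))
        for m in BAND_LINE.finditer(gdalinfo_text)
    ]
    return [(dtype, interp) for _, dtype, interp in sorted(found)]


def needs_fix(gdalinfo_text):
    """True when the bands are not Byte Red/Green/Blue (or none were found)."""
    return band_headers(gdalinfo_text) != EXPECTED


odd_header = """\
Band 1 Block=2472x1 Type=Float32, ColorInterp=Gray
Band 2 Block=2472x1 Type=Float32, ColorInterp=Undefined
Band 3 Block=2472x1 Type=Float32, ColorInterp=Undefined
"""
good_header = """\
Band 1 Block=2472x1 Type=Byte, ColorInterp=Red
Band 2 Block=2472x1 Type=Byte, ColorInterp=Green
Band 3 Block=2472x1 Type=Byte, ColorInterp=Blue
"""
print(needs_fix(odd_header))   # True
print(needs_fix(good_header))  # False
```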

I'm not sure why the headers are different - perhaps the small number of May files were generated with an older version of the extractor and didn't get re-run properly. The good news is that the fix to data type and RGB header is a simple GDAL command:

gdal_translate -ot Byte -colorinterp red,green,blue source_file out_file
rm source_file
mv out_file source_file

This forces the output file to have properly registered RGB channels and data type.

I'm running a small script to correct these, but it looks like the issue doesn't occur later, so I will proceed to upload the remaining CSVs in the meantime. I don't anticipate this will impact the results being sent to bety; we just might get some more plots from early May scans once done.
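The correction script itself isn't shown here; a minimal sketch of what it could look like follows. It wraps the gdal_translate command above, assumes gdal_translate is on the PATH, and uses a hypothetical temporary file suffix. The function names `fix_command` and `fix_in_place` are illustrative only.

```python
import os
import subprocess


def fix_command(source_file, out_file):
    """Build the gdal_translate call that rewrites data type and band colors."""
    return [
        "gdal_translate",
        "-ot", "Byte",                      # force 8-bit output
        "-colorinterp", "red,green,blue",   # label the three bands as RGB
        source_file,
        out_file,
    ]


def fix_in_place(source_file):
    """Rewrite source_file with corrected headers, replacing it only on success."""
    tmp_file = source_file + ".fixed.tif"  # hypothetical temporary name
    subprocess.run(fix_command(source_file, tmp_file), check=True)
    os.replace(tmp_file, source_file)  # swap in the corrected file


print(" ".join(fix_command("source_file", "out_file")))
```

Because `check=True` raises if gdal_translate fails, the original file is only replaced after a successful conversion.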

Should be able to close this issue then, and upload a few of the test images I've been using for future checks.

max-zilla · 2019-06-13T14:21:26Z

Created #590 to follow up on the QA process for this.

max-zilla added this to the TERRA Sprint - April 2019 milestone May 9, 2019

max-zilla self-assigned this May 9, 2019

max-zilla modified the milestones: TERRA Sprint - April 2019, TERRA Sprint - May 2019 May 16, 2019

max-zilla closed this as completed Jun 13, 2019
