Re-run canopy cover #572

Closed
max-zilla opened this issue May 9, 2019 · 13 comments

@max-zilla
Contributor

Discussions between @ZongyangLi and me yielded several updates to canopy cover:

  • add thresholding so images below quality threshold are not processed (avoid bad values e.g. May 7)
  • on fieldmosaic step, set "NoData" value separate from the 0 that denotes soil in soil masked images
  • rerun RGB images for 05/07 and RGB_mask
  • regenerate fieldmosaic using the NoData fix to generate a % of the visible area for partial plot scans
  • resubmit canopycover to bety with fixed values
@max-zilla max-zilla added this to the TERRA Sprint - April 2019 milestone May 9, 2019
@max-zilla max-zilla self-assigned this May 9, 2019
@max-zilla
Contributor Author

Check whether GeoTIFF supports NoData or null; if not, can we encode -99 as a standard NoData value?

@max-zilla
Contributor Author

Options: an alpha band for opacity, a synthetic 0/1 band representing soil, or a categorized band with multiple settings (NoData, Soil, Mask).

Can we assign a NoData value to the VRT before translating to GeoTIFF? It's possible that the source file not having a NoData value is what results in (0,0,0).
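
To make the ambiguity concrete, here is a minimal Python sketch of the "categorized band" idea, in which every pixel gets an explicit class instead of overloading (0,0,0). This is not the pipeline's code: the class codes, the `covered` flag, and the function name are assumptions made up for illustration.

```python
# Hypothetical class codes for a categorized band (values are assumptions).
NODATA, SOIL, CANOPY = 0, 1, 2

def categorize(rgb, covered):
    """Classify one pixel of a soil-masked mosaic.

    rgb     -- (r, g, b) tuple; the soil mask has already set soil to (0, 0, 0)
    covered -- True if any source image contributed this pixel
    """
    if not covered:
        return NODATA   # gap between scans: nothing was photographed
    if rgb == (0, 0, 0):
        return SOIL     # photographed, but masked out as soil
    return CANOPY       # photographed and kept by the mask

# Two pixels with identical RGB values land in different classes.
pixels = [((0, 0, 0), False), ((0, 0, 0), True), ((34, 120, 40), True)]
classes = [categorize(rgb, covered) for rgb, covered in pixels]
```

The first two sample pixels have the same RGB value and are only distinguishable because coverage is tracked separately, which is the information a plain RGB mosaic loses.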

@dlebauer
Member

dlebauer commented May 9, 2019

Add documentation and tests, and check whether there is an OGC standard for encoding missing data.

See also https://aggateway.atlassian.net/wiki/spaces/SG/pages/258670684/AgGateway+Post-Image+Collection+Specification+PICS for ideas

@max-zilla
Contributor Author

The -a_nodata <value> option works for gdal_translate.

@max-zilla
Contributor Author

After a lot of experimentation, gdal_translate seems to conflate NoData and (0,0,0) pixel values during the VRT -> TIF conversion, regardless of the NoData settings given to gdalbuildvrt or the parameters given to gdal_translate.

I set that aside in order to get the fundamental process working, and am using the -addalpha flag in gdalbuildvrt to make the fullfield mask an RGBA image instead of RGB, with alpha=255 where photos exist in the image and alpha=0 where no data exists (between rows). This leaves the 0s for soil removed from photos intact. I just modified and tested the cc algorithm on a small fieldmosaic of 5 images:
[Screenshot, 2019-05-16: test fieldmosaic of 5 images]
...and got a cc value of 98.84% for the whole image. Next I will deploy the test on an actual fullfield date; if it looks correct, I can rerun all cc data over the weekend.
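
As a rough sketch of how the alpha band can feed the calculation: count only pixels with alpha=255 as visible, and treat visible (0,0,0) pixels as soil. The modified cc algorithm itself is not shown in this thread, so the function name and the pixel-tuple input format below are assumptions.

```python
def canopy_cover_percent(rgba_pixels):
    """Percent of visible (alpha == 255) pixels that are not soil.

    rgba_pixels -- iterable of (r, g, b, a) tuples from the RGBA mosaic.
    Returns None when no pixel is visible.
    """
    visible = 0
    canopy = 0
    for r, g, b, a in rgba_pixels:
        if a != 255:
            continue            # alpha=0: no photo here, so skip the pixel
        visible += 1
        if (r, g, b) != (0, 0, 0):
            canopy += 1         # soil was already masked to (0, 0, 0)
    if visible == 0:
        return None
    return 100.0 * canopy / visible

sample = [
    (30, 110, 35, 255),   # plant
    (0, 0, 0, 255),       # soil, inside a photo
    (0, 0, 0, 0),         # gap between rows of images
    (25, 98, 31, 255),    # plant
]
cc = canopy_cover_percent(sample)  # 2 canopy pixels out of 3 visible ones
```

The gap pixel is excluded from the denominator, so a partial scan reports a percentage of the area that was actually photographed.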

@max-zilla
Contributor Author

Currently still running but spot checks are looking good:
[image: ranges 52 (top) and 51 (bottom), columns 9-14]

Canopy cover (%) by column:

           9  10  11  12  13  14
        ------------------------
ROW 52 -  29  36  91  89  75  29
ROW 51 -  87  87  90  84  86  87

These percentages are much closer to what one would expect. I've raised the NoData maximum to 75% (larger than it was before) to push partial-plot scans through the pipeline; plots with more NoData than that, such as those in Column 16, will be omitted:
[image]
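
A minimal sketch of that cutoff, assuming per-plot pixel counts are already available. The function and parameter names are hypothetical, and whether a plot at exactly 75% NoData is kept is also an assumption (here it is kept).

```python
MAX_NODATA_FRACTION = 0.75  # the 75% NoData maximum mentioned above

def plot_cc_or_none(canopy, soil, nodata, max_nodata=MAX_NODATA_FRACTION):
    """Return canopy cover (%) for a plot, or None if too little was scanned.

    canopy, soil, nodata -- pixel counts inside the plot boundary.
    """
    total = canopy + soil + nodata
    if total == 0:
        return None                 # empty plot clip
    if nodata / total > max_nodata:
        return None                 # mostly unscanned: omit the plot
    # Percent of the visible area only; NoData is left out of the denominator.
    return 100.0 * canopy / (canopy + soil)

half_scanned = plot_cc_or_none(canopy=80, soil=20, nodata=100)   # 50% NoData
mostly_missing = plot_cc_or_none(canopy=9, soil=1, nodata=90)    # 90% NoData
```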

Expect it to finish Friday or Saturday.

@max-zilla
Contributor Author

Some more QA tidbits...

  • On 2018-07-01, 3 of the 5 scans run were full-field. The same scan with 'shade' in the name was run twice, plus a third scan with 'sun' in the name.

shade   average CC, all plots - 86%
shade2  average CC, all plots - 84%
sun     average CC, all plots - 97%

The sunlit scan values are consistently about 10% higher:
[Screenshot, 2019-05-23]
The two shade scans were fairly consistent on average, but a small handful of plots (19/766) differed by more than 10% between the two shade scans. This is likely attributable to the sensitivity of our rgb_mask algorithm:

Shade2 Mask - 57.8%
[Screenshot, 2019-05-23: Shade2 mask]

Shade1 Mask - 81%. This was also a 2-pass partialplot scan instead of 1-pass (more coverage).
[Screenshot, 2019-05-23: Shade1 mask]

Sun Mask - 85%. Nice and bright, so more pixels are retained here.
[Screenshot, 2019-05-23: Sun mask]
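
For reference, flagging plots whose CC differs by more than 10 points between two scans of the same day can be sketched like this. The plot names and values are invented sample inputs, not data from these scans, and the function name is hypothetical.

```python
def plots_differing(cc_a, cc_b, threshold=10.0):
    """Return the plots whose CC differs by more than `threshold` points.

    cc_a, cc_b -- dicts mapping plot name -> canopy cover (%) for two scans.
    Only plots present in both scans are compared.
    """
    shared = sorted(set(cc_a) & set(cc_b))
    return [p for p in shared if abs(cc_a[p] - cc_b[p]) > threshold]

# Invented sample values for illustration only.
scan_1 = {"plot A": 80.0, "plot B": 86.0, "plot C": 84.5}
scan_2 = {"plot A": 60.0, "plot B": 85.0, "plot D": 90.0}
flagged = plots_differing(scan_1, scan_2)
```

Here only "plot A" is flagged; "plot C" and "plot D" are skipped because each appears in just one scan.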

I would argue that this doesn't merit further delays for more reprocessing, but it will be important for data consumers to understand this kind of phenomenon when we have multiple differing CC values per day.

Maybe the simplest suggestion is to just use the maximum observation for a given day. I doubt over-estimation will be a common problem except in rare cases where, e.g., a reflectance test panel or something similar is on the dirt field and reads as > 0 canopy cover.

@dlebauer
Member

We shouldn't use a 'max' per day as a workaround for an algorithm that doesn't function as expected.

The best way to fix the problem is to fix the algorithm. But that may take a while to fix.

Otherwise, if the data are known to be in error, e.g. if the algorithm can't handle sunlit scans, then we shouldn't include that data in the database.

In the end, if we have three measurements from a day, then having a single one that under-estimates by 10% is a small issue.

@max-zilla
Contributor Author

That sounds good. To be clear, there are 2 shady scans and 1 sunny - the sunny seems to correctly report the higher value, and the 2 shady scans seem to be under-estimating. Adjusting the RGB mask thresholds could address this case, but it could have other repercussions. Not sure if I'd go so far as to call it an error.

@dlebauer
Member

Sorry, I got that backward. If it doesn't rise to the level of an error, I think adding this caveat to the documentation (README) under known limitations would be okay.

@dlebauer
Member

@ZongyangLi and @abby621 can these exceptions be added as test cases to the extractor?

@max-zilla
Contributor Author

I started uploading the CSVs to bety and noticed that a small number of files from May were being omitted from the field mosaics, so I paused the upload. Closer examination revealed that the omitted mask images had a different TIF header than the majority:

Band 1 Block=2472x1 Type=Float32, ColorInterp=Gray
Band 2 Block=2472x1 Type=Float32, ColorInterp=Undefined
Band 3 Block=2472x1 Type=Float32, ColorInterp=Undefined

The data type was Float32 and the RGB color bands aren't properly indicated (the data itself is fine). But these headers meant GDAL rejected them from the VRT creation because they differed from the other expected header data:

Band 1 Block=2472x1 Type=Byte, ColorInterp=Red
Band 2 Block=2472x1 Type=Byte, ColorInterp=Green
Band 3 Block=2472x1 Type=Byte, ColorInterp=Blue
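
One way to spot affected files ahead of time is to compare the band lines of gdalinfo output against the expected header. The sketch below only parses text in the format shown above; the function names are hypothetical, and capturing the gdalinfo output for each file is left out.

```python
import re

# Matches band lines such as:
#   Band 1 Block=2472x1 Type=Byte, ColorInterp=Red
BAND_RE = re.compile(r"Band (\d+) Block=\S+ Type=(\w+), ColorInterp=(\w+)")

# Header the majority of mask images have, per the listing above.
EXPECTED = [("Byte", "Red"), ("Byte", "Green"), ("Byte", "Blue")]

def band_headers(gdalinfo_text):
    """Return [(data_type, color_interp), ...] in band order."""
    return [(m.group(2), m.group(3)) for m in BAND_RE.finditer(gdalinfo_text)]

def needs_fix(gdalinfo_text):
    """True if the band headers differ from the expected Byte RGB layout."""
    return band_headers(gdalinfo_text) != EXPECTED

bad = """Band 1 Block=2472x1 Type=Float32, ColorInterp=Gray
Band 2 Block=2472x1 Type=Float32, ColorInterp=Undefined
Band 3 Block=2472x1 Type=Float32, ColorInterp=Undefined"""

good = """Band 1 Block=2472x1 Type=Byte, ColorInterp=Red
Band 2 Block=2472x1 Type=Byte, ColorInterp=Green
Band 3 Block=2472x1 Type=Byte, ColorInterp=Blue"""

bad_flag = needs_fix(bad)
good_flag = needs_fix(good)
```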

I'm not sure why the headers are different - perhaps the small number of May files were generated with an older version of the extractor and didn't get re-run properly. The good news is that the fix to data type and RGB header is a simple GDAL command:

gdal_translate -ot Byte -colorinterp red,green,blue source_file out_file
rm source_file
mv out_file source_file

This forces the output file to have properly registered RGB channels and data type.
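
The same three steps can be wrapped in a small Python helper for batch use. This is a sketch rather than the script actually run here: the function names and the temporary file suffix are assumptions, and `os.replace` stands in for the separate rm and mv.

```python
import os
import subprocess

def fix_command(source_file, out_file):
    """Build the gdal_translate call shown above as an argument list."""
    return ["gdal_translate", "-ot", "Byte",
            "-colorinterp", "red,green,blue",
            source_file, out_file]

def fix_in_place(source_file, run=subprocess.run):
    """Rewrite source_file with Byte data type and RGB color interpretation.

    Requires gdal_translate on the PATH. The temporary name is hypothetical.
    """
    out_file = source_file + ".fixed.tif"
    run(fix_command(source_file, out_file), check=True)  # raises on failure
    os.replace(out_file, source_file)  # swap in the corrected file

# Building the command does not touch any files.
cmd = fix_command("source_file", "out_file")
```

Because `check=True` raises if gdal_translate fails, the original file is only replaced after a successful conversion.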

I'm running a small script to correct these, but it looks like the issue doesn't occur later so I will proceed to upload the remaining CSVs in the meantime. I don't anticipate this will impact the results being sent to bety, we just might get some more plots from early May scans once done.

Should be able to close this issue then, and upload a few of the test images I've been using for future checks.

@max-zilla
Contributor Author

Created #590 to follow up on the QA process for this.
