Commit
Merge pull request #2 from mundialis/training_preparation
Training preparation
griembauer authored Jan 14, 2025
2 parents 392837d + 79c2da6 commit fbd041a
Showing 9 changed files with 676 additions and 5 deletions.
8 changes: 4 additions & 4 deletions .flake8
@@ -9,8 +9,8 @@ exclude = .git
max-line-length = 87

per-file-ignores =
./m.neural_network.preparedata/m.neural_network.preparedata.py: F821
./m.neural_network.preparedata/m.neural_network.preparedata.py: E501
./m.neural_network.preparedata.worker_nullcells/m.neural_network.preparedata.worker_nullcells.py: F821
./m.neural_network.preparedata.worker_nullcells/m.neural_network.preparedata.worker_nullcells.py: E501
./m.neural_network.preparedata/m.neural_network.preparedata.py: E501,F821
./m.neural_network.preparedata.worker_nullcells/m.neural_network.preparedata.worker_nullcells.py: F821,E501
./m.neural_network.preparedata.worker_export/m.neural_network.preparedata.worker_export.py: E501
./m.neural_network.preparetraining/m.neural_network.preparetraining.py: E501,F821
./m.neural_network.preparetraining.worker/m.neural_network.preparetraining.worker.py: E501,F821
4 changes: 3 additions & 1 deletion m.neural_network.html
@@ -30,7 +30,8 @@ <h2>DESCRIPTION</h2>
<li><a href="m.neural_network.preparedata.html">m.neural_network.preparedata</a>: Prepares and exports tiles for the label process</li>
<li><a href="m.neural_network.preparedata.worker_export.html">m.neural_network.preparedata.worker_export</a>: Worker for parallel processing for exporting for <b>m.neural_network.preparedata</b></li>
<li><a href="m.neural_network.preparedata.worker_nullcells.html">m.neural_network.preparedata.worker_nullcells</a>: Worker to analyse the number of null cells in parallel for <b>m.neural_network.preparedata</b></li>
<li><a href="m.neural_network.preparetraining.html">m.neural_network.preparetraining</a>: Takes and reformats labeled tiles for the neural network training</li>
<li><a href="m.neural_network.preparetraining.html">m.neural_network.preparetraining</a>: Prepares imagery and labelled data for training and application of a neural network.</li>
<li><a href="m.neural_network.preparetraining.worker.html">m.neural_network.preparetraining.worker</a>: Worker to rasterize labelled data in parallel for <b>m.neural_network.preparetraining</b></li>
</ul>

<h2>REQUIREMENTS</h2>
@@ -39,6 +40,7 @@ <h2>REQUIREMENTS</h2>

<ul>
<li>grass-gis-helpers>=2.2.0</li>
<li>GDAL/OGR and Python bindings</li>
</ul>

<h2>AUTHORS</h2>
7 changes: 7 additions & 0 deletions m.neural_network.preparetraining.worker/Makefile
@@ -0,0 +1,7 @@
MODULE_TOPDIR = ../..

PGM = m.neural_network.preparetraining.worker

include $(MODULE_TOPDIR)/include/Make/Script.make

default: script
@@ -0,0 +1,14 @@
<h2>DESCRIPTION</h2>

<em>m.neural_network.preparetraining.worker</em> is used within <em>m.neural_network.preparetraining</em> to rasterize label data in parallel.
<p>
<h2>SEE ALSO</h2>

<em>
<a href="g.region.html">g.region</a>,
<a href="r.mapcalc.html">r.mapcalc</a>,
<a href="v.to.rast.html">v.to.rast</a>
</em>

<h2>AUTHORS</h2>
<p>Guido Riembauer, <a href="https://www.mundialis.de/">mundialis GmbH &amp; Co. KG</a><br>
@@ -0,0 +1,221 @@
#!/usr/bin/env python3
"""############################################################################
#
# MODULE: m.neural_network.preparetraining.worker
# AUTHOR(S): Guido Riembauer
# PURPOSE: Worker module for m.neural_network.preparetraining to check
# and rasterize label data
# COPYRIGHT: (C) 2024 by mundialis GmbH & Co. KG and the GRASS Development
# Team.
#
# This program is free software under the GNU General Public
# License (v3). Read the file COPYING that comes with GRASS
# for details.
#
##############################################################################
"""

# %Module
# % description: Worker module for m.neural_network.preparetraining to check and rasterize label data
# % keyword: raster
# % keyword: statistics
# %end

# %option G_OPT_F_INPUT
# % required: yes
# % multiple: no
# % label: Path to the label vector file
# % guisection: Input
# %end

# %option G_OPT_F_INPUT
# % key: img_path
# % required: yes
# % multiple: no
# % label: Path to the corresponding imagery raster file
# % guisection: Input
# %end

# %option
# % key: class_column
# % type: string
# % required: yes
# % multiple: no
# % answer: class_number
# % label: Column of the label vector that holds the class number
# % guisection: Parameters
# %end

# %option
# % key: class_values
# % type: integer
# % required: yes
# % multiple: yes
# % answer: 2
# % label: Expected and output values for the class(es) of interest
# % guisection: Parameters
# %end

# %option
# % key: no_class_value
# % type: integer
# % required: yes
# % multiple: no
# % answer: 1
# % label: Expected and output value for areas that do not belong to a class of interest
# % description: Can be understood as a "rest" class for a multiclass system and a "no-class" for a binary classification
# % guisection: Parameters
# %end

# %option G_OPT_F_OUTPUT
# % required: yes
# % multiple: no
# % label: Path to the output label raster file
# % guisection: Output
# %end

# %option
# % key: new_mapset
# % type: string
# % required: yes
# % multiple: no
# % label: Name of the new mapset to work in
# % guisection: Parameters
# %end

import atexit
import os
import shutil

import grass.script as grass
from grass_gis_helpers.mapset import switch_to_new_mapset
from osgeo import gdal

NEWGISRC = None
GISRC = None
ID = grass.tempname(8)
NEW_MAPSET = None


def cleanup():
    """Switch back to the original mapset and delete the new one."""
    if GISRC is None:
        # the mapset switch never happened, so there is nothing to clean up
        return
    # switch back to the original mapset
    grass.utils.try_remove(NEWGISRC)
    os.environ["GISRC"] = GISRC
    # delete the new mapset (better safe than sorry)
    gisenv = grass.gisenv()
    gisdbase = gisenv["GISDBASE"]
    location = gisenv["LOCATION_NAME"]
    mapset_dir = os.path.join(gisdbase, location, NEW_MAPSET)
    if os.path.isdir(mapset_dir):
        shutil.rmtree(mapset_dir)


def main():
    """Run label rasterization."""
    global NEWGISRC, GISRC, NEW_MAPSET
    input = options["input"]
    img_file = options["img_path"]
    NEW_MAPSET = options["new_mapset"]
    class_values = options["class_values"].split(",")
    no_class_value = options["no_class_value"]
    class_col = options["class_column"]
    output = options["output"]

    # switch to the new mapset
    GISRC, NEWGISRC, old_mapset = switch_to_new_mapset(NEW_MAPSET)
    # get extent from reference img file
    info = gdal.Info(img_file, format="json")
    south = info["cornerCoordinates"]["lowerLeft"][1]
    west = info["cornerCoordinates"]["lowerLeft"][0]
    north = info["cornerCoordinates"]["upperRight"][1]
    east = info["cornerCoordinates"]["upperRight"][0]
    cols, rows = info["size"]
    # set the region
    grass.run_command(
        "g.region",
        n=north,
        s=south,
        e=east,
        w=west,
        rows=rows,
        cols=cols,
        quiet=True,
    )

    # import the label dataset
    labelvect = f"labelvect_{ID}"
    labelrast = f"labelrast_{ID}"
    grass.run_command("v.import", input=input, output=labelvect, quiet=True)

    # check the values of the vector
    dbselect = list(grass.parse_command("v.db.select", map=labelvect).keys())
    colnames = dbselect[0].split("|")
    rows = [item.split("|") for item in dbselect[1:]]
    try:
        idx = colnames.index(class_col)
    except ValueError:
        grass.fatal(_(f"File {input} has no column {class_col}"))
    class_numbers = [item[idx] for item in rows]
    class_num_set_ref = set([*class_values, no_class_value])
    difference = set(class_numbers).difference(class_num_set_ref)
    if len(difference) > 0:
        grass.fatal(
            _(
                f"Label file {input} has features with unexpected values"
                f" in column {class_col}: {difference}. Allowed values "
                f"are [{','.join(class_values)}, {no_class_value}].",
            ),
        )

    tile_empty = False
    if len(class_numbers) == 0 or set(class_numbers) == set([no_class_value]):
        grass.warning(
            _(
                f"Label file {input} contains no features with the "
                f"expected class values {class_values} in "
                f"column {class_col}. It is assumed that the classes "
                "do not occur in this tile.",
            ),
        )
        tile_empty = True

    # rasterize
    if tile_empty is True:
        grass.run_command(
            "r.mapcalc",
            expression=f"{labelrast}={no_class_value}",
            quiet=True,
        )
    else:
        labelrast_tmp = f"{labelrast}_tmp"
        grass.run_command(
            "v.to.rast",
            input=labelvect,
            output=labelrast_tmp,
            type="area",
            use="attr",
            attribute_column=class_col,
            quiet=True,
        )
        # if there is any nodata left in the label, this will be assigned
        # to the no-class class
        exp = f"{labelrast}=if(isnull({labelrast_tmp}),{no_class_value},{labelrast_tmp})"
        grass.run_command("r.mapcalc", expression=exp, quiet=True)

    grass.run_command(
        "r.out.gdal",
        input=labelrast,
        output=output,
        type="Byte",
        createopt="COMPRESS=LZW",  # no tiles or overviews required for the small tiles (?)
        flags="c",
        quiet=True,
    )


if __name__ == "__main__":
    options, flags = grass.parser()
    atexit.register(cleanup)
    main()
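The worker aligns the label raster with its imagery tile by setting the computational region to the tile's corner coordinates and to its row/column count, so the rasterized labels share the tile's pixel grid. The sketch below isolates that step in plain Python; `region_from_gdal_info` and the sample dictionary are hypothetical stand-ins for a real `gdal.Info(img_file, format="json")` result, reduced to the keys the worker reads.

```python
def region_from_gdal_info(info):
    """Return g.region keyword arguments matching a raster's extent and grid.

    `info` is assumed to look like gdal.Info(..., format="json") output;
    only "cornerCoordinates" and "size" are used, as in the worker.
    """
    corners = info["cornerCoordinates"]
    # corner coordinates are [x, y] pairs
    west, south = corners["lowerLeft"]
    east, north = corners["upperRight"]
    # gdal.Info reports the size as [columns, rows]
    cols, rows = info["size"]
    return {"n": north, "s": south, "e": east, "w": west, "rows": rows, "cols": cols}


# Hypothetical 512 x 256 pixel tile with 0.5 m pixels
example_info = {
    "size": [512, 256],
    "cornerCoordinates": {
        "lowerLeft": [380000.0, 5700000.0],
        "upperRight": [380256.0, 5700128.0],
    },
}

region = region_from_gdal_info(example_info)
# fixing rows/cols together with the bounds pins the resolution to the tile's
ns_res = (region["n"] - region["s"]) / region["rows"]
ew_res = (region["e"] - region["w"]) / region["cols"]
print(region)
print(ns_res, ew_res)
```

Inside a GRASS session, such a dictionary could be passed on as `grass.run_command("g.region", **region, quiet=True)`.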
7 changes: 7 additions & 0 deletions m.neural_network.preparetraining/Makefile
@@ -0,0 +1,7 @@
MODULE_TOPDIR = ../..

PGM = m.neural_network.preparetraining

include $(MODULE_TOPDIR)/include/Make/Script.make

default: script
@@ -0,0 +1,63 @@
<h2>DESCRIPTION</h2>

<em>m.neural_network.preparetraining</em> prepares imagery and labelled data for training and application of a neural network.
<p>While <a href="m.neural_network.preparedata.html">m.neural_network.preparedata</a> provides the initial setup for labelling imagery tiles,
<em>m.neural_network.preparetraining</em> rasterizes the vector labels and restructures the imagery data.

<h2>NOTES</h2>
All data are expected to follow the directory structure and naming scheme created by <a href="m.neural_network.preparedata.html">m.neural_network.preparedata</a>.
These data are passed to <em>m.neural_network.preparetraining</em> via the <em>input_traindir</em> and <em>input_applydir</em> parameters.
<em>m.neural_network.preparetraining</em> creates a new output directory containing the two subdirectories <em>train</em> and <em>apply</em>. Each of these contains
the following directories/data:

<ul>
<li><em>train_images:</em> contains tilewise multiband .vrt files with all imagery bands and an nDSM band, used for training. This directory is empty in the <em>apply</em> dir.</li>
<li><em>train_masks:</em> contains tilewise rasterized .tif label files, used for training. This directory is empty in the <em>apply</em> dir.</li>
<li><em>val_images:</em> contains tilewise multiband .vrt files with all imagery bands and an nDSM band, used for validation. This directory holds data in both the <em>train</em> and <em>apply</em> dirs: in the <em>train</em> dir, it is used for validation during training, while in the <em>apply</em> dir, it holds all imagery used for prediction.</li>
<li><em>val_masks:</em> contains tilewise rasterized .tif label files, used for validation. This directory is empty in the <em>apply</em> dir.</li>
<li><em>singleband_vrts:</em> contains singleband .vrt files for each imagery band of each tile. They serve as the basis for the tilewise multiband .vrt files.</li>
<li><em>tile_XX_YY.vrt</em> (only in the <em>train</em> dir): a single multiband tile .vrt from which the NN model reads the number of bands.</li>
</ul>
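As a rough illustration of the layout listed above, the following sketch creates the train and apply subdirectories, each with its five subfolders, in a temporary location. The helper `create_output_layout` is hypothetical and does not write any .vrt or .tif content; like the module, it refuses to reuse an existing output directory.

```python
import tempfile
from pathlib import Path

# Subdirectory names as listed in the manual
SUBDIRS = ("train_images", "train_masks", "val_images", "val_masks", "singleband_vrts")


def create_output_layout(output):
    """Create <output>/{train,apply}/<subdir> and return the created paths."""
    root = Path(output)
    # exist_ok=False raises FileExistsError if the output directory already exists
    root.mkdir(parents=True, exist_ok=False)
    created = []
    for part in ("train", "apply"):
        for sub in SUBDIRS:
            path = root / part / sub
            path.mkdir(parents=True)
            created.append(path)
    return created


refused = False
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "nn_data_structured"
    dirs = create_output_layout(out)
    print(sorted(str(d.relative_to(out)) for d in dirs))
    try:
        # a second run on the same output directory is rejected
        create_output_layout(out)
    except FileExistsError:
        refused = True
        print("output directory exists, refusing to reuse it")
```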
<p>
To save disk space, all imagery is stored as .vrt files that reference the original datasets (created by <a href="m.neural_network.preparedata.html">m.neural_network.preparedata</a>). These datasets should therefore
not be moved; otherwise, <em>m.neural_network.preparetraining</em> has to be run again afterwards.
</p>
<p>
The user can indicate what percentage of the training tiles are used for validation (during training) with the <em>val_percentage</em> parameter.
</p>
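The manual does not state how tiles are assigned to the validation set. Assuming a random assignment, a split by val_percentage might look like the sketch below; `split_tiles`, the rounding rule, the optional seed and the tile naming are all assumptions for illustration. Without a fixed seed, each run draws a different split.

```python
import random


def split_tiles(tiles, val_percentage, seed=None):
    """Randomly split tile names into (train, val) lists (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(tiles)
    rng.shuffle(shuffled)
    # assumed rounding rule: nearest whole number of validation tiles
    n_val = round(len(shuffled) * val_percentage / 100)
    return sorted(shuffled[n_val:]), sorted(shuffled[:n_val])


# 20 hypothetical tiles named after the tile_XX_YY pattern
tiles = [f"tile_{x:02d}_{y:02d}" for x in range(4) for y in range(5)]
train, val = split_tiles(tiles, val_percentage=20)
print(len(train), len(val))  # 16 4
```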
<p>
<em>m.neural_network.preparetraining</em> cannot be run repeatedly with the same <em>output</em> directory, as the training/validation split is made at runtime.
Hence, <em>m.neural_network.preparetraining</em> expects that the <em>output</em> directory does not exist yet.
</p>
<p>
With the <em>class_values</em> and <em>no_class_value</em> parameters, the user defines the allowed values in the <em>class_column</em> of the labelled data. If
an unexpected value is found, the module stops with an error that names the affected tile.
</p>
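The value check described above can be sketched in plain Python. `check_class_values` is a hypothetical, simplified helper modelled on the worker, which reads the pipe-separated output of v.db.select (a header line, then one line per feature) and compares values as strings. The sketch raises `ValueError` where the module calls `grass.fatal`, and its boolean return mirrors the worker's `tile_empty` flag.

```python
def check_class_values(db_lines, class_col, class_values, no_class_value):
    """Validate the class column of pipe-separated attribute lines.

    Returns True if the tile has no features or only no-class features.
    """
    header = db_lines[0].split("|")
    if class_col not in header:
        raise ValueError(f"no column {class_col}")
    idx = header.index(class_col)
    found = {line.split("|")[idx] for line in db_lines[1:]}
    allowed = {*class_values, no_class_value}
    unexpected = found - allowed
    if unexpected:
        raise ValueError(f"unexpected values in column {class_col}: {sorted(unexpected)}")
    return not found or found == {no_class_value}


# values are strings, just like the options and the v.db.select output
print(check_class_values(["cat|class_number", "1|2", "2|1"], "class_number", ["2"], "1"))
try:
    check_class_values(["cat|class_number", "1|3"], "class_number", ["2"], "1")
except ValueError as err:
    error_message = str(err)
    print(error_message)
```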
<p>
If the label features of a tile (with <em>class_values</em> or <em>no_class_value</em>) do not cover it completely, the uncovered areas are filled with <em>no_class_value</em> in the rasterized version.
</p>
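This gap-filling rule corresponds to the worker's r.mapcalc expression `if(isnull(tmp), no_class_value, tmp)`. The sketch below applies the same rule to a small nested list in which `None` stands for a null cell; the grid and the function name `fill_uncovered` are illustrative only.

```python
def fill_uncovered(label_grid, no_class_value):
    """Replace null cells (None) by no_class_value, keep all other cells."""
    return [
        [no_class_value if cell is None else cell for cell in row]
        for row in label_grid
    ]


# hypothetical rasterized labels: 2 = class of interest, 1 = no class, None = not covered
rasterized = [
    [2, 2, None],
    [None, 1, None],
]
print(fill_uncovered(rasterized, 1))  # [[2, 2, 1], [1, 1, 1]]
```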

<h2>EXAMPLES</h2>

<div class="code"><pre>
m.neural_network.preparetraining input_traindir=nn_data_with_labels/train input_applydir=nn_data_with_labels/apply nprocs=6 class_column=class_number class_values=2 no_class_value=1 output=nn_data_structured
</pre></div>


<h2>SEE ALSO</h2>

<em>
<a href="https://grass.osgeo.org/grass-stable/manuals/v.import.html">v.import</a>,
<a href="https://grass.osgeo.org/grass-stable/manuals/g.region.html">g.region</a>,
<a href="https://grass.osgeo.org/grass-stable/manuals/r.mapcalc.html">r.mapcalc</a>,
<a href="https://grass.osgeo.org/grass-stable/manuals/v.to.rast.html">v.to.rast</a>
</em>

<h2>REQUIREMENTS</h2>
<ul>
<li>GDAL and OGR Python bindings</li>
<li><a href="https://pypi.org/project/grass-gis-helpers/">grass-gis-helpers</a> Python library >= 2.2.0</li>
</ul>

<h2>AUTHORS</h2>
Guido Riembauer, <a href="https://www.mundialis.de/">mundialis GmbH &amp; Co. KG</a><br>