Merge pull request #382 from hwelch-fle/hwelch-fle-count_multipart_features_2

First Contribution: Re-Write of the count_multipart_features script
Esri · Feb 27, 2024 · 055f577
2 parents f0db027 + b5a232e

Showing 2 changed files with 233 additions and 0 deletions.
diff --git a/python/arcpy-python/count_multipart_features_2/README.md b/python/arcpy-python/count_multipart_features_2/README.md
@@ -0,0 +1,84 @@
+Count Multipart Features 2
+=========================
+
+## Instructions
+
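+1. Set the `feature_class` variable in `main()` to the path of your feature class.
+2. Set the `count_field` variable in `main()` to the desired name of your count field (default is `PartCount`).
+3. Set the `overwrite` variable in `main()` to `False` if you don't want to overwrite an existing count field.
+4. Set the `report_only` variable in `main()` to `True` if you only want to print the number of multipart features without modifying the data.
+5. Run the script. A new field is added to the input feature class and populated with the number of parts for each multipart feature; single-part features are not updated.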
+
+## Use Case
+
+This script could be used to identify features with many parts, which could be affecting performance. It could also be used to determine if any features in your feature class are multipart.
+
+## Updates to the original
+
+Feature class operations are now handled by a `Feature` object, which stores the `arcpy.da.Describe` dictionary of the input as its instance attributes.
+
+Some additional properties on this object simplify access to Describe attributes such as the field names, workspace path, shape type, and shape/OID field names.
+
+Cursors are now managed by generator methods on the `Feature` class: `get_rows(fields, *, query=None)` and `update_rows(fields, *, query=None)`.
+
+### Feature
+Class for pre-processing a feature class before passing it off to a script
+
+#### Properties
+1. field_names: a list of the feature class's field names
+2. workspace_path: the path to the feature class's workspace (the `catalogPath` of the `workspace` object in the Describe dictionary)
+3. shape_type: an alias for the `shapeType` attribute
+4. id_field: currently returns `'OID@'` for use with data access cursors, but can be modified to return the `OIDFieldName` attribute if needed
+5. shape_field: currently returns `'SHAPE@'` for use with data access cursors, but can be modified to return the `shapeFieldName` attribute if needed
+
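A minimal, arcpy-free sketch of the pattern described above, assuming a plain dictionary can stand in for the result of `arcpy.da.Describe`. The class name `DescribeWrapper`, the sample keys, and the `C:\data\demo.gdb` path are hypothetical; `SimpleNamespace` objects mimic the attribute access (`f.name`, `workspace.catalogPath`) that the script relies on.

```python
from types import SimpleNamespace
from typing import Any


class DescribeWrapper:
    """Hypothetical stand-in for Feature: a plain dict plays the role of
    arcpy.da.Describe(path)."""

    def __init__(self, describe: dict[str, Any]) -> None:
        # The dict becomes the instance namespace, so each key is an attribute
        self.__dict__ = describe

    @property
    def field_names(self) -> list[str]:
        return [f.name for f in self.fields]

    @property
    def workspace_path(self) -> str:
        return self.workspace.catalogPath

    @property
    def shape_type(self) -> str:
        return self.shapeType


# Hypothetical Describe-like dictionary for a file geodatabase feature class
describe = {
    "baseName": "parcels",
    "shapeType": "Polygon",
    "OIDFieldName": "OBJECTID",
    "fields": [SimpleNamespace(name="OBJECTID"), SimpleNamespace(name="Shape")],
    "workspace": SimpleNamespace(catalogPath=r"C:\data\demo.gdb"),
}

fc = DescribeWrapper(describe)
print(fc.field_names)     # ['OBJECTID', 'Shape']
print(fc.workspace_path)  # C:\data\demo.gdb
print(fc.OIDFieldName)    # OBJECTID, read straight from the describe dict
```

Because `property` objects are data descriptors defined on the class, they take precedence over a same-named key in the instance dictionary. Note also that `self.__dict__ = describe` shares the dictionary rather than copying it.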
+#### Feature.get_rows
+This method opens an `arcpy.da.SearchCursor` in a `with` block and returns a generator that yields each row as a dictionary formatted as `{field_name: field_value}`.
+If `query` is set, it is passed to the cursor as its SQL where clause.
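The generator-over-cursor pattern can be tried without ArcGIS. This sketch assumes Python's built-in `sqlite3` as a stand-in for `arcpy.da.SearchCursor`; the `parcels` table and its columns are hypothetical, and `contextlib.closing` plays the role of the cursor's `with` block. As in the original, the `query` string is used verbatim as the where clause, so it should only come from trusted code.

```python
import sqlite3
from contextlib import closing
from typing import Any, Generator, Optional


def get_rows(conn: sqlite3.Connection, table: str, fields: list[str], *,
             query: Optional[str] = None) -> Generator[dict[str, Any], None, None]:
    """Hypothetical sqlite3 analogue of Feature.get_rows."""
    sql = f"SELECT {', '.join(fields)} FROM {table}"
    if query:
        sql += f" WHERE {query}"  # the optional query acts as the where clause
    # closing() closes the cursor when the generator finishes or is closed
    with closing(conn.execute(sql)) as cursor:
        names = [col[0] for col in cursor.description]
        for row in cursor:
            yield dict(zip(names, row))  # {field_name: field_value}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (OBJECTID INTEGER PRIMARY KEY, parts INTEGER)")
conn.executemany("INSERT INTO parcels VALUES (?, ?)", [(1, 1), (2, 3), (3, 2)])

rows = list(get_rows(conn, "parcels", ["OBJECTID", "parts"], query="parts > 1"))
for row in rows:
    print(row["OBJECTID"], row["parts"])
```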
+
+#### Feature.update_rows
+This method opens an `arcpy.da.UpdateCursor` in a `with` block and returns a generator that yields each row as a tuple containing the cursor and the row dictionary, formatted as `(cursor, {field_name: field_value})`.
+If `query` is set, it is passed to the cursor as its SQL where clause.
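`arcpy.da.UpdateCursor` can't run outside ArcGIS, so this sketch uses a hypothetical in-memory `ListUpdateCursor` that mimics only what `update_rows` relies on: the `with` block, iteration, the `fields` tuple, and `updateRow()` acting on the current row. The query filter is left out for brevity, and the table contents are made up.

```python
from typing import Any, Generator, Iterator, Optional


class ListUpdateCursor:
    """Hypothetical in-memory stand-in for arcpy.da.UpdateCursor."""

    def __init__(self, table: list[dict[str, Any]], fields: list[str]) -> None:
        self.table = table
        self.fields = tuple(fields)
        self._current: Optional[dict[str, Any]] = None

    def __enter__(self) -> "ListUpdateCursor":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self._current = None  # drop the reference to the last row

    def __iter__(self) -> Iterator[list[Any]]:
        for record in self.table:
            self._current = record
            yield [record[name] for name in self.fields]

    def updateRow(self, values: list[Any]) -> None:
        if self._current is None:
            raise RuntimeError("updateRow called without a current row")
        # Values are written back positionally, in cursor field order
        for name, value in zip(self.fields, values):
            self._current[name] = value


def update_rows(table: list[dict[str, Any]], fields: list[str]
                ) -> Generator[tuple[ListUpdateCursor, dict[str, Any]], None, None]:
    """Hypothetical analogue of Feature.update_rows: yield (cursor, row_dict)."""
    with ListUpdateCursor(table, fields) as cursor:
        for row in cursor:
            yield cursor, dict(zip(cursor.fields, row))


table = [{"OID": 1, "PartCount": None}, {"OID": 2, "PartCount": None}]
counts = {2: 3}  # pre-computed part counts keyed by OID

for cursor, row in update_rows(table, ["OID", "PartCount"]):
    if row["OID"] in counts:
        row["PartCount"] = counts[row["OID"]]
        # dicts keep insertion order, so values() follows cursor.fields
        cursor.updateRow(list(row.values()))

print(table)  # [{'OID': 1, 'PartCount': None}, {'OID': 2, 'PartCount': 3}]
```

Because the `yield` sits inside the `with` block, the cursor stays open until the loop finishes; if the caller breaks out early, the cursor is only released when the generator is closed (for example with its `close()` method) or garbage collected.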
+
+### count_multipart
+The main function of the script. It takes a feature class path as its only positional argument; the remaining arguments are keyword-only.
+#### kwargs
+1. field_name: The name of the field to write the part count to (default `"PartCount"`)
+2. overwrite: If `True`, an existing field named field_name is deleted and re-created; if `False` (the default), a `ValueError` is raised instead
+3. report_only: If `True`, the feature class is not modified and the function only prints the number of multipart features found (default `False`)
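The way the two flags interact with an existing field can be summarized as a small decision function. `plan_field_actions` is hypothetical: it mirrors the field-setup branch of `count_multipart` and returns step names instead of calling arcpy, and it assumes at least one multipart feature was found (otherwise the function returns before any of these steps).

```python
def plan_field_actions(field_exists: bool, *, overwrite: bool = False,
                       report_only: bool = False) -> list[str]:
    """Hypothetical helper mirroring the field-setup branch of count_multipart."""
    if report_only:
        # No schema changes and no updates; only the count is printed
        return ["report"]
    actions: list[str] = []
    if field_exists:
        if not overwrite:
            raise ValueError("field already exists and overwrite is False")
        actions.append("DeleteField")
    actions += ["AddField", "update rows", "report"]
    return actions


print(plan_field_actions(True, overwrite=True))
# ['DeleteField', 'AddField', 'update rows', 'report']
```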
+
+#### Logic
+
+**Main check:**
+```python 
+multipart_counts = \
+    {
+        row[features.id_field]: row[features.shape_field].partCount  # Get the number of parts for each multipart
+        for row in features.get_rows([features.id_field, features.shape_field])
+        if row[features.shape_field] and row[features.shape_field].isMultipart  # Only get the rows that are multipart
+    }
+```
+This block does the primary work of the function. A dictionary comprehension maps the OID of each multipart feature to its part count, producing an update dictionary; rows with a null geometry are skipped.
+
+Using an update dictionary keeps the work done inside the UpdateCursor to a minimum and reduces the likelihood of the program crashing or failing while a cursor is active.
+Even if it did fail inside the cursor, the context managers should close the cursor and end the edit session cleanly. This pre-processing step also speeds up the update: only multipart features are updated, and a query that filters the rows can be passed directly to the cursor instead of checking every row.
+
+This step also allows a cheap emptiness check before any more expensive operations: if there are no multipart features, the function returns early, so the field is never added and the UpdateCursor is never created.
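To see what the comprehension keeps and skips, here it is run on mock rows. The `geom()` helper and the sample OIDs are hypothetical; `SimpleNamespace` objects stand in for arcpy geometries and expose only the `partCount` and `isMultipart` attributes the comprehension reads.

```python
from types import SimpleNamespace


def geom(part_count: int) -> SimpleNamespace:
    # Hypothetical mock geometry exposing the two attributes the comprehension reads
    return SimpleNamespace(partCount=part_count, isMultipart=part_count > 1)


rows = [
    {"OID@": 1, "SHAPE@": geom(1)},  # single part: skipped
    {"OID@": 2, "SHAPE@": geom(4)},  # multipart: kept
    {"OID@": 3, "SHAPE@": None},     # null geometry: skipped by the truthiness check
    {"OID@": 4, "SHAPE@": geom(2)},  # multipart: kept
]

multipart_counts = {
    row["OID@"]: row["SHAPE@"].partCount
    for row in rows
    if row["SHAPE@"] and row["SHAPE@"].isMultipart
}
print(multipart_counts)  # {2: 4, 4: 2}

if not multipart_counts:
    print("No multipart features found")  # early exit: no field, no cursor
```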
+
+**Update Block:**
+```python
+with arcpy.da.Editor(features.workspace_path):
+    upd_keys = [str(k) for k in multipart_counts.keys()]
+    # Build the SQL query with the OIDFieldName: OID@ is a cursor token, not a field name, and the OID field name varies (e.g. OBJECTID, FID)
+    update_query = f"{features.OIDFieldName} IN ({','.join(upd_keys)})"  # Only update the rows that are in the dictionary
+    # Use update_rows to get a row dictionary so we can update the row using field names
+    for cursor, row in features.update_rows([features.id_field, field_name], query=update_query):
+        row[field_name] = multipart_counts[row[features.id_field]]  # Get the part count from the dictionary
+        cursor.updateRow(list(row.values()))  # Convert the dictionary to a list and update the row
+```
+An `arcpy.da.Editor` is opened using its built-in context manager, which starts the edit session on entry and stops it on exit.
+
+Since we pre-calculated the updates, we can build a SQL query that only pulls the rows that need updates written to them.
+NOTE: The SQL query can't use the `'OID@'` token that we use everywhere else, so we need to pull the `OIDFieldName` attribute instead.
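A sketch of the query-building step, assuming integer OIDs and using `sqlite3` as a stand-in for the geodatabase to confirm that the where clause selects only the pre-computed rows. `build_in_query`, the `parcels` table, and the `OBJECTID` field name are hypothetical; in the script the field name comes from `OIDFieldName`.

```python
import sqlite3
from typing import Iterable


def build_in_query(oid_field: str, oids: Iterable[int]) -> str:
    """Hypothetical helper: build a where clause such as "OBJECTID IN (2,4)"."""
    # str(int(...)) ensures only integer literals end up in the SQL string
    id_list = ",".join(str(int(oid)) for oid in oids)
    if not id_list:
        # The script returns early in this case; an empty IN list is rejected by many SQL dialects
        raise ValueError("at least one OID is required")
    return f"{oid_field} IN ({id_list})"


multipart_counts = {2: 4, 4: 2}
where = build_in_query("OBJECTID", multipart_counts.keys())
print(where)  # OBJECTID IN (2,4)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (OBJECTID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO parcels VALUES (?)", [(i,) for i in range(1, 6)])
selected = [r[0] for r in conn.execute(
    f"SELECT OBJECTID FROM parcels WHERE {where} ORDER BY OBJECTID")]
print(selected)  # [2, 4]
```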
+
+Using the `Feature.update_rows` method, we can initialize a cursor for the feature class without writing out a second context manager, as that method handles the context.
+Since `update_rows` yields a dictionary for each row, we can access the row fields by name instead of by index.
+After the count is assigned, the row's values are pulled from the dictionary (in cursor field order, since dictionaries preserve insertion order) and passed to the cursor's `updateRow()`.
diff --git a/python/arcpy-python/count_multipart_features_2/count_multipart_features_2.py b/python/arcpy-python/count_multipart_features_2/count_multipart_features_2.py
@@ -0,0 +1,149 @@
+#-------------------------------------------------------------------------------
+# Name:        Count Multipart
+# Purpose:     This takes an input feature class/shapefile, adds a new field
+#              and writes how many parts make up each multipart feature
+#
+# Author:      Lucas Danzinger/Hayden Welch
+#
+# Created:     30/05/2013 (Lucas Danzinger) - Original script
+#              01/02/2024 (Hayden Welch) - Updated Version with handler class
+#-------------------------------------------------------------------------------
+
+import arcpy
+import os
+from typing import Generator, Any
+
+class Feature:
+    """
+    This class will handle the describe object and provide a few properties
+    and cursor methods
+    """
+    def __init__(self, path: str) -> None:
+        if not arcpy.Exists(path): # Validate that the input feature exists
+            raise ValueError(f"{path} does not exist")
+        self.__dict__ = arcpy.da.Describe(path)  # Write the describe object to the instance attributes
+
+    @property
+    def field_names(self) -> list[str]:
+        return [f.name for f in self.fields]
+
+    @property
+    def workspace_path(self) -> os.PathLike:
+        return self.workspace.catalogPath
+
+    @property
+    def shape_type(self) -> str:
+        return self.shapeType
+
+    @property
+    def id_field(self) -> str:
+        # Data Access cursors can handle the OID with OID@
+        return 'OID@'  # self.OIDFieldName
+
+    @property
+    def shape_field(self) -> str:
+        # Data Access requires SHAPE@ to get the geometry object
+        return 'SHAPE@'  # self.shapeFieldName
+
+    def get_rows(self, fields: list[str], *,
+              query: str=None) -> Generator[dict[str, Any], None, None]:
+        """Yield this feature class's rows as dictionaries, optionally filtered by a query
+        fields: The fields to be retrieved
+        query: An optional SQL where clause used to select the rows to be retrieved
+        return: A generator of the rows to be retrieved (row_dict)
+        """
+        """
+        with arcpy.da.SearchCursor(self.catalogPath, fields, query) as cursor:
+            for row in cursor:
+                yield dict(zip(cursor.fields, row))
+        return
+
+    def update_rows(self, fields: list[str], *,
+                 query: str=None) -> Generator[tuple[arcpy.da.UpdateCursor, dict[str, Any]], None, None]:
+        """Yield this feature class's rows together with the open cursor, optionally filtered by a query
+        fields: The fields to be updated
+        query: An optional SQL where clause used to select the rows to be updated
+        return: A generator of the rows to be updated (cursor, row_dict)
+        """
+        """
+        with arcpy.da.UpdateCursor(self.catalogPath, fields, query) as cursor:
+            for row in cursor:
+                yield (cursor, dict(zip(cursor.fields, row)))
+        return
+
+def count_multipart(feature_path: os.PathLike, *,
+                    field_name: str="PartCount", 
+                    overwrite: bool=False,
+                    report_only: bool=False):
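+    """Count the parts of each multipart feature and write them to a new field
+    feature_path: The path to the feature class
+    field_name: The name of the count field to be added ("PartCount" by default)
+    overwrite: Delete and re-create field_name if it already exists (False by default)
+    report_only: Only print the number of multipart features; don't modify the data (False by default)
+    """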
+    # Get the Feature object
+    feature_class: Feature = Feature(feature_path)
+
+    # Set the workspace to the feature class workspace
+    arcpy.env.workspace = feature_class.workspace_path
+
+    # MultiPatch is not supported for multipart features
+    if feature_class.shape_type == "MultiPatch":
+        raise ValueError("This is not a supported geometry shape type. Please select a Multipoint, Polyline, or Polygon")
+
+    # Count the number of parts for each multipart feature and add it to a dictionary
+    multipart_counts: dict[int, int] = \
+        {
+            row[feature_class.id_field]: row[feature_class.shape_field].partCount  # Get the number of parts for each multipart
+            for row in feature_class.get_rows([feature_class.id_field, feature_class.shape_field])
+            if row[feature_class.shape_field] and row[feature_class.shape_field].isMultipart  # Only get the rows that are multipart
+        }
+
+    # Don't bother updating the rows if there are no multipart features
+    if not multipart_counts:
+        print("No multipart features found")
+        return
+
+    if not report_only:  # Only set up the output field and update the rows if we are not just reporting
+        # Set up the output field
+        if field_name in feature_class.field_names:
+            if not overwrite:
+                raise ValueError(f"The field {field_name} already exists in {feature_path}")
+            else:
+                arcpy.DeleteField_management(feature_path, field_name)
+
+        arcpy.AddField_management(feature_path, field_name, 'SHORT')  # Add the field
+
+        # Update the rows with part counts
+        with arcpy.da.Editor(feature_class.workspace_path):
+            upd_keys: list[str] = [str(k) for k in multipart_counts.keys()]
+            # Build the SQL query with the OIDFieldName: OID@ is a cursor token, not a field name, and the OID field name varies (e.g. OBJECTID, FID)
+            update_query = f"{feature_class.OIDFieldName} IN ({','.join(upd_keys)})"  # Only update the rows that are in the dictionary
+            # Use update_rows to get a row dictionary so we can update the row using field names
+            for cursor, row in feature_class.update_rows([feature_class.id_field, field_name], query=update_query):
+                row[field_name] = multipart_counts[row[feature_class.id_field]]  # Get the part count from the dictionary
+                cursor.updateRow(list(row.values()))  # Convert the dictionary to a list and update the row
+
+    print(f"{len(multipart_counts)} multipart features found in {feature_class.baseName}")
+
+def main():
+
+    # Set these
+    feature_class = r"path\to\feature_class"
+    count_field = 'PartCount'
+    overwrite = True
+    report_only = False
+
+    count_multipart(feature_class, field_name=count_field, overwrite=overwrite, report_only=report_only)
+
+    # Example of an iterative call (uncomment and fill out to use)
+    #
+    #workspace = r'path\to\workspace'
+    #count_field = 'PartCount'
+    #overwrite = True
+    #dataset = 'Landbase'
+    #wildcard = None
+    #arcpy.env.workspace = workspace
+    #for fc in arcpy.ListFeatureClasses(feature_dataset=dataset, wild_card=wildcard):
+    #    count_multipart(fc, field_name=count_field, overwrite=overwrite, report_only=True)
+
+if __name__ == "__main__": 
+    main()