Commit

Merge pull request #382 from hwelch-fle/hwelch-fle-count_multipart_fe…
…atures_2

First Contribution: Re-Write of the count_multipart_features script
tsimons6 authored Feb 27, 2024
2 parents f0db027 + b5a232e commit 055f577
Showing 2 changed files with 233 additions and 0 deletions.
84 changes: 84 additions & 0 deletions python/arcpy-python/count_multipart_features_2/README.md
@@ -0,0 +1,84 @@
Count Multipart Features 2
=========================

## Instructions

1. Set the `feature_class` variable in `main()` to the path of your feature class
2. Set the `count_field` variable in `main()` to the desired name of your count field (default is `PartCount`)
3. Set the `overwrite` variable in `main()` to `False` if you don't want to overwrite an existing count field
4. The input feature class will have a new field added that states the number of parts of each multipart feature; single-part features are not updated

## Use Case

This script could be used to identify features with many parts, which could be affecting performance. It could also be used to determine if any features in your feature class are multipart.

## Updates to original

Feature class operations are now handled by a `Feature` object, which loads the `arcpy.da.Describe` dictionary for the given path into its instance attributes.

Some additional properties on this object simplify access to Describe attributes such as field names, workspace path, `shapeType`, and the shape/OID field names.

The cursor objects are now handled by generator methods on the `Feature` class: `get_rows(<fields>, query=<query>)` and `update_rows(<fields>, query=<query>)`, where `query` is an optional keyword argument.

### Feature
Class for pre-processing a feature class before passing it off to a script

#### Properties
1. `field_names`: a list of field names
2. `workspace_path`: the path to the feature workspace as returned by the workspace object in the Describe
3. `shape_type`: alternative name for `shapeType`
4. `id_field`: currently returns `'OID@'` for use with data access cursors, but can be modified to return the `OIDFieldName` attribute if needed
5. `shape_field`: currently returns `'SHAPE@'` for use with data access cursors, but can be modified to return the `shapeFieldName` attribute if needed
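
The attribute-loading pattern behind these properties can be sketched without arcpy. In the sketch below, `FeatureSketch` is a hypothetical name, and the `describe` dictionary is a hand-written stand-in with a few assumed keys, not real `arcpy.da.Describe` output.

```python
from types import SimpleNamespace


class FeatureSketch:
    """Hypothetical stand-in for Feature; takes a dict instead of a path."""

    def __init__(self, describe: dict) -> None:
        # Every key of the dictionary becomes an instance attribute
        self.__dict__ = dict(describe)

    @property
    def field_names(self) -> list[str]:
        # Field objects are assumed to expose a `name` attribute
        return [f.name for f in self.fields]

    @property
    def shape_type(self) -> str:
        return self.shapeType


# Hand-written stand-in for a Describe dictionary (assumed keys only)
describe = {
    "shapeType": "Polygon",
    "OIDFieldName": "OBJECTID",
    "fields": [SimpleNamespace(name="OBJECTID"), SimpleNamespace(name="Shape")],
}

feature = FeatureSketch(describe)
print(feature.field_names)   # ['OBJECTID', 'Shape']
print(feature.shape_type)    # Polygon
print(feature.OIDFieldName)  # OBJECTID
```

Because the dictionary keys become plain attributes, anything in the Describe result stays reachable by its original name, while the properties add friendlier aliases on top.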

#### Feature.get_rows
This method uses the cursor context manager to return a generator object that yields each row as a dictionary formatted as `{field_name: field_value}`.
If a query is set, it is passed to the cursor as its where clause.
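
A minimal sketch of this row-dictionary generator pattern, runnable without arcpy. `FakeSearchCursor` is a hypothetical stand-in that only mimics the parts of a search cursor the pattern relies on (iteration, a `fields` attribute, and context-manager support).

```python
from typing import Any, Generator


class FakeSearchCursor:
    """Hypothetical stand-in for a search cursor."""

    def __init__(self, fields: list[str], rows: list[tuple]) -> None:
        self.fields = tuple(fields)
        self._rows = rows
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.closed = True  # Record that the context manager released the cursor
        return False

    def __iter__(self):
        return iter(self._rows)


def get_rows(cursor: FakeSearchCursor) -> Generator[dict[str, Any], None, None]:
    # The with block stays open for as long as the generator is being consumed
    with cursor:
        for row in cursor:
            yield dict(zip(cursor.fields, row))


cursor = FakeSearchCursor(["OID@", "NAME"], [(1, "a"), (2, "b")])
rows = list(get_rows(cursor))
print(rows)           # [{'OID@': 1, 'NAME': 'a'}, {'OID@': 2, 'NAME': 'b'}]
print(cursor.closed)  # True
```

Note that the cursor is only released once the generator is exhausted or closed, so a caller that stops iterating early keeps it open until the generator is garbage collected.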

#### Feature.update_rows
This method uses the cursor context manager to return a generator object that yields each row as a tuple containing the cursor and a row dictionary, formatted as `(cursor, {field_name: field_value})`.
If a query is set, it is passed to the cursor as its where clause.
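
The `(cursor, row_dict)` pattern can be sketched the same way. `FakeUpdateCursor` below is a hypothetical stand-in, not the arcpy class; it stores rows in a list so the effect of `updateRow()` can be inspected.

```python
from typing import Any, Generator


class FakeUpdateCursor:
    """Hypothetical stand-in for an update cursor."""

    def __init__(self, fields: list[str], rows: list[tuple]) -> None:
        self.fields = tuple(fields)
        self.rows = [list(r) for r in rows]
        self._index = -1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        return False

    def __iter__(self):
        for i, row in enumerate(self.rows):
            self._index = i  # Remember which row updateRow() should replace
            yield tuple(row)

    def updateRow(self, values: list) -> None:
        self.rows[self._index] = list(values)


def update_rows(cursor: FakeUpdateCursor) -> Generator[tuple[FakeUpdateCursor, dict[str, Any]], None, None]:
    with cursor:
        for row in cursor:
            yield (cursor, dict(zip(cursor.fields, row)))


counts = {2: 3, 4: 2}
fake = FakeUpdateCursor(["OID@", "PartCount"], [(2, None), (4, None)])
for cursor, row in update_rows(fake):
    row["PartCount"] = counts[row["OID@"]]  # Update by field name
    cursor.updateRow(list(row.values()))    # Dict order matches cursor.fields

print(fake.rows)  # [[2, 3], [4, 2]]
```

Converting the dictionary back with `list(row.values())` works because Python dictionaries preserve insertion order, so the values line up with the cursor's field order as long as no keys are added or removed.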

### count_multipart
The main function of the script. It takes a feature class path as the only positional argument.
#### kwargs
1. `field_name`: an optional name for the field the count is written to (default `PartCount`)
2. `overwrite`: an optional flag that allows an existing field named `field_name` to be deleted and recreated; when it is `False` (the default) and the field already exists, a `ValueError` is raised
3. `report_only`: an optional flag that skips all updates to the feature class and just prints the number of multipart features found
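
How these flags interact can be sketched as a small pure function. `plan_field_action` is a hypothetical helper written for illustration; it is not part of the script, and its return strings are labels invented for this sketch.

```python
def plan_field_action(existing_fields: list[str], field_name: str = "PartCount", *,
                      overwrite: bool = False, report_only: bool = False) -> str:
    """Return which field action the flags would lead to (illustrative labels only)."""
    if report_only:
        return "report"   # No field is added and no rows are updated
    if field_name in existing_fields:
        if not overwrite:
            raise ValueError(f"The field {field_name} already exists")
        return "replace"  # Existing field is deleted, then added again
    return "add"          # Field does not exist yet, so it is simply added


print(plan_field_action(["OBJECTID"], "PartCount"))                                # add
print(plan_field_action(["OBJECTID", "PartCount"], "PartCount", overwrite=True))   # replace
print(plan_field_action(["OBJECTID", "PartCount"], "PartCount", report_only=True)) # report
```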

#### Logic

**Main check:**
```python
multipart_counts = \
    {
        row[features.id_field]: row[features.shape_field].partCount # Get the number of parts for each multipart
        for row in features.get_rows([features.id_field, features.shape_field])
        if row[features.shape_field] and row[features.shape_field].isMultipart # Only get the rows that are multipart
    }
```
This block is the primary work done by the function. It uses a dictionary comprehension to write the part count of each multipart feature to an update dictionary keyed by object ID.

Using an update dictionary allows minimal work to be done within the UpdateCursor itself and reduces the likelihood of the program crashing or failing while a cursor is active.
Even if it did fail in the cursor, the context manager should gracefully handle the error. This pre-processing step can also speed up the update, because only multipart features are updated and a query that filters the rows can be passed directly to the cursor instead of checking every row.

This step also allows a length check to be run before any more expensive operations; if there are no multipart features, the cursor is never created.
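
The filter-then-check logic can be tried without arcpy. `FakeGeometry` below is a hypothetical stand-in that only mimics the `partCount` and `isMultipart` attributes used by the comprehension, and the rows are invented sample data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeGeometry:
    """Hypothetical stand-in exposing only partCount and isMultipart."""
    partCount: int

    @property
    def isMultipart(self) -> bool:
        return self.partCount > 1


# Invented sample rows in the {field_name: field_value} shape yielded by get_rows
rows: list[dict[str, Optional[object]]] = [
    {"OID@": 1, "SHAPE@": FakeGeometry(1)},  # single part: skipped
    {"OID@": 2, "SHAPE@": FakeGeometry(3)},
    {"OID@": 3, "SHAPE@": None},             # null geometry: skipped
    {"OID@": 4, "SHAPE@": FakeGeometry(2)},
]

multipart_counts = {
    row["OID@"]: row["SHAPE@"].partCount
    for row in rows
    if row["SHAPE@"] and row["SHAPE@"].isMultipart
}

print(multipart_counts)  # {2: 3, 4: 2}
if len(multipart_counts) == 0:
    print("No multipart features found")  # Nothing to update, so no cursor is needed
```

The truthiness check on the geometry comes first so that rows with a null shape never reach the `isMultipart` lookup.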

**Update Block:**
```python
with arcpy.da.Editor(features.workspace_path):
    upd_keys = [str(k) for k in multipart_counts.keys()]
    # Use the OIDFieldName to build the SQL query, OBJECTID and OID@ don't work in queries
    update_query = f"{features.OIDFieldName} IN ({','.join(upd_keys)})" # Only update the rows that are in the dictionary
    # Use update_rows to get a row dictionary so we can update the row using field names
    for cursor, row in features.update_rows([features.id_field, field_name], query=update_query):
        row[field_name] = multipart_counts[row[features.id_field]] # Get the part count from the dictionary
        cursor.updateRow(list(row.values())) # Convert the dictionary to a list and update the row
```
The update runs inside an `arcpy.da.Editor` session, opened with its built-in context manager.

Since we pre-calculated the updates, we can build a SQL query that only pulls rows that need updates written to them.
NOTE: The SQL query can't use the `'OID@'` token that we used for everything else, so we need to pull the `OIDFieldName` attribute instead.

Using the `Feature.update_rows` method, we can initialise a cursor for the feature class without writing out a second context manager, as that method handles the context.
Since the `update_rows` method yields a dictionary for each row, we can access the row fields by name instead of by index.
After reassignment, the values for each row are pulled from the dictionary and fed to the cursor object's `updateRow()`.
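
The query-building step is plain string work and can be checked on its own. `build_oid_query` is a hypothetical helper name, and `"OBJECTID"` is an assumed value for `OIDFieldName`; the real name depends on the data source.

```python
from typing import Iterable


def build_oid_query(oid_field: str, oids: Iterable[int]) -> str:
    """Build an IN clause that selects only the given object IDs."""
    keys = [str(k) for k in oids]
    return f"{oid_field} IN ({','.join(keys)})"


multipart_counts = {2: 3, 4: 2, 9: 5}  # Invented sample counts keyed by object ID
update_query = build_oid_query("OBJECTID", multipart_counts)
print(update_query)  # OBJECTID IN (2,4,9)
```

Iterating a dictionary yields its keys, so passing `multipart_counts` directly is equivalent to passing `multipart_counts.keys()`.
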
@@ -0,0 +1,149 @@
#-------------------------------------------------------------------------------
# Name: Count Multipart
# Purpose: This takes an input feature class/shapefile, adds a new field
# and writes the number of parts that make up each multipart feature
#
# Author: Lucas Danzinger/Hayden Welch
#
# Created: 30/05/2013 (Lucas Danzinger) - Original script
# 01/02/2024 (Hayden Welch) - Updated Version with handler class
#-------------------------------------------------------------------------------

import arcpy
import os
from typing import Generator, Any

class Feature:
    """
    This class will handle the describe object and provide a few properties
    and cursor methods
    """
    def __init__(self, path: str) -> None:
        if not arcpy.Exists(path): # Validate that the input feature exists
            raise ValueError(f"{path} does not exist")
        self.__dict__ = arcpy.da.Describe(path) # Write the describe object to the instance attributes

    @property
    def field_names(self) -> list[str]:
        return [f.name for f in self.fields]

    @property
    def workspace_path(self) -> os.PathLike:
        return self.workspace.catalogPath

    @property
    def shape_type(self) -> str:
        return self.shapeType

    @property
    def id_field(self) -> str:
        # Data Access cursors can handle the OID with OID@
        return 'OID@' # self.OIDFieldName

    @property
    def shape_field(self) -> str:
        # Data Access requires SHAPE@ to get the geometry object
        return 'SHAPE@' # self.shapeFieldName

    def get_rows(self, fields: list[str], *,
                 query: str=None) -> Generator[dict[str, Any], None, None]:
        """This function will get rows from the feature class based on a query
        fields: The fields to be retrieved
        query: The query to be used to select the rows to be retrieved
        return: A generator of the rows to be retrieved (row_dict)
        """
        with arcpy.da.SearchCursor(self.catalogPath, fields, query) as cursor:
            for row in cursor:
                yield dict(zip(cursor.fields, row))
        return

    def update_rows(self, fields: list[str], *,
                    query: str=None) -> Generator[tuple[arcpy.da.UpdateCursor, dict[str, Any]], None, None]:
        """This function will yield rows of the feature class for updating based on a query
        fields: The fields to be updated
        query: The query to be used to select the rows to be updated
        return: A generator of the rows to be updated (cursor, row_dict)
        """
        with arcpy.da.UpdateCursor(self.catalogPath, fields, query) as cursor:
            for row in cursor:
                yield (cursor, dict(zip(cursor.fields, row)))
        return

def count_multipart(feature_path: os.PathLike, *,
                    field_name: str="PartCount",
                    overwrite: bool=False,
                    report_only: bool=False):
    """This function will take an input feature class and add a new field
    feature_path: The path to the feature class
    field_name: The name of the count field to be added ("PartCount" by default)
    overwrite: Delete and re-add the count field if it already exists (False by default)
    report_only: Only print the number of multipart features, no updates (False by default)
    """
    # Get the Feature object
    feature_class: Feature = Feature(feature_path)

    # Set the workspace to the feature class workspace
    arcpy.env.workspace = feature_class.workspace_path

    # MultiPatch is not supported for multipart features
    if feature_class.shape_type == "MultiPatch":
        raise ValueError("This is not a supported geometry shape type. Please select a Multipoint, Polyline, or Polygon")

    # Count the number of parts for each multipart feature and add it to a dictionary
    multipart_counts: dict[int, int] = \
        {
            row[feature_class.id_field]: row[feature_class.shape_field].partCount # Get the number of parts for each multipart
            for row in feature_class.get_rows([feature_class.id_field, feature_class.shape_field])
            if row[feature_class.shape_field] and row[feature_class.shape_field].isMultipart # Only get the rows that are multipart
        }

    # Don't bother updating the rows if there are no multipart features
    if len(multipart_counts) == 0:
        print("No multipart features found")
        return

    if not report_only: # Only set up the output field and update the rows if we are not just reporting
        # Set up the output field
        if field_name in feature_class.field_names:
            if not overwrite:
                raise ValueError(f"The field {field_name} already exists in {feature_path}")
            else:
                arcpy.DeleteField_management(feature_path, field_name)

        arcpy.AddField_management(feature_path, field_name, 'SHORT') # Add the field

        # Update the rows with part counts
        with arcpy.da.Editor(feature_class.workspace_path):
            upd_keys: list[str] = [str(k) for k in multipart_counts.keys()]
            # Use the OIDFieldName to build the SQL query, OBJECTID and OID@ don't work in queries
            update_query = f"{feature_class.OIDFieldName} IN ({','.join(upd_keys)})" # Only update the rows that are in the dictionary
            # Use update_rows to get a row dictionary so we can update the row using field names
            for cursor, row in feature_class.update_rows([feature_class.id_field, field_name], query=update_query):
                row[field_name] = multipart_counts[row[feature_class.id_field]] # Get the part count from the dictionary
                cursor.updateRow(list(row.values())) # Convert the dictionary to a list and update the row

    print(f"{len(multipart_counts)} multipart features found in {feature_class.baseName}")

def main():

    # Set these
    feature_class = r"path\to\feature_class"
    count_field = 'PartCount'
    overwrite = True
    report_only = False

    count_multipart(feature_class, field_name=count_field, overwrite=overwrite, report_only=report_only)

    # Example of an iterative call (uncomment and fill out to use)
    #
    #workspace = r'path\to\workspace'
    #count_field = 'PartCount'
    #overwrite = True
    #dataset = 'Landbase'
    #wildcard = None
    #arcpy.env.workspace = workspace
    #for fc in arcpy.ListFeatureClasses(feature_dataset=dataset, wild_card=wildcard):
    #    count_multipart(fc, field_name=count_field, overwrite=True, report_only=True)

if __name__ == "__main__":
    main()
