#3299: Script to update suborg values - [ZA] #3322

Open: wants to merge 27 commits into base: main
27 commits (all by zandercymatics):
- `5e5626a` basic logic (Jan 8, 2025)
- `14b6382` Add import (Jan 8, 2025)
- `ae2e4c5` Ducttape (Jan 8, 2025)
- `1b6667a` Cleanup script (Jan 9, 2025)
- `f170ab9` Add records to delete manually (Jan 9, 2025)
- `028d033` Code comments + lint (Jan 9, 2025)
- `db75d31` Wrap up PR (Jan 9, 2025)
- `bff9899` Merge branch 'main' into za/3299-script-to-update-suborg-values (Jan 9, 2025)
- `2fa8366` Lint (Jan 9, 2025)
- `bf9824a` Lint part 2 (Jan 9, 2025)
- `25cf2fe` Revert "Lint part 2" (Jan 9, 2025)
- `e60d1db` Update create_federal_portfolio.py (Jan 9, 2025)
- `104b1f8` Update create_federal_portfolio.py (Jan 9, 2025)
- `0ddb3a2` Fix linter C901 error (Jan 10, 2025)
- `e58d68b` add some debug logs (Jan 10, 2025)
- `5e697c3` Update create_federal_portfolio.py (Jan 10, 2025)
- `9a887d8` Update create_federal_portfolio.py (Jan 10, 2025)
- `19e4dff` Update create_federal_portfolio.py (Jan 10, 2025)
- `ac08b17` Update create_federal_portfolio.py (Jan 10, 2025)
- `2ca7cd4` Modify terminal helper (Jan 10, 2025)
- `2d509cf` Fix sneaky bug (Jan 10, 2025)
- `51105eb` Remove unrelated changes (Jan 10, 2025)
- `48f4708` Update src/registrar/management/commands/create_federal_portfolio.py (Jan 10, 2025)
- `e95bcb4` Remove redundant check on created (Jan 13, 2025)
- `bf8cdc5` Merge branch 'main' into za/3299-script-to-update-suborg-values (Jan 13, 2025)
- `2fad00d` Merge branch 'main' into za/3299-script-to-update-suborg-values (Jan 13, 2025)
- `e184149` Update create_federal_portfolio.py (Jan 13, 2025)
35 changes: 35 additions & 0 deletions docs/operations/data_migration.md
@@ -918,3 +918,38 @@ Example (only requests): `./manage.py create_federal_portfolio --branch "executi
- Parameters #1-#2: Either `--agency_name` or `--branch` must be specified. Not both.
- Parameters #2-#3: you cannot use `--both` while using these. You must specify either `--parse_requests` or `--parse_domains` separately. While all of these parameters are optional in that you do not need to specify all of them,
you must specify at least one to run this script.


## Patch suborganizations
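This script removes duplicate suborganization records from our database (one-time use).
It selects records to delete in two ways:
1. Suborganizations whose names differ only by extra spaces or capitalization. In each group, the record with the fewest spaces (ties broken by the most capitalized word starts) is kept and the rest are deleted.
2. Suborganizations whose names are listed in the script for manual cleanup. Each is replaced by a specified, correctly named record (for example, "US Geological Survey" is replaced by "U.S. Geological Survey").

Before it deletes anything, the script shows a preview and asks for confirmation. It then updates the `sub_organization` field on every DomainInformation and DomainRequest that references a record being deleted, so that the field points to the record being kept.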
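The whitespace and capitalization rule above can be sketched in plain Python without Django. This is an illustrative sketch only. Strings stand in for Suborganization records, and `plan_dedup` and `count_leading_capitals` are hypothetical helper names. The tie-break key is the one used in `get_records_to_prune`. The hardcoded-mapping rule is not covered here.

```python
from collections import defaultdict


def normalize_string(text: str, lowercase: bool = True) -> str:
    # Collapse runs of whitespace and trim the ends (same idea as generic_helper.normalize_string)
    collapsed = " ".join(text.split())
    return collapsed.lower() if lowercase else collapsed


def count_leading_capitals(text: str) -> int:
    # Number of words that start with an uppercase letter
    return sum(word[0].isupper() for word in text.split())


def plan_dedup(names: list[str]) -> dict[str, dict]:
    """Group names by normalized form; in each duplicate group pick one to keep."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[normalize_string(name)].append(name)

    plan = {}
    for key, group in groups.items():
        if len(group) < 2:
            continue  # unique name: nothing to delete
        # Fewest spaces wins; ties go to the most capitalized word starts, then to the first seen
        keep_idx = max(
            range(len(group)),
            key=lambda i: (-group[i].count(" "), count_leading_capitals(group[i])),
        )
        plan[key] = {
            "keep": group[keep_idx],
            "delete": [name for i, name in enumerate(group) if i != keep_idx],
        }
    return plan


if __name__ == "__main__":
    print(plan_dedup(["AMTRAK", "aMtRaK", "AMTRAK ", "U.S. Geological Survey"]))
    # {'amtrak': {'keep': 'AMTRAK', 'delete': ['aMtRaK', 'AMTRAK ']}}
```

Indices are compared instead of the strings themselves, so two byte-identical names in the same group are still treated as separate records. The script gets the same effect by comparing model instances.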

### Running on sandboxes

#### Step 1: Log in to Cloud Foundry
```cf login -a api.fr.cloud.gov --sso```

#### Step 2: SSH into your environment
```cf ssh getgov-{space}```

Example: `cf ssh getgov-za`

#### Step 3: Create a shell instance
```/tmp/lifecycle/shell```

#### Step 4: Running the script
```./manage.py patch_suborganizations```

### Running locally

#### Step 1: Running the script
```docker-compose exec app ./manage.py patch_suborganizations```
82 changes: 72 additions & 10 deletions src/registrar/management/commands/create_federal_portfolio.py
```diff
@@ -6,6 +6,7 @@
 from registrar.management.commands.utility.terminal_helper import TerminalColors, TerminalHelper
 from registrar.models import DomainInformation, DomainRequest, FederalAgency, Suborganization, Portfolio, User
 from registrar.models.utility.generic_helper import normalize_string
+from django.db.models import F, Q
 
 
 logger = logging.getLogger(__name__)
@@ -104,12 +105,17 @@ def handle(self, **options):
                 message = f"Failed to create portfolio '{federal_agency.agency}'"
                 TerminalHelper.colorful_logger(logger.info, TerminalColors.FAIL, message)
 
+        # POST PROCESS STEP: Add additional suborg info where applicable.
+        updated_suborg_count = self.post_process_suborganization_fields(agencies)
+        message = f"Added city and state_territory information to {updated_suborg_count} suborgs."
+        TerminalHelper.colorful_logger(logger.info, TerminalColors.MAGENTA, message)
+
         TerminalHelper.log_script_run_summary(
             self.updated_portfolios,
             self.failed_portfolios,
             self.skipped_portfolios,
             debug=False,
-            skipped_header="----- SOME PORTFOLIOS WERE SKIPPED -----",
+            skipped_header="----- SOME PORTFOLIOS WEREN'T CREATED -----",
             display_as_str=True,
         )
 
@@ -175,14 +175,11 @@ def post_process_started_domain_requests(self, agencies, portfolios):
 
     def handle_populate_portfolio(self, federal_agency, parse_domains, parse_requests, both):
         """Attempts to create a portfolio. If successful, this function will
-        also create new suborganizations.
-        Returns the portfolio for the given federal_agency.
-        """
-        portfolio, created = self.create_portfolio(federal_agency)
-        if created:
-            self.create_suborganizations(portfolio, federal_agency)
-            if parse_domains or both:
-                self.handle_portfolio_domains(portfolio, federal_agency)
+        also create new suborganizations"""
+        portfolio, _ = self.create_portfolio(federal_agency)
```

Review comment (Contributor): Why was the comment `Returns the portfolio for the given federal_agency` removed?

Reply (zandercymatics, Contributor Author, Jan 14, 2025): Good question! I just removed that because this function doesn't actually return anything, so I think it was a typo.

```diff
+        self.create_suborganizations(portfolio, federal_agency)
+        if parse_domains or both:
+            self.handle_portfolio_domains(portfolio, federal_agency)
 
         if parse_requests or both:
             self.handle_portfolio_requests(portfolio, federal_agency)
@@ -233,7 +236,6 @@ def create_suborganizations(self, portfolio: Portfolio, federal_agency: FederalA
             federal_agency=federal_agency, organization_name__isnull=False
         )
         org_names = set(valid_agencies.values_list("organization_name", flat=True))
-
         if not org_names:
             message = (
                 "Could not add any suborganizations."
@@ -352,3 +354,63 @@ def handle_portfolio_domains(self, portfolio: Portfolio, federal_agency: Federal
         DomainInformation.objects.bulk_update(domain_infos, ["portfolio", "sub_organization"])
         message = f"Added portfolio '{portfolio}' to {len(domain_infos)} domains."
         TerminalHelper.colorful_logger(logger.info, TerminalColors.OKGREEN, message)
+
+    def post_process_suborganization_fields(self, agencies):
+        """Post-process suborganization fields by pulling data from related domains and requests.
+
+        This function updates suborganization city and state_territory fields based on
+        related domain information and domain request information.
+        """
+        # If organization_name, portfolio, and sub_organization are all set, we add suborg info,
+        # unless the org name matches the portfolio name (which implies the record is the
+        # portfolio itself).
+        should_add_suborgs_filter = Q(
+            organization_name__isnull=False,
+            portfolio__isnull=False,
+            sub_organization__isnull=False,
+        ) & ~Q(organization_name__iexact=F("portfolio__organization_name"))
+        domains = DomainInformation.objects.filter(
+            should_add_suborgs_filter, federal_agency__in=agencies, portfolio__isnull=False
+        )
+        requests = DomainRequest.objects.filter(
+            should_add_suborgs_filter, federal_agency__in=agencies, portfolio__isnull=False
+        )
+        domains_dict = {domain.organization_name: domain for domain in domains}
+        requests_dict = {request.organization_name: request for request in requests}
+        suborgs_to_edit = Suborganization.objects.filter(
+            Q(id__in=domains.values_list("sub_organization", flat=True))
+            | Q(id__in=requests.values_list("sub_organization", flat=True))
+        )
+        for suborg in suborgs_to_edit:
+            domain = domains_dict.get(suborg.name, None)
+            request = requests_dict.get(suborg.name, None)
+
+            # PRIORITY:
+            # 1. Domain info
+            # 2. Domain request requested suborg fields
+            # 3. Domain request normal fields
+            city = None
+            if domain and domain.city:
+                city = normalize_string(domain.city, lowercase=False)
+            elif request and request.suborganization_city:
+                city = normalize_string(request.suborganization_city, lowercase=False)
+            elif request and request.city:
+                city = normalize_string(request.city, lowercase=False)
+
+            state_territory = None
+            if domain and domain.state_territory:
+                state_territory = domain.state_territory
+            elif request and request.suborganization_state_territory:
+                state_territory = request.suborganization_state_territory
+            elif request and request.state_territory:
+                state_territory = request.state_territory
+
+            if city:
+                suborg.city = city
+
+            # Only overwrite when a value was found (checking `suborg` here would always be true)
+            if state_territory:
+                suborg.state_territory = state_territory
+
+            logger.info(f"{suborg}: city: {suborg.city}, state: {suborg.state_territory}")
+
+        return Suborganization.objects.bulk_update(suborgs_to_edit, ["city", "state_territory"])
```
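The priority order in `post_process_suborganization_fields` comes down to "the first non-empty value wins", resolved separately for each field. The sketch below runs on plain dictionaries and is illustrative only. `pick_first` and `merge_location` are hypothetical helpers, and the dict keys stand in for the model fields. Like the PR's `if city:` check, it only overwrites a field when a value was found, so existing data is preserved.

```python
from typing import Optional


def pick_first(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is truthy (None and "" are skipped)."""
    for value in candidates:
        if value:
            return value
    return None


def merge_location(suborg: dict, domain: Optional[dict], request: Optional[dict]) -> dict:
    """Fill city/state_territory from the domain, then the request's suborg fields, then its regular fields."""
    domain = domain or {}
    request = request or {}

    city = pick_first(domain.get("city"), request.get("suborganization_city"), request.get("city"))
    # Collapse whitespace without lowercasing, like normalize_string(..., lowercase=False)
    city = " ".join(city.split()) if city else None

    state = pick_first(
        domain.get("state_territory"),
        request.get("suborganization_state_territory"),
        request.get("state_territory"),
    )

    updated = dict(suborg)
    # Only overwrite a field when a value was found, so existing data is preserved
    if city:
        updated["city"] = city
    if state:
        updated["state_territory"] = state
    return updated


if __name__ == "__main__":
    print(merge_location(
        {"name": "Example Office", "city": None, "state_territory": "MD"},
        None,
        {"city": "  Springfield ", "suborganization_city": ""},
    ))
    # {'name': 'Example Office', 'city': 'Springfield', 'state_territory': 'MD'}
```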
133 changes: 133 additions & 0 deletions src/registrar/management/commands/patch_suborganizations.py
@@ -0,0 +1,133 @@
```python
import logging
from django.core.management import BaseCommand
from registrar.models import Suborganization, DomainRequest, DomainInformation
from registrar.management.commands.utility.terminal_helper import TerminalColors, TerminalHelper
from registrar.models.utility.generic_helper import count_capitals, normalize_string


logger = logging.getLogger(__name__)


class Command(BaseCommand):
    help = "Clean up duplicate suborganizations that differ only by spaces and capitalization"

    def handle(self, **kwargs):
```

Review comment (Contributor): I really like the use of help text here!

```python
        """Process manual deletions and find/remove duplicates. Shows preview
        and updates DomainInformation / DomainRequest sub_organization references before deletion."""

        # First: get a preset list of records we want to delete.
        # For extra_records_to_prune: the key gets deleted, the value gets kept.
        extra_records_to_prune = {
            normalize_string("Assistant Secretary for Preparedness and Response Office of the Secretary"): {
                "replace_with": "Assistant Secretary for Preparedness and Response, Office of the Secretary"
            },
            normalize_string("US Geological Survey"): {"replace_with": "U.S. Geological Survey"},
            normalize_string("USDA/OC"): {"replace_with": "USDA, Office of Communications"},
            normalize_string("GSA, IC, OGP WebPortfolio"): {"replace_with": "GSA, IC, OGP Web Portfolio"},
            normalize_string("USDA/ARS/NAL"): {"replace_with": "USDA, ARS, NAL"},
        }
```

Review comment (Contributor): Curious as to where these values came from? Are they supposed to be hardcoded?

Reply (zandercymatics, Contributor Author, Jan 14, 2025): They come from this spreadsheet. The create_federal_portfolio script assumes that the org names the user provides are always correct, but some domains misspelled their org name. Since it's just a misspelling, these corner cases are hardcoded. Basically, data cleanup.

```python
        # Second: loop through every Suborganization and return a dict of what to keep, and what to delete
        # for each duplicate or "incorrect" record. We do this by pruning records with extra spaces or bad caps
        # Note that "extra_records_to_prune" is just a manual mapping.
        records_to_prune = self.get_records_to_prune(extra_records_to_prune)
        if len(records_to_prune) == 0:
            TerminalHelper.colorful_logger(logger.error, TerminalColors.FAIL, "No suborganizations to delete.")
            return

        # Third: Build a preview of the changes
        total_records_to_remove = 0
        preview_lines = ["The following records will be removed:"]
        for data in records_to_prune.values():
            keep = data.get("keep")
            delete = data.get("delete")
            if keep:
                preview_lines.append(f"Keeping: '{keep.name}' (id: {keep.id})")

            for duplicate in delete:
                preview_lines.append(f"Removing: '{duplicate.name}' (id: {duplicate.id})")
                total_records_to_remove += 1
            preview_lines.append("")
        preview = "\n".join(preview_lines)

        # Fourth: Get user confirmation and delete
        if TerminalHelper.prompt_for_execution(
            system_exit_on_terminate=True,
            prompt_message=preview,
            prompt_title=f"Remove {total_records_to_remove} suborganizations?",
            verify_message="*** WARNING: This will replace the record on DomainInformation and DomainRequest! ***",
        ):
            try:
                # Update all references to point to the right suborg before deletion
                all_suborgs_to_remove = set()
                for record in records_to_prune.values():
                    best_record = record["keep"]
                    suborgs_to_remove = {dupe.id for dupe in record["delete"]}
                    DomainRequest.objects.filter(sub_organization_id__in=suborgs_to_remove).update(
                        sub_organization=best_record
                    )
                    DomainInformation.objects.filter(sub_organization_id__in=suborgs_to_remove).update(
                        sub_organization=best_record
                    )
                    all_suborgs_to_remove.update(suborgs_to_remove)
                # Delete the suborgs
                delete_count, _ = Suborganization.objects.filter(id__in=all_suborgs_to_remove).delete()
                TerminalHelper.colorful_logger(
                    logger.info, TerminalColors.MAGENTA, f"Successfully deleted {delete_count} suborganizations."
                )
            except Exception as e:
                TerminalHelper.colorful_logger(
                    logger.error, TerminalColors.FAIL, f"Failed to delete suborganizations: {str(e)}"
                )

    def get_records_to_prune(self, extra_records_to_prune):
        """Maps all suborgs into a dictionary with a record to keep, and an array of records to delete."""
        # First: Group all suborganization names by their "normalized" names (finding duplicates).
        # Returns a dict that looks like this:
        # {
        #   "amtrak": [<Suborganization: AMTRAK>, <Suborganization: aMtRaK>, <Suborganization: AMTRAK >],
        #   "usda/oc": [<Suborganization: USDA/OC>],
        #   ...etc
        # }
        #
        name_groups = {}
        for suborg in Suborganization.objects.all():
            normalized_name = normalize_string(suborg.name)
            name_groups.setdefault(normalized_name, []).append(suborg)

        # Second: find the record we should keep, and the records we should delete
        # Returns a dict that looks like this:
        # {
        #   "amtrak": {
        #       "keep": <Suborganization: AMTRAK>,
        #       "delete": [<Suborganization: aMtRaK>, <Suborganization: AMTRAK >]
        #   },
        #   "usda/oc": {
        #       "keep": <Suborganization: USDA, Office of Communications>,
        #       "delete": [<Suborganization: USDA/OC>]
        #   },
        #   ...etc
        # }
        records_to_prune = {}
        for normalized_name, duplicate_suborgs in name_groups.items():
            # Delete data from our preset list
            if normalized_name in extra_records_to_prune:
                # The 'keep' field expects a Suborganization but we just pass in a string, so this is just a workaround.
                # This assumes that there is only one item in the name_group array (see usda/oc example).
                # But this should be fine, given our data.
                hardcoded_record_name = extra_records_to_prune[normalized_name]["replace_with"]
                name_group = name_groups.get(normalize_string(hardcoded_record_name))
                keep = name_group[0] if name_group else None
                records_to_prune[normalized_name] = {"keep": keep, "delete": duplicate_suborgs}
            # Delete duplicates (extra spaces or casing differences)
            elif len(duplicate_suborgs) > 1:
                # Pick the best record (fewest spaces, most leading capitals)
                best_record = max(
                    duplicate_suborgs,
                    key=lambda suborg: (-suborg.name.count(" "), count_capitals(suborg.name, leading_only=True)),
                )
                records_to_prune[normalized_name] = {
                    "keep": best_record,
                    "delete": [s for s in duplicate_suborgs if s != best_record],
                }
        return records_to_prune
```
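The fourth step in `handle` re-points references and then deletes. The plain-Python sketch below walks through that flow under stated assumptions. References are a dict from a hypothetical record id to a suborg name, and `plan` has the shape returned by `get_records_to_prune`. `apply_plan` is a hypothetical name. One difference from the PR is deliberate: the sketch skips any group whose `keep` is `None`. In the script, `keep` is `None` when a `replace_with` name matches no existing suborganization. Passing that to `update(sub_organization=best_record)` appears to set those references to `None` rather than replace them.

```python
from typing import Optional


def apply_plan(
    references: dict[str, str], plan: dict[str, dict]
) -> tuple[dict[str, str], set[str]]:
    """Re-point references from duplicates to the kept name, then list what can be deleted."""
    updated = dict(references)  # leave the caller's mapping untouched
    to_delete: set[str] = set()
    for group in plan.values():
        keep: Optional[str] = group["keep"]
        if keep is None:
            # No replacement record exists: skip rather than clear the references
            continue
        duplicates = set(group["delete"])
        # Reassigning values of existing keys is safe while iterating over items()
        for ref_id, suborg_name in updated.items():
            if suborg_name in duplicates:
                updated[ref_id] = keep
        to_delete |= duplicates
    return updated, to_delete


if __name__ == "__main__":
    refs = {"domain-1": "USDA/OC", "domain-2": "AMTRAK ", "request-1": "AMTRAK"}
    plan = {
        "usda/oc": {"keep": "USDA, Office of Communications", "delete": ["USDA/OC"]},
        "amtrak": {"keep": "AMTRAK", "delete": ["AMTRAK "]},
        "us geological survey": {"keep": None, "delete": ["US Geological Survey"]},
    }
    updated, to_delete = apply_plan(refs, plan)
    print(updated)
    print(sorted(to_delete))  # ['AMTRAK ', 'USDA/OC']
```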
24 changes: 12 additions & 12 deletions src/registrar/management/commands/utility/terminal_helper.py
```diff
@@ -401,16 +401,15 @@ def prompt_for_execution(
         # Allow the user to inspect the command string
         # and ask if they wish to proceed
         proceed_execution = TerminalHelper.query_yes_no_exit(
-            f"""{TerminalColors.OKCYAN}
-            =====================================================
-            {prompt_title}
-            =====================================================
-            {verify_message}
-
-            {prompt_message}
-            {TerminalColors.FAIL}
-            Proceed? (Y = proceed, N = {action_description_for_selecting_no})
-            {TerminalColors.ENDC}"""
+            f"\n{TerminalColors.OKCYAN}"
+            "====================================================="
+            f"\n{prompt_title}\n"
+            "====================================================="
+            f"\n{verify_message}\n"
+            f"\n{prompt_message}\n"
+            f"{TerminalColors.FAIL}"
+            f"Proceed? (Y = proceed, N = {action_description_for_selecting_no})"
+            f"{TerminalColors.ENDC}"
         )
```
Comment on lines 403 to 413
Review comment (Contributor Author): This fixes the spacing issue that has been present in this helper function, so now it is "left aligned" (in a sense).
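The spacing issue comes from the triple-quoted f-string: every line inside it keeps the source code's indentation. The new version uses adjacent string literals (f-strings included), which are concatenated so that only the characters inside the quotes end up in the result. The small illustration below uses hypothetical `banner_*` helpers. `textwrap.dedent` is shown as one alternative; the PR does not use it.

```python
import textwrap


def banner_triple_quoted(title: str) -> str:
    # Every line inside a triple-quoted string keeps the source indentation
    return f"""
            =====
            {title}
            ====="""


def banner_concatenated(title: str) -> str:
    # Adjacent literals are concatenated; only the characters inside the quotes remain
    return (
        "====="
        f"\n{title}\n"
        "====="
    )


def banner_dedented(title: str) -> str:
    # Alternative (not used in the PR): keep the triple-quoted layout and strip the common indent
    return textwrap.dedent(banner_triple_quoted(title)).strip()


if __name__ == "__main__":
    print(repr(banner_triple_quoted("Remove 3 suborganizations?")))
    print(repr(banner_concatenated("Remove 3 suborganizations?")))
```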


```diff
         # If the user decided to proceed return true.
@@ -443,13 +442,14 @@ def print_to_file_conditional(print_condition: bool, filename: str, file_directo
             f.write(file_contents)
 
     @staticmethod
-    def colorful_logger(log_level, color, message):
+    def colorful_logger(log_level, color, message, exc_info=True):
         """Adds some color to your log output.
 
         Args:
             log_level: str | Logger.method -> Desired log level. ex: logger.info or "INFO"
             color: str | TerminalColors -> Output color. ex: TerminalColors.YELLOW or "YELLOW"
             message: str -> Message to display.
+            exc_info: bool -> Whether the log should print exc_info or not
         """
 
         if isinstance(log_level, str) and hasattr(logger, log_level.lower()):
@@ -463,4 +463,4 @@ def colorful_logger(log_level, color, message):
             terminal_color = color
 
         colored_message = f"{terminal_color}{message}{TerminalColors.ENDC}"
-        log_method(colored_message)
+        log_method(colored_message, exc_info=exc_info)
```
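Because `exc_info` now defaults to `True`, every `colorful_logger` call passes `exc_info=True` to the logger, including calls made outside an `except` block. With Python's standard `logging` module, that appends a `NoneType: None` line when no exception is being handled. The minimal reproduction below uses only the standard library. The logger name and message are arbitrary, and `capture` is a hypothetical helper.

```python
import io
import logging


def capture(exc_info: bool) -> str:
    """Log one INFO message outside any except block and return what the handler wrote."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    log = logging.getLogger("exc_info_demo")
    log.setLevel(logging.INFO)
    log.propagate = False  # keep the demo output away from the root logger
    log.addHandler(handler)
    try:
        log.info("Added portfolio to 3 domains.", exc_info=exc_info)
    finally:
        log.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    print(capture(exc_info=False), end="")  # INFO Added portfolio to 3 domains.
    print(capture(exc_info=True), end="")   # same line, followed by "NoneType: None"
```

If that extra line is unwanted, call sites outside exception handlers could pass `exc_info=False`. Another option is a `False` default, with error paths opting in.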
14 changes: 14 additions & 0 deletions src/registrar/models/utility/generic_helper.py
```diff
@@ -353,3 +353,17 @@ def normalize_string(string_to_normalize, lowercase=True):
 
     new_string = " ".join(string_to_normalize.split())
     return new_string.lower() if lowercase else new_string
+
+
+def count_capitals(text: str, leading_only: bool):
+    """Counts capital letters in a string.
+    Args:
+        text (str): The string to analyze.
+        leading_only (bool): If False, counts all capital letters.
+            If True, only counts capitals at the start of words.
+    Returns:
+        int: Number of capital letters found.
+    """
+    if leading_only:
+        return sum(word[0].isupper() for word in text.split() if word)
+    return sum(c.isupper() for c in text if c)
```
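The check below shows what `leading_only` changes, using a copy of the helper. The `if word` and `if c` filters are dropped here because `str.split()` never yields empty strings and iterating a string never yields `""`, so the results are the same.

```python
def count_capitals(text: str, leading_only: bool) -> int:
    """Copy of the PR helper without the redundant empty-string filters."""
    if leading_only:
        # One point per word whose first character is uppercase
        return sum(word[0].isupper() for word in text.split())
    # One point per uppercase character anywhere in the string
    return sum(c.isupper() for c in text)


if __name__ == "__main__":
    for name in ["U.S. Geological Survey", "aMtRaK", "AMTRAK", "Amtrak"]:
        print(name, count_capitals(name, leading_only=True), count_capitals(name, leading_only=False))
    # U.S. Geological Survey 3 4
    # aMtRaK 0 3
    # AMTRAK 1 6
    # Amtrak 1 1
```

With `leading_only=True`, "AMTRAK" and "Amtrak" tie at 1, so the tie-break in `get_records_to_prune` does not automatically favor an all-caps variant. `max()` then keeps whichever tied record comes first in the queryset.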
2 changes: 2 additions & 0 deletions src/registrar/tests/common.py
```diff
@@ -40,6 +40,7 @@
     ErrorCode,
     responses,
 )
+from registrar.models.suborganization import Suborganization
 from registrar.models.utility.portfolio_helper import UserPortfolioPermissionChoices, UserPortfolioRoleChoices
 from registrar.models.user_domain_role import UserDomainRole
 
@@ -911,6 +912,7 @@ def sharedTearDown(cls):
         DomainInformation.objects.all().delete()
         DomainRequest.objects.all().delete()
         UserDomainRole.objects.all().delete()
+        Suborganization.objects.all().delete()
         Portfolio.objects.all().delete()
         UserPortfolioPermission.objects.all().delete()
         User.objects.all().delete()
```