Merge pull request #1993 from cisagov/za/1911-fill-new-org-type-column
Ticket #1911: Fill new org type column
zandercymatics authored Apr 17, 2024
2 parents 7caafac + 4467571 commit 45f758c
Showing 8 changed files with 628 additions and 25 deletions.
56 changes: 56 additions & 0 deletions docs/operations/data_migration.md
@@ -586,3 +586,59 @@ Example: `cf ssh getgov-za`
| | Parameter | Description |
|:-:|:-------------------------- |:----------------------------------------------------------------------------|
| 1 | **debug** | Increases logging detail. Defaults to False. |


## Populate Organization type
This section outlines how to run the `populate_organization_type` script.
The script populates the `organization_type` field on DomainRequest and DomainInformation records where it is None.
The value is derived from the `generic_org_type` and `is_election_board` fields: when `is_election_board` is True, " - Elections" is appended to the `generic_org_type` string.
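
As a rough illustration of that derivation, here is a minimal sketch. The mapping values and the function name `derive_organization_type` are hypothetical; the real choices are defined on `DomainRequest.OrgChoicesElectionOffice` and may differ. Falling back to the generic value when no election variant exists is also an assumption of this sketch.

```python
from typing import Optional

# Hypothetical mapping for illustration only; the real values come from
# DomainRequest.OrgChoicesElectionOffice and may differ.
GENERIC_TO_ELECTION = {
    "city": "city_election",
    "county": "county_election",
    "state_or_territory": "state_or_territory_election",
}


def derive_organization_type(
    generic_org_type: Optional[str], is_election_board: Optional[bool]
) -> Optional[str]:
    """Return the organization_type implied by the two source fields."""
    if generic_org_type is None:
        # Nothing to infer from; the script skips these records.
        return None
    if is_election_board:
        # Assumption: keep the generic value when no election variant exists.
        return GENERIC_TO_ELECTION.get(generic_org_type, generic_org_type)
    return generic_org_type
```

With this sketch, a record with `generic_org_type="city"` and `is_election_board=True` maps to the election variant, while a record with no `generic_org_type` yields None and would be skipped.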

### Running on sandboxes

#### Step 1: Login to CloudFoundry
```cf login -a api.fr.cloud.gov --sso```

#### Step 2: Get the domain_election_board file
The latest domain_election_board csv can be found [here](https://drive.google.com/file/d/1aDeCqwHmBnXBl2arvoFCN0INoZmsEGsQ/view).
After downloading this file, place it in `src/migrationdata`.

#### Step 3: Upload the domain_election_board file to your sandbox
Follow [Step 1: Transfer data to sandboxes](#step-1-transfer-data-to-sandboxes) and [Step 2: Transfer uploaded files to the getgov directory](#step-2-transfer-uploaded-files-to-the-getgov-directory) from the [Set Up Migrations on Sandbox](#set-up-migrations-on-sandbox) portion of this doc.

#### Step 4: SSH into your environment
```cf ssh getgov-{space}```

Example: `cf ssh getgov-za`

#### Step 5: Create a shell instance
```/tmp/lifecycle/shell```

#### Step 6: Running the script
```./manage.py populate_organization_type {domain_election_board_filename}```

- The file passed as `domain_election_board_filename` must contain one domain per line, in this format:
- example.gov\
example2.gov\
example3.gov

Example:
`./manage.py populate_organization_type migrationdata/election-domains.csv`
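
To check a file against this format before uploading it, something like the following sketch can be used. The helper name `load_election_domains` is hypothetical and is not part of the script; skipping blank lines is an assumption of this sketch.

```python
import tempfile
from pathlib import Path
from typing import Set


def load_election_domains(path: str) -> Set[str]:
    """Read one domain per line, ignoring blank lines and surrounding whitespace."""
    domains = set()
    for line in Path(path).read_text().splitlines():
        domain = line.strip()
        if domain:
            domains.add(domain)
    return domains


# Demonstration with a throwaway file in the expected one-domain-per-line format.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "election-domains.csv"
    sample.write_text("example.gov\nexample2.gov\n\n  example3.gov  \n")
    loaded = load_election_domains(str(sample))
```

Loading the domains into a set keeps the later membership check (`domain_name in ...`) cheap and makes duplicate lines harmless.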

### Running locally

#### Step 1: Get the domain_election_board file
The latest domain_election_board csv can be found [here](https://drive.google.com/file/d/1aDeCqwHmBnXBl2arvoFCN0INoZmsEGsQ/view).
After downloading this file, place it in `src/migrationdata`


#### Step 2: Running the script
```docker-compose exec app ./manage.py populate_organization_type {domain_election_board_filename}```

Example (assuming that this is being run from src/):
`docker-compose exec app ./manage.py populate_organization_type migrationdata/election-domains.csv`


### Required parameters
| | Parameter | Description |
|:-:|:------------------------------------|:-------------------------------------------------------------------|
| 1 | **domain_election_board_filename**  | A file containing every domain that is an election office.         |
237 changes: 237 additions & 0 deletions src/registrar/management/commands/populate_organization_type.py
@@ -0,0 +1,237 @@
import argparse
import logging
import os
from typing import List
from django.core.management import BaseCommand
from registrar.management.commands.utility.terminal_helper import TerminalColors, TerminalHelper, ScriptDataHelper
from registrar.models import DomainInformation, DomainRequest
from registrar.models.utility.generic_helper import CreateOrUpdateOrganizationTypeHelper

logger = logging.getLogger(__name__)


class Command(BaseCommand):
    help = (
        "Loops through each valid DomainInformation and DomainRequest object and updates its organization_type value. "
        "A valid DomainInformation/DomainRequest in this sense is one that has the value None for organization_type. "
        "In other words, we populate the organization_type field if it is not already populated."
    )

    def __init__(self):
        super().__init__()
        # Tracking lists for DomainRequest
        self.request_to_update: List[DomainRequest] = []
        self.request_failed_to_update: List[DomainRequest] = []
        self.request_skipped: List[DomainRequest] = []

        # Tracking lists for DomainInformation
        self.di_to_update: List[DomainInformation] = []
        self.di_failed_to_update: List[DomainInformation] = []
        self.di_skipped: List[DomainInformation] = []

        # Set of all domains with election offices, filled in by read_election_board_file
        self.domains_with_election_boards_set = set()

    def add_arguments(self, parser):
        """Adds command line arguments"""
        parser.add_argument(
            "domain_election_board_filename",
            help="A file that contains all the domains that are election offices.",
        )

    def handle(self, domain_election_board_filename, **kwargs):
        """Populates organization_type on each DomainRequest and DomainInformation object where it is None"""

        # Check if the provided file path is valid
        if not os.path.isfile(domain_election_board_filename):
            raise argparse.ArgumentTypeError(f"Invalid file path '{domain_election_board_filename}'")

        # Read the election office csv
        self.read_election_board_file(domain_election_board_filename)

        domain_requests = DomainRequest.objects.filter(organization_type__isnull=True)

        # Code execution will stop here if the user answers "N"
        TerminalHelper.prompt_for_execution(
            system_exit_on_terminate=True,
            info_to_inspect=f"""
            ==Proposed Changes==
            Number of DomainRequest objects to change: {len(domain_requests)}
            Organization_type data will be added for all of these fields.
            """,
            prompt_title="Do you wish to process DomainRequest?",
        )
        logger.info("Updating DomainRequest(s)...")

        self.update_domain_requests(domain_requests)

        # Ideally we would target only records that have no organization_type but do have
        # a generic_org_type, since those are the ones we can infer data for.
        # Records without a generic_org_type are skipped during the update.
        domain_infos = DomainInformation.objects.filter(organization_type__isnull=True)
        # Code execution will stop here if the user answers "N"
        TerminalHelper.prompt_for_execution(
            system_exit_on_terminate=True,
            info_to_inspect=f"""
            ==Proposed Changes==
            Number of DomainInformation objects to change: {len(domain_infos)}
            Organization_type data will be added for all of these fields.
            """,
            prompt_title="Do you wish to process DomainInformation?",
        )
        logger.info("Updating DomainInformation(s)...")

        self.update_domain_informations(domain_infos)

    def read_election_board_file(self, domain_election_board_filename):
        """
        Reads the election board file and adds each parsed domain to self.domains_with_election_boards_set.
        This file lists the domains which have election boards.
        The file must adhere to this format:
        ```
        domain1.gov
        domain2.gov
        domain3.gov
        ```
        (and so on)
        """
        with open(domain_election_board_filename, "r") as file:
            for line in file:
                # Remove any leading/trailing whitespace
                domain = line.strip()
                # Skip blank lines; set.add already ignores duplicates
                if domain:
                    self.domains_with_election_boards_set.add(domain)

    def update_domain_requests(self, domain_requests):
        """
        Updates the organization_type for a list of DomainRequest objects using the `sync_organization_type` function.
        Results are then logged.
        This function updates the following variables:
        - self.request_to_update list is appended to if the field was updated successfully.
        - self.request_skipped list is appended to if the field has `None` for `request.generic_org_type`.
        - self.request_failed_to_update list is appended to if an exception is caught during update.
        """
        for request in domain_requests:
            try:
                if request.generic_org_type is not None:
                    domain_name = None
                    if request.requested_domain is not None and request.requested_domain.name is not None:
                        domain_name = request.requested_domain.name

                    # Only approved requests with a known domain can be matched against the election board file
                    request_is_approved = request.status == DomainRequest.DomainRequestStatus.APPROVED
                    if request_is_approved and domain_name is not None and not request.is_election_board:
                        request.is_election_board = domain_name in self.domains_with_election_boards_set

                    self.sync_organization_type(DomainRequest, request)
                    self.request_to_update.append(request)
                    logger.info(f"Updating {request} => {request.organization_type}")
                else:
                    self.request_skipped.append(request)
                    logger.warning(f"Skipped updating {request}. No generic_org_type was found.")
            except Exception as err:
                self.request_failed_to_update.append(request)
                logger.error(err)
                logger.error(f"{TerminalColors.FAIL}Failed to update {request}{TerminalColors.ENDC}")

        # Bulk update every field that may have changed
        ScriptDataHelper.bulk_update_fields(
            DomainRequest, self.request_to_update, ["organization_type", "is_election_board", "generic_org_type"]
        )

        # Log what happened
        log_header = "============= FINISHED UPDATE FOR DOMAINREQUEST ==============="
        TerminalHelper.log_script_run_summary(
            self.request_to_update, self.request_failed_to_update, self.request_skipped, True, log_header
        )

        # Count the skipped entries (not the updated ones) before explaining why entries are skipped
        update_skipped_count = len(self.request_skipped)
        if update_skipped_count > 0:
            logger.warning(
                f"""{TerminalColors.MAGENTA}
                Note: Entries are skipped when generic_org_type is None
                {TerminalColors.ENDC}
                """
            )

    def update_domain_informations(self, domain_informations):
        """
        Updates the organization_type for a list of DomainInformation objects
        and updates is_election_board if the domain is in the provided csv.
        Results are then logged.
        This function updates the following variables:
        - self.di_to_update list is appended to if the field was updated successfully.
        - self.di_skipped list is appended to if the field has `None` for `info.generic_org_type`.
        - self.di_failed_to_update list is appended to if an exception is caught during update.
        """
        for info in domain_informations:
            try:
                if info.generic_org_type is not None:
                    domain_name = info.domain.name

                    if not info.is_election_board:
                        info.is_election_board = domain_name in self.domains_with_election_boards_set

                    self.sync_organization_type(DomainInformation, info)

                    self.di_to_update.append(info)
                    logger.info(f"Updating {info} => {info.organization_type}")
                else:
                    self.di_skipped.append(info)
                    logger.warning(f"Skipped updating {info}. No generic_org_type was found.")
            except Exception as err:
                self.di_failed_to_update.append(info)
                logger.error(err)
                logger.error(f"{TerminalColors.FAIL}Failed to update {info}{TerminalColors.ENDC}")

        # Bulk update every field that may have changed
        ScriptDataHelper.bulk_update_fields(
            DomainInformation, self.di_to_update, ["organization_type", "is_election_board", "generic_org_type"]
        )

        # Log what happened
        log_header = "============= FINISHED UPDATE FOR DOMAININFORMATION ==============="
        TerminalHelper.log_script_run_summary(
            self.di_to_update, self.di_failed_to_update, self.di_skipped, True, log_header
        )

        update_skipped_count = len(self.di_skipped)
        if update_skipped_count > 0:
            logger.warning(
                f"""{TerminalColors.MAGENTA}
                Note: Entries are skipped when generic_org_type is None
                {TerminalColors.ENDC}
                """
            )

    def sync_organization_type(self, sender, instance):
        """
        Updates the organization_type (without saving) to match
        the is_election_board and generic_org_type fields.
        """

        # Define mappings between generic org and election org.
        # These have to be defined here, as you'd get a cyclical import error
        # otherwise.

        # For any given organization type, return the "_ELECTION" enum equivalent.
        # For example: STATE_OR_TERRITORY => STATE_OR_TERRITORY_ELECTION
        generic_org_map = DomainRequest.OrgChoicesElectionOffice.get_org_generic_to_org_election()

        # For any given "_election" variant, return the base org type.
        # For example: STATE_OR_TERRITORY_ELECTION => STATE_OR_TERRITORY
        election_org_map = DomainRequest.OrgChoicesElectionOffice.get_org_election_to_org_generic()

        # Manages the "organization_type" variable and keeps in sync with
        # "is_election_board" and "generic_org_type"
        org_type_helper = CreateOrUpdateOrganizationTypeHelper(
            sender=sender,
            instance=instance,
            generic_org_to_org_map=generic_org_map,
            election_org_to_generic_org_map=election_org_map,
        )

        org_type_helper.create_or_update_organization_type(force_update=True)
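
The update loop in this command sorts records into "to update" and "skipped" buckets and only fills in `is_election_board` when it is not already set. A minimal sketch of that control flow, outside Django, might look like the following. `Record` and `partition` are hypothetical names used for illustration, and the failure bucket is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Record:
    """Hypothetical stand-in for a DomainInformation row."""

    domain_name: str
    generic_org_type: Optional[str]
    is_election_board: Optional[bool] = None


def partition(records: List[Record], election_domains: Set[str]) -> Tuple[List[Record], List[Record]]:
    """Split records into (to_update, skipped), setting the election flag on the way."""
    to_update: List[Record] = []
    skipped: List[Record] = []
    for record in records:
        if record.generic_org_type is None:
            # Nothing to infer from, so leave the record alone.
            skipped.append(record)
            continue
        # Only fill in the flag when it is unset or False; a True value is never cleared.
        if not record.is_election_board:
            record.is_election_board = record.domain_name in election_domains
        to_update.append(record)
    return to_update, skipped


records = [
    Record("example.gov", "city"),
    Record("example2.gov", None),
    Record("example3.gov", "county", is_election_board=True),
]
to_update, skipped = partition(records, {"example.gov"})
```

Here `example.gov` gains the election flag from the file, `example3.gov` keeps the flag it already had, and `example2.gov` is skipped because it has no `generic_org_type`.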
12 changes: 8 additions & 4 deletions src/registrar/management/commands/utility/terminal_helper.py
@@ -49,6 +49,7 @@ def bulk_update_fields(model_class, update_list, fields_to_update, batch_size=10
         Usage:
             bulk_update_fields(Domain, page.object_list, ["first_ready"])
         """
+        logger.info(f"{TerminalColors.YELLOW} Bulk updating fields... {TerminalColors.ENDC}")
         # Create a Paginator object. Bulk_update on the full dataset
         # is too memory intensive for our current app config, so we can chunk this data instead.
         paginator = Paginator(update_list, batch_size)
@@ -59,13 +60,16 @@ def bulk_update_fields(model_class, update_list, fields_to_update, batch_size=10
 
 class TerminalHelper:
     @staticmethod
-    def log_script_run_summary(to_update, failed_to_update, skipped, debug: bool):
+    def log_script_run_summary(to_update, failed_to_update, skipped, debug: bool, log_header=None):
         """Prints success, failed, and skipped counts, as well as
         all affected objects."""
         update_success_count = len(to_update)
         update_failed_count = len(failed_to_update)
         update_skipped_count = len(skipped)
 
+        if log_header is None:
+            log_header = "============= FINISHED ==============="
+
         # Prepare debug messages
         debug_messages = {
             "success": (f"{TerminalColors.OKCYAN}Updated: {to_update}{TerminalColors.ENDC}\n"),
@@ -85,15 +89,15 @@ def log_script_run_summary(to_update, failed_to_update, skipped, debug: bool):
         if update_failed_count == 0 and update_skipped_count == 0:
             logger.info(
                 f"""{TerminalColors.OKGREEN}
-                ============= FINISHED ===============
+                {log_header}
                 Updated {update_success_count} entries
                 {TerminalColors.ENDC}
                 """
             )
         elif update_failed_count == 0:
             logger.warning(
                 f"""{TerminalColors.YELLOW}
-                ============= FINISHED ===============
+                {log_header}
                 Updated {update_success_count} entries
                 ----- SOME DATA WAS INVALID (NEEDS MANUAL PATCHING) -----
                 Skipped updating {update_skipped_count} entries
@@ -103,7 +107,7 @@ def log_script_run_summary(to_update, failed_to_update, skipped, debug: bool):
         else:
             logger.error(
                 f"""{TerminalColors.FAIL}
-                ============= FINISHED ===============
+                {log_header}
                 Updated {update_success_count} entries
                 ----- UPDATE FAILED -----
                 Failed to update {update_failed_count} entries,
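
The change above keeps the old banner as a default while letting callers pass their own header. A stripped-down sketch of how the severity and header are chosen could look like this. `summarize` and `DEFAULT_HEADER` are hypothetical names; the real helper logs directly and also prints debug details.

```python
import logging
from typing import Optional, Tuple

DEFAULT_HEADER = "============= FINISHED ==============="


def summarize(updated: int, failed: int, skipped: int, log_header: Optional[str] = None) -> Tuple[int, str]:
    """Return the log level and message for a script run summary."""
    # Fall back to the generic banner when the caller gives no header.
    header = DEFAULT_HEADER if log_header is None else log_header
    if failed == 0 and skipped == 0:
        return logging.INFO, f"{header}\nUpdated {updated} entries"
    if failed == 0:
        return logging.WARNING, f"{header}\nUpdated {updated} entries\nSkipped updating {skipped} entries"
    return logging.ERROR, f"{header}\nUpdated {updated} entries\nFailed to update {failed} entries"


level, message = summarize(10, 0, 2, "== FINISHED UPDATE FOR DOMAINREQUEST ==")
```

Defaulting `log_header` to None keeps existing callers of the helper working unchanged.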
15 changes: 12 additions & 3 deletions src/registrar/models/domain_information.py
@@ -236,14 +236,17 @@ def __str__(self):
         except Exception:
             return ""
 
-    def save(self, *args, **kwargs):
-        """Save override for custom properties"""
+    def sync_organization_type(self):
+        """
+        Updates the organization_type (without saving) to match
+        the is_election_board and generic_organization_type fields.
+        """
 
         # Define mappings between generic org and election org.
         # These have to be defined here, as you'd get a cyclical import error
         # otherwise.
 
-        # For any given organization type, return the "_election" variant.
+        # For any given organization type, return the "_ELECTION" enum equivalent.
         # For example: STATE_OR_TERRITORY => STATE_OR_TERRITORY_ELECTION
         generic_org_map = DomainRequest.OrgChoicesElectionOffice.get_org_generic_to_org_election()
 
@@ -262,6 +265,12 @@ def save(self, *args, **kwargs):
 
         # Actually updates the organization_type field
         org_type_helper.create_or_update_organization_type()
+
+        return self
+
+    def save(self, *args, **kwargs):
+        """Save override for custom properties"""
+        self.sync_organization_type()
         super().save(*args, **kwargs)
 
     @classmethod
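
Splitting the sync logic out of `save()` lets a script recompute the field in memory and persist many objects in one batch later. The pattern can be sketched without Django as follows. `FakeModel` and `Info` are hypothetical stand-ins, and the suffix rule is a placeholder rather than the project's real mapping.

```python
class FakeModel:
    """Hypothetical stand-in for a Django model base class; it only counts saves."""

    def __init__(self):
        self.save_calls = 0

    def save(self, *args, **kwargs):
        self.save_calls += 1


class Info(FakeModel):
    def __init__(self, generic_org_type, is_election_board=False):
        super().__init__()
        self.generic_org_type = generic_org_type
        self.is_election_board = is_election_board
        self.organization_type = None

    def sync_organization_type(self):
        """Recompute organization_type in memory without saving."""
        # Placeholder rule for illustration only.
        suffix = "_election" if self.is_election_board else ""
        self.organization_type = f"{self.generic_org_type}{suffix}"
        # Returning self allows chaining, as in the diff above.
        return self

    def save(self, *args, **kwargs):
        # A regular save still keeps the field in sync.
        self.sync_organization_type()
        super().save(*args, **kwargs)


# A script can sync many objects in memory and persist them in one batch later.
pending = [Info("city", True).sync_organization_type(), Info("county").sync_organization_type()]
```

None of the objects in `pending` has been saved yet, which is what makes a later bulk update possible.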