
Merge pull request #1301 from cisagov/za/additional-data-transferred-domains

Ticket #1139: Import additional data for transferred domains
zandercymatics authored Nov 13, 2023
2 parents 88ab9a4 + 34d1b8f commit 7baf205
Showing 25 changed files with 2,710 additions and 379 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -171,3 +171,6 @@ node_modules

# Compliance/trestle related
docs/compliance/.trestle/cache

src/migrationdata/*

142 changes: 114 additions & 28 deletions docs/operations/data_migration.md
@@ -79,15 +79,54 @@ docker compose run app ./manage.py load_domain_invitations /app/escrow_domain_co
```

## Transition Domains (Part 1) - Setup Files for Import

#### STEP 1: obtain data files
We are provided with information about Transition Domains in the following files:

- FILE 1: **escrow_domain_contacts.daily.gov.GOV.txt** -> has the map of domain names to contact ID. Domains in this file will usually have 3 contacts each
- FILE 2: **escrow_contacts.daily.gov.GOV.txt** -> has the mapping of contact id to contact email address (which is what we care about for sending domain invitations)
- FILE 3: **escrow_domain_statuses.daily.gov.GOV.txt** -> has the map of domains and their statuses
- FILE 4: **escrow_domains.daily.dotgov.GOV.txt** -> has a map of domain names to expiration and creation dates
- FILE 5: **domainadditionaldatalink.adhoc.dotgov.txt** -> has the map of domains to other data like authority, organization, & domain type
- FILE 6: **domaintypes.adhoc.dotgov.txt** -> has data on federal type and organization type
- FILE 7: **organization.adhoc.dotgov.txt** -> has organization name data
- FILE 8: **authority.adhoc.dotgov.txt** -> has authority data which maps to an agency
- FILE 9: **agency.adhoc.dotgov.txt** -> has federal agency data
- FILE 10: **migrationFilepaths.json** -> a JSON file that points to all of the above filenames. Its contents are specified below.

#### STEP 2: obtain JSON file (for file locations)
Add a JSON file called "migrationFilepaths.json" with the following contents (update filenames and directory as needed):
```
{
"directory": "migrationdata",
"contacts_filename": "escrow_contacts.daily.dotgov.GOV.txt",
"domain_contacts_filename": "escrow_domain_contacts.daily.dotgov.GOV.txt",
"domain_statuses_filename": "escrow_domain_statuses.daily.dotgov.GOV.txt",
"domain_escrow_filename": "escrow_domains.daily.dotgov.GOV.txt",
"domain_additional_filename": "domainadditionaldatalink.adhoc.dotgov.txt",
"domain_adhoc_filename": "domaintypes.adhoc.dotgov.txt",
"organization_adhoc_filename": "organization.adhoc.dotgov.txt",
"authority_adhoc_filename": "authority.adhoc.dotgov.txt",
"agency_adhoc_filename": "agency.adhoc.dotgov.txt"
}
```

This JSON file can exist anywhere, but to keep things simple, add it to the same folder used in step 1: `src/migrationdata`.
The `directory` key specifies the directory that the given filenames exist in. For instance, a `contacts_filename` of `test.txt` with a `directory` of `migrationdata` would need to exist at `migrationdata/test.txt`.
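As a rough illustration (a hypothetical sketch, not the actual script's code), the JSON keys resolve to file paths like so:

```python
import json
import os

def resolve_migration_paths(json_path, directory=None):
    """Resolve each *_filename key in migrationFilepaths.json to a full path.

    `directory` (if given) mirrors the --directory override described in the
    script arguments later; otherwise the JSON's own "directory" key is used.
    """
    with open(json_path) as f:
        options = json.load(f)
    base = directory or options.get("directory", "migrationdata")
    return {
        key: os.path.join(base, name)
        for key, name in options.items()
        if key.endswith("_filename")
    }
```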

Later on, we will bundle this file along with the others into its own folder. Keep it within the `migrationdata/` directory if you are passing data to your sandbox, for simplicity.

We need to run a few scripts to parse these files into our domain tables.
We can do this both locally and in a sandbox.

#### STEP 3: Bundle all relevant data files into an archive
Move all the files specified in Step 1 into a shared folder, and create a tar.gz archive from it.

Create a folder on your desktop called `datafiles` and move all of the obtained files into it. Add these files to a tar.gz archive using any method. See [here](https://stackoverflow.com/questions/53283240/how-to-create-tar-file-with-7zip) for an example using 7zip.

After this is created, move this archive into `src/migrationdata`.
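On a system with GNU tar available, the bundling steps above can be sketched as follows (the archive name `exportdata.tar.gz` is illustrative):

```shell
# Create the staging folder and copy the data files from Step 1 into it
# (the cp line is a placeholder for wherever your files live).
mkdir -p datafiles
# cp /path/to/escrow_*.txt /path/to/*.adhoc.dotgov.txt migrationFilepaths.json datafiles/

# Bundle everything in datafiles/ into a single tar.gz archive.
tar -czvf exportdata.tar.gz -C datafiles .

# Move the archive to where the transfer scripts expect it.
mkdir -p src/migrationdata
mv exportdata.tar.gz src/migrationdata/
```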


### SECTION 1 - SANDBOX MIGRATION SETUP
Load migration data onto a production or sandbox environment

@@ -111,8 +150,6 @@ cat {LOCAL_PATH_TO_FILE} | cf ssh {APP_NAME_IN_ENVIRONMENT} -c "cat > /home/vcap

CloudFoundry supports scp as a means of transferring data locally to our environment. If you are dealing with a batch of files, try sending across a tar.gz archive and unpacking it there.



##### Login to Cloud.gov

```bash
@@ -183,7 +220,6 @@ tar -xvf migrationdata/{FILE_NAME}.tar.gz -C migrationdata/ --strip-components=1
*FILE_NAME* - Name of the desired file, ex: exportdata



#### Manual method
If the `cat_files_into_getgov.py` script isn't working, follow these steps instead.

@@ -203,22 +239,27 @@ cat ../tmp/(unknown) > migrationdata/(unknown)
In order to run the scripts locally, we need to add the files to a folder under `src/`.
This will allow Docker to mount the files to a container (under `/app`) for our use.

- Add the same files from section 1 to a TEMPORARY `tmp/` folder under `src/` (do not check this folder into our repo)
- Open a terminal and navigate to `src/`


*You are now ready to run migration scripts.*

## Transition Domains (Part 2) - Running the Migration Scripts

While keeping the same ssh instance open (if you are running on a sandbox), run through the following commands. If you cannot run `manage.py` commands, try running `/tmp/lifecycle/shell` in the ssh instance.

### STEP 1: Load Transition Domains

Run the following command, making sure the file paths point to the right location. This will parse all given files and load the information into the TransitionDomain table. Make sure you have your migrationFilepaths.json file in the same directory.

(NOTE: If working in cloud.gov, change "/app/tmp" to point to the `migrationdata/` directory and remove "docker compose run -T app" from the command.)
##### LOCAL COMMAND
```shell
docker-compose exec app ./manage.py load_transition_domain migrationFilepaths.json --directory /app/tmp/ --debug --limitParse 10
```
##### SANDBOX COMMAND
```shell
./manage.py load_transition_domain migrationFilepaths.json --debug
```

##### COMMAND LINE ARGUMENTS:
@@ -232,17 +273,55 @@ Directs the script to load only the first 100 entries into the table. You can a
`--resetTable`
This will delete all the data in transition_domain. It is helpful if you want to see the entries reload from scratch or for clearing test data.

###### (arguments that override filepaths and directories if needed)

`--directory`
Defines the directory where all data files and the JSON are stored.

`--domain_contacts_filename`
Defines the filename for domain contact information.

`--contacts_filename`
Defines the filename for contact information.

`--domain_statuses_filename`
Defines the filename for domain status information.

`--agency_adhoc_filename`
Defines the filename for agency adhocs.

`--domain_additional_filename`
Defines the filename for additional domain data.

`--domain_escrow_filename`
Defines the filename for creation/expiration domain data.

`--domain_adhoc_filename`
Defines the filename for domain type adhocs.

`--organization_adhoc_filename`
Defines the filename for organization adhocs.

`--authority_adhoc_filename`
Defines the filename for authority adhocs.

`--infer_filenames`
Determines if we should infer filenames or not. This setting is not available for use in environments with the flag `settings.DEBUG` set to false, as it is intended for local development only.

### STEP 2: Transfer Transition Domain data into main Domain tables

Now that we've loaded all the data into TransitionDomain, we need to update the main Domain and DomainInvitation tables with this information.
In the same terminal as used in STEP 1, run the command below.
(This will parse the data in TransitionDomain and either create a corresponding Domain object, or, if a corresponding Domain already exists, update that Domain with the incoming status. It will also create DomainInvitation objects for each user associated with the domain.)

(NOTE: If working in cloud.gov, use the sandbox command, which omits "docker compose run -T app".)
##### LOCAL COMMAND
```shell
docker compose run -T app ./manage.py transfer_transition_domains_to_domains --debug
```
##### SANDBOX COMMAND
```shell
./manage.py transfer_transition_domains_to_domains --debug
```
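The transfer logic described above behaves roughly like this (a hypothetical sketch with plain dicts standing in for the Django models, not the management command's actual code):

```python
def transfer_transition_domains(transition_domains, domains, invitations):
    """Create or update a Domain per TransitionDomain, and record a
    DomainInvitation per (email, domain name) pair."""
    for td in transition_domains:
        existing = domains.get(td["domain_name"])
        if existing is None:
            # No corresponding Domain yet: create one with the incoming status.
            domains[td["domain_name"]] = {"status": td["status"]}
        else:
            # Domain already exists: update it with the incoming status.
            existing["status"] = td["status"]
        # One invitation per user associated with the domain.
        key = (td["email"], td["domain_name"])
        if td["email"] and key not in invitations:
            invitations[key] = "INVITED"
    return domains, invitations
```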

##### COMMAND LINE ARGUMENTS:

@@ -256,10 +335,14 @@ Directs the script to load only the first 100 entries into the table. You can a

To send invitation emails for every transition domain in the transition domain table, execute the following command:

(NOTE: If working in cloud.gov, use the sandbox command, which omits "docker compose run -T app".)
##### LOCAL COMMAND
```shell
docker compose run -T app ./manage.py send_domain_invitations -s
```
##### SANDBOX COMMAND
```shell
./manage.py send_domain_invitations -s
```

### STEP 4: Test the results (Run the analyzer script)

@@ -269,18 +352,27 @@ This script's main function is to scan the transition domain and domain tables f

To analyze our database without running migrations, execute the script without any optional arguments:

(NOTE: If working in cloud.gov, use the sandbox command, which omits "docker compose run -T app".)
##### LOCAL COMMAND
```shell
docker compose run -T app ./manage.py master_domain_migrations --debug
```
##### SANDBOX COMMAND
```shell
./manage.py master_domain_migrations --debug
```
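Conceptually, the analyzer's table comparison can be sketched as follows (hypothetical illustration, not the script's actual code):

```python
def analyze_tables(transition_domains, domains):
    """Scan for transition domains that are missing from the Domain table,
    or whose status disagrees with the corresponding Domain entry.
    Plain {name: status} dicts stand in for the Django tables."""
    missing, mismatched = [], []
    for name, td_status in transition_domains.items():
        domain_status = domains.get(name)
        if domain_status is None:
            missing.append(name)
        elif domain_status != td_status:
            mismatched.append(name)
    return missing, mismatched
```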

#### OPTION 2 - RUN MIGRATIONS FEATURE

To run the migrations again (all above migration steps) before analyzing, execute the following command (read the documentation on the terminal arguments below. Everything used by the migration scripts can also be passed into this script and will have the same effects). NOTE: --debug provides detailed logging statements during the migration. It is recommended that you use this argument when using the --runMigrations feature:

(NOTE: If working in cloud.gov, use the sandbox command, which omits "docker compose run -T app".)
(NOTE: If you named your JSON file something other than "migrationFilepaths.json" back in the "file setup" section, you will want to use the `--migrationJSON` argument in the following commands.)
##### LOCAL COMMAND
```shell
docker compose run -T app ./manage.py master_domain_migrations --migrationDirectory /app/tmp --runMigrations --debug
```
##### SANDBOX COMMAND
```shell
./manage.py master_domain_migrations --runMigrations --debug
```

##### COMMAND LINE ARGUMENTS
@@ -291,25 +383,18 @@ Runs all scripts (in sequence) for transition domain migrations

`--migrationDirectory`

The location of both the JSON file and all files needed for migration.
(default is "migrationdata", the sandbox directory)

Example Usage:
*--migrationDirectory /app/tmp*

`--migrationJSON`

The filename of the JSON that holds all the filepath info needed for migrations.

Example Usage:

*--migrationJSON migrationFilepaths.json*

`--sep`

@@ -320,9 +405,10 @@ Delimiter for the migration scripts to correctly parse the given text files.

Activates additional print statements

`--disablePrompts`

Disables the terminal prompts that allow the user to step through each portion of this script.
*Used to facilitate unit tests. Not recommended for everyday use.*

`--limitParse`

2 changes: 2 additions & 0 deletions src/registrar/config/settings.py
@@ -72,6 +72,8 @@
BASE_DIR = path.resolve().parent.parent.parent

# SECURITY WARNING: don't run with debug turned on in production!
# TODO - Investigate the behaviour of this flag. Does not appear
# to function for the IS_PRODUCTION flag.
DEBUG = env_debug

# Controls production specific feature toggles
