Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

9/18/24 production deploy #1335

Merged
merged 42 commits into from
Sep 25, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
146f0cc
initial
Aug 15, 2024
3fde9c5
fix flake 8
Aug 15, 2024
6aed01e
more flake 8
Aug 15, 2024
c0ab7c8
more exc_info
Aug 15, 2024
3ac2db4
more flake 8
Aug 15, 2024
01fff30
merge from main
Aug 22, 2024
4301f7a
Added new ADR for BackstopJS
alexjanousekGSA Aug 27, 2024
e487b50
Apply suggestions from code review
alexjanousekGSA Aug 28, 2024
db9f2b8
merge from main
Aug 28, 2024
b9e400d
fix flake 8
Aug 30, 2024
c6f2226
add more debug to get e2e tests working
Sep 4, 2024
099e77b
debug
Sep 4, 2024
4253121
fix flake8
Sep 4, 2024
1288e72
pip audit fail
Sep 4, 2024
f60dce5
Merge pull request #1312 from GSA/notify-admin-1157
ccostino Sep 4, 2024
93a1024
Merge pull request #1297 from GSA/0009-backstopjs-adr
ccostino Sep 4, 2024
74d8bfd
more debug
Sep 4, 2024
cf246ab
more debug
Sep 4, 2024
33568d4
change TypeError test
Sep 4, 2024
21dd065
Merge pull request #1313 from GSA/notify-admin-1157
ccostino Sep 4, 2024
17b41ee
Change BackstopJS ADR (0009) to accepted and add more details
ccostino Sep 4, 2024
671cad6
Merge pull request #1314 from GSA/update-adr-0009
xlorepdarkhelm Sep 5, 2024
befe202
try another way
Sep 5, 2024
fefdd29
initial
Sep 6, 2024
b34012c
log s3 objects we want to delete
Sep 6, 2024
452f762
Merge pull request #1316 from GSA/notify-admin-1157
ccostino Sep 9, 2024
e236ed6
Merge pull request #1318 from GSA/notify-api-1303
terrazoon Sep 9, 2024
4e9e014
add test
Sep 10, 2024
1b98f45
Merge pull request #1329 from GSA/fix_test_coverage
ccostino Sep 10, 2024
51ac7bd
Add production deployment steps to documentation
ccostino Sep 9, 2024
64a8586
code review feedback and merge from main
Sep 11, 2024
7fe00e6
code review feedback and merge from main
Sep 11, 2024
a154d57
Merge pull request #1272 from GSA/notify-api-1271
terrazoon Sep 11, 2024
5af1102
Merge pull request #1327 from GSA/add-prod-deploy-instructions
ccostino Sep 12, 2024
97cecd8
add initial phone validation ability
Sep 12, 2024
7dd396b
whoops
Sep 12, 2024
3737865
add try/except
Sep 12, 2024
eb736ca
Merge pull request #1331 from GSA/notify-api-1306
terrazoon Sep 13, 2024
641b01d
some cleanup
Sep 13, 2024
5db03fe
Merge pull request #1332 from GSA/notify-api-1306
ccostino Sep 18, 2024
71417a1
fix load test
Sep 25, 2024
9cd1b8d
Merge pull request #1338 from GSA/argh_load_test
terrazoon Sep 25, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .ds.baseline
Original file line number Diff line number Diff line change
Expand Up @@ -209,7 +209,7 @@
"filename": "tests/app/aws/test_s3.py",
"hashed_secret": "67a74306b06d0c01624fe0d0249a570f4d093747",
"is_verified": false,
"line_number": 25,
"line_number": 27,
"is_secret": false
}
],
Expand Down Expand Up @@ -384,5 +384,5 @@
}
]
},
"generated_at": "2024-08-22T18:00:24Z"
"generated_at": "2024-09-10T18:12:39Z"
}
17 changes: 9 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -502,15 +502,16 @@ instructions above for more details.
- [For the reviewer](.docs/all.md#for-the-reviewer)
- [For the author](.docs/all.md#for-the-author)
- [Run Book](./docs/all.md#run-book)
- [ Alerts, Notifications, Monitoring](./docs/all.md#-alerts-notifications-monitoring)
- [ Restaging Apps](./docs/all.md#-restaging-apps)
- [ Smoke-testing the App](./docs/all.md#-smoke-testing-the-app)
- [ Configuration Management](./docs/all.md#-configuration-management)
- [ DNS Changes](./docs/all.md#-dns-changes)
- [Alerts, Notifications, Monitoring](./docs/all.md#-alerts-notifications-monitoring)
- [Restaging Apps](./docs/all.md#-restaging-apps)
- [Deploying to Production](./docs/all.md#-deploying-to-production)
- [Smoke-testing the App](./docs/all.md#-smoke-testing-the-app)
- [Configuration Management](./docs/all.md#-configuration-management)
- [DNS Changes](./docs/all.md#-dns-changes)
- [Exporting test results for compliance monitoring](./docs/all.md#exporting-test-results-for-compliance-monitoring)
- [ Known Gotchas](./docs/all.md#-known-gotchas)
- [ User Account Management](./docs/all.md#-user-account-management)
- [ SMS Phone Number Management](./docs/all.md#-sms-phone-number-management)
- [Known Gotchas](./docs/all.md#-known-gotchas)
- [User Account Management](./docs/all.md#-user-account-management)
- [SMS Phone Number Management](./docs/all.md#-sms-phone-number-management)
- [Data Storage Policies \& Procedures](./docs/all.md#data-storage-policies--procedures)
- [Potential PII Locations](./docs/all.md#potential-pii-locations)
- [Data Retention Policy](./docs/all.md#data-retention-policy)
Expand Down
11 changes: 8 additions & 3 deletions app/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
from app.clients.document_download import DocumentDownloadClient
from app.clients.email.aws_ses import AwsSesClient
from app.clients.email.aws_ses_stub import AwsSesStubClient
from app.clients.pinpoint.aws_pinpoint import AwsPinpointClient
from app.clients.sms.aws_sns import AwsSnsClient
from notifications_utils import logging, request_helper
from notifications_utils.clients.encryption.encryption_client import Encryption
Expand Down Expand Up @@ -68,6 +69,7 @@ def apply_driver_hacks(self, app, info, options):
aws_ses_stub_client = AwsSesStubClient()
aws_sns_client = AwsSnsClient()
aws_cloudwatch_client = AwsCloudwatchClient()
aws_pinpoint_client = AwsPinpointClient()
encryption = Encryption()
zendesk_client = ZendeskClient()
redis_store = RedisClient()
Expand Down Expand Up @@ -101,6 +103,7 @@ def create_app(application):
aws_ses_client.init_app()
aws_ses_stub_client.init_app(stub_url=application.config["SES_STUB_URL"])
aws_cloudwatch_client.init_app(application)
aws_pinpoint_client.init_app(application)
# If a stub url is provided for SES, then use the stub client rather than the real SES boto client
email_clients = (
[aws_ses_stub_client]
Expand Down Expand Up @@ -265,7 +268,7 @@ def after_request(response):

@app.errorhandler(Exception)
def exception(error):
app.logger.exception(error)
app.logger.exception(f"Handling error: {error}")
# error.code is set for our exception types.
msg = getattr(error, "message", str(error))
code = getattr(error, "code", 500)
Expand Down Expand Up @@ -353,7 +356,9 @@ def checkout(dbapi_connection, connection_record, connection_proxy): # noqa
"url_rule": "unknown",
}
except Exception:
current_app.logger.exception("Exception caught for checkout event.")
current_app.logger.exception(
"Exception caught for checkout event.",
)

@event.listens_for(db.engine, "checkin")
def checkin(dbapi_connection, connection_record): # noqa
Expand Down Expand Up @@ -403,7 +408,7 @@ def on_failure(self, exc, task_id, args, kwargs, einfo): # noqa
"Celery task {task_name} (queue: {queue_name}) failed".format(
task_name=self.name,
queue_name=self.queue_name,
)
),
)

def __call__(self, *args, **kwargs):
Expand Down
48 changes: 42 additions & 6 deletions app/authentication/auth.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
import os
import uuid

from flask import current_app, g, request
Expand Down Expand Up @@ -62,17 +63,25 @@ def requires_admin_auth():


def requires_internal_auth(expected_client_id):
if expected_client_id not in current_app.config.get("INTERNAL_CLIENT_API_KEYS"):
raise TypeError("Unknown client_id for internal auth")

# Looks like we are hitting this for some reason
# expected_client_id looks like ADMIN_CLIENT_USERNAME on the admin side, and
# INTERNAL_CLIENT_API_KEYS is a dict
keys = current_app.config.get("INTERNAL_CLIENT_API_KEYS")
if keys.get(expected_client_id) is None:
err_msg = "Unknown client_id for internal auth"
current_app.logger.error(err_msg)
raise TypeError(err_msg)

request_helper.check_proxy_header_before_request()
auth_token = _get_auth_token(request)
client_id = _get_token_issuer(auth_token)

if client_id != expected_client_id:
current_app.logger.info("client_id: %s", client_id)
current_app.logger.info("expected_client_id: %s", expected_client_id)
raise AuthError("Unauthorized: not allowed to perform this action", 401)
err_msg = "Unauthorized: not allowed to perform this action"
current_app.logger.error(err_msg)
raise AuthError(err_msg, 401)

api_keys = [
InternalApiKey(client_id, secret)
Expand Down Expand Up @@ -125,28 +134,50 @@ def requires_auth():


def _decode_jwt_token(auth_token, api_keys, service_id=None):
# Temporary expedient to get e2e tests working. If we are in
# the development or staging environments, just return the first
# api key.
if os.getenv("NOTIFY_ENVIRONMENT") in ["development", "staging"]:
for api_key in api_keys:
return api_key

for api_key in api_keys:
try:
decode_jwt_token(auth_token, api_key.secret)
except TypeError:
err_msg = "Invalid token: type error"
current_app.logger.exception(err_msg)
raise AuthError(
"Invalid token: type error",
403,
service_id=service_id,
api_key_id=api_key.id,
)
except TokenExpiredError:
if not current_app.config.get("ALLOW_EXPIRED_API_TOKEN", False):
err_msg = (
"Error: Your system clock must be accurate to within 30 seconds"
)
current_app.logger.exception(err_msg)
raise AuthError(
err_msg, 403, service_id=service_id, api_key_id=api_key.id
)
except TokenAlgorithmError:
err_msg = "Invalid token: algorithm used is not HS256"
current_app.logger.exception(err_msg)
raise AuthError(err_msg, 403, service_id=service_id, api_key_id=api_key.id)
except TokenDecodeError:
# we attempted to validate the token but it failed meaning it was not signed using this api key.
# Let's try the next one
# TODO: Change this so it doesn't also catch `TokenIssuerError` or `TokenIssuedAtError` exceptions (which
# are children of `TokenDecodeError`) as these should cause an auth error immediately rather than
# continue on to check the next API key
current_app.logger.exception(
"TokenDecodeError. Couldn't decode auth token for given api key"
)
continue
except TokenError:
current_app.logger.exception("TokenError")
# General error when trying to decode and validate the token
raise AuthError(
GENERAL_TOKEN_ERROR_MESSAGE,
Expand All @@ -156,8 +187,10 @@ def _decode_jwt_token(auth_token, api_keys, service_id=None):
)

if api_key.expiry_date:
err_msg = "Invalid token: API key revoked"
current_app.logger.error(err_msg, exc_info=True)
raise AuthError(
"Invalid token: API key revoked",
err_msg,
403,
service_id=service_id,
api_key_id=api_key.id,
Expand All @@ -166,7 +199,10 @@ def _decode_jwt_token(auth_token, api_keys, service_id=None):
return api_key
else:
# service has API keys, but none matching the one the user provided
raise AuthError("Invalid token: API key not found", 403, service_id=service_id)
# if we get here, we probably hit TokenDecodeErrors earlier
err_msg = "Invalid token: API key not found"
current_app.logger.error(err_msg, exc_info=True)
raise AuthError(err_msg, 403, service_id=service_id)


def _get_auth_token(req):
Expand Down
69 changes: 49 additions & 20 deletions app/aws/s3.py
Original file line number Diff line number Diff line change
Expand Up @@ -76,9 +76,42 @@ def list_s3_objects():
)
else:
break
except Exception as e:
current_app.logger.error(
f"An error occurred while regenerating cache #notify-admin-1200 {e}"
except Exception:
current_app.logger.exception(
"An error occurred while regenerating cache #notify-admin-1200",
)


def get_bucket_name():
return current_app.config["CSV_UPLOAD_BUCKET"]["bucket"]


def cleanup_old_s3_objects():

bucket_name = get_bucket_name()

s3_client = get_s3_client()
# Our reports only support 7 days, but can be scheduled 3 days in advance
# Use 14 day for the v1.0 version of this behavior
time_limit = aware_utcnow() - datetime.timedelta(days=14)
try:
response = s3_client.list_objects_v2(Bucket=bucket_name)
while True:
for obj in response.get("Contents", []):
if obj["LastModified"] <= time_limit:
current_app.logger.info(
f"#delete-old-s3-objects Wanting to delete: {obj['LastModified']} {obj['Key']}"
)
if "NextContinuationToken" in response:
response = s3_client.list_objects_v2(
Bucket=bucket_name,
ContinuationToken=response["NextContinuationToken"],
)
else:
break
except Exception:
current_app.logger.exception(
"#delete-old-s3-objects An error occurred while cleaning up old s3 objects",
)


Expand Down Expand Up @@ -106,9 +139,9 @@ def get_s3_files():
)
if "phone number" in object.lower():
JOBS[job_id] = object
except LookupError as le:
except LookupError:
# perhaps our key is not formatted as we expected. If so skip it.
current_app.logger.error(f"LookupError {le} #notify-admin-1200")
current_app.logger.exception("LookupError #notify-admin-1200")

current_app.logger.info(
f"JOBS cache length after regen: {len(JOBS)} #notify-admin-1200"
Expand All @@ -130,14 +163,14 @@ def download_from_s3(
result = s3.download_file(bucket_name, s3_key, local_filename)
current_app.logger.info(f"File downloaded successfully to {local_filename}")
except botocore.exceptions.NoCredentialsError as nce:
current_app.logger.error("Credentials not found")
current_app.logger.exception("Credentials not found")
raise Exception(nce)
except botocore.exceptions.PartialCredentialsError as pce:
current_app.logger.error("Incomplete credentials provided")
current_app.logger.exception("Incomplete credentials provided")
raise Exception(pce)
except Exception as e:
current_app.logger.error(f"An error occurred {e}")
text = f"EXCEPTION {e} local_filename {local_filename}"
except Exception:
current_app.logger.exception("An error occurred")
text = f"EXCEPTION local_filename {local_filename}"
raise Exception(text)
return result

Expand All @@ -148,8 +181,8 @@ def get_s3_object(bucket_name, file_location, access_key, secret_key, region):
try:
return s3.Object(bucket_name, file_location)
except botocore.exceptions.ClientError:
current_app.logger.error(
f"Can't retrieve S3 Object from {file_location}", exc_info=True
current_app.logger.exception(
f"Can't retrieve S3 Object from {file_location}",
)


Expand Down Expand Up @@ -223,32 +256,28 @@ def get_job_from_s3(service_id, job_id):
"RequestTimeout",
"SlowDown",
]:
current_app.logger.error(
current_app.logger.exception(
f"Retrying job fetch {FILE_LOCATION_STRUCTURE.format(service_id, job_id)} retry_count={retries}",
exc_info=True,
)
retries += 1
sleep_time = backoff_factor * (2**retries) # Exponential backoff
time.sleep(sleep_time)
continue
else:
# Typically this is "NoSuchKey"
current_app.logger.error(
current_app.logger.exception(
f"Failed to get job {FILE_LOCATION_STRUCTURE.format(service_id, job_id)}",
exc_info=True,
)
return None

except Exception:
current_app.logger.error(
current_app.logger.exception(
f"Failed to get job {FILE_LOCATION_STRUCTURE.format(service_id, job_id)} retry_count={retries}",
exc_info=True,
)
return None

current_app.logger.error(
f"Never retrieved job {FILE_LOCATION_STRUCTURE.format(service_id, job_id)}",
exc_info=True,
)
return None

Expand Down Expand Up @@ -287,7 +316,7 @@ def extract_phones(job):
if phone_index >= len(row):
phones[job_row] = "Unavailable"
current_app.logger.error(
"Corrupt csv file, missing columns or possibly a byte order mark in the file"
"Corrupt csv file, missing columns or possibly a byte order mark in the file",
)

else:
Expand Down
4 changes: 2 additions & 2 deletions app/celery/nightly_tasks.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,8 +54,8 @@ def cleanup_unfinished_jobs():
try:
acceptable_finish_time = job.processing_started + timedelta(minutes=5)
except TypeError:
current_app.logger.error(
f"Job ID {job.id} processing_started is {job.processing_started}."
current_app.logger.exception(
f"Job ID {job.id} processing_started is {job.processing_started}.",
)
raise
if now > acceptable_finish_time:
Expand Down
8 changes: 4 additions & 4 deletions app/celery/process_ses_receipts_tasks.py
Original file line number Diff line number Diff line change
Expand Up @@ -113,8 +113,8 @@ def process_ses_results(self, response):
except Retry:
raise

except Exception as e:
current_app.logger.exception("Error processing SES results: {}".format(type(e)))
except Exception:
current_app.logger.exception("Error processing SES results")
self.retry(queue=QueueNames.RETRY)


Expand Down Expand Up @@ -204,9 +204,9 @@ def handle_complaint(ses_message):
)
try:
reference = ses_message["mail"]["messageId"]
except KeyError as e:
except KeyError:
current_app.logger.exception(
f"Complaint from SES failed to get reference from message with error: {e}"
"Complaint from SES failed to get reference from message"
)
return
notification = dao_get_notification_history_by_reference(reference)
Expand Down
9 changes: 3 additions & 6 deletions app/celery/provider_tasks.py
Original file line number Diff line number Diff line change
Expand Up @@ -144,11 +144,10 @@ def deliver_sms(self, notification_id):
if isinstance(e, SmsClientResponseException):
current_app.logger.warning(
"SMS notification delivery for id: {} failed".format(notification_id),
exc_info=True,
)
else:
current_app.logger.exception(
"SMS notification delivery for id: {} failed".format(notification_id)
"SMS notification delivery for id: {} failed".format(notification_id),
)

try:
Expand Down Expand Up @@ -186,10 +185,8 @@ def deliver_email(self, notification_id):

notification.personalisation = json.loads(personalisation)
send_to_providers.send_email_to_provider(notification)
except EmailClientNonRetryableException as e:
current_app.logger.exception(
f"Email notification {notification_id} failed: {e}"
)
except EmailClientNonRetryableException:
current_app.logger.exception(f"Email notification {notification_id} failed")
update_notification_status_by_id(notification_id, "technical-failure")
except Exception as e:
try:
Expand Down
Loading
Loading