RDS minor version upgrades fail when upgrading primary and replica together #22107
Comments
Relates: #20514.
@steveteahan Thanks for raising this issue 👏 .
interacts with the Terraform dependency graph.
resource "aws_db_instance" "dev" {
  ...
}
resource "aws_db_instance" "dev-replica" {
  ...
  replicate_source_db = aws_db_instance.dev.id
}
=> Updating
You could try using Terraform resource targeting, although that is not recommended as a long-term solution. This workflow does not have a simple solution in Terraform today. See hashicorp/terraform#4149 for a discussion of the thinking on additional configuration change modes.
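To illustrate the ordering discussed above, here is a minimal sketch in Python of how a reference such as `replicate_source_db = aws_db_instance.dev.id` places the primary ahead of the replica in a dependency-ordered walk. This is a simplified model under stated assumptions, not Terraform's actual graph implementation; the `dependencies` mapping and the `apply_order` helper are hypothetical names introduced only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, simplified model: each resource maps to the set of
# resources it references. The replica references the primary through
# replicate_source_db, so the primary is its predecessor.
dependencies = {
    "aws_db_instance.dev": set(),
    "aws_db_instance.dev-replica": {"aws_db_instance.dev"},
}


def apply_order(deps):
    """Return the resources in an order where dependencies come first."""
    return list(TopologicalSorter(deps).static_order())


print(apply_order(dependencies))
```

In this model the primary always comes before the replica, which matches the observation in this thread that the primary is modified first when both change in the same apply.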
Thank you for the quick response, @ewbankkit! I do see now that I should have used I believed from the logs that Terraform was applying the upgrade to the replica first, but taking a closer look I missed the modify call for the primary. Thank you for the detailed explanation and the additional resources. I'll take a more careful look through everything and decide on a path forward.
+1,
We have implemented a workaround using the following Terraform configuration:
# Get current region and IAM role first
data "aws_region" "current" {}
data "aws_caller_identity" "current" {}
data "aws_iam_session_context" "current" {
  arn = data.aws_caller_identity.current.arn
}
locals {
  db_instance_id   = "xyz"
  postgres_version = "14.4"
}
# RDS requires that read replicas engine versions must be updated
# before updating primary DB instance engine version. Since it's not really possible
# with vanilla Terraform (DB replicas depend on the primary, so the primary
# is always changed first), we implement a "hack":
# - This null_resource is "created/updated" first and will update
#   read replicas engine version
# - Primary DB instance resource depends on this null resource to make sure
#   "update-db-replicas.py" script runs before that.
resource "null_resource" "update_replicas_before_primary" {
  triggers = {
    engine_version = local.postgres_version
  }
  provisioner "local-exec" {
    command = "${path.module}/update-db-replicas.py"
    environment = {
      DB_INSTANCE_ID = local.db_instance_id
      ENGINE_VERSION = local.postgres_version
      AWS_ROLE_ARN   = data.aws_iam_session_context.current.issuer_arn
      AWS_REGION     = data.aws_region.current.name
    }
  }
}
resource "aws_db_instance" "primary" {
  identifier     = local.db_instance_id
  engine         = "postgres"
  engine_version = null_resource.update_replicas_before_primary.triggers.engine_version
  # ...snip...
}
And here's the script:
#!/usr/bin/env python3
"""
This script updates the engine version of the read replicas of one particular DB instance.
It's a workaround for Terraform AWS provider behavior which doesn't allow updating
read replicas engine versions separately since version 4.
The script is supposed to be run only by Terraform and strictly before updating the engine version
of the primary DB instance.
"""
import boto3
import os
import random
import logging
logging.basicConfig(level=logging.INFO,
                    format='[%(asctime)s] %(levelname)s %(message)s',
                    handlers=[logging.StreamHandler()])
logger = logging.getLogger()
db_instance_id = os.environ['DB_INSTANCE_ID']
role_arn = os.environ['AWS_ROLE_ARN']
engine_version = os.environ['ENGINE_VERSION']
region = os.environ['AWS_REGION']
# Assume IAM role
sts_client = boto3.client('sts', region_name=region)
assumed_role = sts_client.assume_role(
    RoleArn=role_arn,
    RoleSessionName=f"terraform-update-db-replicas-{random.randint(1, 10000)}"
)
credentials = assumed_role['Credentials']
# Find the source DB instance
rds = boto3.client(
    'rds',
    region_name=region,
    aws_access_key_id=credentials['AccessKeyId'],
    aws_secret_access_key=credentials['SecretAccessKey'],
    aws_session_token=credentials['SessionToken']
)
response = rds.describe_db_instances(Filters=[{'Name': 'db-instance-id', 'Values': [db_instance_id]}])
if len(response['DBInstances']) == 0:
    # If there is no source instance, we're probably running the script before the DB instance is created.
    logger.warning("Source DB instance %s not found, ignoring", db_instance_id)
    # Exit cleanly; raising SystemExit avoids relying on the sys module, which is not imported above.
    raise SystemExit(0)
db_instance = response['DBInstances'][0]
for replica_id in db_instance['ReadReplicaDBInstanceIdentifiers']:
    logger.info("Updating DB replica %s engine version to %s", replica_id, engine_version)
    # Wait until the replica is available before requesting the modification.
    waiter = rds.get_waiter('db_instance_available')
    waiter.wait(DBInstanceIdentifier=replica_id)
    rds.modify_db_instance(
        DBInstanceIdentifier=replica_id,
        EngineVersion=engine_version,
        ApplyImmediately=False
    )
logger.info("Done")
It's probably not the cleanest solution (e.g. the script only works with IAM roles, not with static AWS credentials), but it works for us.
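One possible refinement of the script above, sketched under stated assumptions, is to skip replicas that already run the target version so that repeated runs do not issue redundant modify calls. The helper name `replicas_needing_upgrade` and the sample data are hypothetical. The dictionary keys follow the shape of the `DBInstances` entries returned by `describe_db_instances`, and the sketch assumes same-region replicas whose source is reported as a plain instance identifier.

```python
def replicas_needing_upgrade(instances, primary_id, target_version):
    """Return IDs of read replicas of primary_id not yet on target_version.

    `instances` is a list of dicts shaped like the entries of the
    "DBInstances" list in a describe_db_instances response.
    """
    return [
        inst["DBInstanceIdentifier"]
        for inst in instances
        if inst.get("ReadReplicaSourceDBInstanceIdentifier") == primary_id
        and inst.get("EngineVersion") != target_version
    ]


# Made-up sample data, for illustration only.
sample = [
    {"DBInstanceIdentifier": "xyz", "EngineVersion": "14.3"},
    {
        "DBInstanceIdentifier": "xyz-replica-1",
        "EngineVersion": "14.3",
        "ReadReplicaSourceDBInstanceIdentifier": "xyz",
    },
    {
        "DBInstanceIdentifier": "xyz-replica-2",
        "EngineVersion": "14.4",
        "ReadReplicaSourceDBInstanceIdentifier": "xyz",
    },
]

print(replicas_needing_upgrade(sample, "xyz", "14.4"))
```

The function is pure, so it can be checked without AWS credentials; wiring it into the script would mean describing each replica first, which the original script does not do.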
This issue is still open. I see it has been discussed elsewhere (#24887), though the resolution of that ticket seems to have been to re-enable versioning of a read replica. Is the "two applies" method the accepted way to handle this?
Community Note
Terraform CLI and Terraform AWS Provider Version
Affected Resource(s)
Terraform Configuration Files
Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.
Debug Output
DBUpgradeDependencyFailure
Panic Output
Expected Behavior
The replica should be upgraded first and then the primary should be upgraded, without a failure.
Actual Behavior
The replica is upgraded first, but the primary is never upgraded because a DBUpgradeDependencyFailure error is thrown. A second apply must be executed to complete the upgrade.
It is worth mentioning that in a related issue, during the creation of the resources I also saw:
I don't want to start down the path of considering separate issues in a single bug report, but I do wonder if this is related to the same root cause. Perhaps this is a matter of eventual consistency from the AWS API and the provider could wait longer, or retry after a short period of time to see if there is a different answer?
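The retry idea raised above can be sketched as follows. This is only an illustration of retrying with a growing delay, not how the provider behaves: `UpgradeDependencyError` is a hypothetical stand-in for the DBUpgradeDependencyFailure error, and the `retry` helper, its parameters, and their defaults are assumptions.

```python
import time


class UpgradeDependencyError(Exception):
    """Hypothetical stand-in for a DBUpgradeDependencyFailure error."""


def retry(operation, attempts=5, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits `delay` seconds after the first failure and multiplies the
    wait by `backoff` after each further failure. The last failure is
    re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except UpgradeDependencyError:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= backoff
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting. Whether a retry alone would resolve the failure depends on the root cause, which this thread leaves open.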
Steps to Reproduce
terraform apply
terraform apply -var postgres_engine_version=13.3
Important Factoids
Nothing atypical for this account.
References