
RDS minor version upgrades fail when upgrading primary and replica together #22107

Open
steveteahan opened this issue Dec 8, 2021 · 6 comments
Labels
service/rds Issues and PRs that pertain to the rds service. upstream-terraform Addresses functionality related to the Terraform core binary.

Comments

@steveteahan

steveteahan commented Dec 8, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

$ terraform -v
Terraform v1.0.11
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v3.68.0

Affected Resource(s)

  • aws_db_instance

Terraform Configuration Files

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

provider "aws" {
  profile = "lab"
  region  = "us-east-2"
}

variable "postgres_engine_version" {
  type    = string
  default = "13.2"
}

resource "aws_db_instance" "dev" {
  allocated_storage       = 10
  engine                  = "postgres"
  engine_version          = var.postgres_engine_version
  identifier              = "dev"
  instance_class          = "db.t3.micro"
  username                = "postgres"
  password                = "password"
  skip_final_snapshot     = true
  backup_retention_period = 1
  apply_immediately       = true
}

resource "aws_db_instance" "dev-replica" {
  allocated_storage   = 10
  engine              = "postgres"
  engine_version      = var.postgres_engine_version
  identifier          = "dev-replica"
  instance_class      = "db.t3.micro"
  replicate_source_db = "dev"
  apply_immediately   = true
}

Debug Output

Panic Output

Expected Behavior

The replica should be upgraded first and then the primary should be upgraded, without a failure.

Actual Behavior

The replica is upgraded first, but the primary is never upgraded because a DBUpgradeDependencyFailure error is thrown. A second apply must be executed to complete the upgrade.

It is worth mentioning that, as a possibly related issue, during the creation of the resources I also saw:

╷
│ Error: Error creating DB Instance: DBInstanceNotFound: The source instance could not be found: dev
│ 	status code: 404, request id: 8ba0b8a6-f7dd-42c5-b611-e2e93bb4597f
│ 
│   with aws_db_instance.dev-replica,
│   on main.tf line 24, in resource "aws_db_instance" "dev-replica":
│   24: resource "aws_db_instance" "dev-replica" {
│ 
╵

I don't want to start down the path of considering separate issues in a single bug report, but I do wonder if this is related to the same root cause. Perhaps this is a matter of eventual consistency from the AWS API and the provider could wait longer, or retry after a short period of time to see if there is a different answer?

Steps to Reproduce

  1. terraform apply
  2. terraform apply -var postgres_engine_version=13.3

Important Factoids

Nothing atypical for this account.

References

@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. service/rds Issues and PRs that pertain to the rds service. labels Dec 8, 2021
@ewbankkit
Contributor

ewbankkit commented Dec 8, 2021

Relates: #20514.
Relates: hashicorp/terraform#4149.

@ewbankkit ewbankkit added waiting-response Maintainers are waiting on response from community or contributor. and removed needs-triage Waiting for first response or review from a maintainer. labels Dec 8, 2021
@ewbankkit
Contributor

@steveteahan Thanks for raising this issue 👏 .
This is an interesting one in how your workflow

  • Create Primary, create Replica - so Replica depends on Primary creation
  • Update Replica version, update Primary version - so Primary depends on Replica update

interacts with the Terraform dependency graph.
In configuration, the correct way to capture the resource create/delete dependency is:

resource "aws_db_instance" "dev" {
  ...
}

resource "aws_db_instance" "dev-replica" {
  ...
  replicate_source_db = aws_db_instance.dev.id
}

=> dev created before dev-replica, dev-replica deleted before dev

Updating dev's engine_version requires dev-replica to be updated first, but a plain terraform apply will update dev first.

You could try using Terraform resource targeting (see the example below), although that is not recommended as a long-term solution.
Alternatively you could split dev and dev-replica into separate modules and apply the update to dev-replica's module first. This is not ideal because, as discussed above, there IS a dependency between the two that you would like to capture in code.
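For example, the targeted approach would look something like this, using the resource names and version variable from the reproduction config above:

terraform apply -target=aws_db_instance.dev-replica -var postgres_engine_version=13.3
terraform apply -var postgres_engine_version=13.3

The first, targeted apply upgrades only the replica; the second, full apply then upgrades the primary.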

This workflow does not have a simple solution in Terraform today. See hashicorp/terraform#4149 for a discussion of the thinking on additional configuration change modes.

@ewbankkit ewbankkit added upstream-terraform Addresses functionality related to the Terraform core binary. and removed waiting-response Maintainers are waiting on response from community or contributor. labels Dec 8, 2021
@steveteahan
Author

Thank you for the quick response, @ewbankkit! I do see now that I should have used replicate_source_db = aws_db_instance.dev.id, but it sounds like this change isn't expected to fix this scenario (at least the upgrade).

I believed from the logs that Terraform was applying the upgrade to the replica first, but on a closer look I see that I had missed the modify call for the primary. Thank you for the detailed explanation and the additional resources. I'll take a more careful look through everything and decide on a path forward.
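For anyone who finds this later, the corrected replica block from my reproduction config would look something like:

resource "aws_db_instance" "dev-replica" {
  allocated_storage   = 10
  engine              = "postgres"
  engine_version      = var.postgres_engine_version
  identifier          = "dev-replica"
  instance_class      = "db.t3.micro"
  replicate_source_db = aws_db_instance.dev.id
  apply_immediately   = true
}

As discussed above, this captures the create/delete ordering correctly but doesn't change the update ordering, so the engine upgrade still needs the targeted or two-apply approach.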

@SushanSuresh

SushanSuresh commented Nov 10, 2022

+1, facing the same issue while trying to update the replica:

  1. Can't specify the engine version for the replica
  2. Can't upgrade the master without applying the change to the replica first

@take-five

We have implemented a workaround using a null_resource with a local-exec provisioner. The idea is to run a script that updates the replicas' engine version before the primary's.

# Get current region and IAM role first
data "aws_region" "current" {}

data "aws_caller_identity" "current" {}

data "aws_iam_session_context" "current" {
  arn = data.aws_caller_identity.current.arn
}

locals {
  db_instance_id   = "xyz"
  postgres_version = "14.4"
}

# RDS requires that read replicas engine versions must be updated
# before updating primary DB instance engine version. Since it's not really possible
# with vanilla Terraform (DB replicas depend on the primary, so the primary
# is always changed first), we implement a "hack":
# - This null_resource is "created/updated" first and will update
#   read replicas engine version
# - Primary DB instance resource depends on this null resource to make sure
#   "update-db-replicas.py" script runs before that.
resource "null_resource" "update_replicas_before_primary" {
  triggers = {
    engine_version = local.postgres_version
  }

  provisioner "local-exec" {
    command = "${path.module}/update-db-replicas.py"

    environment = {
      DB_INSTANCE_ID = local.db_instance_id
      ENGINE_VERSION = local.postgres_version
      AWS_ROLE_ARN   = data.aws_iam_session_context.current.issuer_arn
      AWS_REGION     = data.aws_region.current.name
    }
  }
}

resource "aws_db_instance" "primary" {
  identifier = local.db_instance_id

  engine         = "postgres"
  engine_version = null_resource.update_replicas_before_primary.triggers.engine_version

  # ...snip...
}

And here's the update-db-replicas.py script:

#!/usr/bin/env python3

"""
This script update engine version for read replicas of a one particular DB instance.
It's a workaround for Terraform AWS provider behavior which doesn't allow updating
read replicas engine versions separately since version 4.

The script is supposed to be run only by Terraform and strictly before updating engine version
of the primary DB instance.
"""

import boto3
import os
import random
import sys  # needed for sys.exit() below when the source instance is missing
import logging

logging.basicConfig(level=logging.INFO,
                    format='[%(asctime)s] %(levelname)s %(message)s',
                    handlers=[logging.StreamHandler()])

logger = logging.getLogger()

db_instance_id = os.environ['DB_INSTANCE_ID']
role_arn = os.environ['AWS_ROLE_ARN']
engine_version = os.environ['ENGINE_VERSION']
region = os.environ['AWS_REGION']

# Assume IAM role
sts_client = boto3.client('sts', region_name=region)
assumed_role = sts_client.assume_role(
  RoleArn=role_arn,
  RoleSessionName=f"terraform-update-db-replicas-{random.randint(1, 10000)}"
)
credentials = assumed_role['Credentials']

# Find the source DB instance
rds = boto3.client(
    'rds',
    region_name=region,
    aws_access_key_id=credentials['AccessKeyId'],
    aws_secret_access_key=credentials['SecretAccessKey'],
    aws_session_token=credentials['SessionToken']
)

response = rds.describe_db_instances(Filters=[{'Name': 'db-instance-id', 'Values': [db_instance_id]}])

if len(response['DBInstances']) == 0:
    # If there is no source instance, we're probably running the script before the DB instance is created.
    logging.warning("Source DB instance %s not found, ignoring", db_instance_id)
    sys.exit(0)

db_instance = response['DBInstances'][0]

for replica_id in db_instance['ReadReplicaDBInstanceIdentifiers']:
    logging.info("Updating DB replica %s engine version to %s" % (replica_id, engine_version))

    waiter = rds.get_waiter('db_instance_available')
    waiter.wait(DBInstanceIdentifier=replica_id)

    rds.modify_db_instance(
        DBInstanceIdentifier=replica_id,
        EngineVersion=engine_version,
        ApplyImmediately=False
    )

logging.info("Done")

It's probably not the cleanest solution (e.g. the script only works with IAM roles, not with static AWS credentials), but it works for us.

@rlee-arx

This issue is still open, though I see it has been discussed elsewhere (#24887); it seems the resolution to that ticket was to re-enable versioning of a read replica. Is the accepted way to handle this the "two applies" method?
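If so, that would amount to re-running the same apply after the first run fails on the primary with DBUpgradeDependencyFailure, something like (using the variable from the original reproduction config):

terraform apply -var postgres_engine_version=13.3   # upgrades the replica, then fails on the primary
terraform apply -var postgres_engine_version=13.3   # second run upgrades the primary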
