Skip to content
This repository has been archived by the owner on Jun 4, 2024. It is now read-only.

Terraform: Correctly handle null/unknown conversion to apitypes.BoolOption #991

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

strideynet
Copy link
Contributor

@strideynet strideynet commented Dec 22, 2023

Closes gravitational/teleport#35947
Closes gravitational/teleport#42432
Closes gravitational/teleport#42429

It looks like if the Terraform value was null, it previously ended up sending false rather than null to the Teleport API. This leads to things erroneously being set to false and triggering warnings like:

│ Error: Provider produced inconsistent result after apply
│ 
│ When applying changes to module.auth.teleport_provision_token.token,
│ provider
│ "module.auth.provider[\"terraform.releases.teleport.dev/gravitational/teleport\"]"
│ produced an unexpected new value: .spec.gitlab.allow[0].ref_protected: was
│ null, but now cty.False.
│ 
│ This is a bug in the provider, which should be reported in the provider's
│ own issue tracker.

I tested this change with:

resource "teleport_provision_token" "token" {
  metadata = {
    name        = "gitlab-test-terraform"
    description = ""
  }

  spec = {
    roles       = ["Bot"]
    join_method = "gitlab"
    bot_name    = "gitlab-bot"
    gitlab = {
      domain = "bug.report"
      allow = [
        {
          project_path = "my-repo"
        }
      ]
    }
  }
}

Followed by:

resource "teleport_provision_token" "token" {
  metadata = {
    name        = "gitlab-test-terraform"
    description = ""
  }

  spec = {
    roles       = ["Bot"]
    join_method = "gitlab"
    bot_name    = "gitlab-bot"
    gitlab = {
      domain = "bug.report"
      allow = [
        {
          environment_protected = true
          ref_protected = false
          project_path = "my-repo"
        }
      ]
    }
  }
}

I figure that this is potentially a breaking fix ? Surely some folks have run into this with other BoolOptions on other resources and deploying this fix will lead to values being changed when upgrading to this version ? I'd appreciate someone who understands this chiming in here.

@benarent
Copy link
Contributor

This is how I create bots, this looks like it would break.. but TBH, we break compatibility with terraform all the time.

resource "teleport_provision_token" "github_bot" {
  spec = {
    roles       = ["Bot","Kube","Node"]
    join_method = "github"
    # tctl bots add github-bot --token=github-bot --roles=access --logins=root
    # Now added below. This is the token that is used to authenticate the bot. 
    bot_name    = "github-actions-bot"
    github: {
      "allow": [
        {
          "repository_owner": "asteroid-earth"
        }
      ]
    }
  }

  metadata = {
    name        = "github-actions-bot-token"
    expires     = "3000-12-12T12:05:23Z"
    description = "testing the new provider sensitive values"
  }

}

resource "teleport_bot" "github_actions" {
  name = "github-actions-bot"
  token_id = "github-actions-bot-token"
  roles = ["access"]
}

@strideynet
Copy link
Contributor Author

This does result in a breaking change - values that are unset would have previously been set to false. Following this PR, they will be set to null or to a configured default (which may be true)

I'll discuss with some folks and figure out if we want to move forward with this breaking change.

@strideynet
Copy link
Contributor Author

Conversation with Zac suggests proceeding with this as a breaking change in V15, we'll wait to merge this before v15 is cut.

@hugoShaka
Copy link
Contributor

I agree that we must fix this, TF cannot pick random defaults different than Teleport ones.

This change will affect some fields in Roles, which is potentially dangerous for our customers. I think we should not backport it in v14, list all affected fields in the changelog and PR, and put a big flashing warning in v15 changelog. TF users will need to fix their code before updating.

@strideynet
Copy link
Contributor Author

This change will affect some fields in Roles

Yup - from my testing the options fields are the ones most plagued by this since most of them are meant to default to true

@tigrato
Copy link
Contributor

tigrato commented Jan 8, 2024

I agree that we must fix this, TF cannot pick random defaults different than Teleport ones.

This change will affect some fields in Roles, which is potentially dangerous for our customers. I think we should not backport it in v14, list all affected fields in the changelog and PR, and put a big flashing warning in v15 changelog. TF users will need to fix their code before updating.

there is no such thing as backport to v14 in plugins 😢

We exclusively roll out version 15.x.x starting from the release cut-off point. However, this implies that in the event of a bug fix required by a customer using version 14, they would have to employ the Terraform provider v15 with their v14 cluster and discern how the modification is applicable to their specific use case.

@hugoShaka
Copy link
Contributor

hugoShaka commented Jan 8, 2024

there is no such thing as backport to v14 in plugins 😢

I was talking about merging this PR now, which will send the change to v14.3.1. We must hold off this PR until the v15 freeze.

Even if this is a bugfix, we cannot ship this PR in v14 because this is a major breaking change that will change the result of everyone's IaC. Users will need to review the changes before upgrading the TF provider to v15.

@marcoandredinis
Copy link
Contributor

Is there a way to do a two phase deprecation here?

Option A:
Default to the new logic
Keep the old one behind an Env Var or a Provider Setting, but emit a diagnostic warning saying that in it's going to be removed in v16

Option B:
Default to the old logic, but emit a warning that users must set Env Var X or Provider Setting X to be able to upgrade to v16.
Users could set the Env Var or Provider Setting in order to get the new logic.

For new users, Option A is ideal: no warning, correct behavior and no special Env Var/Provider Setting.
For current users, the first time they run terraform apply it will show a lot of changes and users must actively search our docs/changelog/slack so they can see what they need to do (ie setting the Env Var/Provider Setting) to have a clean upgrade. Or accept all the changes.

If using B, users get a warning explaining (or linking to a doc page) what is changing.
They end up setting the Env Var/Provider Setting.
When upgrading to v16, we default to the new logic and ignore the Env Var/Provider Setting

@tigrato
Copy link
Contributor

tigrato commented Jan 9, 2024

there is no such thing as backport to v14 in plugins 😢

I was talking about merging this PR now, which will send the change to v14.3.1. We must hold off this PR until the v15 freeze.

Even if this is a bugfix, we cannot ship this PR in v14 because this is a major breaking change that will change the result of everyone's IaC. Users will need to review the changes before upgrading the TF provider to v15.

Oh got it but we still have a problem.

Until quite recently
, our docs incorrectly PIN the version of the Terraform provider. It always selected the latest available version of the provider instead of pinning it to the major version. There are several customers that were affected by it when trying to configure machineID stuff, so we should assume most users still use the incorrect version pinning.

Given that most APIs are kept unchanged between 12-15, I would assume that if the terraform picks the latest available version (15) against a cluster v14 or v13 the roles changes will be incorporated which will cause behavior breaking changes for v14...

Can we gate this new behavior behind a feature flag that is only enabled if the teleport server is >= 15.0.0? I think that will be the simplest and safest solution to ship this code otherwise we will create so many problems.

@strideynet
Copy link
Contributor Author

strideynet commented Jan 9, 2024

I don't think we can move forward with this fix, if you have a pre-fix resource, and update to the fixed provider, Terraform will not detect any pending changes. If you then modify some other field on the resource, the null fields will silently be changed from false to their default. This sounds dangerous.

As an aside this is caused by another bug in the provider today where Terraform will not detect drift from Null -> Non-null bool values - gravitational/teleport#42427

@marcoandredinis marcoandredinis removed their request for review May 23, 2024 15:27
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
5 participants