
Pre-refreshed resources #2976

Closed
apparentlymart opened this issue Aug 12, 2015 · 8 comments
@apparentlymart
Contributor

There are several resource types that don't actually create anything, but instead either just record some local state inside Terraform (logical providers) or retrieve data about a pre-existing object. The implementations of these usually do nothing except populate the id inside the Create function, and then do all of their interesting work inside Read, and possibly Update.

Some resources of this type include:

  • terraform_remote_state
  • consul_keys (though this one is weird because it can also write, depending on the config)
  • template_file
  • atlas_artifact
  • The various TLS-related logical resources I proposed in TLS utility provider #2778
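In plain Go, the shared shape of these implementations can be sketched roughly as follows. (ResourceData here is a hypothetical, simplified stand-in for the real helper/schema type, and the id value is invented for illustration.)

```go
package main

import "fmt"

// ResourceData is a hypothetical, simplified stand-in for Terraform's
// schema.ResourceData type.
type ResourceData struct {
	ID    string
	Attrs map[string]string
}

// create for a logical resource does no real work: it only assigns an
// id so that Terraform considers the resource to exist, then defers to
// read for everything else.
func create(d *ResourceData) error {
	d.ID = "static-id" // real implementations often hash the inputs
	return read(d)
}

// read does all the interesting work; here it renders a template-like
// input into a computed attribute.
func read(d *ResourceData) error {
	d.Attrs["rendered"] = "rendered: " + d.Attrs["template"]
	return nil
}

func main() {
	d := &ResourceData{Attrs: map[string]string{"template": "hello"}}
	if err := create(d); err != nil {
		panic(err)
	}
	fmt.Println(d.ID, d.Attrs["rendered"])
}
```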

Meanwhile, Terraform allows resource attributes to be interpolated into provider configurations, which is very handy when parts of the config are set dynamically from outside the configuration. For example:

resource "consul_keys" "deploy_config" {
    key {
        name = "deploy_region"
        path = "deploy_config/region"
    }
}
provider "aws" {
    region = "${consul_keys.deploy_config.var.deploy_region}"
}

This capability doesn't work quite right because provider configurations need to be processed in order to run terraform plan, but on the first run the consul_keys resource hasn't been created yet and so its state is empty. This works (after a fashion) if one adds the consul_keys resource first, runs terraform apply, and then adds the provider configuration using the result. But this sort of two-step configuration dance is troublesome when using deployment orchestration tooling.

This limitation could be addressed by allowing particular resources to opt-in to being pre-refreshed, which means that terraform refresh (and all of the implicit paths into the same functionality) treats them as immediately existing.

In other words, terraform refresh will call the Read implementation on such a resource even if there's no record of it in the state already. The Read implementation can then use the resource parameters from the configuration to populate any computed attributes just like it would for a pre-existing resource, making sure that the state is updated before we run terraform plan, and thus that the providers will get the values they are expecting.
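As a rough sketch of that refresh behaviour (plain Go with hypothetical types, not Terraform's actual internals):

```go
package main

import "fmt"

// resource models just the bits of state that matter here: whether the
// resource is already recorded in the state file, and whether it opts
// in to pre-refreshing.
type resource struct {
	name         string
	inState      bool
	preRefreshed bool
	wasRead      bool
}

// refresh normally reads only resources already present in the state;
// under this proposal, pre-refreshed resources are read unconditionally,
// so their computed attributes are populated before terraform plan runs.
func refresh(resources []*resource) {
	for _, r := range resources {
		if r.inState || r.preRefreshed {
			r.wasRead = true // stand-in for calling the Read implementation
			r.inState = true
		}
	}
}

func main() {
	rs := []*resource{
		{name: "consul_keys.deploy_config", preRefreshed: true},
		{name: "aws_instance.web"}, // not yet created, not pre-refreshed
	}
	refresh(rs)
	for _, r := range rs {
		fmt.Println(r.name, r.wasRead)
	}
}
```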

Of course, the providers are needed for terraform refresh too, so Terraform still needs to take care to order the individual resource refresh operations so that e.g. in my above example the AWS provider is not instantiated until the consul_keys resource has been refreshed.

With this concept in place, I think it would then be reasonable to make it a fatal error for a provider to depend on a non-pre-refreshed resource, giving the user better feedback about why this is not a sensible configuration.

Although my primary motivation is addressing the "interpolated provider config" use-case, this new concept has some other benefits:

  • This class of resource would not result in a confusing "create" diff appearing in the plan the first time they are added. If the configuration happens to differ from the current resource state it would still show up as an "update" diff, and if it matches then there would be no diff at all. This provides a more accurate diff to the user.
  • For resource types where "create" and "update" are not significantly different operations, e.g. because a unique id is part of the configuration (such as aws_s3_bucket_object), the resource could be marked as "pre-refreshed" and the Read implementation could check whether a same-named object already exists and record it into the state, causing it to show up as an update rather than a create in the diff. Again, this provides a more accurate diff to the user, and allows the user to make a better decision about whether to apply the plan.
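The diff behaviour described in these two points reduces to a small decision function (a hypothetical simplification, not Terraform's actual plan logic):

```go
package main

import "fmt"

// diffKind classifies what the plan would show for a pre-refreshed
// resource, given whether Read found an existing object and whether the
// configuration matches what it found.
func diffKind(foundExisting, configMatches bool) string {
	switch {
	case !foundExisting:
		return "create"
	case configMatches:
		return "none"
	default:
		return "update"
	}
}

func main() {
	fmt.Println(diffKind(false, false)) // nothing exists yet: create
	fmt.Println(diffKind(true, true))   // already matches config: no diff
	fmt.Println(diffKind(true, false))  // exists but differs: update
}
```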

Overall I think this concept better models a class of resources and helps Terraform to "do the right thing" in more cases, by allowing the provider to give it more information.

@apparentlymart mentioned this issue Aug 13, 2015
@phinze
Contributor

phinze commented Aug 13, 2015

Great write up, @apparentlymart!

This is a concept I've been kicking around the back of my head in various forms, and I think your model may just be the way to go.

I'm going to let this percolate a bit and I'll follow up.

@bitglue

bitglue commented Oct 23, 2015

Seems there's another detail necessary for this to work: having a provider configuration depend on a terraform_remote_state or similar quite frequently makes dependency cycles on terraform destroy. For example, I just got:

Error creating plan: 1 error(s) occurred:

* Cycle: aws_security_group.www_elb (destroy), aws_security_group.db (destroy), aws_security_group.www (destroy), provider.fleet, fleet_unit.www (destroy), aws_db_instance.main (destroy), terraform_remote_state.core (destroy), terraform_remote_state.core

It would seem the condition for this is that not only does a provider depend on the remote state, but so does some other resource that was created. But that's a pretty common scenario.

But since pre-refreshed resources shouldn't need to participate in the create process, they shouldn't need to participate in destroy, either. Thus breaking the cycle.
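That cycle-breaking idea can be sketched as a filter over the destroy graph (a hypothetical simplification of Terraform's graph building, not its real code):

```go
package main

import "fmt"

// node represents a resource in the destroy graph.
type node struct {
	name         string
	preRefreshed bool
}

// destroyNodes returns the nodes that actually need a destroy step.
// Pre-refreshed resources never created anything real, so they are
// omitted, which also removes them from any
// provider -> resource -> destroy cycle.
func destroyNodes(all []node) []node {
	var out []node
	for _, n := range all {
		if !n.preRefreshed {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	nodes := []node{
		{name: "terraform_remote_state.core", preRefreshed: true},
		{name: "aws_db_instance.main"},
	}
	for _, n := range destroyNodes(nodes) {
		fmt.Println(n.name) // only the real resource is destroyed
	}
}
```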

@apparentlymart
Contributor Author

Having played with my prototype in #3060 some more, my conception of this issue has changed slightly.

The concept of loading data to be used in the configuration seems distinct from the concept of creating something where another thing of the same name may already exist. Therefore I think there are two separate ideas here:

  • I think there's room for a new concept, distinct from resource, that belongs to a provider but only supports a "read" operation. I'd like to call this concept data_source. Data sources are provided by resource providers but have a different lifecycle where they only support a "read" operation, and that read operation is carried out only during the terraform refresh action and other cases where that's implied. (that is, not when -refresh=false is applied to plan or to the bare apply).
  • Certain resource types are able to compute an instance id using just the resource configuration, since it's just the value of one or more of the configuration properties. Terraform should be able to check for the existence of conflicting existing resources during plan so that the naming conflict error can be reported at plan time rather than at apply time. (This is different than my earlier proposal where I called for Terraform to effectively "adopt" the existing resource and update it to match the config. I've found in experimentation that this violates the principle of least surprise, and I'd rather it just be an error that gets caught earlier on.)

The first of these issues is a more pertinent concern to me right now, since my Terraform configs are getting a lot of these "data sources" as we use both terraform_remote_state and consul_keys extensively to allow particular configurations to be deployed in many different environments, with the data sources providing settings that are different in each case.

When I have more time to focus on this I intend to break this issue into two separate issues and then prototype the data_source concept in isolation, putting aside the "resources whose ids must be unique" problem for the moment. In the meantime I just wanted to note my intent here in case other folks come across this and wonder what's going on.
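A rough Go sketch of the narrower contract a data_source would satisfy, next to the full resource lifecycle (all names are hypothetical, and the fetched values are invented):

```go
package main

import "fmt"

// Resource is the full CRUD lifecycle that ordinary managed resources
// implement.
type Resource interface {
	Create() error
	Read() error
	Update() error
	Delete() error
}

// DataSource is the proposed narrower contract: read-only, evaluated
// during terraform refresh (and skipped when -refresh=false is given).
type DataSource interface {
	Read() error
}

// remoteState is a toy data source standing in for
// terraform_remote_state; a real implementation would fetch the remote
// state file here.
type remoteState struct {
	outputs map[string]string
}

func (r *remoteState) Read() error {
	r.outputs = map[string]string{"deploy_region": "us-west-2"} // fake data
	return nil
}

func main() {
	var ds DataSource = &remoteState{}
	if err := ds.Read(); err != nil {
		panic(err)
	}
	fmt.Println(ds.(*remoteState).outputs["deploy_region"])
}
```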

@bitglue

bitglue commented Oct 28, 2015

I would point out that we already have a mechanism for "loading data to be used in the configuration": terraform.tfvars. So perhaps if that were made to be more expressive it would be a natural place to do something like loading configuration from remote state on S3, or whatever.

This is basically how I've had to work around this issue anyway: I have a bunch of hairy scripts and rules in make which pull the outputs from one configuration and stuff them into the terraform.tfvars of the next configuration. It works pretty well except for the hair.

@mtougeron
Contributor

This functionality would be extremely useful for us for the scenario I described in #3657 (closed).

@sl1pm4t
Contributor

sl1pm4t commented Nov 3, 2015

+1

@apparentlymart
Contributor Author

As I previously mentioned, I'm splitting this into two separate proposals that each aim to solve half of the problem space presented in this ticket:

@ghost

ghost commented Apr 29, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 29, 2020