Cannot apply CRD and a CR using it in the same plan/apply due to SSA #1367
Comments
I came to ask the same thing, as I ran into this trying to migrate a bunch of things from Helm charts to straight Terraform. There are a lot of similar issues on the old repo: hashicorp/terraform-provider-kubernetes-alpha#247 This is going to make adoption of
My current workaround is to place CRD usage into a local Helm chart and use that through |
So as far as I understand, it wouldn't even work to separate these 20k lines of YAML from cert-manager into individual files so they could be used as manifests, because the moment we run apply, it does this server-side check and reports that the CRD does not exist. If we at least had an exclude flag, applying twice wouldn't be so bad, but without one, whatever we targeted in the first apply (only the CRD) is run again in the second apply. I am using a Makefile now. It's not that nice, but at least I can spin everything up with one command. |
For anyone having the same issue: I wanted to install cert-manager using Helm and deploy a I'm now using the kubectl provider for the ClusterIssuer Resource and combined with a |
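A minimal sketch of the workaround described above, assuming the Helm provider 2.x `set` block syntax and the `kubectl_manifest` resource from the kubectl provider. The provider source, the e-mail address and the ingress class name are placeholders chosen for illustration, not values taken from the comment. According to this thread, the kubectl provider does not look up the CRD while planning, so the ClusterIssuer can reportedly be planned before cert-manager is installed.

```hcl
terraform {
  required_providers {
    helm = {
      source = "hashicorp/helm"
    }
    kubectl = {
      # Assumption: the original provider; the alekc/kubectl fork mentioned
      # later in this thread should expose the same resource type.
      source = "gavinbunney/kubectl"
    }
  }
}

# Installs cert-manager together with its CRDs.
resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  repository       = "https://charts.jetstack.io"
  chart            = "cert-manager"
  namespace        = "cert-manager"
  create_namespace = true

  set {
    name  = "installCRDs"
    value = "true"
  }
}

# The custom resource is passed as plain YAML text, so no typed schema
# is needed from the cluster at plan time.
resource "kubectl_manifest" "cluster_issuer" {
  yaml_body = <<-YAML
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-staging
    spec:
      acme:
        server: https://acme-staging-v02.api.letsencrypt.org/directory
        email: admin@example.com # placeholder
        privateKeySecretRef:
          name: letsencrypt-staging
        solvers:
          - http01:
              ingress:
                ingressClassName: nginx # placeholder
  YAML

  # Only ordering is expressed here: create the issuer after the chart.
  depends_on = [helm_release.cert_manager]
}
```

The trade-off, as discussed further down in this thread, is that Terraform only sees the manifest as a block of text rather than a typed object.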
I had hoped that the It might be a slightly different reason than the original hashicorp/terraform-provider-kubernetes-alpha#72 but the end result is the same - you can't use the |
This is a deep issue, because any time you want to reference a CRD, you need to split your terraform project into something that can be applied separately. eg using Traefik with CRDs means I need a plan to install Traefik first, then another to set up all my services. I don't know what the solution would be here, I'd guess either terraform needs the ability to run a multi-stage plan, or have much better support for splitting a project into pieces. |
I have the same problem when deploying cert-manager with two resources: a Helm release and a ClusterIssuer. WTF???
We need a fix for this, badly. |
@alexsomesan sorry for pinging, but have you seen this issue? |
This issue alone renders the manifest resource entirely useless IMHO. There is no good way, for now, to apply CRDs in an isolated way, so that one can e.g. deploy the CRDs of all the Helm charts like traefik, chunkydata, minio-operator and many more (since CRDs are now used everywhere). One has to manually pull all the CRDs from the projects' helm/kustomize charts into some kind of CRD-only resources (a cumbersome and fragile task, especially to maintain), then apply those in an extra, isolated terraform run, and only then start using the manifest resource to create CRs.
It really reminds me of the 'typescript typings' issue back in the day. Now we will probably see all the projects split out their CRDs into custom, isolated helm charts/kustomize repositories. Then (maybe) a common standard will follow so that CRDs can be 'looked up' by kubectl automagically (and installed), and until then the terraform kubernetes manifest will rely on this 2-step process of installing CRDs first.
This is by no means the fault of terraform / kubernetes manifest here; it is what happens when you introduce strong typing late in the game (looks at kubernetes). ConfigMaps (no typing) were the way to go, then we got CRDs, and now we have the issue of 'typings first, then declaration of instances'. The same mistake was made with typescript typings, or maybe it's just the way those huge things evolve: step by step, with a lot of painful intermediate steps in between.
That said, considering the long road until those CRDs can be installed in an isolated way, this resource is basically a playground IMHO. I'm not sure it makes sense to maintain this huge (and nice) little thing here until then, but of course that is not up to me to judge. We would love to use manifests so much! But I guess we need to wait for the helm/kustomize/kubernetes projects to make up their minds about CRDs and 'typings' first. |
One possible solution in the case of argocd (for those who end up here through Google) is to use terraform's target option. This is how I am doing it at the moment with a 4-stage EKS cluster bootstrapping
Basically, on the first run you install only the argocd helm chart, and then everything else is deployed through argo manifests (which support unknown CRDs if they are in the helm chart). Since this solution uses --target, it's not necessary to split the project and state. |
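The staged approach above can be sketched with two resources instead of four stages. This is an illustration only: the resource names, the Git repository URL and the `Application` contents are hypothetical, and the commenter's actual configuration is not shown in the thread.

```hcl
# Stage 1: the Argo CD chart, which also installs the Argo CD CRDs.
resource "helm_release" "argocd" {
  name             = "argocd"
  repository       = "https://argoproj.github.io/argo-helm"
  chart            = "argo-cd"
  namespace        = "argocd"
  create_namespace = true
}

# Stage 2: a custom resource that can only be planned once the
# Application CRD exists in the cluster.
resource "kubernetes_manifest" "root_app" {
  manifest = {
    apiVersion = "argoproj.io/v1alpha1"
    kind       = "Application"
    metadata = {
      name      = "root"
      namespace = "argocd"
    }
    spec = {
      project = "default"
      source = {
        repoURL        = "https://example.com/org/gitops.git" # hypothetical
        path           = "apps"
        targetRevision = "HEAD"
      }
      destination = {
        server    = "https://kubernetes.default.svc"
        namespace = "argocd"
      }
    }
  }

  depends_on = [helm_release.argocd]
}
```

On a fresh cluster, `terraform apply -target=helm_release.argocd` plans and applies only the chart; a plain `terraform apply` afterwards can then plan the `Application`, because its CRD is present. Both resources stay in one project and one state.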
Because of hashicorp/terraform-provider-kubernetes#1367, the first deployment of VolumeSnapshotClass fails because the CRDs do not exist yet. Fixes #807 Signed-off-by: Kevin Lefevre <[email protected]>
Just going to add the Kustomization provider doesn't have this problem either: https://registry.terraform.io/providers/kbst/kustomization/latest/docs You can deploy the CRDs using Helm or whatever and then the CRs themselves by using a Kustomization that depends on the Helm release. Would also like to be able to do this directly with a kubernetes_manifest. Would be so much cleaner. |
|
I can't believe I'm going to have to add a totally separate provider just to do manifests. |
Ok, so this is currently the 3rd most upvoted issue of this provider: https://github.com/hashicorp/terraform-provider-kubernetes/issues?q=is%3Aissue+is%3Aopen+sort%3Areactions-%2B1-desc The main perceived problem for this issue seems to be Terraform's "limitation" that everything needs to be planned in advance. However, technically, this doesn't seem to be a blocker for this issue. This should be proven by the fact that other providers reportedly already enable this common use case (e.g., kbst/kustomization and gavinbunney/kubectl). The actual limitation here seems to be that this provider wants to validate CRs during the planning phase (i.e., a design choice by this provider). This validation is obviously only possible if the CRD already exists. Therefore, I propose the following solution: We add a What do you think of this proposal/idea? Does this seem like a good idea or did I miss anything relevant (like new issues this validation bypass could create, technical limitations, etc.)? It would be great if we could find a way to finally resolve this issue, as it seems to be quite a major limitation of this provider and something that should be avoidable. Feedback is welcome :) |
Hi @primeos-work! Thanks for taking the time to analyse this problem and make such detailed suggestions!
The reason why the provider needs access to the CRD during planning is not for validation purposes (in fact, validation is still a missing feature in the manifest resource), but rather because the provider needs to establish the structure (aka schema) of the resource so that Terraform can correctly persist the state for it. Terraform's state storage and the provider protocol are strongly typed. The entire structure of a resource, including the types of all its attributes, needs to be known to Terraform from the creation of the resource, and Terraform will assume it stays constant once it's been defined.
The only way we can fully describe a CR resource to Terraform at creation time is by looking up its schema details in its parent CRD. Unless we do this, all the structural information we would have about the CR is the set of attributes the user included in the configuration. This is not a full description of the resource, but rather a subset of it. The minute the user then updates the resource, adding a new, previously unspecified attribute value, Terraform will fail and report inconsistent state structure. This will lead to corrupt state storage and will make any update operations on the CR resource impossible.
Because of the above reasons, we cannot avoid requesting the CRD structure during planning. |
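To make the explanation above more concrete, here is a small sketch; the resource and output names are made up for illustration. The configuration only sets a few attributes, while the computed `object` attribute of `kubernetes_manifest` represents the whole resource, and per the explanation above its type is derived from the CRD schema at plan time.

```hcl
# The user only specifies a subset of the ClusterIssuer's attributes.
resource "kubernetes_manifest" "issuer" {
  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "ClusterIssuer"
    metadata = {
      name = "selfsigned"
    }
    spec = {
      selfSigned = {}
    }
  }
}

# Other parts of the configuration can reference the full object.
# For this reference to be type-checked during planning, the complete
# structure of the resource (not just the subset above) must be known.
output "issuer_spec" {
  value = kubernetes_manifest.issuer.object.spec
}
```

If the CRD for `ClusterIssuer` is not installed yet, the provider has nothing to derive that structure from, which is the plan-time failure this issue is about.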
Hey @alexsomesan, thanks a lot for your fast and detailed response with the corrections!
Does this step necessarily have to happen during the planning stage or would it, e.g., be possible to use only the set of attributes the user included in the configuration for the plan, if the CRD lookup fails, and then perform the CRD lookup during the apply stage (when creating the CR)? That way, the final state would be correct and only the plan should change. For correctness, the apply could even fail if the CR already exists during the apply stage because it was manually/externally created since the planning (IIRC this is even already the current behavior). So from a theoretical standpoint this approach should mostly keep the correctness of applying the plan (the only issue I see is if the CRD changes between the plan and apply stages - I'll try to test the current behavior later - this might be acceptable though because the CRD is versioned and there shouldn't be any breaking changes anyway). Would this be something that could be implemented or are there additional technical challenges/restrictions (e.g., the mentioned provider protocol - I'm not sure if the creation you mentioned refers to the creation of the full description/specification/structure during the planning or the creation of the actual object during the application of the plan)? |
@primeos-work, unfortunately it's still not that simple (as you realised yourself). Let's first consider what is the point of having a "plan" step in Terraform. It is to present the user with an accurate preview of exactly what is going to be performed during "apply". This is to give the user a chance to vet the proposed changes and confirm or abort before anything is touched. Once a "plan" is confirmed by the user, "apply" will only perform exactly what was "promised" during the plan, not more not less. That means we establish a trust contract between Terraform and the user. This is a major differentiator of Terraform against other tools that might at first glance seem "simpler". I can appreciate that maybe this value isn't immediately recognised by new Terraform users, but that is a whole other conversation. Because the plan is a "promise" that should not be broken during apply, Terraform enforces consistency checks on values and types between the plan and the apply result. One of these checks is for schema types of resources and attributes to match between plan and apply. Missing or extra attributes between plan and apply will fail this check and in turn the whole operation fails. This is why there is no other option but to fully construct and "promise" the type of the resource during planning. Doing it later will be rejected by Terraform's consistency checks. Terraform isn't trying to be unreasonably pedantic here. There is a very solid point about keeping types consistent like this. Consider some resource depending on a In conclusion, if the planning phase is of little value to one's use-case then there are alternative providers that decided to opt out of offering these guarantees (a few mentioned above in this thread). These providers just treat the resources as blocks of unstructured text (from Terraform's POV) that they just throw at the API. 
While they do provide a seemingly simpler first-use experience, they will soon fall short when it comes to combining the resources in more complex configurations. There is no point in us creating yet another provider with the same limitations. Instead, this provider tries to offer as much of Terraform's value propositions as possible and that includes properly following the plan-apply workflow. The solution for the problem we are discussing here is best solved in Terraform itself, in order to preserve all the guarantees Terraform is designed to offer. We are having ongoing conversations with the Terraform team about approaches to it and they are experimenting with various solutions. Once they settle on an approach, we will be discussing it with the wider audience. |
Thanks again for all of your insights @alexsomesan! If I may share my thoughts (from a user's perspective) on Terraform's planning and state features / design decisions (a bit out of scope here but I feel like it's relevant - then again it shouldn't be something new): I do see the value of Terraform’s planning and it does indeed provide many great advantages (that I like/appreciate as a user). However, it obviously also comes with many drawbacks that can be quite a PITA (if I may say so), especially when first working with Terraform. But as you already mentioned it can be a huge benefit in the long run. Overall, I’m quite split on this design decision. I do find it nice/useful but its strictness also often seems to limit practicability (or at least comfort / ease of use; in addition, it would be nice if diverged state could be imported semi-automatically / detected better, etc.). Terraform’s strict and complete state tracking especially seems a bit out of place or at least redundant in cases like K8s where the complete, declarative state is already available (sure, it still offers advantages / additional features but I'd say in such cases it provides fewer advantages and more drawbacks than in other cases). This will also result in multiple issues/annoyances if the K8s cluster’s state diverges from what Terraform thinks it still is but this divergence should obviously be avoided by the user in the first place. And even with all of the nice planning features you can still do things like using the -> Anyway, what I’m trying to say is that it might be nice if Terraform could be a bit less strict in some places (in the sense of “perfect is the enemy of good”) but that’s obviously also a very dangerous thing to do and needs to be considered very carefully (to avoid making things worse / opening Pandora's box). Regarding the motivation of the current behavior/design (dependencies, variable interpolation, promises, etc.):
In that case it shouldn’t (and I'd say that's fine). IMO the normal behavior of
So with an implementation as described above this problem should be avoided (the plan wouldn't be complete but this is already the case as Terraform cannot know everything in advance anyway) and only some features of
I agree with all of that. Tbh it’d be nice though to have a more official one (in terms of reviews/verifications, maintenance / community support, etc.) – but that is of course out-of-scope for this issue (and a challenge in general).
That sounds very interesting! Thanks a lot for looking into this :) I’m excited to learn more about this so, if possible, please do share a link here once that discussion is public (unless it'll be discussed in this issue anyway). I hope my thoughts/suggestions here (from a users’ perspective) do provide some value and that some parts of it might be considered. I wish I could be of more help but unfortunately my knowledge of Terraform, the code/implementation, design decisions, data structures, protocols, etc. is still by far too limited for more concrete and reasonable ideas. |
In my case, I'm using the helm provider to install fluxcd and trying to make a kubernetes_manifest with a GitRepository object. Simply adding a |
any update on this issue? |
The kubectl provider has been abandoned and there's an existing issue where non-namespaced resources fail to get created. This prevents one of the primary workarounds to the issue documented here. Is it possible to get an update on if this is planning on being addressed? |
@jmgilman check https://github.com/alekc/terraform-provider-kubectl, I haven't used it, but someone mentioned it in an issue of the other repository |
Are there any open PRs that address this issue? It looks like the validation should treat a missing CRD as if the resource itself was missing, as in k8s it's not possible to have a resource created with a missing CRD (iirc). |
Any update on this at all? It means we're having to put everything in separate stacks. |
this sent me back to pulumi 😭 |
@alexsomesan could the |
But this defeats the point as you'd basically never be able to use the non functional version. IMO most of the "errors" should be folded into manifest not existing:
Edit: I of course agree that this is best solved inside Terraform itself, but if wishes were fishes. |
Really, after years we still have this issue? Why wouldn't this work?
|
So meanwhile the only way to resolve the issue is to comment out the |
That, or split the rollout into different stages (first the CRD install, then the manifests using it), or use the alekc/kubectl provider as an alternative.
|
Not so far off from the 3 year mark when I opened this issue. Even back then we were opposed to deploying k8s resources via terraform but deploying flux with it to help bootstrap the cluster seemed like a good idea, this was before flux offered a provider to do the bootstrapping, we migrated to that once it was available IIRC. I suppose the lesson here is don't manage your k8s resources in TF unless you absolutely have to as, and this is probably not a surprise to most folks, how k8s manages its resources isn't very compatible with TF. |
It has nothing to do with k8s and everything to do with how the terraform kubernetes provider has been written. By design, it prepares the plan for all entries right at the beginning, thus requiring the CRD to be present. Some other providers use dynamic binding, so they can overcome this particular problem.
|
Reflecting on the previous comment here, I propose that Terraform could benefit from supporting multiple plan + apply cycles. Specifically, if Terraform detects that a Custom Resource (CR) cannot be planned because its CRD isn’t yet installed, it could defer the CR to a subsequent cycle, provided that it can still make progress in the current cycle. During the apply phase, Terraform would only execute the actions from the completed plan and then indicate if additional cycles are needed. This approach would be more efficient than the current workarounds, which involve separating resources into different sets of Terraform files, or temporarily commenting out parts of the code or using conditional logic to manage resource dependencies. Such enhancements would streamline the development process, allowing Terraform to handle resource ordering more intuitively and reducing the manual effort required to ensure resources are applied in the correct order. Update:
|
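The "conditional logic" workaround mentioned in the previous comment can be sketched roughly as follows. The variable name `crds_installed` and the resource names are hypothetical, and the Helm provider 2.x `set` block syntax is assumed. With `count` evaluating to 0, the custom resource is left out of the plan entirely, so no CRD lookup should take place for it.

```hcl
# Hypothetical switch: flip to true once the CRDs exist in the cluster.
variable "crds_installed" {
  type    = bool
  default = false
}

resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  repository       = "https://charts.jetstack.io"
  chart            = "cert-manager"
  namespace        = "cert-manager"
  create_namespace = true

  set {
    name  = "installCRDs"
    value = "true"
  }
}

resource "kubernetes_manifest" "issuer" {
  # No instances are planned while the switch is off.
  count = var.crds_installed ? 1 : 0

  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "ClusterIssuer"
    metadata = {
      name = "selfsigned"
    }
    spec = {
      selfSigned = {}
    }
  }

  depends_on = [helm_release.cert_manager]
}
```

A first `terraform apply` installs only the chart; a second run with `terraform apply -var 'crds_installed=true'` adds the custom resource. This is still a manual two-step process, which is exactly what the proposal above would like Terraform to automate.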
The best workaround I've found is to package the custom resource YAMLs into a small Helm chart, bundle it with the Terraform module code, and then install it using the helm_release resource:
resource "helm_release" "custom_resources" {
  # Helm release names may not contain underscores, so a hyphen is used here
  name  = "custom-resources"
  chart = "${path.module}/custom_resources"

  depends_on = [
    helm_release.crds
  ]
} |
I did exactly the same thing for the cert-manager ClusterIssuer.
cert-manager.tf:
locals {
  solvers_ingress_class_name = "ingress-nginx"
}

resource "kubernetes_namespace_v1" "cert_manager" {
  metadata {
    name = "cert-manager"
  }
}

# Release name, chart name and namespace are all "cert-manager"
resource "helm_release" "cert_manager" {
  name        = kubernetes_namespace_v1.cert_manager.metadata[0].name
  repository  = "https://charts.jetstack.io"
  chart       = kubernetes_namespace_v1.cert_manager.metadata[0].name
  version     = var.cert_manager_helm_version
  namespace   = kubernetes_namespace_v1.cert_manager.metadata[0].name
  max_history = 1

  set {
    name  = "installCRDs"
    value = "true"
  }
}

# Local chart containing only the ClusterIssuer custom resources
resource "helm_release" "cert_manager_clusterissuer" {
  name        = "cert-manager-clusterissuer"
  chart       = "${path.module}/charts/cert-manager-clusterissuer"
  max_history = 1

  set {
    name  = "acme_email"
    value = var.acme_email
  }

  set {
    name  = "solvers_ingress_class_name"
    value = local.solvers_ingress_class_name
  }

  depends_on = [
    helm_release.cert_manager
  ]
}
Chart.yaml:
apiVersion: v2
name: cert-manager-clusterissuer
version: 0.1.0
cluster_issuer_prod.yaml:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: {{ .Values.acme_email }}
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            ingressClassName: {{ .Values.solvers_ingress_class_name }}
cluster_issuer_staging.yaml:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: {{ .Values.acme_email }}
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
      - http01:
          ingress:
            ingressClassName: {{ .Values.solvers_ingress_class_name }} |
This issue has been open for more than 3 years with no viable solution, and as most people here have already said, this renders the provider useless for half of its use cases. If time is the issue, I can invest some. Is there a maintainer available with whom a solution could be discussed? |
Terraform version, Kubernetes provider version and Kubernetes version
Terraform configuration
There is a bit going on here, but essentially this is the output from the terraform_flux_provider, and through some HCL abuse I'm massaging it into the right format.
Question
Essentially I am using the kubernetes_manifest resource, and am trying to:
Upon doing this I am greeted with an error during the plan because the CRDs have not been created and SSA is not happy about it:
Is there a way to tell the provider that things are ok, and not try to plan this? It seems like a bug or required feature before this comes out of experimental, as asking for someone to first apply the CRDs, then add and apply the CRs doesn't seem like a valid long term solution.