add HTTPCallbackAddress, needed when running within a kubernetes cluster #268

lknite · 2024-11-27T03:31:54Z

Describe the change you are making here!

When running within a kubernetes workflow/pipelines, the http server used by the vm to communicate back completion cannot be reached directly. Instead, with kubernetes you 'expose' a pod via a LoadBalancer ip or ingress. The exposed ip is available but must be passed in somehow in order for it to be used.

This pull request adds a variable HTTPCallbackAddress, which if provided, can be used by a provider to pass the needed ip to the vm.

Another pull request has been submitted on the packer-plugin-proxmox as a first candidate once this change is in place (hashicorp/packer-plugin-proxmox#305), which will resolve hashicorp/packer-plugin-proxmox#304 .

There must also be template source code for those who are looking to create a provider. Such templated code should be updated also, eventually. I will look to do that later on.

Please include tests. Check out existent tests in the code for example.

The only test would be to see if the variable ends up in the config.hcl2spec.go . I suspect there already exist tests which test mapstructure-to-hcl2.

If your PR resolves any open issue(s), please indicate them like this so they will be closed when your PR is merged:

Closes #264

hashicorp-cla-app · 2024-11-27T03:32:08Z

All committers have signed the CLA.

lbajolet-hashicorp · 2025-01-20T22:07:29Z

Hey @lknite,

Regarding this PR, I have a few things to discuss before we can merge this if you don't mind. The code is very simple, but before we introduce this to the config/step here, I want to make sure we pick the right name/behaviour, as once it's in there, we cannot remove it if we've missed something.

Regarding the name: I don't like the Callback part here, the idea of that configuration variable is to override all the other methods to populate HTTPIP essentially, is that correct? It's not a callback URI much more than any of the other potentially inferred addresses can be, therefore I would get rid of this in the variable's name to make it clearer. Maybe consider something like http_server_static_host, which imho is clearer.
In terms of implementation, I am not certain the SDK is where we want to introduce this, as I'm not sure this is something we want to be broadly available. For the plugins that support connecting to an HTTP server that Packer exposes, generally the expectation is that they run on the same machine. While network configuration can be something complex, generally it's something that the plugin dealing with the HTTP server is able to infer from its topology and how the VM is started for the build. Having an extra global configuration knob for those attributes feels a bit early, especially since there's already a bunch of magic with the existing attributes there.

In the current state, I would not accept this today, as I don't have enough confidence that there isn't a weird edge-case that could throw-off the proposed implementation. I would think that, since Proxmox is the plugin you are interested in, you can probably implement this in a self-contained manner in that plugin for now (by adding an attribute to this plugin only for now), and later we can discuss whether or not to make this a global property.

Thanks for the PR, and sorry we won't merge it right now on the SDK at least.

lknite · 2025-01-20T23:18:50Z

Calling that variable "static_host" is pretty much the opposite of how kubernetes does things, trying to say in the most polite way possible that there maybe not be enough kubernetes knowledge on both sides of this one. I used 'callback' to differentiate the variable from the others, that it was more of a callback in the way oauth does in its workflow when its done.

In all my years of working with open source this was perhaps my proudest contribution. Spending whole days ... weeks of effort, trying to get image builder working within the kubernetes environment, talking with others who had tried and gave up, eager to have me succeed, looking forward to the update so they could use it, and maintainers who weren't really interested, I persisted and kept digging in until I found it, 3 tiny changes in 3 repos. Changes which enabled the use via kubernetes without interfering with anything else. The only step remaining, get buy-in and support from three repos simultaneously.

I can tell there has been a miscommunication, because you believe this goal can be accomplished leaving your repo out of the equation, and though I haven't looked at this code for about 90 days, I'm pretty sure it wasn't really possible without all three. It may be possible, but not with two tiny changes. Without this tiny change, the other two become major work arounds.

I'm not sure I have the energy to drive this any further if there's a total lack of interest... but there really should be, on your side, to want this to work within a kubernetes... one would think. In any case, its your project, if there's no interest, so be it.

If there is some interest, we can address them one at a time.

The concern there might be a weird edge cases is difficult for me to address. Maybe if you were to look at the pull requests in the other two repos it might help to make the over-all picture more clear.

If you let this one go I will to. If you reply back I can help to clarify things. This pull request is the type that if it isn't brought in with this effort, someone else will make the effort again later, or the tool will be replaced by something which works within the kubernetes environment. So ... in the end I suppose it doesn't matter. I can run the image-builder app via a docker container on a linux vm, so I have a work around. I was hoping to redesign the image-builder project's testing platform to run via kubernetes, but can't do so without the ability to run within kubernetes. I don't architect things outside of kubernetes anymore. I'm sure you are busy, its ok whatever you decide. I certainly wouldn't want to encourage a pull request if its a bad idea.

lknite · 2025-01-20T23:28:30Z

I did have one additional thought, it may be that rearchitecting things to always use the HTTPCallbackAddress strategy and rip out the existing strategy, because it is becoming "aged", expecting 0.0.0.0 to be used in a world of cloud solutions and all, might be a better way to go ... but I didn't want to do all that cause it'd be kind of a massive overhaul. The strategy in this pull request could be a seed towards that other direction though. Along that line, I think Callback is more descriptive, with the only other choice being perhaps Webhook. Alternatively, I could see some other solution being used in the future such as NATS for messaging, or a redis queue. Once things are spun up in kubernetes, additional micro-services become easily available.

I just googled 'callback vs webhook', maybe webhook is better?

Additional note: You mentioned that having a global option, via the sdk feels a bit early, kubernetes was released June 2014... but I know what you mean. I don't want to pressure you in any way, but here to interact with if doing so would be helpful.

lknite · 2025-01-20T23:48:43Z

I mentioned wanting to redesign to the image-builder testing process, just so that's more than words, here's the issue tracking that potential idea: kubernetes-sigs/image-builder#1585 (comment)

lknite · 2025-01-23T22:14:43Z

I'll post to the image-builder channel to see if there might be some other folks there interested in this pull request and able to add to the conversation.

lknite · 2025-01-23T22:31:42Z

You mentioned the new solution replacing the old, let me clarify, the existing solution still needs to be in place to bind to 0.0.0.0 , its just that the ip the vm uses to reach back to that server must be specified within a kubernetes environment, using the ip of the system where the 0.0.0.0 bind occured won't work ... so everything which currently exists needs to stay its just that we need to be able to specify the callback, which will either be an ip address or it can be an FQDN.

The existing solution just uses the ip of the system where the bind occurs, somehow we have to check if an ip was specified and use that for the callback ... without changing the bind 0.0.0.0 . If the code is inspected it looks like you can specify an ip, but doing so replaces the 0.0.0.0 with the ip specified, that's not what we want.

Someone may want to use the existing solution to change the 0.0.0.0 to something they specify, we wouldn't want to break that if they are using it, so I created a new environment variable. In two other git repos I added an "if" statement that checks the if callback is specified, then that's passed to the vm to communicate back when its finished, otherwise it has no effect (if not specified).

Once this is implemented and the other two git repos are implemented, and we can see it working, I would then over-time begin visiting the other image-builder related projects one at a time adding the two if statements so that they can work via kubernetes as well.

lknite · 2025-01-23T23:27:16Z

Regarding the name, I hesitate on 'static' though agree we should decide on something together as once it is in there it might stick around for awhile, some thoughts:

When something is initially deployed via kubernetes its is unreachable by the world, it must first be exposed. This is generally done through 1 of 2 methods. A LoadBalancerIP is requested, and provided automatically via kubernetes. We won't know what it is. Via cloud providers, each loadbalancerip costs money, and so ... the 2nd option is often used. A single loadbalancerip is used, and then a reverse-proxy is used to expose everything via FQDN. All the FQDNs resolve to the same single loadbalancerip.

The second option requires additional setup but is very common, cert-manger to generate certificates automatically, external-dns to register the FQDN with DNS, and a loadbalancer such as nginx to handle the reverse proxy. My preference would be to use an FQDN rather than a LoadBalancerIP, but folks might prefer to use a LoadBalancerIP.

If a LoadBalancerIP is used, it'll pretty much always be the same, but that shouldn't be expected. It's possible to force it to be the same but that's kind of falling back on "before kubernetes existed" and before infrastructure as code. Kubernetes lets you just kind of define here's the idea of how I want this deployed, then it fills in the gaps, generating the certs, getting an ip and registering that in dns ... you just deploy, then access via the FQDN without neededing to know anything in between.

Long story short, I would expect a loadbalancerip to change and would probably use an fqdn which overtime might resolve to different ips. If FQDN wasn't an option I'd suggest a variable name like "loadbalancerip", but since FQDN is an option, maybe something more specific to its purpose like "VM_STATUS_UPDATE_TEMPORARY_MESSAGING_SERVER", or more practically maybe "VmMsgServer"? I feel like the vm is "calling home" to report its ready, another reason I initially suggested Callback.

lknite · 2025-01-24T00:06:07Z

If I could have someone with the packer team make the change, without my help, I would think the best solution would be to "adjust" things so the localip is never used in any scenario automatically, that a "callbackurl" or whatever it is called, would always have to be set and used, and by default ... if no one sets it, then it would be set as the local ip ... but in all cases the 'callbackurl' would be used. ... with this pull request as the next best thing and least impactful.

lknite · 2025-01-24T17:01:41Z

I've been talking with a couple image-builder maintainers today and they are now fully aware of the problem, that it effects multiple image providers (they've been seeing the same issue reported now and then), and what it is that the pull requests solve. Now they are thinking about the issue. Essentially going through the same process I did. Let's see if they come up with the same solution or an alternative. At least a couple providers have created a custom work around. They are revisiting security etc ... I've asked them to update here with a note that they are aware of the issue so its not just me. Let's give them some time ...

add HTTPCallbackAddress, needed when running within a kubernetes cluster

7f2ae73

lknite requested a review from a team as a code owner November 27, 2024 03:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add HTTPCallbackAddress, needed when running within a kubernetes cluster #268

add HTTPCallbackAddress, needed when running within a kubernetes cluster #268

lknite commented Nov 27, 2024 •

edited

Loading

hashicorp-cla-app bot commented Nov 27, 2024 •

edited

Loading

lbajolet-hashicorp commented Jan 20, 2025

lknite commented Jan 20, 2025

lknite commented Jan 20, 2025 •

edited

Loading

lknite commented Jan 20, 2025

lknite commented Jan 23, 2025 •

edited

Loading

lknite commented Jan 23, 2025 •

edited

Loading

lknite commented Jan 23, 2025

lknite commented Jan 24, 2025

lknite commented Jan 24, 2025

add HTTPCallbackAddress, needed when running within a kubernetes cluster #268

Are you sure you want to change the base?

add HTTPCallbackAddress, needed when running within a kubernetes cluster #268

Conversation

lknite commented Nov 27, 2024 • edited Loading

hashicorp-cla-app bot commented Nov 27, 2024 • edited Loading

lbajolet-hashicorp commented Jan 20, 2025

lknite commented Jan 20, 2025

lknite commented Jan 20, 2025 • edited Loading

lknite commented Jan 20, 2025

lknite commented Jan 23, 2025 • edited Loading

lknite commented Jan 23, 2025 • edited Loading

lknite commented Jan 23, 2025

lknite commented Jan 24, 2025

lknite commented Jan 24, 2025

lknite commented Nov 27, 2024 •

edited

Loading

hashicorp-cla-app bot commented Nov 27, 2024 •

edited

Loading

lknite commented Jan 20, 2025 •

edited

Loading

lknite commented Jan 23, 2025 •

edited

Loading

lknite commented Jan 23, 2025 •

edited

Loading