Is your feature request related to a problem? Please describe.
I'm always frustrated when I need to make stack updates, because doing so incurs downtime for the agent pool. When I deploy updates, all agents in that group are terminated and replaced. This leads to jobs failing with Exited with status -1 (agent lost). I then have to manually restart all of those jobs or rely on users to do so.
Describe the solution you'd like
I would like agents to drain their workload before being terminated and replaced during a stack update.
Describe alternatives you've considered
Performing the stack update during non-peak hours.
Manually creating an adjacent stack, migrating to the new stack, and then turning off the original stack.
Additional context
Perhaps using AWS lifecycle hooks to put instances in a Terminating:Wait state to allow draining would be helpful.
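As a rough illustration of that idea, here is a minimal boto3 sketch. The ASG name, hook name, and timeout are hypothetical placeholders (not part of the Elastic CI Stack), and it assumes something on or on behalf of the instance can tell when the local buildkite-agent has finished its jobs:

```python
# Sketch only: register a termination lifecycle hook on the agents' ASG so that
# instances enter Terminating:Wait instead of being terminated immediately.
# "my-buildkite-agents-asg" and the hook name are hypothetical placeholders.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_lifecycle_hook(
    LifecycleHookName="drain-buildkite-agent",
    AutoScalingGroupName="my-buildkite-agents-asg",
    LifecycleTransition="autoscaling:EC2_INSTANCE_TERMINATING",
    HeartbeatTimeout=3600,       # give the agent up to an hour to finish its jobs
    DefaultResult="CONTINUE",    # terminate anyway if nothing completes the hook
)


def release_instance(instance_id: str) -> None:
    """Once the local agent has drained, tell the ASG the instance may be terminated."""
    autoscaling.complete_lifecycle_action(
        LifecycleHookName="drain-buildkite-agent",
        AutoScalingGroupName="my-buildkite-agents-asg",
        LifecycleActionResult="CONTINUE",
        InstanceId=instance_id,
    )
```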
Alternatively, I could detach all instances from the ASG before the stack update, but then I have the problem of determining when those agents have drained and can be terminated. Maybe if the buildkite-agent service could detect the detached state and then drain the workload, that would be helpful.
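A sketch of that detach-then-drain alternative, again with boto3 and a hypothetical ASG name. The drain check is only a stand-in, since the whole problem is that there is currently no built-in way to detect when an agent has finished its work:

```python
# Sketch only: detach the running instances from the ASG before the stack update so
# replacements are launched, then terminate the old instances once their agents are
# idle. agent_is_drained() is a hypothetical placeholder for whatever drain signal
# ends up being available (e.g. polling the Buildkite API or an on-instance check).
import time
import boto3

ASG_NAME = "my-buildkite-agents-asg"   # hypothetical placeholder

autoscaling = boto3.client("autoscaling")
ec2 = boto3.client("ec2")


def agent_is_drained(instance_id: str) -> bool:
    """Placeholder drain check; not provided by the stack or the agent today."""
    raise NotImplementedError


def detach_and_drain(instance_ids: list[str]) -> None:
    # Detach without shrinking desired capacity so the ASG launches replacements.
    autoscaling.detach_instances(
        InstanceIds=instance_ids,
        AutoScalingGroupName=ASG_NAME,
        ShouldDecrementDesiredCapacity=False,
    )
    pending = set(instance_ids)
    while pending:
        for instance_id in list(pending):
            if agent_is_drained(instance_id):
                ec2.terminate_instances(InstanceIds=[instance_id])
                pending.remove(instance_id)
        time.sleep(60)
```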
It was configured like this 6 years ago, so there was most probably a reason for doing it that way, but I'm not sure if it's still relevant. I haven't found any confirmation in this repo or the agent's repo.