This repository has been archived by the owner on Jul 16, 2024. It is now read-only.

coupling of service.yaml on the continuous-deployment.yaml #33

Open
froilan opened this issue Jan 23, 2018 · 9 comments

@froilan

froilan commented Jan 23, 2018

Since service.yaml is defined in ecs-refarch-continuous-deployment.yaml, any change to that file, such as adding a new subnet, updates the service back to the initial image, "amazon/amazon-ecs-sample", which might be undesirable.

Assuming my understanding is correct, any ideas on how to mitigate that, or how to work around it?

@jpignata jpignata self-assigned this Jan 23, 2018
@jpignata
Contributor

jpignata commented Jan 23, 2018

Thanks for opening this! Your understanding is definitely correct. It'll also reset parameters like the DesiredCount, which also seems undesirable. I think we're going to have to figure out a way to separate the service resources from the rest of the stack, as we did in the previous iteration. Thinking about this some more, that would be insufficient to address the example you provided above. I'm talking to some folks internally to figure out what approach we should take here.

@jpignata
Contributor

I'm not sure there's a great answer here about how to handle both CloudFormation and CodePipeline mutating the same set of resources. If your goal is to manage your infrastructure using CloudFormation, you'd be better served using CodePipeline's CloudFormation integration to create new task definitions and change the service's task definition ARN, rather than using the native CodePipeline ECS integration on deployment. This would allow you to change the template in the S3 bucket or provide new subnet IDs in the pipeline and maintain a consistent state. This doesn't solve the DesiredCount issue, but let's set that aside for the moment.

You can use this version as a reference showing how to implement this. Note that Service is not a nested stack of the main template here.

I'm going to leave this open until a better answer emerges.

@jinty

jinty commented Mar 12, 2018

I'd just like to note that I've also been bitten by this. It's more of a chicken-and-egg situation: I need to define a custom entry point, so I really need a non-example container, but I can't get that until the stack is loaded and the pipeline is running.

@jpignata
Contributor

Thanks @jinty. 😔 Wish I had a better answer here. In some ways, the flexibility of the CloudFormation approach driven by CodePipeline lends itself to a more reusable example.

@SunlightJoe

I just wanted to let you know that I've spent the last couple of weeks building this version of the CD pipeline. It seems to work fine and I'm pretty happy with it. I ended up using Fn::ImportValue a lot because the ParameterOverrides got so long that they went over the 4096 character limit.

One major problem is that if I push a program that fails to launch but builds fine in Docker (maybe because of a missing parameter or something), ECS will keep trying to start the service over and over, and it takes CloudFormation 3 hours to time out. It's not great to have the pipeline stall for 3 hours. There seems to be no way to shorten that timeout. (I spent about 3 hours trying 😉 )

Obviously I'll need integration tests before trying to deploy to ECS, but another thing to try would be to create the TaskDefinition with the new image tag as another Pipeline stage and try to launch it via aws ecs run-task via CodeBuild after that. If it passes, then I can pass the new TaskDefinition.Arn to ECS instead of just the ECR.
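A minimal sketch of that run-task check, as it might appear in a CodeBuild step. The function name `smoke_test_task` and its three arguments are hypothetical, and the sketch assumes a Fargate task using awsvpc networking. It treats "the task reached RUNNING" as a pass, so a container that starts and then crashes a few seconds later could still slip through.

```shell
# Hypothetical helper (not part of the reference architecture).
# $1 = cluster name
# $2 = task definition ARN (or family:revision) to try out
# $3 = awsvpc network configuration string for the task
smoke_test_task() {
  # Start one standalone task from the candidate task definition.
  task_arn="$(aws ecs run-task \
    --cluster "$1" \
    --task-definition "$2" \
    --launch-type FARGATE \
    --network-configuration "$3" \
    --query 'tasks[0].taskArn' \
    --output text)" || return 1

  # The waiter polls describe-tasks and fails if the task stops
  # before it ever reaches RUNNING.
  if aws ecs wait tasks-running --cluster "$1" --tasks "$task_arn"; then
    # The task launched; stop it so it does not keep running.
    aws ecs stop-task --cluster "$1" --task "$task_arn" >/dev/null
    return 0
  fi
  return 1
}
```

If the function returns non-zero, the build step can fail early instead of handing the new task definition to the deploy stage.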

As for the DesiredCount, you can have CodeBuild run aws ecs describe-services as part of the building stage.
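One way this lookup might be written, assuming the build environment has the AWS CLI and permission to call `ecs:DescribeServices`. The function name `current_desired_count` is a hypothetical placeholder.

```shell
# Hypothetical helper: print the desired count the service has right now.
# $1 = cluster name, $2 = service name
current_desired_count() {
  aws ecs describe-services \
    --cluster "$1" \
    --services "$2" \
    --query 'services[0].desiredCount' \
    --output text
}
```

The printed value could then be passed back into the template as a parameter override, so that a stack update keeps the count that is currently in effect rather than the template default.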

I recommend merging the alternate implementation into master because what's on master right now isn't a usable example for anyone building and deploying their own applications. Otherwise I think everyone using this as a starting point will get bitten by this issue eventually.

@jpignata
Contributor

jpignata commented Apr 4, 2018

Thanks, @SunlightJoe. Can you expand on the "merging the alternate implementation" suggestion? Are you suggesting moving back toward using CloudFormation for the deploy stage?

@SunlightJoe

SunlightJoe commented Apr 9, 2018

I think I've found another solution.

When I was looking for a way to get around the stuck ECS service, somebody suggested setting the DesiredCount to 0. It worked! This gave me an idea.

I added a parameter for the DesiredCount that defaults to 0. When first creating the pipeline, ECR repository, and ECS service, I leave it at the default, and CloudFormation creates the service without actually trying to launch the task. I also have the CodeBuild step update the latest tag in ECR, which is what I use in the TaskDefinition.

After the first successful run of the pipeline that creates an image, I rerun CloudFormation via aws cloudformation deploy ... --parameter-overrides DesiredCount=1. This makes CloudFormation actually try to run the task and start the service, and it works.
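The two-phase deploy could be wrapped roughly as follows. This is a sketch, assuming the template declares a `DesiredCount` parameter as described above. The function name, stack name, and template path are hypothetical, and `CAPABILITY_IAM` is only needed if the template creates IAM resources.

```shell
# Hypothetical wrapper around `aws cloudformation deploy`.
# $1 = stack name, $2 = template file, $3 = desired count (required)
deploy_service_stack() {
  aws cloudformation deploy \
    --stack-name "$1" \
    --template-file "$2" \
    --capabilities CAPABILITY_IAM \
    --parameter-overrides "DesiredCount=${3:?desired count is required}"
}
```

The first call would be `deploy_service_stack my-stack service.yaml 0`. Once the pipeline has pushed an image, running `deploy_service_stack my-stack service.yaml 1` would start the service. The count is a required argument on purpose, so that a later redeploy cannot silently fall back to 0.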

This doesn't solve the problem of resetting the DesiredCount back to what you set during deploy, but it's a start. Maybe you could set it via aws ecs update-service ... --desired-count n as a CodeBuild step after the ECS step.
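A sketch of what such a CodeBuild step might run, using the `--desired-count` option of `aws ecs update-service`. The function name `restore_desired_count` is hypothetical, and the step's role would need `ecs:UpdateService` permission.

```shell
# Hypothetical helper: set the service's desired count and print the
# value ECS reports back.
# $1 = cluster name, $2 = service name, $3 = desired count
restore_desired_count() {
  aws ecs update-service \
    --cluster "$1" \
    --service "$2" \
    --desired-count "$3" \
    --query 'service.desiredCount' \
    --output text
}
```

Note that this changes the service outside of CloudFormation, so the stack's own DesiredCount parameter would no longer match the running service until the next stack update.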

@lyoungblood

lyoungblood commented May 19, 2018

@SunlightJoe this seems like a great workaround. However, I ran into a strange issue:

  • Launch the CFN with DesiredCount 0
  • Wait until the first CodePipeline runs and builds an artifact that is tagged with the commit hash
  • Update stack and change DesiredCount to 1
  • Containers try to launch at this point, the artifact is there in ECR (tagged with the commit hash), and the task definition is updated to point at the correct artifact, but the tasks fail to launch with CannotPullContainerError: API error (404): manifest for 476593617425.dkr.ecr.eu-west-1.amazonaws.com/baker-repos-1lvq9scec8c5o:latest not found

The only fix for this seems to be to do a local docker build/tag/push with a "latest" tag, which will do something to the repository that makes the Fargate task now able to pull the commit hash tag it was trying to pull all along.

Edit: I fixed this. By adding a build and post_build command that creates a latest tag, this now works. The steps you go through are:

  1. Deploy the CloudFormation with DesiredCount 0.
  2. Wait for the pipeline to finish completely.
  3. Update stack and change the DesiredCount to whatever you want it to be.

Here is my git diff output - @jpignata please let me know if you'd like a pull request for this:

diff --git a/templates/deployment-pipeline.yaml b/templates/deployment-pipeline.yaml
index 5526a09..80f3d67 100644
--- a/templates/deployment-pipeline.yaml
+++ b/templates/deployment-pipeline.yaml
@@ -127,9 +127,11 @@ Resources:
             build:
               commands:
                 - docker build --tag "$IMAGE_URI" .
+                - docker build --tag "${REPOSITORY_URI}:latest" .
             post_build:
               commands:
                 - docker push "$IMAGE_URI"
+                - docker push "${REPOSITORY_URI}:latest"
                 - printf '[{"name":"simple-app","imageUri":"%s"}]' "$IMAGE_URI" > images.json
           artifacts:
             files: images.json
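As a possible variation on the diff above (not what the diff itself does), the second `docker build` could be swapped for `docker tag`, which adds the latest tag to the image that was just built instead of running the build again. A sketch, with a hypothetical function name and the same two values the buildspec keeps in `IMAGE_URI` and `REPOSITORY_URI`:

```shell
# Hypothetical helper mirroring the build and post_build commands.
# $1 = full image URI with the commit-hash tag (IMAGE_URI in the buildspec)
# $2 = repository URI without a tag (REPOSITORY_URI in the buildspec)
build_and_push() {
  docker build --tag "$1" . &&
  # Point the "latest" tag at the image that was just built.
  docker tag "$1" "$2:latest" &&
  docker push "$1" &&
  docker push "$2:latest"
}
```

In the buildspec this would be called as `build_and_push "$IMAGE_URI" "$REPOSITORY_URI"`.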
