Intermittent HTTP 400 responses when pushing to caches #271

Open
dpipemazo opened this issue May 3, 2020 · 32 comments

@dpipemazo

dpipemazo commented May 3, 2020

Hello, we're using buildx in the CircleCI build chain for the atom project. We're noticing intermittent HTTP/400 responses when pushing cache back to Docker Hub.

An example workflow exhibiting this failure can be seen here

The build invocation command looks similar to the below for all failures:

docker buildx build \
    --platform=linux/amd64 \
    --progress plain \
    --load \
    -f Dockerfile \
    -t ${DOCKERHUB_ORG}/${DOCKERHUB_ATOM_REPO}:build-${CIRCLE_WORKFLOW_ID} --target=atom \
    --build-arg BASE_IMAGE=${DOCKERHUB_ORG}/${DOCKERHUB_ATOM_REPO}:base-3050 \
    --build-arg PRODUCTION_IMAGE=debian:buster-slim \
    --pull \
    --cache-from=type=registry,ref=${DOCKERHUB_ORG}/${DOCKERHUB_CACHE_REPO}:atom \
    --cache-to=type=registry,ref=${DOCKERHUB_ORG}/${DOCKERHUB_CACHE_REPO}:atom,mode=max .

We're pointing --cache-from and --cache-to at the same Docker Hub repo, with a cache tag that's reused across our build jobs. It's worth noting that we may have multiple jobs running in parallel that pull from and push to this cache tag, and I'm not sure whether that could be exacerbating the issue. That said, the failure has occurred both when a build was the only one running and when multiple builds were running at once.

The error always occurs at the writing manifest stage of the --cache-to push:

#34 writing layer sha256:fd80cd7eb0067d2a1272bfd46d71d2bc52f3b10c5f77f59986f55f04ce037cbf
#34 writing layer sha256:fd80cd7eb0067d2a1272bfd46d71d2bc52f3b10c5f77f59986f55f04ce037cbf 0.2s done
#34 writing config sha256:6d34f356902ecc7880e15a20f914852dc3b3311ea394d41166411217dd447710
#34 writing config sha256:6d34f356902ecc7880e15a20f914852dc3b3311ea394d41166411217dd447710 1.0s done
#34 writing manifest sha256:78e10324ddbb99f2ef901e77b7a8945b8c83c80fa690f3bd928f33798303a38b
#34 writing manifest sha256:78e10324ddbb99f2ef901e77b7a8945b8c83c80fa690f3bd928f33798303a38b 1.4s done
#34 ERROR: error writing manifest blob: failed commit on ref "sha256:78e10324ddbb99f2ef901e77b7a8945b8c83c80fa690f3bd928f33798303a38b": unexpected status: 400 Bad Request
------
 > exporting cache:
------
failed to solve: rpc error: code = Unknown desc = error writing manifest blob: failed commit on ref "sha256:78e10324ddbb99f2ef901e77b7a8945b8c83c80fa690f3bd928f33798303a38b": unexpected status: 400 Bad Request

Other examples of the same failure: 1 2 3 4 5

It seems to happen on roughly 1 to 5 percent of builds, which seems odd. Most of the failing jobs are ones in which nothing in the build changed and the entire build should already be stored in the cache -- I'm not sure if that plays a role. I have also seen it fail on jobs where the build changed significantly, though.

Builds are done using CircleCI's ubuntu-1604:202004-01 machine which is running:

  • Ubuntu 16.04
  • docker 19.03.8
  • docker-compose 1.25.5

I realize these cache-push errors might be on the Docker Hub side, so I'm not entirely sure this is the right place to file the issue. One feature that would be nice, though, is a flag to turn off build failure (and the nonzero exit status) when the cache push fails. When this failure occurs I need to restart my build jobs (some of which are cross-compilation jobs for aarch64 that unfortunately take multiple hours). It would be nice to have a flag that prints an error/warning but returns a zero exit code, since the build itself is OK and only the cache push failed.

Thanks for taking a look -- happy to provide any more info/context as needed.

@k911

k911 commented May 23, 2020

I can confirm the same buggy behaviour, with a slightly different environment set-up:

  • CI: CircleCI
  • Registry: docker hub (docker.io)
  • Docker daemon: 18.09.3 (remote docker)
    # https://github.com/k911/swoole-bundle/blob/develop/.circleci/config.yml
    # ...
    setup_remote_docker:
      version: 18.09.3
  • Docker client: custom docker image based on docker 19.03.8 official image (Dockerfile) (Docker Hub)
    • docker-compose: 1.25.5
    • docker buildx: 0.4.1

Example failed builds:

  • 1 - Timeout on exporting cache
  • 2 - Bad Request 400 on exporting cache
  • 3 - Bad Request 400 on exporting cache
  • 4 - Bad Request 400 on exporting cache

EDIT: Recently we switched to:

setup_remote_docker:
  version: 19.03.8

But nothing changed.

@ts-mini

ts-mini commented Jun 24, 2020

We're currently experiencing similar issues with a stack like @k911's, except that we're using Docker Hub as the cache registry while the final destination is gcr.io. Can provide details for investigation if needed.

@opsnull

opsnull commented Jul 16, 2020

Same issue here; I'm using Harbor as the cache registry.

@shouze

shouze commented Aug 20, 2020

Not seeing it with Docker Hub, but it seems that GitHub Docker Packages and AWS ECR always fail with 400 when trying to push cache layers to the repository (despite having announced manifest support back on May 1st, 2020).

@viceice

viceice commented Sep 29, 2020

We are seeing this too. 😢

https://github.com/renovatebot/docker-renovate/runs/1181364013

@shouze

shouze commented Sep 29, 2020

Has anyone tested this since the GitHub Container Registry announcement (Sept. 17th)?

@saulshanabrook

saulshanabrook commented Oct 27, 2020

Yeah, I am also seeing an issue where I am getting 400s on the GitHub Container Registry.

EDIT: I opened a thread on their community forum https://github.community/t/cannot-push-cache-layer-to-ghcr/140190

@Silvenga

Silvenga commented Nov 5, 2020

I had to go searching for a registry that didn't have issues. It looks like the Azure Container Registry offering is compatible.

@arlyon

arlyon commented Nov 26, 2020

@Silvenga could you maybe provide some context here as to which registries you've tried? I have personally had issues with ghcr, ecr, and docker hub:

  • github container registry (new): broken pipe / connection reset by peer / 502 bad gateway
  • docker hub: connection reset by peer
  • aws elastic container: connection reset by peer

I can confirm Azure is reliable (and cheap!), which is fine for me in the meantime. A trial of their basic tier is enough for a year's free usage.

@Silvenga

I was getting 400-level errors from GitHub's registry (non-preview version). I never tried Docker Hub or AWS's offering.

Didn't AWS's offering have issues today? Maybe related?

@chinmaya-n

chinmaya-n commented Jul 1, 2021

We are having the exact same issue when pushing cache to our on-prem Artifactory Docker Registry.
Just like the OP, the error always occurs at the writing manifest stage of the --cache-to push. However, this is not intermittent for us; it happens consistently.

Client: Docker Engine - Community
   Version:           19.03.12
   API version:       1.40
   Go version:        go1.13.10
   Git commit:        48a66213fe
   Built:             Mon Jun 22 15:42:53 2020
   OS/Arch:           linux/amd64
   Experimental:      false

Any updates/progress on this?

@grubenhund

Same issue with Harbor here:

github.com/docker/buildx v0.5.1-docker 11057da37336192bfc57d81e02359ba7ba848e4a

Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
Harbor
Version v2.2.2-56d7937f

A build with --cache-to fails with:

 => => writing config sha256:89e0fedd4ea581cc7c18fbfb194c6274997a26e7f2bd6e21ffe1b9d9161dc367                                                                                                                 1.0s
 => => writing manifest sha256:ccaa61b5bdf7bafe60b44a9f80d7905d55ad8d6dfb8c672f9c1b764c2a07e073                                                                                                               0.8s
 => [auth] library/testcache:pull,push token for registry.this                                                                                                                           0.0s
------
 > exporting cache:
------
error: failed to solve: rpc error: code = Unknown desc = error writing manifest blob: failed commit on ref "sha256:ccaa61b5bdf7bafe60b44a9f80d7905d55ad8d6dfb8c672f9c1b764c2a07e073": unexpected status: 404 Not Found

@artyom-p

Still no updates on this?

@joekhoobyar

Also seeing the same issue.

@GuyMev

GuyMev commented Feb 16, 2022

Same issue but with ECR

@stanleymho

Also saw the same issue.

@royletron

For anyone getting this with Google Container Registry, move to Artifact Registry instead - that shifted the problem for us. Sadly, can't give much of a clue for anyone else!

@l4d2boomer

Same issue with Harbor when exporting cache :(

@JuergenKindler

JuergenKindler commented Jan 12, 2023

Also seeing this with Artifactory 7.21.12 ... but it's not at all intermittent. Currently I am blocked, as it happens every time. Same as for @chinmaya-n.

@VelocityLight

VelocityLight commented Jan 14, 2023

Also seeing this with buildx v2 and ECR.

@Gyvastis

Same here. No update yet on what it might be related to?

@mfittko

mfittko commented Apr 17, 2023

I've now switched to local caching with EFS as the backing cache filesystem. Hope that this will be fixed for ECR soon though! I saw that there was some progress recently 🤞 aws/containers-roadmap#876 (comment)

@sheurich

sheurich commented Nov 9, 2023

For those using (legacy) Google Container Registry, only the cache layers need to be stored in Artifact Registry. This still allows storage of the final image in GCR.
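
As an illustration, a rough sketch of that split, assuming a hypothetical Artifact Registry repository named build-cache (the project ID and image names below are placeholders, not taken from this thread):

docker buildx build . --push \
    -t gcr.io/${PROJECT_ID}/myapp:latest \
    --cache-from=type=registry,ref=us-docker.pkg.dev/${PROJECT_ID}/build-cache/myapp:buildcache \
    --cache-to=type=registry,ref=us-docker.pkg.dev/${PROJECT_ID}/build-cache/myapp:buildcache,mode=max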

@abhijit838

It works with JFrog repositories but fails for Quay repositories.

@FeryET

FeryET commented Dec 14, 2023

It works with JFrog repositories but fails for Quay repositories.

It's failing for JFrog for me.

@klevo

klevo commented Jan 2, 2024

With Amazon ECR I got this error constantly and it was solved by adding ,image-manifest=true,oci-mediatypes=true to the --cache-to portion of the command, as per the example at https://aws.amazon.com/blogs/containers/announcing-remote-cache-support-in-amazon-ecr-for-buildkit-clients/
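
For reference, a rough sketch of what the full flag looks like with those options, using a placeholder ECR registry and image name (illustrative only, not from this thread):

docker buildx build . --push \
    -t ${ECR_REGISTRY}/myapp:latest \
    --cache-from=type=registry,ref=${ECR_REGISTRY}/myapp:buildcache \
    --cache-to=type=registry,ref=${ECR_REGISTRY}/myapp:buildcache,mode=max,image-manifest=true,oci-mediatypes=true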

@viceice

viceice commented Feb 9, 2024

image-manifest=true seems to be enough; oci-mediatypes=true has been the default value for a long time.
Let's see how it works with multi-arch images.

@FosuLiang

Same issue for me... still no solution here?

@lowc1012

I still get this error intermittently with image-manifest=true when exporting caches to ECR.

@eathtespagheti

I'm getting this error too on ECR, with both image-manifest=true and oci-mediatypes=true set.

@colbynh

colbynh commented Sep 10, 2024

I've now switched to local caching with EFS as the backing cache filesystem. Hope that this will be fixed for ECR soon though! I saw that there was some progress recently 🤞 aws/containers-roadmap#876 (comment)

Interesting! I tried to do this since we have self-hosted runners. I got it hooked up with EFS, and it does export the cache, but it never seems to actually get cache hits -- it always does a full build. I'm curious if you could share how you were able to get it to work?

I tried switching to the ECR registry cache instead, but like others I'm getting the same 400 issue.

Here's what I tried with the local cache, which didn't work either:

docker buildx build . -t <my_ecr_repo>/php7.4-apache-base:latest -f ../Dockerfile_php7.4-apache --push --cache-to=type=local,mode=max,dest=/mnt/efs/.buildx-cache-new --cache-from=type=local,src=/mnt/efs/.buildx-cache
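
One guess about the missing cache hits with the two-directory local pattern: the freshly written dest directory has to be rotated into the src path between builds, otherwise --cache-from keeps reading an empty or stale directory. A rough sketch, assuming the same /mnt/efs paths as above:

docker buildx build . -t <my_ecr_repo>/php7.4-apache-base:latest -f ../Dockerfile_php7.4-apache --push \
    --cache-from=type=local,src=/mnt/efs/.buildx-cache \
    --cache-to=type=local,mode=max,dest=/mnt/efs/.buildx-cache-new
# rotate the cache so the next build reads what this one just wrote
rm -rf /mnt/efs/.buildx-cache
mv /mnt/efs/.buildx-cache-new /mnt/efs/.buildx-cache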

@fdutey

fdutey commented Sep 26, 2024

Don't know if this answer will help anyone, but in my case this error was only happening during some production builds.
My production builds are the only builds that update the "latest" tag.
I also noticed that the only builds that were failing were pushing to immutable ECR repositories.

The solution was to make the repos mutable. I don't like it, but it solved the issue.
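
For anyone in the same boat, a rough sketch of switching an existing ECR repository to mutable tags with the AWS CLI (the repository name is a placeholder):

aws ecr put-image-tag-mutability \
    --repository-name my-cache-repo \
    --image-tag-mutability MUTABLE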

HippocampusGirl added a commit to HALFpipe/RAMP that referenced this issue Dec 6, 2024
Fix "manifest blob unknown" error when pushing buildcache as per
docker/buildx#271 (comment)