Skip to content
This repository has been archived by the owner on Aug 15, 2019. It is now read-only.

Docker feedback #1

Open
jarshwah opened this issue Oct 11, 2018 · 25 comments
Open

Docker feedback #1

jarshwah opened this issue Oct 11, 2018 · 25 comments

Comments

@jarshwah
Copy link

jarshwah commented Oct 11, 2018

This will be a meta issue for collecting feedback. Otherwise we can just create an issue per item and close this one off, up to you.

Regarding the extra dependencies, can you add them from the dockerfile, install them, then volume mount over the top? Don't copy the entire django folder over, just the requirements.

Try combining as many instructions as you can to minimise the number of layers. For example, you have 5 RUN instructions in a row, they could all be combined. https://github.com/orf/django-docker-box/blob/master/Dockerfile#L23

I have no other specific feedback yet because I've just glanced at some of the files, I'm waiting for the weekend before I take it for a spin.

@orf
Copy link
Owner

orf commented Oct 12, 2018

Thanks for this! The meta issue is fine 👍

Regarding the dependencies: this brings some trouble. I would like to be able to have Django live in a directory outside of this project. Unfortunately the Django .git directory is rather large, so if you build the project with the django root as the context it has to copy the entire 400+mb tree (including .git) to the docker daemon before building which is pretty slow. The solution is to add .git to a .dockerignore but we don't control the django source directory and so cannot do this.

A fix might be to use a symlink within this project, i.e ./django/ -> ~/projects/django/, and have django/.git in a .dockerignore here but I am unsure how or if this would work on Windows.

I've worked around this by persisting the pip cache across runs, so installing the dependencies is pretty fast. If this does get accepted into the django project we should add a .dockerignore to speed things up and remove the need for that volume.

About the Dockerfile, good points, but I'm more concerned right now about getting a green build across the board just to prove that this is viable: https://travis-ci.com/orf/django-docker-box/builds/87770210

Right now there is:

  1. a persistent gis failure on Sqlite (Add a gc.collect() to make GDALBandTests.test_band_data more deterministic django/django#10501, but this isn't the correct solution)
  2. two failures in master on mysql 8 (https://code.djangoproject.com/ticket/29847#ticket)
  3. persistent failures regarding the database images taking too long to start, resulting in the test runner failing to connect, so we need to add a healthcheck of some kind

@jarshwah
Copy link
Author

Regarding the requirements, what I was thinking about was something like:

ARG DJANGO_PATH
RUN mkdir /requirements
COPY $DJANGO_PATH/tests/requirements/*.txt /requirements/
COPY extra-requirements.txt /requirements/
RUN pip install -r /requirements/py3.txt -r /requirements/..etc.txt

Then you can still volume mount the django directory, and all of the requirements have been built into the image.

Which of the databases is taking too long to start? I see pg has a healthcheck but the mysql varieties do not.

Also missing a headless test runner using the chrome or firefox drivers. There's an xvfb example in the vagrant image but I'm not sure how well that works. It might be worth looking into using chrome headless directly - but I have no experience with it.

If this project can get total coverage then it might even be attractive to depend on travis.ci directly and move the django project away from Jenkins. I don't know how well that would be received, but there's at least some burden of managing those jenkins instances and the ansible scripts behind them so I'd be keen to raise that possibility.

(Sorry, I was away for most of the weekend and didn't get much chance to dive in like I wanted to)

@apollo13
Copy link
Collaborator

I am very much interested in moving this forward; so if there is anything where you need help please ping me on IRC.

@orf
Copy link
Owner

orf commented Oct 21, 2018

Thanks for your help @apollo13, much appreciated!

So, a few things about the implementation. If we go with @jarshwah's suggestion (which is more clean, for sure) we run into some issues. To do this we need to specify the Django path as a the build context. That's fine. The problem is we have a .env file specifying the default environment variables (like PYTHON_VERSION), however it seems that this file has to be relative to the build context. So it will look in $DJANGO_PATH/.env. This is quite annoying, as every single compose run needs at least two variables.

So, a quickstart would become: PYTHON_VERSION=3.6 POSTGRES_VERSION=10 docker-compose run postgres ..., which is pretty verbose. We'd also loose all default values, and to be extra annoying docker-compose fails with a pretty unobvious error in the case above (because postgres:${POSTGRES_VERSION} resolves to postgres: if POSTGRES_VERSION it's not defined, which is in invalid reference). We could work around this by defining defaults in the Dockerfile and the compose definition I guess, I will try this tonight.

Also, while it's conceptually more 'dockery' it will actually be slower on the CI run, as you cannot mount volumes during the build. So the dependencies would always need to be downloaded, whereas now they are cached between runs in a volume. I'm not too fussed about this as the CI is not the primary use case here (yet!), however even a 40 second increase in each run across a full matrix equals ~40 minutes or so added in total.

As a side note, if we are going to consider a Jenkins replacement, Travis CI gives large open source projects special accounts with more concurrent runners. The exact specifics are done on a case-by-case basis, but Django qualifies for this.

@orf
Copy link
Owner

orf commented Oct 21, 2018

Also, as a side note, in the future with this we can potentially test on MacOS (while not a deployment platform, it's a huge development platform) and Windows. A full run across each of these would be out of the question but a minimal test suite may not be.

@jarshwah
Copy link
Author

I wasn't suggesting to use the Django directory as the build context by the way. I was suggesting we should copy the requirements files from the django directory, but leave the build context as-is. Does that change things in any way?

@orf
Copy link
Owner

orf commented Oct 21, 2018

Oh right! Unfortunately that is not possible without the build context being Django (unless we duplicate the requirements files here). Docker COPY commands are relative to the build context, so you can't do COPY /any/directory/any/file.

Making Django the build context isn't a bad idea, I've implemented it in #11 and it all seems to work OK.

@carltongibson
Copy link

Hi @orf. I've been using this now, it's great. Thank you!

I was going to mark the Trac ticket as Patch needs improvement since I don't think it's quite ready to progress, but here seems the place to discuss currently so... (we can move the discussion elsewhere if you prefer).

So, first off, a question: what's the benefit to moving the files into django/django rather than say, django/django-docker-box and setting DJANGO_PATH as we do here? The PR adds quite a lot of files, and I suspect quite frequent edits, so lots of noise — could we not just keep that separate?

Second, for me at least, having done nothing to optimise in any way, performance is quite down on running without docker: 16mins vs 4mins.

(That's with the images already built etc.)

(That's running with --parallel=2 to keep it fair, because docker is setup with the default "2 CPUs", but none of that is even mentioned on the Quickstart...)

Any thoughts on this?

I have various other little thoughts, that tie-in to the tweaks suggested here and in the other issues here and on the PR, but they really come under "frequent edits, so lots of noise" above.

Current status: we should definitely pull this into the Django org, as an official project etc, point people towards it and so on, but not include the files directly in django/django or make it the top option in the testing Quickstart.

@adamchainz
Copy link

+1 to moving into django org and pointing people towards it without including in django/django.

@orf
Copy link
Owner

orf commented Feb 18, 2019

Thanks for your feedback Carlton! I'll try and address a few points below.

Moving under Django: yes please! I can transfer the repo directly or we could create a new one, up to you. Let me know how to proceed.

Speed: it really should not be that slow, unless perhaps you are on Windows and using Virtual box as your docker server? I'd love to diagnose why this is, so any info you can give me would be great.

On Linux the speed should be native, on Mac it should be nearly native. With hyper-v it should still be fast.

Regarding including this inside Django: the main reason is the simplicity of running this while it's inside Django. Just clone and go (in theory 😁).

The secondary reason is it unlocks a lot of cool improvements down the line. Using docker-compose as the primary test runner for Django has some real tangible benefits - we can adapt the way we run tests via pull requests, the CI is always in sync with the developers, less reliance on Jenkins, easier to adapt other services (like the VS CI solution that's recently been proposed). Plus there is a lot less maintenance of Jenkins in general, and it's easier to expand our coverage.

In short, this repo running a single full-matrix build a day has caught a few bugs. It would be good to run it on every PR, or at least improve our coverage, and I see docker-compose as a gateway to making faster improvements to this in the future.

And the third reason is that it's a bit annoying to keep it separate. It's not really a separate project, it's intimately tied to Django and needs to be kept in sync (specifically versions, but other stuff as well). It feels natural to me to move this into core.

But, baby steps I guess. If not everyone agrees with this then I'm happy to keep it separate.

Edit: I would definitely not like to maintain the PR if I do not have to, so it would be good to get a consensus on this. I started the PR after someone suggested bringing it into core on the mailing list, so perhaps it warrants some more discussion there?

@jarshwah
Copy link
Author

jarshwah commented Feb 18, 2019 via email

@carltongibson
Copy link

I'm using Docker on macOS. Just default settings.

I think we need to go back to the mailing list...

@carltongibson
Copy link

One thought: If we had an "official" image users could just docker pull to get up and running. (???)

@orf
Copy link
Owner

orf commented Feb 18, 2019

I'm using Docker on macOS. Just default settings.

Ok, let me dig into this a bit. As @jarshwah mentioned there are some volume options we can play with, so I will run some benchmarks locally and see any improvements.

One thought: If we had an "official" image users could just docker pull to get up and running. (???)

So that would ideally be part of this project. Right now the Travis builds pull and push an image under my account on every build and I would definitely like to keep that, which might require the Django project to create an account on Docker Hub. But yeah, this is the twofold advantage of not having to rebuild endlessly on every CI run, and also every developer can pull the latest image used to run the test suite for the latest versions of packages, configuration changes and obviously so they don't need to build it locally. That should be the preferred way of running this project.

It would be docker-compose pull mysql/postgres/sqlite/etc rather than docker-pull though, and you'd still need the docker-compose.yml to do anything interesting.

@jarshwah
Copy link
Author

My last response was written on the go which is why it was so terse, apologies.

It seems like the major concern for you Carlton is that there are a lot of new files, and there'll be a lot of churn as a consequence. I don't believe this is something that'll actually happen. The docker-compose and .dockerignore would live in the root. The test settings could be moved to the tests directory. The Dockerfile/Entrypoint can be moved to a separate sub directory, though the convention is to leave them in the root.

The dockerfiles themselves should rarely change - and would only do so to provide optimisations. When a new Python/DB version is added or removed, the env var files will change as a consequence. Which brings me to one of the benefits. The configuration of the containers (and docker-compose) stays current with the version of django that is checked out.

@adamchainz keen to hear why you'd prefer the repo to remain separate?

If we're already using official channels to point towards a supported repo, then why not keep the middle step and bundle the extra files? The repo has little value without being connected to Django, so we can make that connection strong and easy for new contributors to use, or not as strong with supporting documentation that users are going to find somewhat more difficult to follow.

@carltongibson
Copy link

carltongibson commented Feb 19, 2019

Yes, my main concern is churn. It seems to me there are lots of opinions on how Docker files (in general, rather than just Dockerfiles) should look, and that these things are still evolving rapidly (what was good last version, isn't this) and so I think there would be lots of changes (and so noise).

I guess, I'd favour official project + see how that goes as a stepping-stone, or test phase... I like the vision of the all improved CI, but I suspect it will take some time™ to get there. I'd rather be sure it was all going to work as promised before merging it in.

Separate point: I'm not (yet) convinced that "install docker" etc is easier for a potential contributor than "create a venv" etc. EVERY Python user deals with venvs already. (I already accept that venvs are confusion for beginners, e.g. when learning Django. I guess I think, they've got used to them by the time they hit contributing...)

@jarshwah
Copy link
Author

jarshwah commented Feb 19, 2019 via email

@carltongibson
Copy link

carltongibson commented Feb 19, 2019

Sure, totally. But the issue there is just What goes at the top of the quickstart? Running against SQLite is plenty good enough for most issues, for most new contributors, I'd say. (Anyhow... 🙂)

@orf
Copy link
Owner

orf commented Feb 19, 2019

I would be more than happy to move it below virtualenvs in the quickstart, I guess it is a more specialized test suite. If you are making changes to views or templates then it's unneeded overhead.

Let's work on first moving this to an official Django project, and then see how it goes. I think that's the best way forward. I can adapt the PR to include instructions on how to clone and run the project.

As a side note, I think the automated testing has real value in this project. What do we think of keeping that Travis integration when we move this to the Django org?

@carltongibson
Copy link

...I think the automated testing has real value in this project...

Yeah. I don't know exactly how that roles-out, but it's the great hope right. (Who knows most about this stuff? @apollo13 has the right permisssions etc, I think.)

(I wonder about these GitHub Actions I keep reading about. Can we use those to trigger building images and doing CI etc?)

@apollo13
Copy link
Collaborator

Yes, I should be able to help with the move into an official repo. I am just wondering if we shouldn't just commit to django/django directly and do not do the dance via an intermediate repository before.

I haven't really used github actions myself, but from what I have heard they are pretty basic for now (which is completely okay, I just don't think we'd run a whole ci against it).

@carltongibson
Copy link

Update: we still want to move this into Django. (Was talking to @felixxm just yesterday...)

I'm coming round to the idea that we should be keeping things like Dockerfiles and CI config in the main repo (if nothing else to make then community maintained) (despite churn) but I think we probably need to do it in a two-step. (Move to Django Org, let it sit for a while, maybe move to main repo.)

@apollo13 Can we just make this happen?

@apollo13
Copy link
Collaborator

Yes, if @orf adds me as maintainer I should be able to move into the django org. Then we will probably have to install docker on the CI server and see how it goes. One thing I am wondering a bit is how concurreny works, can we just execute stuff twice or will we have problems with overlapping container names?

@orf
Copy link
Owner

orf commented May 22, 2019

I'm not sure I can add you as a maintainer, I only see options for a collaborator (which you already are). I can transfer this repo to you @apollo13 (if you delete your current fork), or alternatively you could just fork this one under the Django org? I'm not sure we'd loose much by doing it that way.

As for concurrency, docker-compose ensures that this doesn't happen (in theory). If you use the -p flag with a Jenkins build ID then containers will have unique names and will not clash.

Sorry that I've been inactive on this recently, I've been incredibly busy with real life and have not had any time to push this forward.

@orf
Copy link
Owner

orf commented Jun 2, 2019

Thanks for all the feedback in this issue everyone. The repo has been migrated, however I can't migrate the issues. That's not too much of a problem, so I'm going to archive this repo 👍

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants