Refactor Dockerfiles to use common base layer #92
Conversation
- Dockerfile.base contains all the Bifrost dependencies.
- Dockerfile inherits Dockerfile.base and contains the Bifrost build.
- The old Dockerfile.cpu and Dockerfile.gpu are removed.
- The FROM line and build args in the Dockerfiles are parameterised using ARGs, which are inserted by the Makefile (a workaround is currently needed for the FROM line because it doesn't support ARG, but this was recently fixed in upstream Docker, so eventually we can do it properly).
- The Makefile defines 4 targets: docker-base (GPU container with dependencies only), docker (GPU container with Bifrost installed), docker-base-cpu (CPU container with dependencies only), and docker-cpu (CPU container with Bifrost installed).
- We can now also build containers with different OS images by specifying CUDA_IMAGE_NAME or CPU_IMAGE_NAME, e.g. "make docker CUDA_IMAGE_NAME=nvidia/cuda:7.5 IMAGE_NAME=ledatelescope/bifrost-cuda-7.5", which allows us to easily test compatibility with older versions of CUDA.
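For illustration, the FROM-line workaround and the image-name overrides fit together roughly like this (a sketch only; the default image names and recipe details here are assumptions, not necessarily exactly what the PR implements):

# Sketch of a Makefile-driven build; recipe lines need real tabs in a Makefile
CUDA_IMAGE_NAME ?= nvidia/cuda:8.0-devel        # assumed default base image
IMAGE_NAME      ?= ledatelescope/bifrost

docker-base:
	echo "FROM $(CUDA_IMAGE_NAME)" > _Dockerfile.tmp
	cat Dockerfile.base >> _Dockerfile.tmp
	docker build --pull -t $(IMAGE_NAME)-base -f _Dockerfile.tmp .
	rm -f _Dockerfile.tmp

The docker target then does the same with Dockerfile on top of the freshly built base image, and the -cpu variants substitute CPU_IMAGE_NAME.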
Sorry that my review has a big red "X" on the top of it; I wasn't sure what the difference between the options for suggesting changes was. This PR is great.
Some general comments:
(I can push some of these changes in the morning (AEST morning))
- README needs an update for the new make-based build instructions
- vim, nano, doxygen could be removed as dependencies (if we are aiming for a minimal container with working bifrost, as the user can always add more)
- Having never seen a project with a Dockerfile that requires pre-processing by Make (though I do really like the idea), I think we could have an additional "standard" Dockerfile for people who don't like to read every word of a README: a file with an explicit FROM at the top, so you can build it with a vanilla docker build ....
- Since Dockerfile is a filename required by Docker Hub (weird, I know, but even the official library images are all just Dockerfile in different directories), the Dockerfiles actually can't have pre-processing if we want to host images on Docker Hub. We can still have them as is, but with different names (and maybe in a different directory, because there are so many front-page files now... I'll move out the travis docs builder too), then with explicit Dockerfiles on the front.
- The Dockerfile should pull from a ledatelescope/bifrost:deps image from Docker Hub once we get it set up (to avoid rebuilding unchanged dependencies).
- I can set one up off my fork until I get approval from @ledatelescope to set up a Docker Hub under ledatelescope.
- We may want to rename the default local images to be of the form bifrost... instead of ledatelescope/bifrost, as the latter is where the Docker Hub images will live.
- I get errors with the version.py file during make docker:
Traceback (most recent call last):
File "setup.py", line 34, in <module>
with open(bifrost_version_file, 'r') as version_file:
IOError: [Errno 2] No such file or directory: 'bifrost/version.py'
make[1]: *** [clean] Error 1
(it still builds and runs though!)
- Everything builds and works (aside from a couple of broken unit tests in the GPU container; I'll post them at the bottom)
- The git clone of pyclibrary should be replaced by the instructions I put in the README (pip install git+https...), as it's faster and can be combined with the above pip for one less layer (see the sketch after this list). Thanks @caseyjlaw for showing me this magic.
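For the "standard" Dockerfile and the pyclibrary point, a rough sketch (the base image, package list, and pyclibrary repository URL here are assumptions, not the PR's actual contents):

# Hypothetical vanilla Dockerfile fragment with an explicit FROM,
# buildable with a plain "docker build ."
FROM nvidia/cuda:8.0-devel
# Install pip, then pull pyclibrary straight from git in the same pip layer
# instead of a separate git clone step
RUN apt-get update && \
    apt-get install -y python-pip && \
    pip install numpy git+https://github.com/MatthieuDartiailh/pyclibrary.git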
Other
(Unrelated) Here's a nice hack I found (arguments to docker run):
-v /var/run/docker.sock:/var/run/docker.sock -v /usr/bin/docker:/usr/bin/docker
This lets you control the docker daemon from within a container, so you never have to leave.
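For example (the image name here is just a placeholder):

docker run -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /usr/bin/docker:/usr/bin/docker \
    ledatelescope/bifrost /bin/bash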
Here are the errors I saw from the GPU unit tests. The last one is weird; I think it should be np.testing.assert_almost_equal to kill tiny rounding errors? (A sketch of that follows the test output below.)
linalg.cu:148 Condition failed: "Supported dtype for array a"
linalg.cu:148 error 22: BF_STATUS_UNSUPPORTED_DTYPE
linalg.cu:243 error 22: BF_STATUS_UNSUPPORTED_DTYPE
E..................F................s.........................
======================================================================
ERROR: test_matmul_aa_ci8 (test_linalg.TestLinAlg)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/workspace/bifrost/test/test_linalg.py", line 75, in test_matmul_aa_ci8
self.run_test_matmul_aa_ci8_shape((11,23))
File "/workspace/bifrost/test/test_linalg.py", line 54, in run_test_matmul_aa_ci8_shape
self.linalg.matmul(1, a, None, 0, c)
File "build/bdist.linux-x86_64/egg/bifrost/linalg.py", line 59, in matmul
beta, c_array))
File "build/bdist.linux-x86_64/egg/bifrost/libbifrost.py", line 154, in _check
raise RuntimeError(status_str)
RuntimeError: BF_STATUS_UNSUPPORTED_DTYPE
======================================================================
FAIL: test_repr (test_ndarray.NDArrayTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/workspace/bifrost/test/test_ndarray.py", line 62, in test_repr
self.assertEqual(repr_f, repr_k)
AssertionError: '([[ 0., 1.],\n [ 2., 3.],\n [ 4., 5.]], dtype=float32)' != '([[ 0., 1.],\n [ 2., 3.],\n [ 4., 5.]], dtype=float32)'
----------------------------------------------------------------------
Ran 160 tests in 189.538s
FAILED (failures=1, errors=1, skipped=3)
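Regarding the assert_almost_equal suggestion above, a sketch of what that comparison could look like (the function and argument names are placeholders, not the actual test fixtures):

import numpy as np

def assert_arrays_close(expected, actual):
    # Compare values with a small tolerance instead of comparing repr()
    # strings, so tiny rounding/formatting differences don't fail the test
    np.testing.assert_array_almost_equal(np.asarray(expected), np.asarray(actual))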
docker:
	docker build --pull -t $(IMAGE_NAME):$(LIBBIFROST_MAJOR).$(LIBBIFROST_MINOR) -f Dockerfile.gpu -t $(IMAGE_NAME) .
docker-base:
	echo "FROM $(CUDA_IMAGE_NAME)" > _Dockerfile.tmp
These could all be .dockerfile.tmp instead of _dockerfile.tmp; the latter seems like a Python-specific convention for private/hidden objects.
SGTM
COPY . .
RUN make clean && \
    make -j16 $make_args && \
    make doc && \
I've heard the best Docker practice is to make containers as minimal as possible(?), so I don't think the full doxygen reference should go in. It would be hard to visualize it efficiently without setting up X11 as well, and we have a website to hold it all anyway.
Oh, this make clean is probably causing the version.py error. Does that need to be in there? Is it there just in case the user already built bifrost outside the container?
Removing the make doc sounds like the right idea.
Indeed, the make clean is important to avoid strange errors when the user has already built the library outside the container. I'll look into what's going on with version.py.
We could probably also do make clean again after make install, along the lines of the sketch below.
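Something like this, perhaps (a sketch only, with make doc dropped and an extra clean after install; the make arguments are whatever the Dockerfile already passes):

RUN make clean && \
    make -j16 $make_args && \
    make install && \
    make clean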
@@ -0,0 +1,19 @@
# Note: FROM command is inserted by Makefile
# TODO: As of Docker-CE 17.05, we could use this better approach instead:
We can also build from streams (moby/moby#31236) which is nice, so we won't have to create a temporary file within the build context.
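A rough sketch of what that could look like (image names are placeholders, and this assumes a Docker version that supports reading the Dockerfile from stdin via -f -):

( echo "FROM nvidia/cuda:8.0-devel"; cat Dockerfile.base ) | \
    docker build -f - -t bifrost-base .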
# Build the library
WORKDIR /bifrost
COPY . .
We should create a .dockerignore file with at least .git included.
I find it can be useful to have the git history inside the container, but perhaps I should be doing that with your approach of using the base container and mapping the source directory instead. In that case, this sounds like a good idea.
The specification of base images from the command line is very nice. I will have to read up on how Travis tests different languages, and whether there is a similar setting where we can set different environment variables to run over.
Thanks Miles, fantastic review! Lots of excellent points. Re the test failures, what CUDA version and GPU model is this running with?
CUDA 8.0 on a Tesla K80 (it's an AWS instance).
I set up Docker Hub on my fork. You can pull with, e.g.:
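docker pull mcranmer/bifrost    # reconstructed example; repository name taken from the tags link below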
The default (:latest) is the CPU docker. Here are all the images: https://hub.docker.com/r/mcranmer/bifrost/tags/
We could also pull from Docker library images that eliminate redundancy in our builds, e.g. python:2.7-slim (which comes with pip/setuptools and basic build dependencies).
Actually, I forgot that that won't work, as they need the nvidia/cuda base instead of debian:jessie. Do you know of any NVIDIA Docker library which attempts to re-create some of the official docker-library images using an nvidia/cuda base? It looks like quite a few of the official images are built off of Debian variants, which means it wouldn't break anything to use nvidia/cuda as the base image but otherwise leave the Dockerfile identical. I made one for buildpack-deps:xenial (https://github.com/MilesCranmer/docker-cuda-buildpack), but it is obviously not official.
ledatelescope now has a Docker Hub: https://hub.docker.com/r/ledatelescope/bifrost/ The only image is
New image built on top of pypy is
The nvidia/cuda:10.2 image disappeared from the hub, so the GPU versions in master were not buildable anymore. Also, the two that build bifrost were not yet using the configure script. However, the full-build `.gpu` version doesn't quite work, because nvidia-docker seems to provide access to the GPU during the run phase but not during the build phase. Perhaps relevant: https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime The `_prereq.gpu` version builds fully and is still helpful. Also pinging PR #92, which is outdated but relevant.