ci: Refactor Dockerfile & entrypoint #8923
base: main
eda21a9 to 32ef2ef
32ef2ef to 6932d9a
The test was using a custom config file set in `test_variables`. However, the file was not included in the Docker image, and the entrypoint script created a new, default one under the original file's path. Zebra then loaded this new file, and the test passed because the pattern in `grep_patterns` matched Zebra's output containing the original path, even though the config file was different.
cdc54ba to c5010b8
485423a to e9f0479
@upbqdn can the motivation be expanded/updated here? For future reference, it might be confusing if someone looks at the PR and understands that all these changes were required to fix the use of the |
I haven't fully reviewed the `entrypoint.sh`, as some of these requested changes might have a slight impact there. Changes to CI files, unless they're related to the Docker changes, should be in a different PR that depends on this one.
ARG UID=10001
ENV UID=${UID}
ARG GID=10001
ENV GID=${GID}

RUN addgroup --system --gid ${GID} ${USER} \
    && adduser \
    --system \
    --disabled-login \
    --shell /bin/bash \
    --home ${APP_HOME} \
    --uid "${UID}" \
    --gid "${GID}" \
    ${USER}
There might be instances where a user would like to (re-)build the image with a custom UID:GID, as they might need to mount files from their host that would otherwise be incompatible with the UID:GID of the container user. It's also a good Docker practice to specify the UID for the default user for other edge cases.
Unless there's a good reason to remove this, I'd suggest keeping it.
The PR that added this has some references to the reasoning behind it: #8803 (comment)
Note that we're creating a system user. Those users should have UIDs in [1, 999].
In Debian, the docs for `adduser` say that the default dynamic range for system UIDs is defined by [FIRST_SYSTEM_UID, LAST_SYSTEM_UID], which is [100, 999] in `/etc/adduser.conf`: https://manpages.debian.org/bullseye/adduser/adduser.8.en.html
Moreover, the Linux spec says that system UIDs in [100, 499] should be reserved for dynamic allocation: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/uidrange.html
I'm surprised the Docker docs don't mention this.
`adduser` automatically checks if a UID is available before assigning one from the right range. Since I wasn't sure what UID to pick, and no user brought it up, I removed it to avoid scope creep. I'd prefer to address this issue according to the spec once a user brings it up.
Unfortunately, this is not clear in Docker's documentation. But some of their tools, `docker init` for example, output the recommended approach, which is the following (with `alpine` linux):
ARG UID=10001
RUN adduser \
    --disabled-password \
    --gecos "" \
    --home "/nonexistent" \
    --shell "/sbin/nologin" \
    --no-create-home \
    --uid "${UID}" \
    appuser
USER appuser
In any case, to match their recommendation in `debian`, this would change to:
RUN addgroup --system --gid ${GID} ${USER} \
    && useradd \
    --uid "${UID}" \
    --no-create-home \
    --home-dir /nonexistent \
    --shell /usr/sbin/nologin \
    --comment "" \
    appuser
The main reasons to keep this instruction are:
- Host and Container Alignment: When you mount host directories into the container, file permissions are based on `UID` and `GID`. If the container's user IDs match those on the host, you avoid permission issues.
  - I've experienced this myself when I was testing the docker-compose and mounting a cached state from my host into the container, which had a different UID:GID, and the container was not able to write to it.
- Predictable IDs: By specifying the UID and GID, you ensure that the user inside the container has the same IDs across different builds, deployments, and host systems.
Additionally, this is very common in Docker, and plenty of projects use this approach to deal with these and other use-cases: https://github.com/search?q=%22useradd%22+%22--uid%22++language%3ADockerfile&type=code
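To make the host and container alignment point concrete, here is a minimal sketch of how someone could derive build arguments from their host IDs. It assumes the Dockerfile keeps the `UID` and `GID` build arguments shown in the snippet under review; the helper name `host_id_build_args` is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints --build-arg flags matching the invoking host
# user's numeric UID and GID. Assumes the Dockerfile declares `ARG UID` and
# `ARG GID`, as in the snippet being reviewed.
host_id_build_args() {
    # `--` stops printf from treating the format string as an option.
    printf -- '--build-arg UID=%s --build-arg GID=%s\n' "$(id -u)" "$(id -g)"
}

# Possible usage (not executed here):
#   docker build -f ./docker/Dockerfile --target runtime $(host_id_build_args) .
host_id_build_args
```

With matching IDs, files in a bind-mounted host directory should be writable by the container user without changing ownership on the host.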
ARG APP_HOME
ENV APP_HOME=${APP_HOME}
WORKDIR ${APP_HOME}
We should set a `WORKDIR`, otherwise the user will end up in the `/` directory with no permissions, which they might require for testing/personal purposes.
The `ARG + ENV` combination also allows the user to set a custom directory, in case their host permissions do not allow the one we've chosen.
System users should have no home dir and should not even be able to log in to the machine. Our users should use `-u root` when they're logging in to the machine. The logic I had in mind is very simple and minimal:
- The whole `runtime` target has its entrypoint executed under the non-privileged `zebra` system user which has no home dir and no login (just as in a regular Linux env).
- The Dockerfile sets up the minimal requirements for the `zebra` user to execute the entrypoint.
- When our user wants to do some tweaks, they explicitly use `-u root`, which they can always do, and which gives them the clarity that they have full privileges.
I wanted to describe this in our docs in a subsequent PR. Is it OK if we do it like that?
ARG FEATURES
ENV FEATURES=${FEATURES}
If the `runtime` stage is built with custom `FEATURES`, this will be propagated by default as an `ENV` variable to the `entrypoint.sh`. If we remove these instructions, then we need to always specify the `FEATURES` environment variable when running the image, otherwise it will be empty.
Yes, but that var is not used in any meaningful way in `entrypoint.sh`, so I removed it. I would like to refactor the way we configure Zebra inside Docker in follow-up PRs.
It's used here:
https://github.com/ZcashFoundation/zebra/pull/8923/files#diff-4f5cabe26761257a4d685a6edc7a43e0fe0f78762f50eeb48530f2bd3b3ee7caR81
and here:
https://github.com/ZcashFoundation/zebra/pull/8923/files#diff-4f5cabe26761257a4d685a6edc7a43e0fe0f78762f50eeb48530f2bd3b3ee7caR100
From our documentation, if we suggest:
docker build -f ./docker/Dockerfile --target runtime --build-arg FEATURES='default-release-binaries prometheus' --tag local/zebra.mining:latest .
The `entrypoint` will evaluate the `$FEATURES` var and it will be empty, as it was never defined as a variable. This would be confusing for the user, as they built the image with `--build-arg FEATURES='default-release-binaries prometheus'` but the configuration file is not adding the corresponding section, because that argument was not passed as an environment variable (at build time) to the container.
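The effect described above can be sketched in plain shell. This is only an illustration: `has_feature` is a hypothetical helper, and the actual checks in `entrypoint.sh` may be written differently. It shows that a feature test sees nothing when `FEATURES` exists only as a build-time `ARG`, and succeeds once the value is present in the runtime environment.

```shell
#!/usr/bin/env bash
# Hypothetical feature check; the real entrypoint.sh logic may differ.
has_feature() {
    # $1: feature name. Reads the FEATURES environment variable and treats
    # it as a space-separated list; an unset variable behaves like "".
    case " ${FEATURES:-} " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Case 1: FEATURES was only a build-time ARG, so it is absent at runtime.
unset FEATURES
if has_feature prometheus; then echo "feature section added"; else echo "feature section skipped"; fi

# Case 2: the ARG was re-exported with ENV, or passed at `docker run` time.
FEATURES='default-release-binaries prometheus'
if has_feature prometheus; then echo "feature section added"; else echo "feature section skipped"; fi
```

The first check prints `feature section skipped` and the second prints `feature section added`, which mirrors the confusion a user would hit after building with `--build-arg FEATURES=...` alone.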
RUN mkdir -p ${ZEBRA_CONF_DIR} && chown ${UID}:${UID} ${ZEBRA_CONF_DIR} \
    && chown ${UID}:${UID} ${APP_HOME}
Having a `HOME` directory for the application is a good practice; it also starts the container in an empty directory the users can use as they see fit.
System users should have no home dir. Our users should go with `-u root`. Another approach would be to execute the entrypoint under `root`, and then run `zebrad` under the `zebra` system user. Our users could then log in implicitly as `root` without `-u root`, but that adds a bit of extra complexity to the entrypoint, which I wanted to keep simple for now.
If we're not allowing the user to log in with the user running Zebra, then we should document the approach they can use to get a `bash/sh` terminal in the container to troubleshoot in it.
prepare_env_vars

if [[ ! -f "${ZEBRA_CONF_PATH}" ]] && [[ -d "${ZEBRA_CONF_DIR}" ]]; then
  ZEBRA_CONF_PATH="${ZEBRA_CONF_DIR}/zebrad.toml"
This default should be set in the Dockerfile, to keep it as the single source of truth for default variables.
Which var do you have in mind?
We can use the var you defined: `ZEBRA_CONF_PATH`. But have the default value defined in the Dockerfile, as we do here:
Line 189 in afeb05f
ENV ZEBRA_CONF_DIR=${ZEBRA_CONF_DIR}
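One way to read the "single source of truth" suggestion is that the entrypoint only consumes the variable and never invents a fallback of its own. The sketch below assumes the image provides `ZEBRA_CONF_PATH` through an `ENV` instruction; the function name `resolve_conf_path` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint fragment: the default is expected to come from the
# Dockerfile (an ENV instruction), so the script only validates and reads it.
resolve_conf_path() {
    if [ -z "${ZEBRA_CONF_PATH:-}" ]; then
        echo "ZEBRA_CONF_PATH is not set; expected a default from the image" >&2
        return 1
    fi
    printf '%s\n' "${ZEBRA_CONF_PATH}"
}

# Possible usage inside an entrypoint (not executed here):
#   conf_path="$(resolve_conf_path)" || exit 1
```

Failing loudly when the variable is missing keeps the Dockerfile as the only place where the default can change.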
# Test reconfiguring the docker image for testnet.
test-configuration-file-testnet:
  name: Test CI testnet Docker config file
  # Make sure Zebra can sync the genesis block on testnet
  uses: ./.github/workflows/sub-test-zebra-config.yml
  with:
    test_id: 'testnet-conf'
    docker_image: ${{ vars.GAR_BASE }}/${{ vars.CI_IMAGE_NAME }}@${{ inputs.image_digest }}
    grep_patterns: '-e "net.*=.*Test.*estimated progress to chain tip.*Genesis" -e "net.*=.*Test.*estimated progress to chain tip.*BeforeOverwinter"'
    # TODO: improve the entrypoint to avoid using `ENTRYPOINT_FEATURES=""`
    test_variables: '-e NETWORK -e ZEBRA_CONF_PATH="/etc/zebrad/zebrad.toml" -e ENTRYPOINT_FEATURES=""'
    network: 'Testnet'
This test was created to confirm that any change we make in CI or in Docker doesn't affect the ability to read the proper `$NETWORK` environment variable. It has happened before that some changes broke this behavior and the tests then ran on `Mainnet` instead of `Testnet`, but we realized it too late or had to wait for some tests to run to confirm it.
This test is failing with this PR, similarly to the other one. I have a better approach in mind, which I didn't do in this PR. Let's add it back in a subsequent PR?
Moreover, there seem to be some parts that are bugs or hard to understand. For example, I couldn't figure out what `-e NETWORK` in `test_variables` is supposed to do. Also, setting `ENTRYPOINT_FEATURES` to the empty string to enable the test in the entrypoint makes it very hard to follow the execution path in the whole pipeline.
`-e NETWORK` tells docker to use whatever value the `$NETWORK` env variable is set to, to override any default that was set at build time, or in the Dockerfile. This happens here:
zebra/.github/workflows/sub-test-zebra-config.yml
Lines 90 to 91 in 46c6b6e
env:
  NETWORK: '${{ inputs.network }}'
This has saved me (and others) from making mistakes multiple times while making CI refactors, so this is very important under those circumstances.
I do agree that setting `ENTRYPOINT_FEATURES` to an empty string is a dirty hack to make this work, but that's tech debt that wouldn't justify removing the whole test. In any case, we can remove the use of the `ENTRYPOINT_FEATURES` variable while keeping this test behavior.
I'd suggest commenting this out and adding a TODO on top of it, or creating an issue, instead of removing the test, just so we don't forget later on, as this is an important validation.
# Test that Zebra works using $ZEBRA_CONF_PATH config
test-zebra-conf-path:
  name: Test CD custom Docker config file
  needs: build
  uses: ./.github/workflows/sub-test-zebra-config.yml
  with:
    test_id: 'custom-conf'
    docker_image: ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }}
    grep_patterns: '-e "loaded zebrad config.*config_path.*=.*v1.0.0-rc.2.toml"'
    test_variables: '-e NETWORK -e ZEBRA_CONF_PATH="zebrad/tests/common/configs/v1.0.0-rc.2.toml"'
    network: ${{ inputs.network || vars.ZCASH_NETWORK }}
Although very simple, the objective of this test is to confirm that the `entrypoint` is able to handle a custom configuration path (`$ZEBRA_CONF_PATH`), run with it, and confirm that path is being used.
This could be extended to validate that it is running with a mounted file using `--mount type=bind,source="$(pwd)"/target,target=/app` as part of the `test_variables`, but that's out of scope.
The test was using a custom config file set in `test_variables`. However, the file was not included in the Docker image, and the entrypoint script created a new, default one under the original file's path. Zebra then loaded this new file, and the test passed because the pattern in `grep_patterns` matched Zebra's output containing the original path, even though the config file was different.
The test fails in this PR due to the fixes in the entrypoint.
I'd suggest commenting this out and adding a TODO on top of it, or creating an issue, just so we don't forget later on.
For either of them, I'd suggest indicating something like:
We need to create a test that validates we can mount a configuration file to a different path than the default used by `ZEBRA_CONF_PATH`, and that Zebra runs using this new file and path.
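The false positive discussed in this thread came from a default file being generated at the requested path. As a rough sketch of the opposite behavior, the hypothetical guard below refuses to continue when the requested file does not exist, so a future test could not pass on a silently regenerated config. The function name and messages are assumptions, not code from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail when a custom config path was requested but no
# file exists there, instead of silently generating a default one.
require_config_file() {
    # $1: path the user asked Zebra to load
    if [ ! -f "$1" ]; then
        echo "config file not found: $1" >&2
        return 1
    fi
    echo "using config file: $1"
}

# Demonstration with a temporary file standing in for a mounted config.
tmp_conf="$(mktemp)"
require_config_file "${tmp_conf}"
rm -f "${tmp_conf}"
require_config_file "${tmp_conf}" || echo "refused to start"
```

A test built on this idea would also want to assert on the file's contents, not only on the path that appears in the logs.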
I updated the PR description.
My yml linter updated the formatting of the |
Do we want to do anything further here? Otherwise we should either merge or close without merging.
I added some review comments which are pending for a reply, and we should also consider the latest interaction we had with some users in Discord, as some changes related to permission handling, mounting a configuration file, and the use of |
I'm planning to address the comments and get the PR approved.
Motivation & Solution
The current `Dockerfile` and `entrypoint.sh` files contain a bunch of bugs. This PR contains the following changes:
- […] `runtime` Docker stage and switch to it.
- […] `gosu`.
- […] `runtime` stage.
- […] `EXPOSE` instruction in Docker.
Tests
Manually test that `zebrad` runs under the new `zebra` user:
Running
displays
PR Checklist