Rootless Docker/Optimized build #932

PseudoResonance · 2025-01-19T04:35:02Z

This is a replacement for #894

Changes

Add unneeded files to .dockerignore (I chose to leave security/license in the image, as those could be useful if someone inspects it)
In addition to feat(docker): copy PHP extensions from builder stage to speedup the b… #918, split Dockerfile into more stages to allow Composer/Yarn to install concurrently
Ex: The stages 2-1, 2-2 can run independently of each other, and so can 3-1 and 3-2.
Don't log supervisord to a file, as file logging in a Docker container makes no sense
Redirect process output to supervisord/container output for log processors
It's generally important to not simply discard all logs. Let the Docker host handle logfile rotation if using file logging, or let the host redirect to their desired log processor, such as Grafana Loki.
Run all processes as non-root
Replace cron with supercronic, as it's designed to easily run as non-root and redirect output wherever you want it
Minimize files with write permission for non-root user
The majority of the build should ideally be left as root, so that in the event there is a vulnerability, the user would be unable to easily overwrite the working code. I don't think there's a good solution to the Laravel cache however.
Move docker folder out of .github, as it has nothing to do with GitHub
Use php-extension-installer wrapper that handles automatically adding build dependencies and removing dev dependencies after building to simplify package list.

Additional Information

As in #894, I separated the install/build stages for Composer/Yarn so that the install stage can be skipped from cache if the code is unchanged. I know there were questions about it in the last PR, but this is standard practice in Dockerfiles. First copy the list of dependencies to install and install them, then copy the remainder of the code.

After these changes, I was able to iterate on the build process and rebuild the docker image in mere seconds, compared to the several minutes it took previously, as the whole image had to be rebuilt every time because the code was copied early on.

Important Note

I am using COPY --exclude=Caddyfile --exclude=docker/ . ./ to copy files, which is a new addition to the Dockerfile copy directive which is currently only present in the 1.7 labs spec of Dockerfile. It's been around for a year now and I don't expect this basic use of it to break. This does require declaring the 1.7-labs spec at the top of the file though, which should eventually be removed when exclude is merged to stable.

Other Options

Instead of using supercronic and supervisord and everything to run as non-root, sidecar containers can also be added. This is how some other projects, like Nextcloud, deal with running cron tasks, and running a more configurable HTTP server.

Add unneeded files to .dockerignore Split Dockerfile into more stages to allow Composer/Yarn to run concurrently Don't log supervisord to a file, as file logging in a Docker container makes no sense Redirect process output to container output for log processors Run all processes as non-root Minimize files with write permission for non-root user Move docker folder out of .github, as it has nothing to do with GitHub

PseudoResonance · 2025-01-19T04:37:37Z

This additionally resolves the issue mentioned in #894, where the entrypoint breaks permissions, as everything is run as www-data now.

PseudoResonance · 2025-01-19T05:46:39Z

I just tried rerunning this build on GitHub Actions, and while it's not a very scientific test, it saved quite some time. I think mostly due to the parallelization.

	Original	This PR
amd64	4:00	3:04
arm64	6:38	3:11

It appears there is a bug with Docker Buildx GitHub Actions caching when using multiple OS runners... So the caching actually only helps on one of the 2 images at random...

ed1473f fixes this by specifying separate scopes for each OS.

As an example, first run used 0% cache, 2nd run with 0 changes finished in seconds, and a code change. ran in a bit over a minute, because dependency stages were cached.

.github/workflows/docker-publish.yml

Dockerfile

… supervisord user

QuintenQVD0 · 2025-01-19T10:26:54Z

I can confirm the build time is faster.

I can confirm it does migrate the db
The slow arm64 builder will be fixt by GitHub as there are some known issues with it.

This is it build from inside the container https://paste.pelistuff.com/YAA6pK8PXW7HbKMsrfiyCGyF

PseudoResonance · 2025-01-19T10:34:47Z

I can confirm it does migrate the db The slow arm64 builder will be fixt by GitHub as there are some known issues with it.

I'm not really sure why the arm64 builds were so slow on GitHub prior to what I changed. I don't know if anything I changed affected it in particular, but I have also heard that the runners might be buggy still.

Also I think I got all the permissions correct, but if anyone knows there's some other place that needs write access, that should be added too. As far as I know, only database, storage and bootstrap/cache get written to?

PseudoResonance · 2025-01-21T06:10:05Z

Here are the current permissions after the latest commit.

/var/www/html

drwxr-x---    1 root     www-data      4096 Jan 20 13:48 .
drwxr-xr-x    1 root     root          4096 Jan 17 01:27 ..
lrwxrwxrwx    1 root     root            18 Jan 20 13:48 .env -> /pelican-data/.env
-rw-r-----    1 root     www-data       121 Jan  7 05:11 .env.example
drwxr-x---    1 root     www-data      4096 Jan 19 08:03 app
-rw-r-----    1 root     www-data       350 Jan  7 05:11 artisan
drwxr-x---    1 root     www-data      4096 Jan  8 13:06 bootstrap
-rw-r-----    1 root     www-data      3332 Jan 19 05:29 composer.json
-rw-r-----    1 root     www-data    516684 Jan 19 05:29 composer.lock
drwxr-x---    1 root     www-data      4096 Jan 19 08:03 config
-rw-r-----    1 root     www-data       495 Jan  7 05:11 crowdin.yml
drwxr-x---    1 root     www-data      4096 Jan 20 13:48 database
drwxr-xr-x    2 root     root          4096 Jan 20 13:48 docker
drwxr-x---    1 root     www-data      4096 Jan  7 05:11 lang
-rw-r-----    1 root     www-data     34524 Jan  7 05:11 license
-rw-r-----    1 root     www-data       532 Jan 19 10:10 package.json
-rw-r-----    1 root     www-data       227 Jan  7 05:11 pint.json
-rw-r-----    1 root     www-data       143 Jan  8 13:06 postcss.config.js
drwxr-x---    1 root     www-data      4096 Jan 19 08:03 public
drwxr-x---    1 root     www-data      4096 Jan  8 13:32 resources
drwxr-x---    1 root     www-data      4096 Jan 19 08:03 routes
-rw-r-----    1 root     www-data       392 Jan  7 05:11 security.md
drwxrwx---    1 www-data www-data      4096 Jan  7 05:11 storage
-rw-r-----    1 root     www-data       400 Jan  8 13:06 tailwind.config.js
drwxr-x---    1 root     www-data      4096 Jan  7 05:11 tests
drwxr-x---    1 root     www-data      4096 Jan 20 13:48 vendor
-rw-r-----    1 root     www-data       359 Jan  8 13:06 vite.config.js
-rw-r-----    1 root     www-data     42996 Jan 19 10:10 yarn.lock

/var/www/html/bootstrap

drwxr-x---    1 root     www-data      4096 Jan  8 13:06 .
drwxr-x---    1 root     www-data      4096 Jan 20 13:48 ..
-rw-r-----    1 root     www-data      2293 Jan  8 13:06 app.php
drwxrwx---    1 www-data www-data      4096 Jan 21 05:54 cache
-rw-r-----    1 root     www-data       476 Jan  8 13:06 providers.php
-rw-r-----    1 root     www-data      1442 Jan  7 05:11 tests.php

/pelican-data

drwxrwx---    4 www-data www-data      4096 Jan 21 05:54 .
drwxr-xr-x    1 root     root          4096 Jan 21 05:54 ..
-rw-r--r--    1 www-data www-data        61 Jan 21 05:54 .env
drwx------    5 www-data www-data      4096 Jan 21 05:54 caddy
drwxr-xr-x    2 www-data www-data      4096 Jan 21 06:01 database

QuintenQVD0 · 2025-01-21T06:57:40Z

docker/README.md

Don't do this, keep the releated docker files in .github/docker please

I really don't think they should be in .github though. They have nothing to do with Github... Additionally, it complicates the .dockerignore, because you don't expect anything in .github to be in the final container, but you do need things like the entrypoint.

From me, I think it's better to have them in docker, rather than .github.

compose.yml

QuintenQVD0 · 2025-01-21T07:02:02Z

docker/supervisord.conf

@@ -39,5 +35,12 @@ command=caddy run --config /etc/caddy/Caddyfile --adapter caddyfile
 autostart=%(ENV_SUPERVISORD_CADDY)s
 autorestart=%(ENV_SUPERVISORD_CADDY)s
 priority=10
-stdout_events_enabled=true
-stderr_events_enabled=true
+stdout_logfile=/dev/fd/1


What is the point in switching to redirect_stderr instad of the stderr_events_enabled and why did you disable the stdout_events_enabled?

There's no event listener configured, so stdout_events_enabled and stderr_events_enabled are pointless.

Also sorry, I had to check and confirm it to be sure and forgot to mention. The redirect options redirect logs to stdout/stderr as expected of the name, but the events redirect it to special event listeners that can ex: send emails on errors.

QuintenQVD0 · 2025-01-21T07:02:40Z

docker/entrypoint.sh

-#mkdir -p /var/log/supervisord/ /var/log/php8/ \
-
-## check for .env file and generate app keys if missing
-if [ -f /pelican-data/.env ]; then


This should stay so if they mess up the mount it fails

The .env is symlinked to the mounted .env, so assuming they haven't done anything special, it will still fail. I did that to allow the .env to be overwritten by its own mount if anyone desires, while still retaining the original functionality.

As for why .env might be mounted separately, I think it'd make more sense to do that if run in Kubernetes. (You probably want to mount it from a secret, and it would allow the panel to be scaled more easily once setup if desired.) I know that's not really a big goal for Pelican, but if the functionality is the same, I don't see the point in making that harder.

QuintenQVD0 · 2025-01-21T07:05:13Z

docker/entrypoint.sh

@@ -59,7 +49,5 @@ else
  export SUPERVISORD_CADDY=true
 fi

-chown -R www-data:www-data /pelican-data/.env /pelican-data/database


This should stay as if for some reason the perms change a new key will be generated and the db will be nuked

You can't, because the container is run as www-data, so it lacks permission to change ownership. If the permissions change, it can't overwrite the files anyways, so it shouldn't matter? For example, if .env is restricted and can't be read so it decides to make a new key, writing .env will also fail anyways? Is there something I'm missing?

QuintenQVD0 · 2025-01-21T07:06:30Z

docker/supervisord.conf

@@ -4,16 +4,13 @@ username=dummy
 password=dummy

 [supervisord]
-logfile=/var/log/supervisord/supervisord.log    ; supervisord log file


Please link to one the volumes as it might be useful to debug if something goos wrong

This change also was along with my philosophy of logging to Docker.

I know that you disagree with my opinion, but I'd like to try discussing it with you to understand why, or if maybe I could provide another acceptable solution that can meet both our needs.

I can remove it for now and do another PR later or something too, if you wanted to talk about that separately.

Why not do both for now?

Log to a file that is in a volume so it is saved but also log to stdout so docker can log it.

This is easier to debug if there are any problems as most log files would be in one place

I'm not an expert on supervisor, so I'm not 100% sure if it's possible, but I'll try to take a look tomorrow. Surely there's some way.

Oh, I misunderstood how the logging works. I thought supervisord would capture child logs and output them to that log file as well, but that's not true. I'm not sure if that's what you were thinking as well, but the log file is actually not particularly important, as it's already limited to error logs only, so in theory, there is never anything logged.

When it's set to just syslog as the log file, then supervisord's own logs go to the standard log.

As seen in this sample log from the documentation, the only thing at level error or above is that the internal HTTP server has no authentication. Not even processes dying makes it in as an error.

2007-09-08 14:43:22,886 DEBG 127.0.0.1:Medusa (V1.11) started at Sat Sep 8 14:43:22 2007 Hostname: kingfish Port:9001 2007-09-08 14:43:22,961 INFO RPC interface 'supervisor' initialized 2007-09-08 14:43:22,961 CRIT Running without any HTTP authentication checking 2007-09-08 14:43:22,962 INFO supervisord started with pid 27347 2007-09-08 14:43:23,965 INFO spawned: 'listener_00' with pid 27349 2007-09-08 14:43:23,970 INFO spawned: 'eventgen' with pid 27350 2007-09-08 14:43:23,990 INFO spawned: 'grower' with pid 27351 2007-09-08 14:43:24,059 DEBG 'listener_00' stderr output: /Users/chrism/projects/supervisor/supervisor2/dev-sandbox/bin/python: can't open file '/Users/chrism/projects/supervisor/supervisor2/src/supervisor/scripts/osx_eventgen_listener.py': [Errno 2] No such file or directory 2007-09-08 14:43:24,060 DEBG fd 7 closed, stopped monitoring <PEventListenerDispatcher at 19910168 for <Subprocess at 18892960 with name listener_00 in state STARTING> (stdout)> 2007-09-08 14:43:24,060 INFO exited: listener_00 (exit status 2; not expected) 2007-09-08 14:43:24,061 DEBG received SIGCHLD indicating a child quit

QuintenQVD0 · 2025-01-21T07:08:08Z

docker/supervisord.conf

@@ -30,7 +27,6 @@ autorestart=true

 [program:queue-worker]
 command=/usr/local/bin/php /var/www/html/artisan queue:work --tries=3
-user=www-data


I'm assuming you removed this because you intend supervisord to run as www-data? if not then add this back

Everything in the container is non-root, and it's www-data in the current Dockerfile, however if the user is ever changed for whatever reason (maybe you want it to be pelican down the line? I don't know) supervisord will get mad, so I figured it's pointless to keep it anyways if it's running as non-root anyways.

.dockerignore

PseudoResonance added 2 commits January 18, 2025 21:27

Remove install-php-extensions utility after use and name final stage

74f65c9

Test arm64 runner

20e46b5

Allow Docker workflow caching multi-arch separately

ed1473f

QuintenQVD0 reviewed Jan 19, 2025

View reviewed changes

.github/workflows/docker-publish.yml Outdated Show resolved Hide resolved

Dockerfile Show resolved Hide resolved

Dockerfile Show resolved Hide resolved

Dockerfile Show resolved Hide resolved

Fix Docker publish workflow branches

fd3cce2

PseudoResonance mentioned this pull request Jan 19, 2025

Speed up Docker build and fix entrypoint permissions #894

Closed

Move Caddyfile/crontab config into docker directory, remove redundant…

30f765e

… supervisord user

Further restrict permissions

6de2a72

lancepioch approved these changes Jan 21, 2025

View reviewed changes

lancepioch assigned QuintenQVD0 Jan 21, 2025

QuintenQVD0 reviewed Jan 21, 2025

View reviewed changes

Supervisord logs

0336e55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rootless Docker/Optimized build #932

Rootless Docker/Optimized build #932

PseudoResonance commented Jan 19, 2025 •

edited

Loading

PseudoResonance commented Jan 19, 2025

PseudoResonance commented Jan 19, 2025 •

edited

Loading

QuintenQVD0 commented Jan 19, 2025

PseudoResonance commented Jan 19, 2025 •

edited

Loading

PseudoResonance commented Jan 21, 2025

QuintenQVD0 Jan 21, 2025

PseudoResonance Jan 21, 2025

alexevladgabriel Jan 21, 2025

QuintenQVD0 Jan 21, 2025

PseudoResonance Jan 21, 2025 •

edited

Loading

QuintenQVD0 Jan 21, 2025

PseudoResonance Jan 21, 2025 •

edited

Loading

QuintenQVD0 Jan 21, 2025

PseudoResonance Jan 21, 2025

QuintenQVD0 Jan 21, 2025

PseudoResonance Jan 21, 2025

QuintenQVD0 Jan 21, 2025

PseudoResonance Jan 21, 2025

QuintenQVD0 Jan 21, 2025

PseudoResonance Jan 22, 2025

QuintenQVD0 Jan 21, 2025

PseudoResonance Jan 21, 2025 •

edited

Loading

Rootless Docker/Optimized build #932

Are you sure you want to change the base?

Rootless Docker/Optimized build #932

Conversation

PseudoResonance commented Jan 19, 2025 • edited Loading

Changes

Additional Information

Important Note

Other Options

PseudoResonance commented Jan 19, 2025

PseudoResonance commented Jan 19, 2025 • edited Loading

QuintenQVD0 commented Jan 19, 2025

PseudoResonance commented Jan 19, 2025 • edited Loading

PseudoResonance commented Jan 21, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PseudoResonance Jan 21, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PseudoResonance Jan 21, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PseudoResonance Jan 21, 2025 • edited Loading

Choose a reason for hiding this comment

PseudoResonance commented Jan 19, 2025 •

edited

Loading

PseudoResonance commented Jan 19, 2025 •

edited

Loading

PseudoResonance commented Jan 19, 2025 •

edited

Loading

PseudoResonance Jan 21, 2025 •

edited

Loading

PseudoResonance Jan 21, 2025 •

edited

Loading

PseudoResonance Jan 21, 2025 •

edited

Loading