-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
relabel-etc: Split remount into, remount and relabel-etc #3267
base: main
Are you sure you want to change the base?
Conversation
Incomplete and untested right now |
4c565cd
to
b18a2ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for working on this!
Conflicts=umount.target | ||
# Run after core mounts | ||
After=-.mount var.mount | ||
After=systemd-remount-fs.service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one can (and should) be Before=systemd-remount-fs.service
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I was thinking about this more and I think we're in a situation where actually we tell people using composefs in particular to just mount the rootfs writable in the initramfs, so the whole role of systemd-remount-fs.service is a no-op from our PoV basically.
At least...I thought we enforced that, but now I can't find it.
OnFailure=emergency.target | ||
Conflicts=umount.target | ||
# Run after core mounts | ||
After=-.mount var.mount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need var.mount
here.
Actually let's just use:
RequiresMountsFor=/etc
.
Before=systemd-random-seed.service plymouth-read-write.service systemd-journal-flush.service systemd-sysusers.service | ||
Before=systemd-tmpfiles-setup.service systemd-rfkill.service systemd-rfkill.socket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can drop all this stuff, as all of that is After=systemd-remount-fs.service
.
But I'm fine going belt-and-suspenders here with e.g. Before=ostree-remount.service
- which already has all those Before=
itself.
To combine all this I'd suggest:
Before=local-fs-pre.target systemd-remount-fs.service
# This one is already itself After=systemd-remount-fs.service, but for redundancy
Before=ostree-remount.service
glnx_autofd int fd = open (OTCORE_RUN_BOOTED, O_RDONLY | O_CLOEXEC); | ||
if (fd < 0) | ||
{ | ||
/* We really expect that nowadays that everything is done in the initramfs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need this copy here.
src/switchroot/ostree-relabel-etc.c
Outdated
* Today systemd remounts / (recursively) as shared, so we're undoing that as early | ||
* as possible. See also a copy of this in ostree-prepare-root.c. | ||
*/ | ||
if (mount ("none", "/sysroot", NULL, MS_REC | MS_PRIVATE, NULL) < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also drop this
[Service] | ||
Type=oneshot | ||
RemainAfterExit=yes | ||
ExecStart=/usr/lib/ostree/ostree-relabel-etc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, so far we haven't introduced many new little binaries, but instead we've been tending to dispatch via hidden CLI verbs, see e.g. ostree admin finalize-staged
. I'd slightly prefer that, i.e. we introduce a new ostree admin relabel-etc
(which should barf if it's not run under systemd, i.e. we check INVOCATION_ID
etc).
@@ -109,6 +109,11 @@ require_internal_units (const char *normal_dir, const char *early_dir, const cha | |||
< 0) | |||
return glnx_throw_errno_prefix (error, "symlinkat"); | |||
|
|||
if (symlinkat (SYSTEM_DATA_UNIT_PATH "/ostree-relabel-etc.service", normal_dir_dfd, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with this as is, but just a note: I think we can change things so that the initramfs code itself acts as a generator and writes the enablement symlink into /run
directly then.
As is it's a bit silly because the initramfs writes the file in /run
which says that transient etc is enabled, then we boot in the real root, run a binary which reads that file and exits if the flag isn't found...but we can bypass all that and be slightly more efficient by just telling systemd whether or not to run the binary.
Split these into two different binaries we two different resonsibilities. They don't share so much functionality. Signed-off-by: Eric Curtin <[email protected]>
b18a2ed
to
0a19594
Compare
@@ -0,0 +1,179 @@ | |||
/* | |||
* Copyright (C) 2011 Colin Walters <[email protected]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tangentially related, I think you can update this to be you and or/alex, or just say (C) Red Hat, Inc.
* You should have received a copy of the GNU Lesser General Public | ||
* License along with this library. If not, see <https://www.gnu.org/licenses/>. | ||
* | ||
* Author: Colin Walters <[email protected]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been dropping this in new code BTW, "git log" is source of truth and I don't think I wrote any of this code.
static void | ||
relabel_dir_for_upper (const char *upper_path, const char *real_path, gboolean is_dir) | ||
{ | ||
#ifdef HAVE_SELINUX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: I think this entire binary could become conditional on SELinux actually, i.e. we don't even compile or ship it if built without selinux support.
Hmm, why aren't we doing the relabeling from the initramfs instead? Current kernels allow relabeling before a policy is loaded. This is notably used by Ignition: https://github.com/coreos/ignition/blob/b2ea35258bf5367a3522a4bea7f71063954b740c/internal/exec/util/selinux.go#L80. From my experience, doing relabeling from the real root causes a lot of issues and quickly becomes a game of whack-a-race condition. |
Seems to work at first glance: # https://gitlab.com/fedora/bootc/examples/-/blob/main/transient-etc/Containerfile
FROM quay.io/centos-bootc/centos-bootc:stream9
# Configure the initramfs, see https://github.com/ostreedev/ostree/blob/main/man/ostree-prepare-root.xml
RUN echo -e '[etc]\ntransient=true' >> /usr/lib/ostree/prepare-root.conf && \
set -x; kver=$(cd /usr/lib/modules && echo *); dracut -vf /usr/lib/modules/$kver/initramfs.img $kver I
(Though you'd normally add |
Relabeling from the initramfs indeed sounds way better. The only thing I'm a little uncertain about is the semantics for how e.g. |
I like the idea of using setfiles or something similar, less code to maintain... I think selinux=0, etc. is easy to handle in systemd .service file: ConditionKernelCommandLine=!selinux=0 We may be able to achieve this with zero C code here, just write a well crafted systemd .service file. Although, I would rather we relabelled from rootfs, even if it meant running this .service file in it's own exclusive time slot. We put a lot of effort in Automotive for example to make our initramfs super tiny like < 10M . Ideally we wouldn't relabel on boot at all tbh, it's another thing to do during boot and again we are trying to cut down on doing tasks at boot to make that as fast as possible. Any reason we can't do this at image composition time or shutdown? /.autorelabel for example seems to perform relabeling on shutdown |
Automotive is using transient etc, this is about the in-memory inodes for the overlayfs. There's no real way to shortcut this today, we have a kind of complex interlock between selinux policy loading vs the mount. However as far as the cost...slimming down what's in |
Split these into two different binaries we two different resonsibilities. They don't share so much functionality.