Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(backend): mount EmptyDir volumes for launcher write locations #10538

Closed

Conversation

gregsheremeta
Copy link
Contributor

Launcher writes input artifacts to root paths /gcs, /minio, and /s3. These paths are not accessible by non-root users by default, which is problematic in locked-down Kubernetes installations and/or OpenShift. /gcs is currently a contract for KFP v2 python component wrappers, so the path cannot be changed.

Mount an EmptyDir scratch volume to these paths to work around this.

Additionally, /.local and /.cache are written to by pip, so add EmptyDir mounts for those too.

Fixes: #5673
Fixes: #7345

Launcher writes input artifacts to root paths /gcs, /minio, and /s3.
These paths are not accessible by non-root users by default, which is
problematic in locked-down Kubernetes installations and/or OpenShift.
/gcs is currently a contract for KFP v2 python component wrappers, so
the path cannot be changed.

Mount an EmptyDir scratch volume to these paths to work around this.

Additionally, /.local and /.cache are written to by pip, so add
EmptyDir mounts for those too.

Fixes: kubeflow#5673
Fixes: kubeflow#7345
Copy link

Hi @gregsheremeta. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign chensun for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@Tomcli Tomcli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gregsheremeta, just wondering is there anyway we can reduce the amount of the volume mount definition? Due to the k8s yaml size limitation, we won't able to fit more tasks if the individual container definition is too big.

@Tomcli
Copy link
Member

Tomcli commented Mar 4, 2024

/ok-to-test

Copy link
Member

@Tomcli Tomcli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gregsheremeta You also need to update the unit tests because you modified the container spec in this PR.

Copy link

@gregsheremeta: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
kubeflow-pipelines-samples-v2 e46440b link false /test kubeflow-pipelines-samples-v2
kubeflow-pipeline-backend-test e46440b link true /test kubeflow-pipeline-backend-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@rimolive
Copy link
Member

@Tomcli Maybe add a condition to mount one of the 3 possible options?

@andraxin
Copy link

This will not solve the Permission denied error reported in #10397,
unless a version of the v2 launcher that includes this fix is used.

BTW: where/how do I tell Kubeflow which launcher image to use?

That's because the root cause for the original error is that the installer creates
(or used to create, after the fix) directories with mode 0644, i.e. drw-r--r--,
which makes it impossible for unprivileged users to create subdirectories within,
but doesn't hamper root...

Brief Demo

$ docker run --rm -it alpine
/ # adduser -D demo
/ # su - demo
05decfa077fd:~$ mkdir -m 0644 demo
05decfa077fd:~$ mkdir demo/test
mkdir: can't create directory 'demo/test': Permission denied
05decfa077fd:~$ 
/ # mkdir /home/demo/demo/test
/ # ls -lR /home/demo
/home/demo:
total 4
drw-r--r--    3 demo     demo          4096 Apr 14 11:16 demo

/home/demo/demo:
total 4
drwxr-xr-x    2 root     root          4096 Apr 14 11:16 test

/home/demo/demo/test:
total 0
/ # 

BTW, that also explains why creating alternate images with world-read-write
/minio pre-created (i.e. RUN mkdir -m 1777 /minio) doesn't help.

I tracked this down by mounting a PVC at /minio and observing permissions
on the first level (that does get created).

@rimolive
Copy link
Member

unless a version of the v2 launcher that includes this fix is used.

This fix was introduced in KFP 2.0.2

@andraxin
Copy link

unless a version of the v2 launcher that includes this fix is used.

This fix was introduced in KFP 2.0.2

Yes, I'm well aware. GitHub shows that quite prominently.

fix(backend): OutPutPath dir creation mode Fixes https://github.com/kubeflow/pipelines/issues/7629 (https://github.com/kubeflow/pipelines/pull/9946)
[master](https://github.com/kubeflow/pipelines) ([#9946](https://github.com/kubeflow/pipelines/pull/9946))
[sdk-2.7.0](https://github.com/kubeflow/pipelines/releases/tag/sdk-2.7.0)
[2.0.2](https://github.com/kubeflow/pipelines/releases/tag/2.0.2)

BUT, that doesn't really help me or answer my question.
The launcher that's spawned by default in the Kubeflow v1.7.x
deployment I have to use apparently pre-dates that; and,
I have no idea how/if Kubeflow can be instructed to grab
a newer launcher with the fix.

@rimolive
Copy link
Member

Kubeflow 1.7 comes with an alpha release of KFP. You should either fully upgrade your kubeflow installation to 1.8 or you can upgrade individually kfp. Just a reminder that if you choose the later one, you need to use the multi-user manifests, as you probably deployment Kubeflow w/ multi-user support.

@andraxin
Copy link

Kubeflow 1.7 comes with an alpha release of KFP. You should either fully upgrade your kubeflow installation to 1.8 or you can upgrade individually kfp. Just a reminder that if you choose the later one, you need to use the multi-user manifests, as you probably deployment Kubeflow w/ multi-user support.

I guess I wasn't sufficiently clear, when I said I have to use Kubeflow 1.7.
I cannot upgrade the installation nor apply any manifests at the cluster level.
Applying manifest in my profile's namespace is about as far as I can reach.

My re-stated question becomes: as a Kubeflow user, can I somehow instruct
Kubeflow to use a newer launcher image to run my pipelines?

BTW, I can use custom images for my pipeline stages with world-read-write
/gcs, /minio, and /s3, which renders these emptyDir volume mounts moot.
If I could only tell Kubeflow to use a launcher with the fix, that is.

@rimolive
Copy link
Member

rimolive commented Apr 17, 2024

My re-stated question becomes: as a Kubeflow user, can I somehow instruct
Kubeflow to use a newer launcher image to run my pipelines?

If, and only if, you upgrade your KFP installation to at least 2.0.5 release, you can parametrize the launcher and driver images.

@andraxin
Copy link

My re-stated question becomes: as a Kubeflow user, can I somehow instruct
Kubeflow to use a newer launcher image to run my pipelines?

If, and only if, you upgrade your KFP installation to at least 2.0.5 release, you can parametrize the launcher and driver images.

I see. Thanks for the info, and your patience! 😏

Given that the code changes you linked appear to be
in backend Golang code, what you call "KFP installation"
probably includes more than just pip install kfp==2.0.2
inside a notebook in the existing Kubeflow 1.7 cluster.
Is that correct?

@rimolive
Copy link
Member

rimolive commented Apr 17, 2024

Yes, this is not a simple SDK upgrade or code changes. You will need to upgrade your entire KFP installation.

@majuss
Copy link

majuss commented May 29, 2024

@rimolive can you please describe a hotfix how I can use this patch with kubeflow 1.8.1. Because for now kfp is not usable :(

@rimolive
Copy link
Member

@rimolive can you please describe a hotfix how I can use this patch with kubeflow 1.8.1. Because for now kfp is not usable :(

KFP 2.0.5 should already have this fix, but this issue is different (mount emptyDir volumes in launcher) and we don't have the PR merged yet.

@thesuperzapper
Copy link
Member

@chensun @HumairAK @Tomcli we definitely need to prioritize fixing this issue, because it's pretty bad to have a hard requirement on root container images, see #10397 for an example of user having this problem.

@droctothorpe
Copy link
Contributor

droctothorpe commented May 30, 2024

FYI, this comment describes a short term solution (that depends on kyverno): #10397 (comment)

@HumairAK
Copy link
Collaborator

HumairAK commented May 31, 2024

@gregsheremeta is currently out,

due to the expressed urgency and prioritzation of this, I'm picking this PR up here: #10857

please forward any further discussions to the pr above

I'll bring this up in the next KFP call as well if it is not merged by then

/close

@google-oss-prow google-oss-prow bot closed this May 31, 2024
Copy link

@HumairAK: Closed this PR.

In response to this:

@gregsheremeta is currently out,

due to the expressed urgency and prioritzation of this, I'm picking this PR up here: #10857

please forward any further discussions to the pr above

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants