feat: add support for non-anonymous login for Network Isolated Clusters #5650

AlisonB319 · 2025-01-23T01:05:14Z

What type of PR is this?

This PR adds support for non-anonymous ACR pull. The feature is not available yet, because the front end currently blocks the use of a non-anonymous ACR. At the moment there is also no way to get the kubelet identity, so the login cannot be performed. However, it was decided that we want to have the logic in place.

This is a combination of two PRs
#4879
#5508

TODO:
Attach a VMSS identity through the e2e tests to exercise the oras login functionality.

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Requirements:

uses conventional commit messages
includes documentation
adds unit tests
tested upgrade from previous version

Special notes for your reviewer:

Release note:

none

AlisonB319 · 2025-01-23T01:09:31Z

parts/linux/cloud-init/artifacts/cse_config.sh

@@ -405,6 +405,40 @@ getPrimaryNicIP() {
    echo "$ip"
 }

+orasLogin() {


@djsly right now the flow is
check anonymous pull -> success return (this is the current case in prod atm)

check anonymous pull -> failure -> check non-anonymous

In the chat it was suggested that we do the anonymous check first. That might be the better order, since it is what currently happens in production. When the customer turns off anonymous pull on the ACR, the first check should fail and the oras login will take place.

It currently does the anonymous check by pulling a hello-world image from the ACR.

this flow is mostly XInhe's logic, I thought it made sense, but I can adjust as we see fit.

I'm not sure we should download a binary to test access.

Also, I think they are looking to drop support for anonymous pull entirely in the future, so it might be best to try non-anonymous first?
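For reference, the two orderings discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `try_anonymous_pull` and `login_with_identity` are hypothetical stand-ins rather than the helpers in this PR, and the `anonymous-first` / `identity-first` argument is invented here so both orderings can be compared side by side.

```shell
#!/bin/sh
# Hypothetical stand-ins for the real checks; each returns 0 on success.
# Callers (or unit tests) can redefine them.
try_anonymous_pull() { return 1; }
login_with_identity() { return 0; }

# acr_access_flow <registry> [anonymous-first|identity-first]
# Tries the two access methods in the requested order and prints the name
# of the one that worked, or "none" if both failed.
acr_access_flow() {
    local registry="$1"
    local order="${2:-anonymous-first}"
    local first second
    if [ "$order" = "identity-first" ]; then
        first=login_with_identity
        second=try_anonymous_pull
    else
        first=try_anonymous_pull
        second=login_with_identity
    fi
    if "$first" "$registry"; then
        echo "$first"
        return 0
    fi
    if "$second" "$registry"; then
        echo "$second"
        return 0
    fi
    echo "none"
    return 1
}
```

Keeping the order as a parameter would make it cheap to flip the default later if anonymous pull is no longer supported.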

parts/linux/cloud-init/artifacts/cse_config.sh

parts/linux/cloud-init/artifacts/cse_helpers.sh

cameronmeissner · 2025-01-23T01:24:55Z

parts/linux/cloud-init/artifacts/cse_helpers.sh

+    local acr_url=$3
+
+    echo "${access_retries} retries for acr access check"
+    sample_image="$acr_url/mcr/hello-world:latest"


this image is used to check that the cache is correct? seems a bit brittle, is there a different way we can check this?

if we need to do it this way, I'd opt for also removing this image from disk afterwards as well

a quick ChatGPT check suggested that I could try pulling a nonexistent image
oras pull .azurecr.io/nonexistent:tag
and if the error says something like unknown, then you're logged in, versus an auth error if you're not. I'll look into this once I get the e2e testing working

cam@workbook ~$ oras repo
Repository operations

Usage:
  oras repo [command]

Aliases:
  repo, repository

Available Commands:
  ls          List the repositories under the registry
  tags        Show tags of the target repository

Flags:
  -h, --help   help for repo

Use "oras repo [command] --help" for more information about a command.

would oras repo ls work?

if we use oras pull .azurecr.io/nonexistent:tag, we need to remove the ERR_ORAS_PULL_INCORRECT_CACHE error logic

what happens if the hello-world image is ever removed for some reason? I agree with Cameron that this is brittle.
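If the nonexistent-image idea is pursued, the check comes down to telling an auth failure apart from a not-found failure. Below is a minimal sketch of that classification. It assumes the error text contains either an HTTP status number or one of the error codes defined by the OCI distribution spec (UNAUTHORIZED, DENIED, NAME_UNKNOWN, MANIFEST_UNKNOWN). The exact wording oras prints has not been verified here, and `classify_registry_error` is a hypothetical name.

```shell
#!/bin/sh
# classify_registry_error <error text>
# Prints "auth", "not_found", or "unknown" based on substrings in the message.
# ASSUMPTION: the patterns follow OCI distribution spec error codes and HTTP
# status numbers; the real oras output should be confirmed in e2e first.
classify_registry_error() {
    local msg
    # Lowercase the message so matching is case-insensitive.
    msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$msg" in
        *unauthorized*|*denied*|*401*|*403*) echo "auth" ;;
        *"not found"*|*name_unknown*|*manifest_unknown*|*404*) echo "not_found" ;;
        *) echo "unknown" ;;
    esac
}

# Example wiring (not executed here):
#   out=$(oras pull "$acr_url/nonexistent:tag" 2>&1) || classify_registry_error "$out"
```

A "not_found" result would then mean the registry accepted the request, with nothing written to disk that needs cleaning up afterwards.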

cameronmeissner · 2025-01-23T01:27:09Z

parts/linux/cloud-init/artifacts/cse_helpers.sh

@@ -683,4 +714,42 @@ removeKubeletFlag() {
    fi
 }

+oras_login_with_identity() {


we might want to set +x at the top of this so we don't log the AAD or ACR tokens

sounds good, for some reason set +x makes the shellspec unit tests very unhappy, so I'll need to dig a bit here.

they stall out and then throw a huge error message

ohhh I actually ran into this myself recently

I found that if you invoke the function you're trying to test using run (as opposed to call) in the spec, then the function will be called within a sub-shell rather than the parent shell where shellspec is actually running - that will prevent the breakage and infinite hang you're seeing

the other option is to just have the caller invoke set +x instead of doing it directly within the function itself

ah I see you're already using run in your other specs, so yeah you can just do that to test oras_login_with_kubelet_identity as well
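One way to keep tokens out of the trace without leaving the caller's shell options changed is to switch xtrace off around the sensitive call and restore it afterwards. A minimal sketch follows. `with_xtrace_off` is a hypothetical wrapper, not something in this PR, and whether it plays well with shellspec under `call` would still need checking.

```shell
#!/bin/sh
# with_xtrace_off <command...>
# Hypothetical wrapper: runs the command with `set -x` tracing disabled, then
# restores tracing if it was enabled on entry. Returns the command's status.
with_xtrace_off() {
    local had_xtrace=0
    case "$-" in
        *x*) had_xtrace=1 ;;
    esac
    set +x
    local rc=0
    "$@" || rc=$?
    if [ "$had_xtrace" -eq 1 ]; then
        set -x
    fi
    return "$rc"
}

# Example wiring (not executed here), using the function name from the diff:
#   with_xtrace_off oras_login_with_identity "$acr_url" "$client_id" "$tenant_id"
```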

cameronmeissner · 2025-01-23T01:28:04Z

parts/linux/cloud-init/artifacts/cse_helpers.sh

+    local client_id=$2
+    local tenant_id=$3
+
+    raw_access_token=$(curl "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F&client_id=$client_id" -H Metadata:true -s)


probably want to wrap the calls to IMDS + ACR exchange endpoint in retries
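A rough sketch of what wrapping those calls in retries could look like is below. `retry_cmd` is a hypothetical helper written for illustration, and the repo's existing retry helpers may well be the better fit. The attempt count, wait time, and curl timeout in the usage comment are placeholder values.

```shell
#!/bin/sh
# retry_cmd <max_attempts> <wait_seconds> <command...>
# Hypothetical helper: runs the command until it succeeds or the attempts run
# out. The command line is deliberately not echoed, since it may carry tokens.
retry_cmd() {
    local max_attempts="$1"
    local wait_seconds="$2"
    shift 2
    local attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "command failed after ${attempt} attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$wait_seconds"
    done
}

# Example wiring (not executed here), using the IMDS URL from the diff above:
#   raw_access_token=$(retry_cmd 5 1 curl -sf --max-time 10 -H Metadata:true \
#     "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F&client_id=$client_id")
```

Adding `-f` makes curl exit non-zero on HTTP errors, so the retry loop sees failed IMDS responses rather than treating an error body as a token.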

parts/linux/cloud-init/artifacts/cse_helpers.sh

spec/parts/linux/cloud-init/artifacts/cse_config_spec.sh

anujmaheshwari1 · 2025-01-23T18:03:23Z

parts/linux/cloud-init/artifacts/cse_config.sh

@@ -405,6 +405,40 @@ getPrimaryNicIP() {
    echo "$ip"
 }

+orasLogin() {
+    echo "Checking access to ACR with anonymous pull"
+    logs_to_events "AKS.CSE.orasLogin.retrycmd_acr_access_check_anon" retrycmd_acr_access_check 10 1 "${BOOTSTRAP_PROFILE_CONTAINER_REGISTRY_SERVER}"


nit: keep naming consistent, i.e. either consider adding _anon to the actual function call, or remove it from the logs_to_events name

Curious on your thoughts here, having a slightly different name allows me to essentially mock both function calls in the unit test

logs_to_events() {
    case "$1" in
        "AKS.CSE.orasLogin.retrycmd_acr_access_check_anon") return $ERR_ORAS_PULL_UNAUTHORIZED ;;
        "AKS.CSE.orasLogin.oras_login_with_kubelet_identity") return 0 ;;
        "AKS.CSE.orasLogin.retrycmd_acr_access_check_non_anon") return 1 ;;
        *) return -1 ;;
    esac
}

also while logging, it might make it more clear which attempt it's trying, anonymous vs non-anonymous.

However, I am going to tweak the flow so I'll see if I even end up making the retrycmd_acr_access_check call twice

AlisonB319 added 2 commits January 21, 2025 15:37

figuring out what broke

80c9686

add unit tests

6c9f547

AlisonB319 temporarily deployed to test January 23, 2025 01:05 — with GitHub Actions Inactive

Merge branch 'master' into alburgess/oras-identity2

737d623

AlisonB319 changed the title ~~Alburgess/oras identity2~~ feat: add support for non-anonymous login for Network Isolated Clusters Jan 23, 2025

AlisonB319 commented Jan 23, 2025
