
[Bug]: Mirror Maker OAuth Authentication with Client Assertion expiration #11002

Open
MarekLani opened this issue Jan 2, 2025 · 24 comments

@MarekLani

MarekLani commented Jan 2, 2025

Bug Description

I am using OAuth authentication with a client assertion from within Mirror Maker. I have configured Mirror Maker to pick up the client assertion from a Kubernetes Secret. Since the client assertion JWT has an expiration, I have a mechanism that generates new client assertions and updates the Kubernetes Secret. Mirror Maker, however, does not seem to have a mechanism to pick up the updated version of the Secret: it locks in the value of the Secret set during the pod/container start and naturally fails to authenticate to the target Kafka cluster after some time (in my case, the Azure Event Hubs Kafka interface).

Given the limited time validity of client assertions, I need Mirror Maker to be able to force-refresh the client assertion value. What is the suggested approach here?
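
To illustrate, the refresh mechanism boils down to re-applying the Secret with a freshly generated JWT on a schedule. A simplified sketch (generate-client-assertion is a placeholder for the actual JWT generation step, not a real command):

# Mint a new assertion and update the Secret in place
NEW_JWT=$(generate-client-assertion)
kubectl create secret generic client-assertion-secret \
  --from-literal=client-assertion="$NEW_JWT" \
  --dry-run=client -o yaml | kubectl apply -f -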

Steps to reproduce

See above and the provided configuration.

Expected behavior

I would expect Mirror Maker to pick up the new value of the client assertion Secret.

Strimzi version

0.44

Kubernetes version

1.30.6

Installation method

Helm Chart

Infrastructure

Azure Kubernetes Service

Configuration files and logs

Mirror Maker configuration:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: mymirrormaker
  namespace: kafka
spec:
  version: 3.8.0
  replicas: 1
  connectCluster: "eventhub"
  clusters:
  - alias: "my-cluster"
    bootstrapServers: x.x.x.x:9092
  - alias: "eventhub"
    bootstrapServers: myeh.servicebus.windows.net:9093
    config:
      config.storage.replication.factor: 1
      offset.storage.replication.factor: 1
      status.storage.replication.factor: 1
      producer.connections.max.idle.ms: 180000
      producer.metadata.max.age.ms: 180000
    authentication:
      type: oauth
      tokenEndpointUri: https://login.microsoftonline.com/<tenantid>/oauth2/v2.0/token
      clientId: <clientid>
      scope: https://myeh.servicebus.windows.net/.default
      clientAssertion:
        secretName: client-assertion-secret
        key: client-assertion
    tls:
      trustedCertificates: []
  mirrors:
  - sourceCluster: "my-cluster"
    targetCluster: "eventhub"
    sourceConnector:
      config:
        replication.factor: 1
        offset-syncs.topic.replication.factor: 1
        sync.topic.acls.enabled: "false"
    heartbeatConnector:
      config:
        heartbeats.topic.replication.factor: 1
    checkpointConnector:
      config:
        checkpoints.topic.replication.factor: 1
    topicsPattern: ".*"
    groupsPattern: ".*"

Additional context

No response

@scholzj
Member

scholzj commented Jan 2, 2025

I don't think this is a bug. The client connector is instantiated once and is not reconfigured by Kafka MirrorMaker2/Connect.

@MarekLani
Author

Agreed, this is more of a generic question about design. Nevertheless, when the access token expires, reauthentication is needed, and for that a fresh client assertion is needed. I was thinking about using clientAssertionLocation instead, with a volume shared with an additional custom pod that would take care of updating the client assertion. Would this work?

If there is no way to periodically update the client assertion, it is essentially unusable, but there might be some gap in my understanding of the process.

@scholzj
Member

scholzj commented Jan 2, 2025

Maybe you need to use it in a way that allows the library to refresh the token. As far as I know, if you configure it with a client ID and secret, it will auto-refresh the tokens.
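
For reference, a configuration along these lines lets the library refresh tokens on its own. A sketch only; the Secret name and key here are assumptions, not taken from this issue:

    authentication:
      type: oauth
      tokenEndpointUri: https://login.microsoftonline.com/<tenantid>/oauth2/v2.0/token
      clientId: <clientid>
      clientSecret:
        secretName: my-oauth-client-secret
        key: client-secret
      scope: https://myeh.servicebus.windows.net/.default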

@MarekLani
Author

MarekLani commented Jan 2, 2025

I am looking for as secure an approach as possible, and the use of certificates/client assertions is preferred for that reason. I understand the Strimzi CRD supports feeding the client assertion to the underlying Kafka libraries via the YAML file, and I would expect the Kafka libraries to be able to request a new token. Can you help me understand where exactly the gap that prevents reauthentication is? Is it in the underlying Kafka or in Strimzi? Thank you.

@scholzj
Member

scholzj commented Jan 2, 2025

As I said, the assertion is loaded when the container starts and not updated later.

@MarekLani
Author

Sorry for the delay, I needed some time for testing. I have explored the approach with clientAssertionLocation. I was able to validate that if I change the client assertion value at the provided path, the reauthentication of Mirror Maker against Event Hubs works. However, there is an additional problem with this approach: I was not able to figure out how to create a volume mapping for the Mirror Maker container spun up through the Strimzi CRD. Is this possible? I would like to have a standalone container or pod taking care of the refresh of the client assertion and propagating the update to the Mirror Maker pod via a volume.

@scholzj
Member

scholzj commented Jan 7, 2025

The assertion is mounted as a file from a mounted Secret, right? That normally auto-updates when you update the Secret. But the assertion is loaded at startup, so I do not think you can reload it regardless of how you update the file.

@scholzj
Member

scholzj commented Jan 7, 2025

Actually, I guess it depends on whether you use it in the source or target cluster. For the source cluster, it is loaded through the FileConfigProvider when the connector (task?) is started. But that will definitely not do any periodic reloading of the file either.
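
(For illustration, with a hypothetical path and property key: the FileConfigProvider resolves placeholders of this form once, when the connector starts, and never re-reads the file afterwards:)

some.connector.option=${file:/mnt/my-secret/credentials.properties:client.assertion}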

@MarekLani
Author

So this is my auth specification for the target cluster (Event Hub):

    authentication:
      type: oauth
      tokenEndpointUri: https://login.microsoftonline.com/tenantid/oauth2/v2.0/token
      clientId: <clientId>
      scope: https://<sbname>.servicebus.windows.net/.default
      clientAssertionLocation: /output/jwt.txt

I produce the assertion jwt.txt file using a cron job baked into a custom Mirror Maker 2 image, which I reference in the Strimzi operator config:

FROM quay.io/strimzi/kafka:0.45.0-kafka-3.8.0

USER root

# Install necessary tools if not already available
RUN microdnf install -y openssl util-linux cronie && microdnf clean all

# Ensure directories exist and have appropriate permissions
RUN mkdir -p /var/run /var/log /etc/cron.d /output && chmod -R 777 /var/run /var/log /etc/cron.d /output

# Copy the script and necessary files into the container
COPY jwtgen.sh /usr/local/bin/jwtgen.sh
# Script generating Client Assertion JWT
COPY service-principal.pem /usr/local/bin/service-principal.pem
COPY private.key /usr/local/bin/private.key

# Make the script executable
RUN chmod +x /usr/local/bin/jwtgen.sh

# Add a cron job to execute the script every minute
RUN echo '* * * * * root PATH=/usr/bin:/bin /usr/bin/sh /usr/local/bin/jwtgen.sh' > /etc/cron.d/jwtgen-cron \
    && chmod 0644 /etc/cron.d/jwtgen-cron
# Note: files in /etc/cron.d are picked up by crond directly; their format includes
# the user field ("root"), which makes them invalid as a personal crontab, so no
# additional `crontab` call is needed.

# Custom entrypoint script
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod +x /usr/local/bin/entrypoint.sh

# Set the custom entrypoint
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
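
For completeness, the referenced scripts are not included verbatim here. A simplified sketch of what jwtgen.sh can look like, assuming an RS256-signed assertion in the Microsoft identity platform format (tenant and client IDs are placeholders; the certificate and key are the files copied in the Dockerfile above):

#!/usr/bin/env sh
# Sketch only: builds a client assertion JWT and writes it to /output/jwt.txt.

b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

TENANT_ID="<tenantid>"
CLIENT_ID="<clientid>"
NOW=$(date +%s)
EXP=$((NOW + 600))        # short-lived assertion, 10 minutes
JTI=$(uuidgen)            # uuidgen is provided by util-linux installed above

# x5t header: base64url-encoded SHA-1 thumbprint of the certificate
X5T=$(openssl x509 -in /usr/local/bin/service-principal.pem -outform DER | openssl dgst -sha1 -binary | b64url)

HEADER=$(printf '{"alg":"RS256","typ":"JWT","x5t":"%s"}' "$X5T" | b64url)
PAYLOAD=$(printf '{"aud":"https://login.microsoftonline.com/%s/oauth2/v2.0/token","iss":"%s","sub":"%s","jti":"%s","nbf":%s,"exp":%s}' "$TENANT_ID" "$CLIENT_ID" "$CLIENT_ID" "$JTI" "$NOW" "$EXP" | b64url)
SIG=$(printf '%s.%s' "$HEADER" "$PAYLOAD" | openssl dgst -sha256 -sign /usr/local/bin/private.key | b64url)

# Write to a temp file first so readers never see a half-written assertion
printf '%s.%s.%s' "$HEADER" "$PAYLOAD" "$SIG" > /output/jwt.txt.tmp
mv /output/jwt.txt.tmp /output/jwt.txt

And entrypoint.sh boils down to generating an initial assertion, starting cron, and handing over to the arguments Strimzi passes to the container (again a sketch):

#!/usr/bin/env sh
/usr/local/bin/jwtgen.sh   # make sure an assertion exists before Kafka starts
crond                      # cronie daemon, picks up /etc/cron.d/jwtgen-cron
exec "$@"                  # e.g. /opt/kafka/kafka_mirror_maker_2_run.sh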

And this works: I am not seeing the authentication errors that I saw when referencing the client assertion from a Secret.

Nevertheless, I'd like to avoid creating a custom Docker image and running the cron job and, as I mentioned, rather have an additional custom pod generating the assertion and writing it into a volume or a Secret.

I was trying to achieve this, but I am unable to add a volume to the Mirror Maker pod. In the Mirror Maker 2 CRD spec I found the option to add a volume and volume mounts like so:

  template:
    pod:
      volumes:
        - name: secret-volume
          secret:
            secretName: my-secret
    connectContainer:
      volumeMounts:
        - name: secret-volume
          mountPath: "/mnt/my-secret"
          readOnly: true

but no matter what mount path I use, I get the following error from the operator when trying to spin up the Mirror Maker:

Message: Pod "mymirrormaker-mirrormaker2-0" is invalid: spec.containers[0].volumeMounts[3].mountPath: Invalid value: "/mnt/my-secret": must be unique. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.containers[0].volumeMounts[3].mountPath, message=Invalid value: "/mnt/my-secret": must be unique, reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Pod, name=mymirrormaker-mirrormaker2-0, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Pod "mymirrormaker-mirrormaker2-0" is invalid: spec.containers[0].volumeMounts[3].mountPath: Invalid value: "/mnt/my-secret": must be unique, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).

@scholzj
Member

scholzj commented Jan 8, 2025

I guess you need to share the full custom resource so we can understand why you get the error.

@MarekLani
Author

Of course, here is the Mirror Maker definition I use:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: mymirrormaker
  namespace: kafka
spec:
  version: 3.8.0
  replicas: 1
  connectCluster: "eventhub"
  clusters:
  - alias: "my-cluster"
    bootstrapServers: <kafkaip>:9092
  - alias: "eventhub"
    bootstrapServers: <ehanme>.servicebus.windows.net:9093
    config:
      config.storage.replication.factor: 1
      offset.storage.replication.factor: 1
      status.storage.replication.factor: 1
      producer.connections.max.idle.ms: 180000
      producer.metadata.max.age.ms: 180000
    authentication:
      type: oauth
      tokenEndpointUri: https://login.microsoftonline.com/<tenantId>/oauth2/v2.0/token
      clientId: <clientId>
      scope: https://<ehname>.servicebus.windows.net/.default
      clientAssertionLocation: /output/jwt.txt
    tls:
      trustedCertificates: []
  mirrors:
  - sourceCluster: "my-cluster"
    targetCluster: "eventhub"
    sourceConnector:
      config:
        replication.factor: 1
        offset-syncs.topic.replication.factor: 1
        sync.topic.acls.enabled: "false"
    heartbeatConnector:
      config:
        heartbeats.topic.replication.factor: 1
    checkpointConnector:
      config:
        checkpoints.topic.replication.factor: 1
    topicsPattern: ".*"
    groupsPattern: ".*"
  template:
    pod:
	  volumes:
        - name: secret-volume
          secret:
            secretName: my-secret
	connectContainer:
	  volumeMounts:
        - name: secret-volume
          mountPath: "/mnt/my-secret"
          readOnly: true

@scholzj
Member

scholzj commented Jan 8, 2025

There is some misalignment in the YAML, but I guess that was caused when copy-pasting it, as it would otherwise be invalid. However, I do not seem to be able to reproduce the invalid pod issue. Maybe you can get the StrimziPodSet resource that contains the Pod definition and we will see from it what is invalid (kubectl get sps mymirrormaker-mirrormaker2 -o yaml).

@MarekLani
Author

MarekLani commented Jan 8, 2025

Yes, sorry, a copy-paste error.
Here is the output. I see the problem, but I am not sure where it is coming from: there is a duplicate volume mount:

apiVersion: core.strimzi.io/v1beta2
kind: StrimziPodSet
metadata:
  creationTimestamp: "2025-01-07T20:58:11Z"
  generation: 4
  labels:
    app.kubernetes.io/instance: mymirrormaker
    app.kubernetes.io/managed-by: strimzi-cluster-operator
    app.kubernetes.io/name: kafka-mirror-maker-2
    app.kubernetes.io/part-of: strimzi-mymirrormaker
    strimzi.io/cluster: mymirrormaker
    strimzi.io/component-type: kafka-mirror-maker-2
    strimzi.io/kind: KafkaMirrorMaker2
    strimzi.io/name: mymirrormaker-mirrormaker2
  name: mymirrormaker-mirrormaker2
  namespace: kafka
  ownerReferences:
  - apiVersion: kafka.strimzi.io/v1beta2
    blockOwnerDeletion: false
    controller: false
    kind: KafkaMirrorMaker2
    name: mymirrormaker
    uid: c0e93c9f-998c-4021-8932-3cd6fdd33371
  resourceVersion: "1744207"
  uid: 27eb9ac7-2550-4cc6-9a57-87b4c17073fb
spec:
  pods:
  - apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        strimzi.io/auth-hash: "0"
        strimzi.io/logging-hash: 06ee78c4
        strimzi.io/revision: 23c60006
      labels:
        app.kubernetes.io/instance: mymirrormaker
        app.kubernetes.io/managed-by: strimzi-cluster-operator
        app.kubernetes.io/name: kafka-mirror-maker-2
        app.kubernetes.io/part-of: strimzi-mymirrormaker
        statefulset.kubernetes.io/pod-name: mymirrormaker-mirrormaker2-0
        strimzi.io/cluster: mymirrormaker
        strimzi.io/component-type: kafka-mirror-maker-2
        strimzi.io/controller: strimzipodset
        strimzi.io/controller-name: mymirrormaker-mirrormaker2
        strimzi.io/kind: KafkaMirrorMaker2
        strimzi.io/name: mymirrormaker-mirrormaker2
        strimzi.io/pod-name: mymirrormaker-mirrormaker2-0
      name: mymirrormaker-mirrormaker2-0
      namespace: kafka
    spec:
      affinity: {}
      containers:
      - args:
        - /opt/kafka/kafka_mirror_maker_2_run.sh
        env:
        - name: KAFKA_CONNECT_CONFIGURATION
          value: |
            config.storage.topic=mirrormaker2-cluster-configs
            group.id=mirrormaker2-cluster
            status.storage.topic=mirrormaker2-cluster-status
            config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
            offset.storage.topic=mirrormaker2-cluster-offsets
            config.providers=file
            value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
            key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
            header.converter=org.apache.kafka.connect.converters.ByteArrayConverter
            config.storage.replication.factor=1
            offset.storage.replication.factor=1
            producer.connections.max.idle.ms=180000
            producer.metadata.max.age.ms=180000
            status.storage.replication.factor=1
        - name: KAFKA_CONNECT_METRICS_ENABLED
          value: "false"
        - name: KAFKA_CONNECT_BOOTSTRAP_SERVERS
          value: ehmljnt.servicebus.windows.net:9093
        - name: STRIMZI_KAFKA_GC_LOG_ENABLED
          value: "false"
        - name: KAFKA_HEAP_OPTS
          value: -Xms128M
        - name: KAFKA_CONNECT_TLS
          value: "true"
        - name: KAFKA_CONNECT_SASL_MECHANISM
          value: oauth
        - name: KAFKA_CONNECT_OAUTH_CONFIG
          value: oauth.client.id="<clientId>" oauth.token.endpoint.uri="https://login.microsoftonline.com/<tenantid>/oauth2/v2.0/token"
            oauth.scope="https://<ehname>.servicebus.windows.net/.default" oauth.client.assertion.location="/output/jwt.txt"
        - name: KAFKA_MIRRORMAKER_2_CLUSTERS
          value: my-cluster;eventhub
        - name: KAFKA_MIRRORMAKER_2_TLS_CLUSTERS
          value: "true"
        image: pcktml.azurecr.io/custom-kafka-with-jwt:0.7
        imagePullPolicy: IfNotPresent
        livenessProbe:
          httpGet:
            path: /
            port: rest-api
          initialDelaySeconds: 60
          timeoutSeconds: 5
        name: mymirrormaker-mirrormaker2
        ports:
        - containerPort: 8083
          name: rest-api
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /
            port: rest-api
          initialDelaySeconds: 60
          timeoutSeconds: 5
        volumeMounts:
        - mountPath: /tmp
          name: strimzi-tmp
        - mountPath: /opt/kafka/custom-config/
          name: kafka-metrics-and-logging
        - mountPath: /mnt/my-secret
          name: secret-volume
          readOnly: true
        - mountPath: /mnt/my-secret
          name: secret-volume
          readOnly: true
      hostname: mymirrormaker-mirrormaker2-0
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccountName: mymirrormaker-mirrormaker2
      subdomain: mymirrormaker-mirrormaker2
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir:
          medium: Memory
          sizeLimit: 5Mi
        name: strimzi-tmp
      - configMap:
          name: mymirrormaker-mirrormaker2-config
        name: kafka-metrics-and-logging
      - name: secret-volume
        secret:
          secretName: my-secret
  selector:
    matchLabels:
      strimzi.io/cluster: mymirrormaker
      strimzi.io/controller: strimzipodset
      strimzi.io/controller-name: mymirrormaker-mirrormaker2
      strimzi.io/kind: KafkaMirrorMaker2
      strimzi.io/name: mymirrormaker-mirrormaker2
status:
  conditions:
  - lastTransitionTime: "2025-01-08T08:40:27.434726266Z"
    message: 'Failure executing: POST at: https://10.0.0.1:443/api/v1/namespaces/kafka/pods.
      Message: Pod "mymirrormaker-mirrormaker2-0" is invalid: spec.containers[0].volumeMounts[3].mountPath:
      Invalid value: "/mnt/my-secret": must be unique. Received status: Status(apiVersion=v1,
      code=422, details=StatusDetails(causes=[StatusCause(field=spec.containers[0].volumeMounts[3].mountPath,
      message=Invalid value: "/mnt/my-secret": must be unique, reason=FieldValueInvalid,
      additionalProperties={})], group=null, kind=Pod, name=mymirrormaker-mirrormaker2-0,
      retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Pod
      "mymirrormaker-mirrormaker2-0" is invalid: spec.containers[0].volumeMounts[3].mountPath:
      Invalid value: "/mnt/my-secret": must be unique, metadata=ListMeta(_continue=null,
      remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}),
      reason=Invalid, status=Failure, additionalProperties={}).'
    reason: KubernetesClientException
    status: "true"
    type: Error
  currentPods: 0
  observedGeneration: 4
  pods: 0
  readyPods: 0

@scholzj
Member

scholzj commented Jan 8, 2025

Ok, I managed to reproduce it. I will look into it deeper. But this is likely some bug in the code.

@MarekLani
Author

I understand. Thank you, Jakub, for your support.

@scholzj
Member

scholzj commented Jan 8, 2025

FYI: I found the issue. I will open a PR later today. But I'm afraid it is a bug and there is no workaround for it. So it will be fixed in the next release only.

@MarekLani
Author

Thank you, Jakub. Do you have an estimate of when it could be released?

@scholzj
Member

scholzj commented Jan 8, 2025

I opened #11022 to fix it. I think it should be backported to the 0.45 release, as the 0.46.0 release is still far away. But right now there is no exact plan for a patch release, and it would need to be discussed.

@scholzj
Member

scholzj commented Jan 8, 2025

I started a Slack thread to see what other issues we might have for a possible 0.45.1 patch release: https://cloud-native.slack.com/archives/C018247K8T0/p1736344099810809

@im-konge
Member

im-konge commented Jan 9, 2025

Triaged 9.1.2025: @MarekLani, is there anything else that you are finding issues with?

@MarekLani
Author

Thank you, nothing at this point. Just one kind request: would it be possible to update this thread once you have an estimated release date for the fix? Thank you!

@scholzj
Member

scholzj commented Jan 9, 2025

It was discussed in the community call today and the overall feeling was that we will not do a patch release yet. But we can keep you posted.

@MarekLani
Author

Understood. If no patch release is done, can we expect roughly two months between minor version releases? I am just judging based on the previous cadence.

@scholzj
Member

scholzj commented Jan 9, 2025

The next minor version depends on Kafka 4.0. I think we will do the patch release sooner or later; the question is more when exactly. That will likely depend on whether more bugs are found in 0.45, how things look with Kafka 4.0, etc.
