RHOAIENG-16247: Add option to append the run ID to Cloud Object Storage output paths #105

Open: 1 commit to be merged into base: main

caponetto

See https://issues.redhat.com/browse/RHOAIENG-16247 for more information

Description

This PR adds an option to append the run ID to Cloud Object Storage (COS) output paths, so that output files are not overwritten when a new run is triggered from the same pipeline execution on the Dashboard. The current behavior remains the default to avoid breaking existing flows; users who want this feature must explicitly enable it through the new property.

If the property is not set, output files are stored in a path like the following (current behavior):
<bucket_name>/<pipeline_name>-<timestamp>/<output_file>

If the property is set, output files are stored in a path like:
<bucket_name>/<pipeline_name>-<timestamp>/<run_id>/<output_file>
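The two layouts above can be sketched as a small helper. This is only an illustration of the path shapes described here, not the code from this PR: the function name `build_cos_output_key`, its parameters, and the sample values are all hypothetical.

```python
from typing import Optional


def build_cos_output_key(
    pipeline_name: str,
    timestamp: str,
    output_file: str,
    run_id: Optional[str] = None,
) -> str:
    """Return the object key (the path inside the bucket) for an output file.

    Hypothetical helper: when run_id is given, it becomes an extra path
    segment between the "<pipeline_name>-<timestamp>" folder and the file.
    """
    parts = [f"{pipeline_name}-{timestamp}"]
    if run_id:
        # Property enabled: isolate this run's outputs in their own folder.
        parts.append(run_id)
    parts.append(output_file)
    return "/".join(parts)


# Sample values are made up for illustration.
default_key = build_cos_output_key("my-pipeline", "0124193300", "out.txt")
run_key = build_cos_output_key(
    "my-pipeline", "0124193300", "out.txt", run_id="run-abc"
)
print(default_key)
print(run_key)
```

Leaving `run_id` unset reproduces the current layout, which mirrors the PR's choice to keep the existing behavior as the default.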

Validation

Make sure the Pipeline server, COS, and the connection are correctly set up on ODH.

Step 1

Run this code locally and connect to a running ODH or build a notebook-based image with an Elyra version that includes the changes from this PR.

Alternatively, you can use the image built from this code through our automation: ghcr.io/caponetto/opendatahub-io-elyra/workbench-images:cuda-jupyter-tensorflow-ubi9-python-3.11-20250124-adbd066-sha-9c4c8ebf (this image is based on quay.io/opendatahub/workbench-images; it simply uninstalls the current ODH Elyra and installs the ODH Elyra built from this code). The following steps use this image.

Step 2

Import the new image into ODH and create a new workbench from it. Be sure to set up the following env vars so that the updated bootstrapper.py is downloaded and used:

ELYRA_FILE_BASE_PATH=/opt/app-root/bin (or any other folder)
ELYRA_GITHUB_BRANCH=RHOAIENG-16247
ELYRA_GITHUB_ORG=caponetto
ELYRA_GITHUB_REPO=opendatahub-io-elyra

Because recent changes have not yet been propagated to notebooks, you also need to set:

KF_PIPELINES_SSL_SA_CERTS=/etc/pki/tls/custom-certs/ca-bundle.crt
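Before running the pipeline, it may help to confirm from a notebook cell or terminal in the workbench that these variables are actually visible. The following is a minimal sketch; `check_workbench_env` is a hypothetical helper and is not part of Elyra. It only compares the environment against the values listed above.

```python
import os
from typing import List, Mapping

# Variables whose exact value is given in the steps above.
EXPECTED_ENV = {
    "ELYRA_GITHUB_BRANCH": "RHOAIENG-16247",
    "ELYRA_GITHUB_ORG": "caponetto",
    "ELYRA_GITHUB_REPO": "opendatahub-io-elyra",
}

# Variables that only need to be set to some non-empty path.
REQUIRED_NON_EMPTY = ("ELYRA_FILE_BASE_PATH", "KF_PIPELINES_SSL_SA_CERTS")


def check_workbench_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    for name in REQUIRED_NON_EMPTY:
        if not env.get(name):
            problems.append(f"{name} is not set")
    for name, expected in EXPECTED_ENV.items():
        actual = env.get(name)
        if actual != expected:
            problems.append(f"{name}={actual!r}, expected {expected!r}")
    return problems


# Check the current process environment and report anything unexpected.
for problem in check_workbench_env(os.environ):
    print(problem)
```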

Step 3

Open up the newly created workbench and add a pipeline that stores files to COS. An example can be found here, which stores a text file to COS.

Step 4

Keep the new property disabled and run the pipeline. Output files are expected to be stored at:
<bucket_name>/<pipeline_name>-<timestamp>/<output_file>

If you create a new run from the Dashboard, the output files will be overwritten, which is the current behavior.

Step 5

Enable the new property and run the pipeline again.

Output files are expected to be stored at:
<bucket_name>/<pipeline_name>-<timestamp>/<run_id>/<output_file>

Notice that the run ID in the folder name matches the run information shown on the Dashboard.

If you create a new run from the Dashboard, the output files from that run will be stored in a new folder.
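The difference between Step 4 and Step 5 can be illustrated with a toy in-memory stand-in for the bucket. This sketch does not talk to COS or use any Elyra code: the dictionary, the `simulate_runs` function, and the run IDs are hypothetical and only model how object keys collide or stay separate.

```python
from typing import Dict, List


def simulate_runs(run_ids: List[str], append_run_id: bool) -> Dict[str, str]:
    """Write one output file per run into a dict that stands in for a bucket."""
    bucket: Dict[str, str] = {}
    prefix = "my-pipeline-0124193300"  # made-up <pipeline_name>-<timestamp>
    for run_id in run_ids:
        if append_run_id:
            key = f"{prefix}/{run_id}/output.txt"
        else:
            key = f"{prefix}/output.txt"
        # Same key means the earlier object is replaced, as in object storage.
        bucket[key] = f"written by {run_id}"
    return bucket


# Property disabled: the second run replaces the first run's file.
overwritten = simulate_runs(["run-1", "run-2"], append_run_id=False)
# Property enabled: each run keeps its own folder.
separate = simulate_runs(["run-1", "run-2"], append_run_id=True)
print(len(overwritten), len(separate))
```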

@caponetto caponetto requested a review from harshad16 January 24, 2025 19:33

codecov-commenter commented Jan 24, 2025

Codecov Report

Attention: Patch coverage is 87.17949% with 5 lines in your changes missing coverage. Please review.

Project coverage is 80.53%. Comparing base (a74d345) to head (9c4c8eb).

Files with missing lines | Patch % | Lines
elyra/airflow/bootstrapper.py | 0.00% | 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #105      +/-   ##
==========================================
- Coverage   80.61%   80.53%   -0.08%     
==========================================
  Files         151      151              
  Lines       19421    19455      +34     
  Branches      487      487              
==========================================
+ Hits        15656    15669      +13     
- Misses       3584     3606      +22     
+ Partials      181      180       -1     
