RHOAIENG-16247: Add option to append the run ID to Cloud Object Storage output paths #105

caponetto · 2025-01-24T19:33:26Z

See https://issues.redhat.com/browse/RHOAIENG-16247 for more information

Description

This PR adds an option to append the run ID to Cloud Object Storage (COS) output paths, so that output files are not overwritten when a new run is triggered from the same pipeline execution on the Dashboard. The current behavior remains the default to avoid breaking existing flows; users who want this feature must explicitly enable it through the new property.

If the property is not set (current behavior), output files are stored under a path like:
<bucket_name>/<pipeline_name>-<timestamp>/<output_file>

If the property is set, output files are stored under a path like:
<bucket_name>/<pipeline_name>-<timestamp>/<run_id>/<output_file>
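The two layouts can be sketched as a small key-building function. This is a minimal illustration, not the code in this PR: the function name `build_cos_object_key` and the boolean `append_run_id` are hypothetical stand-ins for the new property (whose name is not given here), the bucket name is left out because it is not part of the object key, and the timestamp and run ID values are made-up examples.

```python
from typing import Optional


def build_cos_object_key(
    pipeline_name: str,
    timestamp: str,
    output_file: str,
    run_id: Optional[str] = None,
    append_run_id: bool = False,  # hypothetical stand-in for the new property
) -> str:
    """Return the object key (path inside the bucket) for one output file."""
    parts = [f"{pipeline_name}-{timestamp}"]
    if append_run_id:
        if not run_id:
            raise ValueError("run_id is required when append_run_id is enabled")
        # Each run of the same execution gets its own folder.
        parts.append(run_id)
    parts.append(output_file)
    return "/".join(parts)


# Default (current behavior): every run writes to the same key.
print(build_cos_object_key("my-pipeline", "0124193326", "output.txt"))
# my-pipeline-0124193326/output.txt

# Property enabled: the run ID becomes an extra folder level.
print(build_cos_object_key("my-pipeline", "0124193326", "output.txt",
                           run_id="3f2a9c1e", append_run_id=True))
# my-pipeline-0124193326/3f2a9c1e/output.txt
```

Keeping the flag off by default means existing keys are produced unchanged, which is what keeps existing flows working.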

Validation

Make sure the pipeline server, COS, and connection are correctly set up on ODH.

Step 1

Either run this code locally and connect to a running ODH instance, or build a notebook-based image with an Elyra version that includes the changes from this PR.

Alternatively, you can use the image built from this code by our automation: ghcr.io/caponetto/opendatahub-io-elyra/workbench-images:cuda-jupyter-tensorflow-ubi9-python-3.11-20250124-adbd066-sha-9c4c8ebf (this image is based on quay.io/opendatahub/workbench-images; it simply uninstalls the current ODH Elyra and installs the ODH Elyra built from this code). The following steps use this image.

Step 2

Import the new image into ODH and create a new workbench from it. Be sure to set the following environment variables so that the updated bootstrapper.py is downloaded and used:

ELYRA_FILE_BASE_PATH=/opt/app-root/bin (or any other folder)
ELYRA_GITHUB_BRANCH=RHOAIENG-16247
ELYRA_GITHUB_ORG=caponetto
ELYRA_GITHUB_REPO=opendatahub-io-elyra
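To make the effect of these variables concrete, the sketch below resolves a download URL for bootstrapper.py from them. It is an illustration under stated assumptions, not Elyra's actual code: the raw.githubusercontent.com URL pattern, the `elyra/kfp/bootstrapper.py` path inside the repository, the fallback defaults, and the use of ELYRA_FILE_BASE_PATH as the local destination folder are all assumptions, and `resolve_bootstrapper` is a hypothetical helper.

```python
import os
from pathlib import PurePosixPath
from typing import Mapping, Optional, Tuple

# Hypothetical fallback values; the real defaults in ODH Elyra may differ.
DEFAULT_ORG = "opendatahub-io"
DEFAULT_REPO = "elyra"
DEFAULT_BRANCH = "main"
DEFAULT_BASE_PATH = "/opt/app-root/bin"


def resolve_bootstrapper(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (download URL, local destination path) for bootstrapper.py."""
    env = os.environ if env is None else env
    org = env.get("ELYRA_GITHUB_ORG", DEFAULT_ORG)
    repo = env.get("ELYRA_GITHUB_REPO", DEFAULT_REPO)
    branch = env.get("ELYRA_GITHUB_BRANCH", DEFAULT_BRANCH)
    base_path = env.get("ELYRA_FILE_BASE_PATH", DEFAULT_BASE_PATH)
    # Assumed URL layout: raw file from the given org/repo/branch.
    url = f"https://raw.githubusercontent.com/{org}/{repo}/{branch}/elyra/kfp/bootstrapper.py"
    local_path = str(PurePosixPath(base_path) / "bootstrapper.py")
    return url, local_path


url, path = resolve_bootstrapper({
    "ELYRA_FILE_BASE_PATH": "/opt/app-root/bin",
    "ELYRA_GITHUB_BRANCH": "RHOAIENG-16247",
    "ELYRA_GITHUB_ORG": "caponetto",
    "ELYRA_GITHUB_REPO": "opendatahub-io-elyra",
})
print(url)
# https://raw.githubusercontent.com/caponetto/opendatahub-io-elyra/RHOAIENG-16247/elyra/kfp/bootstrapper.py
print(path)
# /opt/app-root/bin/bootstrapper.py
```

Pointing org, repo, and branch at the PR's fork is what lets the workbench pick up the modified bootstrapper without a new Elyra release.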

Because some recent changes have not yet propagated to the notebook images, you also need to set:

KF_PIPELINES_SSL_SA_CERTS=/etc/pki/tls/custom-certs/ca-bundle.crt
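KF_PIPELINES_SSL_SA_CERTS points at a CA bundle used to trust the pipeline server's TLS certificate. As a rough illustration only (not how the KFP SDK or Elyra consume the variable internally), the standard library `ssl` module can turn such a bundle into a verifying SSL context; `ssl_context_from_env` is a hypothetical helper.

```python
import os
import ssl
from typing import Mapping, Optional


def ssl_context_from_env(env: Optional[Mapping[str, str]] = None) -> ssl.SSLContext:
    """Build a verifying SSL context that trusts the bundle in KF_PIPELINES_SSL_SA_CERTS, if set."""
    env = os.environ if env is None else env
    cafile = env.get("KF_PIPELINES_SSL_SA_CERTS")
    if cafile:
        # Raises FileNotFoundError if the bundle path is wrong, which surfaces a
        # misconfigured workbench early instead of at the first HTTPS request.
        return ssl.create_default_context(cafile=cafile)
    # Variable not set: fall back to the system trust store.
    return ssl.create_default_context()


ctx = ssl_context_from_env({})
print(ctx.verify_mode == ssl.CERT_REQUIRED)
# True
```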

Step 3

Open the newly created workbench and add a pipeline that stores files to COS. An example pipeline that stores a text file to COS can be found here.
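In Elyra, a generic pipeline node declares the files it produces in its Output Files property, and those files are uploaded to COS after the node runs. A node script for such a test pipeline could be as small as the sketch below; the file name `output.txt` is a hypothetical example and must match the name listed in the node's Output Files property.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical output file name; list the same name in the node's
# "Output Files" property so that it is uploaded to COS after the node runs.
OUTPUT_FILE = Path("output.txt")


def main() -> None:
    stamp = datetime.now(timezone.utc).isoformat()
    # A timestamp in the content makes it easy to tell runs apart in COS.
    OUTPUT_FILE.write_text(f"Hello from the pipeline node at {stamp}\n", encoding="utf-8")
    print(f"Wrote {OUTPUT_FILE.resolve()}")


if __name__ == "__main__":
    main()
```

Writing a fresh timestamp on every run makes it easy to see, in Steps 4 and 5, whether a later run overwrote the file or wrote to a new folder.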

Step 4

Keep the new property disabled and run the pipeline. The output files are expected to be stored under:
<bucket_name>/<pipeline_name>-<timestamp>/<output_file>

If you create a new run from the Dashboard, the output files are overwritten, which is the current behavior.

Step 5

Enable the new property and run the pipeline again.

The output files are expected to be stored under:
<bucket_name>/<pipeline_name>-<timestamp>/<run_id>/<output_file>

Note that the run ID in the folder name matches the run ID shown on the Dashboard.

If you create a new run from the Dashboard, its output files are stored in a new folder.
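To check Steps 4 and 5 without browsing the bucket by hand, the object keys under the execution prefix can be grouped by run ID folder. The sketch below is a hypothetical helper that works on a plain list of keys (for example, the result of any S3-compatible listing call); the sample keys and run IDs are made up, and it assumes output file names do not themselves contain "/".

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_outputs_by_run(keys: Iterable[str], execution_prefix: str) -> Dict[str, List[str]]:
    """Group keys under '<pipeline_name>-<timestamp>/' by run ID folder.

    Files stored directly under the prefix (the default layout) go under "".
    Assumes output file names contain no "/" of their own.
    """
    prefix = execution_prefix.rstrip("/") + "/"
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        if not key.startswith(prefix):
            continue  # belongs to another execution
        rest = key[len(prefix):]
        run_id, sep, name = rest.partition("/")
        if sep:
            groups[run_id].append(name)
        else:
            groups[""].append(rest)
    return dict(groups)


keys = [
    "my-pipeline-0124193326/output.txt",        # run with the property disabled
    "my-pipeline-0124193326/run-a/output.txt",  # first run with the property enabled
    "my-pipeline-0124193326/run-b/output.txt",  # second run from the Dashboard
]
print(group_outputs_by_run(keys, "my-pipeline-0124193326"))
# {'': ['output.txt'], 'run-a': ['output.txt'], 'run-b': ['output.txt']}
```

With the property enabled, each Dashboard run should add a new run ID group instead of replacing the file under "".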

codecov-commenter · 2025-01-24T19:41:07Z

Codecov Report

Attention: Patch coverage is 87.17949% with 5 lines in your changes missing coverage. Please review.

Project coverage is 80.53%. Comparing base (a74d345) to head (9c4c8eb).

Files with missing lines	Patch %	Lines
elyra/airflow/bootstrapper.py	0.00%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #105      +/-   ##
==========================================
- Coverage   80.61%   80.53%   -0.08%     
==========================================
  Files         151      151              
  Lines       19421    19455      +34     
  Branches      487      487              
==========================================
+ Hits        15656    15669      +13     
- Misses       3584     3606      +22     
+ Partials      181      180       -1
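The percentages above can be reproduced from the raw counts, assuming Codecov computes coverage as hits divided by tracked lines (hits + misses + partials) and rounds down to two decimals, which is its default rounding. The counts add up (15656 + 3584 + 181 = 19421 and 15669 + 3606 + 180 = 19455), and 5 missing lines at 87.17949% patch coverage implies 39 coverable patch lines, 34 of them hit. `coverage_pct` is a hypothetical helper.

```python
import math


def coverage_pct(hits: int, lines: int, decimals: int = 2) -> float:
    """Coverage as a percentage, rounded down (truncated) to `decimals` places."""
    factor = 10 ** decimals
    return math.floor(hits / lines * 100 * factor) / factor


base = coverage_pct(15656, 15656 + 3584 + 181)  # main: 19421 lines
head = coverage_pct(15669, 15669 + 3606 + 180)  # PR head: 19455 lines
print(base, head, round(head - base, 2))
# 80.61 80.53 -0.08

# Patch: 34 of 39 coverable lines hit (5 missing).
print(round(34 / 39 * 100, 5))
# 87.17949
```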


Add option to append the run ID to Cloud Object Storage output paths

9c4c8eb

caponetto requested a review from harshad16 January 24, 2025 19:33
