RHOAIENG-16247: Add option to append the run ID to Cloud Object Storage output paths #105
See https://issues.redhat.com/browse/RHOAIENG-16247 for more information
Description
This PR adds an option to append the run ID to Cloud Object Storage (COS) output paths, so that output files are not overwritten when a new run is triggered from the same pipeline execution on the Dashboard. The current behavior remains the default to avoid breaking existing flows; users who want this feature must explicitly enable it through the new property.
If the property is not set (current behavior), output files are stored under a path like:
<bucket_name>/<pipeline_name>-<timestamp>/<output_file>
If the property is set, output files are stored under a path like:
<bucket_name>/<pipeline_name>-<timestamp>/<run_id>/<output_file>
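To make the two layouts concrete, here is a minimal sketch of how an output path could be composed. The helper name cos_output_path and its parameters are hypothetical and chosen for illustration only; this is not the code in Elyra's bootstrapper.py, it just mirrors the path patterns listed above.

```python
from typing import Optional


def cos_output_path(bucket_name: str, pipeline_name: str, timestamp: str,
                    output_file: str, run_id: Optional[str] = None) -> str:
    """Build the COS path for an output file (hypothetical helper).

    Mirrors the two layouts described in this PR; it is not the function
    used by Elyra itself.
    """
    prefix = f"{bucket_name}/{pipeline_name}-{timestamp}"
    if run_id:
        # Property enabled: each run writes into its own <run_id> sub-folder.
        return f"{prefix}/{run_id}/{output_file}"
    # Property not set (default): every run writes to the same folder,
    # so a new run from the Dashboard overwrites earlier outputs.
    return f"{prefix}/{output_file}"
```

With the property disabled, two runs of the same pipeline execution resolve to the same path (for example, cos_output_path("my-bucket", "hello", "t0", "out.txt") twice), which is why files get overwritten. Passing a distinct run_id per run yields distinct paths. The bucket, pipeline, and timestamp values here are placeholders.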
Validation
Make sure the pipeline server, COS, and the connection are correctly set up on ODH.
Step 1
Run this code locally and connect it to a running ODH instance, or build a notebook-based image with an Elyra version that includes the changes from this PR.
Alternatively, you can use the image built from this code by our automation:
ghcr.io/caponetto/opendatahub-io-elyra/workbench-images:cuda-jupyter-tensorflow-ubi9-python-3.11-20250124-adbd066-sha-9c4c8ebf
(This image is based on quay.io/opendatahub/workbench-images; it simply uninstalls the current ODH Elyra and installs the ODH Elyra built from this code.) The following steps use this image.
Step 2
Import the new image into ODH and create a new workbench from it. Be sure to set the following environment variables so that the updated bootstrapper.py is downloaded and used:
Given recent changes that have not yet been propagated to notebooks, you also need to set:
Step 3
Open the newly created workbench and add a pipeline that stores files to COS. An example that stores a text file to COS can be found here.
Step 4
Keep the new property disabled and run the pipeline. The output files are expected to be stored under:
<bucket_name>/<pipeline_name>-<timestamp>/<output_file>
If you create a new run from the Dashboard, the output files are overwritten, which is the current behavior.
Step 5
Enable the new property and run the pipeline again.
The output files are expected to be stored under:
<bucket_name>/<pipeline_name>-<timestamp>/<run_id>/<output_file>
Notice that the run ID in the folder name matches the run ID shown on the Dashboard.
If you create a new run from the Dashboard, the output files from that run are stored in a new folder.
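To check the result of Steps 4 and 5 without browsing the bucket manually, one option is to group the object keys under <pipeline_name>-<timestamp>/ by their run ID folder. The sketch below is a hypothetical helper (files_by_run is not part of Elyra). It assumes you already have the keys relative to the bucket, for example the Key fields returned by boto3's list_objects_v2 with Prefix set. Keys stored directly under the prefix (the default layout) are grouped under the empty string.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def files_by_run(keys: Iterable[str], pipeline_prefix: str) -> Dict[str, List[str]]:
    """Group object keys by run ID folder (hypothetical verification helper).

    Assumes keys are relative to the bucket and follow either
    <pipeline_name>-<timestamp>/<run_id>/<output_file> (property enabled) or
    <pipeline_name>-<timestamp>/<output_file> (default layout).
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    prefix = pipeline_prefix.rstrip("/") + "/"
    for key in keys:
        if not key.startswith(prefix):
            continue  # belongs to a different pipeline execution
        rest = key[len(prefix):]
        # Split off the first path segment, which is the run ID folder if present.
        run_id, sep, output_file = rest.partition("/")
        if sep:
            groups[run_id].append(output_file)
        else:
            # No sub-folder: file stored directly under the prefix (default layout).
            groups[""].append(rest)
    return dict(groups)
```

If the property works as described, two Dashboard runs should appear as two separate run ID entries, each with its own copy of the output file, instead of a single shared file under the prefix. Other artifacts Elyra stores in the same folder may also show up in the listing.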