-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(backend): move comp logic to workflow params #10979
Conversation
Hi @zazulam. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
/rerun-all |
/lgtm @chensun Please review and approve if this makes sense to you. |
@HumairAK Thank you for volunteering on testing this implementation! |
@zijianjoy brought up a great question about potential backward compatibility issues. The IR is the source of truth and is not impacted by this change. The compiled Workflow is a downstream artifact / representation. Whether it compiles to a Workflow with Annotations or a Workflow with Parameters shouldn't impact end users, as long as the source of truth (the IR) stays the same, and it does. If a user has built abstractions on top of KFP that interact directly with the Workflow, they could conceivably be impacted. |
func ExtractBaseComponentName(componentName string) string { | ||
baseComponentName := componentName | ||
componentNameArray := strings.Split(componentName, "-") | ||
|
||
if _, err := strconv.Atoi(componentNameArray[len(componentNameArray)-1]); err == nil { | ||
baseComponentName = strings.Join(componentNameArray[:len(componentNameArray)-1], "-") | ||
|
||
} | ||
|
||
return baseComponentName | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I recall correctly, component function names that have underscores also get converted to dashes -
in the IR .components
section, example:
@dsl.component()
def foo_1(name: str) -> str:
hello_text = f"Hello 2, {name}!"
print(hello_text)
return hello_text
would result in:
components:
comp-foo-1:
In this case it seems like this function would use the baseComponentName comp-foo
, which seems incorrect no?
Another alternative could be to introduce a separate sdk field that keeps the component name in check, maybe here, but then we have to consider backwards incompatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! Nice catch! This is an edge case that we were thinking of tackling in a discrete PR, since the component naming logic already gets confused -- if you have a component named foo
and a component named foo_2
and you invoke foo
twice, you wind up with components named comp-foo
, comp-foo-2
, and comp-foo-2-2
in the IR. We can include that edge case handling in this PR though. The easiest way to do that would be to raise an exception in the SDK when a component function name culminates in _<int(s)>
. Any backward compatibility concerns would only take place at the IR compilation stage, which (a) happens client-side and is easily resolved, and (b) would be an exceedingly rare edge case. Open to alternative approaches ofc!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you have a component named foo and a component named foo_2 and you invoke foo twice, you wind up with components named comp-foo, comp-foo-2, and comp-foo-2-2
Yeah in this case it looks like it would extract the right base names for each. The edge case I mentioned will occur every time you have components with unique base names with a _<int>
prefix.
The current approach feels a little hacky to me, because we are trying to rebuild info that should have been provided by the sdk initially. For example, in the alternative I proposed we could have sdk provide something like:
components:
comp-foo-1:
componentName: foo
executorLabel: exec-foo-1
comp-foo-1-1:
componentName: foo-1
executorLabel: exec-foo-1-2
And just use the component name componentName
, instead of inferring it. Due to backwards incompatibility this would require we support this only future KFP versions though.
Anyways, food for thought. It is an edge case as you mentioned that is likely very rare.
We can include that edge case handling in this PR though. The easiest way to do that would be to raise an exception in the SDK when a component function name culminates in _<int(s)>.
I'm personally fine with this, we should provide documentations on component naming maybe here.
I would also suggest adding some simple unit tests to illustrate the handling of the different component name formats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to avoid introducing naming constraints if possible. Introducing a naming constraints is a breaking changes.
The current state of appending component name with index suffix isn't ideal, but that's more of an implementation detail which can be changed without user impact, and we can dedupe them:
# TODO(chensun): dedupe IR component_spec and contaienr_spec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy that. Thanks, @chensun!
One solution could be to update ExtractBaseComponentName to return the full component name if the IR includes a task with a name in the format of .*-\d+-\d+
. If you think this makes sense, maybe @zazulam and I can crank it out tomorrow morning before the community call. Just have to confirm with @zazulam about bandwidth.
Another approach could be preserving dash -
characters in the IR instead of converting them into underscores _
. That way we could use -<index>
in make_name_unique_by_adding_index
and not have to worry about collisions since python function names cannot include dashes. Sounds easy in theory but maybe including dashes violates expectations in the protobuf / backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current state of appending component name with index suffix isn't ideal, but that's more of an implementation detail which can be changed without user impact, and we can dedupe them:
thanks @chensun
if we don't address it here, and we fear this breaking change can impact users, then we still need to resolve this as it would introduce a regression for this niche case. @droctothorpe correct me if I'm wrong here, it's been a couple of weeks since I looked at this, but in this case the component can assume the wrong parameters if paired with another component incorrectly, right?
One solution could be to update ExtractBaseComponentName to return the full component name if the IR includes a task with a name in the format of .*-\d+-\d+.
so I understand, this sounds like the alternative I proposed here yes? If so I like this approach more. We can add it so as not to introduce any regression, and only support this for future sdk/kfp releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean this specific suggestion, right, @HumairAK? i.e. add component name to the IR spec to eliminate any guess work. I think that might be the ideal solution. Having an explicit map from the task name to the component name is an important piece of information to store in the IR. The only drawback that I can think of is that it theoretically increases the complexity of this PR, but if it's the desired target state, then the complexity is justified. @zazulam can you please mark this conversation as unresolved? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only drawback that I can think of is that it theoretically increases the complexity of this PR, but if it's the desired target state, then the complexity is justified.
Yeah, this is why I proposed sticking with what we had, given that it does introduces a breaking change, but for a very niche case, and the fix is relatively straight forward.
But, if there is concerns around the impact, going IR spec route is the only way I can think of that is least destructive. I suppose we could also preserve the dashes, I'm not super familiar with the reasoning behind why we didn't initially, so my concern there would be possible unforeseen side effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for confirming, @HumairAK.
To summarize the three proposed solutions:
- Update ExtractBaseComponentName to return the full component name if the IR includes a task with a name in the format of .*-\d+-\d+.
- Preserve dash
-
characters in the IR instead of converting them into underscores_
. That way we could use-<index>
inmake_name_unique_by_adding_index
and not have to worry about collisions since python function names cannot include dashes. Sounds easy in theory but including dashes may violate expectations in the protobuf / backend. - Add an explicit
componentName
field to each component in the IR so that the backend doesn't have to deduce the "base name."
Do you have a preference between these three approaches, @chensun? Are there other options that we haven't taken into consideration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented and validated a solution. Continuing the thread here because this is too dang nested.
a34a1c6
to
3b3eed6
Compare
/rerun-all |
31035ae
to
ba1d02a
Compare
@droctothorpe @zazulam I have a PR open to fix the e2e tests: #11072 |
Signed-off-by: zazulam <[email protected]> Co-authored-by: droctothorpe <[email protected]> Co-authored-by: andreafehrman <[email protected]> Co-authored-by: MonicaZhang1 <[email protected]> Co-authored-by: kylekaminky <[email protected]> Co-authored-by: CarterFendley <[email protected]> Signed-off-by: zazulam <[email protected]>
Signed-off-by: zazulam <[email protected]>
Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]>
Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]>
Thank you so much, @hbelmiro! Rebased from master and force pushed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM.
|
||
// extractFunctionName extracts the function name of a component by looking it | ||
// up in the pipeline spec. | ||
func (c *workflowCompiler) extractFunctionName(componentName string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not every component has a visible function name. Why not just use the original name as-is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, Chen!
So this PR tackles two separate problems:
-
This PR moves the component logic from the metadata annotations (which have a 256KB limit) to the spec (which has a much higher, configurable limit defined by the etcd request size limit). This in it of itself solves a major regression in v2.
-
In addition, this PR deduplicates component logic in the compiled AWF manifest. This duplication is a long-standing issue that was carried over from v1 and was reported as early as 2020. Since the IR compiler appends an incrementing index to duplicate tasks, and the IR -> AWF compiler doesn't properly account for this, the underlying component logic gets duplicated in full in the compiled AWF manifest for each instance. This is kind of like copying a function in full every time you want to invoke it. In a sense, we're kind of using the function name as a key in a set instead of adding each task's logic to a list. Since components can get quite large, with all this duplicate logic, it's easy to hit even the etcd requests size limit. Several of our internal model development teams have run into this exact problem and have been forced to partition large pipelines or minify compiled manifests (i.e. strip verbose annotations and comments), but minification only works in v1 where compilation happens client-side.
Not every component has a visible function name. Why not just use the original name as-is?
If the --function_to_execute
is not specified in the executor, then extractFunctionName
just returns the original (indexed) component name, i.e. "the original name as-is." So that's precisely what happens when the function name is not specified.
At a bare minimum, this PR circumvents the 256KB limit, but it also substantially reduces the size of the compiled manifest (when possible) thanks to deduplication (we saw a 70% file size reduction in our tests).
Apologies if I'm over-explaining anything!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the --function_to_execute is not specified in the executor, then extractFunctionName just returns the original (indexed) component name, i.e. "the original name as-is." So that's precisely what happens when the function name is not specified.
Exactly as this function checks multiple cases, but doesn't guarantee any improvement over the original, so It feels to me an over-engineering that doesn't help much. I would suggest we remove it and use the original name as-is for simplicity, if the goal it to guarantee a short length and/or unique name, then we should use some hashing method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning the original name as is would guarantee redundancy in the compiled manifest. Hashing is an interesting approach! We could hash the IR.deploymentSpec.executors.<executor>.container.command.
and use that as the key. One concern with that approach is that it significantly reduces the human readability of the AWF manifest. Also, could hashing hundreds of large components increase AWF compilation time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not every component has a visible function name. Why not just use the original name as-is?
QQ: Under what conditions does a component not include --function_to_execute
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For hashing, I was thinking hash the entire ComponentSpec (and possibly the expanded ExecutorSpec when applicable). That would guarantee uniqueness. I don't think it would add noticable AWF compilation time, but admit I didn't do any experiment.
QQ: Under what conditions does a component not include --function_to_execute?
Exactly what I was planning to explain. :)
The issue with the current extraFunctionName implementation
is that it only makes a difference for maybe 50%-60% of the time. IMO, it's not worth the added complexity and it may even hurts AWF readability--if we value that--one needs to look into this implementation to figure out where the name comes from.
I can list a number of examples where the current approach fails to improve over original name as-is:
- condition, loop, and exit handler all produce DAG component with no executor. e.g.:
pipelines/sdk/python/test_data/pipelines/pipeline_with_loops_and_conditions.yaml
Lines 64 to 84 in e128bdb
comp-condition-13: dag: tasks: print-text-8: cachingOptions: enableCache: true componentRef: name: comp-print-text-8 inputs: parameters: msg: runtimeValue: constant: '1' taskInfo: name: print-text-8 inputDefinitions: parameters: pipelinechannel--flip-coin-op-Output: parameterType: STRING pipelinechannel--loop-item-param-11: parameterType: STRING - pipeline can be used as component, which is another DAG component without executor, e.g.:
pipelines/sdk/python/test_data/pipelines/pipeline_in_pipeline_complex.yaml
Lines 62 to 80 in e128bdb
comp-inner-pipeline: dag: tasks: condition-1: componentRef: name: comp-condition-1 dependentTasks: - print-op1 inputs: parameters: pipelinechannel--print-op1-Output: taskOutputParameter: outputParameterKey: Output producerTask: print-op1 taskInfo: name: condition-1 triggerPolicy: condition: inputs.parameter_values['pipelinechannel--print-op1-Output'] == 'Hello' - Custom container component doesn't have
--function_to_execute
, and may comes with very generic entry point likemain
,run
, etc. And large enterprise may sometimes build a shared container with a single entrypoint that takes arguments to execute different component function. e.g:pipelines/sdk/python/test_data/components/container_io.yaml
Lines 20 to 28 in e128bdb
exec-container-io: container: args: - --output_path - '{{$.outputs.parameters[''output_path''].output_file}}' command: - my_program - '{{$.inputs.parameters[''text'']}}' image: python:3.7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So my suggestion for this PR is to simply remove the extractFunctionName
part and use original component name as-is. I think the major concern on the size limit has been solved by moving the spec from annotation to argument. If further size reduction is needed, we can then look into a hashing solution that would work for 100%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for the thorough response and additional context, @chensun. We really appreciate it.
We POCed the hash-based approach that you suggested before we saw your comment and pushed it up since the work was done anyway. It works like a charm!
We tested against the three edge cases that you posted. Hashing successfully deduplicates all of the corresponding component logic. It's true that duplicate conditionals, DAGs, and nested pipelines are not deduplicated per say, but we confirmed that their underlying shared components, which are what typically inflate the manifest file size beyond the 1.5 MB etcd limit, are in fact deduplicated.
For example, pipeline_in_pipeline_complex.yaml references print_op1
three times: once at the root of my_pipeline
and twice in the graph_component
sub_pipeline (since it's parallelized).
We confirmed that the compiled AWF manifest consolidates these various tasks under the shared key, implementations-83cf3e248270dfedd9adf4317961878cb8ebdf352ee8121750c35bfee5d408ae
, and therefore only stores the logic for the corresponding function once.
We think the change may be small enough in scope and lines of code and the underlying problem significant enough to warrant keeping it in this PR. That being said, we don't mind tackling the hash approach in a discrete PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the component_io.yaml
example slightly to demonstrate deduplication (with hashing) for container components that don't list the function name:
from kfp import Client
from kfp import dsl
@dsl.container_component
def container_io(arg: str, output_path: dsl.OutputPath(str)):
return dsl.ContainerSpec(
image="python:3.7",
command=["echo", arg],
args=["--output_path", output_path],
)
@dsl.pipeline
def pipeline():
task1 = container_io(arg="foo")
task2 = container_io(arg="foo")
task3 = container_io(arg="bar")
client = Client()
client.create_run_from_pipeline_func(pipeline)
The resulting AWF manifest looks like this:
Spec:
Arguments:
Parameters:
Name: components-fc838d49ffb7211dd93e608a1671731bc81e22197bbdb2e423df0ac72d451bc9
Value: {"executorLabel":"exec-container-io","inputDefinitions":{"parameters":{"arg":{"parameterType":"STRING"}}},"outputDefinitions":{"parameters":{"output_path":{"parameterType":"STRING"}}}}
Name: implementations-fc838d49ffb7211dd93e608a1671731bc81e22197bbdb2e423df0ac72d451bc9
Value: {"args":["--output_path","{{$.outputs.parameters['output_path'].output_file}}"],"command":["echo","{{$.inputs.parameters['arg']}}"],"image":"python:3.7"}
Name: components-root
Value: {"dag":{"tasks":{"container-io":{"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-container-io"},"inputs":{"parameters":{"arg":{"runtimeValue":{"constant":"foo"}}}},"taskInfo":{"name":"container-io"}},"container-io-2":{"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-container-io-2"},"inputs":{"parameters":{"arg":{"runtimeValue":{"constant":"foo"}}}},"taskInfo":{"name":"container-io-2"}},"container-io-3":{"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-container-io-3"},"inputs":{"parameters":{"arg":{"runtimeValue":{"constant":"bar"}}}},"taskInfo":{"name":"container-io-3"}}}}}
The tasks that share logic share the same hash in the Spec.Arguments.Parameters
so the component logic is not duplicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With respect to the pipelines_with_loops_and_conditions.yaml edge case you mentioned: it has 10 instances of the print_text
component.
Thanks to the deduplication (through hashing), it stores just one copy of the component logic!
go run backend/src/v2/cmd/compiler/main.go -spec sdk/python/test_data/pipelines/pipeline_with_loops_and_conditions.yaml | grep 'print_text(msg' | wc -l
1
print_text
is a small component (though there's plenty of fluff around even one line components in the form of pip installs and what not), but it's not uncommon to parallelize massive components; that 10x increase in storage consumption makes reaching the etcd max-request-bytes
threshold inevitable for our more complex models.
Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]>
Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]>
|
||
// hashComponentCommandAndArgs combines and hashes the command and args fields of a | ||
// given component. | ||
func (c *workflowCompiler) hashComponentCommandAndArgs(componentName string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very exciting improvement. Thank you, @droctothorpe !
While we are on the right track, a small change is needed to ensure we dedupe correctly. cmds and args are not sufficient. At bare minimum, container image must be part of the hash as well. For completeness, if we only handle components with executor spec, can we hash the entire PipelineContainerSpec? WDYT?
It would probably be easier to implement as well, if you dump the entire PipelineContainerSpec to a json, then apply hashing on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @chensun! And thank you for the excellent suggestion! You're absolutely right that hashing the entire PipelineContainerSpec
makes more sense. We just pushed up a commit implementing that improved approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if componentName == "root" { | ||
return componentName | ||
} | ||
if c.executors != nil { // Don't bother if there are no executors in the pipeline spec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: There's opportunity to dedupe DAG component as well. We can hash the entire ComponentSpec (plus ExecutorSpec when available) minus their naming part: component name, executor label, etc.
We can leave it to future improvements, not a blocker for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@droctothorpe looks like we didn't go this far on this PR yes? would you be willing to open up a new github issue as a follow up, in case someone wants to pick this up?
Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
Thanks you, @zazulam and @droctothorpe !
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chensun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thanks for the input, design review, and merge, @chensun! 🎉 |
much needed change, great work folks! @droctothorpe ++ @zazulam ++ |
* feat(backend): move comp logic to workflow params Signed-off-by: zazulam <[email protected]> Co-authored-by: droctothorpe <[email protected]> Co-authored-by: andreafehrman <[email protected]> Co-authored-by: MonicaZhang1 <[email protected]> Co-authored-by: kylekaminky <[email protected]> Co-authored-by: CarterFendley <[email protected]> Signed-off-by: zazulam <[email protected]> * address pr comments Signed-off-by: zazulam <[email protected]> * Use function name instead of base name and address edge cases Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * Improve logic and update tests Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * POC hashing command and args Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * Add comments to clarify the logic Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * Hash entire PipelineContainerSpec Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> --------- Signed-off-by: zazulam <[email protected]> Signed-off-by: droctothorpe <[email protected]> Co-authored-by: droctothorpe <[email protected]> Co-authored-by: andreafehrman <[email protected]> Co-authored-by: MonicaZhang1 <[email protected]> Co-authored-by: kylekaminky <[email protected]> Co-authored-by: CarterFendley <[email protected]>
* feat(backend): move comp logic to workflow params Signed-off-by: zazulam <[email protected]> Co-authored-by: droctothorpe <[email protected]> Co-authored-by: andreafehrman <[email protected]> Co-authored-by: MonicaZhang1 <[email protected]> Co-authored-by: kylekaminky <[email protected]> Co-authored-by: CarterFendley <[email protected]> Signed-off-by: zazulam <[email protected]> * address pr comments Signed-off-by: zazulam <[email protected]> * Use function name instead of base name and address edge cases Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * Improve logic and update tests Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * POC hashing command and args Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * Add comments to clarify the logic Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * Hash entire PipelineContainerSpec Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> --------- Signed-off-by: zazulam <[email protected]> Signed-off-by: droctothorpe <[email protected]> Co-authored-by: droctothorpe <[email protected]> Co-authored-by: andreafehrman <[email protected]> Co-authored-by: MonicaZhang1 <[email protected]> Co-authored-by: kylekaminky <[email protected]> Co-authored-by: CarterFendley <[email protected]> Signed-off-by: KevinGrantLee <[email protected]>
* feat(backend): move comp logic to workflow params Signed-off-by: zazulam <[email protected]> Co-authored-by: droctothorpe <[email protected]> Co-authored-by: andreafehrman <[email protected]> Co-authored-by: MonicaZhang1 <[email protected]> Co-authored-by: kylekaminky <[email protected]> Co-authored-by: CarterFendley <[email protected]> Signed-off-by: zazulam <[email protected]> * address pr comments Signed-off-by: zazulam <[email protected]> * Use function name instead of base name and address edge cases Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * Improve logic and update tests Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * POC hashing command and args Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * Add comments to clarify the logic Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * Hash entire PipelineContainerSpec Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> --------- Signed-off-by: zazulam <[email protected]> Signed-off-by: droctothorpe <[email protected]> Co-authored-by: droctothorpe <[email protected]> Co-authored-by: andreafehrman <[email protected]> Co-authored-by: MonicaZhang1 <[email protected]> Co-authored-by: kylekaminky <[email protected]> Co-authored-by: CarterFendley <[email protected]>
…etters (#11097) * temp title: change title Signed-off-by: KevinGrantLee <[email protected]> * add release notes Signed-off-by: KevinGrantLee <[email protected]> * formatting Signed-off-by: KevinGrantLee <[email protected]> * feat(backend): move comp logic to workflow params (#10979) * feat(backend): move comp logic to workflow params Signed-off-by: zazulam <[email protected]> Co-authored-by: droctothorpe <[email protected]> Co-authored-by: andreafehrman <[email protected]> Co-authored-by: MonicaZhang1 <[email protected]> Co-authored-by: kylekaminky <[email protected]> Co-authored-by: CarterFendley <[email protected]> Signed-off-by: zazulam <[email protected]> * address pr comments Signed-off-by: zazulam <[email protected]> * Use function name instead of base name and address edge cases Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * Improve logic and update tests Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * POC hashing command and args Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * Add comments to clarify the logic Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> * Hash entire PipelineContainerSpec Signed-off-by: droctothorpe <[email protected]> Co-authored-by: zazulam <[email protected]> --------- Signed-off-by: zazulam <[email protected]> Signed-off-by: droctothorpe <[email protected]> Co-authored-by: droctothorpe <[email protected]> Co-authored-by: andreafehrman <[email protected]> Co-authored-by: MonicaZhang1 <[email protected]> Co-authored-by: kylekaminky <[email protected]> Co-authored-by: CarterFendley <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * feat(component): internal Signed-off-by: Googler <[email protected]> PiperOrigin-RevId: 660985413 Signed-off-by: KevinGrantLee <[email protected]> * feat(components): internal Signed-off-by: Googler <[email protected]> PiperOrigin-RevId: 661332120 Signed-off-by: KevinGrantLee <[email protected]> * fix(components): Fix to model batch explanation component for Structured Data pipelines Signed-off-by: Googler <[email protected]> PiperOrigin-RevId: 661475667 Signed-off-by: KevinGrantLee <[email protected]> * feat(components): Support dynamic values for boot_disk_type, boot_disk_size in preview.custom_job.utils.create_custom_training_job_from_component Signed-off-by: Googler <[email protected]> PiperOrigin-RevId: 662242688 Signed-off-by: KevinGrantLee <[email protected]> * chore: Upgrade Argo to v3.4.17 (#10978) Signed-off-by: Giulio Frasca <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * test: Moved kubeflow-pipelines-manifests to GitHub Actions (#11066) Signed-off-by: vmudadla <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * fix: re-enable exit hanler test. (#11100) Signed-off-by: Liav Weiss (EXT-Nokia) <[email protected]> Co-authored-by: Liav Weiss (EXT-Nokia) <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * fix(frontend): retrieve archived logs from correct location (#11010) * fix(frontend): retrieve archived logs from correct location Signed-off-by: droctothorpe <[email protected]> Co-authored-by: andreafehrman <[email protected]> Co-authored-by: owmasch <[email protected]> * Add namespace tag handling and validation Signed-off-by: droctothorpe <[email protected]> Co-authored-by: andreafehrman <[email protected]> Co-authored-by: owmasch <[email protected]> * Remove whitespace from keyFormat Signed-off-by: droctothorpe <[email protected]> Co-authored-by: andreafehrman <[email protected]> Co-authored-by: owmasch <[email protected]> * Update frontend unit tests Signed-off-by: droctothorpe <[email protected]> * Remove superfluous log statements Signed-off-by: droctothorpe <[email protected]> Co-authored-by: quinnovator <[email protected]> * Add link to keyFormat in manifests Signed-off-by: droctothorpe <[email protected]> * Fix workflow parsing for log artifact Signed-off-by: droctothorpe <[email protected]> Co-authored-by: quinnovator <[email protected]> * Fix unit test Signed-off-by: droctothorpe <[email protected]> --------- Signed-off-by: droctothorpe <[email protected]> Co-authored-by: andreafehrman <[email protected]> Co-authored-by: owmasch <[email protected]> Co-authored-by: quinnovator <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * feat(component): internal Signed-off-by: Googler <[email protected]> PiperOrigin-RevId: 663774557 Signed-off-by: KevinGrantLee <[email protected]> * feat(component): internal Signed-off-by: Googler <[email protected]> PiperOrigin-RevId: 663872006 Signed-off-by: KevinGrantLee <[email protected]> * chore(components): GCPC 2.16.1 Release Signed-off-by: Googler <[email protected]> PiperOrigin-RevId: 663883139 Signed-off-by: KevinGrantLee <[email protected]> * test: Fail fast when image build fails on tests #11102 (#11115) * Fail fast when image build fails on tests #11102 Signed-off-by: Elay Aharoni (EXT-Nokia) <[email protected]> * Fail fast when image build fails on tests #11102 Signed-off-by: Elay Aharoni (EXT-Nokia) <[email protected]> --------- Signed-off-by: Elay Aharoni (EXT-Nokia) <[email protected]> Co-authored-by: Elay Aharoni (EXT-Nokia) <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * fix(components): Use instance.target_field_name format for text-bison models only, use target_field_name for gemini models Signed-off-by: Googler <[email protected]> PiperOrigin-RevId: 665638487 Signed-off-by: KevinGrantLee <[email protected]> * chore: Renamed GitHub workflows from *.yaml to *.yml for consistency (#11126) Signed-off-by: hbelmiro <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * Fix view edit cluster roles (#11067) * Fixing incorrect typing in loop_parallism example Signed-off-by: Oswaldo Gomez <[email protected]> * Fixing samples/core/loop_parameter example Signed-off-by: Oswaldo Gomez <[email protected]> * Fixing aggregate-to-kubeflow-pipelines-edit Signed-off-by: Oswaldo Gomez <[email protected]> * keeping MRs separate. Signed-off-by: Oswaldo Gomez <[email protected]> * Adding blank line Signed-off-by: Oswaldo Gomez <[email protected]> --------- Signed-off-by: Oswaldo Gomez <[email protected]> Co-authored-by: Oswaldo Gomez <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * fix(components): Pass moddel name to eval_runner to process batch prediction's output as per the output schema of model used Signed-off-by: Googler <[email protected]> PiperOrigin-RevId: 665977093 Signed-off-by: KevinGrantLee <[email protected]> * feat(components): release LLM Model Evaluation image version v0.7 Signed-off-by: Jason Dai <[email protected]> PiperOrigin-RevId: 666102687 Signed-off-by: KevinGrantLee <[email protected]> * chore: Adding @DharmitD to SDK reviewers (#11131) Signed-off-by: ddalvi <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * test: Kubeflow Pipelines V2 integration Tests (#11125) Signed-off-by: Diego Lovison <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore: Add make targets for building driver and launcher images (#11103) Signed-off-by: Giulio Frasca <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * feat(Backend + SDK): Update kfp backend and kubernetes sdk to support EmptyDir (#10913) Update kfp backend and kubernetes sdk to support mounting EmptyDir volumes to task pods. Inspired by #10427 Fixes: #10656 Signed-off-by: Greg Sheremeta <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * docs:fixing broken links in readme (#11108) Signed-off-by: Fiona Waters <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore(deps): bump micromatch from 4.0.5 to 4.0.8 in /test/frontend-integration-test (#11132) Bumps [micromatch](https://github.com/micromatch/micromatch) from 4.0.5 to 4.0.8. - [Release notes](https://github.com/micromatch/micromatch/releases) - [Changelog](https://github.com/micromatch/micromatch/blob/4.0.8/CHANGELOG.md) - [Commits](micromatch/micromatch@4.0.5...4.0.8) --- updated-dependencies: - dependency-name: micromatch dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: KevinGrantLee <[email protected]> * Fix: Basic sample tests - sequential is flaky (#11138) Signed-off-by: Diego Lovison <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore: Wrapped "Failed GetContextByTypeAndName" error for better troubleshooting (#11098) Signed-off-by: hbelmiro <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore(components): Update AutoSxS and RLHF image tags Signed-off-by: Michael Hu <[email protected]> PiperOrigin-RevId: 668536503 Signed-off-by: KevinGrantLee <[email protected]> * test: Improvements to wait_for_pods function (#11162) Signed-off-by: hbelmiro <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * fix(frontend): fixes filter pipeline text box shows error when typing anything in it. Fixes #10241 (#11096) * Filter pipeline text box shows error when typing anything in it #10241 Signed-off-by: Elay Aharoni (EXT-Nokia) <[email protected]> * Filter pipeline text box shows error when typing anything in it #10241 Signed-off-by: Elay Aharoni (EXT-Nokia) <[email protected]> --------- Signed-off-by: Elay Aharoni (EXT-Nokia) <[email protected]> Co-authored-by: Elay Aharoni (EXT-Nokia) <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * correct artifact preview behavior in UI (#11059) This change allows KFP UI to fallback to UI host namespace when no namespaces are provided when referencing the artifact object store provider secret, in default kubeflow deployments this namespace is simply "kubeflow", however the user can customize this behavior by providing the environment variable "SERVER_NAMESPACE" to the KFP UI deployment. Further more, this change addresses a bug that caused URL parse to fail when parsing endpoints without a protocol, this will support such endpoint types as <ip>:<port> for object store endpoints, as is the case in the default kfp deployment manifests. Signed-off-by: Humair Khan <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore: Added DCO link to PR template (#11176) Signed-off-by: Helber Belmiro <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore(backend): Update driver and launcher licenses (#11177) Signed-off-by: Chen Sun <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore(backend): update driver and launcher default images (#11178) Signed-off-by: Chen Sun <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore: Add instructions for releasing driver and launcher images (#11179) Signed-off-by: Chen Sun <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * test: Fixed `kfp-runtime-tests` to run on master branch (#11158) Signed-off-by: hbelmiro <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * (fix): reduce executor logs (#11169) * remove driver logs from executor These logs congest the executor runtime logs making it difficult for the user to differentiate between logs. The driver logs are unnecessary here and can be removed to reduce this clutter. Signed-off-by: Humair Khan <[email protected]> * remove duplicate emissary call in executor As per the initial inline dev comment, argo podspecpatch did not add the emissary call, and had to be manualy added. This was fixed a couple of argo versions back. However, as a result executor pod makes double calls to the executor, which as a consequence also results in superflous logs. This change removes the additional call to emissary to resolve this. Signed-off-by: Humair Khan <[email protected]> --------- Signed-off-by: Humair Khan <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore: add PaulinaPacyna and ouadakarim as reviewers (#11180) Signed-off-by: Chen Sun <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * test: Move run-all-gcpc-modules to GitHub Actions (#11157) * add gcpc modules tests to gha Signed-off-by: Amanpreet Singh Bedi <[email protected]> * remove run-all-gcpc-modules test driver script Signed-off-by: Amanpreet Singh Bedi <[email protected]> * fix path under gcpc modules tests github action Signed-off-by: Amanpreet Singh Bedi <[email protected]> * upgrade ubuntu base image Signed-off-by: Amanpreet Singh Bedi <[email protected]> * upgrade python version to 3.9 Signed-off-by: Amanpreet Singh Bedi <[email protected]> --------- Signed-off-by: Amanpreet Singh Bedi <[email protected]> Signed-off-by: Amanpreet Singh Bedi <[email protected]> Co-authored-by: Amanpreet Singh Bedi <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * fix(sdk): Kfp support for pip trusted host (#11151) Signed-off-by: Diego Lovison <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore(sdk): Loosening kubernetes dependency constraint (#11079) * Loosening kubernetes dependency constraint Signed-off-by: egeucak <[email protected]> * added setuptools in test script Signed-off-by: egeucak <[email protected]> --------- Signed-off-by: egeucak <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore: Remove unwanted Frontend test files (#10973) Signed-off-by: ddalvi <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * fix(ui): fixes empty string value in pipeline parameters (#11175) Signed-off-by: Jan Staněk <[email protected]> Co-authored-by: Jan Staněk <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore(backend): update driver and launcher default images (#11182) Signed-off-by: Chen Sun <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore(release): bumped version to 2.3.0 Signed-off-by: KevinGrantLee <[email protected]> * chore: Update RELEASE.md to remove obsolete instructions (#11183) Signed-off-by: Chen Sun <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore: Release kfp-pipeline-spec 0.4.0 (#11189) Signed-off-by: Chen Sun <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore: release kfp-kubernetes 1.3.0 (#11190) Signed-off-by: Chen Sun <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore: update kfp-kubernetes release scripts and instructions (#11191) Signed-off-by: Chen Sun <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * feat(sdk)!: Pin kfp-pipeline-spec==0.4.0, kfp-server-api>=2.1.0,<2.4.0 (#11192) Signed-off-by: Chen Sun <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * chore(sdk): release KFP SDK 2.9.0 (#11193) Signed-off-by: Chen Sun <[email protected]> Signed-off-by: KevinGrantLee <[email protected]> * Delete test pipelines as they are duplicate with pipeline_with_resource_spec Signed-off-by: KevinGrantLee <[email protected]> --------- Signed-off-by: KevinGrantLee <[email protected]> Signed-off-by: zazulam <[email protected]> Signed-off-by: droctothorpe <[email protected]> Signed-off-by: Googler <[email protected]> Signed-off-by: Giulio Frasca <[email protected]> Signed-off-by: vmudadla <[email protected]> Signed-off-by: Liav Weiss (EXT-Nokia) <[email protected]> Signed-off-by: Elay Aharoni (EXT-Nokia) <[email protected]> Signed-off-by: hbelmiro <[email protected]> Signed-off-by: Oswaldo Gomez <[email protected]> Signed-off-by: Jason Dai <[email protected]> Signed-off-by: ddalvi <[email protected]> Signed-off-by: Diego Lovison <[email protected]> Signed-off-by: Greg Sheremeta <[email protected]> Signed-off-by: Fiona Waters <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Michael Hu <[email protected]> Signed-off-by: Humair Khan <[email protected]> Signed-off-by: Helber Belmiro <[email protected]> Signed-off-by: Chen Sun <[email protected]> Signed-off-by: Amanpreet Singh Bedi <[email protected]> Signed-off-by: Amanpreet Singh Bedi <[email protected]> Signed-off-by: egeucak <[email protected]> Signed-off-by: Jan Staněk <[email protected]> Co-authored-by: Michael <[email protected]> Co-authored-by: droctothorpe <[email protected]> Co-authored-by: andreafehrman <[email protected]> Co-authored-by: MonicaZhang1 <[email protected]> Co-authored-by: kylekaminky <[email protected]> Co-authored-by: CarterFendley <[email protected]> Co-authored-by: Googler <[email protected]> Co-authored-by: Giulio Frasca <[email protected]> Co-authored-by: Vani Haripriya Mudadla <[email protected]> Co-authored-by: Liav Weiss <[email protected]> Co-authored-by: Liav Weiss (EXT-Nokia) <[email protected]> Co-authored-by: owmasch <[email protected]> Co-authored-by: quinnovator <[email protected]> Co-authored-by: ElayAharoni <[email protected]> Co-authored-by: Elay Aharoni (EXT-Nokia) <[email protected]> Co-authored-by: Helber Belmiro <[email protected]> Co-authored-by: Oswaldo Gomez <[email protected]> Co-authored-by: Oswaldo Gomez <[email protected]> Co-authored-by: Jason Dai <[email protected]> Co-authored-by: Dharmit Dalvi <[email protected]> Co-authored-by: Diego Lovison <[email protected]> Co-authored-by: Greg Sheremeta <[email protected]> Co-authored-by: Fiona Waters <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Michael Hu <[email protected]> Co-authored-by: Humair Khan <[email protected]> Co-authored-by: Chen Sun <[email protected]> Co-authored-by: aman23bedi <[email protected]> Co-authored-by: Amanpreet Singh Bedi <[email protected]> Co-authored-by: ege uçak <[email protected]> Co-authored-by: Jan Staněk <[email protected]> Co-authored-by: Jan Staněk <[email protected]>
Description of your changes:
This PR relocates component logic from Workflow Annotations to Workflow Arguments.
In addition, it deduplicates the component logic, dramatically reducing the size of the compiled Workflow (in our preliminary testing we saw as much as a 70% reduction in Workflow size).
They’re still referenced by template tag in the actual Argo Workflow Templates, the tags are just pointing to Workflow Arguments instead of Workflow Annotations now.
The testdata manifests diff neatly illustrates how the compiled Workflow manifest has been improved.
This resolves #10968.
Before this PR, the Argo Workflow compiler was storing component logic as Workflow Annotations. Since Kubernetes Annotations cannot exceed 256 KB in size (cumulatively for a given object), this limited the maximum size of any given KFP DAG. Furthermore, when a pipeline referenced a single component multiple times, that component’s logic was duplicated in the Workflow Annotations each time it was referenced, unnecessarily inflating the workflow size.
Checklist: