
Could not run TFX Pipeline: DNS resolution failed for metadata-grpc-service:8080 #394

Open
Daard opened this issue Dec 1, 2023 · 0 comments
Labels
bug Something isn't working

Daard commented Dec 1, 2023

Bug Description

I have created a TFX pipeline. It works in local mode, but when I tried to run it on Kubeflow, I got this error in the UI:

Error

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 510, in <module>
    main(sys.argv[1:])
  File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 502, in main
    execution_info = component_launcher.launch()
  File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/launcher.py", line 555, in launch
    execution_preparation_result = self._prepare_execution()
  File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/launcher.py", line 250, in _prepare_execution
    contexts = context_lib.prepare_contexts(
  File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/mlmd/context_lib.py", line 186, in prepare_contexts
    pipeline_context = _register_context_if_not_exist(
  File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/mlmd/context_lib.py", line 91, in _register_context_if_not_exist
    context = metadata_handler.store.get_context_by_type_and_name(
  File "/usr/local/lib/python3.8/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 1462, in get_context_by_type_and_name
    self._call('GetContextByTypeAndName', request, response)
  File "/usr/local/lib/python3.8/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 203, in _call
    return self._call_method(method_name, request, response)
  File "/usr/local/lib/python3.8/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 233, in _call_method
    raise errors.make_exception(e.details(), e.code().value[0]) from e  # pytype: disable=attribute-error
ml_metadata.errors.UnavailableError: DNS resolution failed for metadata-grpc-service:8080: C-ares status is not ARES_SUCCESS qtype=A name=metadata-grpc-service is_balancer=0: Domain name not found
time="2023-11-30T18:11:20.197Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2023-11-30T18:11:20.197Z" level=error msg="cannot save artifact /mlpipeline-ui-metadata.json" argo=true error="stat /mlpipeline-ui-metadata.json: no such file or directory"
Error: exit status 1

To Reproduce

I took the runner configuration from this and this guide.

import os
from collections import namedtuple

from dotenv import load_dotenv
load_dotenv()

from tf_config import int_columns, float_columns, vector_columns
from grouper.pipelines.tr_pipeline import create_pipeline
from grouper.pipelines.transform import preprocessing_fn

from tfx.orchestration.kubeflow import kubeflow_dag_runner
from kfp.onprem import mount_pvc
PIPELINE_IMAGE = 'daard/resell-component:last'

PIPELINE_NAME = "transform"

ROOT = './pipeline'
DATA_ROOT = os.path.join(ROOT, 'data/tf_data/')
TFX_ROOT = os.path.join(ROOT, 'tfx')
PIPELINE_ROOT = os.path.join(TFX_ROOT, 'pipelines', PIPELINE_NAME)
## only for local check
METADATA_PATH = os.path.join(TFX_ROOT, 'metadata', PIPELINE_NAME, 'metadata.db')

TRANSFORM_FILE = os.path.join('./', 'grouper/pipelines/transform.py') 

PVC = namedtuple("PVC", "pvc_name volume_name mount_path")
TFX_PVC = PVC('resell-grouper-tfx',
              'pvc-1500bb5f-2163-4862-89c6-d8dc5b3f095a',
              './pipeline/tfx')

DATA_PVC = PVC('resell-grouper-data', 
               'pvc-b0fea06b-dfba-4e7d-98b3-7d4894132592',
               './pipeline/data')

train_pattern = 'train_data-*.tfrec'
val_pattern = 'val_data-*.tfrec'

pipe = create_pipeline(
    pipeline_name=PIPELINE_NAME,
    pipeline_root=PIPELINE_ROOT,
    data_root=DATA_ROOT,
    transform_module_file=TRANSFORM_FILE,
    metadata_path=None,
    train_pattern=train_pattern,
    val_pattern=val_pattern)

tfx_image = PIPELINE_IMAGE

metadata_config = kubeflow_dag_runner.get_default_kubeflow_metadata_config()
## The first guide adds MySQL settings. With a previous Kubeflow version I hit
## errors because MLMD was absent, but the current one (Charmed Kubeflow 1.8)
## already ships MLMD, so I decided to test it.
metadata_config.mysql_db_service_host.value = 'mysql.kubeflow'
metadata_config.mysql_db_service_port.value = "3306"
metadata_config.mysql_db_name.value = "metadb"
metadata_config.mysql_db_user.value = "root"
metadata_config.mysql_db_password.value = ""
## The second guide sets only the gRPC fields, no MySQL.
metadata_config.grpc_config.grpc_service_host.value = 'metadata-grpc-service'
metadata_config.grpc_config.grpc_service_port.value = '8080'
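For context on the failing lookup: in Kubernetes, a bare Service name such as `metadata-grpc-service` only resolves from pods in the same namespace as the Service. The workflow pods here run in the `admin` namespace while MLMD lives in `kubeflow`, so a namespace-qualified name is generally required. A minimal, pure-Python sketch (not part of the TFX API) of the candidate DNS names:

```python
def service_dns_names(service: str, namespace: str) -> list:
    """Candidate cluster-internal DNS names for a Kubernetes Service,
    least to most qualified. The bare name resolves only for pods
    running in the same namespace as the Service."""
    return [
        service,                                      # same-namespace only
        f"{service}.{namespace}",                     # cross-namespace
        f"{service}.{namespace}.svc",                 # with 'svc' label
        f"{service}.{namespace}.svc.cluster.local",   # fully qualified
    ]

# e.g. for the MLMD gRPC service deployed in the 'kubeflow' namespace:
names = service_dns_names("metadata-grpc-service", "kubeflow")
```

Whether the `grpc_service_host` value should carry the namespace suffix depends on where the executor pods run; the notebook snippet further below uses the qualified form and connects successfully.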

runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    tfx_image=tfx_image,
    pipeline_operator_funcs=[
        mount_pvc(*TFX_PVC),
        mount_pvc(*DATA_PVC),
    ],
    kubeflow_metadata_config=metadata_config,
)
                                                                
kubeflow_dag_runner.KubeflowDagRunner(config=runner_config).run(pipe)

# I can also connect to the metadata store from the notebook. It is empty,
# but the connection itself works:

from grpc import insecure_channel
from ml_metadata.proto import metadata_store_service_pb2
from ml_metadata.proto import metadata_store_service_pb2_grpc

# Note the namespace-qualified host: the notebook runs outside the
# 'kubeflow' namespace, so the bare service name would not resolve here.
channel = insecure_channel('metadata-grpc-service.kubeflow:8080')
stub = metadata_store_service_pb2_grpc.MetadataStoreServiceStub(channel)

request = metadata_store_service_pb2.GetArtifactsRequest()
response = stub.GetArtifacts(request)

# Pick the most recent StatisticsGen artifact of the 'iris' pipeline.
last_id = 0
uri = ""
for artifact in response.artifacts:
    props = artifact.custom_properties
    if (props["pipeline_name"].string_value == "iris"
            and props["producer_component"].string_value == "StatisticsGen"):
        if artifact.id > last_id:
            last_id = artifact.id
            uri = artifact.uri
uri
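The selection loop above can also be expressed with `max`. A minimal sketch over stand-in records (the real objects come from the MLMD stub; `FakeArtifact` and `latest_uri` are illustrative names, not MLMD API):

```python
from dataclasses import dataclass

@dataclass
class FakeArtifact:
    id: int
    uri: str
    pipeline_name: str
    producer_component: str

def latest_uri(artifacts, pipeline_name, component):
    """URI of the highest-id artifact matching pipeline and producer,
    or "" when nothing matches."""
    matching = [a for a in artifacts
                if a.pipeline_name == pipeline_name
                and a.producer_component == component]
    if not matching:
        return ""
    return max(matching, key=lambda a: a.id).uri

arts = [
    FakeArtifact(1, "gs://x/stats/1", "iris", "StatisticsGen"),
    FakeArtifact(3, "gs://x/stats/3", "iris", "StatisticsGen"),
    FakeArtifact(5, "gs://x/examples/5", "iris", "ImportExampleGen"),
]
print(latest_uri(arts, "iris", "StatisticsGen"))  # → gs://x/stats/3
```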

Environment

I have deployed Charmed Kubeflow 1.8 via this guide

I have installed these packages in the notebook container via pip install tfx[kfp]:

tensorflow 2.13.1
tfx 1.14.0
kfp 1.8.22

I have tested toy KFP pipelines from both the v1 and v2 SDKs; both work.
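To double-check which versions actually ended up in the container, the installed distribution versions can be read with the standard-library `importlib.metadata` (the package names below are just the ones listed above):

```python
from importlib import metadata

def installed_version(dist_name: str):
    """Installed version string of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

for name in ("tensorflow", "tfx", "kfp"):
    print(name, installed_version(name))
```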

Workflows:

kubectl get workflow -A

NAMESPACE   NAME                             STATUS      AGE
admin       hello-pipeline-slpl5             Succeeded   21h
admin       execution-order-pipeline-xrnwf   Succeeded   18h
admin       transform-2-2rn9d                Failed      17h

Juju relations

juju_relations.txt

Relevant Log Output

**Logs from pod**

time="2023-11-30T18:11:09.219Z" level=info msg="Starting Workflow Executor" executorType=emissary version=v3.3.10
time="2023-11-30T18:11:09.223Z" level=info msg="Creating a emissary executor"
time="2023-11-30T18:11:09.223Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-11-30T18:11:09.223Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=admin podName=transform-2-2rn9d-932438961 template="{\"name\":\"importexamplegen\",\"inputs\":{\"parameters\":[{\"name\":\"pipeline-root\",\"value\":\"./pipeline/tfx/pipelines/transform\"}]},\"outputs\":{\"artifacts\":[{\"name\":\"mlpipeline-ui-metadata\",\"path\":\"/mlpipeline-ui-metadata.json\",\"optional\":true}]},\"metadata\":{\"annotations\":{\"sidecar.istio.io/inject\":\"false\"},\"labels\":{\"add-pod-env\":\"true\",\"pipelines.kubeflow.org/cache_enabled\":\"true\",\"pipelines.kubeflow.org/enable_caching\":\"true\",\"pipelines.kubeflow.org/kfp_sdk_version\":\"1.8.22\",\"pipelines.kubeflow.org/pipeline-sdk-type\":\"tfx\"}},\"container\":{\"name\":\"\",\"image\":\"daard/resell-component:last\",\"command\":[\"python\",\"-m\",\"tfx.orchestration.kubeflow.container_entrypoint\"],\"args\":[\"--pipeline_root\",\"./pipeline/tfx/pipelines/transform\",\"--kubeflow_metadata_config\",\"{\\n  \\\"mysql_db_service_host\\\": {\\n    \\\"value\\\": \\\"mysql.kubeflow\\\"\\n  },\\n  \\\"mysql_db_service_port\\\": {\\n    \\\"value\\\": \\\"3306\\\"\\n  },\\n  \\\"mysql_db_name\\\": {\\n    \\\"value\\\": \\\"metadb\\\"\\n  },\\n  \\\"mysql_db_user\\\": {\\n    \\\"value\\\": \\\"root\\\"\\n  },\\n  \\\"mysql_db_password\\\": {\\n    \\\"value\\\": \\\"\\\"\\n  },\\n  \\\"grpc_config\\\": {\\n    \\\"grpc_service_host\\\": {\\n      \\\"value\\\": \\\"metadata-grpc-service\\\"\\n    },\\n    \\\"grpc_service_port\\\": {\\n      \\\"value\\\": \\\"8080\\\"\\n    }\\n  }\\n}\",\"--node_id\",\"ImportExampleGen\",\"--tfx_ir\",\"{\\n  \\\"pipelineInfo\\\": {\\n    \\\"id\\\": \\\"transform\\\"\\n  },\\n  \\\"nodes\\\": [\\n    {\\n      \\\"pipelineNode\\\": {\\n        \\\"nodeInfo\\\": {\\n          \\\"type\\\": {\\n            \\\"name\\\": \\\"tfx.components.example_gen.import_example_gen.component.ImportExampleGen\\\"\\n          
},\\n          \\\"id\\\": \\\"ImportExampleGen\\\"\\n        },\\n        \\\"contexts\\\": {\\n          \\\"contexts\\\": [\\n            {\\n              \\\"type\\\": {\\n                \\\"name\\\": \\\"pipeline\\\"\\n              },\\n              \\\"name\\\": {\\n                \\\"fieldValue\\\": {\\n                  \\\"stringValue\\\": \\\"transform\\\"\\n                }\\n              }\\n            },\\n            {\\n              \\\"type\\\": {\\n                \\\"name\\\": \\\"pipeline_run\\\"\\n              },\\n              \\\"name\\\": {\\n                \\\"runtimeParameter\\\": {\\n                  \\\"name\\\": \\\"pipeline-run-id\\\",\\n                  \\\"type\\\": \\\"STRING\\\"\\n                }\\n              }\\n            },\\n            {\\n              \\\"type\\\": {\\n                \\\"name\\\": \\\"node\\\"\\n              },\\n              \\\"name\\\": {\\n                \\\"fieldValue\\\": {\\n                  \\\"stringValue\\\": \\\"transform.ImportExampleGen\\\"\\n                }\\n              }\\n            }\\n          ]\\n        },\\n        \\\"outputs\\\": {\\n          \\\"outputs\\\": {\\n            \\\"examples\\\": {\\n              \\\"artifactSpec\\\": {\\n                \\\"type\\\": {\\n                  \\\"name\\\": \\\"Examples\\\",\\n                  \\\"properties\\\": {\\n                    \\\"split_names\\\": \\\"STRING\\\",\\n                    \\\"span\\\": \\\"INT\\\",\\n                    \\\"version\\\": \\\"INT\\\"\\n                  },\\n                  \\\"baseType\\\": \\\"DATASET\\\"\\n                }\\n              }\\n            }\\n          }\\n        },\\n        \\\"parameters\\\": {\\n          \\\"parameters\\\": {\\n            \\\"input_config\\\": {\\n              \\\"fieldValue\\\": {\\n                \\\"stringValue\\\": \\\"{\\\\n  \\\\\\\"splits\\\\\\\": [\\\\n    {\\\\n      \\\\\\\"name\\\\\\\": \\\\\\\"train\\\\\\\",\\\\n  
    \\\\\\\"pattern\\\\\\\": \\\\\\\"train_data-*.tfrec\\\\\\\"\\\\n    },\\\\n    {\\\\n      \\\\\\\"name\\\\\\\": \\\\\\\"eval\\\\\\\",\\\\n      \\\\\\\"pattern\\\\\\\": \\\\\\\"val_data-*.tfrec\\\\\\\"\\\\n    }\\\\n  ]\\\\n}\\\"\\n              }\\n            },\\n            \\\"input_base\\\": {\\n              \\\"fieldValue\\\": {\\n                \\\"stringValue\\\": \\\"./pipeline/data/tf_data/\\\"\\n              }\\n            },\\n            \\\"output_data_format\\\": {\\n              \\\"fieldValue\\\": {\\n                \\\"intValue\\\": \\\"6\\\"\\n              }\\n            },\\n            \\\"output_config\\\": {\\n              \\\"fieldValue\\\": {\\n                \\\"stringValue\\\": \\\"{}\\\"\\n              }\\n            },\\n            \\\"output_file_format\\\": {\\n              \\\"fieldValue\\\": {\\n                \\\"intValue\\\": \\\"5\\\"\\n              }\\n            }\\n          }\\n        },\\n        \\\"downstreamNodes\\\": [\\n          \\\"StatisticsGen\\\",\\n          \\\"Transform\\\"\\n        ],\\n        \\\"executionOptions\\\": {\\n          \\\"cachingOptions\\\": {\\n            \\\"enableCache\\\": true\\n          }\\n        }\\n      }\\n    }\\n  ],\\n  \\\"runtimeSpec\\\": {\\n    \\\"pipelineRoot\\\": {\\n      \\\"runtimeParameter\\\": {\\n        \\\"name\\\": \\\"pipeline-root\\\",\\n        \\\"type\\\": \\\"STRING\\\",\\n        \\\"defaultValue\\\": {\\n          \\\"stringValue\\\": \\\"./pipeline/tfx/pipelines/transform\\\"\\n        }\\n      }\\n    },\\n    \\\"pipelineRunId\\\": {\\n      \\\"runtimeParameter\\\": {\\n        \\\"name\\\": \\\"pipeline-run-id\\\",\\n        \\\"type\\\": \\\"STRING\\\"\\n      }\\n    }\\n  },\\n  \\\"executionMode\\\": \\\"SYNC\\\",\\n  \\\"deploymentConfig\\\": {\\n    \\\"@type\\\": \\\"type.googleapis.com/tfx.orchestration.IntermediateDeploymentConfig\\\",\\n    \\\"executorSpecs\\\": {\\n      \\\"ImportExampleGen\\\": {\\n        
\\\"@type\\\": \\\"type.googleapis.com/tfx.orchestration.executable_spec.BeamExecutableSpec\\\",\\n        \\\"pythonExecutorSpec\\\": {\\n          \\\"classPath\\\": \\\"tfx.components.example_gen.import_example_gen.executor.Executor\\\"\\n        }\\n      }\\n    },\\n    \\\"customDriverSpecs\\\": {\\n      \\\"ImportExampleGen\\\": {\\n        \\\"@type\\\": \\\"type.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec\\\",\\n        \\\"classPath\\\": \\\"tfx.components.example_gen.driver.FileBasedDriver\\\"\\n      }\\n    }\\n  }\\n}\",\"--metadata_ui_path\",\"/mlpipeline-ui-metadata.json\",\"--runtime_parameter\",\"pipeline-root=STRING:./pipeline/tfx/pipelines/transform\"],\"env\":[{\"name\":\"WORKFLOW_ID\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.labels['workflows.argoproj.io/workflow']\"}}},{\"name\":\"KFP_POD_NAME\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.name\"}}},{\"name\":\"KFP_POD_UID\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.uid\"}}},{\"name\":\"KFP_NAMESPACE\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.namespace\"}}},{\"name\":\"WORKFLOW_ID\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.labels['workflows.argoproj.io/workflow']\"}}},{\"name\":\"KFP_RUN_ID\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.labels['pipeline/runid']\"}}},{\"name\":\"ENABLE_CACHING\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.labels['pipelines.kubeflow.org/enable_caching']\"}}}],\"resources\":{},\"volumeMounts\":[{\"name\":\"pvc-1500bb5f-2163-4862-89c6-d8dc5b3f095a\",\"mountPath\":\"./pipeline/tfx\"},{\"name\":\"pvc-b0fea06b-dfba-4e7d-98b3-7d4894132592\",\"mountPath\":\"./pipeline/data\"}]},\"volumes\":[{\"name\":\"pvc-1500bb5f-2163-4862-89c6-d8dc5b3f095a\",\"persistentVolumeClaim\":{\"claimName\":\"resell-grouper-tfx\"}},{\"name\":\"pvc-b0fea06b-dfba-4e7d-98b3-7d4894132592\",\"persistentVolumeClaim\":{\"claimName\":\"resell-grouper-data\"}}],\"archiveLocation\":{\"archiv
eLogs\":false,\"s3\":{\"endpoint\":\"minio.kubeflow:9000\",\"bucket\":\"mlpipeline\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"transform-2-2rn9d/transform-2-2rn9d-932438961\"}}}" version="&Version{Version:v3.3.10,BuildDate:2022-11-29T18:18:30Z,GitCommit:b19870d737a14b21d86f6267642a63dd14e5acd5,GitTag:v3.3.10,GitTreeState:clean,GoVersion:go1.17.13,Compiler:gc,Platform:linux/amd64,}"
time="2023-11-30T18:11:09.318Z" level=info msg="Start loading input artifacts..."
time="2023-11-30T18:11:09.319Z" level=info msg="Alloc=6556 TotalAlloc=12028 Sys=30162 NumGC=4 Goroutines=2"
time="2023-11-30T18:11:11.669Z" level=info msg="Starting Workflow Executor" executorType=emissary version=v3.3.10
time="2023-11-30T18:11:11.673Z" level=info msg="Creating a emissary executor"
time="2023-11-30T18:11:11.673Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-11-30T18:11:11.674Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=admin podName=transform-2-2rn9d-932438961 template="{\"name\":\"importexamplegen\",\"inputs\":{\"parameters\":[{\"name\":\"pipeline-root\",\"value\":\"./pipeline/tfx/pipelines/transform\"}]},\"outputs\":{\"artifacts\":[{\"name\":\"mlpipeline-ui-metadata\",\"path\":\"/mlpipeline-ui-metadata.json\",\"optional\":true}]},\"metadata\":{\"annotations\":{\"sidecar.istio.io/inject\":\"false\"},\"labels\":{\"add-pod-env\":\"true\",\"pipelines.kubeflow.org/cache_enabled\":\"true\",\"pipelines.kubeflow.org/enable_caching\":\"true\",\"pipelines.kubeflow.org/kfp_sdk_version\":\"1.8.22\",\"pipelines.kubeflow.org/pipeline-sdk-type\":\"tfx\"}},\"container\":{\"name\":\"\",\"image\":\"daard/resell-component:last\",\"command\":[\"python\",\"-m\",\"tfx.orchestration.kubeflow.container_entrypoint\"],\"args\":[\"--pipeline_root\",\"./pipeline/tfx/pipelines/transform\",\"--kubeflow_metadata_config\",\"{\\n  \\\"mysql_db_service_host\\\": {\\n    \\\"value\\\": \\\"mysql.kubeflow\\\"\\n  },\\n  \\\"mysql_db_service_port\\\": {\\n    \\\"value\\\": \\\"3306\\\"\\n  },\\n  \\\"mysql_db_name\\\": {\\n    \\\"value\\\": \\\"metadb\\\"\\n  },\\n  \\\"mysql_db_user\\\": {\\n    \\\"value\\\": \\\"root\\\"\\n  },\\n  \\\"mysql_db_password\\\": {\\n    \\\"value\\\": \\\"\\\"\\n  },\\n  \\\"grpc_config\\\": {\\n    \\\"grpc_service_host\\\": {\\n      \\\"value\\\": \\\"metadata-grpc-service\\\"\\n    },\\n    \\\"grpc_service_port\\\": {\\n      \\\"value\\\": \\\"8080\\\"\\n    }\\n  }\\n}\",\"--node_id\",\"ImportExampleGen\",\"--tfx_ir\",\"{\\n  \\\"pipelineInfo\\\": {\\n    \\\"id\\\": \\\"transform\\\"\\n  },\\n  \\\"nodes\\\": [\\n    {\\n      \\\"pipelineNode\\\": {\\n        \\\"nodeInfo\\\": {\\n          \\\"type\\\": {\\n            \\\"name\\\": \\\"tfx.components.example_gen.import_example_gen.component.ImportExampleGen\\\"\\n          
},\\n          \\\"id\\\": \\\"ImportExampleGen\\\"\\n        },\\n        \\\"contexts\\\": {\\n          \\\"contexts\\\": [\\n            {\\n              \\\"type\\\": {\\n                \\\"name\\\": \\\"pipeline\\\"\\n              },\\n              \\\"name\\\": {\\n                \\\"fieldValue\\\": {\\n                  \\\"stringValue\\\": \\\"transform\\\"\\n                }\\n              }\\n            },\\n            {\\n              \\\"type\\\": {\\n                \\\"name\\\": \\\"pipeline_run\\\"\\n              },\\n              \\\"name\\\": {\\n                \\\"runtimeParameter\\\": {\\n                  \\\"name\\\": \\\"pipeline-run-id\\\",\\n                  \\\"type\\\": \\\"STRING\\\"\\n                }\\n              }\\n            },\\n            {\\n              \\\"type\\\": {\\n                \\\"name\\\": \\\"node\\\"\\n              },\\n              \\\"name\\\": {\\n                \\\"fieldValue\\\": {\\n                  \\\"stringValue\\\": \\\"transform.ImportExampleGen\\\"\\n                }\\n              }\\n            }\\n          ]\\n        },\\n        \\\"outputs\\\": {\\n          \\\"outputs\\\": {\\n            \\\"examples\\\": {\\n              \\\"artifactSpec\\\": {\\n                \\\"type\\\": {\\n                  \\\"name\\\": \\\"Examples\\\",\\n                  \\\"properties\\\": {\\n                    \\\"split_names\\\": \\\"STRING\\\",\\n                    \\\"span\\\": \\\"INT\\\",\\n                    \\\"version\\\": \\\"INT\\\"\\n                  },\\n                  \\\"baseType\\\": \\\"DATASET\\\"\\n                }\\n              }\\n            }\\n          }\\n        },\\n        \\\"parameters\\\": {\\n          \\\"parameters\\\": {\\n            \\\"input_config\\\": {\\n              \\\"fieldValue\\\": {\\n                \\\"stringValue\\\": \\\"{\\\\n  \\\\\\\"splits\\\\\\\": [\\\\n    {\\\\n      \\\\\\\"name\\\\\\\": \\\\\\\"train\\\\\\\",\\\\n  
    \\\\\\\"pattern\\\\\\\": \\\\\\\"train_data-*.tfrec\\\\\\\"\\\\n    },\\\\n    {\\\\n      \\\\\\\"name\\\\\\\": \\\\\\\"eval\\\\\\\",\\\\n      \\\\\\\"pattern\\\\\\\": \\\\\\\"val_data-*.tfrec\\\\\\\"\\\\n    }\\\\n  ]\\\\n}\\\"\\n              }\\n            },\\n            \\\"input_base\\\": {\\n              \\\"fieldValue\\\": {\\n                \\\"stringValue\\\": \\\"./pipeline/data/tf_data/\\\"\\n              }\\n            },\\n            \\\"output_data_format\\\": {\\n              \\\"fieldValue\\\": {\\n                \\\"intValue\\\": \\\"6\\\"\\n              }\\n            },\\n            \\\"output_config\\\": {\\n              \\\"fieldValue\\\": {\\n                \\\"stringValue\\\": \\\"{}\\\"\\n              }\\n            },\\n            \\\"output_file_format\\\": {\\n              \\\"fieldValue\\\": {\\n                \\\"intValue\\\": \\\"5\\\"\\n              }\\n            }\\n          }\\n        },\\n        \\\"downstreamNodes\\\": [\\n          \\\"StatisticsGen\\\",\\n          \\\"Transform\\\"\\n        ],\\n        \\\"executionOptions\\\": {\\n          \\\"cachingOptions\\\": {\\n            \\\"enableCache\\\": true\\n          }\\n        }\\n      }\\n    }\\n  ],\\n  \\\"runtimeSpec\\\": {\\n    \\\"pipelineRoot\\\": {\\n      \\\"runtimeParameter\\\": {\\n        \\\"name\\\": \\\"pipeline-root\\\",\\n        \\\"type\\\": \\\"STRING\\\",\\n        \\\"defaultValue\\\": {\\n          \\\"stringValue\\\": \\\"./pipeline/tfx/pipelines/transform\\\"\\n        }\\n      }\\n    },\\n    \\\"pipelineRunId\\\": {\\n      \\\"runtimeParameter\\\": {\\n        \\\"name\\\": \\\"pipeline-run-id\\\",\\n        \\\"type\\\": \\\"STRING\\\"\\n      }\\n    }\\n  },\\n  \\\"executionMode\\\": \\\"SYNC\\\",\\n  \\\"deploymentConfig\\\": {\\n    \\\"@type\\\": \\\"type.googleapis.com/tfx.orchestration.IntermediateDeploymentConfig\\\",\\n    \\\"executorSpecs\\\": {\\n      \\\"ImportExampleGen\\\": {\\n        
\\\"@type\\\": \\\"type.googleapis.com/tfx.orchestration.executable_spec.BeamExecutableSpec\\\",\\n        \\\"pythonExecutorSpec\\\": {\\n          \\\"classPath\\\": \\\"tfx.components.example_gen.import_example_gen.executor.Executor\\\"\\n        }\\n      }\\n    },\\n    \\\"customDriverSpecs\\\": {\\n      \\\"ImportExampleGen\\\": {\\n        \\\"@type\\\": \\\"type.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec\\\",\\n        \\\"classPath\\\": \\\"tfx.components.example_gen.driver.FileBasedDriver\\\"\\n      }\\n    }\\n  }\\n}\",\"--metadata_ui_path\",\"/mlpipeline-ui-metadata.json\",\"--runtime_parameter\",\"pipeline-root=STRING:./pipeline/tfx/pipelines/transform\"],\"env\":[{\"name\":\"WORKFLOW_ID\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.labels['workflows.argoproj.io/workflow']\"}}},{\"name\":\"KFP_POD_NAME\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.name\"}}},{\"name\":\"KFP_POD_UID\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.uid\"}}},{\"name\":\"KFP_NAMESPACE\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.namespace\"}}},{\"name\":\"WORKFLOW_ID\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.labels['workflows.argoproj.io/workflow']\"}}},{\"name\":\"KFP_RUN_ID\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.labels['pipeline/runid']\"}}},{\"name\":\"ENABLE_CACHING\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.labels['pipelines.kubeflow.org/enable_caching']\"}}}],\"resources\":{},\"volumeMounts\":[{\"name\":\"pvc-1500bb5f-2163-4862-89c6-d8dc5b3f095a\",\"mountPath\":\"./pipeline/tfx\"},{\"name\":\"pvc-b0fea06b-dfba-4e7d-98b3-7d4894132592\",\"mountPath\":\"./pipeline/data\"}]},\"volumes\":[{\"name\":\"pvc-1500bb5f-2163-4862-89c6-d8dc5b3f095a\",\"persistentVolumeClaim\":{\"claimName\":\"resell-grouper-tfx\"}},{\"name\":\"pvc-b0fea06b-dfba-4e7d-98b3-7d4894132592\",\"persistentVolumeClaim\":{\"claimName\":\"resell-grouper-data\"}}],\"archiveLocation\":{\"archiv
eLogs\":false,\"s3\":{\"endpoint\":\"minio.kubeflow:9000\",\"bucket\":\"mlpipeline\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"transform-2-2rn9d/transform-2-2rn9d-932438961\"}}}" version="&Version{Version:v3.3.10,BuildDate:2022-11-29T18:18:30Z,GitCommit:b19870d737a14b21d86f6267642a63dd14e5acd5,GitTag:v3.3.10,GitTreeState:clean,GoVersion:go1.17.13,Compiler:gc,Platform:linux/amd64,}"
time="2023-11-30T18:11:11.675Z" level=info msg="Starting deadline monitor"
time="2023-11-30T18:11:20.678Z" level=info msg="Main container completed"
time="2023-11-30T18:11:20.678Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-11-30T18:11:20.678Z" level=info msg="No output parameters"
time="2023-11-30T18:11:20.678Z" level=info msg="Saving output artifacts"
time="2023-11-30T18:11:20.678Z" level=info msg="Staging artifact: mlpipeline-ui-metadata"
time="2023-11-30T18:11:20.678Z" level=info msg="Copying /mlpipeline-ui-metadata.json from container base image layer to /tmp/argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2023-11-30T18:11:20.678Z" level=info msg="/var/run/argo/outputs/artifacts/mlpipeline-ui-metadata.json.tgz -> /tmp/argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2023-11-30T18:11:20.678Z" level=warning msg="Ignoring optional artifact 'mlpipeline-ui-metadata' which does not exist in path '/mlpipeline-ui-metadata.json': open /var/run/argo/outputs/artifacts/mlpipeline-ui-metadata.json.tgz: no such file or directory"
time="2023-11-30T18:11:20.707Z" level=info msg="Create workflowtaskresults 201"
time="2023-11-30T18:11:20.708Z" level=info msg="Killing sidecars []"
time="2023-11-30T18:11:20.709Z" level=info msg="Alloc=6501 TotalAlloc=12338 Sys=30162 NumGC=4 Goroutines=7"
WARNING:absl:metadata_connection_config is not provided by IR.
INFO:root:Component ImportExampleGen is running.
INFO:absl:Running launcher for node_info {
  type {
    name: "tfx.components.example_gen.import_example_gen.component.ImportExampleGen"
  }
  id: "ImportExampleGen"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "transform"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "transform-2-2rn9d"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "transform.ImportExampleGen"
      }
    }
  }
}
outputs {
  outputs {
    key: "examples"
    value {
      artifact_spec {
        type {
          name: "Examples"
          properties {
            key: "span"
            value: INT
          }
          properties {
            key: "split_names"
            value: STRING
          }
          properties {
            key: "version"
            value: INT
          }
          base_type: DATASET
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "input_base"
    value {
      field_value {
        string_value: "./pipeline/data/tf_data/"
      }
    }
  }
  parameters {
    key: "input_config"
    value {
      field_value {
        string_value: "{\n  \"splits\": [\n    {\n      \"name\": \"train\",\n      \"pattern\": \"train_data-*.tfrec\"\n    },\n    {\n      \"name\": \"eval\",\n      \"pattern\": \"val_data-*.tfrec\"\n    }\n  ]\n}"
      }
    }
  }
  parameters {
    key: "output_config"
    value {
      field_value {
        string_value: "{}"
      }
    }
  }
  parameters {
    key: "output_data_format"
    value {
      field_value {
        int_value: 6
      }
    }
  }
  parameters {
    key: "output_file_format"
    value {
      field_value {
        int_value: 5
      }
    }
  }
}
downstream_nodes: "StatisticsGen"
downstream_nodes: "Transform"
execution_options {
  caching_options {
    enable_cache: true
  }
}

INFO:absl:MetadataStore with gRPC connection initialized
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 228, in _call_method
    response.CopyFrom(grpc_method(request, timeout=self._grpc_timeout_sec))
  File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 1161, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 1004, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "DNS resolution failed for metadata-grpc-service:8080: C-ares status is not ARES_SUCCESS qtype=A name=metadata-grpc-service is_balancer=0: Domain name not found"
	debug_error_string = "UNKNOWN:DNS resolution failed for metadata-grpc-service:8080: C-ares status is not ARES_SUCCESS qtype=A name=metadata-grpc-service is_balancer=0: Domain name not found {grpc_status:14, created_time:"2023-11-30T18:11:18.928547175+00:00"}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 510, in <module>
    main(sys.argv[1:])
  File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 502, in main
    execution_info = component_launcher.launch()
  File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/launcher.py", line 555, in launch
    execution_preparation_result = self._prepare_execution()
  File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/launcher.py", line 250, in _prepare_execution
    contexts = context_lib.prepare_contexts(
  File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/mlmd/context_lib.py", line 186, in prepare_contexts
    pipeline_context = _register_context_if_not_exist(
  File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/mlmd/context_lib.py", line 91, in _register_context_if_not_exist
    context = metadata_handler.store.get_context_by_type_and_name(
  File "/usr/local/lib/python3.8/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 1462, in get_context_by_type_and_name
    self._call('GetContextByTypeAndName', request, response)
  File "/usr/local/lib/python3.8/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 203, in _call
    return self._call_method(method_name, request, response)
  File "/usr/local/lib/python3.8/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 233, in _call_method
    raise errors.make_exception(e.details(), e.code().value[0]) from e  # pytype: disable=attribute-error
ml_metadata.errors.UnavailableError: DNS resolution failed for metadata-grpc-service:8080: C-ares status is not ARES_SUCCESS qtype=A name=metadata-grpc-service is_balancer=0: Domain name not found
time="2023-11-30T18:11:20.197Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2023-11-30T18:11:20.197Z" level=error msg="cannot save artifact /mlpipeline-ui-metadata.json" argo=true error="stat /mlpipeline-ui-metadata.json: no such file or directory"
Error: exit status 1

Additional Context

No response
