I have created a TFX pipeline. It works in local mode, but when I tried to run it in Kubeflow, I got this error in the UI:
Error
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 510, in <module>
main(sys.argv[1:])
File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 502, in main
execution_info = component_launcher.launch()
File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/launcher.py", line 555, in launch
execution_preparation_result = self._prepare_execution()
File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/launcher.py", line 250, in _prepare_execution
contexts = context_lib.prepare_contexts(
File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/mlmd/context_lib.py", line 186, in prepare_contexts
pipeline_context = _register_context_if_not_exist(
File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/mlmd/context_lib.py", line 91, in _register_context_if_not_exist
context = metadata_handler.store.get_context_by_type_and_name(
File "/usr/local/lib/python3.8/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 1462, in get_context_by_type_and_name
self._call('GetContextByTypeAndName', request, response)
File "/usr/local/lib/python3.8/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 203, in _call
return self._call_method(method_name, request, response)
File "/usr/local/lib/python3.8/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 233, in _call_method
raise errors.make_exception(e.details(), e.code().value[0]) from e # pytype: disable=attribute-error
ml_metadata.errors.UnavailableError: DNS resolution failed for metadata-grpc-service:8080: C-ares status is not ARES_SUCCESS qtype=A name=metadata-grpc-service is_balancer=0: Domain name not found
time="2023-11-30T18:11:20.197Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2023-11-30T18:11:20.197Z" level=error msg="cannot save artifact /mlpipeline-ui-metadata.json" argo=true error="stat /mlpipeline-ui-metadata.json: no such file or directory"
Error: exit status 1
To Reproduce
I took the runner configuration from this and this guide.
import os
from dotenv import load_dotenv
load_dotenv()
import tfx
from tf_config import int_columns, float_columns, vector_columns
from grouper.pipelines.tr_pipeline import create_pipeline
from grouper.pipelines.transform import preprocessing_fn
from tfx.v1.orchestration.kubeflow import kubeflow_dag_runner
from tfx.orchestration.kubeflow.runner import kubeflow_dag_runner
from kfp.onprem import mount_pvc
from collections import namedtuple
PIPELINE_IMAGE = 'daard/resell-component:last'
PIPELINE_NAME = "transform"
ROOT = './pipeline'
DATA_ROOT = os.path.join(ROOT, 'data/tf_data/')
TFX_ROOT = os.path.join(ROOT, 'tfx')
PIPELINE_ROOT = os.path.join(TFX_ROOT, 'pipelines', PIPELINE_NAME)
## only for local check
METADATA_PATH = os.path.join(TFX_ROOT, 'metadata', PIPELINE_NAME, 'metadata.db')
TRANSFORM_FILE = os.path.join('./', 'grouper/pipelines/transform.py')
PVC = namedtuple("PVC", "pvc_name volume_name mount_path")
TFX_PVC = PVC('resell-grouper-tfx',
              'pvc-1500bb5f-2163-4862-89c6-d8dc5b3f095a',
              './pipeline/tfx')
DATA_PVC = PVC('resell-grouper-data',
               'pvc-b0fea06b-dfba-4e7d-98b3-7d4894132592',
               './pipeline/data')
train_pattern = 'train_data-*.tfrec'
val_pattern = 'val_data-*.tfrec'
pipe = create_pipeline(
    pipeline_name=PIPELINE_NAME,
    pipeline_root=PIPELINE_ROOT,
    data_root=DATA_ROOT,
    transform_module_file=TRANSFORM_FILE,
    metadata_path=None,
    train_pattern=train_pattern,
    val_pattern=val_pattern)
tfx_image = PIPELINE_IMAGE
metadata_config = kubeflow_dag_runner.get_default_kubeflow_metadata_config()
## The first guide adds MySQL configs. I faced some errors due to the absence of MLMD in the previous version of Kubeflow.
## But the current one (ch:kubeflow 1.8) already has MLMD, so I decided to test it.
metadata_config.mysql_db_service_host.value = 'mysql.kubeflow'
metadata_config.mysql_db_service_port.value = "3306"
metadata_config.mysql_db_name.value = "metadb"
metadata_config.mysql_db_user.value = "root"
metadata_config.mysql_db_password.value = ""
## The second guide does not; it sets only the gRPC config.
metadata_config.grpc_config.grpc_service_host.value = 'metadata-grpc-service'
metadata_config.grpc_config.grpc_service_port.value = '8080'
runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    tfx_image=tfx_image,
    pipeline_operator_funcs=[
        mount_pvc(*TFX_PVC),
        mount_pvc(*DATA_PVC),
    ],
    kubeflow_metadata_config=metadata_config,
)
kubeflow_dag_runner.KubeflowDagRunner(config=runner_config).run(pipe)
# I can also connect to the metadata store from the notebook. It is empty, but nevertheless:
from grpc import insecure_channel
from ml_metadata.proto import metadata_store_service_pb2
from ml_metadata.proto import metadata_store_service_pb2_grpc
import tensorflow_data_validation as tfdv
channel = insecure_channel('metadata-grpc-service.kubeflow:8080')
stub = metadata_store_service_pb2_grpc.MetadataStoreServiceStub(channel)
request = metadata_store_service_pb2.GetArtifactsRequest()
response = stub.GetArtifacts(request)
last_id = 0
uri = ""
for artifact in response.artifacts:
    if artifact.custom_properties["pipeline_name"].string_value == "iris" and artifact.custom_properties["producer_component"].string_value == "StatisticsGen":
        if artifact.id > last_id:
            last_id = artifact.id
            uri = artifact.uri
uri
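One thing that stands out: the traceback fails on the short name `metadata-grpc-service`, while the check above uses `metadata-grpc-service.kubeflow`. A minimal sketch for comparing how each spelling resolves from inside a pipeline pod might look like the following. The helper names `candidate_hosts` and `can_resolve` are hypothetical, and the `kubeflow` namespace and the `svc.cluster.local` suffix are assumptions about this cluster, not something confirmed by the logs.

```python
import socket


def candidate_hosts(service, namespace):
    """Return the short, namespace-qualified, and fully qualified spellings.

    The 'svc.cluster.local' suffix is the common Kubernetes default and is
    an assumption here; a cluster may use a different domain.
    """
    return [
        service,
        f"{service}.{namespace}",
        f"{service}.{namespace}.svc.cluster.local",
    ]


def can_resolve(host, port, resolver=socket.getaddrinfo):
    """Return True if the name resolves, False on a DNS lookup failure."""
    try:
        resolver(host, port)
        return True
    except socket.gaierror:
        return False


def report(service="metadata-grpc-service", namespace="kubeflow", port=8080):
    """Print which spellings resolve from the current pod."""
    for host in candidate_hosts(service, namespace):
        print(f"{host}:{port} resolvable={can_resolve(host, port)}")
```

Calling `report()` from a pod in the namespace where the workflow runs (here `admin`, judging by the workflow listing) should show whether only the namespace-qualified names resolve there. If so, the short host name passed to `grpc_service_host` would be the likely culprit, though I have not verified this.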
Environment
I have deployed Charmed Kubeflow 1.8 via this guide
I have installed these packages in the notebook container via pip install tfx[kfp]:
tensorflow 2.13.1
tfx 1.14.0
kfp 1.8.22
I have tested toy KFP pipelines from the v1 and v2 SDKs. Both work.
Workflows:
kubectl get workflow -A
NAMESPACE   NAME                             STATUS      AGE
admin       hello-pipeline-slpl5             Succeeded   21h
admin       execution-order-pipeline-xrnwf   Succeeded   18h
admin       transform-2-2rn9d                Failed      17h
Juju relations
juju_relations.txt
Relevant Log Output
Additional Context
No response