Running TFX transform fails to pip install #6955
Comments
That whl contains your |
Hi @kolaente |
Seems like an issue with my environment. It works when running it in a Jupyter notebook server. Still, why does running the component try to install something via pip?
Transform is a Beam-powered component, and in a local environment Beam can run multiple workers as separate processes. Each process needs the right packages, including the module_file that contains preprocessing_fn, in order to execute in a distributed manner. That is the reason for the pip install. In fact, if you look at what happens when a pipeline is compiled to run on GCP, you will see that similar wheels are generated and staged in GCS so that the workers under DataflowRunner get a similar execution environment. If you unzip the whl file, you can see everything it contains.

However, I don't think the pip installation is the issue in your local environment. The SIGSEGV in the logs suggests that something more fundamental is going on. It is possible to run Transform locally without the pip dependency (e.g. if you want a multi-threaded rather than a multi-process approach), but unfortunately that won't solve your problem.
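To look inside the wheel without unpacking it by hand, something like the following may help. It is only a sketch: a .whl file is an ordinary zip archive, so the standard zipfile module can list its members. `list_wheel_contents` and `build_demo_wheel` are hypothetical helper names, and the demo archive and its file names are stand-ins, not what TFX actually generates.

```python
import tempfile
import zipfile
from pathlib import Path


def list_wheel_contents(wheel_path):
    """Return the sorted member names stored in a wheel archive."""
    # A .whl file is a plain zip archive, so zipfile can open it directly.
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(wheel.namelist())


def build_demo_wheel(directory):
    """Write a tiny stand-in wheel (hypothetical contents) and return its path."""
    wheel_path = Path(directory) / "demo_pkg-0.0-py3-none-any.whl"
    with zipfile.ZipFile(wheel_path, "w") as wheel:
        # Hypothetical module name; a real wheel would hold your module_file.
        wheel.writestr("demo_module.py", "def preprocessing_fn(inputs):\n    return inputs\n")
        wheel.writestr("demo_pkg-0.0.dist-info/METADATA", "Name: demo-pkg\nVersion: 0.0\n")
    return wheel_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in list_wheel_contents(build_demo_wheel(tmp)):
            print(name)
```

To inspect the real thing, pass the path of the wheel named in your logs to `list_wheel_contents` instead of the demo archive.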
System information
Describe the current behavior
Following this TFX guide, running this code in a Jupyter notebook cell:
(examples and schema have been generated previously)
results in this error:
It looks like running pip here failed - why does it even call pip in the first place? I'm able to run pip without issues in the venv.
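One way to check whether pip is usable from the notebook kernel itself, and not only from a shell in the venv, is to call it through the kernel's own interpreter. This is a minimal sketch: `check_pip` is a hypothetical helper, and it assumes the failing install runs under the same interpreter as the notebook, which the error output does not confirm.

```python
import subprocess
import sys


def check_pip():
    """Run `pip --version` with the current interpreter; return (exit code, output)."""
    # sys.executable is the interpreter running this cell, which may differ
    # from the `python` found on PATH in a terminal.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    return result.returncode, (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    code, output = check_pip()
    print("interpreter:", sys.executable)
    print("exit code:", code)
    print(output)
```

A non-zero exit code here would suggest that the kernel's interpreter and the venv's pip are not the same environment.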
Describe the expected behavior
Does not crash.
Standalone code to reproduce the issue
see above