Counter Component #7700
Comments
In studying the potential protocol buffers I could use as a template for this, I ran into the protocol buffer used for As such, I don't believe a Counter Component is needed. However, how does one go about using this
I have this implemented locally. Based on this implementation, in
The
Please let me know if there is interest in integrating this into TFX. If there is, I would write the test cases etc., and submit the PR. I would recommend we try this out with the vanilla Trainer component, make sure everything is good, then implement Tuner. After we're all good there, we can update the cloud versions of Trainer and Tuner.
Enables users to access the example counts computed by StatisticsGen in their training code. Passing statistics to Trainer makes fn_args.num_examples['train'] etc. available in run_fn. More details at: tensorflow#7700
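To illustrate how training code might consume the counts, here is a minimal sketch. It assumes the `fn_args.num_examples` mapping proposed in the PR (a dict keyed by split name); that attribute is not confirmed to exist in released TFX. `BATCH_SIZE`, `steps_per_epoch`, and the returned dict are hypothetical names used only for illustration, and `SimpleNamespace` stands in for the real `FnArgs` object so the snippet runs without TFX installed.

```python
import math
from types import SimpleNamespace

# Hypothetical batch size, chosen only for this illustration.
BATCH_SIZE = 32


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches needed to see every example once."""
    return math.ceil(num_examples / batch_size)


def run_fn(fn_args):
    # ASSUMPTION: fn_args.num_examples is the per-split count dict proposed
    # in the PR. It is not a confirmed attribute of TFX's FnArgs.
    train_steps = steps_per_epoch(fn_args.num_examples["train"], BATCH_SIZE)
    eval_steps = steps_per_epoch(fn_args.num_examples["eval"], BATCH_SIZE)
    # A real run_fn would build and fit a model here, for example passing
    # train_steps as steps_per_epoch. Returning the values is only for
    # demonstration; a real run_fn does not need to return anything.
    return {"train_steps": train_steps, "eval_steps": eval_steps}


# Stand-in for the object the Trainer would pass to run_fn.
example_args = SimpleNamespace(num_examples={"train": 1000, "eval": 200})
print(run_fn(example_args))
```

With exact counts available, the steps per epoch no longer have to be hard-coded or estimated per dataset.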
Hello, can you please provide feedback on the PR I mentioned above? Thank you! Pritam
If the feature is related to a specific library below, please raise an issue in
the respective repo directly:
TensorFlow Data Validation Repo
TensorFlow Model Analysis Repo
TensorFlow Transform Repo
TensorFlow Serving Repo
System information
Environment (Linux/MacOS/Windows, Interactive Notebook, Google Cloud, etc.): Local, GCP
Describe the feature and the current behavior/state.
Knowing how many examples you have is a very useful thing. The Counter Component would count the number of examples in the input data and provide this information to downstream components (e.g. Tuner, Trainer) that may want to use that information.
Will this change the current API? How?
This would introduce a new component. It could also add information that is passed to the Trainer and Tuner components. Those inputs would be optional so that the existing API does not break.
Who will benefit with this feature?
Users who would benefit from the pipeline knowing how many examples there are in input data.
Do you have a workaround or are completely blocked by this?:
The current workaround is to count rows in the original CSV (or other source file), which introduces additional code to maintain. The same could be done with TFRecords, but everybody would end up solving the same problem in many different ways.
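For reference, the kind of workaround described above might look like the following sketch. It is not part of TFX; `count_csv_rows` and `count_tfrecords` are hypothetical helper names. The TFRecord counter assumes uncompressed files in the standard framing (8-byte little-endian length, 4-byte length CRC, payload, 4-byte payload CRC) and skips CRC verification, so a corrupted or truncated final record would still be counted.

```python
import csv
import struct


def count_csv_rows(path: str, has_header: bool = True) -> int:
    """Count data rows in a CSV file, optionally excluding a header row."""
    with open(path, newline="") as f:
        total = sum(1 for _ in csv.reader(f))
    return max(total - 1, 0) if has_header else total


def count_tfrecords(path: str) -> int:
    """Count records in an uncompressed TFRecord file without TensorFlow.

    ASSUMPTION: standard TFRecord framing; CRCs are skipped, not checked.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)  # uint64 payload length, little-endian
            if not header:
                return count
            if len(header) < 8:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header)
            # Skip the length CRC (4 bytes), payload, and payload CRC (4 bytes).
            f.seek(4 + length + 4, 1)
            count += 1
```

Every team maintaining a variant of this per input format is the duplication the feature request aims to remove.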
Name of your Organization (Optional)
Intuitive Cloud (GCP Partner).
Any Other info.
I have this partially written. I need advice on data formats (e.g. how to store this on disk) and on how to deal with spans. I don't need help writing it per se, but rather advice on how to make it fit well into the TFX ecosystem. I've been using StatisticsGen as the component to model this after, as it is the most similar (e.g. it produces numbers and works across splits).