
this is more question than an issue #180

Open
silvacarl2 opened this issue May 11, 2023 · 2 comments

Comments

@silvacarl2

hi, we have 12M names and we would like to fine-tune whisper on them. also, i am happy to share the results with you.

the question is: is it better to fine-tune whisper using the entire spoken name, or to fine-tune using individual names and recorded snippets of each name spoken?

@sanchit-gandhi
Contributor

Hey @silvacarl2! Sorry for the late reply here! The best option would be to fine-tune on the closest scenario to what you expect at inference time. If you expect the model to transcribe the entire spoken name, then you should go with that.

@silvacarl2
Author

excellent. we now have 14M samples to use for fine-tuning. yes, i think the entire spoken name is the way to go.

any pointers to the latest best techniques for fine-tuning whisper on a use case like this?
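The thread doesn't settle on a recipe, but one common path is supervised seq2seq fine-tuning with the Hugging Face `transformers` `Seq2SeqTrainer`. A key detail in that setup is how transcript labels are padded: pad positions are set to `-100` so the cross-entropy loss ignores them. The sketch below illustrates only that padding convention with a hypothetical helper (`pad_labels` and the example token ids are illustrative assumptions, not part of any real API; no model or audio is loaded):

```python
# Sketch of label padding for seq2seq fine-tuning (e.g. Whisper transcripts).
# Assumption: the Hugging Face convention of ignore_index = -100, which the
# cross-entropy loss skips when computing the training objective.

def pad_labels(batches, pad_to=None, ignore_index=-100):
    """Pad variable-length token-id lists to a common length with ignore_index."""
    length = pad_to or max(len(b) for b in batches)
    return [b + [ignore_index] * (length - len(b)) for b in batches]

# Example: two transcribed names tokenized to different lengths.
labels = pad_labels([[50258, 314, 262], [50258, 921]])
# The shorter sequence is padded with -100, so those positions contribute
# nothing to the loss.
```

In a full recipe, a data collator applies this to each batch while the audio side is converted to log-mel features by the model's processor.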
