this is more question than an issue #180

silvacarl2 · 2023-05-11T23:48:01Z

Hi, we have 12M names and we would like to fine-tune Whisper on them. I am also happy to share the results with you.

The question is: is it better to fine-tune Whisper on recordings of the entire spoken name, or on individual names, using recorded snippets of each name spoken on its own?

sanchit-gandhi · 2023-12-07T13:18:55Z

Hey @silvacarl2! Sorry for the late reply here! The best option would be to fine-tune on the closest scenario to what you expect at inference time. If you expect the model to transcribe the entire spoken name, then you should go with that.
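A minimal sketch of what "match the inference scenario" can mean on the data-preparation side: each training example pairs one clip of the whole spoken name with the whole name as its target text, rather than splitting it into per-name snippets. The `NameRecording` record and the JSON Lines manifest with `audio`/`sentence` fields are hypothetical and purely illustrative; they are not a format required by Whisper or any particular training library.

```python
import json
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class NameRecording:
    # Hypothetical record: one audio clip of a speaker saying a full name.
    audio_path: str
    given_names: List[str] = field(default_factory=list)
    surnames: List[str] = field(default_factory=list)


def to_full_name_example(rec: NameRecording) -> dict:
    """Build one training example whose target is the entire spoken name,
    mirroring the expected inference-time input (a clip of the full name)."""
    full_name = " ".join(rec.given_names + rec.surnames)
    return {"audio": rec.audio_path, "sentence": full_name}


def write_manifest(records: Iterable[NameRecording], path: str) -> int:
    """Write one JSON object per line (JSON Lines); returns the number written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            # ensure_ascii=False keeps accented characters in names readable.
            f.write(json.dumps(to_full_name_example(rec), ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, `to_full_name_example(NameRecording("clips/0001.wav", ["María", "Elena"], ["García"]))` yields a target of `"María Elena García"`, so the model learns to emit the complete name from a single clip, as it would at inference time.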

silvacarl2 · 2023-12-07T15:14:49Z

Excellent. We now have 14M samples to use for fine-tuning. Yes, I think the entire spoken name is the way to go.

Any pointers to the latest best techniques for fine-tuning Whisper for a use case like this?
