Compatibility with CTranslate2 / faster-whisper #3
Comments
The weights will be released in Transformers format on the Hugging Face Hub tomorrow. It should be pretty straightforward to export them to faster-whisper format following these instructions: https://github.com/guillaumekln/faster-whisper/#model-conversion. I'll add them to the model repos once converted!
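For anyone who wants to try the export themselves, here is a rough sketch of the conversion command from the faster-whisper README linked above, wrapped in Python. The repo id and output directory are placeholders (the actual Distil-Whisper repo names are not given in this thread), and it assumes the model repo ships a `tokenizer.json`, as the original Whisper repos do.

```python
import shlex
import subprocess


def build_convert_command(model_id, output_dir, quantization="float16"):
    """Build the ct2-transformers-converter call shown in the faster-whisper README."""
    return [
        "ct2-transformers-converter",
        "--model", model_id,
        "--output_dir", output_dir,
        "--copy_files", "tokenizer.json",
        "--quantization", quantization,
    ]


def run_conversion(cmd):
    """Run the converter; requires the ctranslate2 and transformers packages."""
    subprocess.run(cmd, check=True)


# Placeholder repo id (assumption): substitute the real Distil-Whisper repo.
cmd = build_convert_command("<distil-whisper-repo-id>", "distil-whisper-ct2")
print(shlex.join(cmd))

# Call run_conversion(cmd) to perform the actual export.
```

The converter is only invoked when `run_conversion(cmd)` is called, so the snippet can be used to inspect the command first.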
Unfortunately, conversion to CTranslate2 format throws an error.
FYI: Related issue on faster-whisper to track full support.
The cross-attention head dimensions should be exactly the same as in the corresponding teacher models (whisper-large-v2 for distil-whisper-32-2 and whisper-medium.en for distil-whisper-24-2).
@patrickvonplaten Unfortunately, not all cross-attention heads are highly correlated with word timing; different heads may attend to different things. So what OpenAI did was find out specifically which cross-attention heads are correlated with timing and use only that subset for the timing alignment.
Indeed, OpenAI hardcodes these word-level timestamp alignment heads in their repo, based on the cross-attention plots. We haven't found the optimal alignment heads for Distil-Whisper yet, so word-level timestamps aren't available for now. Feel free to repeat the analysis from Jong Wook to see what the best configuration is here! We can then update the model's generation config to store this information. I'll also try to determine the best alignment heads from the Distil-Whisper validation sets this week.
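To make the idea of "use only a subset of heads" concrete, here is a toy sketch in plain Python. It assumes the alignment heads are stored as `[layer, head]` pairs and that `cross_attn[layer][head]` is a tokens x frames matrix of attention weights; all values and the chosen pairs are made up for illustration. It is a simplification: it takes a per-token argmax over the averaged weights, whereas OpenAI's implementation runs dynamic time warping over the selected heads.

```python
def average_alignment_heads(cross_attn, alignment_heads):
    """Average the tokens x frames attention matrices of the selected heads only."""
    selected = [cross_attn[layer][head] for layer, head in alignment_heads]
    n_tokens = len(selected[0])
    n_frames = len(selected[0][0])
    return [
        [sum(m[t][f] for m in selected) / len(selected) for f in range(n_frames)]
        for t in range(n_tokens)
    ]


def peak_frames(avg):
    """Return the index of the most-attended audio frame for each token."""
    return [max(range(len(row)), key=row.__getitem__) for row in avg]


# Toy data (made up): 1 layer, 3 heads, 2 tokens, 3 audio frames.
cross_attn = [[
    [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]],  # head 0: follows word timing
    [[0.3, 0.4, 0.3], [0.3, 0.4, 0.3]],  # head 1: diffuse, not timing-related
    [[0.6, 0.2, 0.2], [0.2, 0.2, 0.6]],  # head 2: follows word timing
]]

# Hypothetical [layer, head] pairs; the real ones for Distil-Whisper are not yet known.
alignment_heads = [[0, 0], [0, 2]]

avg = average_alignment_heads(cross_attn, alignment_heads)
print(peak_frames(avg))  # [0, 2]
```

Including the diffuse head 1 in `alignment_heads` would flatten the averaged weights, which is why the choice of heads matters for timestamp quality.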
Upgrade ctranslate2 to 3.21.0.
Great work!
I was wondering whether the distilled version might still be compatible with CTranslate2 / faster-whisper? I understand the changes to the decoder might require some changes there, not to mention speculative decoding.
Thanks,
Ewald