
Compatibility with CTranslate2 / faster-whisper #3

Open
entn-at opened this issue Oct 31, 2023 · 8 comments
Comments

@entn-at commented Oct 31, 2023

Great work!

I was wondering whether the distilled version is still compatible with CTranslate2 / faster-whisper? I understand the changes to the decoder might require some adjustments there, not to mention speculative decoding.

Thanks,
Ewald

@sanchit-gandhi (Contributor) commented Nov 1, 2023

The weights will be released in Transformers format on the Hugging Face Hub tomorrow. It should be pretty straightforward to export them to faster-whisper format following these instructions: https://github.com/guillaumekln/faster-whisper/#model-conversion

I'll add them to the model repos once converted!
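For anyone who wants to try it themselves, here is a minimal conversion sketch using the CTranslate2 Python API (the `ct2-transformers-converter` CLI from the linked instructions does the same thing); the repo id and copied files below are assumptions, not confirmed names:

```python
# Hedged sketch: convert a Transformers-format Distil-Whisper checkpoint to the
# CTranslate2 format that faster-whisper loads. The repo id is an assumption.
import ctranslate2

converter = ctranslate2.converters.TransformersConverter(
    "distil-whisper/distil-large-v2",                  # assumed Hub repo id
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert("distil-large-v2-ct2", quantization="float16")
```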

@alexey-mik commented Nov 2, 2023

Unfortunately, conversion to the CTranslate2 format throws an error:

ValueError: Some required model attributes are not set:

decoder/layer_2/self_attention/layer_norm/gamma
decoder/layer_2/self_attention/layer_norm/beta
decoder/layer_2/self_attention/linear_0/weight
...

@AnkushMalaker

FYI: Related issue on faster-whisper to track full support
SYSTRAN/faster-whisper#533

@chiiyeh commented Nov 4, 2023

Hi! I have opened a PR on CTranslate2 that will support the conversion for distil-whisper.
However, for the word-timing alignment it seems like OpenAI hardcoded the specific cross-attention heads that are highly correlated with word timing here. Not sure whether there is a similar set for distil-whisper.

@patrickvonplaten (Contributor) commented Nov 7, 2023

The cross-attention head dimensions should be exactly the same as those of the corresponding teacher models (whisper-large-v2 for distil-whisper-32-2 and whisper-medium.en for distil-whisper-24-2).

@chiiyeh commented Nov 8, 2023

@patrickvonplaten unfortunately, not all the cross-attention heads are highly correlated with word timing; different heads attend to different things. So what OpenAI did was find out specifically which cross-attention heads are correlated and use only that subset for the timing alignment.
Currently a heuristic is used (all cross-attention heads in the last half of the decoder layers, I think), but this should be less accurate than hand-picking the subset. So the PR can work, but expect more inaccuracy in the word-level timing.
If anyone is interested, jongwook explained how he hand-picked the heads in this discussion here.
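For illustration only, a small sketch of the fallback heuristic described above (every cross-attention head in the second half of the decoder layers); the layer and head counts assume the distil-large-v2 geometry of 2 decoder layers with 20 heads each:

```python
# Sketch of the heuristic: take every cross-attention head in the second half of
# the decoder layers when no hand-picked alignment heads are available.
# Counts below assume distil-large-v2: 2 decoder layers, 20 attention heads each.
num_decoder_layers = 2
num_attention_heads = 20

alignment_heads = [
    (layer, head)
    for layer in range(num_decoder_layers // 2, num_decoder_layers)
    for head in range(num_attention_heads)
]
print(alignment_heads)  # [(1, 0), (1, 1), ..., (1, 19)]
```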

@sanchit-gandhi (Contributor) commented Nov 13, 2023

Indeed, OpenAI hardcode these word-level timestamp alignment heads in their repo based on the cross-attention plots.

We haven't found the optimal alignment heads for word-level timestamps for Distil-Whisper, so these word-level timestamps aren't available yet.

Feel free to repeat the analysis from Jong Wook to see what the best configuration is here! We can then update the model's generation config accordingly to store this information. I'll also try and determine the best alignments from the validation sets in Distil-Whisper this week.
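As a rough sketch of where that information would live once found: the `alignment_heads` entry of the generation config could be updated and word-level timestamps requested through the Transformers pipeline. The (layer, head) pairs and repo id below are placeholders, not the real values:

```python
# Hedged sketch: store hand-picked alignment heads in the generation config and
# request word-level timestamps. The (layer, head) pairs are placeholders only.
from transformers import AutoProcessor, WhisperForConditionalGeneration, pipeline

model_id = "distil-whisper/distil-large-v2"   # assumed Hub repo id
model = WhisperForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder heads; replace with the pairs found by the cross-attention analysis.
model.generation_config.alignment_heads = [[1, 3], [1, 7], [1, 12]]

asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)
print(asr("sample.wav", return_timestamps="word")["chunks"])
```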

@shuaijiang

> Unfortunately, conversion to the CTranslate2 format throws an error:
>
> ValueError: Some required model attributes are not set:
>
> decoder/layer_2/self_attention/layer_norm/gamma
> decoder/layer_2/self_attention/layer_norm/beta
> decoder/layer_2/self_attention/linear_0/weight
> ...

Upgrading ctranslate2 to 3.21.0 resolves this error.
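(That is, something like `pip install --upgrade "ctranslate2>=3.21.0"`, then re-run the conversion.)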
