
Compatibility with CTranslate2 / faster-whisper #3

Open
entn-at opened this issue Oct 31, 2023 · 8 comments
Comments

@entn-at commented Oct 31, 2023

Great work!

I was wondering whether the distilled version is still compatible with CTranslate2 / faster-whisper? I understand the changes to the decoder might require some adjustments there, not to mention speculative decoding.

Thanks,
Ewald

@sanchit-gandhi (Contributor) commented Nov 1, 2023

The weights will be released in Transformers format on the Hugging Face Hub tomorrow. It should be pretty straightforward to export them to faster-whisper format following these instructions: https://github.com/guillaumekln/faster-whisper/#model-conversion

I'll add them to the model repos once converted!
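For anyone who wants to try it themselves, here is a minimal conversion sketch using the CTranslate2 Python API (the `ct2-transformers-converter` CLI from the linked instructions does the same thing); the repo id and copied files below are assumptions, not confirmed names:

```python
# Hedged sketch: convert a Transformers-format Distil-Whisper checkpoint to the
# CTranslate2 format that faster-whisper loads. The repo id is an assumption.
import ctranslate2

converter = ctranslate2.converters.TransformersConverter(
    "distil-whisper/distil-large-v2",                  # assumed Hub repo id
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert("distil-large-v2-ct2", quantization="float16")
```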

@alexey-mik commented Nov 2, 2023

Unfortunately, conversion to the CTranslate2 format throws an error:

ValueError: Some required model attributes are not set:

decoder/layer_2/self_attention/layer_norm/gamma
decoder/layer_2/self_attention/layer_norm/beta
decoder/layer_2/self_attention/linear_0/weight
...

@AnkushMalaker

FYI: Related issue on faster-whisper to track full support
SYSTRAN/faster-whisper#533

@chiiyeh commented Nov 4, 2023

Hi! I have opened a PR on CTranslate2 that will support the conversion for distil-whisper.
However, for the word-timing alignment it seems like OpenAI hardcoded the specific cross-attention heads that are highly correlated with word timing here. Not sure whether there is a similar set for distil-whisper.

@patrickvonplaten (Contributor) commented Nov 7, 2023

The cross-attention head dimensions should be exactly the same as those of the corresponding teacher models (whisper-large-v2 for distil-whisper-32-2 and whisper-medium.en for distil-whisper-24-2).

@chiiyeh commented Nov 8, 2023

@patrickvonplaten unfortunately, not all the cross-attention heads are highly correlated with word timing; different heads attend to different things. So what OpenAI did was find out specifically which cross-attention heads are correlated and use only that subset for the timing alignment.
Currently a heuristic is used (all cross-attention heads in the last half of the decoder layers, I think), but this should be less accurate than hand-picking the subset. So the PR can work, but expect more inaccuracy in the word-level timing.
If anyone is interested, jongwook explained how he hand-picked the heads in this discussion here.
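For illustration only, a small sketch of the fallback heuristic described above (every cross-attention head in the second half of the decoder layers); the layer and head counts assume the distil-large-v2 geometry of 2 decoder layers with 20 heads each:

```python
# Sketch of the heuristic: take every cross-attention head in the second half of
# the decoder layers when no hand-picked alignment heads are available.
# Counts below assume distil-large-v2: 2 decoder layers, 20 attention heads each.
num_decoder_layers = 2
num_attention_heads = 20

alignment_heads = [
    (layer, head)
    for layer in range(num_decoder_layers // 2, num_decoder_layers)
    for head in range(num_attention_heads)
]
print(alignment_heads)  # [(1, 0), (1, 1), ..., (1, 19)]
```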

@sanchit-gandhi (Contributor) commented Nov 13, 2023

Indeed, OpenAI hardcode these word-level timestamp alignment heads in their repo based on the cross-attention plots.

We haven't found the optimal alignment heads for word-level timestamps for Distil-Whisper, so these word-level timestamps aren't available yet.

Feel free to repeat the analysis from Jong Wook to see what the best configuration is here! We can then update the model's generation config accordingly to store this information. I'll also try and determine the best alignments from the validation sets in Distil-Whisper this week.
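As a rough sketch of where that information would live once found: the `alignment_heads` entry of the generation config could be updated and word-level timestamps requested through the Transformers pipeline. The (layer, head) pairs and repo id below are placeholders, not the real values:

```python
# Hedged sketch: store hand-picked alignment heads in the generation config and
# request word-level timestamps. The (layer, head) pairs are placeholders only.
from transformers import AutoProcessor, WhisperForConditionalGeneration, pipeline

model_id = "distil-whisper/distil-large-v2"   # assumed Hub repo id
model = WhisperForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder heads; replace with the pairs found by the cross-attention analysis.
model.generation_config.alignment_heads = [[1, 3], [1, 7], [1, 12]]

asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)
print(asr("sample.wav", return_timestamps="word")["chunks"])
```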

@shuaijiang

> Unfortunately, conversion to the CTranslate2 format throws an error:
>
> ValueError: Some required model attributes are not set:
>
> decoder/layer_2/self_attention/layer_norm/gamma
> decoder/layer_2/self_attention/layer_norm/beta
> decoder/layer_2/self_attention/linear_0/weight
> ...

Upgrading ctranslate2 to 3.21.0 resolves this error.
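(That is, something like `pip install --upgrade "ctranslate2>=3.21.0"`, then re-run the conversion.)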
