Support conversion for distil-whisper model #1529
Conversation
please rebase or fix your PR the same way as here: 5258816 |
Previous code was assuming the same number of encoder and decoder layers. Removed this assumption and obtained the number of layers separately.
a255174 to cd26b3e
Can you share the code to convert it from HF to CTranslate2? |
@vince62s I have rebased as requested. |
If it helps, I converted them to CTranslate2 float16 with this PR here: |
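For reference, not the exact command used above, but a minimal sketch of what a float16 conversion might look like with this PR's changes, using the Python converter API (the output directory name and the copied file list are only examples; the CLI equivalent is `ct2-transformers-converter`):

```python
import ctranslate2

# Convert the Hugging Face distil-whisper checkpoint to CTranslate2 format,
# quantizing the saved weights to float16. The copied files are the ones
# commonly kept alongside Whisper conversions; adjust as needed.
converter = ctranslate2.converters.TransformersConverter(
    "distil-whisper/distil-large-v2",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert("distil-large-v2-ct2", quantization="float16")
```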
Thank you for your work! I tried these converted models and I'm getting an error:
Can you confirm that it's working in your case so I can be sure it's something on my end? |
I've observed this with the medium.en model, but not with large-v2. For my applications I've never used the English-only models, so I haven't taken the time (yet?) to dig into this more. |
Thanks! Is there a way to convert into CTranslate2 while preserving the original float32 precision? |
This is also something I haven't experimented with. For a variety of reasons, most CTranslate2 users don't use float32, which is why you see most conversion scripts, examples, etc. targeting float16. Many (if not most) CTranslate2 users also end up running int8, int8_float16, etc., so converting to float16 for distributing models makes the most sense for most use cases. CTranslate2 also has the very useful ability to load float16-converted models in the user's configured precision automatically and efficiently at runtime. |
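As a sketch of keeping full precision, assuming the same Python converter API and the distil-whisper/distil-large-v2 checkpoint as an example (the output directory name is hypothetical):

```python
import ctranslate2

converter = ctranslate2.converters.TransformersConverter(
    "distil-whisper/distil-large-v2"
)

# With no `quantization` argument, the weights are saved as loaded from the
# checkpoint, i.e. their original precision rather than float16 or int8.
converter.convert("distil-large-v2-ct2-float32")
```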
I love float32. I've done tests with small.en and other models, and most of them actually run faster in float32 than in int8_float32, for example. The trade-off is that float32 requires more VRAM. Float32 also runs faster than bfloat16 in my testing (RTX 4090 using CUDA), though bfloat16 uses less VRAM. Interestingly, float16 is significantly faster than bfloat16 in my tests... not sure why that is.

At any rate, having ALL the CTranslate2 quants for the distil-whisper models would be helpful so people can choose. You'll notice I've converted all the Whisper models to all permutations of CTranslate2 quants here: https://huggingface.co/ctranslate2-4you I would love to do the same for the distil-whisper models, but I need the conversion command as well as the updated version of the ctranslate2 library that actually supports converting distil-whisper models. Any help would be much appreciated, and I'd be happy to convert any and all quants for people to use.

I also want to add: converting at "runtime" is not always the most efficient way of running the models, which is why additional converted quants are preferable. There's the overhead of the conversion itself (albeit small), but more importantly, it's my understanding that quality can suffer if you convert at runtime. Let me explain. Scenario 1: the model was in float32 and was converted to int8_float32, so no conversion is needed at runtime. Scenario 2: the model was converted to float16 first and is then converted to int8_float32 at runtime. It's my understanding that even though both run in int8_float32, Scenario 1 is higher quality, because certain data is lost when converting to float16 first and then to int8_float32. This comes down to how the exponent and fraction bits of the floating-point format are truncated or zeroed out; you can't get them back. Float16 has already truncated those bits, so when you then use int8_float32 you don't get the full float32 portion, because you're basing the runtime conversion on float16. Thus, having the quants you want natively is helpful. I'm willing to do the work converting if I only knew how. Thanks. |
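For context, the runtime conversion mentioned above corresponds to the `compute_type` argument when loading a converted model. A minimal sketch, assuming a locally converted distil-whisper directory (the path name is a placeholder):

```python
import ctranslate2

# Load a converted model and let CTranslate2 cast the stored weights to the
# requested compute type at load time ("int8_float32" here as an example).
model = ctranslate2.models.Whisper(
    "distil-large-v2-ct2",
    device="cuda",
    compute_type="int8_float32",
)
```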
Previous code was assuming the same number of encoder and decoder layers, which is no longer true with the distil-whisper model. Removed this assumption and obtained the number of layers separately.
distil-whisper: https://github.com/huggingface/distil-whisper
Fixes this issue: huggingface/distil-whisper#3
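As an illustration of the asymmetry this change accounts for (not the converter's actual code), the Hugging Face Whisper config exposes the two layer counts separately, and for distil-whisper checkpoints they differ:

```python
from transformers import WhisperConfig

# distil-whisper keeps the full encoder but prunes the decoder, so the two
# layer counts must be read independently rather than assumed equal.
config = WhisperConfig.from_pretrained("distil-whisper/distil-large-v2")
print(config.encoder_layers)  # e.g. 32 for distil-large-v2
print(config.decoder_layers)  # e.g. 2 for distil-large-v2
```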