Support conversion for distil-whisper model #1529

Merged (1 commit, Nov 7, 2023)

Conversation

@chiiyeh (Contributor) commented Nov 3, 2023

The previous code assumed the same number of encoder and decoder layers, which is no longer true for the distil-whisper model. Removed this assumption and obtained the number of layers separately.

distil-whisper: https://github.com/huggingface/distil-whisper
Fixes this issue: huggingface/distil-whisper#3

@vince62s (Member) commented Nov 3, 2023

Please rebase or fix your PR the same way as here: 5258816

The previous code assumed the same number of encoder and decoder layers.
Removed this assumption and obtained the number of layers separately.
@chiiyeh force-pushed the bugfix--distil-whisper-support branch from a255174 to cd26b3e on November 3, 2023 13:56
@AmgadHasan

The previous code assumed the same number of encoder and decoder layers, which is no longer true for the distil-whisper model. Removed this assumption and obtained the number of layers separately.

distil-whisper: https://github.com/huggingface/distil-whisper Fixes this issue: huggingface/distil-whisper#3

Can you share the code to convert it from HF to CTranslate2?

@chiiyeh (Contributor, Author) commented Nov 4, 2023

@vince62s I have rebased as requested.
@AmgadHasan with this PR you can convert it just like the OpenAI Whisper models: https://opennmt.net/CTranslate2/guides/transformers.html#whisper
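For reference, a minimal sketch of the conversion via the Python converter API (the ct2-transformers-converter CLI takes the same options); the model id, the copied files, and the output directory here are just placeholders:

import ctranslate2

# Convert a Hugging Face Whisper checkpoint to CTranslate2 format.
# "distil-whisper/distil-large-v2" and the copied tokenizer files are assumptions;
# substitute whatever checkpoint and auxiliary files you actually use.
converter = ctranslate2.converters.TransformersConverter(
    "distil-whisper/distil-large-v2",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert("distil-large-v2-ct2", quantization="float16")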

@kristiankielhofner

Can you share the code to convert it from HF to CTranslate2?

If it helps, I converted them to CTranslate2 float16 with this PR here:

https://huggingface.co/tovera/wis-distil-whisper-medium.en

https://huggingface.co/tovera/wis-distil-whisper-large-v2

@alexey-mik

I converted them to CTranslate2 float16 with this PR

Thank you for your work! I tried these converted models and I'm getting an error:

ValueError: <|startoftranscript|> token was not found in the prompt

Can you confirm that it's working in your case, so I can be sure it's something on my end?

@kristiankielhofner

I converted them to CTranslate2 float16 with this PR

Thank you for your work! I tried these converted models and I'm getting an error:

ValueError: <|startoftranscript|> token was not found in the prompt

Can you confirm that it's working in your case, so I can be sure it's something on my end?

I've observed this with the medium.en model, but not with large-v2. For my applications I've never used the English-only models, so I haven't taken the time (yet?) to dig into this more.

@AmgadHasan

Can you share the code to convert it from HF to CTranslate2?

If it helps I converted them to CTranslate2 float16 with this PR here:

https://huggingface.co/tovera/wis-distil-whisper-medium.en

https://huggingface.co/tovera/wis-distil-whisper-large-v2

Thanks!

Is there a way to convert to CTranslate2 while preserving the original float32 precision?
I'm asking about Whisper in general, not specifically distil-whisper. When I tried to do this, the conversion process failed.

@kristiankielhofner commented Nov 6, 2023

Can you share the code to convert it from HF to CTranslate2?

If it helps, I converted them to CTranslate2 float16 with this PR here:
https://huggingface.co/tovera/wis-distil-whisper-medium.en
https://huggingface.co/tovera/wis-distil-whisper-large-v2

Thanks!

Is there a way to convert to CTranslate2 while preserving the original float32 precision? I'm asking about Whisper in general, not specifically distil-whisper. When I tried to do this, the conversion process failed.

This is also something I haven't experimented with. For a variety of reasons, most CTranslate2 users don't use float32, which is why most conversion scripts, examples, etc. target float16. Also, many (if not most) CTranslate2 users end up running int8, int8_float16, etc., so converting to float16 for distributing models makes the most sense for most use cases. CTranslate2 can automatically, and very efficiently, load a float16-converted model in the user's configured precision at runtime.
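As a rough illustration of that runtime behaviour (the model directory here is just the output of the conversion example above, not a published path):

import ctranslate2

# Load a float16-converted model in a different compute type;
# CTranslate2 converts the weights on load.
model = ctranslate2.models.Whisper(
    "distil-large-v2-ct2",        # directory produced by the converter (assumed)
    device="cuda",
    compute_type="int8_float16",  # runtime precision, independent of the saved precision
)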

@vince62s vince62s merged commit d0a9227 into OpenNMT:master Nov 7, 2023
17 checks passed
@BBC-Esq commented Nov 7, 2023

This is also something I haven't experimented with. For a variety of reasons, most CTranslate2 users don't use float32, which is why most conversion scripts, examples, etc. target float16. Also, many (if not most) CTranslate2 users end up running int8, int8_float16, etc., so converting to float16 for distributing models makes the most sense for most use cases. CTranslate2 can automatically, and very efficiently, load a float16-converted model in the user's configured precision at runtime.

I love float32. I've done tests with small.en and other models, and most of them actually run faster in float32 than in int8_float32, for example. The trade-off is that float32 requires more VRAM. Float32 also runs faster than bfloat16 in my testing (RTX 4090 using CUDA); again, bfloat16 uses less VRAM. Interestingly, float16 is significantly faster than bfloat16 in my tests...not sure why that is.

At any rate, having ALL the ctranslate2 quants for the distil-whisper model would be helpful so people can choose.

You'll notice I've converted all the Whisper models to all permutations of CTranslate2 quants here:

https://huggingface.co/ctranslate2-4you

I would love to do the same for the distil-whisper models, but I need the conversion command as well as a version of the ctranslate2 library that actually supports converting distil-whisper models.

Any help would be much appreciated on this and I'd be happy to convert any and all quants for people to use.

I also want to add...converting at "runtime" is not always the most efficient way of running the models...hence why additional converted quants are preferable. There's the overhead of conversion (albeit small), but more importantly, it's my understanding that quality will suffer if you convert at runtime...let me explain.

Scenario 1 = Model converted from float32 to int8_float32. At runtime, no conversion is necessary since it's already in int8_float32.

Scenario 2 = Model in float16, converted to int8_float32 at runtime.

It's my understanding that even though both run in int8_float32, Scenario #1 is higher quality. This is because certain data is lost by converting to float16 first...and then to int8_float32. It comes down to how the "exponent" and "fraction" bits of the floating point format itself are truncated or zeroed out...you can't gain them back. Float16 has already truncated the bits...and when you then use int8_float32, you don't get the full "float32" portion because you're basing the runtime conversion on float16... (see the toy example below)
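A toy illustration of that float16 rounding loss, nothing more than a sanity check with made-up numbers:

import numpy as np

# Rounding a float32 value through float16 discards low-order mantissa bits;
# any later quantization step starts from the already-rounded value.
x = np.float32(0.1234567)
via_fp16 = np.float32(np.float16(x))
print(x, via_fp16, x - via_fp16)  # the float16 round trip is lossy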

Thus, having the quants you want natively is helpful. I'm willing to do the work converting if I only knew how. Thanks.
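A sketch of what that batch conversion could look like, assuming the distil-whisper/distil-large-v2 checkpoint and a list of quantization names accepted by the converter; as far as I can tell, omitting the quantization argument keeps the checkpoint's stored precision:

import ctranslate2

model_id = "distil-whisper/distil-large-v2"  # assumed target checkpoint
quant_types = ["float16", "bfloat16", "int8", "int8_float32", "int8_float16", "int8_bfloat16"]

for quant in quant_types:
    # Recreate the converter per format to keep the sketch simple.
    converter = ctranslate2.converters.TransformersConverter(
        model_id,
        copy_files=["tokenizer.json", "preprocessor_config.json"],  # adjust to the repo's files
    )
    converter.convert(f"distil-large-v2-ct2-{quant}", quantization=quant)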
