update cuda 11.2 to cuda 12.2 #1590

Merged: 10 commits merged into OpenNMT:master on Feb 7, 2024

Conversation

@minhthuc2502
Collaborator

No description provided.

@minhthuc2502 changed the title from "update cuda 11.2 to cuda 12.2" to "update cuda 11.2 to cuda 12.2 WIP" on Dec 20, 2023
@minhthuc2502 changed the title from "update cuda 11.2 to cuda 12.2 WIP" to "update cuda 11.2 to cuda 12.2" on Dec 21, 2023
@Purfview

Purfview commented Dec 22, 2023

Thx for the PR, but I got disappointed by CUDA 12...
Would be nice if you could keep future CUDA 11 builds.

RTX 3050 GPU [diff vs CUDA 11]:

float16:      -10% drop in speed 
bfloat16:      -8% drop in speed
int8_bfloat16:  0% same

@BBC-Esq

BBC-Esq commented Dec 22, 2023

I'm all for keeping backwards compatibility with CUDA 11, if feasible. I'm not sure if Purfview was suggesting not to pursue CUDA 12 builds altogether...but if that's the case I'd definitely recommend a lot more testing. In terms of backwards compatibility, though, it's virtually always a good thing IMHO...coming from a non-professional, hobbyist developer, so...

(apologies in advance if "backwards compatibility" wasn't the correct phrase...)

@nguyendc-systran
Contributor

> Thx for the PR, but I got disappointed by CUDA 12... Would be nice if you could keep future CUDA 11 builds.
>
> RTX 3050 GPU [diff vs CUDA 11]:
>
>     float16:      -10% drop in speed
>     bfloat16:      -8% drop in speed
>     int8_bfloat16:  0% same

Thanks @Purfview for sharing information.
We will try to run some benchmarking on our side at the beginning of 2024. If nothing is blocking, I think we will merge to support CUDA 12 and still keep CUDA 11 support for a while (if that support is of interest to the community).

@Purfview

Purfview commented Dec 26, 2023

Diff from the tests in a new environment [various optimizations for performance]:

float16:       -1% drop in speed 
bfloat16:      -5% drop in speed
int8_float16: -21% drop in speed

EDIT:
This and the previous tests were done on Windows [Nvidia Driver Version: 546.33].
Every compute type ran 3 times and the speed was averaged.
"new environment" had a hyper option enabled -> Hardware-accelerated GPU scheduling: ON

@BBC-Esq

BBC-Esq commented Dec 26, 2023

> Thx for the PR, but I got disappointed by CUDA 12... Would be nice if you could keep future CUDA 11 builds.
>
> RTX 3050 GPU [diff vs CUDA 11]:
>
>     float16:      -10% drop in speed
>     bfloat16:      -8% drop in speed
>     int8_bfloat16:  0% same

> Thanks @Purfview for sharing information. We will try to run some benchmarking on our side at the beginning of 2024. If nothing is blocking, I think we will merge to support CUDA 12 and still keep CUDA 11 support for a while (if that support is of interest to the community).

On my end (and I'm just one amateur developer among professionals), I know that my user base (however small) would appreciate CUDA 11 support for a while longer, at least. Not all computer setups support CUDA 12, the matching Python libraries, etc., so having ctranslate2 work with only one version of CUDA at a time would be harsh. I've noticed that PyTorch's policy is generally to advertise support for two major CUDA versions... maybe that could be a policy for ctranslate2 too?

And then ctranslate2 could have a repository of older builds that's easy for users to understand, showing which versions of CUDA are supported up to which version of CTranslate2... like PyTorch has an "old builds" page.

Anyway, I'm excited! Just saw this... IMHO, a year is a pretty long time to go without CUDA 12 support, short of compiling from source...

@BBC-Esq

BBC-Esq commented Jan 25, 2024

Does anyone know if this is still being worked on? It was on the verge of incorporating CUDA 12+, but it's been a while.

@Qubitium

Qubitium commented Feb 2, 2024

Get this merged asap! I see no regression on my end.

@BBC-Esq

BBC-Esq commented Feb 2, 2024

> Get this merged asap! I see no regression on my end.

Yes, please merge. I don't even think they're talking about removing support for CUDA 11.8, but just adding CUDA 12 support!

@minhthuc2502 merged commit 8c6715e into OpenNMT:master on Feb 7, 2024
17 checks passed
@BBC-Esq

BBC-Esq commented Feb 15, 2024

@minhthuc2502 Is it possible to upload this to pypi.org now so that I can "pip install" the newer version that supports CUDA 12?

@ozancaglayan
Contributor

ozancaglayan commented Feb 16, 2024

Do you have an idea why I get a nice speedup with the small Whisper model with bfloat16 compared to auto (which selects int8_float16), but a horrible slowdown for the medium model? The GPU is an A10G. With bfloat16, runtime also fluctuates a lot, with weird outlier runs that are super slow. What's also interesting is that it does not seem to happen with the large-v2 model.

Whisper config is the following:

temperature: 0
beam_size: 1
condition_on_previous_text: false
vad_filter: false
  • Each experiment is run 5 times and the median time is reported. The audio file has 5 minutes of content. I also report the number of words generated at each trial; it is consistently the same across the 5 runs, so there is no stochasticity in the generation.

  • You can see that the *bfloat16 dtypes (bfloat16 and int8_bfloat16) always generate much more text than the other data types, though for large-v2 the situation is much better. For those runs, the actual word count is shown in parentheses in the table. However, it never gets near ~120 words (the closest is 130-138 for large-v2), which is the golden transcription produced by the other dtypes.

  • The number of words generated does not change across CUDA/CTranslate2 versions.

  • I see no significant speed differences between CTranslate2 v3 CUDA 11 and CTranslate2 v4 CUDA 12, which is good.

| Compute type  | #Words (5 runs) | CTv3-CUDA11 small | CTv3-CUDA11 medium | CTv3-CUDA11 large-v2 | CTv4-CUDA12 small | CTv4-CUDA12 medium | CTv4-CUDA12 large-v2 |
|---------------|-----------------|-------------------|--------------------|----------------------|-------------------|--------------------|----------------------|
| float32       | ~120            | 1.72              | 3.45               | 6.54                 | 1.71              | 3.41               | 6.50                 |
| float16       | ~120            | 1.17              | 2.18               | 3.88                 | 1.20              | 2.27               | 3.82                 |
| bfloat16      | >500            | 2.26              | 6.18               | 5.20 (258w)          | 2.37              | 6.20               | 5.30 (250w)          |
| int8_float32  | ~120            | 1.47              | 2.80               | 4.47                 | 1.50              | 2.93               | 4.41                 |
| int8_float16  | ~120            | 1.29              | 2.34               | 3.73                 | 1.34              | 2.78               | 3.56                 |
| int8_bfloat16 | >500            | 2.70              | 6.48               | 3.50 (137w)          | 2.30 (400w)       | 6.92               | 3.75 (130w)          |

Conclusion: the main problem is bfloat16 over-generating. Maybe it's a quantization issue, or a float conversion issue in the faster-whisper -> CTranslate2 model conversion.
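
For context, here is a rough sketch of how the word-count check behind this table can be reproduced with faster-whisper, using the config quoted above; the model size and audio path are placeholders, and the exact timing harness that produced the table may differ:

```python
from faster_whisper import WhisperModel

AUDIO_FILE = "five_minutes.wav"  # placeholder: ~5 minutes of clear speech

def transcribe_word_count(model_size: str, compute_type: str) -> int:
    model = WhisperModel(model_size, device="cuda", compute_type=compute_type)
    segments, _info = model.transcribe(
        AUDIO_FILE,
        temperature=0,
        beam_size=1,
        condition_on_previous_text=False,
        vad_filter=False,
    )
    # Sum the words across all segments; the bfloat16 runs above produced
    # far more than the ~120-word reference transcription.
    return sum(len(segment.text.split()) for segment in segments)

for ct in ("float32", "float16", "bfloat16", "int8_float16", "int8_bfloat16"):
    print(ct, transcribe_word_count("medium", ct))
```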

@Purfview

@ozancaglayan Use temperature=0 so that the benchmark tests are consistent.

@ozancaglayan
Contributor

Yes, I just noticed that, but it's again weird that bfloat16 runs tend to be affected by temperature whereas the other ones are not.

@Purfview

It's not weird at all.

@ozancaglayan
Contributor

OK, I updated the table; bfloat16 is still inconsistent across model types.

@Purfview

Purfview commented Feb 16, 2024

Try much longer tests, not just a few seconds.

Btw, are you saying that this inconsistency appeared with CUDA12?

@ozancaglayan
Contributor

ozancaglayan commented Feb 16, 2024

I'm now repeating the tests with CTranslate2 < 4 using CUDA 11. The inconsistencies are there as well. I'm running each test 5 times on the same 5-minute audio file, so I think it's good enough.

I'm now counting the length of the texts generated at each run, and bfloat16 is definitely over-generating the same contents again and again. Applying VAD beforehand seems to cut the number of segments/words generated for bfloat16, but it's still over-generating.
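
For illustration, enabling the VAD filter in faster-whisper looks roughly like this; the model size, audio path, and silence threshold are example values, not necessarily the ones used in these tests:

```python
from faster_whisper import WhisperModel

model = WhisperModel("medium", device="cuda", compute_type="bfloat16")
segments, info = model.transcribe(
    "five_minutes.wav",                                # placeholder audio path
    vad_filter=True,                                   # drop non-speech before decoding
    vad_parameters={"min_silence_duration_ms": 500},   # example threshold only
)
print(sum(len(s.text.split()) for s in segments))      # word count after VAD
```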

@Purfview

> definitely over-generating the same contents again and again

Are you using clear speech audio, without noise and silence?

@ozancaglayan
Contributor

Very clear speech, with completely silent blocks. But even if I apply VAD, bfloat16 still seems to over-generate.

@ozancaglayan
Contributor

I updated my previous table with the final results, btw; see #1590 (comment).

@Purfview

I'm sure it's something wrong with your test rather than anything else.

@ozancaglayan
Contributor

Did you try doing similar benchmarking with and without bfloat16 on a supported device? Why would I see this consistently only with the *bfloat16 types?
