Slower performance on Whisper model #269
-
Hi! I noticed quite a significant difference in performance, specifically with the Whisper Large model. Usually, a 1-hour audio file is done in 1-2 minutes, but now it takes about 10-15 minutes. Thank you.
-
Whisper is currently having some issues. You can keep an eye on our status page to see when it's resolved.
-
Hey @jjmaldonis, we too have been noticing that Whisper has slowed down quite a bit over the past month. A month ago, it processed 1-2 minute audio files in about 10 seconds, which was a bit slower than OpenAI's Whisper at 6-7 seconds, but it was manageable. Lately, the same audio may take between 40 seconds and 1.5 minutes, and sometimes even longer than 10 minutes. The response times have become unpredictable. Here is an example of a request id with an extended wait time: d83d7966-437b-4198-9423-d2e2d6571c0f (more than 2 minutes). Do you know when this might be fixed? Can we expect Whisper to be more stable and fast in the near future? We're trying to figure out our game plan while we wait for it to get fixed. Thanks!
-
Guys, did you notice any improvement in speed?
-
Hey all, early last week we improved the underlying systems that run Whisper to support the growing demand from our users. You should have seen significant improvements in the number of errors received and in the latency of API requests to Whisper.

We will continue to improve the underlying infrastructure over the coming months, and we may introduce additional rate limits if Whisper latency remains higher than desired.

Many of our users rely on Deepgram to serve voice-to-text solutions to their own customer base. When Whisper latency is high (and requests take longer than normal to complete), those customers can see a poorer user experience because the audio files they upload seem to "hang" for an unknown amount of time. If necessary, additional rate limits will improve the end customer's experience by providing a near-instantaneous message saying, "this API request cannot be fulfilled in a reasonable amount of time", which allows products that rely on Deepgram to handle this scenario in a way that meets their customers' expectations.

If you are using or plan to use thousands of dollars of Whisper requests per month, please keep an eye out for additional rate limits; you may also want to reach out to our sales team to continue the conversation.
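In the meantime, if your product needs to surface that scenario to end users instead of letting uploads appear to hang, here is a minimal client-side sketch, assuming the raw HTTP pre-recorded transcription endpoint and a 429-style response for requests that cannot be fulfilled in time (the exact status code, error body, and `whisper-large` model name are assumptions here, not a documented contract):

```python
import time
import requests

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"  # pre-recorded transcription endpoint
API_KEY = "YOUR_DEEPGRAM_API_KEY"  # placeholder


def transcribe_with_backoff(audio_path, max_retries=3, request_timeout=120):
    """Send an audio file to the Whisper model and back off if the service
    signals that the request cannot be fulfilled in a reasonable time.
    """
    headers = {
        "Authorization": f"Token {API_KEY}",
        "Content-Type": "audio/wav",
    }
    params = {"model": "whisper-large"}  # assumed model name; adjust to your account

    with open(audio_path, "rb") as f:
        audio = f.read()

    for attempt in range(1, max_retries + 1):
        try:
            resp = requests.post(
                DEEPGRAM_URL,
                headers=headers,
                params=params,
                data=audio,
                timeout=request_timeout,  # fail fast instead of "hanging" indefinitely
            )
        except requests.Timeout:
            print(f"Attempt {attempt}: request timed out after {request_timeout}s")
            continue

        if resp.status_code == 200:
            return resp.json()

        # Assumed: a 429 (or similar) means the request cannot be fulfilled
        # in a reasonable amount of time; retry with exponential backoff and
        # surface the delay to the end user rather than blocking silently.
        if resp.status_code == 429:
            wait = 2 ** attempt
            print(f"Attempt {attempt}: rate limited, retrying in {wait}s")
            time.sleep(wait)
            continue

        resp.raise_for_status()

    raise RuntimeError("Transcription could not be completed within the retry budget")
```

Failing fast with an explicit timeout and a bounded retry budget keeps the end user's upload from appearing to hang for an unknown amount of time, which is the scenario described above.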
-
Whisper is ridiculously slow: ~45 seconds for a 1-minute audio file, whereas OpenAI processes it in ~6 seconds.