Error 408 when making async calls to Speech-to-Text API. (Python) #160

noahOvertone · 2023-05-22T14:21:02Z

Hey guys, I'm somewhat new to Python 3, and part of my internship is developing a pipeline to transcribe audio news.
I am getting the following error:

Traceback (most recent call last):
  File "C:\Users\ACER\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\deepgram\_utils.py", line 92, in attempt
    async with aiohttp.request(
  File "C:\Users\ACER\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\aiohttp\client.py", line 1189, in __aenter__
    self._resp = await self._coro
  File "C:\Users\ACER\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\aiohttp\client.py", line 643, in _request
    resp.raise_for_status()
  File "C:\Users\ACER\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\aiohttp\client_reqrep.py", line 1005, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 408, message='Request Timeout', url=URL('https://api.deepgram.com/v1/listen?punctuate=true')

When running the following code (example snippet):

import requests
import asyncio
from deepgram import Deepgram
DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"  # redacted
MIMETYPE = 'audio/mpeg'
audio_location = "./news_test.mp3"
audio = open(audio_location,'rb')
source = {
    'buffer':audio,
    'mimetype' :MIMETYPE
}

deepgram = Deepgram(DEEPGRAM_API_KEY)
async def deep_gram(x):
    print(x)

    if x == 1:
        audio_location = "./talking.mp3"
        audio = open(audio_location,'rb')
        source = {
            'buffer':audio,
            'mimetype' :MIMETYPE
        }
    else: 
        audio_location = "./news_test.mp3"
        audio = open(audio_location,'rb')
        source = {
            'buffer':audio,
            'mimetype' :MIMETYPE
        }
    print(audio)
    response = await deepgram.transcription.prerecorded(source, {'punctuate': True})
    breakpoint()
    return response

async def main():

    output = [deep_gram(x) for x in range(2)]
    thing = await asyncio.gather(*output)
    return thing

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()

I'm not an expert at async, but I want my pipeline to be able to queue up to 50 requests to the API at a time.
I've tried many variations of async, including asyncio.run(main()), but I suspect that running the same function repeatedly in a loop may be causing the issue.

I would appreciate any advice.
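For reference, one common way to express "up to 50 requests at a time" in asyncio is an asyncio.Semaphore wrapped around each call, with asyncio.gather running everything. This is a minimal sketch under stated assumptions: fake_transcribe is a hypothetical stand-in for deepgram.transcription.prerecorded(...), and the in-flight counters exist only to show that the cap holds. The limit of 50 is the number from the question; with large uploads a lower cap may be needed.

```python
import asyncio

MAX_CONCURRENT_REQUESTS = 50  # "up to 50 requests to the API at a time"

# Bookkeeping so we can check how many calls were in flight at once.
in_flight = 0
peak_in_flight = 0


async def fake_transcribe(item):
    """Hypothetical stand-in for deepgram.transcription.prerecorded(...)."""
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    try:
        await asyncio.sleep(0.01)  # pretend to upload and wait for the API
        return f"transcript-{item}"
    finally:
        in_flight -= 1


async def bounded_transcribe(sem, item):
    # Wait here until one of the MAX_CONCURRENT_REQUESTS slots is free.
    async with sem:
        return await fake_transcribe(item)


async def transcribe_all(items):
    # Create the semaphore inside the running event loop.
    sem = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
    # gather() returns results in the same order as `items`.
    return await asyncio.gather(*(bounded_transcribe(sem, i) for i in items))


if __name__ == "__main__":
    results = asyncio.run(transcribe_all(range(120)))
    print(len(results), peak_in_flight)
```

All 120 coroutines are created up front, but only 50 at a time get past the `async with sem:` line; the rest wait without blocking the event loop.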

Answered by jjmaldonis

May 24, 2023


jjmaldonis · 2023-05-22T17:30:25Z

Maintainer

Hey @noahOvertone, hope you're enjoying the internship so far! I wasn't able to reproduce the timeout issue so I have a few questions that will help dig into what's going on.

How are you running the Python script? For example, is it running on your own computer, launched from a terminal with python my-filename.py? Or is it being run on a server by something like crontab?

I ran the script at the end of this comment on my computer and was able to get transcripts for all 10 files in a few seconds, and I hope you're able to do the same.

I also tried running the following script on an audio file (pulled from a news broadcast this morning) that is over an hour long. The script eventually returned a response, but it took approximately one minute. So I'm guessing something is causing your request to time out after X seconds. Are you able to use a stopwatch to figure out how long it takes before the request times out? That might help narrow down where the timeout is coming from.

from deepgram import Deepgram  # pip install deepgram-sdk
import asyncio
import os

# Initialize the Deepgram SDK.
DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]  # Your Deepgram API Key
deepgram = Deepgram(DEEPGRAM_API_KEY)


async def deep_gram(audio_location: str):
    MIMETYPE = "audio/mpeg"
    # Use a context manager so the file is closed once the request finishes.
    with open(audio_location, "rb") as audio:
        source = {"buffer": audio, "mimetype": MIMETYPE}
        response = await deepgram.transcription.prerecorded(source, {"punctuate": True})
    return response


async def main():
    filenames = [
        "./test-audio-files/test_news-short-1.mp3",
        "./test-audio-files/test_news-short-2.mp3",
        "./test-audio-files/test_news-short-3.mp3",
        "./test-audio-files/test_news-short-4.mp3",
        "./test-audio-files/test_news-short-5.mp3",
        "./test-audio-files/test_news-short-6.mp3",
        "./test-audio-files/test_news-short-7.mp3",
        "./test-audio-files/test_news-short-8.mp3",
        "./test-audio-files/test_news-short-9.mp3",
        "./test-audio-files/test_news-short-10.mp3",
    ]
    output = [deep_gram(fn) for fn in filenames]
    results = await asyncio.gather(*output)
    for result in results:
        print(result)
        print()
    return results


if __name__ == "__main__":
    asyncio.run(main())

Here are the 20-second-long test files I used: test_news-short-5.zip
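Instead of a stopwatch, the elapsed time can be measured in code, for both successful and failing requests. A minimal sketch, assuming you wrap each transcription call yourself: timed is a hypothetical helper (not part of the Deepgram SDK), and slow_failure is a hypothetical stand-in for a request that eventually times out.

```python
import asyncio
import time


async def timed(coro, label, durations=None):
    """Await `coro` and report how long it took, whether it succeeds or raises."""
    start = time.perf_counter()
    try:
        return await coro
    finally:
        elapsed = time.perf_counter() - start
        if durations is not None:
            durations[label] = elapsed  # keep the number for later inspection
        print(f"{label}: {elapsed:.2f} s")


async def slow_failure():
    # Hypothetical stand-in for a request that eventually times out.
    await asyncio.sleep(0.05)
    raise asyncio.TimeoutError


if __name__ == "__main__":
    durations = {}
    try:
        asyncio.run(timed(slow_failure(), "news_test.mp3", durations))
    except asyncio.TimeoutError:
        print("timed out after", round(durations["news_test.mp3"], 2), "s")
```

In the real script you would wrap the SDK call, e.g. `await timed(deepgram.transcription.prerecorded(source, {"punctuate": True}), audio_location)`, and check whether the failures cluster around a fixed number of seconds.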

4 replies

noahOvertone May 23, 2023
Author

Thanks for such a detailed response, @jjmaldonis.

I am running the script locally on my computer via "python3 file_name.py".

Basically, I'm scraping audio news from an RSS feed, and while one clip is downloading/uploading to the Deepgram API, I want to "await" it and begin scraping and transforming the next audio clip.

Our implementations look basically the same (the actual code is very similar). To give an example, the file structure is:

main.py: asyncio.run(scrape_Rss())
-->
async def scrape_Rss():
- Grab all episodes from RSS, as a list
- output = [call_episode(episode) for episode in episodes]
- await asyncio.gather(*output)
async def call_episode(episode):
- await download_episode
- await call_deepgram(episode_location) #same as your deep_gram function.
- save_data()
- delete episode
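The outline above can be sketched as runnable asyncio code. Everything here is an assumption for illustration: download_episode, call_deepgram, and save_data are hypothetical stand-ins that only sleep or append to a list, not real RSS or Deepgram calls. The key points are that scrape_rss must itself be `async def`, and that you build a list of coroutine objects and pass it to asyncio.gather rather than awaiting the list directly (awaiting a plain list raises TypeError).

```python
import asyncio


async def download_episode(episode):
    # Hypothetical: fetch the episode audio and return a local file path.
    await asyncio.sleep(0.01)
    return f"./{episode}.mp3"


async def call_deepgram(episode_location):
    # Hypothetical stand-in for the deep_gram() coroutine.
    await asyncio.sleep(0.01)
    return {"file": episode_location, "transcript": "..."}


def save_data(result, store):
    # Hypothetical: persist the transcript (here, just append to a list).
    store.append(result)


async def call_episode(episode, store):
    episode_location = await download_episode(episode)
    result = await call_deepgram(episode_location)
    save_data(result, store)
    # A real pipeline would delete the local file here, e.g. with os.remove().
    return result


async def scrape_rss(episodes):
    store = []
    # Build a list of coroutine objects; do NOT `await` the list itself.
    output = [call_episode(episode, store) for episode in episodes]
    # gather() runs them concurrently and returns results in input order.
    results = await asyncio.gather(*output)
    return results, store


if __name__ == "__main__":
    results, store = asyncio.run(scrape_rss(["ep1", "ep2", "ep3"]))
    print([r["file"] for r in results])
```

`store` is filled in completion order, while the list returned by gather keeps the original episode order.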

So unless you can see me doing something wrong with asyncio here, it seems like we are doing the same thing.

Do you think it might be a hardware/network issue with my local Wi-Fi?

I noticed that when I tried this scrape-and-upload operation with one extremely short voice clip (less than 5 seconds long) and one audio news clip (30 minutes), the 5-second clip returned a response, but the news clip returned an error.

Do you have a suggestion?

I will say I know Deepgram can handle large audio clips, as previously I had this pipeline running synchronously (one at a time), and it ran perfectly, albeit a touch slow.

Once again thank you very much, I appreciate your response.

jjmaldonis May 23, 2023
Maintainer

Hey Noah, I think I figured it out. aiohttp has a default timeout of 300 seconds (5 minutes). I ran the code at the end of this comment and got the following output + TimeoutError:

duration = 300.9847962856293
Traceback (most recent call last):
  File "C:\Users\jjmal\Documents\deepgram\projects\quick-and-dirty\debug.py", line 45, in <module>
    raise e
  File "C:\Users\jjmal\Documents\deepgram\projects\quick-and-dirty\debug.py", line 40, in <module>
    asyncio.run(main())
  File "C:\Users\jjmal\miniconda3\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Users\jjmal\miniconda3\lib\asyncio\base_events.py", line 647, in run_until_complete
    return future.result()
  File "C:\Users\jjmal\Documents\deepgram\projects\quick-and-dirty\debug.py", line 30, in main
    results = await asyncio.gather(*output)
  File "C:\Users\jjmal\Documents\deepgram\projects\quick-and-dirty\debug.py", line 17, in deep_gram
    response = await deepgram.transcription.prerecorded(source, {"punctuate": True})
  File "C:\Users\jjmal\miniconda3\lib\site-packages\deepgram\transcription.py", line 341, in prerecorded
    return await PrerecordedTranscription(
  File "C:\Users\jjmal\miniconda3\lib\site-packages\deepgram\transcription.py", line 59, in __call__
    return await _request(
  File "C:\Users\jjmal\miniconda3\lib\site-packages\deepgram\_utils.py", line 111, in _request
    return await attempt()
  File "C:\Users\jjmal\miniconda3\lib\site-packages\deepgram\_utils.py", line 92, in attempt
    async with aiohttp.request(
  File "C:\Users\jjmal\miniconda3\lib\site-packages\aiohttp\client.py", line 1189, in __aenter__
    self._resp = await self._coro
  File "C:\Users\jjmal\miniconda3\lib\site-packages\aiohttp\client.py", line 560, in _request
    await resp.start(conn)
  File "C:\Users\jjmal\miniconda3\lib\site-packages\aiohttp\client_reqrep.py", line 914, in start
    self._continue = None
  File "C:\Users\jjmal\miniconda3\lib\site-packages\aiohttp\helpers.py", line 721, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError

Importantly, you'll notice that the output contains duration = 300.9847962856293, which is very close to aiohttp's 300-second timeout (the extra ~0.98 seconds is likely overhead outside the request itself, such as event-loop setup and scheduling). That's why I think this timeout comes from aiohttp.

Can you confirm that you're seeing the same 300 second / 5 minute timeout? Please see the code below for an example of how to calculate this number.

Unfortunately, GitHub's maximum upload size is 25 MB, and the long audio file I am using is 167 MB even after compressing it, so I can't upload it here for you to test. How large is your audio file? And what is its duration?

Additionally, I reran the same code with only one "./test-audio-files/test_news-long.mp3" entry in the filenames list. In this case, with only one filename, the API request succeeds. So I think this is exactly what you're seeing.

I don't have a solution yet other than to increase the aiohttp / asyncio timeout to something greater than 5 minutes. I'll try to figure out why the processing would take longer for multiple files rather than one since the requests should all be independent.

from deepgram import Deepgram  # pip install deepgram-sdk
import asyncio
import time
import os

# Initialize the Deepgram SDK.
DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]  # Your Deepgram API Key
deepgram = Deepgram(DEEPGRAM_API_KEY)


async def deep_gram(audio_location: str):
    MIMETYPE = "audio/mpeg"
    # Use a context manager so the file is closed once the request finishes.
    with open(audio_location, "rb") as audio:
        source = {"buffer": audio, "mimetype": MIMETYPE}
        response = await deepgram.transcription.prerecorded(source, {"punctuate": True})
    return response


async def main():
    filenames = [
        "./test-audio-files/test_news-long.mp3",
        "./test-audio-files/test_news-long.mp3",
        "./test-audio-files/test_news-long.mp3",
        "./test-audio-files/test_news-long.mp3",
        "./test-audio-files/test_news-long.mp3",
    ]
    output = [deep_gram(fn) for fn in filenames]
    results = await asyncio.gather(*output)
    for result in results:
        print(result)
        print()
    return results


if __name__ == "__main__":
    try:
        start = time.time()
        asyncio.run(main())
    except Exception as e:
        end = time.time()
        duration = end - start
        print(f"duration = {duration}")
        raise e

jjmaldonis May 23, 2023
Maintainer

P.S. If you want to include a codeblock in GitHub, you can use markdown format. The markdown format for codeblocks is to surround the code by three backticks. Optionally, you can include the programming language after the top three backticks to enable syntax highlighting.

print("Here is an example")

jjmaldonis May 23, 2023
Maintainer

We have an idea about why the timeout may occur with 5 requests but not with 1: uploading 5 large files at once takes much longer than uploading one, because the uploads share the same upload bandwidth. That extra upload time is likely what pushes the requests past the timeout.

Unless you are able to increase your upload bandwidth, it's likely that the best solution is to increase the default timeout of aiohttp / asyncio to compensate for the additional upload time.
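The upload-time argument can be made concrete with back-of-the-envelope arithmetic. This sketch uses hypothetical numbers (150 MB files and a 20 Mbit/s uplink; neither was measured in this thread), assumes simultaneous uploads split the uplink evenly, and ignores protocol overhead and server processing time. aiohttp's total timeout covers the whole operation, including sending the request body, so upload time alone counts against the 300-second limit.

```python
def upload_seconds(file_mb, n_files, uplink_mbps):
    """Rough time for n_files simultaneous uploads of file_mb megabytes each
    to finish over a shared uplink of uplink_mbps megabits per second.

    Assumes all uploads start together and share the link evenly, so each
    one finishes at about the same time. Ignores protocol overhead.
    """
    total_megabits = n_files * file_mb * 8  # 8 bits per byte
    return total_megabits / uplink_mbps


AIOHTTP_DEFAULT_TOTAL_TIMEOUT = 300  # seconds, per the discussion above

if __name__ == "__main__":
    # Hypothetical numbers: 150 MB files over a 20 Mbit/s uplink.
    for n in (1, 5):
        t = upload_seconds(150, n, 20)
        print(f"{n} file(s): ~{t:.0f} s upload, "
              f"reaches limit: {t >= AIOHTTP_DEFAULT_TOTAL_TIMEOUT}")
```

With these assumed numbers one file uploads in about 60 s, while five at once take about 300 s each, before any transcription time is added.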

noahOvertone · 2023-05-23T20:49:24Z

Author

Thanks @jjmaldonis.

I've attached another picture of the error message I'm getting. I suspect you are right that the asyncio timeout is raising the error, since the operation runs for some time before it fails. I will look into both suggestions: increasing the timeout and increasing upload bandwidth (cloud VM).

1 reply

jjmaldonis May 24, 2023
Maintainer

Hmm, this is not exactly the TimeoutError I am getting. Hopefully it's the same one under the hood. If the code I posted below to override the default timeout doesn't work, I may need your audio file and exact code to reproduce the error.

noahOvertone · 2023-05-23T21:46:57Z

Author

Out of curiosity, do you have any recommendations for increasing the aiohttp timeout used by Deepgram?
The examples online suggest creating an aiohttp.ClientSession() with the timeout parameters specified, then making the request to the API with session.get(url).
@jjmaldonis

import aiohttp

async def fetch(url):
    session_timeout = aiohttp.ClientTimeout(total=None,
                                            sock_connect=1000,
                                            sock_read=10000)
    client_args = dict(trust_env=True, timeout=session_timeout)
    async with aiohttp.ClientSession(**client_args) as session:
        async with session.get(url) as response:
            data = await response.text()
    return data

I noticed that we don't interact with Deepgram's endpoint directly; instead, we call it through a built-in SDK function, so there is no opportunity to use the session.get() method.

deepgram.transcription.prerecorded(source,  {'punctuate': True})

Is there a method you would recommend for increasing the async timeout, short of deconstructing how the Deepgram SDK works in order to use this approach?

0 replies

jjmaldonis · 2023-05-24T14:03:26Z

Maintainer

I tracked down where the HTTPS call was being made in the Deepgram SDK. Here is the relevant line in the code: https://github.com/deepgram/deepgram-python-sdk/blob/main/deepgram/_utils.py#L92

You'll notice it uses aiohttp.request. Here is the relevant documentation: it accepts a timeout keyword argument, which defaults to 300 seconds (5 minutes).

The Deepgram SDK currently does not allow the user to override the default timeout. I will submit a request to change that.

In the meantime, I'll show you how to override the default timeout parameter value for aiohttp.request. It's a bit ugly, so don't tell anyone at your company I recommended it ;) The technique is called "monkey patching", and one of its major downsides is that it changes the default timeout for all aiohttp.request calls, not just the Deepgram API call. Still, it should work as a temporary solution until the Deepgram SDK is updated.

Please see below for the relevant code. To test the code, I recommend changing the total default timeout to e.g. 1 second (rather than 5 minutes) because that will result in a TimeoutError after 1 second and you can see that the code is working.

import functools
import aiohttp

# Create a monkey patch to update the `aiohttp.request` default timeout. This must be run at the very beginning of the code.
DEFAULT_AIOHTTP_TIMEOUT = aiohttp.ClientTimeout(
    total=60 * 10,  # 10 minutes
    connect=None,
    sock_read=None,
    sock_connect=None,
)
aiohttp.request = functools.partial(aiohttp.request, timeout=DEFAULT_AIOHTTP_TIMEOUT)

from deepgram import Deepgram  # pip install deepgram-sdk
import asyncio
import time
import os

# Initialize the Deepgram SDK.
DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]  # Your Deepgram API Key
deepgram = Deepgram(DEEPGRAM_API_KEY)

... # The rest of your code
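To see why the functools.partial patch works, and why it is global, here is a stdlib-only toy. It does not touch aiohttp: fake_aiohttp and its request function are hypothetical stand-ins with a 300-second default. It shows that code looking up `request` on the module at call time (as the `aiohttp.request(` call in deepgram/_utils.py does) picks up the new default, that an explicitly passed timeout still wins over the partial's value, and that a reference copied before the patch keeps the old default, which is one reason to apply the patch as early as possible.

```python
import functools
import types

# Hypothetical stand-in for the aiohttp module and its request() function,
# used only to show what the functools.partial patch does.
fake_aiohttp = types.SimpleNamespace()


def request(method, url, *, timeout=300):
    # Return the arguments so we can see which timeout was used.
    return method, url, timeout


fake_aiohttp.request = request

# Code that copied the function *before* the patch keeps the old default.
early_copy = fake_aiohttp.request


def sdk_call():
    # Look up `request` on the module at call time, like `aiohttp.request(...)`.
    return fake_aiohttp.request("POST", "https://example.invalid/v1/listen")


before = sdk_call()
# The monkey patch: every later lookup gets a new default timeout.
fake_aiohttp.request = functools.partial(fake_aiohttp.request, timeout=600)
after = sdk_call()
# A caller that passes timeout explicitly still wins over the partial's value.
explicit = fake_aiohttp.request("GET", "https://example.invalid", timeout=5)
stale = early_copy("GET", "https://example.invalid")

print(before[2], after[2], explicit[2], stale[2])  # 300 600 5 300
```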

1 reply

jjmaldonis May 24, 2023
Maintainer

See deepgram/deepgram-python-sdk#89 for context on the timeouts PR.

Answer selected by jpvajda

aguthrie19 · 2023-06-05T23:36:41Z

@noahOvertone I have the same question about how to extend the default aiohttp timeout without messing too much with the deepgram object. @jjmaldonis, your RCA is much appreciated, and that mock timeout instance is great, thanks!

0 replies
