Error 408 when making async calls to Speech-to-Text API. (Python) #160
-
Hey guys, somewhat new to Python 3 here, and part of my internship is developing a pipeline to transcribe audio news.
When running the following code (example snippet), I get a 408 error:

import requests
import asyncio
from deepgram import Deepgram

DEEPGRAM_API_KEY = ""  # your Deepgram API key
MIMETYPE = 'audio/mpeg'
audio_location = "./news_test.mp3"
audio = open(audio_location, 'rb')
source = {
    'buffer': audio,
    'mimetype': MIMETYPE
}
deepgram = Deepgram(DEEPGRAM_API_KEY)

async def deep_gram(x):
    print(x)
    if x == 1:
        audio_location = "./talking.mp3"
        audio = open(audio_location, 'rb')
        source = {
            'buffer': audio,
            'mimetype': MIMETYPE
        }
    else:
        audio_location = "./news_test.mp3"
        audio = open(audio_location, 'rb')
        source = {
            'buffer': audio,
            'mimetype': MIMETYPE
        }
    print(audio)
    response = await deepgram.transcription.prerecorded(source,
                                                        {'punctuate': True})
    breakpoint()
    return response

async def main():
    output = [deep_gram(x) for x in range(2)]
    thing = await asyncio.gather(*output)
    return thing

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()

I'm not an expert at async, but I want my pipeline to be able to queue up to 50 requests to the API at a time. I would appreciate any advice.
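One common way to keep at most 50 requests in flight is an `asyncio.Semaphore`. The sketch below is a minimal illustration under stated assumptions, not Deepgram-specific code: `fake_transcribe`, `transcribe_all`, and `MAX_CONCURRENT` are hypothetical names, and `fake_transcribe` only stands in for the real `deepgram.transcription.prerecorded` call.

```python
import asyncio

MAX_CONCURRENT = 50  # the cap mentioned in the question; adjust as needed


async def fake_transcribe(path: str) -> str:
    """Hypothetical stand-in for the real SDK call; it only sleeps briefly."""
    await asyncio.sleep(0.01)
    return f"transcript of {path}"


async def transcribe_all(paths, limit=MAX_CONCURRENT, worker=fake_transcribe):
    """Run `worker` on every path with at most `limit` calls active at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(path):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(path)

    # return_exceptions=True puts failures into the result list
    # instead of raising the first one out of gather().
    return await asyncio.gather(
        *(bounded(p) for p in paths), return_exceptions=True
    )


if __name__ == "__main__":
    results = asyncio.run(transcribe_all([f"clip-{i}.mp3" for i in range(120)]))
    print(len(results))
```

To use it with the SDK, `worker` would be replaced by a coroutine that opens the file and awaits the real transcription call. Results come back in the same order as `paths`.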
Replies: 5 comments 6 replies
-
Hey @noahOvertone, hope you're enjoying the internship so far! I wasn't able to reproduce the timeout issue, so I have a few questions that will help dig into what's going on.

How are you running the Python script? For example, is it running on your own computer and you're running it in a terminal using [...]

I ran the script at the end of this comment on my computer and was able to get transcripts for all 10 files in a few seconds, and I hope you're able to do the same.

I also tried running the same script on an audio file (pulled from a news broadcast from this morning) that is over an hour long. The script eventually returned a response, but it took approximately one minute. So I'm guessing something is causing your request to time out after X seconds. Are you able to use a stopwatch to figure out how long it takes before the request times out? That might help narrow down where the timeout is coming from.

from deepgram import Deepgram  # pip install deepgram-sdk
import asyncio
import os

# Initialize the Deepgram SDK.
DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]  # Your Deepgram API Key
deepgram = Deepgram(DEEPGRAM_API_KEY)

async def deep_gram(audio_location: str):
    MIMETYPE = "audio/mpeg"
    # Keep the file open for the duration of the request, then close it.
    with open(audio_location, "rb") as audio:
        source = {"buffer": audio, "mimetype": MIMETYPE}
        response = await deepgram.transcription.prerecorded(source, {"punctuate": True})
    return response

async def main():
    filenames = [
        "./test-audio-files/test_news-short-1.mp3",
        "./test-audio-files/test_news-short-2.mp3",
        "./test-audio-files/test_news-short-3.mp3",
        "./test-audio-files/test_news-short-4.mp3",
        "./test-audio-files/test_news-short-5.mp3",
        "./test-audio-files/test_news-short-6.mp3",
        "./test-audio-files/test_news-short-7.mp3",
        "./test-audio-files/test_news-short-8.mp3",
        "./test-audio-files/test_news-short-9.mp3",
        "./test-audio-files/test_news-short-10.mp3",
    ]
    # Start all requests concurrently and wait for every result.
    output = [deep_gram(fn) for fn in filenames]
    results = await asyncio.gather(*output)
    for result in results:
        print(result)
        print()
    return results

if __name__ == "__main__":
    asyncio.run(main())

Here are the 20-second long test files I used: test_news-short-5.zip
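As an alternative to a physical stopwatch, the request can be timed in code. This is a minimal sketch; `timed` is a hypothetical helper name, and the demo awaits `asyncio.sleep` rather than a real Deepgram request.

```python
import asyncio
import time


async def timed(label, awaitable):
    """Await `awaitable` and return (result_or_exception, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        outcome = await awaitable
    except Exception as exc:
        # Keep the exception so the time-to-failure is still reported.
        outcome = exc
    elapsed = time.perf_counter() - start
    print(f"{label}: {type(outcome).__name__} after {elapsed:.1f}s")
    return outcome, elapsed


if __name__ == "__main__":
    # Demo with a short sleep standing in for the network call.
    asyncio.run(timed("demo", asyncio.sleep(0.05, result="ok")))
```

Inside `deep_gram`, the call would look something like `response, seconds = await timed(audio_location, deepgram.transcription.prerecorded(source, {"punctuate": True}))`. If the failures always arrive after roughly the same number of seconds, that points to a fixed timeout somewhere in the chain.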
-
Thanks @jjmaldonis. I've attached another picture of the error message that I'm getting, and I suspect you are right about the timeout from asyncio raising an error, as the operation takes some time before it fails. I will look into both suggestions: increasing the timeout and the upload bandwidth (cloud VM).
-
Out of curiosity, do you have any recommendations for increasing the aiohttp timeout for Deepgram?

session_timeout = aiohttp.ClientTimeout(total=None,
                                        sock_connect=1000,
                                        sock_read=10000)
client_args = dict(trust_env=True, timeout=session_timeout)
async with aiohttp.ClientSession(**client_args) as session:
    async with session.get(url) as response:
        data = await response.text()

I noticed that we do not interact with Deepgram's endpoint directly and instead call it via a built-in function, so there is no opportunity to use the session.get() method:

deepgram.transcription.prerecorded(source, {'punctuate': True})

Is there a method you would recommend to increase the async timeout, short of deconstructing how the Deepgram SDK works to use this approach?
-
I tracked down where the HTTPS call was being made in the Deepgram SDK. Here is the relevant line in the code: https://github.com/deepgram/deepgram-python-sdk/blob/main/deepgram/_utils.py#L92

You'll notice it's using [...]

The Deepgram SDK currently does not allow the user to override the default timeout. I will submit a request to change that. In the meantime, I'll show you how to override the default [...]

Please see below for the relevant code. To test the code, I recommend changing the [...]

import functools
import aiohttp

# Create a monkey patch to update the `aiohttp.request` default timeout. This must be run at the very beginning of the code.
DEFAULT_AIOHTTP_TIMEOUT = aiohttp.ClientTimeout(
    total=60 * 10,  # 10 minutes
    connect=None,
    sock_read=None,
    sock_connect=None,
)
aiohttp.request = functools.partial(aiohttp.request, timeout=DEFAULT_AIOHTTP_TIMEOUT)

from deepgram import Deepgram  # pip install deepgram-sdk
import asyncio
import time
import os

# Initialize the Deepgram SDK.
DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]  # Your Deepgram API Key
deepgram = Deepgram(DEEPGRAM_API_KEY)

...  # The rest of your code
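To see why this kind of patch works, here is a small stdlib-only sketch of the same `functools.partial` technique. Everything in it is a hypothetical stand-in: `fake_http`, `_request`, and `sdk_call` are illustration names, and `_request` does not reproduce aiohttp's real behavior or defaults.

```python
import functools
import types

# Hypothetical stand-in for a library module with a `request` function.
fake_http = types.SimpleNamespace()


def _request(url, timeout=300):
    """Pretend HTTP call that just reports which timeout it received."""
    return f"GET {url} (timeout={timeout}s)"


fake_http.request = _request


def sdk_call(url):
    # Looks up `fake_http.request` at call time, so it sees later patches.
    return fake_http.request(url)


before = sdk_call("https://example.invalid/listen")

# The patch: the same function, with a new default bound for `timeout`.
fake_http.request = functools.partial(fake_http.request, timeout=600)

after = sdk_call("https://example.invalid/listen")
# A caller that passes `timeout` explicitly still overrides the bound value.
explicit = fake_http.request("https://example.invalid/listen", timeout=5)

print(before)
print(after)
print(explicit)
```

The patch only reaches code that looks the function up on the module when it makes the call. Code that ran `from module import request` before the patch keeps its reference to the original function, which is a good reason to apply a patch like this before importing anything that uses it.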
-
@noahOvertone I have the same question about how to extend the default aiohttp timeout without messing too much with the Deepgram object. @jjmaldonis, your RCA is much appreciated, and that mock timeout instance is great, thanks!
See deepgram/deepgram-python-sdk#89 for context on the timeouts PR.