A method to retrieve finished transcripts via request_ids after callbacks? #162
-
I am currently on a serverless setup and trying to process multiple hours of transcripts. The problem is that with callbacks I sometimes run into 413 errors (payload too large) and can't retrieve my transcripts from Deepgram. I would love a method that takes the request_id Deepgram provides and lets me query Deepgram for past transcripts. I could then use the callback just to alert my app that the transcript is ready, without running into cases where the data posted to the callback URL is too large and errors out. Is there something like this already in place? Or will it be enabled? Thanks in advance.
-
At present, deepgram doesn't store any transcripts locally. You have a few options:
-
Thanks for responding. That last option could work for me (I’m on AWS). Do you remember where relevant threads on this might be?
On Thu, May 25, 2023 at 2:06 PM Cliff Dyer wrote:
At present, Deepgram doesn't store any transcripts locally. You have a few options:
1. Ensure that your callback receivers can accept POST bodies large enough for the transcripts of the audio you expect to send.
2. Split your audio into chunks that you expect to produce transcripts small enough to fit your callbacks.
3. I don't remember the details on this, but I've heard about customers configuring their callbacks to PUT the transcript to an S3 bucket. With that sort of setup, you can listen for changes on that S3 bucket to kick off other parts of your serverless workflow. That of course assumes you are on AWS. If you're using a different provider, there may be a similar flow that works with your provider's object/blob storage service.
-
OK, thanks very much for the detailed walkthrough and example code. I'll need to try this out. It does add some complexity to my setup though, and I can't imagine I'm the only one with a serverless architecture who will bump into this moving forward. I hope the Deepgram team offers a way to retrieve past transcripts in the future. If nothing else, it's another feature to charge for. :)
@jaxomlotus:
This works as a proof-of-concept Lambda:
Whenever an audio file is uploaded to AUDIO_BUCKET, a Deepgram request is made with a presigned URL pointing to that object, and a PUT callback pointing to a matching filename in TRANSCRIPT_BUCKET.
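The code originally posted with this reply is not preserved here, so the following is only a rough sketch of the flow described above, not the original example. It assumes a Python Lambda triggered by S3 upload events on AUDIO_BUCKET, Deepgram's pre-recorded endpoint with the `callback` and `callback_method=put` query parameters, and two environment variables whose names are made up for illustration (`TRANSCRIPT_BUCKET`, `DEEPGRAM_API_KEY`). The `transcript_key` helper and the `.json` naming are likewise assumptions.

```python
import json
import os
import urllib.parse
import urllib.request

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def transcript_key(audio_key):
    """Map an audio object key to a matching transcript key (a/b.mp3 -> a/b.json)."""
    base, _ext = os.path.splitext(audio_key)
    return base + ".json"


def build_deepgram_request(audio_url, callback_url, api_key):
    """Build (but do not send) a Deepgram request for remotely hosted audio."""
    query = urllib.parse.urlencode({
        "callback": callback_url,   # where Deepgram delivers the transcript
        "callback_method": "put",   # ask for PUT so a presigned S3 URL works
    })
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{DEEPGRAM_URL}?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


def handler(event, context, s3=None, send=urllib.request.urlopen):
    """S3-triggered entry point; `s3` and `send` are injectable for testing."""
    if s3 is None:
        import boto3  # provided by the AWS Lambda Python runtime
        s3 = boto3.client("s3")
    transcript_bucket = os.environ["TRANSCRIPT_BUCKET"]  # assumed variable name
    api_key = os.environ["DEEPGRAM_API_KEY"]             # assumed variable name

    request_ids = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Presigned GET: lets Deepgram download the audio without AWS credentials.
        audio_url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=3600,
        )
        # Presigned PUT: lets Deepgram write the transcript into the other bucket.
        callback_url = s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": transcript_bucket, "Key": transcript_key(key)},
            ExpiresIn=3600,
        )
        request = build_deepgram_request(audio_url, callback_url, api_key)
        with send(request) as response:
            request_ids.append(json.load(response).get("request_id"))
    return {"request_ids": request_ids}
```

With something like this in place, a second S3 notification on TRANSCRIPT_BUCKET can kick off the rest of the workflow, so the transcript body never has to pass through your own callback endpoint. The one-hour expiry is a guess; it needs to outlast however long the transcription takes.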