Timecode/Timestamp big chunks. #133
-
Hi folks - wondering what would be my best bet to get timestamps in larger chunks (say, every minute or so) as opposed to word by word? I'm looking to take longer video files (say, interviews) and use the timestamps as timecode for video editing, similar to the approach Otter takes with their transcriptions (larger chunks). With thanks in advance!
Replies: 1 comment
-
Have you seen the utterances feature? It doesn't break the transcript out every minute, but it breaks out transcripts and times on meaningful chunks of spoken words (think sentences, pauses, turn-taking). Check out more here: https://developers.deepgram.com/documentation/features/utterances/

If that built-in feature doesn't quite solve it, you could write a simple script to group the transcript into approximately minute-long chunks. Here is an example in Python:

```python
import requests
import json

# Replace with your Deepgram API Key
api_key = "YOUR_DEEPGRAM_API_KEY"

# Endpoint for Deepgram's transcription service
url = "https://api.deepgram.com/v1/listen"

# Headers for the request
headers = {
    "Authorization": "Token " + api_key
}

# Path to your local audio file
audio_file_path = "/path/to/your/audio/file.wav"

# Open the audio file in read-bytes mode and send it to Deepgram
with open(audio_file_path, "rb") as audio_file:
    response = requests.post(url, headers=headers, data=audio_file)

# Parse the response from Deepgram
data = json.loads(response.text)

# Process the transcript to break it into roughly minute-long chunks
chunks = []
current_chunk = {"start": 0, "end": 0, "text": ""}

for word_info in data["results"]["channels"][0]["alternatives"][0]["words"]:
    if word_info["end"] - current_chunk["start"] > 60:
        # The current word ends more than a minute after the start of the
        # current chunk, so finalize that chunk and start a new one
        chunks.append(current_chunk)
        current_chunk = {"start": current_chunk["end"], "end": word_info["end"], "text": word_info["word"]}
    else:
        # Otherwise, add the current word to the current chunk
        current_chunk["text"] += " " + word_info["word"]
        current_chunk["end"] = word_info["end"]

# Add the last chunk if it's non-empty
if current_chunk["text"]:
    chunks.append(current_chunk)

# Print out the chunks as HH:MM:SS ranges
for chunk in chunks:
    start_min, start_sec = divmod(chunk["start"], 60)
    start_hour, start_min = divmod(start_min, 60)
    end_min, end_sec = divmod(chunk["end"], 60)
    end_hour, end_min = divmod(end_min, 60)
    print(f"{int(start_hour):02d}:{int(start_min):02d}:{int(start_sec):02d} - "
          f"{int(end_hour):02d}:{int(end_min):02d}:{int(end_sec):02d}: {chunk['text']}\n")
```

This will work if you only need approximately one-minute chunks. If you want precisely one-minute chunks, then you have to decide what to do with words that straddle the minute boundary (it will inevitably happen occasionally). If you want to assign each word to a specific minute based on when the word starts, you could do this:
```python
import requests
import json

# Replace with your Deepgram API Key
api_key = "YOUR_DEEPGRAM_API_KEY"

# Endpoint for Deepgram's transcription service
url = "https://api.deepgram.com/v1/listen"

# Headers for the request
headers = {
    "Authorization": "Token " + api_key
}

# Path to your local audio file
audio_file_path = "/path/to/your/audio/file.wav"

# Open the audio file in read-bytes mode and send it to Deepgram
with open(audio_file_path, "rb") as audio_file:
    response = requests.post(url, headers=headers, data=audio_file)

# Parse the response from Deepgram
data = json.loads(response.text)

# Process the transcript to break it into minute-long chunks
chunks = []
current_chunk = {"start": 0, "end": 0, "text": ""}

for word_info in data["results"]["channels"][0]["alternatives"][0]["words"]:
    if word_info["start"] < current_chunk["start"] + 60:
        # The current word starts within the current minute,
        # so add it to the current chunk
        current_chunk["text"] += " " + word_info["word"]
        current_chunk["end"] = word_info["end"]
    else:
        # The current word starts after the end of the current minute, so
        # finalize the current chunk and start a new one with this word
        chunks.append(current_chunk)
        next_start = current_chunk["start"] + 60
        # Skip ahead over any entirely silent minutes
        while word_info["start"] >= next_start + 60:
            next_start += 60
        current_chunk = {"start": next_start, "end": word_info["end"], "text": word_info["word"]}

# Add the last chunk if it's non-empty
if current_chunk["text"]:
    chunks.append(current_chunk)

# Print out the chunks as HH:MM:SS ranges
for chunk in chunks:
    start_min, start_sec = divmod(chunk["start"], 60)
    start_hour, start_min = divmod(start_min, 60)
    end_min, end_sec = divmod(chunk["end"], 60)
    end_hour, end_min = divmod(end_min, 60)
    print(f"{int(start_hour):02d}:{int(start_min):02d}:{int(start_sec):02d} - "
          f"{int(end_hour):02d}:{int(end_min):02d}:{int(end_sec):02d}: {chunk['text']}\n")
```

Here's some example output for this last one. Let's say that these are the words being said at these times: "Hello" from 00:00:00 to 00:00:02 The script would output the following:
In this example, you can see that the word "world", which starts before the 1-minute mark but ends after it, is included in the first chunk. The word "this", which starts exactly at the 1-minute mark, starts a new chunk. The word "Deepgram", which starts before the 3-minute mark but ends after it, is included in the chunk it started in (the third chunk/minute). I hope that helps!
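Whichever grouping you use, the start/end times can be rendered as timecodes the same way. Here is a minimal sketch of that formatting step as a standalone function, applied to utterance-style entries; the sample data below is hypothetical (a real response's utterances come back under `results.utterances` when the request includes `utterances=true`):

```python
def format_utterances(utterances):
    """Render each utterance as 'HH:MM:SS - HH:MM:SS: text'."""
    def timecode(seconds):
        # Same divmod approach as the scripts above: seconds -> H, M, S
        minutes, secs = divmod(int(seconds), 60)
        hours, minutes = divmod(minutes, 60)
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"

    return [
        f"{timecode(u['start'])} - {timecode(u['end'])}: {u['transcript']}"
        for u in utterances
    ]

# Hypothetical sample data shaped like utterance entries
sample = [
    {"start": 0.0, "end": 4.2, "transcript": "Hello and welcome to the interview."},
    {"start": 5.1, "end": 9.8, "transcript": "Thanks for having me."},
]

for line in format_utterances(sample):
    print(line)
```

Keeping the formatting in its own function makes it easy to reuse for word-level chunks, utterances, or any other `{start, end, text}` grouping.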