Timecode/Timestamp big chunks. #133
-
Hi folks - wondering what would be my best bet to get timestamps in larger chunks (say, every minute or so) as opposed to word by word? I'm looking to take longer video files (say, interviews) and use the timestamps as timecode for video editing, similar to the approach Otter takes with their transcriptions (larger chunks). With thanks in advance!
Replies: 1 comment
-
Have you seen the utterances feature? It doesn't break the transcript out every minute, but it breaks out transcripts and times on meaningful chunks of spoken words (think sentences, pauses, turn-taking). Check out more here: https://developers.deepgram.com/documentation/features/utterances/

If that built-in feature doesn't quite solve it, you could write a simple script to group the transcript into approximately minute-long chunks. Here is an example in Python:

```python
import requests
import json

# Replace with your Deepgram API Key
api_key = "YOUR_DEEPGRAM_API_KEY"

# Endpoint for Deepgram's transcription service
url = "https://api.deepgram.com/v1/listen"

# Headers for the request
headers = {
    "Authorization": "Token " + api_key
}

# Path to your local audio file
audio_file_path = "/path/to/your/audio/file.wav"

# Open the audio file in read-bytes mode and send it to Deepgram
with open(audio_file_path, "rb") as audio_file:
    response = requests.post(url, headers=headers, data=audio_file)

# Parse the response from Deepgram
data = json.loads(response.text)

# Process the transcript to break it into roughly minute-long chunks
chunks = []
current_chunk = {"start": 0, "end": 0, "text": ""}

for word_info in data["results"]["channels"][0]["alternatives"][0]["words"]:
    if word_info["end"] - current_chunk["start"] > 60:
        # The current word ends more than a minute after the start of the
        # current chunk, so finalize that chunk and start a new one
        chunks.append(current_chunk)
        current_chunk = {"start": current_chunk["end"], "end": word_info["end"], "text": word_info["word"]}
    else:
        # Otherwise, add the current word to the current chunk
        current_chunk["text"] += " " + word_info["word"]
        current_chunk["end"] = word_info["end"]

# Add the last chunk if it's non-empty
if current_chunk["text"]:
    chunks.append(current_chunk)

# Print out the chunks as HH:MM:SS ranges
for chunk in chunks:
    start_min, start_sec = divmod(chunk["start"], 60)
    start_hour, start_min = divmod(start_min, 60)
    end_min, end_sec = divmod(chunk["end"], 60)
    end_hour, end_min = divmod(end_min, 60)
    print(f"{int(start_hour):02d}:{int(start_min):02d}:{int(start_sec):02d} - "
          f"{int(end_hour):02d}:{int(end_min):02d}:{int(end_sec):02d}: {chunk['text']}\n")
```

This will work if you only need approximately one-minute chunks. If you want precisely one-minute chunks, then you have to decide what to do with words that straddle the minute boundary (it will inevitably happen occasionally). If you want to assign each word to a specific minute based on when the word starts, you could do this:
```python
import requests
import json

# Replace with your Deepgram API Key
api_key = "YOUR_DEEPGRAM_API_KEY"

# Endpoint for Deepgram's transcription service
url = "https://api.deepgram.com/v1/listen"

# Headers for the request
headers = {
    "Authorization": "Token " + api_key
}

# Path to your local audio file
audio_file_path = "/path/to/your/audio/file.wav"

# Open the audio file in read-bytes mode and send it to Deepgram
with open(audio_file_path, "rb") as audio_file:
    response = requests.post(url, headers=headers, data=audio_file)

# Parse the response from Deepgram
data = json.loads(response.text)

# Process the transcript to break it into minute-long chunks
chunks = []
current_chunk = {"start": 0, "end": 0, "text": ""}

for word_info in data["results"]["channels"][0]["alternatives"][0]["words"]:
    if word_info["start"] < current_chunk["start"] + 60:
        # The current word starts within the current minute,
        # so add it to the current chunk
        current_chunk["text"] += " " + word_info["word"]
        current_chunk["end"] = word_info["end"]
    else:
        # The current word starts after the end of the current minute, so
        # finalize the current chunk and start a new one with this word
        chunks.append(current_chunk)
        next_start = current_chunk["start"] + 60
        # Skip ahead over any entirely silent minutes
        while word_info["start"] >= next_start + 60:
            next_start += 60
        current_chunk = {"start": next_start, "end": word_info["end"], "text": word_info["word"]}

# Add the last chunk if it's non-empty
if current_chunk["text"]:
    chunks.append(current_chunk)

# Print out the chunks as HH:MM:SS ranges
for chunk in chunks:
    start_min, start_sec = divmod(chunk["start"], 60)
    start_hour, start_min = divmod(start_min, 60)
    end_min, end_sec = divmod(chunk["end"], 60)
    end_hour, end_min = divmod(end_min, 60)
    print(f"{int(start_hour):02d}:{int(start_min):02d}:{int(start_sec):02d} - "
          f"{int(end_hour):02d}:{int(end_min):02d}:{int(end_sec):02d}: {chunk['text']}\n")
```

Here's some example output for this last one. Let's say that these are the words being said at these times: "Hello" from 00:00:00 to 00:00:02 The script would output the following:
In this example, you can see that the word "world", which starts before the 1-minute mark but ends after it, is included in the first chunk. The word "this", which starts exactly at the 1-minute mark, starts a new chunk. The word "Deepgram", which starts before the 3-minute mark but ends after it, is included in the chunk it started in (the third chunk/minute). I hope that helps!
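Whichever grouping you use, the start/end times can be rendered as timecodes the same way. Here is a minimal sketch of that formatting step as a standalone function, applied to utterance-style entries; the sample data below is hypothetical (a real response's utterances come back under `results.utterances` when the request includes `utterances=true`):

```python
def format_utterances(utterances):
    """Render each utterance as 'HH:MM:SS - HH:MM:SS: text'."""
    def timecode(seconds):
        # Same divmod approach as the scripts above: seconds -> H, M, S
        minutes, secs = divmod(int(seconds), 60)
        hours, minutes = divmod(minutes, 60)
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"

    return [
        f"{timecode(u['start'])} - {timecode(u['end'])}: {u['transcript']}"
        for u in utterances
    ]

# Hypothetical sample data shaped like utterance entries
sample = [
    {"start": 0.0, "end": 4.2, "transcript": "Hello and welcome to the interview."},
    {"start": 5.1, "end": 9.8, "transcript": "Thanks for having me."},
]

for line in format_utterances(sample):
    print(line)
```

Keeping the formatting in its own function makes it easy to reuse for word-level chunks, utterances, or any other `{start, end, text}` grouping.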