Live Transcription from Twilio Voice Stream (Nodejs) - Please help debug #127
-
I am struggling to get a transcription from Deepgram. I am attempting to stream raw audio from Twilio, which arrives base64-encoded. I get no transcription, only an unusual error that I am struggling to debug. Any help would be appreciated, and if there is an example of live streaming Twilio with a Node.js client, please point it out! I have only found this Python one. I have managed to live-transcribe Twilio with Google STT, following the Deepgram example provided here, and to transcribe BBC radio with Google STT as shown here, so I am not really sure what's up. I have double checked the following:
This is what ends up in the console after running the code. The stream passes data for about 5-10 seconds with no console output, and then "here" and the object are printed. I then manually end the call to trigger "Call Has Ended".
Here is my WebSocket handler in Node.js:
import Websocket from "ws";
import { Logger } from "../../core/logger";
import http from "http";
import { Server, WebSocketServer } from "ws";
import { PassThrough, Readable } from "stream";
import { transcriber } from "../transcriber";
import { Buffer } from "buffer";
export interface Dependencies {
  logger: Logger;
  server: http.Server;
}
export const createWebSocketServer = ({
  logger,
  server,
}: Dependencies): WebSocketServer => {
  const wss = new Server({ server });
  // Set up WebSocket connection event handling
  wss.on("connection", (ws) => {
    console.log("WebSocket connection established");
    const deepgramLive = transcriber.transcription.live({
      punctuate: true,
      endpointing: true,
      language: "en-GB",
    });
    // create stream
    const stream = new PassThrough();
    stream.on("data", (chunk) => {
      if (deepgramLive.getReadyState() == 1) {
        deepgramLive.send(chunk);
      }
    });
    ws.on("message", function incoming(message: any) {
      const msg = JSON.parse(message);
      switch (msg.event) {
        case "connected":
          console.log(`A new call has connected.`);
          break;
        case "start":
          console.log(`Starting Media Stream ${msg.streamSid}`);
          break;
        case "media":
          const buffer = Buffer.from(msg.media.payload, "base64");
          stream.write(buffer);
          break;
        case "stop":
          console.log(`Call Has Ended`);
          deepgramLive.finish();
          break;
      }
    });
    deepgramLive.addListener("transcriptReceived", (transcription) => {
      console.log("here");
      console.log(transcription);
    });
    ws.on("close", () => {
      console.log("WebSocket connection closed");
    });
  });
  return wss;
};
For those who are familiar with Twilio, here is my TwiML Bin for reference.
Replies: 2 comments 1 reply
-
Have you checked to make sure the encoding types match what the APIs want? Twilio spits out audio/x-mulaw at a sample rate of 8000 Hz. Deepgram can accept that, but you need to specify:
const deepgramLive = transcriber.transcription.live({
  punctuate: true,
  endpointing: true,
  language: "en-GB",
  encoding: "mulaw",
  sample_rate: 8000,
});
My understanding is Twilio's audio will be audio/x-mulaw once decoded from base64. If that's the case then you may need to decode the strings before passing them with the above encoding.
-
Ah perfect, works now! Thanks for your help 🙂.
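For anyone landing here later, a minimal sketch of the fix from the reply: declare the encoding and sample rate, and base64-decode each Twilio "media" payload before sending it on. The options object mirrors the one in the reply. `TwilioMessage`, `decodeTwilioMedia`, and `muLawDurationMs` are hypothetical names introduced only for illustration, and the duration helper assumes mono mu-law at one byte per sample. This is not an official Deepgram or Twilio example.

```typescript
import { Buffer } from "buffer";

// Options from the reply: tell Deepgram the stream is 8 kHz mu-law.
const liveOptions = {
  punctuate: true,
  endpointing: true,
  language: "en-GB",
  encoding: "mulaw",
  sample_rate: 8000,
};

// Only the fields this sketch reads from a Twilio Media Streams message.
interface TwilioMessage {
  event: string;
  media?: { payload: string };
}

// Hypothetical helper: returns the raw mu-law bytes for "media" events,
// and null for every other event ("connected", "start", "stop", ...).
function decodeTwilioMedia(raw: string): Buffer | null {
  const msg = JSON.parse(raw) as TwilioMessage;
  if (msg.event !== "media" || !msg.media) return null;
  return Buffer.from(msg.media.payload, "base64");
}

// Hypothetical debugging aid: assuming mono mu-law (one byte per sample),
// the audio duration of a chunk is byteLength / sampleRate seconds.
function muLawDurationMs(byteLength: number, sampleRate = 8000): number {
  return (byteLength / sampleRate) * 1000;
}
```

Inside the `ws.on("message", ...)` handler from the question, the decoded buffer would be passed to `deepgramLive.send(...)` as before. Logging `muLawDurationMs(buffer.length)` is a quick sanity check that the decoded chunks are a plausible size for the declared sample rate.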