AI Voice Connector - Community Edition - Deepgram Flavor

The Deepgram flavor leverages Deepgram's Speech-to-Text API to transcribe the SIP user's spoken input into text. This transcription is then sent to the ChatGPT API, which interprets the message and generates a response. The response text is subsequently passed back to Deepgram's Text-to-Speech API, which converts it into voice format, allowing the response to be played back to the user.

Implementation

Speech to Text

It is using Deepgram's nova-2 module, with the conversationalai option (by default) to interpret the user's input. In order to determine the correct phrasing, we are relying on the model's logic to determine the phrases and punctuate them accordingly.

By default, the language used is English, but can be tuned to support other languages as well, depending on the Models used. You can find out more about how to tune the Deepgram modules here.

Communication with Deepgram is done over WebSocket channels, ensuring efficient transfer of real-time audio media. Media is encoded using the codec received from the user. Currently supported codecs for STT are:

g711 PCMU - mulaw
g711 PCMA - alaw
Opus

A full list of Deepgram's supported encodings is here.

AI Engine

We are using the asynchronous OpenAI Python library to communicate with ChatGPT backend. By default we are using the gpt-4o model for conversational AI, but others can be used as well. A full list of available models and their capabilities can be found here.

Text to Speech

In order to playback the AI's result to the user, we are using Deepgram's Text-to-Speech REST interface.

Codecs used for playing back the audio to the user are the same ones used for STT, with a few constraints enforced by the Deepgram's TTS engine.

Configuration

The following parameters can be tuned for this engine:

Section	Parameter	Environment	Mandatory	Description	Default
`deepgram`	`key`	`DEEPGRAM_API_KEY`	yes	Deepgram API key	not provided
`deepgram`	`chatgpt_key` or `openai_key`	`CHATGPT_API_KEY`/`OPENAI_API_KEY`	yes	OpenAI API key used for ChatGPT	not provided
`deepgram`	`chatgpt_model`	`CHATGPT_API_MODEL`	no	OpenAI Model used for ChatGPT text interaction	`gpt-4o`
`deepgram`	`speech_model`	`DEEPGRAM_SPEECH_MODEL`	no	Deepgram's speech detection model	`nova-2-conversationalai`
`deepgram`	`language`	`DEEPGRAM_LANGUAGE`	no	Deepgram's supported language used for speech transcoding	`en-US`
`deepgram`	`voice`	`DEEPGRAM_VOICE`	no	Deepgram's voice used for speaking back the response	`aura-asteria-en`
`deepgram`	`welcome_message`	`DEEPGRAM_WELCOME_MSG`	no	A welcome message to be played back to the user when the call starts	``
`deepgram`	`disable`	`DEEPGRAM_DISABLE`	no	Disables the flavor	false

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

deepgram.md

deepgram.md

AI Voice Connector - Community Edition - Deepgram Flavor

Implementation

Speech to Text

AI Engine

Text to Speech

Configuration

Files

deepgram.md

Latest commit

History

deepgram.md

File metadata and controls

AI Voice Connector - Community Edition - Deepgram Flavor

Implementation

Speech to Text

AI Engine

Text to Speech

Configuration