Skip to content

Latest commit

 

History

History
72 lines (56 loc) · 3.79 KB

deepgram.md

File metadata and controls

72 lines (56 loc) · 3.79 KB

AI Voice Connector - Community Edition - Deepgram Flavor

The Deepgram flavor leverages Deepgram's Speech-to-Text API to transcribe the SIP user's spoken input into text. This transcription is then sent to the ChatGPT API, which interprets the message and generates a response. The response text is subsequently passed back to Deepgram's Text-to-Speech API, which converts it into voice format, allowing the response to be played back to the user.

Implementation

Speech to Text

It is using Deepgram's nova-2 module, with the conversationalai option (by default) to interpret the user's input. In order to determine the correct phrasing, we are relying on the model's logic to determine the phrases and punctuate them accordingly.

By default, the language used is English, but can be tuned to support other languages as well, depending on the Models used. You can find out more about how to tune the Deepgram modules here.

Communication with Deepgram is done over WebSocket channels, ensuring efficient transfer of real-time audio media. Media is encoded using the codec received from the user. Currently supported codecs for STT are:

  • g711 PCMU - mulaw
  • g711 PCMA - alaw
  • Opus

A full list of Deepgram's supported encodings is here.

AI Engine

We are using the asynchronous OpenAI Python library to communicate with ChatGPT backend. By default we are using the gpt-4o model for conversational AI, but others can be used as well. A full list of available models and their capabilities can be found here.

Text to Speech

In order to playback the AI's result to the user, we are using Deepgram's Text-to-Speech REST interface.

Codecs used for playing back the audio to the user are the same ones used for STT, with a few constraints enforced by the Deepgram's TTS engine.

Configuration

The following parameters can be tuned for this engine:

Section Parameter Environment Mandatory Description Default
deepgram key DEEPGRAM_API_KEY yes Deepgram API key not provided
deepgram chatgpt_key or openai_key CHATGPT_API_KEY/OPENAI_API_KEY yes OpenAI API key used for ChatGPT not provided
deepgram chatgpt_model CHATGPT_API_MODEL no OpenAI Model used for ChatGPT text interaction gpt-4o
deepgram speech_model DEEPGRAM_SPEECH_MODEL no Deepgram's speech detection model nova-2-conversationalai
deepgram language DEEPGRAM_LANGUAGE no Deepgram's supported language used for speech transcoding en-US
deepgram voice DEEPGRAM_VOICE no Deepgram's voice used for speaking back the response aura-asteria-en
deepgram welcome_message DEEPGRAM_WELCOME_MSG no A welcome message to be played back to the user when the call starts ``
deepgram disable DEEPGRAM_DISABLE no Disables the flavor false