A "flavor" refers to an individual AI engine or a combination of services/engines that collectively power an end-to-end, speech-to-speech application. Each flavor represents a unique configuration tailored to deliver seamless voice interactions. Below is an overview of the currently supported AI engine flavors, detailing the specific engines and services integrated to provide robust, real-time speech processing capabilities.
The Deepgram flavor uses the Deepgram's Speech-to-Text API to transcribe the SIP User's input into text, then uses the ChatGPT API to interpret it and get a response in return. The response is then pushed back into Deepram's Text-to-Speech API to grab the voice and play it back to the user. You can read more about this flavor here.
The OpenAI flavor hooks directly into the OpenAI's Realtime API that provides direct Speech-to-Speech interpretation of the user's conversation. Find out more information about OpenAI flavor here.
For every new call, the engine needs to select an AI flavor to use. For this, the SIP To user is being used, with the following logic:
- For each flavor defined in the configuration file as a
section, the
user
is checked against thematch
node. If it matches, the corresponding section is begin used. The priority of the flavors considered is driven by thier order in the configuration file. - If nothing matches in the previous step, then the engine checks if the
user
matches the name of the flavor (in lowercase) - If the name does not match either, the selection is performed by hasing the
user
value - this ensures a consistent engine choosing.
Note that in any step, if the flavor is disabled in the configuration file, its settings are completely ignored in the selection process.