This project demonstrates how to create a voice assistant using Python, FastAPI, WebSockets, and an AG2 RealtimeAgent. The application streams audio from a browser to a FastAPI server and enables real-time voice communication with the RealtimeAgent.
- WebSocket Audio Streaming: Direct real-time audio streaming between the browser and server.
- FastAPI Integration: A lightweight Python backend for handling WebSocket traffic.
Before you begin, ensure you have the following:
- Python 3.9+: The project was tested with 3.9. Download here.
- An OpenAI account and an OpenAI API Key. You can sign up here.
- OpenAI Realtime API access.
Follow these steps to set up the project locally:
git clone https://github.com/ag2ai/realtime-agent-over-websockets.git
cd realtime-agent-over-websockets
Create an OAI_CONFIG_LIST file based on the provided OAI_CONFIG_LIST_sample:
cp OAI_CONFIG_LIST_sample OAI_CONFIG_LIST
- In the OAI_CONFIG_LIST file, update the api_key to your OpenAI API key for the configuration with the tag "gpt-4o-mini-realtime".
- To use Gemini instead, update the api_key in the OAI_CONFIG_LIST file to your Gemini API key for the configuration with the tag "gemini-realtime".
- In realtime_over_websockets/main.py, update the filter_dict tag to "gemini-realtime".
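The tag-based selection described above can be sketched in plain Python. This is only an illustration, not the project's code: the entry fields (model, api_key, tags) and the placeholder model names are assumptions about the sample file's layout, and select_by_tag is a hypothetical helper standing in for the filter_dict lookup in main.py. The OAI_CONFIG_LIST_sample file in the repository remains the authoritative reference.

```python
import json

# Assumed layout of OAI_CONFIG_LIST: a JSON list of entries, each carrying a
# "tags" list. Model names and keys below are placeholders, not real values.
SAMPLE_CONFIG = """
[
  {"model": "example-openai-realtime-model",
   "api_key": "<your OpenAI API key>",
   "tags": ["gpt-4o-mini-realtime"]},
  {"model": "example-gemini-realtime-model",
   "api_key": "<your Gemini API key>",
   "tags": ["gemini-realtime"]}
]
"""


def select_by_tag(config_text: str, tag: str) -> list[dict]:
    """Return the entries whose "tags" list contains the given tag."""
    entries = json.loads(config_text)
    return [entry for entry in entries if tag in entry.get("tags", [])]


if __name__ == "__main__":
    # Switching providers amounts to asking for a different tag.
    for entry in select_by_tag(SAMPLE_CONFIG, "gemini-realtime"):
        print(entry["model"])
```

Keeping both entries in one file and choosing between them by tag means a provider switch touches a single string in main.py rather than the credentials themselves.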
To avoid cluttering your global Python environment, you can create a virtual environment. On your command line, enter:
python3 -m venv env
source env/bin/activate
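To confirm that the environment is active before installing packages, a small check like the following may help. It relies on the standard-library behavior that sys.prefix differs from sys.base_prefix inside an environment created with venv; the function name in_virtualenv is illustrative and not part of the project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```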
Install the required Python packages using pip:
pip install -r requirements.txt
Run the application with Uvicorn:
uvicorn realtime_over_websockets.main:app --port 5050
With the server running, open the client application in your browser by navigating to http://localhost:5050/start-chat/. Speak into your microphone, and the AI assistant will respond in real time.
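If the page does not load, a quick reachability check can help separate server problems from browser or microphone issues. The sketch below uses only the standard library; start_chat_url and server_is_up are hypothetical helper names, and the check assumes the /start-chat/ page answers a plain GET request with a success status.

```python
import urllib.error
import urllib.request


def start_chat_url(port: int = 5050, host: str = "localhost") -> str:
    """Build the client page URL for the given host and port."""
    return f"http://{host}:{port}/start-chat/"


def server_is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with an HTTP success status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    url = start_chat_url()
    print(url, "reachable:", server_is_up(url))
```

If Uvicorn was started with a different --port value, pass the same port to start_chat_url.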
This project is licensed under the MIT License.