BabbleScribe is an audio transcription tool that converts speech to text with high accuracy using the OpenAI whisper API. This repository contains the backend functionality for BabbleScribe.
- Audio file transcription using OpenAI's Whisper API
- Support for multiple audio formats (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm)
- Organized file management with separate input, output, and transcription folders
- Markdown formatting for transcription output
- Python 3.7+
- requests==2.31.0
- python-dotenv==0.19.1
-
Clone the repository:
git clone https://github.com/yourusername/babblescribe.git cd babblescribe
-
Create a virtual environment and activate it:
python -m venv venv source venv/bin/activate # On Windows use `venv\Scripts\activate`
-
Install the required packages:
pip install -r requirements.txt
-
Create a
.env
file in the project root and add your OpenAI API key:OPENAI_API_KEY=your_api_key_here
babblescribe/
│
├── main.py
├── requirements.txt
├── .env
├── input/
├── output/
└── transcriptions/
main.py
: Contains the main transcription logicinput/
: Place audio files here for transcriptionoutput/
: Processed audio files are moved heretranscriptions/
: Transcription results in markdown format are saved here
-
Place your audio file(s) in the
input/
folder. -
Run the transcription script:
python main.py
-
Follow the prompts to enter the language code (optional) for the audio files.
-
The script will process all supported audio files in the
input/
folder, move them to theoutput/
folder, and save markdown transcriptions in thetranscriptions/
folder.
While BabbleScribe doesn't currently expose its own API, it interacts with the OpenAI Whisper API. Here's how the API is used in the project:
def transcribe_audio(file_path, language=None):
url = "https://api.openai.com/v1/audio/transcriptions"
headers = {
"Authorization": f"Bearer {API_KEY}"
}
with open(file_path, "rb") as audio_file:
files = {"file": audio_file}
data = {
"model": "whisper-1",
"response_format": "text"
}
if language:
data["language"] = language
response = requests.post(url, headers=headers, files=files, data=data)
if response.status_code == 200:
return response.text
else:
raise Exception(f"Error: {response.status_code}, {response.text}")
This function sends a POST request to the Whisper API with the audio file and receives the transcription in response.
This project is licensed under the MIT License - see the LICENSE file for details.
Happy transcribing with BabbleScribe! If you encounter any issues or have suggestions for improvements, please open an issue on GitHub.🎙️📝