Voice2Image is an application that turns spoken descriptions into images. It uses OpenAI's Whisper model to transcribe your voice recordings into text, which OpenAI's DALL·E then uses as a prompt to generate an image. You can also click "Use Last Picture" to regenerate the image with Gemini 1.5 Pro, which combines the previous image with the prompt to refine the result.
- Voice Recording: Record your voice and convert it into text using OpenAI's Whisper model.
- Image Generation: Create images from text using OpenAI's DALL·E.
- Enhanced Image Generation: Use Gemini 1.5 Pro to regenerate the image from the previous image and prompt.
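The pipeline described above can be outlined in a few lines of Python. This is a hypothetical sketch, not code from the repository: `voice_to_image`, `Generation`, and the injected `transcribe` and `generate_image` callables are illustrative names, passed in as plain functions so the flow can be shown and tested without API keys. In the real app they would wrap the Whisper and DALL·E calls.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of the voice -> text -> image flow; not taken from the repository.


@dataclass
class Generation:
    """One round of the pipeline: the transcribed prompt and the resulting image."""
    prompt: str
    image: str  # an image URL or path in this sketch; the app's real representation is unknown


def voice_to_image(
    audio: bytes,
    transcribe: Callable[[bytes], str],      # stands in for the Whisper step
    generate_image: Callable[[str], str],    # stands in for the DALL·E step
) -> Generation:
    """Transcribe the recording, then use the text as the image prompt."""
    prompt = transcribe(audio).strip()
    if not prompt:
        raise ValueError("transcription was empty; nothing to draw")
    return Generation(prompt=prompt, image=generate_image(prompt))


if __name__ == "__main__":
    # Stub callables replace the real API calls for a quick local run.
    result = voice_to_image(
        b"fake-audio",
        transcribe=lambda audio: " a red fox in the snow ",
        generate_image=lambda prompt: "https://example.com/fox.png",
    )
    print(result.prompt)
```

Injecting the two steps keeps the orchestration independent of any particular SDK, so each step can be swapped or mocked separately.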
Clone the repository:
git clone https://github.com/kanitvural/generate_image_with_voice.git
cd generate_image_with_voice

Create and activate a virtual environment:
python -m venv venv
- Windows: venv\Scripts\activate
- Linux: source venv/bin/activate
- Mac: source venv/bin/activate
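To confirm that the virtual environment is actually active before installing packages, a quick standalone check (not part of the repository) can compare `sys.prefix` with `sys.base_prefix`; inside an environment created by `venv`, the two differ.

```python
import sys

# Standalone helper for illustration; not part of the repository.


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created virtual environment."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```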
Install the required packages:
pip install -r requirements.txt
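After installation, a small script can confirm that the main dependencies are importable. The module names below (`streamlit`, `openai`, `google.generativeai`) are assumptions inferred from the tools this README mentions, not read from `requirements.txt`; adjust the list to match that file.

```python
import importlib.util

# Assumed module names, inferred from this README rather than requirements.txt.
EXPECTED_MODULES = ["streamlit", "openai", "google.generativeai"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # For a dotted name, find_spec imports the parent package first,
            # which raises if that parent is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(EXPECTED_MODULES)
    print("all dependencies found" if not absent else f"missing: {absent}")
```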
Run the app:
streamlit run app.py
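If the bare `streamlit` command is not found, or resolves to an installation outside the virtual environment, Streamlit can also be started as a module of the active interpreter (`python -m streamlit run app.py`). The helper below is a hypothetical convenience wrapper, not part of the repository.

```python
import subprocess
import sys

# Hypothetical launcher; equivalent to running `python -m streamlit run app.py`.


def streamlit_command(script: str = "app.py") -> list[str]:
    """Build a command that runs Streamlit with the current Python interpreter.

    Using `python -m streamlit` instead of the `streamlit` script ensures the
    Streamlit installed for this interpreter (e.g. the active venv) is used.
    """
    return [sys.executable, "-m", "streamlit", "run", script]


if __name__ == "__main__":
    subprocess.run(streamlit_command(), check=True)
```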
Record Your Voice:
- Click the "Record" button to start recording your voice.
- When you finish, the recording is transcribed into text with Whisper.
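The transcription step could look roughly like the function below. It assumes the official `openai` Python package (v1 or later), whose `client.audio.transcriptions.create` method takes a `model` and a `file`; `transcribe_recording` and the default file name `recording.wav` are hypothetical, and the repository may record and upload audio differently. The client is passed in so the function can be exercised with a stand-in object.

```python
import io

# Hypothetical wrapper around the Whisper API; not taken from the repository.


def transcribe_recording(client, audio_bytes: bytes, filename: str = "recording.wav") -> str:
    """Send recorded audio to Whisper and return the transcribed text.

    `client` is expected to be an `openai.OpenAI()` instance (openai>=1.0).
    """
    buffer = io.BytesIO(audio_bytes)
    # Give the in-memory buffer a file name so the audio format can be
    # inferred from its extension; "recording.wav" is an assumed default.
    buffer.name = filename
    transcript = client.audio.transcriptions.create(model="whisper-1", file=buffer)
    return transcript.text
```

In the app, `client` would be created with `from openai import OpenAI; client = OpenAI()`, which reads the API key from the `OPENAI_API_KEY` environment variable.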
Generate Image:
- The transcribed text is passed to OpenAI's DALL·E as a prompt to create an image.
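The image step can be sketched the same way with `client.images.generate` from the `openai` package. The defaults `model="dall-e-3"` and `size="1024x1024"` are assumptions, since the README does not say which DALL·E version or image size the app requests; `generate_image_url` is a hypothetical name.

```python
# Hypothetical wrapper around the DALL·E images endpoint; not taken from the repository.


def generate_image_url(client, prompt: str, model: str = "dall-e-3", size: str = "1024x1024") -> str:
    """Request one image for `prompt` and return its URL.

    `client` is expected to be an `openai.OpenAI()` instance (openai>=1.0).
    """
    # dall-e-3 accepts only n=1, so exactly one image is requested.
    response = client.images.generate(model=model, prompt=prompt, size=size, n=1)
    return response.data[0].url
```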
Use Last Picture:
- Click "Use Last Picture" to regenerate the image using Gemini 1.5 Pro with the previous image and prompt.
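"Use Last Picture" requires the app to remember the previous prompt and image between clicks. The sketch below shows only that bookkeeping: `LastPicture` is a hypothetical helper, images are represented as plain strings (for example URLs), and `regenerate` stands in for the Gemini 1.5 Pro call, whose exact use this README does not describe. In a Streamlit app, such an object would typically be stored in `st.session_state` so it survives reruns.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical state holder for "Use Last Picture"; not taken from the repository.


@dataclass
class LastPicture:
    """Remembers the most recent prompt and image so they can be reused."""
    prompt: Optional[str] = None
    image: Optional[str] = None  # an image URL or path in this sketch

    def remember(self, prompt: str, image: str) -> None:
        self.prompt, self.image = prompt, image

    def use_last_picture(
        self,
        regenerate: Callable[[str, str], str],  # placeholder for the Gemini 1.5 Pro step
        prompt: Optional[str] = None,
    ) -> str:
        """Regenerate from the stored image, reusing the stored prompt unless a new one is given."""
        if self.prompt is None or self.image is None:
            raise RuntimeError("no previous picture yet; generate an image first")
        prompt = prompt if prompt is not None else self.prompt
        new_image = regenerate(prompt, self.image)
        # The new result becomes the "last picture" for the next click.
        self.remember(prompt, new_image)
        return new_image
```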
This project is licensed under the MIT License - see the LICENSE file for details.
For any questions or feedback, please contact the author by email.