
Voice2Image transforms your voice into images using advanced AI. Record your voice, get text via OpenAI's Whisper, and generate images with DALL·E. Enhance results by regenerating images with Gemini 1.5 Pro.



kanitvural/generate_image_with_voice


Voice2Image

Voice2Image turns spoken descriptions into images. The app uses OpenAI's Whisper model to transcribe your voice recording into text, and OpenAI's DALL·E then generates an image from that text. You can also click "Use Last Picture" to regenerate the image with Gemini 1.5 Pro, which takes the previous image together with its prompt as input to refine the result.

Features

  • Voice Recording: Record your voice and convert it into text using OpenAI's Whisper model.
  • Image Generation: Create images from text using OpenAI's DALL·E.
  • Enhanced Image Generation: Use Gemini 1.5 Pro to regenerate the image from the previous image and its prompt.
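Taken together, the features form a two-stage pipeline (speech to text, then text to image) with an optional refinement pass. The sketch below shows that flow in plain Python under stated assumptions: each stage is injected as a callable, images are handled as raw bytes, and the names `Voice2ImagePipeline`, `GenerationResult`, `run` and `use_last_picture` are hypothetical, not taken from the app's `app.py`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GenerationResult:
    """One round of generation: the prompt used and the image it produced."""
    prompt: str
    image: bytes


class Voice2ImagePipeline:
    """Hypothetical orchestration of the Voice2Image flow.

    Each stage is a plain callable, so real API clients (Whisper, DALL·E,
    Gemini) or test stubs can be plugged in without changing this class.
    """

    def __init__(
        self,
        transcribe: Callable[[bytes], str],
        generate_image: Callable[[str], bytes],
        refine: Callable[[bytes, str], bytes],
    ) -> None:
        self._transcribe = transcribe
        self._generate_image = generate_image
        self._refine = refine
        self.last: Optional[GenerationResult] = None

    def run(self, audio: bytes) -> GenerationResult:
        # Stage 1: speech to text.
        prompt = self._transcribe(audio).strip()
        if not prompt:
            raise ValueError("transcription was empty; nothing to draw")
        # Stage 2: text to image.
        result = GenerationResult(prompt=prompt, image=self._generate_image(prompt))
        self.last = result  # remembered for the "Use Last Picture" step
        return result

    def use_last_picture(self) -> GenerationResult:
        # Optional stage 3: regenerate from the previous image plus its prompt.
        if self.last is None:
            raise RuntimeError("no previous picture to reuse")
        image = self._refine(self.last.image, self.last.prompt)
        self.last = GenerationResult(prompt=self.last.prompt, image=image)
        return self.last


if __name__ == "__main__":
    # Usage with stub stages standing in for the real API calls.
    pipeline = Voice2ImagePipeline(
        transcribe=lambda audio: " a red fox in snow ",
        generate_image=lambda prompt: f"<image:{prompt}>".encode(),
        refine=lambda image, prompt: image + b"+refined",
    )
    first = pipeline.run(b"fake-audio")
    print(first.prompt)  # a red fox in snow
    print(pipeline.use_last_picture().image)  # b'<image:a red fox in snow>+refined'
```

Keeping the stages injectable separates the app's control flow (what happens on "Record" versus "Use Last Picture") from the network calls, which makes the flow testable without API keys.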


Installation

  1. Clone the repository and create a virtual environment:

    git clone https://github.com/kanitvural/generate_image_with_voice.git
    cd generate_image_with_voice
    python -m venv venv
  2. Activate the virtual environment:

    - Windows: venv\Scripts\activate
    - Linux/macOS: source venv/bin/activate
  3. Install the required packages:

    pip install -r requirements.txt
  4. Run the app:

    streamlit run app.py
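The steps above do not say how the app receives its OpenAI and Gemini credentials. Assuming they come from environment variables, a small preflight check like the one below can report a missing key before `streamlit run app.py` starts. `OPENAI_API_KEY` is the variable the OpenAI Python SDK reads by default; `GOOGLE_API_KEY` for Gemini is an assumption, as is the helper name `missing_keys`.

```python
import os
from typing import Iterable, List, Mapping

# Assumed variable names; adjust them to whatever the app actually reads.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY")


def missing_keys(required: Iterable[str], env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


# Example: list whatever is missing from the current shell environment.
print(missing_keys(REQUIRED_KEYS, os.environ))
```

An empty list means every assumed key is set; otherwise, export the listed variables in the activated virtual environment before launching the app.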
    
    

Usage

  1. Record Your Voice:

    • Click the "Record" button to start recording your voice.
    • When you finish, the recording is transcribed to text with Whisper.
  2. Generate Image:

    • The transcribed text is sent to OpenAI's DALL·E as the prompt for a new image.
  3. Use Last Picture:

    • Click "Use Last Picture" to regenerate the image with Gemini 1.5 Pro, using the previous image and its prompt.
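Each of the three usage steps maps to one API call. The sketch below shows how those calls could look; it is not the app's code. It assumes the `openai` (v1 or later) and `google-generativeai` packages plus Pillow, and the model names `whisper-1` and `dall-e-3` are assumptions because the README only says "Whisper" and "DALL·E". Gemini 1.5 Pro accepts images as input but returns text, so the sketch assumes the "Use Last Picture" step asks Gemini for a refined prompt and sends that prompt back to DALL·E. All function names and the instruction wording are hypothetical.

```python
import base64
import io


def whisper_transcribe(audio_bytes: bytes) -> str:
    """Step 1: transcribe a recording with Whisper (assumes WAV audio)."""
    from openai import OpenAI  # imported lazily so this module loads without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=("recording.wav", audio_bytes)
    )
    return transcript.text


def dalle_generate(prompt: str) -> bytes:
    """Step 2: generate one image and return it as PNG bytes."""
    from openai import OpenAI

    client = OpenAI()
    response = client.images.generate(
        model="dall-e-3", prompt=prompt, n=1, size="1024x1024",
        response_format="b64_json",
    )
    return base64.b64decode(response.data[0].b64_json)


def refinement_instruction(prompt: str) -> str:
    """Text sent to Gemini alongside the last image (hypothetical wording)."""
    return (
        "This image was generated from the prompt below. Write an improved, "
        "more detailed image prompt that keeps the same intent.\n"
        f"Original prompt: {prompt}"
    )


def gemini_refine_prompt(image_bytes: bytes, prompt: str, api_key: str) -> str:
    """Step 3 (assumed flow): ask Gemini 1.5 Pro for a better prompt."""
    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-pro")
    image = Image.open(io.BytesIO(image_bytes))
    response = model.generate_content([refinement_instruction(prompt), image])
    return response.text.strip()
```

Under these assumptions, the regenerated picture would be `dalle_generate(gemini_refine_prompt(last_image, last_prompt, api_key))`, where `last_image` and `last_prompt` are whatever the app kept from the previous round.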

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For any questions or feedback, please contact [email protected].
