Building an Image-to-Text System
This is a Streamlit web application that takes an image and an optional text prompt as input, and uses Google's Generative AI model (gemini-1.5-flash) to generate content based on the input. The app can either take just the image or combine both the image and the text prompt to produce a response.
- Upload an image (jpg, jpeg, png formats)
- Option to provide a text prompt
- Generate content using Google's Generative AI (gemini-1.5-flash)
- Display the uploaded image and the generated response on the page
To run the app, you need to have the following installed:
- Python 3.x
- Streamlit
- Pillow (Python Imaging Library)
- python-dotenv for managing environment variables
- google-generativeai library for the Generative AI model
- Clone the repository:

      git clone https://github.com/your-username/image-to-text-app.git
      cd image-to-text-app

- Create a virtual environment (optional but recommended):

      python -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`

- Install dependencies:

      pip install -r requirements.txt

- Set up environment variables:
  - Create a .env file in the root directory.
  - Add your Google API key to the .env file:

        GOOGLE-API-KEY=your-google-api-key-here

- Run the Streamlit app:

      streamlit run app.py
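
The environment-variable step above could be handled in code roughly as follows. This is a minimal sketch, not the project's actual app.py: the helper names `load_api_key` and `configure_genai` are hypothetical, and the variable name is assumed to match the .env example. The third-party imports are placed inside the functions so the module can be imported even before the dependencies are installed.

```python
import os


def load_api_key(name: str = "GOOGLE-API-KEY") -> str:
    """Return the API key, reading .env first when python-dotenv is installed.

    Hypothetical helper; `name` is assumed to match the key used in .env.
    """
    try:
        from dotenv import load_dotenv  # provided by python-dotenv
        load_dotenv()  # copies .env entries into os.environ without overriding
    except ImportError:
        pass  # fall back to variables already present in the environment

    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key


def configure_genai(api_key: str) -> None:
    """Pass the key to google-generativeai (imported lazily)."""
    import google.generativeai as genai
    genai.configure(api_key=api_key)
```

Failing early with a clear error when the key is missing tends to be easier to debug than an authentication error raised later by the API call.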
- Launch the app and upload an image file (jpg, jpeg, or png).
- Optionally, provide a text prompt to give additional context.
- Click the Submit button to generate content based on the image and the prompt.
- The AI-generated response will be displayed on the page.
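
What happens when Submit is clicked might look like the sketch below: the request contains the image alone, or the prompt followed by the image when a prompt was provided. The function names (`build_contents`, `generate_response`, `make_model`) are hypothetical and not taken from the project's source; `generate_content` and `response.text` are the google-generativeai calls this sketch assumes the app uses.

```python
from typing import Any, List, Optional

MODEL_NAME = "gemini-1.5-flash"


def build_contents(image: Any, prompt: Optional[str] = None) -> List[Any]:
    """Return [prompt, image] when a non-empty prompt is given, else [image]."""
    if prompt and prompt.strip():
        return [prompt.strip(), image]
    return [image]


def generate_response(model: Any, image: Any, prompt: Optional[str] = None) -> str:
    """Send the request and return the generated text.

    `model` is expected to behave like genai.GenerativeModel.
    """
    response = model.generate_content(build_contents(image, prompt))
    return response.text


def make_model() -> Any:
    """Create the model (needs google-generativeai and a configured API key)."""
    import google.generativeai as genai
    return genai.GenerativeModel(MODEL_NAME)
```

Keeping the request-building logic separate from the API call makes it possible to test it with a stand-in model object and without network access.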
- Streamlit: For creating the web interface
- Pillow (PIL): For handling image upload and display
- python-dotenv: For managing environment variables like the Google API key
- google-generativeai: For accessing Google's Generative AI model
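
As a rough illustration of how Streamlit and Pillow could be wired together for the interface, consider the sketch below. It is an assumption about the layout, not the project's actual code: the page title, widget labels, and the `generate` callable (any function that takes an image and a prompt and returns text) are all hypothetical.

```python
from typing import Any, Callable

ALLOWED_TYPES = ["jpg", "jpeg", "png"]


def main(generate: Callable[[Any, str], str]) -> None:
    """Render the page; `generate(image, prompt)` returns the model's text."""
    import streamlit as st
    from PIL import Image

    st.header("Image to Text")  # hypothetical page title
    prompt = st.text_input("Prompt (optional)")
    uploaded = st.file_uploader("Choose an image", type=ALLOWED_TYPES)

    image = None
    if uploaded is not None:
        image = Image.open(uploaded)  # Pillow reads the uploaded file object
        st.image(image, caption="Uploaded image")

    if st.button("Submit"):
        if image is None:
            st.warning("Please upload an image first.")
        else:
            st.subheader("Response")
            st.write(generate(image, prompt))
```

In an app.py built this way, `main(...)` would be called at module level, since Streamlit re-runs the script from top to bottom on every interaction.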
Ensure that your Google API key has the correct permissions to access the generative model. You will need to configure the environment with the API key using a .env file as described above.
This project is licensed under the MIT License. Feel free to modify and use the code as per your needs.