This repository contains an AI-powered Question-Answering application that leverages Retrieval-Augmented Generation (RAG) to provide accurate answers based on uploaded documents. The application uses Google's Gemini AI model and LangChain for document processing and retrieval.
Demo video: ASR.Demo.mp4
- `app.py`: Main application script
- `requirements.txt`: List of Python dependencies
- Supports multiple document formats: PDF, DOCX, and TXT
- Utilizes Google's Gemini AI for natural language understanding and generation
- Implements RAG for enhanced answer accuracy
- Interactive chat interface using Streamlit
- Document chunking and embedding for efficient retrieval
- Clone this repository:
  - `git clone https://github.com/raadongithub/QA-System-using-Langchain.git`
  - `cd QA-System-using-Langchain`
- Install dependencies:
  - `pip install -r requirements.txt`
- Create a `.env` file in the project root directory and add your Google API key (a sample `.env` is shown after these steps).
- Run the Streamlit app:
  - `streamlit run app.py`
- Open the provided URL in your web browser.
- Enter your Google API key in the sidebar.
- Upload a document (PDF, DOCX, or TXT) using the file uploader.
- Ask questions about the document in the chat interface.
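A minimal example of the `.env` file is below; the variable name is an assumption, so check `app.py` for the exact key it loads:

```
# .env — variable name is illustrative; use whatever app.py actually reads
GOOGLE_API_KEY=your_google_api_key_here
```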
- Document Processing:
- The application accepts PDF, DOCX, or TXT files.
- Documents are loaded using appropriate libraries (PyPDF2 for PDF, python-docx for DOCX).
- Text is extracted and split into smaller chunks using LangChain's RecursiveCharacterTextSplitter.
- Embedding and Indexing:
- Text chunks are embedded using a HuggingFace sentence-transformers model.
- Embeddings are stored in a FAISS vector store for efficient similarity search.
- Question-Answering:
- User questions trigger a search for relevant document chunks.
- Retrieved chunks and the question are sent to the Gemini AI model.
- The model generates a response based on the provided context and question.
- Chat Interface:
- Streamlit and streamlit-chat are used to create an interactive chat experience.
- Chat history is maintained for context in follow-up questions.
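The sketches below illustrate the stages above. They are minimal, illustrative versions rather than the exact code in `app.py`; helper names, chunk sizes, and model identifiers are assumptions. First, document processing: loading an uploaded file with PyPDF2 or python-docx and splitting it with `RecursiveCharacterTextSplitter`.

```python
# Illustrative document-processing helpers (names and chunk sizes are assumptions).
from io import BytesIO

from PyPDF2 import PdfReader
from docx import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter


def extract_text(uploaded_file) -> str:
    """Extract raw text from an uploaded PDF, DOCX, or TXT file."""
    name = uploaded_file.name.lower()
    data = uploaded_file.read()
    if name.endswith(".pdf"):
        reader = PdfReader(BytesIO(data))
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if name.endswith(".docx"):
        return "\n".join(p.text for p in Document(BytesIO(data)).paragraphs)
    return data.decode("utf-8", errors="ignore")  # plain TXT


def split_text(text: str) -> list[str]:
    """Split the extracted text into overlapping chunks for retrieval."""
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    return splitter.split_text(text)
```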
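Next, embedding and indexing: the chunks are embedded and stored in a FAISS vector store. The embedding model name is an assumption, and import paths vary across LangChain versions.

```python
# Illustrative embedding-and-indexing step (model name is an assumption).
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS


def build_index(chunks: list[str]) -> FAISS:
    """Embed the text chunks and store them in a FAISS vector store."""
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    return FAISS.from_texts(chunks, embeddings)
```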
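Then question-answering: the retriever pulls the most relevant chunks, which are passed along with the question to Gemini. This sketch assumes the `langchain-google-genai` integration and a simple `RetrievalQA` chain; the chain used in `app.py` may differ.

```python
# Illustrative question-answering step (model name and chain type are assumptions).
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI


def answer_question(vector_store, question: str, api_key: str) -> str:
    """Retrieve relevant chunks and ask Gemini to answer from that context."""
    llm = ChatGoogleGenerativeAI(model="gemini-pro", google_api_key=api_key)
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
    )
    return qa.run(question)
```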
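Finally, the chat interface: Streamlit session state keeps the conversation history and `streamlit-chat` renders the messages. The snippet reuses the hypothetical helpers from the sketches above.

```python
# Illustrative chat interface; reuses the hypothetical helpers sketched above.
import streamlit as st
from streamlit_chat import message

if "history" not in st.session_state:
    st.session_state.history = []  # (question, answer) pairs for follow-up context

api_key = st.sidebar.text_input("Google API key", type="password")
uploaded = st.file_uploader("Upload a document", type=["pdf", "docx", "txt"])

if api_key and uploaded:
    # Rebuilding the index on every rerun keeps the sketch short; cache it in practice.
    vector_store = build_index(split_text(extract_text(uploaded)))
    question = st.text_input("Ask a question about the document")
    if question:
        answer = answer_question(vector_store, question, api_key)
        st.session_state.history.append((question, answer))

for i, (q, a) in enumerate(st.session_state.history):
    message(q, is_user=True, key=f"user_{i}")
    message(a, key=f"bot_{i}")
```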
- Streamlit: For creating the web interface
- LangChain: For document processing, embedding, and retrieval
- Google Generative AI (Gemini): For natural language understanding and generation
- FAISS: For efficient similarity search of embeddings
- HuggingFace Transformers: For text embedding
- PyPDF2: For PDF processing
- python-docx: For DOCX processing
This application implements RAG, a technique that enhances large language models with external knowledge. RAG combines the power of retrieval-based systems with generative models:
- The retriever finds relevant information from the uploaded document.
- The generator (Gemini AI) uses this information to produce more accurate and contextually relevant answers.
This approach allows the model to access specific information from the document without needing to encode all details in its parameters, resulting in more accurate and up-to-date responses.
- Support for more document formats
- Integration with additional AI models
- Enhanced error handling and user feedback
- Optimization of chunking and retrieval strategies
- Implementation of source citation in responses
Contributions to improve the application are welcome. Please feel free to submit issues and pull requests.