This paper presents the design and implementation of a Retrieval-Augmented Generation (RAG)-based Web Q&A chatbot built with LangChain and Streamlit. The system answers user questions about documents retrieved from a given URL. By splitting documents into smaller chunks, creating embeddings for efficient retrieval, and using a pre-trained language model from Google GenAI, the chatbot generates accurate, context-aware responses. The system aims to demonstrate the potential of RAG models for real-time, document-based question answering through a user-friendly web interface.
The task of question answering (QA) over large text corpora has seen significant advances with the rise of Retrieval-Augmented Generation (RAG) models. RAG models combine the power of information retrieval and generative models to answer questions from context, rather than relying solely on pre-trained knowledge. However, challenges persist in document processing, efficient retrieval, and maintaining coherent answers.
This work demonstrates how a RAG-based system can be implemented to answer questions from documents retrieved from a given URL. The system uses the LangChain framework for document processing, HuggingFace Embeddings for creating vector-based document representations, and Google GenAI for language model-based answer generation. The goal is to provide an intuitive and scalable solution for real-time document-based QA applications.
The system is built using Streamlit as the front-end interface and LangChain for backend processing. The core components, illustrated in the code sketch after the list below, include:
Document Loading: The system takes a URL input from the user, loads the web page content using LangChain's WebBaseLoader, and extracts the text.
Document Splitting: To manage long documents, the content is split into smaller chunks using RecursiveCharacterTextSplitter, which allows for better retrieval efficiency.
Embeddings: The document chunks are embedded into vector representations using a pre-trained HuggingFace model (NovaSearch/stella_en_1.5B_v5).
Vectorstore: The embedded chunks are stored in an in-memory vector store (InMemoryVectorStore) for efficient similarity-based retrieval.
Answer Generation: For the user's question, the system retrieves relevant document chunks and uses a Google Gemini 2.5 Flash model to generate answers based on the retrieved context.
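A minimal sketch of the ingestion side of this pipeline is shown below. The import paths assume the recent split langchain-* distributions, and the chunk size, overlap, and model keyword arguments are illustrative assumptions rather than the paper's exact settings.

```python
# Sketch of the ingestion pipeline: load -> split -> embed -> index.
# Import paths assume recent langchain-* split packages.
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter


def build_vectorstore(url: str) -> InMemoryVectorStore:
    # Load the raw page content from the user-supplied URL.
    docs = WebBaseLoader(url).load()

    # Split long documents into overlapping chunks; these sizes are
    # illustrative, not the exact values used in the paper.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_documents(docs)

    # Embed each chunk with the pre-trained HuggingFace model. Some models
    # (including the stella family) may require trust_remote_code=True.
    embeddings = HuggingFaceEmbeddings(
        model_name="NovaSearch/stella_en_1.5B_v5",
        model_kwargs={"trust_remote_code": True},
    )

    # Index the embedded chunks in an in-memory vector store.
    return InMemoryVectorStore.from_documents(chunks, embedding=embeddings)
```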
At runtime, the workflow proceeds as follows (the retrieval-and-generation step is sketched in code after this list):
User Input: The user provides a document URL and a question.
Document Loading and Splitting: The system loads the document from the URL and splits it into manageable chunks.
Embedding Generation: Each document chunk is transformed into vector embeddings for efficient similarity-based retrieval.
Answer Generation: When the user submits a question, the system retrieves the relevant document chunks using similarity search and generates an answer using the Google Gemini model.
Response Display: The generated answer and relevant document context are displayed to the user.
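The following is a minimal sketch of the retrieval-and-generation step, assuming the vector store built in the earlier sketch; the prompt wording and the choice of k=4 retrieved chunks are illustrative assumptions.

```python
# Sketch of the query-time flow: retrieve top-k chunks, then answer from them.
from langchain_google_genai import ChatGoogleGenerativeAI


def answer_question(vectorstore, question: str) -> str:
    # Retrieve the chunks most similar to the question (k is an assumption).
    hits = vectorstore.similarity_search(question, k=4)
    context = "\n\n".join(doc.page_content for doc in hits)

    # Ground the generation in the retrieved context; the prompt wording
    # here is illustrative, not the paper's exact prompt.
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.invoke(prompt).content
```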
The implementation relies on the following technologies (a sketch of the front-end wiring follows the list):
Streamlit: For building the interactive web interface.
LangChain: For document processing, vectorization, and model orchestration.
HuggingFaceEmbeddings: For transforming text into vector embeddings.
Google GenAI: For answer generation using the Gemini language model.
Python-dotenv: To manage sensitive API keys securely.
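A minimal sketch of how these pieces could be wired together in the Streamlit front end is shown below; the widget labels and layout details are assumptions, and build_vectorstore and answer_question refer to the earlier sketches.

```python
# Sketch of the Streamlit front end; labels and layout are assumptions.
import streamlit as st
from dotenv import load_dotenv

load_dotenv()  # loads GOOGLE_API_KEY and other secrets from a local .env file

st.title("RAG Web Q&A Chatbot")
url = st.text_input("Document URL")
question = st.text_input("Your question")

if url and question:
    with st.spinner("Loading and indexing the document..."):
        vectorstore = build_vectorstore(url)  # from the ingestion sketch above
    answer = answer_question(vectorstore, question)  # from the query-time sketch

    # Two-column layout: question on the left, generated answer on the right.
    left, right = st.columns(2)
    with left:
        st.subheader("Question")
        st.write(question)
    with right:
        st.subheader("Answer")
        st.write(answer)
```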
To evaluate the effectiveness of the proposed system, I conducted several experiments to assess its functionality and performance:
Experiment 1 - Document Retrieval and Answer Generation:
Objective: To evaluate the system's ability to correctly retrieve relevant documents and generate meaningful answers.
Setup: A set of publicly available documents (e.g., the U.S. Constitution) was used for testing. The user entered questions related to these documents, and the system retrieved relevant content and generated answers.
Experiment 2 - User Interaction and Interface:
Objective: To test the usability and responsiveness of the Streamlit web interface.
Setup: Participants interacted with the system by inputting various URLs and questions. The goal was to observe how easily they could navigate the system and how quickly it responded.
Experiment 3 - Efficiency and Latency:
Objective: To assess the time taken by the system to load documents, generate embeddings, retrieve relevant content, and generate answers.
Setup: A series of documents of varying lengths were tested to analyze the system's scalability and performance under different workloads.
The system successfully retrieved relevant documents based on the user's query. For example, when asked about a specific amendment in the U.S. Constitution, the system returned contextually accurate paragraphs and provided an appropriate answer using the Google Gemini 2.5 Flash model. The accuracy of answers was satisfactory, with the system often selecting relevant document chunks to generate coherent answers.
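For illustration, an end-to-end run of the earlier sketches might look as follows; the URL and question are hypothetical stand-ins, not the exact inputs used in the experiments.

```python
# Hypothetical run mirroring Experiment 1; replace the URL with a real page
# hosting the document text.
vectorstore = build_vectorstore("https://example.org/us-constitution.html")
print(answer_question(vectorstore, "What rights does the First Amendment protect?"))
```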
Users reported a positive experience with the system's interface. The Streamlit layout was intuitive, and the two-column layout for asking questions and viewing answers worked well. Users were able to easily input URLs and receive responses without difficulty. However, some users noted that loading very large documents could cause slight delays, which is something to consider for further optimization.
The latency results from Experiment 3 were as follows (a timing sketch for reproducing such measurements appears after this list):
Document Loading: For documents of average length (around 1,000 words), the system processed the content in under 10 seconds.
Embedding Generation: The embedding generation and indexing process took around 5–15 seconds per document, depending on the document's size.
Query Response Time: On average, the response time for generating an answer to a question was under 5 seconds.
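These figures depend on hardware, network conditions, and document size; a simple way to instrument the individual stages, reusing the earlier sketches, is shown below. The inputs are illustrative.

```python
# Wall-clock timing of each stage; numbers will vary across environments.
import time


def timed(label, fn, *args, **kwargs):
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.1f}s")
    return result


url = "https://example.org/sample-document"        # illustrative input
question = "What is the main topic of this page?"  # illustrative input

vectorstore = timed("load + embed + index", build_vectorstore, url)
answer = timed("query response", answer_question, vectorstore, question)
```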
Overall, the system showed promising results with the ability to quickly load documents, retrieve relevant content, and generate accurate answers.
In this work, I presented a RAG-based Web Q&A Chatbot implemented using Streamlit and LangChain. The system demonstrated an effective combination of information retrieval and answer generation, allowing users to ask questions about documents retrieved from URLs.
Key Contributions:
LangChain was leveraged for seamless integration of document processing, embeddings, and language models.
The Google Gemini 2.5 Flash model was used for question answering, producing contextually relevant responses.
The Streamlit interface made the system accessible and user-friendly for real-time interaction.
Future Work:
Model Improvement: Future work could include fine-tuning the model on domain-specific documents to improve answer accuracy.
Scalability: Optimizing the system for larger document sets by incorporating more efficient document indexing and retrieval methods.
Multilingual Support: Adding support for multilingual document analysis and question answering.
This work shows that RAG models have significant potential for real-world applications in document-based question answering, and can be easily deployed using accessible frameworks like Streamlit.