RAG Chatbot with Ollama, LangChain, FastAPI & Streamlit
This project implements a Retrieval-Augmented Generation (RAG) chatbot that runs locally using open-source LLMs managed by Ollama. It features a Python backend built with FastAPI and LangChain, and a user-friendly frontend built with Streamlit.
Overview
The goal of this project is to provide a context-aware chat experience by leveraging documents provided by the user. The chatbot can ingest information from various file types (PDF, DOCX, TXT), store it efficiently in a vector database (FAISS), and use this knowledge to answer user queries accurately. Users can switch between different locally hosted language models via Ollama.
Features
Retrieval-Augmented Generation (RAG): Answers questions based on the content of uploaded documents. A minimal retrieval sketch follows this list.
Local Open-Source LLMs: Integrates with models running locally via Ollama (e.g. Mistral 7B, Llama 3 8B, Phi-3 Mini).
Multiple Model Support: Allows switching between configured LLMs during a chat session.
Multi-File Upload: Supports uploading PDF, DOCX, and TXT files. (Note: PDFs are loaded with PyPDFLoader due to earlier dependency compatibility issues with UnstructuredLoader.)
Chat History: Stores and displays the conversation history for the current session.
Vector Store: Uses FAISS (CPU) for efficient document embedding storage and retrieval.
Usage Statistics: Displays basic statistics such as query count and average processing time.
Web Interface: Simple and clean UI built with Streamlit for interaction.
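Under the hood, the RAG flow follows the standard LangChain pattern: load and split documents, embed the chunks into FAISS, then answer questions with retrieved context passed to the local Ollama model. The sketch below is illustrative rather than the project's exact code; it assumes the langchain-community integrations, faiss-cpu, pypdf, and a running Ollama server with the mistral model pulled.

```python
# Illustrative RAG pipeline sketch (assumes langchain-community, faiss-cpu, pypdf,
# and a running Ollama server with the "mistral" model pulled).
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter

# 1. Load an uploaded document and split it into overlapping chunks.
docs = PyPDFLoader("example.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 2. Embed the chunks and index them in a local FAISS vector store.
vector_store = FAISS.from_documents(chunks, OllamaEmbeddings(model="mistral"))

# 3. Answer a query using the top-k retrieved chunks as context for the local LLM.
qa = RetrievalQA.from_chain_type(
    llm=Ollama(model="mistral"),
    retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
)
print(qa.invoke({"query": "What is this document about?"})["result"])
```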
Tech Stack
Backend: Python, FastAPI, LangChain, Uvicorn
Frontend: Streamlit
LLM Orchestration: LangChain
LLM Serving: Ollama
Models Used (Example): Mistral 7B, Llama 3 8B (configurable via .env)
Vector Store: FAISS (CPU)
Architecture
The application follows a simple client-server architecture:
Frontend (Streamlit): Provides the user interface for uploading files, chatting, selecting models, and viewing stats. It communicates with the backend API.
Backend (FastAPI): Exposes API endpoints for file processing, chat interaction, and fetching data (history, stats). A minimal endpoint sketch follows this list.
Ollama: Runs the open-source LLMs locally, serving requests from the backend.
FAISS: Stores vector embeddings of document chunks locally for fast retrieval.
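To make this boundary concrete, here is an illustrative sketch of the backend/frontend interaction. The endpoint path (/chat), payload shape, and port 8000 are assumptions for illustration and may differ from the project's actual API.

```python
# Backend sketch (illustrative only): a chat endpoint the frontend can call.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    question: str
    model: str = "mistral"  # which locally hosted Ollama model to use for this turn

class ChatResponse(BaseModel):
    answer: str

@app.post("/chat", response_model=ChatResponse)
def chat(req: ChatRequest) -> ChatResponse:
    # The real backend would run the RAG chain (FAISS retrieval + Ollama LLM) here;
    # this stub just echoes the input to keep the sketch self-contained.
    return ChatResponse(answer=f"[{req.model}] You asked: {req.question}")
```

On the Streamlit side, the chat box would POST the user's message to that endpoint and render the reply, roughly like this:

```python
# Frontend sketch (illustrative only): send the user's message to the backend.
import requests
import streamlit as st

prompt = st.chat_input("Ask a question about your documents")
if prompt:
    st.chat_message("user").write(prompt)
    resp = requests.post(
        "http://localhost:8000/chat",
        json={"question": prompt, "model": "mistral"},
        timeout=120,
    )
    st.chat_message("assistant").write(resp.json()["answer"])
```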
Getting Started
Prerequisites
Python: Version 3.10 or higher is recommended.
Ollama: Install the Ollama macOS application from https://ollama.com/. Ensure the Ollama application is running.
Git: For cloning the repository.
Installation
Clone the repository:
git clone https://github.com/spatel1110/RAG-Chatbot-using-LLMs-and-LangChain.git
cd RAG-Chatbot-using-LLMs-and-LangChain
Create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate
Install Python dependencies:
pip install -r requirements.txt
Ollama Setup
Ensure Ollama is running. (Check for the menu bar icon on macOS).
Pull the desired LLMs: Open your terminal and run:
ollama pull mistral
ollama pull llama3
Verify models are available:
ollama list
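Optionally, you can check from Python that LangChain can reach the local Ollama server before starting the backend. This quick test is not part of the project code; it assumes the langchain-community package is installed and the mistral model has been pulled as above.

```python
# Quick connectivity check (optional, illustrative): ask the local Ollama server
# for a short completion through LangChain.
from langchain_community.llms import Ollama

llm = Ollama(model="mistral", base_url="http://localhost:11434")  # default Ollama port
print(llm.invoke("Reply with the single word OK"))
```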
Configuration
Navigate to the backend directory: cd backend
Create a .env file by copying the example or creating it manually: