RAG Chatbot with Ollama, LangChain, FastAPI & Streamlit
This project implements a Retrieval-Augmented Generation (RAG) chatbot that runs locally using open-source LLMs managed by Ollama. It features a Python backend built with FastAPI and LangChain, and a user-friendly frontend built with Streamlit.
Overview
The goal of this project is to provide a context-aware chat experience by leveraging documents provided by the user. The chatbot can ingest information from various file types (PDF, DOCX, TXT), store it efficiently in a vector database (FAISS), and use this knowledge to answer user queries accurately. Users can switch between different locally hosted language models via Ollama.
Features
Retrieval-Augmented Generation (RAG): Answers questions based on the content of uploaded documents. A minimal retrieval sketch follows this list.
Local Open-Source LLMs: Integrates with models running locally via Ollama (e.g. Mistral 7B, Llama 3 8B, Phi-3 Mini).
Multiple Model Support: Allows switching between configured LLMs during a chat session.
Multi-File Upload: Supports uploading PDF, DOCX, and TXT files. (Note: PDFs are loaded with PyPDFLoader due to earlier dependency compatibility issues with UnstructuredLoader.)
Chat History: Stores and displays the conversation history for the current session.
Vector Store: Uses FAISS (CPU) for efficient document embedding storage and retrieval.
Usage Statistics: Displays basic statistics such as query count and average processing time.
Web Interface: Simple and clean UI built with Streamlit for interaction.
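Under the hood, the RAG flow follows the standard LangChain pattern: load and split documents, embed the chunks into FAISS, then answer questions with retrieved context passed to the local Ollama model. The sketch below is illustrative rather than the project's exact code; it assumes the langchain-community integrations, faiss-cpu, pypdf, and a running Ollama server with the mistral model pulled.

```python
# Illustrative RAG pipeline sketch (assumes langchain-community, faiss-cpu, pypdf,
# and a running Ollama server with the "mistral" model pulled).
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter

# 1. Load an uploaded document and split it into overlapping chunks.
docs = PyPDFLoader("example.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 2. Embed the chunks and index them in a local FAISS vector store.
vector_store = FAISS.from_documents(chunks, OllamaEmbeddings(model="mistral"))

# 3. Answer a query using the top-k retrieved chunks as context for the local LLM.
qa = RetrievalQA.from_chain_type(
    llm=Ollama(model="mistral"),
    retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
)
print(qa.invoke({"query": "What is this document about?"})["result"])
```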
Tech Stack
Backend: Python, FastAPI, LangChain, Uvicorn
Frontend: Streamlit
LLM Orchestration: LangChain
LLM Serving: Ollama
Models Used (Example): Mistral 7B, Llama 3 8B (configurable via .env)
Vector Store: FAISS (CPU)
Architecture
The application follows a simple client-server architecture:
Frontend (Streamlit): Provides the user interface for uploading files, chatting, selecting models, and viewing stats. It communicates with the backend API.
Backend (FastAPI): Exposes API endpoints for file processing, chat interaction, and fetching data (history, stats). A minimal endpoint sketch follows this list.
Ollama: Runs the open-source LLMs locally, serving requests from the backend.
FAISS: Stores vector embeddings of document chunks locally for fast retrieval.
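To make this boundary concrete, here is an illustrative sketch of the backend/frontend interaction. The endpoint path (/chat), payload shape, and port 8000 are assumptions for illustration and may differ from the project's actual API.

```python
# Backend sketch (illustrative only): a chat endpoint the frontend can call.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    question: str
    model: str = "mistral"  # which locally hosted Ollama model to use for this turn

class ChatResponse(BaseModel):
    answer: str

@app.post("/chat", response_model=ChatResponse)
def chat(req: ChatRequest) -> ChatResponse:
    # The real backend would run the RAG chain (FAISS retrieval + Ollama LLM) here;
    # this stub just echoes the input to keep the sketch self-contained.
    return ChatResponse(answer=f"[{req.model}] You asked: {req.question}")
```

On the Streamlit side, the chat box would POST the user's message to that endpoint and render the reply, roughly like this:

```python
# Frontend sketch (illustrative only): send the user's message to the backend.
import requests
import streamlit as st

prompt = st.chat_input("Ask a question about your documents")
if prompt:
    st.chat_message("user").write(prompt)
    resp = requests.post(
        "http://localhost:8000/chat",
        json={"question": prompt, "model": "mistral"},
        timeout=120,
    )
    st.chat_message("assistant").write(resp.json()["answer"])
```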
Getting Started
Prerequisites
Python: Version 3.10 or higher is recommended.
Ollama: Install the Ollama macOS application from https://ollama.com/. Ensure the Ollama application is running.
Git: For cloning the repository.
Installation
Clone the repository:
git clone https://github.com/spatel1110/RAG-Chatbot-using-LLMs-and-LangChain.git
cd RAG-Chatbot-using-LLMs-and-LangChain
Create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate
Install Python dependencies:
pip install -r requirements.txt
Ollama Setup
Ensure Ollama is running. (Check for the menu bar icon on macOS).
Pull the desired LLMs: Open your terminal and run:
ollama pull mistral
ollama pull llama3
Verify models are available:
ollama list
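Optionally, you can check from Python that LangChain can reach the local Ollama server before starting the backend. This quick test is not part of the project code; it assumes the langchain-community package is installed and the mistral model has been pulled as above.

```python
# Quick connectivity check (optional, illustrative): ask the local Ollama server
# for a short completion through LangChain.
from langchain_community.llms import Ollama

llm = Ollama(model="mistral", base_url="http://localhost:11434")  # default Ollama port
print(llm.invoke("Reply with the single word OK"))
```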
Configuration
Navigate to the backend directory: cd backend
Create a .env file by copying the example or creating it manually: