Type: Software Tool + Real-World Application
Goal: To build a professional, high-performance chatbot that answers user questions based strictly on the content of uploaded documents using Retrieval-Augmented Generation (RAG).
This project demonstrates a complete RAG pipeline that transforms a general-purpose Large Language Model (LLM) into a specialized, document-grounded expert. By combining the high-speed inference of the Groq API with LangChain for orchestration and ChromaDB for efficient vector storage, the chatbot delivers fast, accurate, and secure answers. It supports multiple document formats (PDF, DOCX, TXT, and Markdown) and is wrapped in a user-friendly Streamlit interface.
The core strength of this application lies in its performance and its strict adherence to the provided context, which minimizes hallucinations and keeps answers within the scope of the uploaded document.
This publication serves as a comprehensive guide for developers, researchers, and AI enthusiasts looking to build reliable and secure RAG systems. While many LLMs offer broad general knowledge, they often fall short when domain-specific factual accuracy is required. This project provides a practical, end-to-end solution to that problem.
It is particularly useful for anyone who needs answers grounded in a specific document rather than in an LLM's general training data.

Key aspects of the project include:

- Security framework: a meticulously crafted system prompt (`configs/prompt_config.yaml`) ensures the chatbot refuses to answer out-of-scope, unethical, or sensitive questions (a minimal sketch of such a config-driven prompt follows this list).
- Document ingestion: loading, chunking, and embedding uploaded files are handled by `ingest.py`.
- Streamlit front end: the chat interface lives in `chatbot.py`.
- Modular architecture: the codebase separates document ingestion (`ingest.py`), RAG pipeline logic (`RAG_pipeline.py`), and prompt engineering (`prompt_builder.py`).
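To illustrate how a YAML-driven system prompt can enforce these refusal rules, here is a minimal sketch. The keys shown (`role`, `constraints`, `refusal_message`) and the helper `build_system_prompt` are hypothetical placeholders, not the actual schema of `configs/prompt_config.yaml` or the repo's `build_prompt_from_config` implementation.

```python
# Illustrative only: hypothetical schema, not the repo's actual prompt_config.yaml.
import yaml

EXAMPLE_CONFIG = """
role: "You are a document question-answering assistant."
constraints:
  - "Answer only from the provided document context."
  - "Refuse out-of-scope, unethical, or sensitive requests."
refusal_message: "I can only answer questions about the uploaded document."
"""

def build_system_prompt(config_text: str) -> str:
    # Parse the YAML config and assemble a single system prompt string.
    config = yaml.safe_load(config_text)
    lines = [config["role"]]
    lines += [f"- {rule}" for rule in config["constraints"]]
    lines.append(f"If a request violates these rules, reply: {config['refusal_message']}")
    return "\n".join(lines)

print(build_system_prompt(EXAMPLE_CONFIG))
```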
The system follows a clear and efficient pipeline to transform a user's question into a document-grounded answer. The entire workflow can be visualized in the architecture diagram below.
Ingestion (runs when a document is uploaded):

1. Document loading: the `ingest.py` script uses the `load_document` function to extract raw text from the file.
2. Chunking: the text is split with a `RecursiveCharacterTextSplitter` using a chunk size of 512 and an overlap of 128 tokens; this is handled by the `chunk_document` function.
3. Embedding: each chunk is embedded with the `sentence-transformers/all-MiniLM-L6-v2` model via the `embed_chunks` function.
4. Vector storage: the resulting embeddings are stored in ChromaDB through `initialize_vector_db` and `insert_documents` (a sketch of this phase follows the list).
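As a rough illustration of this ingestion phase, the sketch below wires the same steps together directly with LangChain, sentence-transformers, and ChromaDB. It starts from already-extracted text, and the function `ingest_text` is a stand-in: the repo's `load_document`, `chunk_document`, `embed_chunks`, `initialize_vector_db`, and `insert_documents` may have different signatures.

```python
# Illustrative ingestion sketch; the repo's ingest.py may differ in details.
import chromadb
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer

def ingest_text(raw_text: str, collection_name: str = "documents"):
    # Split the extracted text into overlapping chunks (size 512, overlap 128).
    splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=128)
    chunks = splitter.split_text(raw_text)

    # Embed every chunk with the all-MiniLM-L6-v2 sentence-transformer model.
    embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    embeddings = embedder.encode(chunks).tolist()

    # Store chunks and their embeddings in a persistent ChromaDB collection.
    client = chromadb.PersistentClient(path="./chroma_db")
    collection = client.get_or_create_collection(collection_name)
    collection.add(
        ids=[f"chunk-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embeddings,
    )
    return collection
```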
Question answering (runs for every user query):

1. Retrieval: the `retrieve_relevant_documents` function in `RAG_pipeline.py` first converts the query into an embedding and then fetches the most relevant chunks from the vector store.
2. Prompt construction: the retrieved context is passed to the `build_prompt_from_config` function, which uses a detailed YAML configuration (`configs/prompt_config.yaml`) to construct a highly specific system prompt.
3. Generation: the final prompt is sent to the Groq API (`llama-3.1-8b-instant`) to generate a response that adheres to the prompt's constraints (a sketch of this phase follows the list).
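The sketch below shows one way this query path can be wired up, retrieving from ChromaDB and calling the Groq chat-completions API. The function name `answer_question`, the prompt text, and the `top_k` value are illustrative stand-ins for the repo's `retrieve_relevant_documents` and `build_prompt_from_config`.

```python
# Simplified query-time sketch; names and prompts are illustrative stand-ins.
import chromadb
from groq import Groq
from sentence_transformers import SentenceTransformer

EMBEDDER = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def answer_question(question: str, api_key: str, top_k: int = 4) -> str:
    # Embed the user's question with the same model used at ingestion time.
    query_embedding = EMBEDDER.encode([question]).tolist()

    # Retrieve the most similar chunks from the persistent ChromaDB collection.
    client = chromadb.PersistentClient(path="./chroma_db")
    collection = client.get_or_create_collection("documents")
    results = collection.query(query_embeddings=query_embedding, n_results=top_k)
    context = "\n\n".join(results["documents"][0])

    # Build a context-grounded prompt (stand-in for build_prompt_from_config).
    system_prompt = (
        "Answer strictly from the provided context. "
        "If the answer is not in the context, say you don't know."
    )
    user_prompt = f"Context:\n{context}\n\nQuestion: {question}"

    # Call the Groq chat-completions API with the llama-3.1-8b-instant model.
    groq_client = Groq(api_key=api_key)
    response = groq_client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content
```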
Get started locally in just a few steps:
Clone the Repository
```bash
git clone https://github.com/mohsinansari0705/File-QnA-Chatbot-using-RAG.git
cd File-QnA-Chatbot-using-RAG
```
Create a Virtual Environment (Recommended)
```bash
# On Windows
python -m venv RAG_env
RAG_env\Scripts\activate

# On macOS/Linux
python3 -m venv RAG_env
source RAG_env/bin/activate
```
Install Dependencies
```bash
pip install -r requirements.txt
```
Launch the App
```bash
streamlit run chatbot.py
```
The chatbot will open in your browser. Enter your Groq API key in the sidebar, upload a document, and start asking questions!
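For readers curious how such an interface hangs together, here is a stripped-down Streamlit sketch with the sidebar API-key field, a file uploader, and a chat loop. The real `chatbot.py` additionally parses the uploaded file and runs the full RAG pipeline described above; the placeholder answer below only gestures at that step.

```python
# Stripped-down UI sketch; the repo's chatbot.py is more complete.
import streamlit as st

st.title("File Q&A Chatbot (sketch)")

# Sidebar: the Groq API key is entered by the user, never hard-coded.
api_key = st.sidebar.text_input("Groq API Key", type="password")

# Upload a document; a real app would parse PDF/DOCX/TXT/MD and ingest it here.
uploaded = st.file_uploader("Upload a document", type=["pdf", "docx", "txt", "md"])

question = st.chat_input("Ask a question about the document")
if question and uploaded and api_key:
    with st.chat_message("user"):
        st.write(question)
    # Placeholder: call the RAG pipeline (ingest + retrieve + Groq generation) here.
    answer = "This sketch does not run the pipeline; see the RAG sketches above."
    with st.chat_message("assistant"):
        st.write(answer)
```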
The application provides a seamless and secure user experience. The interface is clean, and the responses are strictly confined to the document's content, delivering fast and accurate answers.
This project successfully implements a secure, high-performance RAG chatbot by combining the strengths of modern AI tools. It stands out due to its robust security framework, which is enforced by a meticulously crafted system prompt, and its impressive speed, thanks to the Groq LLM API. The modular architecture makes it an excellent foundation for building more complex, domain-specific AI applications.
This project is licensed under the MIT License; the full license text is available in the repository.
I would love to hear your thoughts on this project!
If you have any suggestions or want to adapt the bot to your own tasks, feel free to reach out to me, Mohsin Ansari.