DocuChat-RAG: AI RAG Assistant with Conversational Memory - AAIDC Project 1

DocuChat-RAG: AI RAG assistant with optimized conversational memory

Author: MOKEMI TCHAKOUTE BAUTREL Duval
Date: March 2026

Introduction

DocuChat-RAG is an intelligent conversational assistant based on search augmented generation (RAG), designed to answer questions using exclusively your personal documents. Unlike general chatbots that can produce erroneous or speculative answers, this assistant is constrained to respond only from indexed documents, making it ideally suited for domain-specific applications or those subject to compliance requirements.

The system enhances reliability through:

Strict restrictions on prompts to prevent hallucinations
A windowed conversational memory (last 5 exchanges) for optimal context
Complete persistence of history in a JSON file
Multi-LLM support (OpenAI, Groq, Gemini) with automatic failover

Users can interact with the chatbot via a simple and intuitive command-line interface (CLI).

Scope of the document

This publication provides comprehensive instructions on the architecture, operation, and management of the RAG-based chatbot system. It details all the essential components:

Ingestion and processing of documents
Vector search with ChromaDB
Construction of prompts and prevention of hallucinations
Conversational memory management (sliding window)
Multi-LLM integration

In addition to describing the architecture, this document includes instructions for running the application, configuration options, and a test dataset to validate system behavior. It also outlines current limitations and areas for future improvement.

Dataset description

The dataset used for this application is a collection of 7 carefully selected text documents , covering a variety of current topics:

File	Thematic	Main content
`artificial_intelligence.txt`	Artificial intelligence	Principles of AI, Machine Learning, Deep Learning, NLP, AI Ethics
`biotechnology.txt`	Biotechnology	CRISPR, Synthetic biology, Personalized medicine, Regenerative medicine, Bioethics
`climate_science.txt`	Climate Sciences	Climate change, Greenhouse effect, Ecological impacts, Renewable energies, Adaptation
`quantum_computing.txt`	Quantum Computing	Quantum principles (superposition, entanglement), Quantum algorithms, Quantum hardware, Applications
`space_exploration.txt`	Space Exploration	Solar system, Martian exploration, International Space Station, Future of space travel, Exoplanets
`sustainable_energy.txt`	Sustainable Energy	Solar energy, Wind energy, Energy storage, Smart grids, Green hydrogen
`sample_documents.txt`	Data Science	Data science methodology, Feature engineering, Model evaluation, MLOps, Visualization

Together, these files form an encyclopedic and multidisciplinary knowledge base, enabling the assistant to answer a variety of questions in cutting-edge scientific and technological fields.

Limitations of the dataset

It is not exhaustive and does not contain any real-time information.
It lacks updatable numerical data
It does not cover events that occurred after the documents were created.
The assistant can only provide accurate answers if the queries fall within the scope of the included documents.

Technical details

Basic technology stack

Component	Technology	Role
Language	Python 3.9+	Primary programming language
Vector Database	ChromaDB	Semantic storage and retrieval
Embeddings	Sentence Transformers (all-MiniLM-L6-v2)	Text to vector conversion (384 dimensions)
LLM	OpenAI / Groq / Gemini	Response generation
Framework	LangChain	RAG orchestration and memory management
Interface	CLI (Python)	Command-line interaction

System architecture

                 ┌─────────────────────────────┐
                 │           USER              │
                 │        Ask a Question       │
                 └──────────────┬──────────────┘
                                │
                                ▼
                 ┌─────────────────────────────┐
                 │        QUERY EMBEDDING      │
                 │   Sentence Transformers     │
                 │  (all-MiniLM-L6-v2)         │
                 └──────────────┬──────────────┘
                                │
                                ▼
                 ┌─────────────────────────────┐
                 │        VECTOR SEARCH        │
                 │          ChromaDB           │
                 │   Top-K Semantic Retrieval  │
                 └──────────────┬──────────────┘
                                │
                                ▼
                 ┌─────────────────────────────┐
                 │     RETRIEVED DOCUMENTS     │
                 │       Relevant Chunks       │
                 │ (chunk_size=500, overlap=50)│
                 └──────────────┬──────────────┘
                                │
               ┌────────────────┴────────────────┐
               ▼                                 ▼

    ┌───────────────────────┐         ┌─────────────────────────┐
    │  CONVERSATION MEMORY  │         │        PROMPT           │
    │                       │         │      CONSTRUCTION       │
    │ Windowed Memory (5)  │         │ Context + Question      │
    │ JSON Chat History    │         │ Memory + Instructions   │
    └─────────────┬─────────┘         └──────────────┬──────────┘
                  │                                  │
                  └──────────────┬───────────────────┘
                                 ▼
                    ┌─────────────────────────┐
                    │           LLM           │
                    │  OpenAI / Groq / Gemini│
                    │  Grounded Generation   │
                    └─────────────┬───────────┘
                                  │
                                  ▼
                       ┌─────────────────────┐
                       │     FINAL ANSWER    │
                       │  Grounded Response  │
                       └─────────────────────┘

Overview of the main components

Component	Description
RAGAssistant	Main class that orchestrates the end-to-end flow of requests
VectorDB	Handles ingestion, chunking, and semantic searching via ChromaDB
WindowedFileChatHistory	Custom memory: window of the last 5 exchanges + JSON persistence
Multi-LLM Support	Integration of multi-vendor models (OpenAI, Groq, Gemini)
Prompt Template	Structured template with hallucination prevention constraints
CLI Interface	Command-line interface for interactive testing

Detailed presentation of the main capabilities

1. Intelligent document ingestion

When the application starts, the system automatically loads all .txt documents present in the data/ directory.

Ingestion pipeline:

Loading : Reading files with UTF-8 encoding
Chunking : Splitting with RecursiveCharacterTextSplitter (chunk_size=500, overlap=50)
Embeddings : Conversion to 384-dimensional vectors via all-MiniLM-L6-v2
Storage : Persistence in ChromaDB with unique identifiers

2. Semantic search with configurable thresholds

Document search uses cosine similarity with HNSW (Hierarchical Navigable Small World) indexing for instant search.

Setting	Value	Role
top_k	3	Number of most relevant chunks to return
distance_threshold	0.5	Maximum distance threshold for validation

3. Optimized Conversational Memory

The application implements custom memory via WindowedFileChatHistory:

class WindowedFileChatHistory(FileChatMessageHistory):
    """Retourne seulement les 5 derniers échanges au LLM,
       mais persiste tout l'historique dans un fichier JSON."""
    
    def __init__(self, file_path: str, k: int = 5):
        super().__init__(file_path)
        self.k = k
    
    @property
    def messages(self):
        all_messages = super().messages
        # Garde les derniers k échanges (k*2 messages)
        return all_messages[-(self.k * 2):]

Benefits:

✅ Limited context = constant token cost

✅ Complete persistent history (JSON)

✅ Automatic reset for each test session

4. Multi-LLM support with automatic tilting

The application automatically detects the available API key from among:

OpenAI: GPT-4o, GPT-3.5-Turbo

Groq: Llama-3, Mixtral

Google: Gemini Pro

5. Prevention of hallucinations through restraint

The prompt system includes strict constraints:

Constraints:

Only answer questions based on the documents provided.
Only answer questions based on the provided documents and conversation history.
If the user's question is not related to the documents, say
"The question is not answerable given the documents".
Never answer a question from your own knowledge.

6. Centralized configuration management

All parameters are configurable via environment variables (.env):

LLM Supplier Selection

Distance thresholds and top_k

Memory window size (k)

Logging levels

Data flow
Document ingestion flow

            ┌───────────────────────┐
            │   data/ directory     │
            │   Raw Text Documents  │
            │   (.txt files)        │
            └───────────┬───────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │   Document Loading    │
            │   UTF-8 File Reader   │
            └───────────┬───────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │     Text Chunking     │
            │ chunk_size = 500      │
            │ overlap = 50          │
            └───────────┬───────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │  Embedding Generation │
            │ SentenceTransformers  │
            │ 384-dim vectors       │
            └───────────┬───────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │      ChromaDB         │
            │     Vector Store      │
            │ Persistent Storage    │
            └───────────────────────┘

Request processing flow

            ┌───────────────────────┐
            │      User Query       │
            └───────────┬───────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │     Query Embedding   │
            └───────────┬───────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │   Vector Search       │
            │      ChromaDB         │
            │       top_k = 3       │
            └───────────┬───────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │   Retrieved Context   │
            └───────────┬───────────┘
                        │
                        ▼
    ┌───────────────────────────────┐
    │        Prompt Assembly        │
    │  (Context + User Question)    │
    └───────────┬───────────────────┘
                │
                ▼
        ┌──────────────────┐
        │   LLM Generation │
        │ (Grounded Answer)│
        └──────────┬───────┘
                   ▼
          ┌────────────────┐
          │    Response    │
          └────────────────┘

Conversation Memory
(last 5 exchanges)
│
└────────────► Injected into Prompt

Example of interaction

Example 1: Artificial Intelligence

Example 2: Follow-up questions (memory test)

Example 3: Quantum Computing

🚀 Quick Start

Follow these steps to install and run DocuChat-RAG on your machine.

1. Clone the repository

git clone https://github.com/Mcduval/docuchat-rag.git
cd docuchat-rag

2. Create a virtual environment

python -m venv venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows

3. Install the dependencies

pip install -r requirements.txt

4. Configure the API key

cp .env.example .env

Edit .env with your key (OpenAI, Groq or Google)

5. Add your documents

Place your .txt files in the data/ folder

6. Launch the wizard

python src/app.py

🧪 Test dataset and scenarios

Here are some examples of questions to ask the assistant, organized by topic:

🤖 Artificial Intelligence

Question
"What is Deep Learning?"
“Explain Natural Language Processing to me”
“What are the ethical challenges of AI?”
“What is the difference between supervised and unsupervised learning?”

🧬 Biotechnology

Question
"How does CRISPR-Cas9 work?"
“What is personalized medicine?”
"Explain synthetic biology"
“What are the ethical considerations in biotechnology?”

🌍 Climate Science

Question
“What is the greenhouse effect?”
“How does climate change affect ecosystems?”
“What are the main renewable energy solutions?”
“What is the difference between climate adaptation and mitigation?”

⚛️ Quantum Computing

Question
“What is quantum superposition?”
"What is Shor's algorithm used for?"
“What are the applications of quantum computing?”
“What are qubits and how do they differ from classical bits?”

🚀 Space Exploration

Question
“What are the goals of Mars exploration?”
“What is the International Space Station?”
“How do scientists discover exoplanets?”
“What are the terrestrial planets in our solar system?”

🌱 Sustainable Energy

Question
"How does solar energy work?"
"What is a smart grid?"
“Why is energy storage important for renewable energy?”
“What is green hydrogen and how is it produced?”

📊 Data Science

Question
“Explain the CRISP-DM methodology”
“What is feature engineering?”
“How do you evaluate a machine learning model?”
"What is MLOps?"

🔄 Follow-up questions (memory test)

Scenario	Description
CRISPR Follow-up	“Tell me about CRISPR” → “Who discovered it?” → “What are its medical applications?”
Quantum Computing Follow-up	“Explain quantum computing” → “What makes it different from classical computing?” → “What are the current hardware limitations?”

Current limitations and future improvements

🔴 Identified limits

Limit	Description
📄 Limited format support	Only `.txt` files are supported.
🖥️ No web interface	CLI interface only (no Streamlit)
👥 No multi-sessions	Only one conversation history at a time
☁️ LLM cloud only	No support for local models (Ollama, LM Studio)

🟢 Future improvements

Functionality	Status
📄 Support PDF, Markdown, DOCX	📋 Planned
🖥️ Web interface (Streamlit)	📋 Planned
👥 Multi-user sessions	📋 Planned
💻 Local LLM (Ollama, LM Studio)	📋 Planned
🔍 Hybrid search (BM25 + vector)	📋 Planned
📤 Export of conversations (PDF, TXT)	📋 Planned
Contact information

GitHub: https://github.com/Mcduval

LinkedIn: https://www.linkedin.com/in/mokemi-tchakoute-bautrel-duval

Email: bautrelduval@gmail.com

License
This project is distributed under the MIT license