RAG-Powered Chatbot for Student powered by Groq & ChromaDB

RAG-Powered Student Chatbot with Groq & ChromaDB

Abstract

This project implements a Retrieval-Augmented Generation (RAG) chatbot designed as an AI tutor for students. It leverages Groq’s LLMs for fast inference, HuggingFace embeddings for semantic understanding, and ChromaDB for vector-based document retrieval. By ingesting custom knowledge bases (e.g., textbooks, lecture notes, PDFs), the chatbot delivers context-aware, accurate, and personalized responses. The system is modular, production-ready, and optimized for educational use cases.

Methodology

System Architecture

Screenshot 2025-09-16 104030.png

The chatbot follows a standard RAG pipeline:

User Query: A student asks a question.
Embedding: The query is converted into a vector using a HuggingFace embedding model.
Retrieval: The vector is used to perform a similarity search in the ChromaDB vector store, which contains chunks of the course material.
Augmentation: The most relevant text chunks are retrieved and inserted into a pre-defined prompt as context.
Generation: The augmented prompt is sent to a powerful language model via Groq, which generates a final, context-rich answer.

![RAG Architecture Diagram]

Data Ingestion
- PDF documents are ingested via rag_ingest.py.
- Text is split into chunks using RecursiveCharacterTextSplitter.
- Chunks are embedded with HuggingFace embeddings and persisted in ChromaDB.
Chatbot Pipeline (student_chatbot.py)
- Loads system + prompt configurations (config/).
- Initializes ChatGroq LLM with controlled temperature.
- Retrieves context from ChromaDB using semantic similarity.
- Injects retrieved context into the conversation for grounded answers.
Logging & Debugging
- Uses color-coded logging (Colorama) with console + rotating file handlers.
- Captures unhandled exceptions for reliability.
Conversation Flow
- Maintains conversation history with truncation for efficiency.
- Supports custom tone, constraints, and role definition from YAML config.

Model Selection Rationale

The choice of models is critical to the performance, cost, and speed of the system.

Embedding Model: all-MiniLM-L6-v2
This sentence-transformers model was selected for its excellent balance between performance and computational efficiency. It maps sentences and paragraphs to a 384-dimensional dense vector space and is well-suited for semantic search tasks like those in this RAG system. Its small size makes it fast for both encoding documents and querying the vector database, which is ideal for a responsive chatbot.
Large Language Model: Mixtral-8x7b via Groq
The Mixtral-8x7B model is a high-quality, open-weight Sparse Mixture of Experts (SMoE) model known for its strong reasoning and instruction-following capabilities. It was chosen for its state-of-the-art open-weight performance. We access it via the Groq API primarily for its unparalleled inference speed. Groq's LPU™ inference engine delivers near-instantaneous responses, which is a fundamental requirement for creating an engaging and natural conversational experience for students, eliminating frustrating wait times.

Chunking Strategy

Effective chunking is essential for retrieving meaningful context. We use LangChain's RecursiveCharacterTextSplitter to split documents by paragraphs while respecting natural language boundaries like \n\n.

Chunk Size: 500 characters. This size was chosen to be large enough to contain a coherent idea or a full paragraph of context, but small enough to be precise and fit into the LLM's context window alongside the user's query and conversation history.
Chunk Overlap: 50characters. Overlap is a crucial strategy to prevent context fragmentation. It ensures that if a key concept is split between two chunks, it is preserved in both. This significantly improves the retrieval quality by providing the language model with complete semantic information, reducing the chance of missing relevant data that was cut off at a chunk's edge.

Results

✅ Successfully ingests PDF knowledge bases into vector storage.
✅ Provides contextual, accurate answers aligned with student learning.
✅ Supports role-driven prompts (e.g., AI Tutor, Mentor, Explainer).
✅ Logging improves debuggability and system transparency.

Maintenance & Support

This project is actively maintained. The current implementation is built with the following core dependencies:

langchain-core==0.1.x
chromadb==0.4.x
groq==0.3.x

For support, please open an issue on the project's GitHub repository. Contributions, bug reports, and feature suggestions are welcome. The MIT License allows for free use, modification, and distribution, provided the license is included.

Future Work

🚀 Multi-user Persistence: Integrate Postgres or MongoDB to store and separate conversation histories for different users.
API Deployment: Package the chatbot as a REST API using FastAPI for easy integration into other educational platforms.
Evaluation Metrics: Implement retrieval evaluation metrics (e.g., Hit Rate, Mean Reciprocal Rank) to quantitatively measure the system's performance and guide improvements.
Advanced Query Processing: Incorporate query expansion and re-writing techniques to improve retrieval quality for poorly phrased questions.

Directory Overview

week3/
│── student_chatbot.py        # Chatbot loop with Groq + Chroma
│── rag_ingest.py             # PDF ingestion + vectorization
│── helper.py                 # Utility functions
│── paths.py                  # Config paths
│── data/                     # PDF knowledge base
│── student_knowledge_base/   # Persisted ChromaDB store
│── config/                   # App + Prompt YAML configs
│── LICENSE.md (MIT)          # Open-source license
│── README.md                 # Documentation

Usage

# Step 1: Ingest PDFs into ChromaDB
python rag_ingest.py

# Step 2: Run the chatbot
python student_chatbot.py

Type exit to quit the chatbot.

License

This project is released under the MIT License, encouraging open collaboration, adaptation, and deployment in academic and research contexts.

RAG-Powered Student Chatbot with Groq & ChromaDB

Abstract

Methodology

System Architecture

Screenshot 2025-09-16 104030.png

The chatbot follows a standard RAG pipeline:

User Query: A student asks a question.
Embedding: The query is converted into a vector using a HuggingFace embedding model.
Retrieval: The vector is used to perform a similarity search in the ChromaDB vector store, which contains chunks of the course material.
Augmentation: The most relevant text chunks are retrieved and inserted into a pre-defined prompt as context.
Generation: The augmented prompt is sent to a powerful language model via Groq, which generates a final, context-rich answer.

![RAG Architecture Diagram]

Data Ingestion
- PDF documents are ingested via rag_ingest.py.
- Text is split into chunks using RecursiveCharacterTextSplitter.
- Chunks are embedded with HuggingFace embeddings and persisted in ChromaDB.
Chatbot Pipeline (student_chatbot.py)
- Loads system + prompt configurations (config/).
- Initializes ChatGroq LLM with controlled temperature.
- Retrieves context from ChromaDB using semantic similarity.
- Injects retrieved context into the conversation for grounded answers.
Logging & Debugging
- Uses color-coded logging (Colorama) with console + rotating file handlers.
- Captures unhandled exceptions for reliability.
Conversation Flow
- Maintains conversation history with truncation for efficiency.
- Supports custom tone, constraints, and role definition from YAML config.

Model Selection Rationale

The choice of models is critical to the performance, cost, and speed of the system.

Embedding Model: all-MiniLM-L6-v2
This sentence-transformers model was selected for its excellent balance between performance and computational efficiency. It maps sentences and paragraphs to a 384-dimensional dense vector space and is well-suited for semantic search tasks like those in this RAG system. Its small size makes it fast for both encoding documents and querying the vector database, which is ideal for a responsive chatbot.
Large Language Model: Mixtral-8x7b via Groq
The Mixtral-8x7B model is a high-quality, open-weight Sparse Mixture of Experts (SMoE) model known for its strong reasoning and instruction-following capabilities. It was chosen for its state-of-the-art open-weight performance. We access it via the Groq API primarily for its unparalleled inference speed. Groq's LPU™ inference engine delivers near-instantaneous responses, which is a fundamental requirement for creating an engaging and natural conversational experience for students, eliminating frustrating wait times.

Chunking Strategy

Chunk Size: 500 characters. This size was chosen to be large enough to contain a coherent idea or a full paragraph of context, but small enough to be precise and fit into the LLM's context window alongside the user's query and conversation history.
Chunk Overlap: 50characters. Overlap is a crucial strategy to prevent context fragmentation. It ensures that if a key concept is split between two chunks, it is preserved in both. This significantly improves the retrieval quality by providing the language model with complete semantic information, reducing the chance of missing relevant data that was cut off at a chunk's edge.

Results

✅ Successfully ingests PDF knowledge bases into vector storage.
✅ Provides contextual, accurate answers aligned with student learning.
✅ Supports role-driven prompts (e.g., AI Tutor, Mentor, Explainer).
✅ Logging improves debuggability and system transparency.

Maintenance & Support

This project is actively maintained. The current implementation is built with the following core dependencies:

langchain-core==0.1.x
chromadb==0.4.x
groq==0.3.x

Future Work

🚀 Multi-user Persistence: Integrate Postgres or MongoDB to store and separate conversation histories for different users.
API Deployment: Package the chatbot as a REST API using FastAPI for easy integration into other educational platforms.
Evaluation Metrics: Implement retrieval evaluation metrics (e.g., Hit Rate, Mean Reciprocal Rank) to quantitatively measure the system's performance and guide improvements.
Advanced Query Processing: Incorporate query expansion and re-writing techniques to improve retrieval quality for poorly phrased questions.

Directory Overview

week3/
│── student_chatbot.py        # Chatbot loop with Groq + Chroma
│── rag_ingest.py             # PDF ingestion + vectorization
│── helper.py                 # Utility functions
│── paths.py                  # Config paths
│── data/                     # PDF knowledge base
│── student_knowledge_base/   # Persisted ChromaDB store
│── config/                   # App + Prompt YAML configs
│── LICENSE.md (MIT)          # Open-source license
│── README.md                 # Documentation

Usage

# Step 1: Ingest PDFs into ChromaDB
python rag_ingest.py

# Step 2: Run the chatbot
python student_chatbot.py

Type exit to quit the chatbot.

License

This project is released under the MIT License, encouraging open collaboration, adaptation, and deployment in academic and research contexts.

RAG-Powered Chatbot for Student powered by Groq & ChromaDB

Table of contents

RAG-Powered Student Chatbot with Groq & ChromaDB

Abstract

Methodology

System Architecture

Model Selection Rationale

Chunking Strategy

Results

Maintenance & Support

Future Work

Directory Overview

Usage

License

Table of contents

RAG-Powered Student Chatbot with Groq & ChromaDB

Abstract

Methodology

System Architecture

Model Selection Rationale

Chunking Strategy

Results

Maintenance & Support

Future Work

Directory Overview

Usage

License

Code

Code