RAG-Based Assistant for Kepler Student Handbook
Harerimana Honore
AAIDC2025
Overview
This project presents a Retrieval-Augmented Generation (RAG)–based AI assistant designed to help students easily access information from the Kepler College Student Handbook. Instead of manually searching through lengthy documents, students can ask questions in natural language and receive accurate, context-aware responses grounded directly in the handbook content.
The system combines semantic document retrieval with large language model (LLM) generation so that responses stay relevant and consistent with institutional policies as stated in the handbook. This project demonstrates a practical, deployable use of RAG techniques within an educational environment.
Project Objectives
The main objectives of this project are:
To provide students with an interactive AI assistant for fast and reliable access to student handbook information
To demonstrate the application of Retrieval-Augmented Generation in a real-world academic setting
To showcase how modern NLP tools can be deployed for academic support using lightweight web interfaces
System Architecture
The assistant follows a standard Retrieval-Augmented Generation (RAG) pipeline:
The Kepler Student Handbook is loaded and preprocessed.
The document is split into overlapping text chunks to preserve semantic context.
Each chunk is converted into a vector embedding and stored in a Chroma vector database.
User queries are embedded at runtime using the same embedding model.
A similarity search retrieves the top-k most relevant chunks.
Retrieved context is passed to a language model, which generates a grounded, context-aware response.
A system architecture diagram can be included here to visually illustrate the RAG workflow.
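The pipeline steps above can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: the bag-of-words `embed` function stands in for the real embedding model, the hypothetical `InMemoryIndex` class stands in for the Chroma collection, and the sample chunks are invented placeholder text rather than quotations from the handbook.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Hypothetical stand-in for the vector database: stores (chunk, vector) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        # Indexing time: embed every chunk once and keep the vector next to the text.
        for chunk in chunks:
            self.items.append((chunk, embed(chunk)))

    def query(self, question: str, k: int = 3) -> list[str]:
        # Query time: embed the question with the same function, then rank by similarity.
        q = embed(question)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


index = InMemoryIndex()
index.add([
    "Students must attend scheduled classes.",                          # placeholder text
    "Graduation requirements include completing all required credits.",  # placeholder text
    "The library is open on weekdays.",                                  # placeholder text
])
top = index.query("graduation requirements", k=1)
```

In the real system the retrieved chunks would then be placed into the prompt sent to the language model; here `top` simply holds the single best-matching placeholder chunk.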
Text Chunking Strategy
To balance retrieval precision and contextual completeness, the handbook text is divided using the following strategy:
Chunk size: approximately 500 tokens
Chunk overlap: approximately 50 tokens
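Chunks of this size are small enough to match a specific question yet large enough to carry a complete policy statement. The overlap repeats the end of each chunk at the start of the next, so a short passage that straddles a chunk boundary still appears intact in at least one chunk, reducing the risk of missing relevant policy information during retrieval.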
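A sliding window is one simple way to produce chunks with these settings. The sketch below is an assumption about how such a splitter could work, not the project's actual code: the function name `chunk_tokens` is hypothetical, and it operates on a pre-tokenized list so that any tokenizer (or a plain whitespace split as a rough approximation) can be used.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 500, overlap: int = 50) -> list[list[str]]:
    """Split a token list into windows of `chunk_size` that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # each new chunk starts this many tokens later
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


# Example with 1,200 dummy tokens: windows start at 0, 450, and 900.
tokens = [str(i) for i in range(1200)]
chunks = chunk_tokens(tokens)
```

With a chunk size of 500 and an overlap of 50, the window advances 450 tokens at a time, so the last 50 tokens of one chunk are the first 50 tokens of the next.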
Query Processing and Retrieval
When a user submits a question, the system:
Validates and normalizes the input
Converts the query into a vector embedding
Performs a similarity search against stored document embeddings
Retrieves the most relevant handbook sections
Injects the retrieved context into a structured prompt for response generation
This process is designed to keep generated answers grounded in the retrieved handbook content rather than in the model's general knowledge.
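The first and last steps of this flow, input validation and prompt construction, can be sketched as follows. The function names, the 500-character limit, and the prompt wording are all illustrative assumptions; the project's actual prompt template is not shown in this document.

```python
MAX_QUERY_CHARS = 500  # hypothetical limit, chosen only for illustration


def normalize_query(raw: str) -> str:
    """Collapse whitespace and reject empty or overly long questions."""
    query = " ".join(raw.split())
    if not query:
        raise ValueError("Question must not be empty.")
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError(f"Question must be at most {MAX_QUERY_CHARS} characters.")
    return query


def build_prompt(question: str, chunks: list[str]) -> str:
    """Place numbered handbook excerpts ahead of the question in a fixed template."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the handbook excerpts below. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Handbook excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Instructing the model to answer only from the supplied excerpts, and to say so when they are insufficient, is a common way to discourage unsupported answers, although it does not guarantee their absence.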
Retrieval Evaluation
Retrieval performance was evaluated through manual inspection and relevance checks, an approach suited to an early-stage RAG application.
Evaluation criteria included:
Relevance of retrieved text chunks
Accuracy and completeness of generated answers
Absence of hallucinated or unsupported information
Sample evaluation queries such as “graduation requirements” and “attendance policy” consistently returned relevant sections and accurate responses.
Future improvements may include automated evaluation metrics such as recall@k and similarity score thresholds.
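As a sketch of what such automated checks might look like, the functions below compute recall@k for one query and drop results under a similarity threshold. The function names and chunk IDs are hypothetical, and the labeled set of relevant chunks per query would have to be created by hand; no such dataset is described in this project.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant chunks that appear among the top-k retrieved chunks."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    hits = len(set(retrieved_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def filter_by_score(results: list[tuple[str, float]], min_score: float) -> list[tuple[str, float]]:
    """Keep only (chunk_id, similarity) pairs whose similarity meets the threshold."""
    return [(chunk_id, score) for chunk_id, score in results if score >= min_score]


# Hypothetical example: two chunks are labeled relevant, one is retrieved in the top 3.
score = recall_at_k(["c3", "c7", "c1"], {"c1", "c9"}, k=3)
kept = filter_by_score([("c3", 0.82), ("c7", 0.41), ("c1", 0.77)], min_score=0.5)
```

The threshold filter assumes higher scores mean closer matches. If the vector store reports distances instead, where lower is closer, the comparison would need to be reversed.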
Features
Natural language question answering
Semantic retrieval of relevant handbook sections
Context-aware answer generation using an LLM
Web-based user interface built with Streamlit
Secure configuration using environment variables
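Configuration through environment variables typically means reading secrets such as an LLM API key from the process environment instead of hard-coding them. The following is a minimal sketch of that pattern; the variable name `LLM_API_KEY` and the function `load_api_key` are assumptions for illustration, not names taken from the project.

```python
import os


def load_api_key(var_name: str = "LLM_API_KEY") -> str:
    """Read a secret from the environment and fail early if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before starting the app.")
    return key
```

Failing at startup with a clear message keeps a missing key from surfacing later as an opaque error during a user's first question, and keeps the secret itself out of the source repository.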
Project Scope and Limitations
Scope
Designed specifically for Kepler College Student Handbook question answering
Intended for educational and academic support use cases
Limitations
Manual retrieval evaluation
Single-document knowledge base
No query reranking or feedback loop
Future iterations may expand support to multiple documents and introduce automated evaluation and reranking strategies.
Conclusion
This project demonstrates how Retrieval-Augmented Generation can be effectively applied in educational environments to improve information accessibility and reduce cognitive load for students. By grounding AI-generated responses in official policy documents, the assistant provides reliable and trustworthy support for academic inquiries.
Repository
GitHub Repository:
https://github.com/Honore777/RAG_BASED_ASSISTANT