RAG BASED ASSISTANT FOR POLICIES

RAG-Based Assistant for Kepler Student Handbook

Harerimana Honore
AAIDC2025

Overview

This project presents a Retrieval-Augmented Generation (RAG)-based AI assistant designed to help students easily access information from the Kepler College Student Handbook. Instead of manually searching through lengthy documents, students can ask questions in natural language and receive accurate, context-aware responses grounded directly in the handbook content.

The system combines semantic document retrieval with large language model (LLM) generation, so that responses stay relevant to the question and consistent with the institutional policies stated in the handbook. This project demonstrates a practical and deployable use of RAG techniques within an educational environment.

Project Objectives

The main objectives of this project are:

To provide students with an interactive AI assistant for fast and reliable access to student handbook information

To demonstrate the application of Retrieval-Augmented Generation in a real-world academic setting

To showcase how modern NLP tools can be deployed for academic support using lightweight web interfaces

System Architecture

The assistant follows a standard Retrieval-Augmented Generation (RAG) pipeline:

The Kepler Student Handbook is loaded and preprocessed.

The document is split into overlapping text chunks to preserve semantic context.

Each chunk is converted into a vector embedding and stored in a Chroma vector database.

User queries are embedded at runtime using the same embedding model.

A similarity search retrieves the top-k most relevant chunks.

Retrieved context is passed to a language model, which generates a grounded, context-aware response.
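The retrieval half of the pipeline above can be sketched in a few lines of plain Python. This is an illustration only, not the project's implementation: the project stores embeddings in Chroma and uses a real embedding model, while the sketch below swaps both for a toy bag-of-words vector and an in-memory list. The names embed, cosine, build_index, retrieve, and SAMPLE_CHUNKS are hypothetical, and the sample chunks are placeholder text, not handbook content.

```python
import math
import re
from collections import Counter

# Placeholder chunks for illustration only; NOT real handbook text.
SAMPLE_CHUNKS = [
    "Placeholder: attendance policy covers class sessions and absences.",
    "Placeholder: graduation requirements cover credits and projects.",
    "Placeholder: library hours and borrowing rules.",
]

def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_index(chunks):
    """Embed every chunk once (stand-in for writing to a vector database)."""
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(index, query, k=2):
    """Embed the query the same way as the chunks and return the top-k (score, chunk) pairs."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

if __name__ == "__main__":
    index = build_index(SAMPLE_CHUNKS)
    for score, chunk in retrieve(index, "attendance policy", k=2):
        print(f"{score:.2f}  {chunk}")
```

The key property the sketch preserves is that queries and chunks go through the same embedding function, so their vectors are comparable. The retrieved chunks would then be passed to the language model as context.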

Text Chunking Strategy

To balance retrieval precision and contextual completeness, the handbook text is divided using the following strategy:

Chunk size: approximately 500 tokens

Chunk overlap: approximately 50 tokens

This approach improves semantic retrieval accuracy while preserving continuity across sections, reducing the risk of missing relevant policy information during retrieval.
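As a rough sketch of how a 500-token window with a 50-token overlap behaves, the function below slides over a token list with a step of chunk size minus overlap. It assumes whitespace-separated words as a stand-in for tokens; the project's actual splitter and tokenizer are not specified here, and the names chunk_tokens and chunk_text are hypothetical.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # each new chunk starts this many tokens later
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

def chunk_text(text, chunk_size=500, overlap=50):
    """Assumption: whitespace-separated words approximate tokens."""
    words = text.split()
    return [" ".join(chunk) for chunk in chunk_tokens(words, chunk_size, overlap)]
```

With these defaults, a 1,200-token document yields three chunks starting at tokens 0, 450, and 900, and the last 50 tokens of each chunk reappear at the start of the next one. A sentence that straddles a chunk boundary therefore still appears intact in at least one chunk, provided it is shorter than the overlap.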

Query Processing and Retrieval

When a user submits a question, the system:

Validates and normalizes the input

Converts the query into a vector embedding

Performs a similarity search against stored document embeddings

Retrieves the most relevant handbook sections

Injects the retrieved context into a structured prompt for response generation

This process is designed to keep generated answers grounded in the retrieved handbook content rather than in the model's general knowledge.
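The first and last steps above (input validation and prompt construction) can be sketched as follows. This is a minimal illustration under stated assumptions: the length limit, the prompt wording, and the names MAX_QUERY_CHARS, normalize_query, and build_prompt are hypothetical and are not taken from the project's code.

```python
import re

MAX_QUERY_CHARS = 500  # hypothetical limit, chosen only for illustration

def normalize_query(raw):
    """Collapse whitespace and reject empty or oversized input."""
    if not isinstance(raw, str):
        raise ValueError("query must be a string")
    cleaned = re.sub(r"\s+", " ", raw).strip()
    if not cleaned:
        raise ValueError("query is empty")
    if len(cleaned) > MAX_QUERY_CHARS:
        raise ValueError("query is too long")
    return cleaned

def build_prompt(question, context_chunks):
    """Place numbered handbook excerpts ahead of the question in a fixed template."""
    numbered = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the handbook excerpts below. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Handbook excerpts:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Telling the model to admit when the excerpts do not contain the answer is one common way to discourage unsupported responses; it lowers the risk but does not eliminate it.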

Retrieval Evaluation

The retrieval performance of the system was evaluated using manual inspection and relevance checks, which is appropriate for an early-stage RAG application.

Evaluation criteria included:

Relevance of retrieved text chunks

Accuracy and completeness of generated answers

Absence of hallucinated or unsupported information

Sample evaluation queries such as “graduation requirements” and “attendance policy” consistently returned relevant sections and accurate responses.

Future improvements may include automated evaluation metrics such as recall@k, along with similarity score thresholds to filter out weakly matching chunks.
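For reference, recall@k is the fraction of the chunks judged relevant to a query that appear among the top k retrieved results. A minimal sketch, assuming each chunk has an identifier and that a small hand-labeled set of relevant chunk IDs exists per query (the project does not currently have such labels; the function names and IDs below are hypothetical):

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant chunk IDs found in the first k retrieved IDs."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    hits = relevant & set(retrieved_ids[:k])
    return len(hits) / len(relevant)

def filter_by_score(scored_chunks, min_score):
    """Keep only (score, chunk) pairs whose similarity meets the threshold."""
    return [(score, chunk) for score, chunk in scored_chunks if score >= min_score]

if __name__ == "__main__":
    # Hypothetical IDs: the retriever returned c3, c7, c1; c1 and c3 are relevant.
    print(recall_at_k(["c3", "c7", "c1"], {"c1", "c3"}, k=2))  # 0.5
    print(recall_at_k(["c3", "c7", "c1"], {"c1", "c3"}, k=3))  # 1.0
```

Averaging recall@k over a set of labeled queries gives a single number that can be tracked as chunk size, overlap, or k are changed.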

Features

Natural language question answering

Semantic retrieval of relevant handbook sections

Context-aware answer generation using an LLM

Web-based user interface built with Streamlit

Secure configuration using environment variables
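As an illustration of the last feature, configuration through environment variables usually means reading secrets such as an LLM API key at startup instead of hard-coding them. The sketch below uses only the standard library; the variable name LLM_API_KEY and the function load_api_key are hypothetical, and the project's actual variable names may differ.

```python
import os

def load_api_key(var_name="LLM_API_KEY", environ=None):
    """Read a secret from the environment and fail early if it is missing."""
    env = os.environ if environ is None else environ
    value = env.get(var_name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return value
```

Failing at startup with a clear message tends to be easier to debug than an authentication error raised later, in the middle of a user's request.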

Project Scope and Limitations

Scope

Designed specifically for Kepler College Student Handbook question answering

Intended for educational and academic support use cases

Limitations

Manual retrieval evaluation

Single-document knowledge base

No query reranking or feedback loop

Future iterations may expand support to multiple documents and introduce automated evaluation and reranking strategies.

Conclusion

This project demonstrates how Retrieval-Augmented Generation can be effectively applied in educational environments to improve information accessibility and reduce cognitive load for students. By grounding AI-generated responses in official policy documents, the assistant provides reliable and trustworthy support for academic inquiries.

Repository

GitHub Repository:
https://github.com/Honore777/RAG_BASED_ASSISTANT
