
We present Tutor Assistant API, a production-ready Django REST API that combines robust user authentication, profile management, and a Retrieval-Augmented Generation (RAG) assistant. The system leverages LangChain, FAISS, and large language models (LLMs) such as Mistral 7B to deliver context-aware, textbook-grounded answers to user queries. Designed for extensibility and reproducibility, the project features modular code, Dockerized deployment, and comprehensive documentation, enabling rapid integration and experimentation for educational and research applications.
Retrieval-Augmented Generation (RAG) is a powerful paradigm for building LLM-powered assistants that can answer questions using up-to-date, domain-specific knowledge without retraining the model. Tutor Assistant API demonstrates a practical, end-to-end RAG system for educational use cases, enabling secure, authenticated access to a chat endpoint that returns LLM-generated answers grounded in ingested textbook content.
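Conceptually, the RAG loop is simple: embed the question, retrieve the nearest chunks from the vector store, and pass the context plus the question to the LLM. A minimal, framework-free sketch of that loop (the `embed` and `llm` callables stand in for all-MiniLM-L6-v2 and Mistral 7B, and a brute-force search stands in for FAISS, for illustration only):

```python
def distance(a, b):
    # Squared Euclidean distance between two embedding vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def retrieve(query_vec, index, k=3):
    # index: list of (vector, chunk_text) pairs; brute-force nearest-neighbour
    # search standing in for a FAISS similarity search.
    scored = sorted(index, key=lambda item: distance(query_vec, item[0]))
    return [text for _, text in scored[:k]]

def answer(question, index, embed, llm):
    # Ground the LLM's answer in the most relevant retrieved chunks.
    context = "\n\n".join(retrieve(embed(question), index))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)
```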
This publication details the system’s architecture, technical decisions, configuration, and deployment, providing a reproducible template for similar RAG-based solutions in education and research.
The Tutor Assistant API consists of two main components: a Django REST backend that provides user authentication and profile management, and a RAG pipeline that answers questions grounded in ingested textbook content.

The RAG pipeline works as follows:

- Chunking: Documents are split with LangChain's RecursiveCharacterTextSplitter. The chunk size is configurable via the CHUNK_SIZE environment variable, defined in .env and loaded in settings.py (default: 10,000 characters), which allows tuning for different document types and LLM context windows.
- Embedding and indexing: Chunks are embedded with the sentence-transformers/all-MiniLM-L6-v2 model and stored in a FAISS vector database.
- LLM configuration: The model endpoint is configured via environment variables (LLM_API_URL, LLM_MODEL, LLM_API_KEY).

The prompt used by the RAG assistant is fully customizable. By default, the system uses the template provided in prompt_template.txt. If you wish to customize the prompt, simply copy this file to prompt.txt and edit it as needed:
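For example, from the repository root (the guard keeps the command safe to re-run):

```shell
# Create an editable copy; the template itself stays untouched as a fallback.
if [ -f prompt_template.txt ]; then cp prompt_template.txt prompt.txt; fi
```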
You can then modify prompt.txt to change the instructions, tone, or add/remove variables. The placeholders {context} and {user_input} will be dynamically replaced at runtime with the retrieved context from the vector database and the user's question, respectively. This step is optional—if prompt.txt is not present, the system will automatically fall back to using prompt_template.txt.
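A sketch of how this fallback and the placeholder substitution can be implemented (file names as described above; the helper names are illustrative, not the project's actual API):

```python
import os

def load_prompt_template() -> str:
    # Prefer the user-edited prompt.txt; otherwise fall back to the default.
    path = "prompt.txt" if os.path.exists("prompt.txt") else "prompt_template.txt"
    with open(path, encoding="utf-8") as f:
        return f.read()

def build_prompt(template: str, context: str, user_input: str) -> str:
    # {context} and {user_input} are replaced at runtime; str.replace is used
    # instead of str.format so other braces in the template survive as-is.
    return template.replace("{context}", context).replace("{user_input}", user_input)
```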
```python
# settings.py
import os

CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", 10000))

# utils.py
from django.conf import settings
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=settings.CHUNK_SIZE, chunk_overlap=50
)
```
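To illustrate what these settings do, here is a dependency-free sketch of fixed-size chunking with overlap (RecursiveCharacterTextSplitter additionally tries to break on separators such as paragraphs and sentences, which this sketch omits):

```python
def chunk_text(text: str, chunk_size: int = 10_000, overlap: int = 50) -> list[str]:
    # Each chunk starts chunk_size - overlap characters after the previous one,
    # so consecutive chunks share `overlap` characters of context.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```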
The API exposes the following endpoints:

- POST /api/register/ — Register (returns access & refresh tokens)
- POST /api/login/ — Login (returns access & refresh tokens)
- POST /api/logout/ — Logout (blacklists the refresh token)
- POST /api/token/refresh/ — Get a new access token
- GET /api/profile/ — Get user details (JWT required)
- POST /api/rag/chat/ — Send a prompt, get an LLM answer with retrieved context

To deploy, configure your environment in .env, then build and start the stack:

```bash
docker compose up --build
```
Then ingest the textbook content into the vector store:

```bash
docker compose exec api-rag-assistant python manage.py ingest
```
If your LLM endpoint, model, or other settings differ, update .env accordingly. An interactive Jupyter notebook (usage_example.ipynb) demonstrates API usage, including registration, authentication, and chat queries.
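The same flow can also be scripted directly. A standard-library-only sketch of a client (the host, port, and JSON field names such as "prompt" are assumptions; adjust to your deployment):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed host/port

def api_request(path, payload=None, token=None):
    # Build a Request for the API: POST with a JSON body, GET otherwise.
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"  # JWT bearer auth
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers,
                                  method="POST" if data else "GET")

# Typical flow against a running server (field names are assumptions):
#   creds = {"username": "alice", "password": "s3cret"}
#   tokens = json.load(urllib.request.urlopen(api_request("/api/login/", creds)))
#   chat = api_request("/api/rag/chat/", {"prompt": "Explain inertia."},
#                      token=tokens["access"])
#   print(json.load(urllib.request.urlopen(chat)))
```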
The Tutor Assistant API enables secure, authenticated access to a RAG-powered chat endpoint, returning LLM-generated answers grounded in ingested textbook content. In local and containerized tests, the system successfully handled user registration, login, token refresh, and profile retrieval, as well as context-aware question answering. The API demonstrated reliable integration with LM Studio running Mistral 7B, with latency and throughput suitable for interactive educational use.