
We present Tutor Assistant API, a production-ready Django REST API that combines robust user authentication, profile management, and a Retrieval-Augmented Generation (RAG) assistant. The system leverages LangChain, FAISS, and large language models (LLMs) such as Mistral 7B to deliver context-aware, textbook-grounded answers to user queries. Designed for extensibility and reproducibility, the project features modular code, Dockerized deployment, and comprehensive documentation, enabling rapid integration and experimentation for educational and research applications.
Retrieval-Augmented Generation (RAG) is a powerful paradigm for building LLM-powered assistants that can answer questions using up-to-date, domain-specific knowledge without retraining the model. Tutor Assistant API demonstrates a practical, end-to-end RAG system for educational use cases, enabling secure, authenticated access to a chat endpoint that returns LLM-generated answers grounded in ingested textbook content.
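Conceptually, the RAG loop is simple: embed the question, retrieve the nearest chunks from the vector store, and pass the context plus the question to the LLM. A minimal, framework-free sketch of that loop (the `embed` and `llm` callables stand in for all-MiniLM-L6-v2 and Mistral 7B, and a brute-force search stands in for FAISS, for illustration only):

```python
def distance(a, b):
    # Squared Euclidean distance between two embedding vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def retrieve(query_vec, index, k=3):
    # index: list of (vector, chunk_text) pairs; brute-force nearest-neighbour
    # search standing in for a FAISS similarity search.
    scored = sorted(index, key=lambda item: distance(query_vec, item[0]))
    return [text for _, text in scored[:k]]

def answer(question, index, embed, llm):
    # Ground the LLM's answer in the most relevant retrieved chunks.
    context = "\n\n".join(retrieve(embed(question), index))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)
```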
This publication details the system’s architecture, technical decisions, configuration, and deployment, providing a reproducible template for similar RAG-based solutions in education and research.
The Tutor Assistant API consists of two main components: a Django REST backend that provides user authentication and profile management, and a RAG pipeline that answers questions grounded in ingested textbook content.

The RAG pipeline works as follows:

- Chunking: Documents are split with LangChain's RecursiveCharacterTextSplitter. The chunk size is configurable via the CHUNK_SIZE environment variable, defined in .env and loaded in settings.py (default: 10,000 characters), which allows tuning for different document types and LLM context windows.
- Embedding and indexing: Chunks are embedded with the sentence-transformers/all-MiniLM-L6-v2 model and stored in a FAISS vector database.
- LLM configuration: The model endpoint is configured via environment variables (LLM_API_URL, LLM_MODEL, LLM_API_KEY).

The prompt used by the RAG assistant is fully customizable. By default, the system uses the template provided in prompt_template.txt. If you wish to customize the prompt, simply copy this file to prompt.txt and edit it as needed:
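For example, from the repository root (the guard keeps the command safe to re-run):

```shell
# Create an editable copy; the template itself stays untouched as a fallback.
if [ -f prompt_template.txt ]; then cp prompt_template.txt prompt.txt; fi
```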
You can then modify prompt.txt to change the instructions, tone, or add/remove variables. The placeholders {context} and {user_input} will be dynamically replaced at runtime with the retrieved context from the vector database and the user's question, respectively. This step is optional—if prompt.txt is not present, the system will automatically fall back to using prompt_template.txt.
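A sketch of how this fallback and the placeholder substitution can be implemented (file names as described above; the helper names are illustrative, not the project's actual API):

```python
import os

def load_prompt_template() -> str:
    # Prefer the user-edited prompt.txt; otherwise fall back to the default.
    path = "prompt.txt" if os.path.exists("prompt.txt") else "prompt_template.txt"
    with open(path, encoding="utf-8") as f:
        return f.read()

def build_prompt(template: str, context: str, user_input: str) -> str:
    # {context} and {user_input} are replaced at runtime; str.replace is used
    # instead of str.format so other braces in the template survive as-is.
    return template.replace("{context}", context).replace("{user_input}", user_input)
```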
```python
# settings.py
import os

CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", 10000))

# utils.py
from django.conf import settings
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=settings.CHUNK_SIZE, chunk_overlap=50
)
```
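To illustrate what these settings do, here is a dependency-free sketch of fixed-size chunking with overlap (RecursiveCharacterTextSplitter additionally tries to break on separators such as paragraphs and sentences, which this sketch omits):

```python
def chunk_text(text: str, chunk_size: int = 10_000, overlap: int = 50) -> list[str]:
    # Each chunk starts chunk_size - overlap characters after the previous one,
    # so consecutive chunks share `overlap` characters of context.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```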
The API exposes the following endpoints:

- POST /api/register/ — Register (returns access & refresh tokens)
- POST /api/login/ — Login (returns access & refresh tokens)
- POST /api/logout/ — Logout (blacklists the refresh token)
- POST /api/token/refresh/ — Get a new access token
- GET /api/profile/ — Get user details (JWT required)
- POST /api/rag/chat/ — Send a prompt, get an LLM answer with retrieved context

To deploy, configure your environment in .env, then build and start the stack:

```bash
docker compose up --build
```
Then ingest the textbook content into the vector store:

```bash
docker compose exec api-rag-assistant python manage.py ingest
```
If your LLM endpoint, model, or other settings differ, update .env accordingly. An interactive Jupyter notebook (usage_example.ipynb) demonstrates API usage, including registration, authentication, and chat queries.
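The same flow can also be scripted directly. A standard-library-only sketch of a client (the host, port, and JSON field names such as "prompt" are assumptions; adjust to your deployment):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed host/port

def api_request(path, payload=None, token=None):
    # Build a Request for the API: POST with a JSON body, GET otherwise.
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"  # JWT bearer auth
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers,
                                  method="POST" if data else "GET")

# Typical flow against a running server (field names are assumptions):
#   creds = {"username": "alice", "password": "s3cret"}
#   tokens = json.load(urllib.request.urlopen(api_request("/api/login/", creds)))
#   chat = api_request("/api/rag/chat/", {"prompt": "Explain inertia."},
#                      token=tokens["access"])
#   print(json.load(urllib.request.urlopen(chat)))
```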
The Tutor Assistant API enables secure, authenticated access to a RAG-powered chat endpoint, returning LLM-generated answers grounded in ingested textbook content. In local and containerized tests, the system successfully handled user registration, login, token refresh, and profile retrieval, as well as context-aware question answering. The API demonstrated reliable integration with LM Studio running Mistral 7B, with latency and throughput suitable for interactive educational use.