This project implements a beginner-friendly, reproducible Retrieval-Augmented Generation (RAG) assistant built entirely in Jupyter Notebook.
The assistant ingests plain-text documents, splits them into overlapping chunks, embeds each chunk with sentence-transformers (all-MiniLM-L6-v2), and stores the vectors in a local ChromaDB collection. Natural-language queries are then answered by a compact local LLM (google/flan-t5-small) grounded in the retrieved chunks.
Key outcomes:

- Every answer is followed by a "Sources:" line for traceability.
- Deliverables: `rag_notebook.ipynb`, `data/*.txt` (knowledge base), `output_demo.txt` (sample Q&A log), and `requirements.txt`.

This submission demonstrates how an entry-level developer can build a traceable, testable RAG assistant and package it for evaluation in the AAIDC Module 1 review cycle.

```
rag_proj/
├── rag_notebook.ipynb   # Main notebook (ingest → embed → retrieve → generate)
├── data/                # Text documents used as knowledge base (publication*.txt)
├── output_demo.txt      # Demo Q&A log (generated by the notebook)
├── README.md
├── requirements.txt
└── .gitignore
```
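
Version pins are intentionally omitted below, since the committed `requirements.txt` is authoritative; a minimal dependency set consistent with the stack described above would be:

```
sentence-transformers   # all-MiniLM-L6-v2 embeddings
chromadb                # local vector store
transformers            # google/flan-t5-small generation
torch                   # backend required by both model libraries
nltk                    # sentence tokenization for chunking
```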
The pipeline has three stages:

1. Ingestion: .txt files placed in data/
2. Embeddings and vector store: each chunk is encoded with all-MiniLM-L6-v2 and written to a local ChromaDB collection (a minimal sketch follows this list)
3. Retrieval + answer generation: the top-matching chunks are retrieved and passed to google/flan-t5-small (sketched after the Results section)
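
For illustration, here is a minimal sketch of the embed-and-store stage. It assumes chunks produced by the chunk_text snippet below; the collection name, persistence path, example chunks, and metadata fields are placeholders, not the notebook's exact code.

```python
# Sketch of stage 2: embed chunks and store them in a local ChromaDB collection.
# Names like "chroma_db" and "docs" are illustrative placeholders.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")      # on-disk local store
collection = client.get_or_create_collection("docs")

chunks = [
    "RAG combines retrieval over a document store with text generation.",
    "Chroma persists embeddings locally without an external service.",
]
embeddings = model.encode(chunks).tolist()                # one vector per chunk

collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],       # unique ID per chunk
    embeddings=embeddings,
    documents=chunks,
    metadatas=[{"source": "publication1.txt"}] * len(chunks),  # enables "Sources:" lines
)
```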

Chunking code (snippet used in the notebook); by default it builds chunks of up to 300 words with a 50-word overlap between consecutive chunks:
```python
from nltk.tokenize import sent_tokenize  # requires a one-time nltk.download("punkt")

def chunk_text(text, max_words=300, overlap_words=50):
    """Split text into chunks of at most max_words words, carrying
    the trailing overlap_words words over between consecutive chunks."""
    sents = sent_tokenize(text)
    chunks, cur, cur_count = [], [], 0
    for sent in sents:
        w = len(sent.split())
        if cur_count + w <= max_words or not cur:
            # Sentence still fits (or the chunk is empty): keep accumulating.
            cur.append(sent)
            cur_count += w
        else:
            # Chunk is full: emit it, then seed the next chunk with the
            # trailing overlap words for context continuity.
            chunks.append(" ".join(cur))
            overlap = " ".join(" ".join(cur).split()[-overlap_words:]) if overlap_words > 0 else ""
            cur = [overlap] if overlap else []
            cur.append(sent)
            cur_count = len(" ".join(cur).split())
    if cur:
        chunks.append(" ".join(cur))
    return chunks
```

<!-- RT_DIVIDER -->

# Results

## Demo Q&A (selected excerpts)

The full demo log is in rag_proj/output_demo.txt. Example outputs captured during the demo:

Q: What is this publication about?
A: This publication explains how to build a Retrieval-Augmented Generation (RAG) assistant using local embeddings and ChromaDB, covering chunking, embeddings, retrieval, and answer generation.
Sources: publication1.txt

Q: Which tools are recommended in the documents?
A: The documents list Chroma (vector DB), sentence-transformers (embeddings), and Flan-T5 (local generation).
Sources: publication2.txt

Q: What limitation is mentioned?
A: The demo notes that the dataset is small and that retrieval accuracy depends on chunking and embedding quality.
Sources: publication1.txt
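
For reference, here is a minimal sketch of stage 3, retrieval plus answer generation, producing answers in the Q/A/Sources shape shown above. It assumes the ChromaDB collection was populated as in the earlier embed-and-store sketch; the prompt wording, top-k value, and citation formatting are illustrative rather than the notebook's exact cells.

```python
# Sketch of stage 3: retrieve top-k chunks and generate a grounded answer.
import chromadb
from sentence_transformers import SentenceTransformer
from transformers import pipeline

model = SentenceTransformer("all-MiniLM-L6-v2")
collection = chromadb.PersistentClient(path="chroma_db").get_or_create_collection("docs")
generator = pipeline("text2text-generation", model="google/flan-t5-small")

query = "What is this publication about?"
hits = collection.query(query_embeddings=model.encode([query]).tolist(), n_results=3)

context = "\n".join(hits["documents"][0])                     # retrieved chunk texts
sources = sorted({m["source"] for m in hits["metadatas"][0]})  # deduplicated citations

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {query}"
)
answer = generator(prompt, max_new_tokens=128)[0]["generated_text"]
print(f"A: {answer}\nSources: {', '.join(sources)}")
```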