This project implements a simple Retrieval-Augmented Generation (RAG) assistant designed for educational purposes. It runs without external APIs, GPU resources, or deep learning frameworks, making it an accessible tool for understanding core RAG principles. The system uses scikit-learn's TF-IDF vectorization for document representation, cosine similarity for retrieval, and a rule-based mechanism for generating contextually relevant responses.
The implementation processes text documents, transforms them into sparse vector representations, and performs similarity-based search to gather pertinent information. It then produces answers by blending rule-based knowledge with the content of the retrieved documents. Key features include automatic sample-document creation for quick setup, conversation history tracking, optional FAISS integration for faster search, and detailed logging for educational transparency. Requiring only Python's standard library alongside scikit-learn, this RAG assistant is a practical learning platform for environments with limited computational resources.
Retrieval-Augmented Generation (RAG) systems combine document retrieval with response generation to deliver grounded, contextually relevant answers. This project introduces a lightweight RAG system designed to teach these fundamental concepts without the complexity and resource demands of production-grade systems.
The primary focus of this implementation is educational transparency rather than raw performance. Every operation in the pipeline is accompanied by detailed logging, processing times are tracked and displayed, and the modular architecture lets students examine, understand, and modify individual components. This hands-on approach helps in grasping the mechanics of RAG systems rather than merely calling abstract, black-box APIs.
The system addresses common educational hurdles in RAG: the cost of external APIs, the need for complex infrastructure, and existing implementations that hide fundamental operations behind layers of abstraction. By relying only on accessible techniques, TF-IDF for vectorization and rule-based methods for generation, students can follow every step of the RAG process.

The RAG assistant is built on a robust four-stage pipeline that transparently demonstrates the entire RAG process, from raw text to intelligent response.
The process begins with the load_documents_from_directory method, which efficiently scans and processes .txt files located in a specified directory. For each file, the method creates comprehensive Document objects, embedding not only the raw text content but also crucial metadata such as the source file paths and original filenames. This foundational step ensures that all textual information is properly organized and accessible for subsequent stages of the RAG pipeline.
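The Document wrapper and loader are not reproduced in this section; a minimal sketch consistent with the description (the class layout and field names are assumptions, not the project's exact code) could look like:

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Document:
    # Raw text plus metadata (source path and filename), as described above
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)

def load_documents_from_directory(directory: str) -> List[Document]:
    """Scan a directory for .txt files and wrap each one in a Document."""
    documents = []
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".txt"):
            continue  # only plain-text files are ingested
        path = os.path.join(directory, filename)
        with open(path, "r", encoding="utf-8") as f:
            text = f.read()
        documents.append(Document(content=text,
                                  metadata={"source": path,
                                            "filename": filename}))
    return documents
```

Keeping the source path and filename in metadata is what later allows each retrieved chunk to be traced back to its originating file.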
Following ingestion, the SimpleTextSplitter class divides the loaded documents into manageable chunks. The implementation preserves semantic coherence by splitting text along sentence boundaries (at the punctuation marks . ! ?). It targets a chunk size of approximately 1000 characters, with a 200-character overlap between consecutive chunks; the overlap preserves context across segment boundaries. Each chunk is enriched with metadata, including a unique chunk ID and source information, for traceability throughout the system.
The core of the retrieval mechanism lies in the Vector Indexing stage, where the system employs scikit-learn's TfidfVectorizer. This vectorizer is configured with specific parameters to optimize representation: a maximum of 5000 features is set to manage vocabulary size effectively, an N-gram range of (1,2) is used to capture both single words (unigrams) and common phrases (bigrams), and standard English stop words are removed to focus on more meaningful terms. The TfidfVectorizer automatically builds its vocabulary from the entire document collection. Once configured, each processed document chunk is transformed into a sparse TF-IDF vector, which is then stored in an index optimized for rapid similarity search.
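The indexing configuration described above can be reproduced in a few lines of scikit-learn; the sample chunks here are placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

chunks = [
    "Retrieval-augmented generation combines search with generation.",
    "TF-IDF weights terms by frequency in a document and rarity overall.",
    "Cosine similarity compares vectors independent of their length.",
]

# Same configuration as described above: capped vocabulary,
# unigrams + bigrams, English stop words removed.
vectorizer = TfidfVectorizer(max_features=5000,
                             ngram_range=(1, 2),
                             stop_words="english")

# fit_transform builds the vocabulary from the whole collection
# and returns one sparse TF-IDF row per chunk.
document_vectors = vectorizer.fit_transform(chunks)
print(document_vectors.shape)  # (3, vocabulary_size)
```

The resulting sparse matrix is the index: retrieval later reduces to comparing a query row against these rows.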
Upon receiving a user query, the system begins the Query Processing stage. The query is first converted into a TF-IDF vector using the same vocabulary built during document indexing, so that query and document vectors share the same dimensional space and can be compared directly. The system then computes the cosine similarity between the query vector and all document vectors and retrieves the top-k most similar documents, keeping only those whose similarity score exceeds a threshold of 0.1. Finally, responses are generated by a rule-based knowledge system enriched with content extracted from the retrieved documents.
```python
import re
from typing import List

def split_text(self, text: str) -> List[str]:
    # Split into sentences using regex patterns
    sentences = re.split(r'[.!?]+', text)
    sentences = [s.strip() for s in sentences if s.strip()]
    chunks = []
    current_chunk = ""
    for sentence in sentences:
        if len(current_chunk) + len(sentence) < self.chunk_size:
            current_chunk += sentence + ". "
        else:
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = sentence + ". "
    # Flush the final partial chunk so no trailing text is lost
    if current_chunk:
        chunks.append(current_chunk.strip())
    return chunks
```
This specific approach to split_text is crucial for maintaining semantic boundaries within the documents while simultaneously adhering to configurable chunk sizes. By leveraging regular expressions, sentences are accurately identified and then intelligently grouped. The logic ensures that a current_chunk is extended with new sentences until it approaches the self.chunk_size. Once that threshold is met or exceeded, the current chunk is finalized, and a new one begins with the subsequent sentence. This methodical process prevents the arbitrary cutting of sentences and ensures that each chunk remains a coherent and contextually rich piece of information, optimizing it for the TF-IDF vectorization and subsequent retrieval.
```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_relevant_documents(self, query: str, top_k: int = 3):
    # Vectorize query using same vocabulary as documents
    query_vector = self.vectorizer.transform([query])
    # Calculate cosine similarities against every indexed chunk
    similarities = cosine_similarity(query_vector, self.document_vectors)[0]
    # Get top-k candidates, highest score first
    top_indices = np.argsort(similarities)[-top_k:][::-1]
    results = []
    for idx in top_indices:
        if similarities[idx] > 0.1:  # Minimum relevance threshold
            results.append((self.documents[idx], float(similarities[idx])))
    return results
```
The retrieve_relevant_documents method implements this lookup. Cosine similarity measures the cosine of the angle between two vectors, so it compares their direction regardless of magnitude; a smaller angle means higher similarity. After scoring the query vector against every indexed document vector, the method keeps the top-k results whose scores exceed the 0.1 threshold, so only meaningfully related content feeds into response generation. These few transparent operations demonstrate the fundamentals of vector-space retrieval.
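Cosine similarity over TF-IDF vectors can be seen in isolation with a toy example (not taken from the project code):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "RAG systems retrieve documents before generating answers.",
    "Bananas are a good source of potassium.",
]
vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(docs)

# transform (not fit_transform): reuse the indexing vocabulary for the query
query_vector = vectorizer.transform(["How do RAG systems retrieve documents?"])
similarities = cosine_similarity(query_vector, doc_vectors)[0]
print(similarities)  # the RAG document scores far higher than the other one
```

The on-topic document shares several terms with the query and scores well above the 0.1 threshold; the unrelated one shares none and scores zero.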
The system uses a hybrid strategy for response generation, combining structured rule-based knowledge with dynamically retrieved content to produce comprehensive, contextually rich answers.
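The actual rules are not listed in this section, so the sketch below only illustrates the hybrid pattern: a hypothetical keyword-to-answer map combined with the best-scoring retrieved chunk.

```python
from typing import List, Tuple

# Hypothetical rule base keyed on query keywords; the project's real
# rules are not shown here.
RULES = {
    "tf-idf": "TF-IDF scores terms by how frequent they are in a document "
              "and how rare they are across the collection.",
    "chunking": "Chunking splits documents into overlapping segments so "
                "retrieval stays focused.",
}

def generate_response(query: str, retrieved: List[Tuple[str, float]]) -> str:
    parts = []
    # 1) Structured rule-based knowledge triggered by keywords in the query
    for keyword, answer in RULES.items():
        if keyword in query.lower():
            parts.append(answer)
    # 2) Enrich with the highest-scoring retrieved chunk, if any
    if retrieved:
        best_chunk, score = max(retrieved, key=lambda pair: pair[1])
        parts.append(f"From the documents (score {score:.2f}): {best_chunk}")
    return " ".join(parts) if parts else "I could not find relevant information."
```

A query about TF-IDF would thus receive the matching rule's answer plus a cited excerpt from the top retrieved chunk.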
The system also incorporates safety measures and content filtering, making it appropriate for educational environments and supporting responsible operation.
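The specific filtering rules are not documented here; one minimal pattern for pre-retrieval content filtering, with a purely hypothetical blocklist, is:

```python
# Illustrative only: the project's actual filtering mechanism is not
# shown in this section.
BLOCKED_TERMS = {"example-banned-term"}  # hypothetical blocklist

def is_query_allowed(query: str) -> bool:
    # Reject the query if any of its words appears in the blocklist
    words = set(query.lower().split())
    return words.isdisjoint(BLOCKED_TERMS)

print(is_query_allowed("what is tf-idf?"))  # True
```

A check like this would run before retrieval so that filtered queries never reach the index or the response generator.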
The RAG system was rigorously tested using a set of automatically generated sample documents. These documents were crafted to cover a diverse range of topics relevant to the RAG domain, ensuring comprehensive evaluation of the system's capabilities.
The testing involved a concise yet representative document collection, ensuring efficient processing while demonstrating functionality.
The system transparently tracks and displays several key performance metrics, providing insights into its operational efficiency at various stages.
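The individual metrics are not enumerated in this section; per-stage timing of the kind described can be captured with a small helper such as this sketch (the helper name is an assumption):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(stage: str):
    # Track and display per-stage processing time, mirroring the
    # system's transparent logging of operational efficiency.
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    print(f"[{stage}] {elapsed * 1000:.1f} ms")

with timed("vectorize query"):
    _ = sum(i * i for i in range(10_000))  # stand-in for real work
```

Wrapping each pipeline stage (loading, chunking, indexing, retrieval) in such a context manager is one simple way to surface the timings students see in the logs.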
To illustrate its capabilities, the system was tested with various query types, demonstrating its ability to retrieve and synthesize information effectively.
The RAG assistant is equipped with a range of features designed to facilitate an effective learning experience and demonstrate core RAG functionalities.
The interactive session supports four commands: help to display available commands and usage guidance, history to show previous questions and answers from the current session, examples to provide suggested queries for testing, and quit to exit the system.

The system is built on a lean technical foundation, making it highly accessible.
The core dependencies are numpy and scikit-learn; faiss-cpu is an optional dependency for accelerated similarity search.

The system offers several configurable parameters, allowing users to experiment with different settings and observe their impact on performance and behavior.
SimpleTextSplitter can be configured with chunk_size (target chunk size in characters, default 1000) and chunk_overlap (overlap between chunks, default 200). TfidfVectorizer allows customization of max_features (maximum vocabulary size, default 5000), stop_words (e.g., 'english' for common English words), and ngram_range (e.g., (1, 2) to include unigrams and bigrams). Retrieval is governed by similarity_threshold (minimum similarity for result inclusion, default 0.1) and top_k_documents (maximum number of documents retrieved per query, default 3).

While designed for educational efficacy and transparency, the current system has specific limitations that also present opportunities for future development.
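Collected in one place, the defaults above might be wired up as follows (the constant names are illustrative, not the project's):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Defaults as documented above
CHUNK_SIZE = 1000           # target chunk size in characters
CHUNK_OVERLAP = 200         # overlap between consecutive chunks
SIMILARITY_THRESHOLD = 0.1  # minimum score for a result to be kept
TOP_K_DOCUMENTS = 3         # maximum documents retrieved per query

vectorizer = TfidfVectorizer(max_features=5000,
                             stop_words="english",
                             ngram_range=(1, 2))
```

Raising max_features or widening ngram_range grows the vocabulary (and memory use); lowering similarity_threshold admits weaker matches into the response.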
Supported input is limited to the .txt format.

The RAG assistant is designed for minimal resource consumption, ensuring broad accessibility.
To get the RAG assistant up and running, follow these straightforward steps:
git clone https://github.com/E-Z1937/rag-assistant-aaidc.git
cd rag-assistant-aaidc
pip install -r requirements.txt
If installation stalls on a slow connection, retry with longer timeouts:
pip install --timeout 300 --retries 3 -r requirements.txt
The requirements.txt file specifies the necessary Python packages:
langchain-core>=0.1.0
faiss-cpu>=1.7.4
python-dotenv>=1.0.0
numpy>=1.24.0
scikit-learn>=1.0.0
gradio>=4.0.0
pandas>=2.0.0
requests>=2.25.0
Once installed, interacting with the RAG assistant is simple and intuitive:
python rag_assistant.py
Within the interactive session, type help for a list of available commands and usage guidance, history to view your previous questions and the system's answers from the current session, examples to see suggested queries that demonstrate the system's capabilities, and quit to exit.

This project demonstrates that effective Retrieval-Augmented Generation (RAG) systems can be implemented with widely accessible technologies without compromising educational value. The combination of TF-IDF vectorization, cosine similarity search, and rule-based response generation provides a transparent, robust foundation for understanding the core concepts of RAG.
The implementation processes diverse document collections, performs similarity searches efficiently, and generates contextually appropriate responses while maintaining complete operational transparency. A key outcome is a fully functional RAG pipeline that requires nothing beyond standard scientific Python libraries. Its educational transparency, achieved through comprehensive logging and explainable operations, together with performance suitable for interactive learning, makes it a valuable tool for students and developers alike.
The modular architecture of this system is a significant advantage, as it enables progressive enhancement. This design empowers students and developers to easily experiment with alternative chunking strategies, explore different similarity metrics, or integrate more advanced response generation approaches. Ultimately, this project provides a solid and flexible foundation for understanding both the current landscape of RAG implementations and the exciting future developments in retrieval-augmented systems.
License: MIT - Open source for educational use and modification.