A lightweight system that answers questions about AI publications using Retrieval-Augmented Generation (RAG).
Ready Tensor publications contain valuable AI/ML insights, but readers often struggle to find specific answers without sifting through lengthy articles. This project addresses that friction by building an intelligent Q&A assistant.
Ready Tensor publications contain deep technical insights about AI/ML systems, but readers often face a common dilemma: they need specific information but must sift through lengthy articles to find it. This friction reduces the accessibility of valuable knowledge and creates a barrier for quick learning.
I designed a Retrieval-Augmented Generation (RAG) assistant that bridges this gap by providing instant, context-aware answers to user queries. The system ingests Ready Tensor-style publications, processes them into searchable chunks, and uses a lightweight LLM to generate natural language responses.
The implementation begins with document ingestion. I created three sample publications covering RAG systems, agentic AI, and vector databases — all topics relevant to Ready Tensor's audience. These documents serve as the knowledge base for the assistant.
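A minimal ingestion step might read each publication from a directory of plain-text files; the folder layout and filenames below are illustrative, not taken from the actual repository:

```python
from pathlib import Path

def load_documents(folder: str) -> dict[str, str]:
    """Read every .txt publication in `folder` into a {name: text} map."""
    docs = {}
    for path in Path(folder).glob("*.txt"):
        docs[path.stem] = path.read_text(encoding="utf-8")
    return docs
```

From here, each document's text flows into the chunking stage described next.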
For text splitting, I experimented with various chunk sizes and overlap strategies. Ultimately, I settled on 200-character chunks with 20-character overlap. This configuration maintains context continuity while staying within the LLM's token limitations. The overlap ensures that important phrases spanning chunk boundaries aren't lost during retrieval.
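The sliding-window split described above can be sketched in a few lines; this is a hand-rolled version of what library splitters (e.g. LangChain's character splitters) provide:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size chunks, each sharing `overlap` chars with the previous one."""
    step = size - overlap  # advance 180 characters per chunk at the defaults
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

With the defaults, the last 20 characters of each chunk reappear at the start of the next, so a phrase cut at a boundary still appears intact in at least one chunk.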
The vector storage layer uses FAISS, which provides efficient similarity search without requiring external dependencies. For embeddings, I chose the all-MiniLM-L6-v2 model from Sentence Transformers due to its balance of accuracy and speed on CPU.
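Conceptually, a flat FAISS index performs an exhaustive nearest-neighbor search over the embedding matrix. The retrieval step can be illustrated with plain NumPy cosine similarity; the vectors here are random placeholders standing in for all-MiniLM-L6-v2 embeddings (which are 384-dimensional):

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k chunks most similar to the query (cosine similarity)."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity of each chunk vs. the query
    return np.argsort(scores)[::-1][:k].tolist()
```

FAISS's `IndexFlatIP` computes the same inner-product ranking, just with an optimized C++ backend, so swapping this sketch for the real index changes performance, not results.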
Testing the system with queries like "What is RAG?" consistently yields accurate responses that combine retrieved information with natural language generation. The assistant successfully demonstrates the core RAG pattern: retrieve relevant context → generate informed response.
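The retrieve → generate loop can be tied together as below. The word-overlap scorer is a stand-in for embedding similarity, and `llm` is a stub for the actual model call; both are illustrative simplifications:

```python
import re

def answer(query: str, chunks: list[str], llm, k: int = 2) -> str:
    """Core RAG loop: rank chunks against the query, then prompt the LLM with the top hits."""
    # Placeholder scorer: shared-word count stands in for embedding similarity.
    q_words = set(re.findall(r"\w+", query.lower()))
    ranked = sorted(chunks,
                    key=lambda c: len(q_words & set(re.findall(r"\w+", c.lower()))),
                    reverse=True)
    context = "\n".join(ranked[:k])
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```

The retrieved context is prepended to the user's question, so the LLM generates its answer grounded in the publications rather than from its parametric memory alone.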
Since this system is designed for public use, I prioritized security from the start. The implementation uses no external APIs, making it fully offline-capable and eliminating risks associated with third-party services. All processing happens locally, ensuring user queries remain private.
Building this assistant taught me that RAG systems require careful balance between chunk size, overlap, and LLM capabilities. Too much overlap wastes tokens; too little loses context. The 10% overlap strategy proved optimal for this use case.
This project is released under the MIT License, allowing others to use, modify, and distribute it freely while maintaining attribution.
Built with ❤️ by Rahul Bunker
User Query: "What is RAG?"
Assistant Response:
"RAG combines retrieval from a knowledge base with LLM generation."
git clone https://github.com/R786P/aaidc-module1.git
cd aaidc-module1