Project Overview
This project implements a Retrieval-Augmented Generation (RAG) assistant using the Wikipedia page on Retrieval-Augmented Generation as its knowledge source.
The goal of the project is to demonstrate how integrating external knowledge retrieval with large language models improves factual accuracy and reduces hallucinations.
The assistant answers user queries by retrieving relevant information from the Wikipedia content and using it as grounded context during response generation.
Problem Statement
Large Language Models (LLMs) rely on static training data and can generate inaccurate or hallucinated responses when answering knowledge-intensive questions.
This limitation becomes more evident when users expect precise, explainable, and up-to-date information.
Retrieval-Augmented Generation addresses this challenge by retrieving relevant external documents at inference time and augmenting the model’s input with grounded context. This project explores that approach on a deliberately small, single-page knowledge base so that each stage of the pipeline remains easy to inspect and explain.
System Architecture
The system follows a standard RAG pipeline:
The Wikipedia content on Retrieval-Augmented Generation is loaded and preprocessed
The text is split into semantically meaningful chunks
Each chunk is converted into vector embeddings
Embeddings are stored in a vector database (ChromaDB)
At query time, relevant chunks are retrieved using semantic search
A language model generates an answer using the retrieved context
This design ensures that responses are grounded in authoritative source material rather than relying solely on the model’s internal knowledge.
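To make the ingestion half of the pipeline concrete, here is a minimal sketch of the loading, chunking, embedding, and storage steps. It is illustrative rather than the repository's exact code, and assumes the `wikipedia`, `langchain-text-splitters`, `sentence-transformers`, and `chromadb` packages; the chunk size, overlap, collection name, and storage path are placeholder choices.

```python
# Illustrative ingestion sketch: load, chunk, embed, and store the page.
# Parameter and naming choices here are assumptions, not the repo's code.
import wikipedia
import chromadb
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer

# 1. Load the source page.
page = wikipedia.page("Retrieval-augmented generation")

# 2. Split into overlapping chunks so context survives chunk boundaries.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(page.content)

# 3. Convert each chunk into a vector embedding.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(chunks).tolist()

# 4. Persist chunks and embeddings in a local ChromaDB collection.
client = chromadb.PersistentClient(path="./chroma_store")
collection = client.get_or_create_collection("rag_wikipedia")
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embeddings,
)
```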
Implementation Details
Chunking Strategy: Recursive text splitting with overlap to preserve context
Embedding Model: Sentence-Transformers (all-MiniLM-L6-v2)
Vector Database: ChromaDB (local persistence)
LLM: GPT-based chat model with temperature set to zero for deterministic, repeatable answers
Prompt Design: The prompt explicitly instructs the model to answer only using the retrieved context, reducing hallucination risk
All components are modularized to keep the system easy to understand and extend.
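Putting these choices together, the query-time flow might look like the sketch below. It reuses the `embedder` and `collection` objects from the ingestion sketch and assumes the `openai` package; the model name, prompt wording, and `top_k` value are illustrative assumptions, not the project's exact configuration.

```python
# Illustrative query-time sketch; reuses `embedder` and `collection` from above.
from openai import OpenAI

def answer(query: str, top_k: int = 3) -> str:
    # Retrieve the chunks most semantically similar to the query.
    query_embedding = embedder.encode([query]).tolist()
    results = collection.query(query_embeddings=query_embedding, n_results=top_k)
    context = "\n\n".join(results["documents"][0])

    # Instruct the model to answer only from the retrieved context.
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

    # Temperature 0 keeps the output deterministic.
    # Requires the OPENAI_API_KEY environment variable; model name is a placeholder.
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```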
Example Queries and Outputs
Query:
What is retrieval-augmented generation?
Answer:
Retrieval-augmented generation enhances language models by retrieving relevant external documents and incorporating them into the prompt before generating a response, improving factual accuracy and grounding.
Query:
Why is RAG useful for large language models?
Answer:
RAG helps mitigate hallucinations in large language models by supplying relevant external context at inference time, allowing the model to base its responses on factual source material.
These examples demonstrate that the assistant retrieves relevant sections from the Wikipedia content and generates grounded answers.
Limitations
The assistant is currently limited to a single Wikipedia page as its knowledge base
Responses do not include explicit source citations
Retrieval quality depends on chunking strategy and embedding effectiveness
These limitations were intentionally accepted to keep the project focused on core RAG fundamentals.
Future Improvements
Ingesting multiple documents or Wikipedia pages
Adding source attribution for retrieved chunks
Implementing hybrid retrieval (keyword + vector search); see the sketch after this list
Exposing the assistant via a REST API or simple UI
Introducing evaluation metrics for retrieval quality
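As an illustration of the hybrid retrieval item above, the following sketch fuses a BM25 keyword ranking with a precomputed vector ranking using reciprocal rank fusion. It assumes the `rank_bm25` package; the `hybrid_retrieve` function, its parameters, and the fusion constant are hypothetical and not part of the current codebase.

```python
# Hypothetical hybrid retrieval via reciprocal rank fusion (RRF).
from rank_bm25 import BM25Okapi

def hybrid_retrieve(query: str, chunks: list[str], vector_ranked: list[int],
                    top_k: int = 3, rrf_k: int = 60) -> list[str]:
    # Keyword ranking: BM25 over whitespace-tokenized chunks.
    bm25 = BM25Okapi([c.split() for c in chunks])
    scores = bm25.get_scores(query.split())
    keyword_ranked = sorted(range(len(chunks)), key=lambda i: -scores[i])

    # Fuse the two rankings: each contributes 1 / (rrf_k + rank).
    fused: dict[int, float] = {}
    for ranking in (vector_ranked, keyword_ranked):
        for rank, idx in enumerate(ranking):
            fused[idx] = fused.get(idx, 0.0) + 1.0 / (rrf_k + rank + 1)

    # Return the top_k chunks by fused score.
    best = sorted(fused, key=fused.get, reverse=True)[:top_k]
    return [chunks[i] for i in best]
```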
Code Link:
https://github.com/jhahimanshu1996-sketch/wikipedia-rag-assistant/tree/main