Project Overview
This project implements a Retrieval-Augmented Generation (RAG) assistant using the Wikipedia page on Retrieval-Augmented Generation as its knowledge source.
The goal of the project is to demonstrate how integrating external knowledge retrieval with large language models improves factual accuracy and reduces hallucinations.
The assistant answers user queries by retrieving relevant information from the Wikipedia content and using it as grounded context during response generation.
Problem Statement
Large Language Models (LLMs) rely on static training data and can generate inaccurate or hallucinated responses when answering knowledge-intensive questions.
This limitation becomes more evident when users expect precise, explainable, and up-to-date information.
Retrieval-Augmented Generation addresses this challenge by retrieving relevant external documents at inference time and augmenting the model’s input with grounded context. This project explores that approach on a deliberately small, single-page knowledge base so that each stage of the pipeline remains easy to inspect and explain.
System Architecture
The system follows a standard RAG pipeline:
The Wikipedia content on Retrieval-Augmented Generation is loaded and preprocessed
The text is split into semantically meaningful chunks
Each chunk is converted into vector embeddings
Embeddings are stored in a vector database (ChromaDB)
At query time, relevant chunks are retrieved using semantic search
A language model generates an answer using the retrieved context
This design ensures that responses are grounded in authoritative source material rather than relying solely on the model’s internal knowledge.
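To make the ingestion half of the pipeline concrete, here is a minimal sketch of the loading, chunking, embedding, and storage steps. It is illustrative rather than the repository's exact code, and assumes the `wikipedia`, `langchain-text-splitters`, `sentence-transformers`, and `chromadb` packages; the chunk size, overlap, collection name, and storage path are placeholder choices.

```python
# Illustrative ingestion sketch: load, chunk, embed, and store the page.
# Parameter and naming choices here are assumptions, not the repo's code.
import wikipedia
import chromadb
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer

# 1. Load the source page.
page = wikipedia.page("Retrieval-augmented generation")

# 2. Split into overlapping chunks so context survives chunk boundaries.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(page.content)

# 3. Convert each chunk into a vector embedding.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(chunks).tolist()

# 4. Persist chunks and embeddings in a local ChromaDB collection.
client = chromadb.PersistentClient(path="./chroma_store")
collection = client.get_or_create_collection("rag_wikipedia")
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embeddings,
)
```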
Implementation Details
Chunking Strategy: Recursive text splitting with overlap to preserve context
Embedding Model: Sentence-Transformers (all-MiniLM-L6-v2)
Vector Database: ChromaDB (local persistence)
LLM: GPT-based chat model with temperature set to zero for deterministic, repeatable answers
Prompt Design: The prompt explicitly instructs the model to answer only using the retrieved context, reducing hallucination risk
All components are modularized to keep the system easy to understand and extend.
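Putting these choices together, the query-time flow might look like the sketch below. It reuses the `embedder` and `collection` objects from the ingestion sketch and assumes the `openai` package; the model name, prompt wording, and `top_k` value are illustrative assumptions, not the project's exact configuration.

```python
# Illustrative query-time sketch; reuses `embedder` and `collection` from above.
from openai import OpenAI

def answer(query: str, top_k: int = 3) -> str:
    # Retrieve the chunks most semantically similar to the query.
    query_embedding = embedder.encode([query]).tolist()
    results = collection.query(query_embeddings=query_embedding, n_results=top_k)
    context = "\n\n".join(results["documents"][0])

    # Instruct the model to answer only from the retrieved context.
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

    # Temperature 0 keeps the output deterministic.
    # Requires the OPENAI_API_KEY environment variable; model name is a placeholder.
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```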
Example Queries and Outputs
Query:
What is retrieval-augmented generation?
Answer:
Retrieval-augmented generation enhances language models by retrieving relevant external documents and incorporating them into the prompt before generating a response, improving factual accuracy and grounding.
Query:
Why is RAG useful for large language models?
Answer:
RAG helps mitigate hallucinations in large language models by supplying relevant external context at inference time, allowing the model to base its responses on factual source material.
These examples demonstrate that the assistant retrieves relevant sections from the Wikipedia content and generates grounded answers.
Limitations
The assistant is currently limited to a single Wikipedia page as its knowledge base
Responses do not include explicit source citations
Retrieval quality depends on chunking strategy and embedding effectiveness
These limitations were intentionally accepted to keep the project focused on core RAG fundamentals.
Future Improvements
Ingesting multiple documents or Wikipedia pages
Adding source attribution for retrieved chunks
Implementing hybrid retrieval (keyword + vector search); see the sketch after this list
Exposing the assistant via a REST API or simple UI
Introducing evaluation metrics for retrieval quality
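As an illustration of the hybrid retrieval item above, the following sketch fuses a BM25 keyword ranking with a precomputed vector ranking using reciprocal rank fusion. It assumes the `rank_bm25` package; the `hybrid_retrieve` function, its parameters, and the fusion constant are hypothetical and not part of the current codebase.

```python
# Hypothetical hybrid retrieval via reciprocal rank fusion (RRF).
from rank_bm25 import BM25Okapi

def hybrid_retrieve(query: str, chunks: list[str], vector_ranked: list[int],
                    top_k: int = 3, rrf_k: int = 60) -> list[str]:
    # Keyword ranking: BM25 over whitespace-tokenized chunks.
    bm25 = BM25Okapi([c.split() for c in chunks])
    scores = bm25.get_scores(query.split())
    keyword_ranked = sorted(range(len(chunks)), key=lambda i: -scores[i])

    # Fuse the two rankings: each contributes 1 / (rrf_k + rank).
    fused: dict[int, float] = {}
    for ranking in (vector_ranked, keyword_ranked):
        for rank, idx in enumerate(ranking):
            fused[idx] = fused.get(idx, 0.0) + 1.0 / (rrf_k + rank + 1)

    # Return the top_k chunks by fused score.
    best = sorted(fused, key=fused.get, reverse=True)[:top_k]
    return [chunks[i] for i in best]
```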
Code Link:
https://github.com/jhahimanshu1996-sketch/wikipedia-rag-assistant/tree/main