RAG Wiki Assistant is a lightweight RAG-based Q&A app that answers user queries by restricting its responses to Wikipedia-sourced documents. The app demonstrates how reliable, source-bound answers can be produced without hallucinations by combining retrieval, generation, and proper prompting. By grounding answers in specific AI/ML and LLM-related Wikipedia articles, the system offers an open, transparent, and reproducible example of building trustworthy, domain-focused assistants for education and research. The project combines a lightweight preprocessing pipeline that turns .txt files into dense vector embeddings, a ChromaDB vector store for retrieval, Hugging Face embeddings, LangChain orchestration for chunking and prompt templates, and a GROQ-hosted LLM as the generative model for answer composition. The repository includes clear instructions, YAML configuration, ingestion scripts, and a Streamlit interface so others can replicate or extend the system with their own topics or datasets.
Modern question-answering systems increasingly rely on retrieval-augmented generation (RAG) to combine the factual grounding of a document collection with the expressive ability of a generative model. Wikipedia is an attractive knowledge source: it is broad, well-structured, and frequently updated. In this project, I built a Wikipedia assistant that strictly limits the model's output to evidence found inside the supplied Wikipedia documents. The key goals were to restrict answers to the retrieved Wikipedia content, avoid hallucinations through proper prompting, and keep the pipeline open, lightweight, and reproducible.
In this section, I'll first outline the complete pipeline, from data extraction to final answer generation, and then map each stage to its corresponding directory or file within the repository.
- **Data extraction:** The relevant Wikipedia articles are downloaded with the `wikipediaapi` library, and each article is saved as a `.txt` file inside a dedicated `data/` folder at the project's root.
- **Vector database:** A ChromaDB collection (`wiki_pages`) with a persistent directory (`./chroma_db`) is initialized. Both the collection name and the persistent directory are passed as parameters to the initializing function inside the `vectordb_and_ingestion` module. The `./chroma_db` directory is excluded via `.gitignore`, so it won't be found in the repo.
- **Chunking:** Each `.txt` file is split into overlapping text chunks using LangChain's `RecursiveCharacterTextSplitter` (1,000 characters per chunk, 200-character overlap).
- **Embeddings:** The chunks are embedded with `all-MiniLM-L6-v2` via LangChain's `HuggingFaceEmbeddings` wrapper.

⚠️ Warning: the system prompt is exposed for experimental and demonstration purposes only. Exposing it in a real production environment could pose a security and integrity risk.

Minimal sketches of the extraction and ingestion stages are shown below.
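As a rough illustration of the extraction stage, the sketch below fetches a few articles with `wikipediaapi` and writes them to `data/`. The user-agent string, the example topics, and the file-naming scheme are assumptions for illustration, not the repository's actual `data_extraction.py`.

```python
# Hypothetical sketch of the extraction step (not the repo's data_extraction.py).
from pathlib import Path

import wikipediaapi

# wikipediaapi requires a descriptive user agent; this string is made up.
wiki = wikipediaapi.Wikipedia(user_agent="rag-wiki-assistant/0.1", language="en")

# Example topics only; the real project covers 25 AI/ML- and LLM-related articles.
topics = ["Artificial intelligence", "Large language model", "Retrieval-augmented generation"]

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

for topic in topics:
    page = wiki.page(topic)
    if not page.exists():
        continue  # skip titles that do not resolve to an article
    # Save the plain text of each article as one .txt file.
    (data_dir / f"{topic.replace(' ', '_')}.txt").write_text(page.text, encoding="utf-8")
```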
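The ingestion stage could look roughly like the following, assuming LangChain's Chroma integration. The function name `ingest_documents` and the exact import paths are illustrative and may differ from the repository's `vectordb_and_ingestion.py`.

```python
# Illustrative ingestion sketch; the real logic lives in code/vectordb_and_ingestion.py.
from pathlib import Path

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

def ingest_documents(data_dir: str = "data",
                     collection_name: str = "wiki_pages",
                     persist_directory: str = "./chroma_db") -> Chroma:
    """Split every .txt file into overlapping chunks and store their embeddings in ChromaDB."""
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    texts, metadatas = [], []
    for txt_file in Path(data_dir).glob("*.txt"):
        for chunk in splitter.split_text(txt_file.read_text(encoding="utf-8")):
            texts.append(chunk)
            metadatas.append({"source": txt_file.name})  # keep provenance for source-bound answers

    # Chroma persists the collection under persist_directory (git-ignored in the repo).
    return Chroma.from_texts(texts,
                             embedding=embeddings,
                             metadatas=metadatas,
                             collection_name=collection_name,
                             persist_directory=persist_directory)
```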
```
rag-wiki-assistant/
├── app/
│   └── app.py                      # Main Streamlit application
├── code/
│   ├── config/
│   │   ├── config.yaml             # App-level settings
│   │   └── prompt_config.yaml      # RAG prompts
│   ├── data_extraction.py          # Extracts the relevant Wikipedia articles using wikipediaapi in .txt format
│   ├── loader.py                   # Loads YAML configuration files
│   ├── logger.py                   # Minimal logging setup
│   ├── prompt.py                   # Prompt builder
│   ├── retrieval_and_response.py   # Handles retrieval & LLM response
│   └── vectordb_and_ingestion.py   # Initializes the vector DB and feeds the files to ChromaDB
├── data/                           # Holds 25 .txt files
├── images/                         # Screenshots of app results
├── requirements.txt                # Python dependencies
├── .gitignore
├── LICENSE                         # MIT License
└── README.md
```
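The retrieval-and-response stage (handled by `retrieval_and_response.py`) might be sketched as below, assuming the `langchain_groq` integration. The model name, prompt wording, and helper name `answer_question` are assumptions, not the repository's actual code.

```python
# Hypothetical sketch of retrieval + answer generation (cf. code/retrieval_and_response.py).
from langchain_groq import ChatGroq  # requires the GROQ_API_KEY environment variable

def answer_question(question: str, vectordb, k: int = 4) -> str:
    """Retrieve the k most similar chunks and ask the GROQ-hosted LLM to answer from them only."""
    docs = vectordb.similarity_search(question, k=k)
    context = "\n\n".join(doc.page_content for doc in docs)

    # The system prompt restricts the model to the retrieved Wikipedia context.
    system_prompt = ("Answer strictly from the provided Wikipedia context. "
                     "If the answer is not in the context, say you don't know.")
    llm = ChatGroq(model="llama-3.1-8b-instant")  # model name is an assumption
    response = llm.invoke([("system", system_prompt),
                           ("human", f"Context:\n{context}\n\nQuestion: {question}")])
    return response.content
```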
This demo can also be found in the repo's README.
Functional Outcome: The RAG Wiki Assistant successfully retrieves and generates answers restricted to the AI/ML and LLM-related Wikipedia articles. All user queries tested during the demonstration returned responses grounded in these documents only, with no observed hallucinations.
Performance: The system achieves fast retrieval and answer generation, owing to the use of ChromaDB for vector storage and efficient Hugging Face embeddings. Similarity search consistently returned the most relevant chunks, demonstrating the robustness of the chosen embedding model and chunking strategy.
User Experience: Through the Streamlit interface, the assistant displays retrieved content and generated answers clearly, allowing users to verify the source context.
Reproducibility: The pipeline, including data extraction, ingestion, and retrieval steps, has been documented and can be replicated by following the provided repository structure and YAML configuration files. Users can replace the source dataset with other topics to create their own domain-focused assistants.
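To give a sense of the Streamlit interface described under the user-experience point above, here is a minimal hypothetical layout. The widget labels, collection settings, and GROQ model name are assumptions carried over from the earlier sketches, not the repository's actual `app/app.py`.

```python
# Minimal hypothetical Streamlit layout; the project's real UI lives in app/app.py.
import streamlit as st
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_groq import ChatGroq

@st.cache_resource
def load_vectordb():
    # Reopen the persisted ChromaDB collection created during ingestion.
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    return Chroma(collection_name="wiki_pages",
                  embedding_function=embeddings,
                  persist_directory="./chroma_db")

st.title("RAG Wiki Assistant")
vectordb = load_vectordb()
llm = ChatGroq(model="llama-3.1-8b-instant")  # model name is an assumption; needs GROQ_API_KEY

question = st.text_input("Ask a question about the indexed AI/ML Wikipedia articles")
if question:
    docs = vectordb.similarity_search(question, k=4)
    context = "\n\n".join(doc.page_content for doc in docs)

    st.subheader("Answer")
    st.write(llm.invoke(
        [("system", "Answer strictly from the provided Wikipedia context."),
         ("human", f"Context:\n{context}\n\nQuestion: {question}")]).content)

    # Show the retrieved chunks so users can verify the source context.
    with st.expander("Retrieved context"):
        for doc in docs:
            st.markdown(f"**{doc.metadata.get('source', 'unknown')}**")
            st.write(doc.page_content)
```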
This project demonstrates that a lightweight Retrieval-Augmented Generation (RAG) pipeline can be implemented with open-source tools to produce trustworthy, source-grounded answers. By combining Wikipedia content with LangChain-based chunking, Hugging Face embeddings, ChromaDB storage, and a GROQ-powered LLM, the assistant illustrates a reproducible template for domain-specific question-answering systems. The repository can also be adapted to almost any basic RAG app by swapping out the source documents, updating the system prompt, and making a few tweaks to the models.
Contact me: leoulteferi1996@gmail.com