This project presents a Retrieval-Augmented Generation (RAG) AI assistant designed to answer questions using external documents. The system leverages vector embeddings and a local dataset of documents sourced from Wikipedia to provide accurate and contextually relevant answers.
Key contributions:
The code and dataset are publicly available in the GitHub repository.
This is a solo project that I completed on my own.

Traditional language models generate text based solely on patterns learned during training, which can lead to outdated or incorrect responses. Retrieval-Augmented Generation (RAG) improves the accuracy of AI responses by retrieving relevant passages from external knowledge sources and supplying them to the language model along with the question.
This project implements a RAG assistant that:
The dataset used consists of Wikipedia pages related to artificial intelligence, machine learning, and natural language processing.
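The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not the code in `src/app.py` or `src/vectordb.py`: the functions `embed`, `cosine_similarity`, `retrieve`, and `build_prompt` are hypothetical names, and a bag-of-words term count stands in for a real embedding model so that the example runs with the standard library only.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents ranked by similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, context_docs: list[str]) -> str:
    """Place the retrieved passages ahead of the question for the language model."""
    context = "\n\n".join(context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical mini-corpus, used only for illustration.
docs = [
    "Machine learning is a field of artificial intelligence that learns from data.",
    "Natural language processing deals with text and speech.",
    "A vector embedding maps text to a list of numbers.",
]
top = retrieve("What is machine learning?", docs, top_k=1)
prompt = build_prompt("What is machine learning?", top)
```

In a real deployment the `embed` function would call an embedding model and the ranking would be delegated to a vector database; the overall shape (embed the query, rank documents, prepend the best matches to the prompt) stays the same.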
Project structure:
rag-ai-assistant/
├── README.md
├── requirements.txt
├── sensitive.env
├── src/
│   ├── app.py
│   ├── vectordb.py
│   ├── data/
│   │   └── example_docs/
└── examples/
    └── demo_run.ipynb
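As an illustration of how the documents under `src/data/example_docs/` might be read before indexing, here is a small sketch. It assumes the documents are stored as UTF-8 `.txt` files, which the layout above does not confirm, and `load_documents` is a hypothetical helper rather than a function from this repository.

```python
import tempfile
from pathlib import Path


def load_documents(directory: str, pattern: str = "*.txt") -> dict[str, str]:
    """Read every matching file in `directory` into a {filename: text} mapping."""
    docs: dict[str, str] = {}
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            docs[path.name] = path.read_text(encoding="utf-8")
    return docs


# Demonstration with a throwaway directory; in the project the argument
# would be "src/data/example_docs" (file extension is an assumption).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ai.txt").write_text("Artificial intelligence overview.", encoding="utf-8")
    Path(tmp, "notes.md").write_text("Not matched by the pattern.", encoding="utf-8")
    loaded = load_documents(tmp)
```

Keying the result by filename makes it easy to report which source document a retrieved passage came from.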
The goal of the experiments is to evaluate the RAG assistant's ability to:
The example dataset is located in src/data/example_docs/.

This section presents the performance and observations from the experiments conducted with the RAG assistant using the example dataset. Both quantitative and qualitative analyses are provided to assess its retrieval and generation capabilities.
We tested the RAG system on a set of 10 sample queries covering different topics present in the dataset. The results are summarized below:
| Metric | Observation |
|---|---|
| Number of queries tested | 10 |
| Fully correct answers | 8 |
| Partially correct answers | 2 |
| Retrieval success (≥1 doc) | 100% |
| Approximate accuracy | 80% |
Accuracy calculation:
\[
\text{Accuracy} = \frac{\text{Number of fully correct answers}}{\text{Total queries}} \times 100\% = \frac{8}{10} \times 100\% = 80\%
\]
Observation: The system retrieved at least one relevant document for every query. Of the 10 answers, 8 were fully correct and contextually appropriate, and the remaining 2 were partially correct.
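The accuracy figure reported above can be recomputed from the raw counts. The sketch below uses the numbers from the results table; `accuracy_percent` is a hypothetical helper name, and only fully correct answers count toward accuracy, matching the definition used in this section.

```python
def accuracy_percent(fully_correct: int, total: int) -> float:
    """Share of fully correct answers, expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * fully_correct / total


# Counts taken from the results table.
fully_correct, partially_correct, total_queries = 8, 2, 10
accuracy = accuracy_percent(fully_correct, total_queries)
print(f"Accuracy: {accuracy:.0f}%")  # prints "Accuracy: 80%"
```

With only 10 queries, each answer shifts the result by 10 percentage points, so the figure is best read as a rough indicator.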
Here are examples of the RAG assistant's behavior:
Successful retrieval and answer generation
Partial success
Limitation / failure
This work demonstrates the development and evaluation of a Retrieval-Augmented Generation (RAG) assistant capable of answering queries by combining document retrieval with large language model (LLM) generation.
Key takeaways:
In conclusion, this project highlights the potential of RAG systems for knowledge-intensive applications and provides a foundation for further exploration in building efficient, accurate, and context-aware AI assistants.