This project implements a Retrieval-Augmented Generation (RAG) chatbot that loads .txt files as its knowledge base. The system splits documents into smaller, searchable chunks and uploads them to a vector database. Using an LLM integrated via the Google Gemini API (OpenAI and Groq are also supported), the RAG assistant delivers accurate, document-grounded answers.
Retrieval-Augmented Generation (RAG) is an AI framework that combines large language models (LLMs) with external data sources to provide more accurate, up-to-date, and contextually relevant answers. RAG works by first retrieving relevant information from a knowledge base and then using that information to augment the prompt given to the LLM, which then generates a response based on both its training data and the retrieved context.

This RAG system can chunk documents, store them in a vector database, retrieve the most relevant chunks for a query, validate whether a query is in scope, evaluate retrieval quality, and generate answers grounded in the retrieved context.
Building the RAG assistant follows the implementation steps below:

pip install -r requirements.txt

python src/app.py
This file contains the VectorDB class, a simple vector database wrapper built on ChromaDB with sentence-transformers embeddings.
To initialize the vector database:
# For example:
def __init__(self, collection_name: str = None, embedding_model: str = None):
    """
    Initialize the vector database.

    Args:
        collection_name: Name of the ChromaDB collection
        embedding_model: HuggingFace model name for embeddings
    """
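A minimal sketch of what this initializer could look like, assuming ChromaDB's client and its built-in SentenceTransformerEmbeddingFunction; the default collection name rag_documents and the all-MiniLM-L6-v2 model are taken from the sample output later in this README, and the exact defaults in the project may differ.

# Illustrative sketch (assumes the chromadb and sentence-transformers packages are installed).
import chromadb
from chromadb.utils import embedding_functions

class VectorDB:
    def __init__(self, collection_name: str = None, embedding_model: str = None):
        collection_name = collection_name or "rag_documents"
        embedding_model = embedding_model or "sentence-transformers/all-MiniLM-L6-v2"
        print(f"Loading embedding model: {embedding_model}")
        embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name=embedding_model
        )
        # In-memory client; chromadb.PersistentClient(path=...) would persist to disk
        self.client = chromadb.Client()
        self.collection = self.client.get_or_create_collection(
            name=collection_name, embedding_function=embedding_fn
        )
        print(f"Vector database initialized with collection: {collection_name}")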
To split the input text into searchable chunks:
# For example:
def chunk_text(self, text: str, chunk_size: int = 500) -> List[str]:
    """
    Simple text chunking by splitting on spaces and grouping into chunks.

    Args:
        text: Input text to chunk
        chunk_size: Approximate number of characters per chunk

    Returns:
        List of text chunks
    """
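A possible implementation of this word-based chunking, under the assumption that chunks are built by accumulating whitespace-split words until the target character count is reached:

# Sketch: group whitespace-split words into roughly chunk_size-character chunks.
from typing import List

def chunk_text(self, text: str, chunk_size: int = 500) -> List[str]:
    words = text.split()
    chunks, current, current_len = [], [], 0
    for word in words:
        # +1 accounts for the space that rejoins the words
        if current and current_len + len(word) + 1 > chunk_size:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(word)
        current_len += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks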
To store chunks in the vector database:
# For example:
def add_documents(self, documents: List) -> None:
    """
    Add documents to the vector database.

    Args:
        documents: List of documents
    """
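One way add_documents could work, assuming each document is split with chunk_text and stored with IDs of the form doc_&lt;i&gt;_chunk_&lt;j&gt; (that ID scheme matches the sample output shown later); the metadata fields are assumptions.

# Sketch: chunk each document and store the chunks in the ChromaDB collection.
from typing import List

def add_documents(self, documents: List) -> None:
    for doc_idx, doc in enumerate(documents):
        chunks = self.chunk_text(doc)
        print(f"Document {doc_idx}: Split into {len(chunks)} chunks.")
        self.collection.add(
            documents=chunks,
            ids=[f"doc_{doc_idx}_chunk_{i}" for i in range(len(chunks))],
            metadatas=[{"doc_index": doc_idx, "chunk_index": i} for i in range(len(chunks))],
        )
    print("Documents added to vector database")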
To search for relevant documents in the vector database:
# For example:
def search(self, query: str, n_results: int = 5) -> Dict[str, Any]:
    """
    Search for similar documents in the vector database.

    Args:
        query: Search query
        n_results: Number of results to return

    Returns:
        Dictionary containing search results with keys:
        'documents', 'metadatas', 'distances', 'ids'
    """
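A minimal sketch of search, assuming ChromaDB's query API (which returns one list of results per query text) and unwrapping the single query's results into the flat dictionary described above:

# Sketch: the collection's embedding function embeds the query internally.
from typing import Any, Dict

def search(self, query: str, n_results: int = 5) -> Dict[str, Any]:
    print(f"Searching for top {n_results} results for query: {query}")
    results = self.collection.query(query_texts=[query], n_results=n_results)
    return {
        "documents": results["documents"][0],
        "metadatas": results["metadatas"][0],
        "distances": results["distances"][0],
        "ids": results["ids"][0],
    }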
This file contains the RAGAssistant class, which supports multiple LLM providers, the prompt template, query processing, vector database search, and retrieval evaluation.
To load prepared documents from the src/data directory:
# For example:
def load_documents() -> List[str]:
    """
    Load documents for demonstration.

    Returns:
        List of sample documents
    """
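A hedged sketch of load_documents that reads every .txt file in src/data (the directory mentioned above); the exact path handling and logging in the project may differ.

# Sketch: read all .txt files from src/data into a list of strings.
import os
from typing import List

def load_documents() -> List[str]:
    data_dir = os.path.join(os.path.dirname(__file__), "data")
    documents = []
    for filename in sorted(os.listdir(data_dir)):
        if filename.endswith(".txt"):
            with open(os.path.join(data_dir, filename), "r", encoding="utf-8") as f:
                documents.append(f.read())
            print(f"Loaded {filename} successfully")
    print(f"Loaded {len(documents)} documents successfully")
    return documents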
To initialize an LLM from one of the OpenAI, Groq, or Google Gemini APIs:
# For example:
def _initialize_llm(self):
    """
    Initialize the LLM by checking for available API keys.
    Tries OpenAI, Groq, and Google Gemini in that order.
    """
    # Check for OpenAI API key
    if os.getenv("OPENAI_API_KEY"):
        model_name = os.getenv("OPENAI_MODEL", "gpt-4o-mini")
        ......
        )
    elif os.getenv("GROQ_API_KEY"):
        model_name = os.getenv("GROQ_MODEL", "llama-3.3-70b-versatile")
        ......
        )
    elif os.getenv("GOOGLE_API_KEY"):
        model_name = os.getenv("GOOGLE_MODEL", "gemini-2.5-flash")
        ......
        )
    else:
        raise ValueError(
            "No valid API key found. Please set one of: OPENAI_API_KEY, "
            "GROQ_API_KEY, or GOOGLE_API_KEY in your .env file"
        )
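The elided constructor calls above could be filled in with the LangChain chat-model wrappers; this is only a sketch under that assumption, and the actual project may construct its clients differently.

# Sketch assuming langchain_openai, langchain_groq, and langchain_google_genai are installed.
import os

def _initialize_llm(self):
    if os.getenv("OPENAI_API_KEY"):
        from langchain_openai import ChatOpenAI
        model_name = os.getenv("OPENAI_MODEL", "gpt-4o-mini")
        self.llm = ChatOpenAI(model=model_name, temperature=0)
        print(f"Using OpenAI model: {model_name}")
    elif os.getenv("GROQ_API_KEY"):
        from langchain_groq import ChatGroq
        model_name = os.getenv("GROQ_MODEL", "llama-3.3-70b-versatile")
        self.llm = ChatGroq(model=model_name, temperature=0)
        print(f"Using Groq model: {model_name}")
    elif os.getenv("GOOGLE_API_KEY"):
        from langchain_google_genai import ChatGoogleGenerativeAI
        model_name = os.getenv("GOOGLE_MODEL", "gemini-2.5-flash")
        self.llm = ChatGoogleGenerativeAI(model=model_name, temperature=0)
        print(f"Using Google Gemini model: {model_name}")
    else:
        raise ValueError(
            "No valid API key found. Please set one of: OPENAI_API_KEY, "
            "GROQ_API_KEY, or GOOGLE_API_KEY in your .env file"
        )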
To define the RAG prompt template, including placeholders for the retrieved context and the user's question:
# For example:
self.prompt_template = ChatPromptTemplate(
    [
        ("system", """
        Role: xxxx
        Instructions: xxxx
        Style or tone guidelines:
        - xxxx
        Output constraints:
        - xxxx
        """),
        ("user", "Context: {context}\n\nQuestion: {question}"),
    ]
)
To run the RAG query-response pipeline:
# For example:
def invoke(self, input: str, n_results: int = 3) -> str:
    """
    Query the RAG assistant with query processing and scope validation.

    Args:
        input: User's input
        n_results: Number of relevant chunks to retrieve

    Returns:
        Dictionary containing the answer and retrieved context
    """
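A rough sketch of how invoke could tie the pieces together (query processing, scope validation, vector search, and the prompt/LLM call). The attribute names (query_processor, project_scope, vector_db, evaluator) and the processed["query"] key are assumptions; the real control flow, for example how scope warnings are surfaced, may differ.

# Sketch: preprocess the query, retrieve context, then ask the LLM with the prompt template.
def invoke(self, input: str, n_results: int = 3):
    processed = self.query_processor.process(input)
    scope = self.project_scope.validate_scope(processed["query"])

    results = self.vector_db.search(processed["query"], n_results=n_results)
    context = "\n\n".join(results["documents"])

    messages = self.prompt_template.format_messages(context=context, question=input)
    answer = self.llm.invoke(messages).content
    return {"answer": answer, "context": results, "scope": scope}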
To evaluate retrieval quality for a query:
# For example:
def evaluate_query(self, query: str, relevant_ids: List[str] = None, n_results: int = 5) -> Dict[str, Any]:
    """
    Evaluate retrieval quality for a query.
    If relevant_ids provided, calculates precision/recall/MRR.
    If not provided, shows retrieval quality based on similarity.

    Args:
        query: User query
        relevant_ids: Optional list of relevant chunk IDs (ground truth)
        n_results: Number of results to retrieve

    Returns:
        Dictionary with evaluation information
    """
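An illustrative sketch of evaluate_query: without ground truth it reports average similarity derived from the search distances, and with ground truth it delegates to the RetrievalEvaluator described below. The similarity conversion (1 - distance), the quality threshold, and the self.evaluator attribute are assumptions.

# Sketch: basic quality check from similarities, or full metrics when ground truth is given.
from typing import Any, Dict, List

def evaluate_query(self, query: str, relevant_ids: List[str] = None, n_results: int = 5) -> Dict[str, Any]:
    results = self.vector_db.search(query, n_results=n_results)
    # Assumes cosine distance, so similarity = 1 - distance
    similarities = [1 - d for d in results["distances"]]

    if relevant_ids is None:
        avg = sum(similarities) / len(similarities) if similarities else 0.0
        return {
            "mode": "basic",
            "average_similarity": avg,
            "quality": "good" if avg >= 0.5 else "poor",
            "retrieved_ids": results["ids"],
        }

    metrics = self.evaluator.evaluate(results["ids"], relevant_ids)
    return {"mode": "full", "metrics": metrics, "retrieved_ids": results["ids"]}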
This file contains the ProjectScope class, which manages the project scope for document domains, identifies which domain a query belongs to, and validates whether a query is in scope.
# For example:
class ProjectScope:
    """Manages project scope for document domains."""

    def __init__(self):
        """Initialize with project scope configuration."""
        # Domain keywords mapping
        self.domains = {
            "rag": ["rag", "retrieval augmented", "retrieval", "augmented generation",
                    "llm", "large language model"],
            xxxx
        }
        # Project metadata
        self.project_name = "Building a RAG Assistant project"
        self.description = "Building a RAG Assistant project"

    def identify_domain(self, query: str) -> Optional[str]:
        """
        Identify which domain a query belongs to.

        Args:
            query: The query to classify

        Returns:
            The domain name
        """
        xxxx
        return xxxx

    def validate_scope(self, query: str) -> Dict[str, Any]:
        """
        Validate if a query is within project scope.

        Args:
            query: The query to validate

        Returns:
            Dictionary with validation results
        """
        # Identify domain
        domain = self.identify_domain(query)
        return {
            xxxx
        }

    def get_config(self) -> Dict[str, Any]:
        """
        Get the full configuration.

        Returns:
            The configuration
        """
        return {
            xxxx
        }
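The elided body of identify_domain could be as simple as keyword matching against self.domains; a hedged sketch:

# Sketch: pick the domain whose keywords appear most often in the lowercased query.
from typing import Optional

def identify_domain(self, query: str) -> Optional[str]:
    query_lower = query.lower()
    best_domain, best_hits = None, 0
    for domain, keywords in self.domains.items():
        hits = sum(1 for kw in keywords if kw in query_lower)
        if hits > best_hits:
            best_domain, best_hits = domain, hits
    return best_domain  # None when no keywords match (query is out of scope)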
This file contains the QueryProcessor class, which improves retrieval quality by normalizing and cleaning the query, classifying the query, and processing it through the pipeline.
# For example:
class QueryProcessor:
    """Simple query processor for improving retrieval quality."""

    def preprocess(self, query: str) -> str:
        """
        Normalize and clean the query.

        Args:
            query: The query to process

        Returns:
            The processed query
        """
        # Remove extra whitespace
        query = re.sub(r'\s+', ' ', query.strip())
        xxxx
        return query

    def classify(self, query: str) -> str:
        """
        Classify query type based on intent.

        Args:
            query: The query to classify

        Returns:
            The query type
        """
        xxxx
        return xxxx

    def process(self, query: str) -> Dict[str, Any]:
        """
        Process a query through the pipeline.

        Args:
            query: The query to process

        Returns:
            Dictionary with processed query and metadata
        """
        xxxx
        return {
            xxxx
        }
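The classification step is elided above; one plausible rule-based version (the query type labels shown here are assumptions, not the project's actual categories):

# Sketch: classify a query by simple intent keywords.
def classify(self, query: str) -> str:
    q = query.lower()
    if q.startswith(("what", "who", "when", "where", "which")):
        return "factual"
    if q.startswith(("how", "why")):
        return "explanatory"
    if "compare" in q or " vs " in q:
        return "comparative"
    return "general"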
This file contains the RetrievalEvaluator class, which measures retrieval quality with four metrics: Precision@K, Recall@K, Mean Reciprocal Rank (MRR), and NDCG@K.
# For example:
class RetrievalEvaluator:
    """Simple evaluator for measuring retrieval quality."""

    def precision_at_k(self, retrieved_ids: List[str], relevant_ids: List[str], k: int) -> float:
        """
        Calculate Precision@K: fraction of top K results that are relevant.

        Args:
            retrieved_ids: List of retrieved chunk/document IDs
            relevant_ids: List of relevant IDs (ground truth)
            k: The number of results to consider

        Returns:
            Precision@K: fraction of top K results that are relevant
        """
        xxxx
        return xxxx

    def recall_at_k(self, retrieved_ids: List[str], relevant_ids: List[str], k: int) -> float:
        """
        Calculate Recall@K: fraction of relevant items found in top K.

        Args:
            retrieved_ids: List of retrieved chunk/document IDs
            relevant_ids: List of relevant IDs (ground truth)
            k: The number of results to consider

        Returns:
            Recall@K: fraction of relevant items found in top K
        """
        xxxx
        return xxxx

    def mrr(self, retrieved_ids: List[str], relevant_ids: List[str]) -> float:
        """
        Calculate Mean Reciprocal Rank: 1/rank of first relevant result.

        Args:
            retrieved_ids: List of retrieved chunk/document IDs
            relevant_ids: List of relevant IDs (ground truth)

        Returns:
            Mean Reciprocal Rank: 1/rank of first relevant result (0.0 if no matches)
        """
        xxxx
        return xxxx

    def ndcg_at_k(self, retrieved_ids: List[str], relevant_ids: List[str], k: int) -> float:
        """
        Calculate NDCG@K: normalized discounted cumulative gain.

        Args:
            retrieved_ids: List of retrieved chunk/document IDs
            relevant_ids: List of relevant IDs (ground truth)
            k: The number of results to consider

        Returns:
            NDCG@K: normalized discounted cumulative gain
        """
        xxxx
        return xxxx

    def evaluate(self, retrieved_ids: List[str], relevant_ids: List[str], k_values: List[int] = [1, 3, 5]) -> Dict[str, Any]:
        """
        Evaluate retrieval with multiple metrics.

        Args:
            retrieved_ids: List of retrieved chunk/document IDs
            relevant_ids: List of relevant IDs (ground truth)
            k_values: List of k values to evaluate at

        Returns:
            Dictionary with evaluation metrics
        """
        results = {
            xxxx
        }
        xxxx
        return results
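The standard formulas behind these four metrics fit in a few lines each; a sketch assuming binary relevance (each retrieved ID is either relevant or not). With the sample run shown later (one relevant chunk retrieved at rank 1 out of 5), these formulas reproduce the printed values: Precision@1 = 1.000, Precision@3 = 0.333, Precision@5 = 0.200, Recall@5 = 1.000, MRR = 1.000, NDCG@5 = 1.000.

# Sketch of the four metrics with binary relevance.
import math
from typing import List

def precision_at_k(self, retrieved_ids: List[str], relevant_ids: List[str], k: int) -> float:
    # Relevant hits in the top k, divided by k
    return sum(1 for rid in retrieved_ids[:k] if rid in relevant_ids) / k if k else 0.0

def recall_at_k(self, retrieved_ids: List[str], relevant_ids: List[str], k: int) -> float:
    # Relevant hits in the top k, divided by the total number of relevant items
    if not relevant_ids:
        return 0.0
    return sum(1 for rid in retrieved_ids[:k] if rid in relevant_ids) / len(relevant_ids)

def mrr(self, retrieved_ids: List[str], relevant_ids: List[str]) -> float:
    # Reciprocal rank of the first relevant result
    for rank, rid in enumerate(retrieved_ids, start=1):
        if rid in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(self, retrieved_ids: List[str], relevant_ids: List[str], k: int) -> float:
    # DCG with binary gains, normalized by the ideal DCG
    dcg = sum(1.0 / math.log2(i + 2) for i, rid in enumerate(retrieved_ids[:k]) if rid in relevant_ids)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant_ids), k)))
    return dcg / idcg if idcg else 0.0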
The RAG assistant answers questions using only the provided context.
After running the command
python src/app.py
the output will be:
# For example:
Initializing RAG Assistant...
Using Groq model: llama-3.3-70b-versatile
Loading embedding model: sentence-transformers/all-MiniLM-L6-v2
Vector database initialized with collection: rag_documents
RAG Assistant initialized successfully
Loading documents...
Loaded Contextual_retrieval.txt successfully
Loaded RAG.txt successfully
Loaded Autopilot_guide.txt successfully
Loaded Semantic_chunking.txt successfully
Loaded Agentic_AI.txt successfully
Loaded Key_features.txt successfully
Loaded Document_AI.txt successfully
Loaded 7 documents successfully
Loaded 7 sample documents
Processing 7 documents...
Document 0: Split into 3 chunks.
Document 1: Split into 2 chunks.
Document 2: Split into 6 chunks.
Document 3: Split into 6 chunks.
Document 4: Split into 3 chunks.
Document 5: Split into 9 chunks.
Document 6: Split into 4 chunks.
Documents added to vector database
Added sample documents
============================================================
Project: Building a RAG Assistant project
Description: Building a RAG Assistant project for the AAIDC course module 1
Supported Domains:
  - Rag
  - Document Ai
  - Agentic Ai
  - Chunking
  - Contextual Retrieval
  - Autopilot
============================================================
============================================================
Welcome to the RAG Assistant
Enter a question, or 'quit' to exit:
If you enter a question, for example "what is RAG?", you'll be asked whether you want to enable retrieval quality evaluation (y/n).
If you input "n", which means you don't enable retrieval quality evaluation, you'll get answer which is retrieved based on the provided documents.
# For example:
Enter a question, or 'quit' to exit: what is RAG
Do you need to enable retrieval quality evaluation? (y/n) n
Retrieval quality evaluation disabled.

Getting answer...
Searching for top 3 results for query: what is RAG?
Generating query embedding...
Querying collection...

RAG stands for Retrieval-Augmented Generation. It is the process of optimizing the output of a large language model (LLM) by referencing an authoritative knowledge base outside of its training data sources before generating a response. This approach extends the capabilities of LLMs to specific domains or an organization's internal knowledge base, without the need to retrain the model, making it a cost-effective way to improve LLM output and keep it relevant, accurate, and useful in various contexts.

Key points about RAG include:
* It optimizes LLM output by referencing an external knowledge base
* It extends LLM capabilities to specific domains or internal knowledge bases
* It does not require retraining the model
* It is a cost-effective approach to improving LLM output.

============================================================
Welcome to the RAG Assistant
Enter a question, or 'quit' to exit:
If you input "y" to enable retrieval quality evaluation, you'll get retrieval quality results:
# For example:
Enter a question, or 'quit' to exit: what is RAG
Do you need to enable retrieval quality evaluation? (y/n) y
Retrieval quality evaluation enabled.

Getting answer...
Searching for top 3 results for query: what is RAG?
Generating query embedding...
Querying collection...
Searching for top 5 results for query: what is RAG?
Generating query embedding...
Querying collection...

[Retrieval Quality - Basic]
Average similarity: 43.2%
Quality status: poor
⚠️ Warning: Low similarity - answer may not be accurate
⚠️ Your query might be out of scope for the documents!

Searching for top 5 results for query: what is RAG?
Generating query embedding...
Querying collection...
============================================================
Retrieved Chunks (review to identify which are actually relevant):
============================================================
1. ID: doc_1_chunk_1 (similarity: 63.4%)
   Content: . RAG extends the already powerful capabilities of LLMs to specific domains or an organization's int...
2. ID: doc_3_chunk_3 (similarity: 54.9%)
   Content: Fixed-sized chunking. Most chunking strategies used in RAG today are based on fix-sized text segment...
3. ID: doc_1_chunk_0 (similarity: 41.5%)
   Content: Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language mod...
4. ID: doc_3_chunk_2 (similarity: 38.7%)
   Content: Text data chunking strategies play a key role in optimizing the RAG response and performance. Fixed-...
5. ID: doc_0_chunk_0 (similarity: 17.5%)
   Content: Contextual retrieval is a technique introduced by Anthropic in September 2024. Their article covers ...
============================================================
Do you want to provide ground truth (relevant_ids) for full evaluation? (y/n)
You'll then be asked whether you want to provide ground truth for a full evaluation. If you enter "n", you'll get an answer generated from the retrieved context as before; if you enter "y", you'll have three options to choose from, for example:
# For example:
Do you want to provide ground truth (relevant_ids) for full evaluation? (y/n) y
Options:
  a) Enter chunk IDs manually (comma-separated)
     Example: doc_0_chunk_0, doc_0_chunk_1
  b) Type 'use_retrieved' to mark all retrieved as relevant
     (Only use if you're sure they all answer your question!)
  c) Type 'none' if query is out of scope (no chunks are relevant)
Your choice: doc_1_chunk_1
Using 1 chunk(s) as relevant.
Searching for top 5 results for query: what is RAG?
Generating query embedding...
Querying collection...

[Evaluation Debug]
Retrieved IDs: ['doc_1_chunk_1', 'doc_3_chunk_3', 'doc_1_chunk_0', 'doc_3_chunk_2', 'doc_0_chunk_0']...
Relevant IDs: ['doc_1_chunk_1']
Matches found: 1
============================================================
Full Evaluation Results (with ground truth):
============================================================
Retrieved: 5 | Relevant: 1

Metrics:
  Precision@1: 1.000
  Precision@3: 0.333
  Precision@5: 0.200
  Recall@5: 1.000
  MRR: 1.000
  NDCG@5: 1.000
============================================================
RAG stands for Retrieval-Augmented Generation. It is the process of optimizing the output of a large language model (LLM) by referencing an authoritative knowledge base outside of its training data sources before generating a response. This approach extends the capabilities of LLMs to specific domains or an organization's internal knowledge base, without the need to retrain the model, making it a cost-effective way to improve LLM output and keep it relevant, accurate, and useful in various contexts.

Key points about RAG include:
* It optimizes LLM output by referencing an external knowledge base
* It extends LLM capabilities to specific domains or internal knowledge bases
* It does not require retraining the model
* It is a cost-effective approach to improving LLM output.

============================================================
Welcome to the RAG Assistant
Enter a question, or 'quit' to exit:
When you ask a question that falls outside the provided documents, the assistant answers "I don't know", as required by the output constraints in the prompt.
# For example:
Enter a question, or 'quit' to exit: compare google and firefox
Do you need to enable retrieval quality evaluation? (y/n) n
Retrieval quality evaluation disabled.

Getting answer...
Searching for top 3 results for query: compare google and firefox
Generating query embedding...
Querying collection...

I don't know. The provided context does not contain information about Google and Firefox, or any comparison between the two. It discusses Anthropic's techniques for contextual retrieval, combining semantic search and keyword search, and adding context to document chunks, but does not mention Google or Firefox.
The RAG assistant successfully answers user queries using the knowledge base stored in the vector database: it retrieves the most relevant chunks and generates accurate, context-grounded answers from the prompt.