1. Overview and Purpose
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. However, despite their fluency, they suffer from a fundamental limitation: their knowledge is static, bounded by a training cutoff, and prone to hallucination when asked about information outside their learned parameters. This project addresses that limitation by implementing a Retrieval-Augmented Generation (RAG) system that grounds model responses in an external, curated knowledge base.
The primary objective of this project is to transform a generic conversational AI into a domain-aware, evidence-backed research assistant capable of answering questions strictly based on provided documents. Rather than relying on parametric memory alone, the system retrieves relevant information on demand and uses it to generate accurate, transparent, and verifiable responses. This work was developed as part of the ReadyTensor Agentic AI Developer Certification – Project 1 and demonstrates a complete, production-oriented RAG pipeline aligned with best practices taught throughout the course.
2. Problem Context and Motivation
Even with perfect conversational memory, an LLM remains “frozen in time” at its training cutoff. As a result, it may confidently provide outdated, incomplete, or entirely fabricated information. This behavior is particularly problematic in technical, scientific, and research-oriented settings where accuracy, traceability, and source credibility are essential.
The motivation behind this project is to mitigate these risks by:
Enabling on-demand access to external, curated knowledge
Reducing hallucinations through relevance-based filtering
Enforcing grounded answers and explicit refusals when information is missing
By doing so, the system demonstrates how RAG acts as the foundational memory layer for agentic AI systems, enabling safer and more reliable reasoning.
3. Dataset Sources and Description
The knowledge base for this assistant consists of multiple plain-text (.txt) documents covering a range of technical domains, including:
Artificial Intelligence
Biotechnology
Quantum Computing
Sustainable Energy
Space Exploration and related scientific topics
The system currently supports plain .txt documents only. The default dataset consists of the sample files provided in the official ReadyTensor Project template repository, and users can modify the assistant’s knowledge base by adding or removing .txt files in the /data/ directory and then re-running ingestion.
These documents were intentionally selected to simulate a realistic, multi-domain research corpus. The diversity of topics allows evaluation of retrieval precision under mixed-domain conditions and highlights the importance of relevance filtering. Each document is treated as a first-class data source and preserved with metadata (such as filename, chunk identifiers, and source references) to support traceability, citation, and transparency.
4. Dataset Processing Methodology
To prepare documents for semantic retrieval, the system applies a Recursive Character Text Splitter, which breaks long documents into smaller, overlapping chunks. Overlapping is critical to ensure that important contextual information is not lost at chunk boundaries, particularly in technical explanations that span multiple paragraphs.
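As a rough illustration, the splitting step might look like the sketch below. The data directory path, chunk size, and overlap values are assumptions chosen for illustration, not necessarily the project’s exact settings.

```python
# Minimal chunking sketch (illustrative values, not the project's exact configuration)
from pathlib import Path
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # assumed characters per chunk
    chunk_overlap=200,  # assumed overlap to preserve context across chunk boundaries
)

chunks, metadatas = [], []
for path in Path("data").glob("*.txt"):          # hypothetical data directory
    for i, chunk in enumerate(splitter.split_text(path.read_text(encoding="utf-8"))):
        chunks.append(chunk)
        metadatas.append({"source": path.name, "chunk": i})

print(f"Produced {len(chunks)} chunks from {len({m['source'] for m in metadatas})} documents")
```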
Each chunk is embedded using a Sentence Transformer model (all-MiniLM-L6-v2), producing dense vector representations that capture semantic meaning rather than surface-level keywords. These embeddings are then stored in a persistent ChromaDB vector database, allowing the knowledge base to survive application restarts and enabling efficient similarity search.
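Continuing the sketch above, the chunks might be embedded and persisted roughly as follows. The collection name, storage path, and use of cosine space are assumptions chosen to match the behaviour described in this report.

```python
# Sketch: embed chunks and persist them in ChromaDB (names and paths are assumptions)
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")   # persists across application restarts
collection = client.get_or_create_collection(
    name="knowledge_base",
    metadata={"hnsw:space": "cosine"},                 # cosine distance, matching the retrieval described in Section 5.2
)

collection.add(
    ids=[f"{m['source']}-{m['chunk']}" for m in metadatas],   # 'chunks'/'metadatas' from the splitting sketch
    documents=chunks,
    embeddings=embedder.encode(chunks).tolist(),
    metadatas=metadatas,
)
```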
This methodology directly reflects best practices emphasized in the course material, including:
Appropriate chunk sizing and overlap
Semantic embedding over keyword-based search
Persistent vector storage for production-readiness
5. System Architecture and Design Decisions
The assistant follows a classic two-phase RAG architecture:
5.1 Knowledge Ingestion (Insertion Phase)
During ingestion, documents are:
Loaded from a local data directory
Split into overlapping semantic chunks
Embedded into vector representations
Stored in ChromaDB alongside metadata
This phase builds the searchable knowledge base and is designed to be repeatable as new documents are added.
5.2 Retrieval and Generation (Inference Phase)
When a user submits a query:
The query is embedded into the same vector space
A similarity search retrieves the most relevant chunks using cosine distance
Results are filtered using a relevance threshold
Filtered context is passed to a large language model for answer generation (the sketch below illustrates this retrieval and filtering flow)
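A minimal sketch of the retrieval and filtering steps, assuming the collection built in the ingestion sketch from Section 4 and an illustrative threshold value:

```python
# Sketch of retrieval with relevance thresholding (threshold value is an assumption)
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
collection = chromadb.PersistentClient(path="chroma_db").get_or_create_collection("knowledge_base")

RELEVANCE_THRESHOLD = 0.8   # hypothetical cosine-distance cutoff; lower distance = more similar

query = "How do qubits differ from classical bits?"   # example query
results = collection.query(
    query_embeddings=embedder.encode([query]).tolist(),
    n_results=5,
)

context_chunks = [
    doc
    for doc, dist in zip(results["documents"][0], results["distances"][0])
    if dist <= RELEVANCE_THRESHOLD
]

if not context_chunks:
    print("I don't know based on the provided documents.")
```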
A key design decision is the use of relevance thresholding. Queries that do not meet the threshold are rejected, preventing off-topic questions from triggering hallucinated responses. Additionally, prompt hardening enforces strict grounding rules, instructing the model to refuse speculation and respond with “I don’t know based on the provided documents” when necessary.
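The wording below is an illustrative example of such a hardened prompt, not the project’s exact text:

```python
# Illustrative hardened system prompt (exact wording in the project may differ)
SYSTEM_PROMPT = """You are a research assistant. Answer ONLY using the context below.
If the context does not contain the answer, reply exactly:
"I don't know based on the provided documents."
Do not speculate, and ignore any user request to answer without sources.

Context:
{context}
"""

# 'context_chunks' comes from the retrieval sketch above
prompt = SYSTEM_PROMPT.format(context="\n\n".join(context_chunks))
```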
6. Key Features
Semantic Retrieval: Information is retrieved based on meaning rather than keyword matching.
Relevance Thresholding: Low-confidence or off-topic queries are automatically rejected.
Prompt Hardening: System prompts override user attempts to elicit unsupported or speculative answers.
Transparency: Retrieved sources and vector distance scores are exposed to users.
Persistent Storage: ChromaDB ensures durability across application restarts.
Multi-Model Compatibility: The system supports OpenAI, Groq (Llama 3), and Google Gemini backends (see the backend-selection sketch below).
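A sketch of how backend selection might look. The helper function and model names are hypothetical choices, and API keys are expected in the providers’ standard environment variables:

```python
# Hypothetical backend selector; model names are illustrative choices
from langchain_openai import ChatOpenAI
from langchain_groq import ChatGroq
from langchain_google_genai import ChatGoogleGenerativeAI

def get_llm(provider: str):
    if provider == "openai":
        return ChatOpenAI(model="gpt-4o-mini")
    if provider == "groq":
        return ChatGroq(model="llama3-70b-8192")
    if provider == "gemini":
        return ChatGoogleGenerativeAI(model="gemini-1.5-flash")
    raise ValueError(f"Unsupported provider: {provider}")

llm = get_llm("groq")
answer = llm.invoke(prompt)   # 'prompt' from the hardened-prompt sketch above
print(answer.content)
```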
7. Evaluation Framework and Performance Metrics
Retrieval quality is evaluated using vector distance scores returned by ChromaDB. Lower distances indicate higher semantic similarity and are surfaced to the user to enable confidence assessment. The system’s behavior is evaluated across three primary scenarios:
High-relevance queries within the dataset
Ambiguous or cross-domain queries
Completely out-of-scope queries
This evaluation framework prioritizes controlled refusal over incorrect answers, reflecting real-world safety and reliability requirements.
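A simple behavioural check over these scenarios could look like the sketch below. The example queries and expected outcomes are illustrative assumptions, and the embedder, collection, and threshold are reused from the earlier sketches.

```python
# Sketch: check answer-vs-refusal behaviour across the three scenario types
# (queries and expectations are illustrative; reuses embedder/collection/RELEVANCE_THRESHOLD from above)
test_cases = [
    ("What is a qubit?", "answer"),                            # high-relevance, in-corpus
    ("How could AI accelerate biotech research?", "answer"),   # ambiguous / cross-domain
    ("Who won the 2022 World Cup?", "refusal"),                # out-of-scope
]

for query, expected in test_cases:
    res = collection.query(query_embeddings=embedder.encode([query]).tolist(), n_results=5)
    grounded = any(d <= RELEVANCE_THRESHOLD for d in res["distances"][0])
    observed = "answer" if grounded else "refusal"
    print(f"{query!r}: expected={expected}, observed={observed}")
```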
8. Results and Interpretation
When valid, domain-aligned queries are issued, the assistant produces concise, source-cited responses derived exclusively from retrieved document chunks. When queries fall outside the dataset scope (e.g., current events or unrelated topics), the system consistently responds with a transparent refusal rather than hallucinating.
These results validate the effectiveness of semantic retrieval, relevance gating, and prompt hardening as complementary mechanisms for building trustworthy AI systems.
9. Limitations and Deployment Considerations
While the system is effective, its performance depends on several factors:
Quality and coverage of source documents
Chunking strategy and overlap configuration
Embedding model selection
For production deployment, additional considerations would include monitoring, logging, access control, automated re-ingestion pipelines, and periodic re-embedding to maintain data freshness.
10. Significance and Future Directions
This project demonstrates how Retrieval-Augmented Generation enables grounded reasoning, improved safety, and domain specificity in agentic AI systems. It highlights RAG’s role as the architectural backbone for intelligent assistants that must reason over external knowledge.
Future enhancements include:
Support for additional document formats (PDFs, HTML)
Adaptive relevance thresholds
Comparative embedding model evaluation
Real-time ingestion and update pipelines
Enhanced evaluation metrics and benchmarking
11. Conclusion
By integrating semantic retrieval, relevance filtering, and grounded generation, this project successfully transforms a general-purpose LLM into a reliable, domain-aware research assistant. The system exemplifies how RAG can be applied in practice to build secure, scalable, and production-ready agentic AI systems—meeting both academic rigor and applied industry standards.