This publication presents a single-agent Retrieval-Augmented Generation (RAG) AI assistant that answers user queries strictly from a fixed set of curated text documents. Built with LangChain and a Groq-hosted large language model, the system enforces retrieval-only answering, similarity-based gating, and polite refusal of out-of-scope queries to prevent hallucinations. The project demonstrates foundational principles of Agentic AI, controlled reasoning, and safe AI system design, and serves as a practical reference implementation for document-grounded AI assistants.
Purpose:
The purpose of this project is to design and implement a safe, agentic RAG-based AI assistant that can answer questions only from provided documents while reliably refusing unsupported queries.
Objectives:
Demonstrate a single-agent Agentic RAG architecture
Implement strict anti-hallucination safeguards
Ensure answers are fully grounded in retrieved documents
Provide a usable interactive UI for demonstration
Maintain a clean, reproducible, open-source implementation
Large Language Models often generate confident but incorrect answers when asked questions outside their knowledge scope. This behavior—commonly known as hallucination—poses risks in educational, technical, and decision-support settings.
While Retrieval-Augmented Generation (RAG) mitigates this issue by grounding responses in external documents, many implementations still allow:
Partial guessing when retrieval fails
Silent fallback to model knowledge
Unclear boundaries of applicability
This project addresses these gaps by designing a strictly controlled Agentic RAG assistant that:
Treats retrieved documents as the only source of truth
Refuses to answer when relevant context is unavailable
Clearly communicates system limitations to the user
Target Audience:
AI/ML learners studying RAG and Agentic AI
Developers building document-based chatbots
Educators demonstrating safe AI system design
Evaluators and reviewers assessing RAG correctness
Use Cases:
Querying curated knowledge bases
Educational demonstrations of RAG behavior
Prototyping safe enterprise or academic assistants
Understanding hallucination mitigation techniques
Data Source:
The assistant uses static, author-provided text documents, each covering a distinct domain:
Artificial Intelligence
Biotechnology
Climate Science
Quantum Computing
Space Exploration
Sustainable Energy
These documents are stored as .txt files and manually curated to ensure clarity and relevance.
In scope: Questions answerable directly from the documents
Out of scope: Any query requiring external knowledge, current events, or personal opinions
The dataset is intentionally small and controlled to prioritize verifiability and safety over breadth.
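As an illustration of this setup, the sketch below loads and chunks such .txt files with LangChain. The directory name, chunk size, and overlap are assumptions for the example, not values taken from the repository.

```python
import os

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

DATA_DIR = "data"  # hypothetical folder holding the curated .txt files


def load_and_chunk_documents(data_dir: str = DATA_DIR):
    """Load every .txt file and split it into overlapping chunks for retrieval."""
    documents = []
    for filename in sorted(os.listdir(data_dir)):
        if filename.endswith(".txt"):
            path = os.path.join(data_dir, filename)
            documents.extend(TextLoader(path, encoding="utf-8").load())

    # Chunk size and overlap are illustrative and should be tuned to the corpus.
    splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
    return splitter.split_documents(documents)
```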
System Workflow:
Documents are embedded and stored in a vector database (at ingestion, before queries are served)
The user submits a query
A retriever selects the most relevant document chunks
A single agent evaluates retrieved context
The LLM generates a response only from retrieved text
If context is insufficient, the system refuses to answer
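A condensed sketch of the embedding and retrieval steps, continuing the loading example above. FAISS and the sentence-transformers embedding model are assumptions here; the repository may equally use Chroma or a different embedder.

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Illustrative embedding model; any LangChain-compatible embedder would work.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Embed the curated chunks and keep them in an in-memory FAISS index.
vectorstore = FAISS.from_documents(load_and_chunk_documents(), embeddings)


def retrieve(query: str, k: int = 4):
    """Return the top-k chunks with relevance scores normalized to [0, 1]."""
    return vectorstore.similarity_search_with_relevance_scores(query, k=k)
```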
Design Decisions:
Single Agent: Keeps the architecture focused on foundational Agentic AI concepts
No External Tools: Eliminates knowledge leakage
Similarity Thresholding: Prevents weak matches from influencing answers
Strict System Prompting: Enforces refusal behavior
The system implements multiple layers of hallucination prevention:
Retrieval-Only Answering: The LLM never sees information beyond retrieved chunks
Similarity Gating: Low-confidence retrieval results trigger refusal
Explicit Refusal Policy: The assistant responds with a polite message when information is unavailable
No Web Access: Keeps the knowledge base fixed so behavior stays reproducible
These measures collectively ensure predictable, safe outputs.
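A minimal sketch of how these safeguards could be combined on top of the retriever sketched above: similarity gating, a strict retrieval-only system prompt, and an explicit refusal message. The threshold, Groq model name, and refusal wording are illustrative assumptions.

```python
from langchain_groq import ChatGroq

REFUSAL = "I'm sorry, but the provided documents do not contain enough information to answer that."
SCORE_THRESHOLD = 0.5  # illustrative cut-off; chunks scoring below it are discarded

# ChatGroq reads GROQ_API_KEY from the environment; temperature 0 reduces variation.
llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0)


def answer(query: str) -> str:
    # Similarity gating: refuse instead of guessing when retrieval is weak.
    relevant = [doc for doc, score in retrieve(query) if score >= SCORE_THRESHOLD]
    if not relevant:
        return REFUSAL

    # Retrieval-only answering: the model sees nothing beyond the retrieved chunks.
    context = "\n\n".join(doc.page_content for doc in relevant)
    system_prompt = (
        "Answer strictly and only from the context below. If the context does not "
        f"contain the answer, reply exactly with: {REFUSAL}\n\nContext:\n{context}"
    )
    return llm.invoke([("system", system_prompt), ("human", query)]).content
```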
Technology Stack:
Python
LangChain (orchestration and retrieval)
Groq LLM (generation)
FAISS / Chroma (vector storage)
Streamlit (user interface)
Security:
API keys are loaded via environment variables
No secrets are committed to version control
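One common way to handle this, assuming a local .env file read with python-dotenv and a minimal Streamlit page wrapping the answer function sketched earlier (both assumptions; the repository's setup instructions may differ).

```python
import os

import streamlit as st
from dotenv import load_dotenv

# Load GROQ_API_KEY from a .env file that is excluded from version control.
load_dotenv()
if "GROQ_API_KEY" not in os.environ:
    raise RuntimeError("GROQ_API_KEY is not set; add it to your environment or .env file.")

st.title("Agentic RAG Assistant")
query = st.text_input("Ask a question about the curated documents")
if query:
    st.write(answer(query))  # answer() as sketched above
```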
The complete implementation is available as an open-source repository.
Validation is performed through:
In-scope queries with known answers from documents
Out-of-scope queries to verify refusal behavior
Manual inspection of responses for grounding correctness
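A small spot-check sketch of this process; the queries and the exact refusal string are illustrative and tied to the assumptions made in the earlier sketches.

```python
# Illustrative spot checks; real queries depend on the curated documents.
in_scope = "What is quantum computing?"
out_of_scope = "Who won yesterday's football match?"

assert answer(out_of_scope) == REFUSAL  # out-of-scope queries trigger refusal
assert REFUSAL not in answer(in_scope)  # in-scope queries are answered from the documents
```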
The system consistently:
Answers correctly when context exists
Refuses when context is missing
Avoids speculative or fabricated outputs
Limitations:
Knowledge is limited strictly to the provided documents
No multi-turn conversational memory
No automatic document updates or ingestion
Not designed for real-time or web-based knowledge
These limitations are intentional to preserve safety and clarity.
This architecture is suitable for:
Educational assistants
Internal documentation chatbots
Compliance-sensitive environments
Early-stage RAG prototypes
It demonstrates how Agentic AI principles can be applied responsibly in real systems.
Key Contributions:
Demonstrates a correct RAG implementation, not superficial retrieval
Shows how agentic decision-making improves safety
Highlights practical hallucination mitigation techniques
Provides a clean, extensible reference implementation
Code Repository:
https://github.com/dharamshiyash/agentic-rag-assistant
The repository includes full setup instructions, dependencies, and usage examples.
Future Enhancements:
Multi-turn conversational RAG
Source citation in responses
Support for PDF and structured documents
Multi-agent extensions for advanced reasoning
This project demonstrates that safe, reliable AI assistants require more than powerful models. By combining Agentic AI principles with strict retrieval grounding, the system achieves predictable behavior, transparent limitations, and trustworthy outputs. The work serves as a strong foundational example of how RAG systems should be designed when correctness and safety matter.