https://github.com/sarthak-cs/RAG_chatbot
Large Language Models (LLMs) such as Gemini are powerful but limited by their static training data. They may hallucinate or fail when asked about information outside their knowledge cutoff. To address this limitation, Retrieval-Augmented Generation (RAG) combines information retrieval with generative AI to produce more accurate, context-aware responses.
In this project, I built a console-based RAG chatbot that retrieves relevant information from a custom dataset and generates grounded answers using Google Gemini 2.5 Pro. This work demonstrates the foundational concepts of agentic AI, where an AI system actively retrieves knowledge before reasoning and responding.
Document Ingestion
Custom text documents are loaded from local storage.
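This step can be sketched in a few lines. The folder path and `.txt` glob below are assumptions for illustration; the actual repo may load documents differently:

```python
from pathlib import Path

def load_documents(folder: str) -> dict[str, str]:
    """Read every .txt file in `folder` into a {filename: text} map."""
    docs = {}
    for path in sorted(Path(folder).glob("*.txt")):
        docs[path.name] = path.read_text(encoding="utf-8")
    return docs
```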
Text Chunking
Documents are split into overlapping chunks to preserve context while enabling efficient retrieval.
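A minimal sketch of overlapping chunking; the chunk size and overlap values are illustrative defaults, not necessarily what the project uses:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks
    share `overlap` characters, so context spanning a boundary
    still appears intact in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, len(text), step)
            if text[i:i + chunk_size]]
```

Character-based splitting is the simplest scheme; token- or sentence-aware splitters trade simplicity for cleaner chunk boundaries.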
Embedding Generation
Each chunk is converted into vector embeddings using HuggingFace MiniLM.
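The real pipeline would call something like `SentenceTransformer("all-MiniLM-L6-v2").encode(chunk)` from the sentence-transformers library. As a dependency-free stand-in, the toy hashed bag-of-words embedding below shows only the shape of the step (text in, unit-length vector out); it is not a substitute for MiniLM's semantic quality:

```python
import hashlib

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy stand-in for a sentence embedding model: hash each token
    into a bucket of a fixed-size count vector, then L2-normalize
    so dot products equal cosine similarity."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0  # avoid divide-by-zero on empty text
    return [v / norm for v in vec]
```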
Vector Storage
Embeddings are stored in a ChromaDB vector database for fast similarity search.
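ChromaDB handles persistence and indexing; conceptually, though, a vector store is just "keep (embedding, chunk) pairs, return the top-k by similarity." A minimal in-memory sketch (assuming unit-norm embeddings, so a dot product is cosine similarity):

```python
class VectorStore:
    """Minimal in-memory stand-in for ChromaDB."""

    def __init__(self):
        self.entries = []  # list of (embedding, chunk_text) pairs

    def add(self, embedding: list[float], chunk: str) -> None:
        self.entries.append((embedding, chunk))

    def query(self, query_embedding: list[float], k: int = 3) -> list[str]:
        """Return the k chunks most similar to the query embedding."""
        scored = [(sum(a * b for a, b in zip(emb, query_embedding)), chunk)
                  for emb, chunk in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:k]]
```

A real vector database replaces the linear scan with an approximate-nearest-neighbor index so search stays fast as the collection grows.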
Query Processing
The chatbot operates in an interactive loop where user queries are processed in real time. Retrieved document chunks are explicitly injected into the prompt, so the model answers using only the retrieved context rather than its parametric memory.
Basic prompt guarding is applied by instructing the model to rely strictly on the retrieved information, reducing hallucinations.
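The prompt-assembly step above might look like the sketch below. The template wording is an assumption, not the project's exact prompt; in the real loop the resulting string would be sent to Gemini 2.5 Pro via the Google API client:

```python
# Guarded prompt template: the instruction to use ONLY the supplied
# context is the "prompt guarding" that reduces hallucinations.
PROMPT_TEMPLATE = """Answer the question using ONLY the context below.
If the answer is not in the context, say "I don't know."

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Inject retrieved chunks into the guarded template."""
    context = "\n---\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```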
This project demonstrates how Retrieval-Augmented Generation can significantly improve the reliability of AI assistants. By integrating semantic retrieval with generative models, the system provides grounded, context-aware responses and serves as a strong foundation for building more advanced agentic AI systems.