This publication presents a Retrieval-Augmented Generation (RAG) system that enables natural language question-answering over custom document collections. Built using LangChain, ChromaDB, and modern LLMs (OpenAI, Groq, Google Gemini), the system demonstrates how semantic search and retrieval can dramatically improve AI response accuracy while reducing hallucinations. The implementation features intelligent document chunking, persistent vector storage, multi-provider LLM support, and comprehensive test coverage with advanced RAG evaluation metrics.
Key Capabilities:
Organizations and individuals face a common challenge: How do you make AI understand and accurately answer questions about YOUR specific documents?
General-purpose chatbots and LLMs, while powerful, have significant limitations:
This problem manifests across numerous domains:
Traditional keyword-based search has limitations:
Retrieval-Augmented Generation (RAG) bridges this gap by:
This solution provides the accuracy and specificity of custom data with the natural language capabilities of modern LLMs—without expensive fine-tuning.
What This System Does:
What This System Doesn't Do:
The system follows a clean, modular architecture:
┌─────────────────────────────────────────────────────────────┐
│ Document Ingestion │
│ data/ folder → Load documents → Chunk text → Embeddings │
└──────────────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Vector Storage Layer │
│ ChromaDB (Persistent) → sentence-transformers embeddings │
│ Collection: rag_documents → 384-dim vectors │
└──────────────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Query Pipeline │
│ User Question → Embed → Search Top-K → Context Retrieval │
└──────────────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LLM Response Generation │
│ Context + Question → LangChain Prompt → LLM → Answer │
│ Providers: OpenAI / Groq / Google Gemini │
└─────────────────────────────────────────────────────────────┘
1. Document Processing (vectordb.py)
RecursiveCharacterTextSplitter for intelligent chunking
2. Vector Database (vectordb.py)
3. RAG Pipeline (app.py)
4. Configuration (config.py + config.yaml)
| Component | Technology | Rationale |
|---|---|---|
| LLM Framework | LangChain | Industry standard, excellent abstractions, multi-provider support |
| Vector Database | ChromaDB | Lightweight, persistent, easy setup, no external services |
| Embeddings | sentence-transformers | Open-source, fast, high-quality semantic embeddings |
| LLM Providers | OpenAI, Groq, Google | Flexibility to choose based on cost, speed, quality trade-offs |
| Testing | pytest + DeepEval | Comprehensive unit tests + specialized RAG evaluation metrics |
| Config Management | PyYAML + python-dotenv | Clean separation of code and configuration |
Input Data:
Documents are read from the data/ directory in the project root.

Sample Data Included:
The project includes five sample documents covering different domains:
Adding Your Own Documents:
Simply drop .txt or .md files into the data/ folder and restart the application.
Prerequisites:
```text
# System Requirements
Python 3.10 or higher
pip (Python package manager)
Virtual environment (recommended)

# Minimum 4GB RAM
# ~1GB disk space for dependencies
```
Step-by-Step Installation:
```bash
# 1. Clone the repository
git clone https://github.com/david-001/agentic-ai-essentials-cert-project
cd agentic-ai-essentials-cert-project

# 2. Create and activate virtual environment
python -m venv venv
source venv/bin/activate   # Linux/Mac
# venv\Scripts\activate    # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment variables
cp .env.example .env
# Edit .env with your API key (at least one required)
```
Configuration (config/config.yaml):
```yaml
# Embedding model configuration
embedding:
  model: "sentence-transformers/all-MiniLM-L6-v2"

# Database configuration
database:
  collection_name: "rag_documents"
  path: "./chroma_db"

# LLM configuration
llm:
  temperature: 0.0

# File paths
paths:
  data_directory: "data"
```
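These values are read at startup by src/config.py. The loader itself is not reproduced in this publication; the sketch below shows one way such a loader could work with PyYAML. The attribute names (for example DATA_DIRECTORY, which mirrors the name used in load_documents) and the path handling are assumptions, not the project's exact code.

```python
# Hypothetical sketch of src/config.py -- not the project's exact code.
# Loads config/config.yaml with PyYAML and exposes settings as module attributes.
import os
import yaml

_CONFIG_PATH = os.path.join(os.path.dirname(__file__), "..", "config", "config.yaml")

with open(_CONFIG_PATH, "r", encoding="utf-8") as f:
    _cfg = yaml.safe_load(f)

EMBEDDING_MODEL = _cfg["embedding"]["model"]           # sentence-transformers model name
COLLECTION_NAME = _cfg["database"]["collection_name"]  # ChromaDB collection
DB_PATH = _cfg["database"]["path"]                     # persistent storage location
LLM_TEMPERATURE = _cfg["llm"]["temperature"]           # 0.0 for factual Q&A
DATA_DIRECTORY = _cfg["paths"]["data_directory"]       # folder scanned by load_documents()
```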
Environment Variables (.env):
```bash
# Choose ONE LLM provider (system tries in order: OpenAI → Groq → Google)
OPENAI_API_KEY=sk-proj-...
GROQ_API_KEY=gsk-...
GOOGLE_API_KEY=AIza...

# Optional: Specify model explicitly
OPENAI_MODEL=gpt-4o-mini
GROQ_MODEL=llama-3.1-8b-instant
GOOGLE_MODEL=gemini-2.0-flash
```
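The provider fallback described in the comment above can be implemented as a simple chain of key checks. The following is an illustrative sketch, not the project's actual code; the get_llm name and the default model strings are assumptions taken from the .env example.

```python
# Illustrative provider fallback: try OpenAI, then Groq, then Google Gemini,
# depending on which API key is present. Names and defaults are assumptions.
import os
from langchain_openai import ChatOpenAI
from langchain_groq import ChatGroq
from langchain_google_genai import ChatGoogleGenerativeAI

def get_llm(temperature: float = 0.0):
    if os.getenv("OPENAI_API_KEY"):
        return ChatOpenAI(model=os.getenv("OPENAI_MODEL", "gpt-4o-mini"), temperature=temperature)
    if os.getenv("GROQ_API_KEY"):
        return ChatGroq(model=os.getenv("GROQ_MODEL", "llama-3.1-8b-instant"), temperature=temperature)
    if os.getenv("GOOGLE_API_KEY"):
        return ChatGoogleGenerativeAI(model=os.getenv("GOOGLE_MODEL", "gemini-2.0-flash"), temperature=temperature)
    raise RuntimeError("No LLM API key found. Set at least one key in .env.")
```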
1. Document Loading
```python
def load_documents() -> List[dict]:
    """
    Load documents for demonstration.

    Returns:
        List of document dicts with 'content' and 'metadata'
    """
    results = []

    # Implement document loading
    # - Read the documents from the data directory
    # - Return a list of documents
    # - Support .txt and .md files

    # Define the data directory path
    data_dir = config.DATA_DIRECTORY

    # Check if data directory exists
    if not os.path.exists(data_dir):
        print(f"Warning: {data_dir} directory not found. Creating it...")
        os.makedirs(data_dir)
        print(f"Please add your documents to the '{data_dir}' folder and run again.")
        return results

    # Load all .txt and .md files from the data directory
    for filename in os.listdir(data_dir):
        filepath = os.path.join(data_dir, filename)

        # Handle text and markdown files
        if filename.endswith(('.txt', '.md')):
            try:
                with open(filepath, 'r', encoding='utf-8') as f:
                    content = f.read()
                results.append({
                    'content': content,
                    'metadata': {'source': filename}
                })
                print(f"Loaded: {filename}")
            except Exception as e:
                print(f"Error loading {filename}: {e}")

    if len(results) == 0:
        print(f"\nNo documents found in '{data_dir}' folder.")
        print("Please add some .txt or .md files to get started.")

    return results
```
The chunk_text method in VectorDB uses LangChain's RecursiveCharacterTextSplitter.
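The method itself is not reproduced in these excerpts; a minimal sketch of what it could look like, assuming the 256-character chunks with 20-character overlap discussed under Lessons Learned, is:

```python
# Sketch of a chunk_text method built on RecursiveCharacterTextSplitter.
# The chunk_size/chunk_overlap values follow the settings discussed later;
# the real implementation lives in src/vectordb.py.
from langchain_text_splitters import RecursiveCharacterTextSplitter

def chunk_text(self, text: str) -> list[str]:
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=256,    # characters per chunk
        chunk_overlap=20,  # overlap preserves context across chunk boundaries
    )
    return splitter.split_text(text)
```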
2. Adding documents to Vector Database
```python
def add_documents(self, documents: List) -> None:
    """
    Add documents to the vector database.

    Args:
        documents: List of documents
    """
    # Implement document ingestion logic
    # - Loop through each document in the documents list
    # - Extract 'content' and 'metadata' from each document dict
    # - Use self.chunk_text() to split each document into chunks
    # - Create unique IDs for each chunk (e.g., "doc_0_chunk_0")
    # - Use self.embedding_model.encode() to create embeddings for all chunks
    # - Store the embeddings, documents, metadata, and IDs in your vector database
    # - Print progress messages to inform the user

    print(f"Processing {len(documents)} documents...")

    # Handle empty document list
    if not documents:
        print("No documents to process.")
        return

    all_chunks = []
    all_metadatas = []
    all_ids = []

    # Process each document
    for doc_idx, document in enumerate(documents):
        # Extract content and metadata
        content = document.get('content', '')
        metadata = document.get('metadata', {})

        # Chunk the document
        chunks = self.chunk_text(content)
        print(f"Document {doc_idx + 1}: Split into {len(chunks)} chunks")

        # Create unique IDs and metadata for each chunk
        for chunk_idx, chunk in enumerate(chunks):
            chunk_id = f"doc_{doc_idx}_chunk_{chunk_idx}"
            chunk_metadata = {
                **metadata,
                'doc_index': doc_idx,
                'chunk_index': chunk_idx
            }
            all_chunks.append(chunk)
            all_metadatas.append(chunk_metadata)
            all_ids.append(chunk_id)

    if not all_chunks:
        print("No chunks to add!")
        return

    # Create embeddings for all chunks
    print(f"Creating embeddings for {len(all_chunks)} chunks...")
    embeddings = self.embedding_model.encode(all_chunks, show_progress_bar=True)

    # Add to ChromaDB collection
    print("Adding to vector database...")
    self.collection.add(
        ids=all_ids,
        embeddings=embeddings.tolist(),
        documents=all_chunks,
        metadatas=all_metadatas
    )

    print(f"Successfully added {len(all_chunks)} chunks to vector database")
```
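The query pipeline in the next section relies on self.vector_db.search(), which is not shown in these excerpts. A sketch consistent with how its results are consumed (the method name comes from the code above; the body is an assumption) could look like this:

```python
# Assumed sketch of VectorDB.search(): embed the query, ask ChromaDB for the
# top-k nearest chunks, and flatten the per-query result lists for the caller.
def search(self, query: str, n_results: int = 3) -> dict:
    query_embedding = self.embedding_model.encode([query])
    results = self.collection.query(
        query_embeddings=query_embedding.tolist(),
        n_results=n_results,
    )
    # ChromaDB returns one list per query; unwrap the single-query results
    return {
        "documents": results["documents"][0],
        "metadatas": results["metadatas"][0],
        "distances": results["distances"][0],
    }
```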
3. RAG Query Pipeline
```python
def query(self, input: str, n_results: int = 3) -> str:
    """
    Query the RAG assistant.

    Args:
        input: User's input
        n_results: Number of relevant chunks to retrieve

    Returns:
        The LLM's answer as a string
    """
    llm_answer = ""

    # Implement the RAG query pipeline
    # - Use self.vector_db.search() to retrieve relevant context chunks
    # - Combine the retrieved document chunks into a single context string
    # - Use self.chain.invoke() with context and question to generate the response
    # - Return a string answer from the LLM

    # Step 1: Search for relevant context chunks
    search_results = self.vector_db.search(input, n_results=n_results)

    # Step 2: Combine retrieved chunks into a single context string
    if search_results['documents']:
        context = "\n\n".join(search_results['documents'])
    else:
        context = "No relevant information found."

    # Step 3: Use the chain to generate the response
    llm_answer = self.chain.invoke({
        "context": context,
        "question": input
    })

    return llm_answer
```
4. Prompt Engineering
The prompt template is carefully designed to:
template = """You are a helpful AI assistant. Use the following context to answer the user's question. Use clear, concise language with bullet points where appropriate. Given the some documents that should be relevant to the user's question, answer the user's question. Only answer questions based on the provided documents. If the user's question is not related to the documents, then you SHOULD NOT answer the question. Say "The question is not answerable given the documents". Never answer a question from your own knowledge. Provide concise answers in bullet points when relevant. Context: {context} Question: {question} Answer:"""
Local Development:
Production Considerations:
Scalability:
Security:
Performance:
Monitoring:
Core Dependencies:
langchain-core==1.2.7
langchain-google-genai==4.2.0
langchain-groq==1.1.1
langchain-openai==1.1.7
langchain-text-splitters==1.1.0
chromadb==1.4.1
sentence-transformers==5.2.0
python-dotenv==1.2.1
Testing & Evaluation:
pytest==9.0.2
deepeval==3.8.0
The system was evaluated using a comprehensive test suite combining custom retrieval metrics and DeepEval's specialized RAG evaluation metrics:
Retrieval Quality Metrics (Custom Implementation):
Generation Quality Metrics (DeepEval Framework):
The test suite includes:
Retrieval Performance:
Precision@3: 0.8182 (81.82% of retrieved chunks are relevant)
Recall@3: 1.0000 (100% of relevant chunks retrieved in top-3)
MRR: 1.0000 (First relevant result always in position 1)
NDCG@5: 0.9854 (Near-perfect ranking quality)
Avg Latency: 19.94ms (Fast retrieval, <20ms per query)
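For reference, the custom retrieval metrics reported above can be computed with a few lines of code. The function names below are illustrative and not necessarily those used in tests/metrics_utils.py.

```python
# Illustrative implementations of the reported retrieval metrics (binary relevance).
import math

def precision_at_k(retrieved_ids, relevant_ids, k):
    top_k = retrieved_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k

def recall_at_k(retrieved_ids, relevant_ids, k):
    top_k = retrieved_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / len(relevant_ids)

def mrr(retrieved_ids, relevant_ids):
    # Reciprocal rank of the first relevant result
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved_ids, relevant_ids, k):
    # DCG of the actual ranking divided by the ideal DCG
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(retrieved_ids[:k], start=1)
              if doc_id in relevant_ids)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(k, len(relevant_ids)) + 1))
    return dcg / ideal if ideal > 0 else 0.0
```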
Generation Quality (DeepEval):
Faithfulness: 1.0000 (Perfect grounding in context, zero hallucinations)
Answer Relevance: 1.0000 (Answers perfectly address questions)
Contextual Precision: 0.8472 (84.72% of retrieved context is relevant)
Contextual Recall: 0.9167 (91.67% of needed information retrieved)
Contextual Relevancy: 0.4289 (Some retrieved chunks less relevant)
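The DeepEval metrics above are computed from test cases built out of the question, the generated answer, and the retrieved context. A minimal sketch of one such check is shown below; the question string, the assistant object, and the retrieved_chunks variable are hypothetical, and the project's actual evaluator lives in tests/rag_evaluator.py.

```python
# Illustrative DeepEval faithfulness check for a single RAG answer.
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

question = "What is the refund policy?"          # hypothetical question
test_case = LLMTestCase(
    input=question,
    actual_output=assistant.query(question),     # answer from the RAG pipeline
    retrieval_context=retrieved_chunks,          # chunks returned by the vector DB
)

metric = FaithfulnessMetric(threshold=0.7)
metric.measure(test_case)
print(metric.score, metric.reason)
```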
Performance Analysis:
The results demonstrate excellent retrieval and generation quality:
✅ Strengths:
⚠️ Areas for Improvement:
Overall Grade: A (Excellent)
The system achieves production-ready quality with perfect faithfulness and relevance scores, making it suitable for deployment in knowledge-intensive applications where accuracy is critical.
Technical Limitations:
Operational Limitations:
Known Edge Cases:
Very short documents (<256 chars) create single-chunk entries that may be too broad for precise semantic matching
Business Value:
Technical Contribution:
Potential Extensions:
Semantic Search Quality: sentence-transformers/all-MiniLM-L6-v2 provides excellent semantic understanding despite being lightweight and fast
Prompt Engineering: The carefully crafted prompt effectively keeps responses grounded in documents and prevents hallucinations
Modular Architecture: Clean separation between vector DB, RAG pipeline, and configuration makes the system maintainable and extensible
Multi-Provider Support: Flexibility to switch between OpenAI, Groq, and Gemini provides cost/performance optimization options
ChromaDB Performance: Persistent storage with fast retrieval meets requirements for development and small-scale deployment
Chunk Size Matters: 256 characters with 20-character overlap strikes a good balance, but different document types may benefit from tuning
Retrieval Count Trade-off: Top-3 chunks work well for focused questions; complex queries might benefit from top-5 or top-7
Temperature Settings: Setting temperature to 0 for factual Q&A significantly reduces hallucinations
Evaluation is Critical: Automated RAG metrics (DeepEval) catch issues that manual testing misses
Error Handling: Graceful degradation when documents are out of scope is essential for user trust
This project was developed as part of the Ready Tensor Agentic AI Essentials Certification Program. Special thanks to the Ready Tensor team for the comprehensive curriculum and project structure.
Repository Structure:
agentic-ai-essentials-cert-project/
│
├── src/ # Source code directory
│ ├── app.py # Main application with RAG pipeline
│ ├── config.py # Configuration loader (loads from YAML)
│ └── vectordb.py # Vector database wrapper for ChromaDB
│
├── config/ # Configuration directory
│ └── config.yaml # YAML configuration file (edit settings here)
│
├── data/ # Document collection
│ ├── api_documentation.md # Sample: API documentation
│ ├── company_policies.md # Sample: HR policies
│ ├── customer_faq.md # Sample: Customer FAQ
│ ├── product_documentation.md # Sample: Product information
│ └── security_compliance.md # Sample: Security documentation
│
├── tests/ # Comprehensive test suite
│ ├── conftest.py # Pytest configuration and shared fixtures
│ ├── metrics_utils.py # Metric calculation utilities
│ ├── rag_evaluator.py # DeepEval-based RAG quality evaluator
│ ├── rag_evaluator_utils.py # Helper utilities for evaluation
│ ├── test_app.py # Integration tests for RAG pipeline
│ └── test_vectordb.py # Unit tests for vector database
│
├── requirements.txt # Python dependencies
├── pytest.ini # Pytest configuration
├── .env # Environment variables (API keys) - DO NOT COMMIT
├── .env.example # Template for environment setup
├── .gitignore # Git ignore rules
├── LICENSE # MIT License
├── README.md # This file
│
└── chroma_db/ # Vector database storage (auto-created)
Repository: https://github.com/david-001/agentic-ai-essentials-cert-project