This project implements a Retrieval-Augmented Generation (RAG) Assistant, a hybrid intelligent system that integrates a vector database (ChromaDB) with multiple Large Language Models (LLMs), including OpenAI GPT, Groq Llama, and Google Gemini.
The system allows users to query a knowledge base built from local documents they provide (PDFs, Word files, and text files), retrieves semantically relevant information using HuggingFace embeddings, and generates accurate, context-aware answers through one of the available LLM APIs.
It demonstrates a complete end-to-end architecture for a lightweight, AI-powered knowledge retrieval system designed for research, clinical, or educational applications.
Modern LLMs are powerful, but they lack direct access to private or domain-specific data.
For example, a clinician or researcher may have internal reports or publications stored locally, which are not accessible to cloud-based AI models.
This project bridges that gap by combining:
- Vector search (for factual grounding), and
- Generative reasoning (for natural, contextual answers).
The motivation was to create a self-contained, local retrieval system that enhances any LLM with private knowledge injection, while still allowing flexible integration with multiple AI providers depending on available credentials or cost.
The system follows a modular architecture composed of two main modules and an environment configuration file:

- `app.py` – the orchestration module, responsible for:
  - Environment variable management (dotenv)
  - Document loading and preprocessing
  - Prompt templating and chaining (LangChain; see the sketch after this list)
  - Query-answer workflow management
- `vectordb.py` – the vector database interface, handling:
  - Text chunking and embedding generation
  - Persistent storage of document embeddings
  - Semantic similarity search and retrieval via ChromaDB
- `.env` – the configuration file managing API keys and model selection:
  - `OPENAI_API_KEY`, `GROQ_API_KEY`, `GEMINI_API_KEY`
  - Embedding and ChromaDB collection settings
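For illustration, the prompt templating and chaining step might look like the following LangChain (LCEL) sketch. The prompt wording, the provider class, and the model name are assumptions for the example, not the project's exact code.

```python
# Minimal sketch of the prompt templating and chaining step (LangChain / LCEL).
# The prompt wording, provider class, and model name are illustrative assumptions,
# not the project's exact code.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI  # any of the three supported providers could be used

# Prompt that injects the retrieved chunks as grounding context
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # hypothetical model choice

# prompt -> model -> plain-string output
chain = prompt | llm | StrOutputParser()

answer = chain.invoke({"context": "...retrieved chunks...", "question": "What is AMR?"})
print(answer)
```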
The ingestion pipeline loads `.txt`, `.pdf`, `.docx`, or `.md` files from the `data/` directory, splits them with `RecursiveCharacterTextSplitter` (500 characters, 50-character overlap), and embeds each chunk with the `all-MiniLM-L6-v2` model (see the sketch after the table below).

Component | Technology Used | Purpose |
---|---|---|
Programming Language | Python 3.10+ | Core development language |
LLM Integration | LangChain | Orchestration and chaining of LLMs |
Vector Database | ChromaDB | Persistent vector search for contextual retrieval |
Embeddings | SentenceTransformers (`all-MiniLM-L6-v2`) | Semantic text encoding |
Text Splitter | LangChain RecursiveCharacterTextSplitter | Chunking documents for embedding |
Supported LLM APIs | OpenAI GPT, Groq Llama, Google Gemini | Multi-model support |
Document Parsing | PyPDF2, python-docx | Reading .pdf and .docx files respectively |
Environment Management | python-dotenv | Secure API key loading |
Storage | PersistentClient (`./chroma_db`) | Local vector persistence |
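As a rough sketch of the chunk-and-embed step described above (chunk size, overlap, and model name taken from this README; the input file path and variable names are illustrative):

```python
# Illustrative chunk-and-embed step; chunk size, overlap, and model name mirror
# the README, while the input file path and variable names are hypothetical.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

text = open("data/example.txt", encoding="utf-8").read()  # hypothetical input file
chunks = splitter.split_text(text)     # list of ~500-character chunks
embeddings = embedder.encode(chunks)   # one 384-dimensional vector per chunk

print(len(chunks), embeddings.shape)
```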
Configuration lives in a `.env` file, supported inputs are `.pdf`, `.docx`, `.txt`, and `.md` files, and the code is split between `app.py` (LLM orchestration) and `vectordb.py` (vector database management).

Ensure you have:
- Python 3.10 or newer
- `pip` installed
```bash
git clone https://github.com/Nago-01/agentic_ai_project1.git
cd agentic_ai_project1

python -m venv .venv
source .venv/bin/activate   # Linux/Mac
.venv\Scripts\activate      # Windows

pip install -r requirements.txt
```
Create a `.env` file in the root directory with your API keys:
```env
OPENAI_API_KEY=sk-...
GROQ_API_KEY=gsk-...
GOOGLE_API_KEY=AIza...
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
CHROMA_COLLECTION_NAME=rag_documents
```
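At startup these values are read with python-dotenv; a minimal sketch (variable names match the example above, the defaults are illustrative):

```python
# Sketch of how the keys above could be read at startup with python-dotenv.
# Variable names match the .env example; the defaults are illustrative.
import os
from dotenv import load_dotenv

load_dotenv()  # loads .env from the project root into the process environment

groq_key = os.getenv("GROQ_API_KEY")
openai_key = os.getenv("OPENAI_API_KEY")
google_key = os.getenv("GOOGLE_API_KEY")
embedding_model = os.getenv("EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
collection_name = os.getenv("CHROMA_COLLECTION_NAME", "rag_documents")
```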
Place your `.pdf`, `.docx`, `.txt`, or `.md` files inside the `data/` directory.
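For reference, a rough sketch of how these formats could be read before chunking, using the PyPDF2 and python-docx libraries listed in the tech stack; the helper function is hypothetical, not the project's loader:

```python
# Hypothetical loader for the supported formats, using the PyPDF2 and python-docx
# libraries named in the tech stack; the project's actual loader may differ.
from pathlib import Path

from docx import Document
from PyPDF2 import PdfReader


def load_document(path: Path) -> str:
    if path.suffix == ".pdf":
        return "\n".join(page.extract_text() or "" for page in PdfReader(str(path)).pages)
    if path.suffix == ".docx":
        return "\n".join(p.text for p in Document(str(path)).paragraphs)
    return path.read_text(encoding="utf-8")  # .txt / .md


texts = [load_document(p) for p in Path("data").iterdir() if p.is_file()]
```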
```bash
python -m src.app
```
```text
Initializing RAG Assistant...
Loading embedding model: sentence-transformers/all-MiniLM-L6-v2
Vector database initialized with collection: rag_documents
RAG Assistant initialized successfully
Loading documents...
Loaded 3 documents
Enter a question or 'quit' to exit: What is antimicrobial resistance?
Antimicrobial resistance refers to the ability of microorganisms, such as bacteria, viruses, and fungi, to resist the effects of antimicrobial agents like antibiotics and antivirals. It occurs through genetic mutations or horizontal gene transfer.
```
`RAGAssistant`
Method | Description |
---|---|
`__init__()` | Initializes the assistant; loads the LLM and vector DB. |
`_initialize_llm()` | Selects an available LLM (Groq → OpenAI → Gemini); see the sketch below. |
`add_documents(documents: List)` | Adds parsed documents to the vector database. |
`invoke(input: str, n_results: int = 3)` | Retrieves relevant chunks and generates an answer. |
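A sketch of the Groq → OpenAI → Gemini fallback performed by `_initialize_llm()`, using the corresponding LangChain integration classes; the model names and exact selection logic are assumptions:

```python
# Sketch of the Groq → OpenAI → Gemini fallback performed by _initialize_llm(),
# using LangChain's provider integrations; model names and the exact selection
# logic are assumptions.
import os


def initialize_llm():
    if os.getenv("GROQ_API_KEY"):
        from langchain_groq import ChatGroq
        return ChatGroq(model="llama-3.1-8b-instant")            # model name is an assumption
    if os.getenv("OPENAI_API_KEY"):
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model="gpt-4o-mini")                   # model name is an assumption
    if os.getenv("GOOGLE_API_KEY"):
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # model name is an assumption
    raise RuntimeError("No LLM API key found in the environment")
```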
`VectorDB`
Method | Description |
---|---|
`chunk_text(text: str, chunk_size: int = 500)` | Splits text into smaller chunks for embedding. |
`add_documents(documents: List)` | Embeds and stores document chunks in ChromaDB. |
`search(query: str, n_results: int = 5)` | Finds the most similar chunks by semantic similarity; see the sketch below. |
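These methods map roughly onto the chromadb client API; a simplified sketch using the persistent store and collection name configured above (the IDs and example chunks are illustrative):

```python
# Simplified correspondence between the VectorDB methods and the chromadb API.
# Collection name and storage path follow the configuration above; IDs and the
# example chunks are illustrative.
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="rag_documents")

# add_documents(): store chunks under unique IDs (Chroma embeds them with its
# configured embedding function if no vectors are supplied)
chunks = ["Antimicrobial resistance arises through genetic mutations...", "A second chunk..."]
collection.add(ids=[f"doc-{i}" for i in range(len(chunks))], documents=chunks)

# search(): semantic similarity query returning the top-n matching chunks
results = collection.query(query_texts=["What is antimicrobial resistance?"], n_results=5)
print(results["documents"][0])
```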
The hybrid memory design combines SQLite3 storage with long-term memory retention using VectorStoreRetrievalMemory.

To evaluate the performance and effectiveness of the RAG-based assistant, I compared it against a few baseline and state-of-the-art systems:
- LlamaIndex Simple RAG Implementation – another lightweight RAG baseline for document retrieval.
- Haystack Document QA Pipeline – a robust framework for open-domain document question answering.
- GPT-4 with Context Window Only (no retrieval) – a pure LLM approach without external knowledge integration.
Each system was tested on the same document set using 15 user queries of varying complexity. Answer relevance and context accuracy were assessed qualitatively by manual review, while response latency was measured quantitatively.
The system demonstrated competitive performance in factual accuracy and context retrieval consistency, particularly due to its use of SentenceTransformer embeddings and Chroma persistent vector store, which allowed for efficient similarity search and reusable storage. However, the system exhibited context drift in multi-document scenarios, a limitation I plan to address through query-specific context filtering in future versions.
Metric | Description | Measurement Approach | Value |
---|---|---|---|
Latency (s) | Average time taken from query input to final answer generation. | Mean across 50 runs. | 2.3s |
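The latency figure can be reproduced by timing repeated calls to the assistant and averaging; a minimal sketch (the query text and run count are illustrative):

```python
# Sketch of how the mean latency figure could be measured: time repeated calls
# to the assistant and average them. The query text and run count are illustrative.
import time


def mean_latency(assistant, query: str, runs: int = 50) -> float:
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        assistant.invoke(query)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

# Example: mean_latency(rag, "What is antimicrobial resistance?")
```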
This project showcases a RAG-based system implemented to align with the Foundation of Agentic AI - the first module of this program. It combines retrieval-augmented generation (RAG) with hybrid memory and minimal agentic reasoning to provide accurate, context-aware answers derived directly from uploaded publications. By integrating vector search, LLM-based inference, and document chunking, the assistant efficiently bridges the gap between static document knowledge and dynamic question-answering.
Beyond static retrieval, the semi-agentic design introduces flexibility and autonomy in processing queries, ensuring that the assistant evolves toward more interactive and reasoning-capable systems.
This project therefore serves as both a proof-of-concept and a foundation for future intelligent assistants that can reason over specialized knowledge bases with precision.
LangChain Documentation: https://python.langchain.com
ChromaDB: https://docs.trychroma.com
HuggingFace SentenceTransformers: https://www.sbert.net
OpenAI API: https://platform.openai.com/docs
Groq Llama Models: https://groq.com
Google Gemini API: https://ai.google.dev
Python-dotenv: https://pypi.org/project/python-dotenv
Image Credit: WTF in Tech