
This project implements a Retrieval-Augmented Generation (RAG) based AI Assistant that answers questions using only the content inside a set of domain documents. The system loads text files, splits them into smaller chunks, generates embeddings using Google Gemini Embeddings, stores them in a ChromaDB vector database, and retrieves relevant context when a user asks a question. An LLM (Gemini 2.5 Flash) then produces a grounded answer based strictly on the retrieved text. When the documents do not contain the answer, the model is forced to respond: "I don't have enough information from the documents." This project was completed as part of the Ready Tensor Agentic AI Developer Certification (Module 1).
The system follows a standard RAG workflow:
Document Loading: .txt files inside the /data directory are loaded automatically.
Chunking: Each document is split into ~500-character, sentence-aware chunks to improve retrieval granularity (see the chunking sketch below).
Embeddings: Google Generative AI text-embedding-004 converts each chunk into vector embeddings.
Vector Storage: All embeddings and metadata are stored in ChromaDB.
Similarity Search: At query time, the user question is embedded and matched against stored vectors.
LLM Response: The retrieved context is inserted into a prompt, and Gemini generates the answer grounded in that context.
If the answer is not found in retrieval context, the model returns:
"I don't have enough information from the documents."
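As a concrete illustration of the chunking step above, here is a minimal sentence-aware splitter sketch. It assumes a ~500-character target and plain-string input; the function name and exact splitting rules are illustrative and may differ from the repository code.

```python
import re

def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into ~max_chars chunks without cutting sentences in half (illustrative)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk once adding this sentence would exceed the limit.
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Keeping full sentences inside each chunk means every retrieved passage is readable on its own, which helps both similarity search and the grounded answer.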
For this project, I prepared three small domain documents stored in the data/ directory.
These act as the custom knowledge base for retrieval.
All documents were manually drafted and curated for clarity, converted to UTF-8 text format, and placed in the data folder for ingestion.
No external proprietary datasets were used, ensuring full reproducibility.
The knowledge base contains 3 custom text documents, placed in the /data directory:
| File Name | Description | Approx. Length |
|---|---|---|
| vaes_intro.txt | What VAEs are and common applications | ~250 words |
| vaes_vs_autoencoders.txt | Difference between VAEs and traditional autoencoders | ~220 words |
| transformers_basics.txt | Explanation of transformers and self-attention | ~260 words |
All files are plain text and fully included in this repository.
| Component | Purpose |
|---|---|
| Google Gemini (LLM + Embeddings) | Answering and embedding text |
| ChromaDB | Vector store for chunk embeddings |
| LangChain | Prompt templating + pipeline |
| Python 3.10+ | Runtime |
vectordb.py:
Acts as a wrapper around ChromaDB for storage and retrieval.
Splits document text into smaller chunks to improve search accuracy.
Uses Google Generative AI embeddings (text-embedding-004) to convert text chunks into vectors.
Stores each chunk along with metadata such as source filename, chunk index, and character length.
Performs similarity search by embedding the user's query and returning the closest chunks.
Includes retry logic to handle temporary API failures and 504 timeout errors, improving stability.
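The following condensed sketch shows how such a wrapper can be put together. It assumes the google-generativeai and chromadb Python packages; the function names, collection name, and chroma_db path are illustrative, not copied from vectordb.py.

```python
import time

import chromadb
import google.generativeai as genai

genai.configure(api_key="...")  # in the real project the key is loaded from .env

def embed_with_retry(text: str, retries: int = 3) -> list[float]:
    """Embed text with text-embedding-004, retrying on transient API failures."""
    for attempt in range(retries):
        try:
            result = genai.embed_content(model="models/text-embedding-004", content=text)
            return result["embedding"]
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff

client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("documents")

def add_chunks(chunks: list[str], source: str) -> None:
    """Store each chunk with its embedding plus source, index, and length metadata."""
    for i, chunk in enumerate(chunks):
        collection.add(
            ids=[f"{source}-{i}"],
            embeddings=[embed_with_retry(chunk)],
            documents=[chunk],
            metadatas=[{"source": source, "chunk_index": i, "length": len(chunk)}],
        )

def search(query: str, k: int = 3) -> dict:
    """Embed the query and return the k most similar stored chunks."""
    return collection.query(query_embeddings=[embed_with_retry(query)], n_results=k)
```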
app.py:
Loads all .txt files from the data folder and prepares them for ingestion.
Calls vectordb.py to chunk, embed, and store the documents.
Defines a Retrieval-Augmented Generation (RAG) prompt template that forces answers to rely only on retrieved context.
Runs the full RAG pipeline:
Search for relevant chunks
Build formatted context
Pass context and question into the LLM
Print the answer with the source filenames
Outputs "I don't have enough information from the documents." when the context does not contain an answer.
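To make the prompt-and-pipeline description concrete, here is a minimal sketch of the generation step. It assumes the langchain-google-genai integration and reuses the search() helper from the vector-store sketch above; the exact prompt wording and the "gemini-2.5-flash" model id are assumptions rather than the repository's literal code.

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

FALLBACK = "I don't have enough information from the documents."

prompt = ChatPromptTemplate.from_template(
    "Answer the question using ONLY the context below.\n"
    "If the context does not contain the answer, reply exactly:\n"
    + FALLBACK + "\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0)

def answer(question: str) -> str:
    """Retrieve chunks, build the context block, and ask Gemini for a grounded answer."""
    results = search(question, k=3)                 # similarity search from the sketch above
    docs = results["documents"][0]
    sources = {m["source"] for m in results["metadatas"][0]}
    response = (prompt | llm).invoke({"context": "\n\n".join(docs), "question": question})
    return f"{response.content}\n(sources: {', '.join(sorted(sources))})"
```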
Security and configuration:
API keys are stored in a .env file and protected using .gitignore.
A .env.example file is provided so others can run the project without access to real secrets.
ChromaDB persistence is used so the system can reuse stored vectors without re-processing documents on every run.
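A minimal sketch of that configuration pattern, assuming the python-dotenv package and a GOOGLE_API_KEY variable (the actual variable name may differ; .env.example documents it):

```python
import os

import google.generativeai as genai
from dotenv import load_dotenv

load_dotenv()  # reads the local .env file, which is git-ignored
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # never hard-code the key
```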
The system was evaluated using real queries executed against the stored documents.
An output was considered correct if:
✅ Answer used only retrieved context
✅ The model cited the correct document
✅ If no answer existed, it returned the fallback sentence
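A lightweight check along these lines can automate the criteria above. This is an illustrative harness, not the project's actual evaluation code; it reuses the hypothetical answer() helper from the earlier sketch.

```python
FALLBACK = "I don't have enough information from the documents."

test_cases = [
    ("What are VAEs used for?", "vaes_intro.txt"),
    ("Difference between VAEs and autoencoders?", "vaes_vs_autoencoders.txt"),
    ("How do transformers model long-range dependencies?", "transformers_basics.txt"),
]

for question, expected_source in test_cases:
    reply = answer(question)  # answer() is the hypothetical pipeline function sketched above
    grounded = expected_source in reply or FALLBACK in reply
    print(f"{'PASS' if grounded else 'FAIL'}: {question}")
```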
| Query | Expected Result | System Output | Result |
|---|---|---|---|
| "What are VAEs used for?" | Generation, anomaly detection, imputation | ✅ Correct & cited | ✅ Pass |
| "Difference between VAEs and autoencoders?" | Comparison exists in documents | ✅ Correct & cited | ✅ Pass |
| "How do transformers model long-range dependencies?" | Self-attention mechanism | ✅ Correct & cited | ✅ Pass |
✅ Accuracy: 100% (3/3 queries grounded in retrieved text)
| Query | Screenshot |
|---|---|
| What are VAEs used for? | ![]() |
| What is the difference between VAEs and autoencoders? | ![]() |
| How do transformers model long-range dependencies? | ![]() |
(Images stored in /assets/ folder inside the repo)
Works only for .txt files (no PDFs/HTML yet)
Retrieval limited to semantic similarity only
No conversation memory across turns
Small corpus, not benchmarked against large-scale data
Not deployed as API/UI
Add PDF ingestion + OCR
Add Streamlit or FastAPI interface
Store chat history for conversational RAG
Support large datasets and cloud-scale vector DB (Pinecone / Weaviate)
Add automated evaluation with more queries
This application can be run locally or deployed using:
Python virtual environment
Docker container
Streamlit web UI for chat-style interface
Environment-secured .env for API keys
Most general-purpose LLMs can answer questions, but they lack domain-specific context.
This project solves that gap by grounding the model in user-provided documents, reducing hallucinations and improving accuracy.
The system can be extended by adding more .txt files to the data/ directory.
To show the benefit of using a RAG approach, I compared responses from:
✅ RAG pipeline (Gemini + ChromaDB)
❌ LLM-only prompt (no document retrieval)
| User Question | LLM-Only (No RAG) | RAG System Output |
|---|---|---|
| "What is the difference between VAEs and autoencoders?" | Gives a generic answer based on prior training, even if incorrect | Correctly responds: "I don't have enough information from the documents." |
| "How do transformers model long-range dependencies?" | Generic theoretical response, no citation | Uses retrieved chunks and cites (source: transformers_basics.txt) |
Key Result:
These examples show that grounding answers in retrieved documents makes the system more reliable than a plain LLM prompt, which can guess or hallucinate.
This RAG design is suitable for teams that want a small, private knowledge assistant without exposing data to external models.
It can be extended to PDF ingestion, enterprise document search, chatbots, or customer support tools.
Many enterprises already use RAG systems to ground large language models in proprietary knowledge.
Systems like customer support, medical record search, and legal document lookup rely on RAG for factual accuracy and auditability.
This project demonstrates a complete working RAG assistant using Gemini and ChromaDB. It retrieves context, grounds responses, cites sources, handles missing information gracefully, and follows production-aligned design principles.