Jan 04, 2026●61 reads●MIT License

Agentic AI Essentials - Multi LLM System

  • Krish Pansara

🧠 RAG-Based Question Answering Assistant using LLMs

A Retrieval-Augmented Generation (RAG) powered AI assistant built using LangChain and Vector Databases (Chroma/FAISS) that can answer questions based on your own documents.
Developed as a project under the Agentic AI Essentials Certification Program.


🤖 What is this project about?

This project demonstrates how Large Language Models (LLMs) can be enhanced through Retrieval-Augmented Generation (RAG) — a powerful technique where an LLM doesn’t rely only on its internal training data, but also retrieves relevant information from external knowledge sources (your own documents).

🔍 In simple terms:

It’s like ChatGPT, but it knows about your files — PDFs, text, notes — and gives accurate, context-specific answers.

This assistant can run entirely from your local system and interact through a Command-Line Interface (CLI) or optionally a Streamlit UI for a smoother user experience.


💡 Why RAG?

Traditional LLMs like GPT or Gemini have limitations: they can “hallucinate”, and they can give outdated answers because their training data ends at a fixed cutoff date.
RAG solves this by:

  1. Retrieving the most relevant information from your provided documents, and
  2. Augmenting the model’s input with that context before generation.

This way, the assistant's responses are grounded in your data and stay as current as the documents you provide.


🧩 Key Features

  • 📄 Custom Document Ingestion – Upload and index your own documents (PDF, TXT, or MD).
  • 🔍 Vector Store Retrieval – Efficiently stores and searches embeddings using Chroma or FAISS.
  • 🤖 LLM-Powered Answers – Generates natural, context-aware responses using OpenAI, Groq, or Gemini APIs.
  • 🧠 LangChain Integration – Orchestrates document loading, retrieval, and LLM response generation.
  • ⚙️ Configurable Pipeline – Easily switch between models or databases with minimal code changes.
  • 🗂️ Extensible Design – Future-ready for enhancements like memory, logging, or reasoning chains (ReAct, CoT).

🧠 How It Works (Step-by-Step with Deep Understanding)

A Retrieval-Augmented Generation (RAG) system bridges the gap between static model knowledge and dynamic, domain-specific data.
Here’s how your assistant works under the hood:

🪶 1. Document Loading

The assistant begins by reading all the files stored inside the data/ directory — these can be PDFs, text files, or markdown notes.
This step converts unstructured human-readable data into a digital format ready for processing.

Why this matters:
LLMs cannot “read” PDFs or large files directly. By preprocessing the documents, we prepare them for meaningful retrieval and embedding.
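
As a rough illustration of the loading step, here is a minimal sketch using only the Python standard library. It is not the project's actual loader: `load_documents`, `TEXT_SUFFIXES`, and `demo_load` are hypothetical names, and the sketch only handles plain-text and markdown files. PDFs would need a separate parsing library, which is left out here.

```python
from pathlib import Path
import tempfile

# Hypothetical helper names; only .txt and .md files are handled in this sketch.
TEXT_SUFFIXES = {".txt", ".md"}


def load_documents(data_dir):
    """Return a list of {"source": filename, "text": contents} dicts, sorted by filename."""
    docs = []
    for path in sorted(Path(data_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in TEXT_SUFFIXES:
            docs.append({"source": path.name, "text": path.read_text(encoding="utf-8")})
    return docs


def demo_load():
    """Write sample files to a temporary directory and load them back."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "notes.txt").write_text("RAG retrieves context.", encoding="utf-8")
        (Path(tmp) / "guide.md").write_text("# Guide\nChunk, embed, store.", encoding="utf-8")
        (Path(tmp) / "image.png").write_bytes(b"\x89PNG")  # unsupported type, skipped
        return load_documents(tmp)
```

Keeping the source filename next to the text makes it possible to show later which document an answer came from.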

✂️ 2. Chunking

Large documents are split into smaller, semantically meaningful pieces (called chunks).
For example, a 20-page PDF (roughly 10,000 words) might become a few dozen text chunks of 200–500 tokens each.

Why this matters:
LLMs have context limits (e.g., 8K or 16K tokens).
Chunking allows efficient retrieval — so when a question is asked, only the most relevant parts are considered.
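
A minimal sketch of fixed-size chunking with overlap is shown below. It is an assumption-laden illustration rather than the splitter this project uses: `chunk_text` is a hypothetical helper, and it counts whitespace-separated words as a simple proxy for tokens. The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks.

```python
def chunk_text(text, chunk_size=300, overlap=50):
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words. Words stand in for tokens here.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Smaller chunks make retrieval more precise but give the model less surrounding context per hit, which is why chunk size is worth tuning.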

🧮 3. Embedding Generation

Each text chunk is passed through a Sentence Transformer or an embedding API (like OpenAI’s text-embedding-3-small).
This converts text into a vector — a list of numbers representing semantic meaning.

Why this matters:
Vectors allow computers to “understand” the similarity between texts.
Two chunks with similar meaning will have vectors that are close together in multidimensional space.
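
To make "close together" concrete, the sketch below compares vectors with cosine similarity. The `toy_embed` function is a deliberately crude stand-in (word counts over a tiny, made-up vocabulary), not a Sentence Transformer or an embedding API. Real embedding models capture meaning well beyond word overlap, but the comparison step works the same way.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    """Toy stand-in for an embedding model: count each vocabulary word in `text`."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative vocabulary and sentences (assumptions made for this example).
vocabulary = ["cat", "dog", "pet", "stock", "market"]
cat_vec = toy_embed("my cat is a pet", vocabulary)
dog_vec = toy_embed("my dog is a pet", vocabulary)
finance_vec = toy_embed("the stock market fell", vocabulary)

related = cosine_similarity(cat_vec, dog_vec)        # share the word "pet"
unrelated = cosine_similarity(cat_vec, finance_vec)  # share no vocabulary words
```

Here `related` is higher than `unrelated`, which is the property retrieval relies on.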

🧱 4. Vector Storage

All embeddings (vectors) are stored in a vector database, such as ChromaDB or FAISS.
These databases are optimized for fast similarity search — finding which vectors (chunks) are closest to a query vector.

Why this matters:
Instead of scanning entire documents every time, the system can instantly retrieve only the most relevant text snippets.
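
The idea behind a vector store can be sketched with a small in-memory class. This is a hypothetical illustration, not the Chroma or FAISS API: `InMemoryVectorStore` compares the query against every stored vector, whereas real vector databases use optimized indexes to avoid that full scan.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical brute-force store that keeps (vector, text) pairs in a list."""

    def __init__(self):
        self._items = []

    def add(self, vector, text):
        """Store a copy of the vector alongside the chunk text it represents."""
        self._items.append((list(vector), text))

    def search(self, query_vector, k=3):
        """Return up to k (score, text) pairs, most similar first."""
        scored = [(_cosine(query_vector, vec), text) for vec, text in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Tiny 2-dimensional vectors, chosen by hand for illustration.
store = InMemoryVectorStore()
store.add([1.0, 0.0], "about cats")
store.add([0.0, 1.0], "about stocks")
store.add([0.9, 0.1], "about kittens")
top_two = store.search([1.0, 0.0], k=2)
```

The `search` call is the same operation the query-handling step performs: embed the question, then ask the store for the nearest chunks.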

🔍 5. Query Handling

When a user asks a question, the system:

  1. Converts the query into a vector (just like the chunks).
  2. Searches the vector database for the most similar document chunks.

The top-matched chunks are then retrieved as the context for the model.

Why this matters:
This allows the assistant to answer questions using your documents instead of relying only on the model’s pretraining.

🧩 6. Prompt Construction

The retrieved context and the user’s question are combined into a structured prompt template —
something like:

“You are an AI assistant. Use the following context to answer the question accurately.
Context: [retrieved chunks]
Question: [user query]”

Why this matters:
Good prompt engineering helps the LLM produce concise, grounded responses and reduces the risk of hallucination.
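
A minimal sketch of how such a template could be filled in with plain Python string formatting follows. The template wording mirrors the example above; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and numbering the chunks is an added assumption that makes it easier to see which snippet an answer drew on.

```python
PROMPT_TEMPLATE = (
    "You are an AI assistant. Use the following context to answer the question accurately.\n"
    "Context:\n{context}\n"
    "Question: {question}"
)


def build_prompt(chunks, question):
    """Join the retrieved chunks into a numbered context block and fill the template."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)


example_prompt = build_prompt(
    ["Chroma stores embeddings.", "FAISS supports similarity search."],
    "Which tools store embeddings?",
)
```

Keeping the template in one constant means prompt wording can be refined without touching the retrieval code.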

💬 7. Response Generation

Finally, the prompt is sent to an LLM backend — such as OpenAI GPT, Groq, or Google Gemini.
The model processes both the question and retrieved context to produce an accurate, human-like answer.

Why this matters:
This is where generation happens — the assistant doesn’t “memorize” answers but reasons over context, creating a dynamic and trustworthy Q&A system.
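
One way to keep the generation step independent of any single provider is to pass the LLM call in as a function. The sketch below is an assumption about structure, not this project's code: `answer_question`, `fake_retrieve`, and `fake_generate` are hypothetical, and the stub stands in for a real OpenAI, Groq, or Gemini client call.

```python
def answer_question(question, retrieve, generate, k=3):
    """Retrieve context, build a prompt, and hand it to the injected `generate` callable."""
    chunks = retrieve(question, k)
    prompt = (
        "Use the following context to answer the question accurately.\n"
        "Context:\n" + "\n".join(chunks) + "\n"
        "Question: " + question
    )
    return generate(prompt)


def fake_retrieve(question, k):
    """Stand-in retriever: returns the first k entries of a fixed list, ignoring the question."""
    corpus = [
        "Chroma stores embeddings.",
        "FAISS supports similarity search.",
        "LangChain orchestrates the pipeline.",
    ]
    return corpus[:k]


def fake_generate(prompt):
    """Stand-in for an LLM call: echoes the prompt so the flow can be inspected offline."""
    return "[stub] " + prompt


stub_answer = answer_question("What stores embeddings?", fake_retrieve, fake_generate, k=1)
```

Swapping providers then means passing a different `generate` function, with the retrieval and prompt logic left unchanged.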

🔁 The Feedback Loop

You can iteratively improve the system by:

  • Adding more documents to data/.
  • Adjusting chunk size and embedding models.
  • Refining prompt templates.

This makes the assistant smarter and more aligned with your specific knowledge base over time.


🧱 Tech Stack

| Component | Technology | Purpose |
|---|---|---|
| Framework | LangChain | Handles chaining of retrieval & generation logic |
| Vector Store | Chroma / FAISS | Stores embeddings for semantic search |
| Embeddings | Sentence Transformers or LLM APIs | Converts text chunks into numerical vectors |
| LLM Backend | OpenAI, Groq, or Google Gemini | Generates final context-grounded answers |
| Interface | CLI / Streamlit | For user interaction |
| Language | Python 3.10+ | Core programming language |

📂 Folder Structure

rt-aaidc-module1/
├── src/
│   ├── app.py           # Main RAG application
│   └── vectordb.py      # Vector database wrapper
├── data/                # Contains documents
│   └── *.txt            # Text files
├── requirements.txt     # All dependencies
└── README.md            # This guide

🎯 ReadyTensor AAIDC Program

This project was created as part of the ReadyTensor Agentic AI Developer Certification (AAIDC) program,
a structured and mentor-guided course designed to transform learners from LLM users into AI system builders.

🧱 About the Program

The AAIDC (Agentic AI Developer Certification) by ReadyTensor focuses on helping developers:

  • Understand how modern AI assistants actually work behind the scenes
  • Apply concepts like RAG, agentic reasoning, and multi-LLM orchestration
  • Gain hands-on experience by completing real-world, production-ready projects

📘 Module 1 – “Foundations of Agentic AI: Your First RAG Assistant”

This module introduces the core architecture behind Retrieval-Augmented Generation (RAG).
Students learn to:

  1. Design a document ingestion and retrieval pipeline
  2. Create embeddings using Sentence Transformers or API models
  3. Store them in vector databases (Chroma / FAISS)
  4. Build a LangChain-based RAG system connected to LLMs
  5. Deliver context-grounded, hallucination-free AI responses

🌱 What I Learned

By completing this module, I gained a deep, practical understanding of:

  • The full life cycle of RAG-based assistants
  • How to combine vector databases with LLMs effectively
  • The engineering perspective behind document-based AI systems
  • Writing structured, prompt-driven logic with LangChain

🧩 In essence, this project marks my first milestone in building real-world AI systems — not just prompting them.


🎓 Learning Outcomes

By completing this RAG-based AI Assistant project as part of the ReadyTensor AAIDC Program, I achieved both technical mastery and conceptual understanding of modern AI assistant design.
Below are the major outcomes from this project:

🧩 1. End-to-End RAG Pipeline Development

  • Learned to design and implement a complete Retrieval-Augmented Generation (RAG) workflow.
  • Understood how to connect the data ingestion, vectorization, retrieval, and generation stages.
  • Gained hands-on experience with document loading, chunking, and semantic search using embeddings.

⚙️ 2. Practical Use of LLM Frameworks

  • Worked with LangChain and Sentence Transformers to integrate external language models.
  • Understood how to call multiple LLM APIs (OpenAI, Groq, and Google Gemini) using environment variables.
  • Learned how to design flexible pipelines that can switch between different LLM providers seamlessly.

💾 3. Vector Databases & Information Retrieval

  • Implemented ChromaDB for storing and querying document embeddings.
  • Learned how similarity search works using cosine similarity and vector distance metrics.
  • Optimized retrieval by tuning chunk sizes, embedding models, and context length.

🧠 4. Prompt Engineering & Context Management

  • Designed custom prompt templates that merge retrieved context with user queries.
  • Explored strategies to minimize hallucination and maintain answer grounding.
  • Practiced prompt refinement for improving clarity, accuracy, and relevance of LLM responses.

🧮 5. Hands-On AI System Design Thinking

  • Understood how modern AI assistants are architected in the real world (beyond basic prompting).
  • Gained the ability to analyze, debug, and improve an LLM-based application.
  • Learned to think like an AI system engineer, not just an API user.

🚀 6. Professional & Deployment Skills

  • Managed project structure following software engineering best practices.
  • Used Git and GitHub for version control, documentation, and collaboration.
  • Learned to maintain a clean, reproducible environment using virtualenv and requirements.txt.

⚙️ Installation & Setup

Follow the steps below to set up and run the project locally:

1️⃣ Clone the repository

git clone https://github.com/krishpansara/rt-aaidc-module1/
cd rt-aaidc-module1

2️⃣ Create and Activate a Virtual Environment

It’s recommended to create a virtual environment to isolate project dependencies.

🪟 For Windows:

python -m venv venv
venv\Scripts\activate

🐧 For macOS / Linux:

python3 -m venv venv
source venv/bin/activate

3️⃣ Install the Required Dependencies

Once the virtual environment is activated, install all dependencies using:

pip install -r requirements.txt

4️⃣ Configure your API key:

This project supports multiple LLM providers (OpenAI, Groq, Google).
You need to set your API keys before running the app.

After creating the virtual environment, open (or create) the .env file in the project root directory and add your API keys as shown below:

# .env file
# Example: use one or more providers depending on your setup
OPENAI_API_KEY=your_openai_api_key_here
GROQ_API_KEY=your_groq_api_key_here
GOOGLE_API_KEY=your_google_api_key_here
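
Once these variables are in the process environment (for example, loaded from .env at startup), the app needs to decide which provider to use. The sketch below shows one possible approach and is not the project's actual logic: `PROVIDER_KEYS` and `pick_provider` are hypothetical, and the preference order (OpenAI, then Groq, then Google) is an assumption. Only the variable names come from the .env example above.

```python
import os

# Variable names match the .env example; the preference order is an assumption.
PROVIDER_KEYS = [
    ("openai", "OPENAI_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("google", "GOOGLE_API_KEY"),
]


def pick_provider(env=None):
    """Return (provider_name, api_key) for the first provider with a non-empty key."""
    env = os.environ if env is None else env
    for name, variable in PROVIDER_KEYS:
        key = env.get(variable, "").strip()
        if key:
            return name, key
    expected = ", ".join(variable for _, variable in PROVIDER_KEYS)
    raise RuntimeError("No API key found; set at least one of: " + expected)
```

Note that this check only tests for a non-empty value, so a leftover placeholder such as your_groq_api_key_here would still be treated as a key.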

5️⃣ Run the Application

According to the folder structure above, app.py lives in src/, so from the project root run:

python src/app.py

🙏 Acknowledgements

Special thanks to ReadyTensor.ai for providing structured learning and the RAG project template.
Inspired by the open-source AI community and the LangChain ecosystem.


🧭 In summary:
This project strengthened my ability to build end-to-end AI systems, combining data engineering, machine learning, and software development.
It represents a foundational step toward becoming a practical AI/ML engineer capable of developing custom knowledge-grounded assistants.
