Context-Aware RAG Assistant
Author: Richmond Iroegbu Elochukwu
Affiliation: Ready Tensor AI Developer Certification Program
Keywords: LangChain, RAG, Chatbot, LLM, Vector Database, Context Retrieval
This project presents a Context-Aware LangChain RAG Chatbot designed to enable intelligent, document-grounded conversational interactions using modern Large Language Models (LLMs). The system integrates Retrieval-Augmented Generation (RAG) with a vector database, ensuring contextually accurate responses grounded in uploaded documents. The current implementation uses a single document, “Agenda 2063: The Africa We Want,” demonstrating the chatbot’s ability to perform domain-specific question answering. The system supports multiple LLM backends, including OpenAI, Groq, and Gemini, offering flexibility and cost efficiency.
Traditional chatbots often rely solely on pre-trained LLM knowledge, resulting in responses that may be outdated or contextually irrelevant for domain-specific applications. To address this, the Context-Aware LangChain RAG Chatbot employs Retrieval-Augmented Generation, allowing users to interact with custom document data while maintaining the fluency of LLM responses.
This project was developed as part of the Ready Tensor AI Developer Certification Program to demonstrate practical implementation of a document-grounded AI assistant using the LangChain framework.
The chatbot’s architecture combines document ingestion, embedding, vector storage, retrieval, and response generation in a modular pipeline.
The chatbot follows a modular RAG pipeline:
PDF Document → Text Chunking → Embeddings → Vector Store → Query Retrieval → Prompt Assembly → LLM Response
Each stage is isolated in its own module:
vectordb.py handles vector storage and similarity search.
prompt_builder.py and prompt_instructions.py manage dynamic prompt assembly.
app.py orchestrates document loading, query handling, and model responses.
This modular approach promotes scalability and allows swapping components such as the embedding model or LLM provider without breaking the pipeline.
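The chunk → embed → retrieve → prompt flow above can be sketched in miniature. This is an illustrative stand-in, not the project's actual code: the function names (chunk_text, embed, retrieve, build_prompt) are assumptions, and a toy word-count "embedding" with cosine similarity replaces the real embedding model and vector database.

```python
# Minimal sketch of the RAG pipeline stages; all names are illustrative,
# and the word-count "embedding" is a stand-in for a real embedding model.
from collections import Counter
import math

def chunk_text(text: str, size: int = 120, overlap: int = 30) -> list[str]:
    """Split text into overlapping character windows (toy chunker)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real model returns dense vectors)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, store: list[tuple[Counter, str]], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(embed(query), item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble retrieved context and the user question into one LLM prompt."""
    return ("Answer using only the context below.\n\nContext:\n"
            + "\n\n".join(context) + f"\n\nQuestion: {query}")

# Tiny in-memory "vector store" over two document chunks.
chunks = [
    "Agenda 2063 is the African Union's long-term development blueprint.",
    "The vector store returns the chunks most similar to the user query.",
]
store = [(embed(c), c) for c in chunks]
prompt = build_prompt("What is Agenda 2063", retrieve("What is Agenda 2063", store))
```

In the real system, each of these steps maps onto a module (vectordb.py for storage and search, prompt_builder.py for assembly), so any stage can be swapped without touching the others.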
Logging and error handling mechanisms ensure reliability during document parsing, vectorization, and inference.
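One way such reliability can be achieved is to wrap each risky stage so failures are logged rather than crashing the pipeline. The sketch below is illustrative (the function name safe_load is an assumption, not the project's API):

```python
# Illustrative error-handling wrapper for the document-loading stage;
# safe_load is a hypothetical name, not part of the project's codebase.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rag_assistant")

def safe_load(path: str):
    """Read a document, logging I/O errors and returning None instead of raising."""
    try:
        with open(path, encoding="utf-8") as f:
            text = f.read()
        log.info("Loaded %s (%d characters)", path, len(text))
        return text
    except OSError as exc:
        log.error("Failed to load %s: %s", path, exc)
        return None
```

The same pattern can guard vectorization and inference calls, so a single malformed document or transient API error does not take down the whole session.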
The chatbot was evaluated using the Agenda 2063 Popular Version document. User queries were tested for semantic relevance and factual grounding.
git clone https://github.com/Richmondiroegbu/context-aware-rag-assistant.git
cd context-aware-rag-assistant/src
pip install -r requirements.txt
Create a .env file and add your API keys:
OPENAI_API_KEY=your_key
GROQ_API_KEY=your_key
GEMINI_API_KEY=your_key
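Since the chatbot supports several backends, the application can fall back to whichever key is configured. A minimal sketch of that selection logic (the function name and ordering are assumptions; the variable names match the .env entries above):

```python
# Illustrative provider selection: return the first backend whose API key is
# set in the environment. pick_provider is a hypothetical helper name.
import os

PROVIDERS = [
    ("openai", "OPENAI_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("gemini", "GEMINI_API_KEY"),
]

def pick_provider():
    """Return (provider_name, api_key) for the first configured backend, else None."""
    for provider, var in PROVIDERS:
        key = os.environ.get(var)
        if key:
            return provider, key
    return None
```

A library such as python-dotenv is typically used to load the .env file into the environment before this check runs.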
Run the chatbot:
python run.py
Upload your document (e.g., Agenda 2063.pdf) and start interacting through the command line or web interface (depending on configuration).
The Context-Aware RAG Assistant demonstrates how Retrieval-Augmented Generation can empower LLMs to produce accurate, document-grounded, and context-sensitive answers. Its modular design allows for easy adaptation across various domains, such as policy documents, technical manuals, and enterprise knowledge bases.