RAG-based AI assistant application

Abstract

This project implements a Retrieval-Augmented Generation (RAG) AI assistant for context-aware question answering. It uses LangChain, a vector database (ChromaDB), and large language models served through OpenAI, Groq, or Google Gemini. The assistant ingests custom document corpora, retrieves relevant passages via vector search, and generates conversational responses grounded in those passages, while remembering previous interactions in the session.

Introduction

Retrieval-Augmented Generation (RAG) extends an AI assistant by combining a neural language model with context retrieved from an external knowledge base. A “pure” LLM can only draw on its training data, so it may lack domain knowledge or give unsupported answers about material it has not seen. Our project addresses this limitation by grounding responses in uploaded documents (TXT, PDF, etc.), which makes answers easier to trace back to their sources and improves multi-turn conversations. The project uses a modular design so that it can be extended with richer memory, tool use, or reasoning workflows.

Methodology

System Architecture:

Documents are loaded, chunked, and embedded via Sentence Transformers.
ChromaDB stores document chunk embeddings for similarity search.
A prompt template combines retrieved context, conversation memory, and the user query.
LangChain chains manage the flow, interfacing with the user and the LLM.
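To make the prompt-assembly step of this architecture concrete, here is a minimal, dependency-free sketch of how retrieved context, conversation memory, and the user query can be combined into one prompt. The template wording and the function name `build_prompt` are hypothetical; the project's actual LangChain prompt template is not reproduced in this write-up.

```python
# Hypothetical template: the project's real prompt text is not given in this write-up.
PROMPT_TEMPLATE = """You are a helpful assistant. Answer using only the context below.
If the context does not contain the answer, say:
"I am sorry, but I do not have enough information to answer that."

Context:
{context}

Conversation so far:
{history}

Question: {question}
Answer:"""


def build_prompt(chunks: list[str], history: list[tuple[str, str]], question: str) -> str:
    """Combine retrieved chunks, prior turns, and the new query into one prompt string."""
    # Separate chunks with a divider so the model can tell passages apart.
    context = "\n---\n".join(chunks) if chunks else "(no relevant context found)"
    # Render each (role, text) pair on its own line, e.g. "User: ...".
    history_text = "\n".join(f"{role}: {text}" for role, text in history) or "(none)"
    return PROMPT_TEMPLATE.format(context=context, history=history_text, question=question)


if __name__ == "__main__":
    prompt = build_prompt(
        chunks=["CRISPR-Cas9 is a gene-editing technology."],
        history=[("User", "What is CRISPR gene editing?"),
                 ("Assistant", "CRISPR-Cas9 is a gene-editing technology...")],
        question="How does it work?",
    )
    print(prompt)
```

Placing the history next to the question is what lets the model resolve a follow-up such as "How does it work?" against the earlier turn.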

Document Workflow:

Ingest .txt and .pdf files from the data/ directory.
Chunk and embed documents with sentence-transformers/all-MiniLM-L6-v2.
Store embeddings and metadata in ChromaDB.
On each user query, search for relevant chunks and assemble them as context for the LLM.
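The workflow above (chunk, embed, store, search) can be sketched without external dependencies. This is a simplified stand-in, not the project's code: a bag-of-words term-count vector replaces the all-MiniLM-L6-v2 embedding, a plain in-memory list replaces the ChromaDB collection, and the chunk sizes, document text, and `data/crispr.txt` path are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence-transformer: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Minimal stand-in for a ChromaDB collection: keeps (embedding, text, metadata) records."""

    def __init__(self) -> None:
        self.records: list[tuple[Counter, str, dict]] = []

    def add(self, text: str, metadata: dict) -> None:
        self.records.append((embed(text), text, metadata))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, dict]]:
        """Return the k records most similar to the query, highest score first."""
        q = embed(query)
        scored = [(cosine(q, vec), text, meta) for vec, text, meta in self.records]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = InMemoryStore()
    # Hypothetical document text and source path, for illustration only.
    doc = ("CRISPR-Cas9 is a gene-editing technology. It works by guiding the Cas9 "
           "enzyme to a specific DNA sequence, where it cuts the strand.")
    for i, chunk in enumerate(chunk_text(doc, chunk_size=60, overlap=15)):
        store.add(chunk, {"source": "data/crispr.txt", "chunk": i})
    for score, text, meta in store.search("How does CRISPR cut DNA?", k=2):
        print(f"{score:.2f} {meta} {text!r}")
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk. In the real pipeline, all-MiniLM-L6-v2 produces 384-dimensional dense vectors, and LangChain text splitters such as RecursiveCharacterTextSplitter prefer paragraph and sentence boundaries over raw character offsets.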

Memory Integration:

The assistant uses LangChain’s ConversationBufferMemory to store the full sequence of user and assistant messages and inject it into each prompt, which improves the coherence and relevance of follow-up answers.
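The behavior of a conversation buffer can be illustrated with a small class that keeps every turn in order and renders it as text for the prompt. This `BufferMemory` class is a hypothetical, simplified stand-in written for illustration; it is not LangChain's ConversationBufferMemory API.

```python
from dataclasses import dataclass, field


@dataclass
class BufferMemory:
    """Simplified stand-in for a conversation buffer: keeps every turn, unbounded."""
    turns: list[tuple[str, str]] = field(default_factory=list)

    def save_turn(self, user_msg: str, assistant_msg: str) -> None:
        # Append both sides of the exchange in the order they happened.
        self.turns.append(("User", user_msg))
        self.turns.append(("Assistant", assistant_msg))

    def as_text(self) -> str:
        # Render the full history, oldest first, for insertion into the prompt.
        return "\n".join(f"{role}: {text}" for role, text in self.turns)

    def clear(self) -> None:
        # Forget the whole conversation, e.g. when a new session starts.
        self.turns.clear()


if __name__ == "__main__":
    memory = BufferMemory()
    memory.save_turn("What is CRISPR gene editing?",
                     "CRISPR-Cas9 is a revolutionary gene-editing technology...")
    print(memory.as_text())
```

Because a full buffer grows with every turn, the prompt grows too; for long sessions a windowed or summarizing memory keeps the prompt size bounded.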

Experiments

We tested the assistant on a sample corpus of Ready Tensor publications and Wikipedia articles. Each test session involved:

Asking an initial factual question covered by the corpus.
Following up with a related, more specific or context-dependent question.
Testing edge cases such as queries outside the knowledge base.

Logging and print statements were enabled to inspect the memory contents and the retrieved chunks at each turn.
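Per-turn inspection of this kind can be done with Python's standard logging module. The sketch below is an assumption about how such logging might look, not the project's code: the function names `describe_turn` and `log_turn`, the logger name, and the (score, text) shape of retrieved hits are all hypothetical, and the score in the usage example is a made-up input rather than a measured result.

```python
import logging

logger = logging.getLogger("rag_assistant")  # hypothetical logger name


def describe_turn(query: str, retrieved: list[tuple[float, str]],
                  history: list[tuple[str, str]]) -> list[str]:
    """Build human-readable lines describing one turn: query, hits, and memory size."""
    lines = [f"query={query!r}"]
    for rank, (score, text) in enumerate(retrieved, start=1):
        # Truncate chunk text so log lines stay readable.
        lines.append(f"  hit {rank} score={score:.3f} text={text[:60]!r}")
    if not retrieved:
        lines.append("  no chunks retrieved; expect a fallback answer")
    lines.append(f"  memory holds {len(history)} messages")
    return lines


def log_turn(query: str, retrieved: list[tuple[float, str]],
             history: list[tuple[str, str]]) -> None:
    """Emit the turn description through the logger at INFO level."""
    for line in describe_turn(query, retrieved, history):
        logger.info("%s", line)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    log_turn(
        "How does it work?",
        [(0.82, "CRISPR works by targeting specific DNA sequences...")],  # illustrative score
        [("User", "What is CRISPR gene editing?"), ("Assistant", "CRISPR-Cas9 is ...")],
    )
```

Routing the output through logging rather than bare print calls lets the same diagnostics be silenced or redirected by changing the log level.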

Results

In our tests, the assistant gave clear, well-structured answers grounded in the ingested documents.
Memory integration improved the handling of follow-up questions, for example resolving the pronoun "it" to the topic of the previous question.
Example interaction:

User: What is CRISPR gene editing?
Assistant: CRISPR-Cas9 is a revolutionary gene-editing technology...

User: How does it work?
Assistant: CRISPR works by targeting specific DNA sequences...

Out-of-scope questions receive a fallback response:
"I am sorry, but I do not have enough information to answer that."
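The write-up does not say how the fallback is triggered; it may come from an instruction in the prompt. One common alternative, sketched here as an assumption, is to return the fallback whenever no retrieved chunk clears a similarity threshold. The `min_score` value of 0.3 is arbitrary, and `generate` is a hypothetical callable standing in for the LLM call.

```python
from collections.abc import Callable

FALLBACK = "I am sorry, but I do not have enough information to answer that."


def answer_or_fallback(hits: list[tuple[float, str]],
                       generate: Callable[[list[str]], str],
                       min_score: float = 0.3) -> str:
    """Return FALLBACK unless at least one retrieved chunk reaches `min_score`."""
    if not hits or max(score for score, _ in hits) < min_score:
        return FALLBACK
    # Pass only sufficiently similar chunks on to the model.
    context = [text for score, text in hits if score >= min_score]
    return generate(context)


if __name__ == "__main__":
    def fake_llm(ctx: list[str]) -> str:
        return f"Answer based on {len(ctx)} chunk(s)."

    print(answer_or_fallback([(0.12, "Paris is the capital of France.")], fake_llm))
```

A threshold check avoids spending an LLM call on clearly unrelated queries, but the right cutoff depends on the embedding model's score distribution and would need tuning; it can also be combined with a prompt-level instruction.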

Conclusion

This RAG-based AI assistant combines vector retrieval and LLM reasoning with the added context of conversational memory. It lays the foundation for more advanced agentic AI systems. Future enhancements may include real tool integration, intermediate reasoning steps, persistent long-term chat memory, and support for richer document types.
