A Retrieval-Augmented Generation system built with ChromaDB and Sentence Transformers, with multi-provider LLM support
Large Language Models (LLMs) are powerful but limited by their training data cutoff. When developers need answers about specific frameworks like LangChain, general-purpose LLMs may produce outdated or hallucinated responses. Retrieval-Augmented Generation (RAG) addresses this by grounding LLM responses in actual source documentation, ensuring answers are accurate and verifiable.
This project builds a RAG-based AI assistant that answers developer questions about LangChain by searching through its official documentation and generating contextual responses. The system demonstrates core RAG concepts: document ingestion, text chunking, vector embedding, similarity search, and context-augmented generation.
Developers working with LangChain face a common challenge: the framework evolves rapidly, and documentation is spread across many pages. Traditional keyword search often fails to surface the most relevant information for nuanced technical questions. An LLM alone may hallucinate API details or reference deprecated patterns.
The gap this project addresses is providing a system that searches the actual LangChain documentation, grounds the LLM's answers in the retrieved text, and attributes each response to its source files.
The system follows a standard RAG architecture:
User Question --> Vector DB Search --> Context Formatting --> LLM Response
Step 1 - Document Ingestion: Plain text files from the data/ directory are loaded using LangChain's TextLoader. Each file represents a page of LangChain documentation.
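A minimal sketch of this step (the langchain_community import path applies to recent LangChain releases; older versions expose TextLoader from langchain.document_loaders):

```python
from pathlib import Path

from langchain_community.document_loaders import TextLoader

# Load every .txt file in data/ as one LangChain Document per file.
documents = []
for path in sorted(Path("data").glob("*.txt")):
    loader = TextLoader(str(path), encoding="utf-8")
    documents.extend(loader.load())  # each Document records its source path in metadata
```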
Step 2 - Text Chunking: Documents are split into smaller chunks using RecursiveCharacterTextSplitter with a chunk size of 500 characters and 200-character overlap. The overlap preserves context across chunk boundaries. The splitter uses a hierarchy of separators (`\n\n`, `\n`, `.`, `,`, `""`) to break text at natural boundaries.
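A sketch of this configuration (the langchain_text_splitters package name applies to recent LangChain releases; the file path is simply one of the project's data files):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,        # maximum characters per chunk
    chunk_overlap=200,     # characters shared with the preceding chunk
    separators=["\n\n", "\n", ".", ",", ""],  # tried coarsest-first
)

raw_text = open("data/tools.txt", encoding="utf-8").read()
chunks = splitter.split_text(raw_text)  # list of ~500-character strings
```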
Step 3 - Embedding: Each chunk is encoded into a 384-dimensional vector using the all-MiniLM-L6-v2 Sentence Transformer model. This lightweight model runs locally without GPU requirements.
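A minimal sketch of this step using the sentence-transformers library:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # small model, runs on CPU
vectors = model.encode(["How do I create a tool in LangChain?"])
print(vectors.shape)                              # (1, 384): one 384-dimensional vector per input
```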
Step 4 - Vector Storage: Embeddings are stored in ChromaDB with persistent local storage. Each chunk retains its source metadata (filename, chunk index) for attribution.
Step 5 - Retrieval: When a user asks a question, it is embedded using the same model and the top 5 most similar chunks are retrieved using cosine distance.
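The sketch below covers both the storage and retrieval steps with ChromaDB. The collection name, chunk text, ids, and metadata fields are illustrative assumptions, not the project's exact values:

```python
import chromadb
from sentence_transformers import SentenceTransformer

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(
    name="langchain_docs",                 # hypothetical collection name
    metadata={"hnsw:space": "cosine"},     # use cosine distance for similarity
)
model = SentenceTransformer("all-MiniLM-L6-v2")

# Step 4: store chunks together with per-chunk metadata for source attribution.
chunks = ["Use the @tool decorator to turn a function into a tool."]
collection.add(
    ids=["tools.txt-0"],
    documents=chunks,
    embeddings=model.encode(chunks).tolist(),
    metadatas=[{"source": "tools.txt", "chunk_index": 0}],
)

# Step 5: embed the question with the same model and fetch the 5 nearest chunks.
question = "How do I create a tool in LangChain?"
results = collection.query(
    query_embeddings=model.encode([question]).tolist(),
    n_results=5,
    include=["documents", "metadatas", "distances"],
)
```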
Step 6 - Generation: Retrieved chunks are formatted into a structured prompt along with conversation history, then sent to the LLM. The prompt template enforces chain-of-thought reasoning with separate "Reasoning" and "Answer" sections.
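As a sketch of the context-formatting step (the dictionary fields below are illustrative, not the project's actual return format), each retrieved chunk can be tagged with its source file before being injected into the prompt:

```python
# Hypothetical retrieval results: chunk text plus the source metadata stored earlier.
retrieved = [
    {"text": "Use the @tool decorator to turn a function into a tool.", "source": "tools.txt"},
    {"text": "Type hints on the function define the tool's input schema.", "source": "tools.txt"},
]

# Tag each chunk with its source file so answers can be attributed.
context = "\n\n".join(f"[{r['source']}] {r['text']}" for r in retrieved)
```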
The project consists of two core modules:
The VectorDB class wraps ChromaDB with embedding generation:
```python
import chromadb
from sentence_transformers import SentenceTransformer

class VectorDB:
    def __init__(self, collection_name, embedding_model):
        self.client = chromadb.PersistentClient(path="./chroma_db")   # persistent local store
        self.embedding_model = SentenceTransformer(embedding_model)   # local embedding model
        self.collection = self.client.get_or_create_collection(name=collection_name)
```
Key methods:
- `chunk_text(text, chunk_size)` — Splits text using LangChain's RecursiveCharacterTextSplitter
- `add_documents(documents)` — Chunks, embeds, and stores documents with metadata
- `search(query, n_results)` — Encodes the query and returns the top-N similar chunks with distances

The RAGAssistant class orchestrates the full pipeline:
```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

class RAGAssistant:
    def __init__(self):
        self.llm = self._initialize_llm()  # Auto-detects provider
        self.vector_db = VectorDB()
        self.prompt_template = ChatPromptTemplate.from_template(...)
        # LCEL pipeline: prompt -> LLM -> plain-string output
        self.chain = self.prompt_template | self.llm | StrOutputParser()
```
The prompt template includes structured fields for role, context, instructions, reasoning process, output constraints, style, and goal. This structured approach ensures consistent, well-formatted responses.
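The exact wording lives in the project source; the sketch below only shows the general shape of such a template, with {context}, {history}, and {question} as illustrative placeholder names:

```python
from langchain_core.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(
    """Role: You are an assistant that answers questions about LangChain.

Context (retrieved documentation chunks):
{context}

Conversation history:
{history}

Instructions: Answer using only the context above. If the context does not
contain the answer, say so instead of guessing.

Output format:
Reasoning: <how the retrieved context supports the answer>
Answer: <the answer itself>

Question: {question}"""
)
```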
LLM provider detection follows a priority chain: OpenAI > Groq > Google Gemini, based on which API key is present in the environment.
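A sketch of such a priority chain, assuming the standard environment variable names for each integration (the model identifiers are illustrative, not the project's configured defaults):

```python
import os

def initialize_llm():
    """Return a chat model based on whichever API key is present (OpenAI > Groq > Gemini)."""
    if os.getenv("OPENAI_API_KEY"):
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model="gpt-4o-mini")
    if os.getenv("GROQ_API_KEY"):
        from langchain_groq import ChatGroq
        return ChatGroq(model="llama-3.1-8b-instant")
    if os.getenv("GOOGLE_API_KEY"):
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model="gemini-1.5-flash")
    raise RuntimeError("No LLM API key found in the environment")
```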
The knowledge base consists of 8 text files sourced from the official LangChain documentation:
| File | Topic |
|---|---|
| overview.txt | Core benefits, getting started |
| models.txt | Chat models, providers, streaming, structured output |
| agents.txt | Agent architecture, tools, prompts, memory |
| tools.txt | Creating tools, decorators, ToolNode |
| messages.txt | Message types, roles, conversation history |
| short-term-memory.txt | Short-term memory and state management |
| streaming-overview.txt | Streaming patterns |
| streaming-frontend.txt | Frontend streaming integration |
These documents cover the core concepts a developer encounters when building with LangChain: from basic model invocation through agents, tools, memory, and streaming.
Processing: Each document is chunked into ~500-character segments with 200-character overlap, producing approximately 80-100 total chunks across all 8 files. Chunks are embedded into 384-dimensional vectors and stored in ChromaDB.
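The 80-100 figure is consistent with the overlap arithmetic: each new chunk advances roughly chunk_size minus overlap characters through the text. A rough estimate, where the corpus size is an assumed figure rather than one measured from the repository:

```python
chunk_size, chunk_overlap = 500, 200
stride = chunk_size - chunk_overlap      # ~300 new characters per chunk

# Assuming the 8 files total roughly 25,000-30,000 characters:
for total_chars in (25_000, 30_000):
    print(total_chars // stride)         # prints 83 and 100
```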
| Component | Technology | Purpose |
|---|---|---|
| Framework | LangChain | Prompt templates, output parsing, text splitting |
| Vector Database | ChromaDB | Persistent local vector storage and similarity search |
| Embeddings | Sentence Transformers (all-MiniLM-L6-v2) | Local embedding generation (384 dimensions) |
| LLM Providers | OpenAI, Groq, Google Gemini | Response generation |
| Package Manager | uv | Dependency management and virtual environments |
| Testing | pytest | Unit testing for vector database operations |
The assistant successfully answers developer questions about LangChain by retrieving relevant documentation chunks and generating grounded responses.
Example interaction:
```
Q: How do I create a tool in LangChain?

Reasoning:
Based on the retrieved context from tools.txt, LangChain provides
a @tool decorator for creating tools that agents can use.

Answer:
The simplest way to create a tool in LangChain is with the @tool
decorator. The function's docstring becomes the tool description
that helps the model understand when to use it. Type hints are
required as they define the tool's input schema.
```
The system correctly retrieves chunks from the most relevant source file (e.g., tools.txt for tool-related questions), and the lightweight all-MiniLM-L6-v2 model provides adequate semantic similarity for documentation search without requiring API calls or a GPU.

This project demonstrates a practical RAG system that grounds LLM responses in actual documentation. The architecture is intentionally modular: the vector database, embedding model, and LLM provider can each be swapped independently. While the current implementation is scoped as a CLI tool for LangChain documentation, the same pipeline can be adapted to any text corpus by replacing the files in the data/ directory.
The source code is available on GitHub under the CC BY-NC-SA 4.0 license.