This paper presents AskTheDocs, a Retrieval‑Augmented Generation (RAG) chatbot built with LangGraph and Weaviate. AskTheDocs enables developers to query local technical documentation (PDF/TXT) via natural language, achieving grounded, source‑attributed answers. We demonstrate: (1) an end‑to‑end RAG pipeline, (2) flexible LLM integration (OpenAI GPT‑4 or Ollama Llama 3.1), and (3) a modular architecture supporting scalable document ingestion and retrieval.
AskTheDocs addresses the challenge of navigating large, heterogeneous documentation sets. Instead of searching manually by keyword, users drop their documentation files into a folder and chat with the agent to retrieve precise, context‑aware answers that are grounded in the retrieved passages, which reduces the risk of hallucination.
By the end, readers will understand how the pipeline is built, how to set up and use it, and how to extend its capabilities to new documents or models.
This publication provides a comprehensive overview of AskTheDocs’ architecture, implementation, and operational considerations. Key objectives:
AskTheDocs is designed for technical professionals who regularly interact with extensive and complex documentation. It provides a local-first, RAG-based chatbot interface that leverages the user’s own documentation corpus to generate contextually accurate, grounded responses.
The tool is particularly well-suited for:
For example, developers can include both Flask and Django documentation and prompt the chatbot with:
"What’s the equivalent of @app.route in Django?"
Prerequisites: Python 3.11+, familiarity with LangChain, and a Weaviate instance.
Clone the repo:
git clone https://github.com/rishi255/askthedocs
cd askthedocs
Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
Install dependencies:
pip install -r requirements.txt
Configure environment variables:
cp .env.example .env
# Then edit .env to set your API keys (OpenAI etc.)
Add your documentation:
Place your .pdf or .txt files inside the data/ directory. These will be automatically parsed and indexed for Q&A.
Run the chatbot:
python rag_chatbot.py
Simply ask your questions based on the loaded documentation. Type 'quit' to exit.
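To make the "Add your documentation" step concrete, the sketch below shows one way the files in data/ could be discovered before parsing. It is only an illustration: the function name find_documents is hypothetical, and scanning subfolders recursively is an assumption, since the repository may only look at the top level of data/.

```python
from pathlib import Path

# File types the setup steps mention; the helper itself is hypothetical.
SUPPORTED_SUFFIXES = {".pdf", ".txt"}


def find_documents(data_dir: str = "data") -> list[Path]:
    """Return supported documentation files under data_dir, sorted by path."""
    root = Path(data_dir)
    if not root.is_dir():
        # Nothing to index if the folder does not exist yet.
        return []
    # Assumption: subfolders are scanned too (rglob walks the tree recursively).
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )


if __name__ == "__main__":
    for path in find_documents():
        print(path)
```

Running this from the project root before starting the chatbot gives a quick view of which files would be candidates for indexing; files with other extensions are skipped.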
DocumentLoader class:
- PyPDFLoader/TextLoader to read files
- RecursiveCharacterTextSplitter to chunk documents (400‑char chunks, 50‑char overlap)
Vector Store & Retriever
- WeaviateVectorStore holds the document chunks
LLM Orchestration
Uses a three‑step LangGraph StateGraph:
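The chunking settings listed for DocumentLoader above (400‑character chunks with a 50‑character overlap) can be illustrated with a plain sliding window. This is a simplified stand-in, not the project's code: RecursiveCharacterTextSplitter prefers to break on separators such as paragraph and line boundaries, whereas the hypothetical chunk_text function below cuts at fixed character offsets purely to show how size and overlap interact.

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    # Each window starts (chunk_size - overlap) characters after the previous one,
    # so with the defaults a new chunk begins every 350 characters.
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(chr(97 + i % 26) for i in range(1000))
    pieces = chunk_text(sample)
    print([len(piece) for piece in pieces])
    # The tail of one chunk is repeated at the head of the next.
    print(pieces[0][-50:] == pieces[1][:50])
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice in the vector store.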
Configuration (.env file)
- MODEL_PROVIDER decides whether inference will be done locally or via cloud API (OpenAI for now). Options: ollama or openai
- WEAVIATE_API_KEY and WEAVIATE_CLUSTER_URL for Weaviate configuration
- OPENAI_API_KEY (only used if MODEL_PROVIDER = openai)
- OPENAI_MODEL_NAME (only used if MODEL_PROVIDER = openai)
- OLLAMA_MODEL_NAME (only used if MODEL_PROVIDER = ollama)
License: Apache 2.0
Repository: https://github.com/rishi255/askthedocs
Contact: rishikeshrachchh@gmail.com (issues via GitHub Issues)
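Because several of the .env settings described under Configuration apply only to one provider, a small pre-flight check can report which values are missing for the selected MODEL_PROVIDER. The sketch below is hypothetical and not taken from the repository. It assumes that every listed variable is mandatory for its provider (the project may well supply defaults) and that the .env values have already been loaded into the process environment.

```python
import os
from collections.abc import Mapping

# Variable names come from the Configuration list; treating them all as
# required is an assumption made for this sketch.
ALWAYS_REQUIRED = ("WEAVIATE_API_KEY", "WEAVIATE_CLUSTER_URL")
PROVIDER_REQUIRED = {
    "openai": ("OPENAI_API_KEY", "OPENAI_MODEL_NAME"),
    "ollama": ("OLLAMA_MODEL_NAME",),
}


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the names of settings that are unset or empty for the chosen provider."""
    provider = env.get("MODEL_PROVIDER", "").strip().lower()
    if provider not in PROVIDER_REQUIRED:
        # Without a valid provider we cannot tell which other keys matter.
        return ["MODEL_PROVIDER"]
    required = ALWAYS_REQUIRED + PROVIDER_REQUIRED[provider]
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    problems = missing_settings(os.environ)
    print("Missing settings:", ", ".join(problems) if problems else "none")
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps the check easy to exercise with a plain dictionary.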
Resource Needs:
AskTheDocs offers a practical blueprint for local‑first RAG chatbots over technical documentation. Its modular design, clear methodology, and operational guidance meet key evaluation criteria in software/tool development. We welcome community adoption, feedback, and contributions.