AskTheDocs: a Local‑First RAG Chatbot for Technical Documentation

Abstract

This paper presents AskTheDocs, a Retrieval‑Augmented Generation (RAG) chatbot built with LangGraph and Weaviate. AskTheDocs lets developers query local technical documentation (PDF/TXT) in natural language and returns grounded, source‑attributed answers. We demonstrate: (1) an end‑to‑end RAG pipeline, (2) flexible LLM integration (OpenAI GPT‑4 or Ollama Llama 3.1), and (3) a modular architecture supporting scalable document ingestion and retrieval.

[Figure: Agentic AI flowchart]

1. Introduction & Purpose

AskTheDocs addresses the challenge of navigating large, heterogeneous documentation sets. Instead of searching by keyword, users drop their documentation files into a folder and chat with the agent to retrieve precise, context‑aware answers. Because answers are restricted to the indexed documents, the risk of hallucination is reduced.

By the end, readers will understand how the pipeline is built, how to set up and use it, and how to extend its capabilities to new documents or models.

2. Objectives of the Publication

This publication provides a comprehensive overview of AskTheDocs’ architecture, implementation, and operational considerations. Key objectives:

Setup & Usage: Instructions and considerations for installation, running, and configuration.
Document Ingestion: How PDF/TXT files are split, embedded, and stored in Weaviate.
Retrieval & Summarization: Retrieval of top‑k chunks and summarization prompts.
Response Generation: QA prompt design for grounded answers.
Operational Insights: Resource considerations and bottlenecks when running AskTheDocs.

3. Intended Audience & Use Case

AskTheDocs is designed for technical professionals who regularly interact with extensive and complex documentation. It provides a local-first, RAG-based chatbot interface that leverages the user’s own documentation corpus to generate contextually accurate, grounded responses.

The tool is particularly well-suited for:

🔍 Comparing features across frameworks
🧑‍💻 Debugging implementation issues using information from actual docs
🧱 Extracting precise code examples and usage patterns from trusted sources
🤖 Mitigating hallucinations by restricting responses to the local document set

For example, developers can include both Flask and Django documentation and prompt the chatbot with:
"What’s the equivalent of @app.route in Django?"

Prerequisites: Python 3.11+, familiarity with LangChain, and a Weaviate instance.

4. Setup & Usage

Clone the repo:

git clone https://github.com/rishi255/askthedocs  
cd askthedocs

Create and activate a virtual environment:

python -m venv venv  
source venv/bin/activate  # Windows: venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt  
```

Configure environment variables:

cp .env.example .env  
# Then edit .env to set your API keys (OpenAI etc.)

Add your documentation:

Place your .pdf or .txt files inside the data/ directory. These will be automatically parsed and indexed for Q&A.

Run the chatbot:
```
python rag_chatbot.py  
```

Once the chatbot starts, ask questions about the loaded documentation. Type 'quit' to exit.

[Screenshot: the chatbot in action]

5. Architecture & Methodology

5.1 High‑Level Design

DocumentLoader class
- Selects PyPDFLoader or TextLoader depending on the file type
- Uses LangChain's RecursiveCharacterTextSplitter to chunk documents (400‑char chunks, 50‑char overlap)
- Embeds the chunks and writes them to Weaviate in batches
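
The loader selection, chunking, and batching steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: `pick_loader`, `sliding_chunks`, and `in_batches` are hypothetical names, the extension-to-loader mapping is assumed, and the splitter is a simple sliding window rather than the separator-aware logic of RecursiveCharacterTextSplitter.

```python
from pathlib import Path
from typing import Iterator

CHUNK_SIZE = 400    # characters per chunk, as stated in the text
CHUNK_OVERLAP = 50  # characters shared by neighbouring chunks

# Loader class names come from the text; this mapping is an assumption.
LOADER_BY_SUFFIX = {".pdf": "PyPDFLoader", ".txt": "TextLoader"}


def pick_loader(path: str) -> str:
    """Return the loader name for a file, chosen by its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in LOADER_BY_SUFFIX:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return LOADER_BY_SUFFIX[suffix]


def sliding_chunks(text: str, size: int = CHUNK_SIZE,
                   overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Split text into windows of at most `size` characters.

    Consecutive windows share `overlap` characters. This is a plain
    sliding window, a stand-in for the recursive splitter.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def in_batches(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive groups of up to `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

Each batch from `in_batches(chunks, batch_size)` would then be embedded and written to the vector store in one request; the batch size is left as a parameter because the text does not state the value used.
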
Vector Store & Retriever
- LangChain's WeaviateVectorStore holds the document chunks
- Semantic‑search retriever fetches top‑4 relevant chunks
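
Top‑k semantic retrieval can be illustrated with a brute‑force search over toy vectors. This is a sketch only: a vector database such as Weaviate answers the query with its own index rather than a full scan, and the similarity metric used by the actual collection is not stated here, so cosine similarity is an assumption. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          indexed: list[tuple[str, list[float]]],
          k: int = 4) -> list[str]:
    """Return the texts of the k chunks most similar to the query.

    `indexed` holds (chunk_text, embedding) pairs; k=4 mirrors the
    top-4 setting described above.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

In the real pipeline the query and chunk vectors would come from the embedding model, and only the ranking idea carries over.
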
LLM Orchestration
- Uses a three‑step LangGraph StateGraph:
  1. retrieve – fetch chunks
  2. summarize – condense context
  3. generate – produce final answer via QA prompt
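
The three‑step flow can be mimicked without any framework to show how state moves between nodes. This is a plain‑Python stand‑in, not LangGraph's API: each node reads the shared state and returns a partial update that is merged back in, which is the same pattern a StateGraph node follows. The prompt wording, `ChatState` fields, and the `retriever` and `llm` callables are all assumptions made for illustration.

```python
from typing import Callable, TypedDict


class ChatState(TypedDict, total=False):
    question: str
    chunks: list[str]
    summary: str
    answer: str


Node = Callable[[ChatState], ChatState]


def make_pipeline(retriever: Callable[[str], list[str]],
                  llm: Callable[[str], str]) -> list[tuple[str, Node]]:
    """Build the retrieve -> summarize -> generate sequence."""

    def retrieve(state: ChatState) -> ChatState:
        return {"chunks": retriever(state["question"])}

    def summarize(state: ChatState) -> ChatState:
        # Hypothetical prompt: condense the retrieved chunks.
        context = "\n---\n".join(state["chunks"])
        return {"summary": llm(f"Summarize this context:\n{context}")}

    def generate(state: ChatState) -> ChatState:
        # Hypothetical QA prompt: answer only from the summary.
        prompt = (
            "Answer using only the context below. "
            "If the context is insufficient, say so.\n"
            f"Context: {state['summary']}\n"
            f"Question: {state['question']}"
        )
        return {"answer": llm(prompt)}

    return [("retrieve", retrieve), ("summarize", summarize), ("generate", generate)]


def run(pipeline: list[tuple[str, Node]], question: str) -> ChatState:
    """Run the nodes in order, merging each partial update into the state."""
    state: ChatState = {"question": question}
    for _name, node in pipeline:
        state.update(node(state))
    return state
```

Passing the model and retriever in as callables keeps the sketch independent of the provider, which mirrors the OpenAI/Ollama switch described in the configuration.
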
Configuration (.env file)
- MODEL_PROVIDER selects where inference runs: ollama (local) or openai (cloud API; OpenAI is the only cloud provider for now)
- WEAVIATE_API_KEY and WEAVIATE_CLUSTER_URL for Weaviate configuration
- OPENAI_API_KEY (only used if MODEL_PROVIDER = openai)
- OPENAI_MODEL_NAME (only used if MODEL_PROVIDER = openai)
- OLLAMA_MODEL_NAME (only used if MODEL_PROVIDER = ollama)
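
A small validation step makes the provider‑dependent variables explicit. This is a sketch under assumptions: it expects the values from .env to already be present in the process environment, treats every listed variable as required for its provider (the project may define defaults instead), and `load_settings` is a hypothetical helper name.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect the variables needed for the chosen MODEL_PROVIDER."""
    provider = env.get("MODEL_PROVIDER", "").strip().lower()
    if provider not in ("ollama", "openai"):
        raise ValueError("MODEL_PROVIDER must be 'ollama' or 'openai'")

    # Weaviate settings are needed regardless of the provider.
    required = ["WEAVIATE_API_KEY", "WEAVIATE_CLUSTER_URL"]
    if provider == "openai":
        required += ["OPENAI_API_KEY", "OPENAI_MODEL_NAME"]
    else:
        required += ["OLLAMA_MODEL_NAME"]

    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))

    settings = {name: env[name] for name in required}
    settings["MODEL_PROVIDER"] = provider
    return settings
```

Failing at startup with the names of the missing variables is usually easier to diagnose than an authentication error raised later during indexing or inference.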

5.2 Rationale & Alternatives

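Weaviate was chosen for its built‑in batching and cloud scaling; Pinecone or FAISS could be swapped in.
The all‑MiniLM‑L6‑v2 embedding model balances speed and accuracy.
Chunking into 400‑character pieces with a 50‑character overlap keeps each retrieved passage semantically coherent and limits LLM context usage: four retrieved chunks amount to roughly 1,600 characters at most before summarization.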

6. Licensing, Support & Operational Considerations

License: Apache 2.0
Repository: https://github.com/rishi255/askthedocs
Contact: rishikeshrachchh@gmail.com (issues via GitHub Issues)
Resource Needs:
- Minimal if using an API (e.g., OpenAI) for inference.
- Bottlenecks: local inference with Ollama on less powerful machines or with large models, and the one‑time initial document indexing.

7. Limitations & Trade‑offs

Cold Start: One‑time indexing latency for large doc sets.
Chunk Granularity: Fixed splits may bisect code examples; future semantic splitting could help.
API Costs: OpenAI usage incurs usage‑based costs; smaller self‑hosted Ollama models may underperform larger hosted models.
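
To illustrate the chunk‑granularity point, one possible direction is to pack whole paragraphs into chunks instead of cutting at a fixed character offset. This is a hypothetical sketch, not a planned feature of the project: `pack_paragraphs` is an invented name, paragraphs are assumed to be separated by blank lines, and a code example that itself contains blank lines would still be split.

```python
def pack_paragraphs(text: str, max_chars: int = 400) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of up to max_chars.

    A paragraph is never cut; one that is longer than max_chars becomes
    an oversized chunk on its own.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            current = candidate  # still fits, or is the first paragraph of a chunk
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

The trade‑off is uneven chunk sizes and no overlap between neighbours, so retrieval quality would need to be re‑checked before adopting an approach like this.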

8. Future Scope

Memory Module: Support multi‑turn context.
Extended Formats: Add DOCX, HTML, and Markdown loaders.
UI Layer: Web interface with streaming responses.
Advanced Retrieval: Metadata filtering, citation highlighting.
Monitoring: Logging, metrics, and alerting for drift detection.

9. Conclusion

AskTheDocs offers a practical blueprint for local‑first RAG chatbots over technical documentation. Its modular design, clear methodology, and operational guidance meet key evaluation criteria in software/tool development. We welcome community adoption, feedback, and contributions.

10. References

LangChain & LangGraph documentation
Weaviate documentation
OpenAI GPT‑4 and Ollama (Llama 3.1) API documentation