Abstract
RAG-Bot is an open-source project that implements a Retrieval-Augmented Generation (RAG) system with an interactive terminal interface. The project addresses the common challenge of LLM hallucinations by grounding responses in a knowledge base of documents. By combining vector databases with modern language models, RAG-Bot produces accurate, contextual responses to natural language queries across multiple LLM providers, including OpenAI, Ollama, Google, and Groq. This blog post details the implementation and architecture of the RAG-Bot system, demonstrating its effectiveness as a flexible and powerful tool for document-based question answering.
Methodology
System Architecture
RAG-Bot follows a modular architecture with several key components:
- Document Processing Pipeline: Converts various document formats (starting with JSON) into markdown files that can be chunked and embedded.
- Vector Database Integration: Uses ChromaDB as the vector store for document embeddings, enabling semantic search capabilities.
- Embedding Generation: Uses Hugging Face sentence-transformer embeddings (specifically all-MiniLM-L6-v2) to create high-quality document embeddings.
- LLM Provider Interface: Provides a unified interface to multiple LLM providers (OpenAI, Ollama, Google, Groq) with configurable parameters.
- RAG Query Engine: Combines document retrieval with LLM generation to produce contextually relevant answers.
- Interactive Terminal: Offers a user-friendly command-line interface for system interaction.
Implementation Details
The implementation uses Python with several key libraries:
- ChromaDB: For vector storage and retrieval
- LangChain: For RAG pipeline components
- Sentence Transformers (Hugging Face): For document embedding generation
- Various LLM APIs: For text generation capabilities
- ReadyTensor: For reference code
The document ingestion process follows these steps (see the sketch after this list):
- Convert JSON documents to markdown format (this step is optional, since pre-converted markdown files are included in the data directory)
- Split documents into chunks of appropriate size
- Generate embeddings for each chunk
- Store documents and embeddings in ChromaDB
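The sketch below illustrates these steps using LangChain's Hugging Face and Chroma integrations. The package names reflect current LangChain releases, and the paths, glob pattern, and chunk sizes are illustrative assumptions rather than RAG-Bot's exact configuration:

```python
# A minimal ingestion sketch; paths, chunk sizes, and settings are
# assumptions, not RAG-Bot's exact code.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

# Load the markdown files produced by the (optional) JSON conversion step.
loader = DirectoryLoader("data/", glob="**/*.md", loader_cls=TextLoader)
documents = loader.load()

# Split documents into overlapping chunks sized for the embedding model.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

# Generate an embedding per chunk with all-MiniLM-L6-v2 and persist to ChromaDB.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vector_store = Chroma.from_documents(
    chunks, embedding=embeddings, persist_directory="chroma_db"
)
```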
The query process (sketched after this list) involves:
- Embedding the user query
- Retrieving relevant document chunks from ChromaDB
- Constructing a prompt with retrieved context
- Sending the prompt to the selected LLM
- Returning the generated response to the user
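Under the same assumptions as the ingestion sketch, the query path might look like the following; the retriever settings, prompt wording, and model name are illustrative, not RAG-Bot's actual choices:

```python
# A minimal query sketch reusing vector_store from the ingestion example;
# prompt wording and model choice are assumptions.
from langchain_openai import ChatOpenAI

retriever = vector_store.as_retriever(search_kwargs={"k": 4})
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def answer(query: str) -> str:
    # Embed the query and retrieve the most similar chunks from ChromaDB.
    docs = retriever.invoke(query)
    context = "\n\n".join(doc.page_content for doc in docs)

    # Construct a prompt that grounds the model in the retrieved context.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

    # Send the prompt to the selected LLM and return the generated response.
    return llm.invoke(prompt).content

print(answer("What does the ingestion pipeline do?"))
```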
Challenges and Solutions
Several technical challenges were addressed during development:
- ChromaDB File Access Issues: Implemented a robust database manager with proper shutdown procedures and process management to prevent file access conflicts.
- LLM Provider Abstraction: Designed a flexible provider interface that allows easy switching between different LLM backends without changing the core application logic (see the sketch below).
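One way such an interface could be structured is shown below; the class names, defaults, and method signatures are hypothetical, not RAG-Bot's actual module layout:

```python
# A hypothetical sketch of the provider abstraction; class names and
# defaults are assumptions, not RAG-Bot's actual API.
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Uniform interface so the query engine never touches a specific backend."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...

class OpenAIProvider(LLMProvider):
    def __init__(self, model: str = "gpt-4o-mini"):
        from langchain_openai import ChatOpenAI
        self._llm = ChatOpenAI(model=model)

    def generate(self, prompt: str) -> str:
        return self._llm.invoke(prompt).content

class OllamaProvider(LLMProvider):
    def __init__(self, model: str = "llama3"):
        from langchain_ollama import ChatOllama
        self._llm = ChatOllama(model=model)

    def generate(self, prompt: str) -> str:
        return self._llm.invoke(prompt).content

# Swapping backends is then a one-line change for the caller:
provider: LLMProvider = OllamaProvider()
response = provider.generate("Summarize the knowledge base.")
```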

Results
User Experience
The interactive terminal interface provides several advantages:
- Ease of Use: Simple commands for configuration and querying
- Flexibility: On-the-fly switching between different LLM providers
- Transparency: Clear logging of each step in the RAG process
Future Work
Several enhancements are planned for future versions:
- Web Interface: Developing a browser-based UI for broader accessibility
- Additional Document Formats: Supporting PDF, HTML, and other formats directly
- Improved Context Handling: Implementing hierarchical retrieval for better handling of complex queries
- Memory Features: Adding conversation history to maintain context across multiple queries
- Evaluation Framework: Building automated testing to measure RAG performance across different domains
Conclusion
RAG-Bot demonstrates the power of combining vector databases with language models to create an effective document querying system. The project provides a solid foundation for building domain-specific assistants that can accurately answer questions based on a corpus of documents. By supporting multiple LLM providers and focusing on modularity, RAG-Bot offers flexibility for various use cases and deployment scenarios.
The open-source nature of this project invites collaboration and extension, potentially leading to more sophisticated RAG implementations in the future. As language models continue to evolve, the RAG approach will remain valuable for grounding responses in factual information and reducing hallucinations.