LocalRAG Project1-BlueStag Documentation
1. Abstract
This project implements a Local Retrieval-Augmented Generation (RAG) system that enables users to query and extract information from their document collections using natural language. The system combines vector-based document retrieval with large language model (LLM) capabilities to provide contextually relevant answers based on the user's document corpus.
The implementation features a modular architecture supporting multiple LLM providers (OpenAI GPT, Groq Llama, and Google Gemini), ChromaDB for vector storage and similarity search, and flexible document loading capabilities for both text (.txt) and Microsoft Word (.docx) formats. The system operates locally while leveraging cloud-based LLM APIs for natural language generation, ensuring that document content remains under user control while benefiting from state-of-the-art language understanding capabilities.
Key features include automatic fallback to template documents when no user documents are present, intelligent document chunking and embedding, and an interactive command-line interface for real-time querying.
2. Methodology
2.1 System Architecture
The RAG system follows a three-stage pipeline:
Stage 1: Document Ingestion and Processing
- Documents are loaded from a specified directory (`data/` by default)
- Support for multiple formats: plain text (.txt) and Microsoft Word (.docx/.doc)
- Automatic fallback to template documents (`data/TemplateDocs/`) when the main directory is empty
- Document content is extracted and paired with source file metadata
Stage 2: Vector Database Creation
- Documents are processed through the `VectorDB` class, which utilizes ChromaDB
- Text content is converted to high-dimensional embeddings for semantic similarity search
- Vector representations enable efficient retrieval of contextually relevant document chunks
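As a concrete illustration of this stage, the sketch below builds a ChromaDB collection from loaded documents. The collection name, fixed-size chunking, and reliance on ChromaDB's default embedding function are assumptions for illustration; the project's `VectorDB` class may implement these details differently.

```python
# Hypothetical sketch of Stage 2; the project's VectorDB class may differ.
import chromadb

def build_vector_db(docs):
    """docs: list of (content, filename) tuples from load_documents()."""
    client = chromadb.Client()  # in-memory; PersistentClient(path=...) would persist
    collection = client.get_or_create_collection(name="local_rag")  # name is assumed
    for content, filename in docs:
        # Naive fixed-size chunking (1000 characters); the real strategy may differ.
        chunks = [content[i:i + 1000] for i in range(0, len(content), 1000)]
        collection.add(
            documents=chunks,  # ChromaDB embeds these with its default embedding function
            metadatas=[{"source": filename}] * len(chunks),
            ids=[f"{filename}-{i}" for i in range(len(chunks))],
        )
    return collection
```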
Stage 3: Query Processing and Generation
- User queries are converted to vector embeddings and matched against the document corpus
- Top-k similar document chunks are retrieved (default k=5)
- Retrieved context is combined with the user query in a structured prompt template
- The enhanced prompt is processed by the selected LLM to generate contextually grounded responses
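The structured prompt template itself is not shown in the source; a minimal example of how the retrieved context and user question might be combined (the wording is illustrative):

```python
# Illustrative template only; the project's actual prompt wording is not specified.
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context does not contain enough information, say so.

Context:
{context}

Question: {question}

Answer:"""
```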
2.2 LLM Provider Selection
The system implements a hierarchical API key detection mechanism:
- Primary: OpenAI GPT models (default: gpt-4o-mini)
- Secondary: Groq Llama models (default: llama-3.1-8b-instant)
- Tertiary: Google Gemini models (default: gemini-2.0-flash)
This approach maximizes compatibility across user environments: the system simply uses the first provider whose API key is available.
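A minimal sketch of this detection order, assuming the conventional environment variable names (OPENAI_API_KEY, GROQ_API_KEY, GOOGLE_API_KEY) and LangChain's chat-model wrappers; the project's actual construction code may differ:

```python
import os

# Hypothetical sketch: try providers in the documented priority order.
def select_llm():
    if os.getenv("OPENAI_API_KEY"):
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
    if os.getenv("GROQ_API_KEY"):
        from langchain_groq import ChatGroq
        return ChatGroq(model="llama-3.1-8b-instant", temperature=0.0)
    if os.getenv("GOOGLE_API_KEY"):
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0.0)
    raise RuntimeError("Set OPENAI_API_KEY, GROQ_API_KEY, or GOOGLE_API_KEY.")
```

The temperature of 0.0 matches the setting noted in the Results section.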
2.3 Document Processing Algorithm
```python
def load_documents(documents_path: str = "data") -> List[Tuple[str, str]]:
    # 1. Scan directory for supported file types
    # 2. If no files found, redirect to template directory
    # 3. Process each file according to its format:
    #    - .txt: use LangChain TextLoader
    #    - .docx/.doc: use python-docx library
    # 4. Return list of (content, filename) tuples
```
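A runnable sketch of the algorithm above, using LangChain's TextLoader and python-docx as the comments indicate. The `TEMPLATE_DIR` constant and exact import paths are assumptions; error handling is omitted for brevity:

```python
import os
from typing import List, Tuple

from docx import Document as DocxDocument  # python-docx
from langchain_community.document_loaders import TextLoader  # import path assumed

TEMPLATE_DIR = os.path.join("data", "TemplateDocs")  # fallback location per the docs

def load_documents(documents_path: str = "data") -> List[Tuple[str, str]]:
    supported = (".txt", ".docx", ".doc")

    def scan(path: str) -> List[str]:
        if not os.path.isdir(path):
            return []
        return [f for f in os.listdir(path) if f.lower().endswith(supported)]

    # Steps 1-2: scan the main directory, falling back to the templates
    files = scan(documents_path)
    if not files:
        documents_path = TEMPLATE_DIR
        files = scan(documents_path)

    # Step 3: extract text according to format
    docs: List[Tuple[str, str]] = []
    for name in files:
        path = os.path.join(documents_path, name)
        if name.lower().endswith(".txt"):
            content = "\n".join(d.page_content for d in TextLoader(path).load())
        else:
            # Note: python-docx reads .docx; legacy .doc files will raise an error.
            content = "\n".join(p.text for p in DocxDocument(path).paragraphs)
        docs.append((content, name))

    # Step 4: list of (content, filename) tuples
    return docs
```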
2.4 RAG Query Pipeline
```python
def invoke(self, input: str, n_results: int = 5) -> str:
    # 1. Perform vector similarity search on user query
    # 2. Extract top-n relevant document chunks
    # 3. Concatenate chunks into unified context
    # 4. Apply prompt template with context and question
    # 5. Generate response using selected LLM
    # 6. Return natural language answer
```
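Under the same assumptions as the earlier sketches (a ChromaDB collection on `self.collection`, a LangChain chat model on `self.llm`, and the illustrative `PROMPT_TEMPLATE`), the pipeline could be implemented roughly as:

```python
# Hypothetical implementation; attribute names are assumptions, not the project's API.
def invoke(self, input: str, n_results: int = 5) -> str:
    # Steps 1-2: vector similarity search, keeping the top-n chunks
    results = self.collection.query(query_texts=[input], n_results=n_results)
    chunks = results["documents"][0]

    # Step 3: concatenate chunks into a unified context
    context = "\n\n".join(chunks)

    # Step 4: apply the prompt template with context and question
    prompt = PROMPT_TEMPLATE.format(context=context, question=input)

    # Steps 5-6: generate and return the natural language answer
    return self.llm.invoke(prompt).content
```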
3. Results
3.1 System Capabilities
Document Loading Capabilities:
- Successfully processes both text and Word document formats
- Robust error handling with informative user feedback
- Automatic template fallback ensures system functionality even with empty document directories
- Metadata preservation allows for source attribution in responses
Vector Retrieval Efficiency:
- ChromaDB integration provides fast semantic search capabilities
- Configurable result count (default n_results=5) allows optimization for context window limitations
- Vector similarity matching effectively identifies relevant content across diverse document types
LLM Integration Reliability:
- Multi-provider support ensures high system availability
- Graceful fallback mechanism prevents single points of failure
- Environment variable configuration enables easy API key management
- Temperature setting (0.0) yields deterministic, consistent responses
3.2 User Experience
Interactive Features:
- Command-line interface with continuous query loop
- Real-time document processing feedback
- Clear error messages and setup instructions
- Intuitive quit mechanism ("quit" command)
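Taken together, the loop these features describe might look like the sketch below; the `RAG` wrapper class and message wording are illustrative:

```python
# Illustrative query loop; class names and prompts are assumptions.
def main() -> None:
    docs = load_documents()               # Stage 1: ingestion
    collection = build_vector_db(docs)    # Stage 2: vector database
    rag = RAG(collection, select_llm())   # hypothetical wrapper exposing invoke()

    print(f"Loaded {len(docs)} document(s). Type 'quit' to exit.")
    while True:
        query = input("> ").strip()
        if query.lower() == "quit":
            break
        print(rag.invoke(query, n_results=5))

if __name__ == "__main__":
    main()
```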
Response Quality:
- Contextually grounded answers based on user's specific document corpus
- Clear indication when insufficient context is available
- Preservation of source document relationships
- Structured prompt template ensures consistent response format