LocalRAG Project1-BlueStag Documentation
1. Abstract
This project implements a Local Retrieval-Augmented Generation (RAG) system that enables users to query and extract information from their document collections using natural language. The system combines vector-based document retrieval with large language model (LLM) capabilities to provide contextually relevant answers based on the user's document corpus.
The implementation features a modular architecture supporting multiple LLM providers (OpenAI GPT, Groq Llama, and Google Gemini), ChromaDB for vector storage and similarity search, and flexible document loading capabilities for both text (.txt) and Microsoft Word (.docx) formats. The system operates locally while leveraging cloud-based LLM APIs for natural language generation, ensuring that document content remains under user control while benefiting from state-of-the-art language understanding capabilities.
Key features include automatic fallback to template documents when no user documents are present, intelligent document chunking and embedding, and an interactive command-line interface for real-time querying.
2. Methodology
2.1 System Architecture
The RAG system follows a three-stage pipeline:
Stage 1: Document Ingestion and Processing
- Documents are loaded from a specified directory (`data/` by default)
- Support for multiple formats: plain text (.txt) and Microsoft Word (.docx/.doc)
- Automatic fallback to template documents (`data/TemplateDocs/`) when the main directory is empty
- Document content is extracted and paired with source file metadata
Stage 2: Vector Database Creation
- Documents are processed through the `VectorDB` class, which utilizes ChromaDB
- Text content is converted to high-dimensional embeddings for semantic similarity search
- Vector representations enable efficient retrieval of contextually relevant document chunks
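As a concrete illustration of this stage, the sketch below builds a ChromaDB collection from loaded documents. The collection name, fixed-size chunking, and reliance on ChromaDB's default embedding function are assumptions for illustration; the project's `VectorDB` class may implement these details differently.

```python
# Hypothetical sketch of Stage 2; the project's VectorDB class may differ.
import chromadb

def build_vector_db(docs):
    """docs: list of (content, filename) tuples from load_documents()."""
    client = chromadb.Client()  # in-memory; PersistentClient(path=...) would persist
    collection = client.get_or_create_collection(name="local_rag")  # name is assumed
    for content, filename in docs:
        # Naive fixed-size chunking (1000 characters); the real strategy may differ.
        chunks = [content[i:i + 1000] for i in range(0, len(content), 1000)]
        collection.add(
            documents=chunks,  # ChromaDB embeds these with its default embedding function
            metadatas=[{"source": filename}] * len(chunks),
            ids=[f"{filename}-{i}" for i in range(len(chunks))],
        )
    return collection
```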
Stage 3: Query Processing and Generation
- User queries are converted to vector embeddings and matched against the document corpus
- Top-k similar document chunks are retrieved (default k=5)
- Retrieved context is combined with the user query in a structured prompt template
- The enhanced prompt is processed by the selected LLM to generate contextually grounded responses
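The structured prompt template itself is not shown in the source; a minimal example of how the retrieved context and user question might be combined (the wording is illustrative):

```python
# Illustrative template only; the project's actual prompt wording is not specified.
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context does not contain enough information, say so.

Context:
{context}

Question: {question}

Answer:"""
```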
2.2 LLM Provider Selection
The system implements a hierarchical API key detection mechanism:
- Primary: OpenAI GPT models (default: gpt-4o-mini)
- Secondary: Groq Llama models (default: llama-3.1-8b-instant)
- Tertiary: Google Gemini models (default: gemini-2.0-flash)
This approach maximizes compatibility across user environments: the system simply uses the first provider whose API key is available.
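A minimal sketch of this detection order, assuming the conventional environment variable names (OPENAI_API_KEY, GROQ_API_KEY, GOOGLE_API_KEY) and LangChain's chat-model wrappers; the project's actual construction code may differ:

```python
import os

# Hypothetical sketch: try providers in the documented priority order.
def select_llm():
    if os.getenv("OPENAI_API_KEY"):
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
    if os.getenv("GROQ_API_KEY"):
        from langchain_groq import ChatGroq
        return ChatGroq(model="llama-3.1-8b-instant", temperature=0.0)
    if os.getenv("GOOGLE_API_KEY"):
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0.0)
    raise RuntimeError("Set OPENAI_API_KEY, GROQ_API_KEY, or GOOGLE_API_KEY.")
```

The temperature of 0.0 matches the setting noted in the Results section.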
2.3 Document Processing Algorithm
```python
def load_documents(documents_path: str = "data") -> List[Tuple[str, str]]:
    # 1. Scan directory for supported file types
    # 2. If no files found, redirect to template directory
    # 3. Process each file according to its format:
    #    - .txt: use LangChain TextLoader
    #    - .docx/.doc: use python-docx library
    # 4. Return list of (content, filename) tuples
```
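A runnable sketch of the algorithm above, using LangChain's TextLoader and python-docx as the comments indicate. The `TEMPLATE_DIR` constant and exact import paths are assumptions; error handling is omitted for brevity:

```python
import os
from typing import List, Tuple

from docx import Document as DocxDocument  # python-docx
from langchain_community.document_loaders import TextLoader  # import path assumed

TEMPLATE_DIR = os.path.join("data", "TemplateDocs")  # fallback location per the docs

def load_documents(documents_path: str = "data") -> List[Tuple[str, str]]:
    supported = (".txt", ".docx", ".doc")

    def scan(path: str) -> List[str]:
        if not os.path.isdir(path):
            return []
        return [f for f in os.listdir(path) if f.lower().endswith(supported)]

    # Steps 1-2: scan the main directory, falling back to the templates
    files = scan(documents_path)
    if not files:
        documents_path = TEMPLATE_DIR
        files = scan(documents_path)

    # Step 3: extract text according to format
    docs: List[Tuple[str, str]] = []
    for name in files:
        path = os.path.join(documents_path, name)
        if name.lower().endswith(".txt"):
            content = "\n".join(d.page_content for d in TextLoader(path).load())
        else:
            # Note: python-docx reads .docx; legacy .doc files will raise an error.
            content = "\n".join(p.text for p in DocxDocument(path).paragraphs)
        docs.append((content, name))

    # Step 4: list of (content, filename) tuples
    return docs
```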
2.4 RAG Query Pipeline
```python
def invoke(self, input: str, n_results: int = 5) -> str:
    # 1. Perform vector similarity search on user query
    # 2. Extract top-n relevant document chunks
    # 3. Concatenate chunks into unified context
    # 4. Apply prompt template with context and question
    # 5. Generate response using selected LLM
    # 6. Return natural language answer
```
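Under the same assumptions as the earlier sketches (a ChromaDB collection on `self.collection`, a LangChain chat model on `self.llm`, and the illustrative `PROMPT_TEMPLATE`), the pipeline could be implemented roughly as:

```python
# Hypothetical implementation; attribute names are assumptions, not the project's API.
def invoke(self, input: str, n_results: int = 5) -> str:
    # Steps 1-2: vector similarity search, keeping the top-n chunks
    results = self.collection.query(query_texts=[input], n_results=n_results)
    chunks = results["documents"][0]

    # Step 3: concatenate chunks into a unified context
    context = "\n\n".join(chunks)

    # Step 4: apply the prompt template with context and question
    prompt = PROMPT_TEMPLATE.format(context=context, question=input)

    # Steps 5-6: generate and return the natural language answer
    return self.llm.invoke(prompt).content
```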
3. Results
3.1 System Capabilities
Document Loading Capabilities:
- Successfully processes both text and Word document formats
- Robust error handling with informative user feedback
- Automatic template fallback ensures system functionality even with empty document directories
- Metadata preservation allows for source attribution in responses
Vector Retrieval Efficiency:
- ChromaDB integration provides fast semantic search capabilities
- Configurable result count (default n_results=5) allows optimization for context window limitations
- Vector similarity matching effectively identifies relevant content across diverse document types
LLM Integration Reliability:
- Multi-provider support ensures high system availability
- Graceful fallback mechanism prevents single points of failure
- Environment variable configuration enables easy API key management
- Temperature setting (0.0) yields deterministic, consistent responses
3.2 User Experience
Interactive Features:
- Command-line interface with continuous query loop
- Real-time document processing feedback
- Clear error messages and setup instructions
- Intuitive quit mechanism ("quit" command)
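Taken together, the loop these features describe might look like the sketch below; the `RAG` wrapper class and message wording are illustrative:

```python
# Illustrative query loop; class names and prompts are assumptions.
def main() -> None:
    docs = load_documents()               # Stage 1: ingestion
    collection = build_vector_db(docs)    # Stage 2: vector database
    rag = RAG(collection, select_llm())   # hypothetical wrapper exposing invoke()

    print(f"Loaded {len(docs)} document(s). Type 'quit' to exit.")
    while True:
        query = input("> ").strip()
        if query.lower() == "quit":
            break
        print(rag.invoke(query, n_results=5))

if __name__ == "__main__":
    main()
```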
Response Quality:
- Contextually grounded answers based on user's specific document corpus
- Clear indication when insufficient context is available
- Preservation of source document relationships
- Structured prompt template ensures consistent response format