AI-RAG-docuquery: A Retrieval-Augmented Document Query System
Abstract
I present AI-RAG-docuquery, an open-source system for retrieval-augmented question answering (QA) over heterogeneous document collections. The tool integrates dense vector search (FAISS), sentence-transformer embeddings, and configurable large language model (LLM) backends into a unified desktop application. It enables users to index local corpora of PDF, DOCX, PPTX, XLSX, TXT, CSV, or Markdown files and obtain grounded, source-cited answers in response to natural-language queries. The project is released under the MIT License and is available on GitHub.
System Overview
AI-RAG-docuquery is designed as a lightweight, reproducible framework for applied Retrieval-Augmented Generation (RAG). The architecture is composed of three main layers:
Indexing and Retrieval
Documents are parsed and chunked via format-specific loaders.
Embeddings are computed using Sentence-Transformers (MiniLM or E5 family).
Vectors are stored in FAISS indexes, supporting dense retrieval; hybrid sparse/dense retrieval is optionally enabled.
Index metadata (filenames, page numbers, chunk offsets) is stored in JSONL for citation alignment.
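The indexing and retrieval steps above can be sketched without FAISS or sentence-transformers installed. In this illustrative stand-in, `toy_embed` replaces a real sentence-transformer model with a deterministic hashed bag-of-words, and the brute-force normalized inner-product search mirrors what a FAISS flat inner-product index computes; all function names here are assumptions for illustration, not the project's actual API.

```python
import zlib
import numpy as np

def toy_embed(text: str, dim: int = 512) -> np.ndarray:
    """Deterministic stand-in for a sentence-transformer: hashed bag of words."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    return v

def build_index(chunks: list[str], dim: int = 512) -> np.ndarray:
    """Embed chunks and L2-normalize, so inner product equals cosine similarity."""
    vecs = np.vstack([toy_embed(c, dim) for c in chunks])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def search(index: np.ndarray, query: str, k: int = 2) -> list[tuple[int, float]]:
    """Brute-force dense retrieval over normalized vectors (what FAISS
    accelerates at scale); returns (chunk_id, score) pairs, best first."""
    q = toy_embed(query, index.shape[1])
    q = q / np.linalg.norm(q)
    scores = index @ q
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]

chunks = [
    "FAISS stores dense vectors for similarity search.",
    "PyInstaller bundles the application into an executable.",
]
index = build_index(chunks)
hits = search(index, "dense vector similarity")
```

In the real system, each retrieved chunk ID is joined against the JSONL metadata (filename, page number, chunk offset) to produce the source citation attached to the answer.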
Packaging and Distribution
Platform: Windows 10/11 (tested); macOS/Linux supported from source
Index storage: faiss_index/ containing FAISS binaries, JSONL metadata, and serialized configs
Build options:
--onefile: single-file PyInstaller build, yielding a portable .exe
--onedir: directory-based PyInstaller build, more robust with shared libraries
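The two build options correspond to PyInstaller invocations along the following lines; the entry-point script name `app.py` is a placeholder, not necessarily the project's actual file name.

```shell
# Single-file build: everything packed into one portable .exe
pyinstaller --onefile --name AI-RAG-docuquery app.py

# Directory build: faster startup, more robust with shared
# native libraries such as FAISS
pyinstaller --onedir --name AI-RAG-docuquery app.py
```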
Use Cases
Personal document QA: knowledge workers can query technical manuals, research papers, or organizational archives.
Educational support: students can index lecture slides, notes, and reference material.
Enterprise knowledge management: potential to extend with larger sharded FAISS indexes.
Availability
The full project, including source code, requirements, and build instructions, is publicly available on GitHub:
AI-RAG-docuquery (GitHub Repository)
License: MIT
Latest release: v2.0 (2025-08-27)
Roadmap
Planned future work includes:
Multi-index management (create, merge, and query across multiple FAISS indexes).
Scalable indexing (on-disk FAISS, sharding for >10M vectors).
Additional LLM adapters (Azure OpenAI, Anthropic, Google Gemini, Mistral).
Improved hybrid ranking and passage de-duplication.
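Querying across multiple FAISS indexes could, in principle, merge each shard's top-k result list while keeping the best score per passage. The stdlib sketch below illustrates that idea under my own assumptions; it does not reflect the project's roadmap code.

```python
import heapq

def merge_topk(per_index_results, k=3):
    """Merge result lists from several index shards, each already sorted by
    score descending, keeping only the best-scoring occurrence of each
    document (a simple form of passage de-duplication)."""
    merged = heapq.merge(*per_index_results, key=lambda t: t[0], reverse=True)
    seen, out = set(), []
    for score, doc_id in merged:
        if doc_id in seen:  # same passage retrieved by another shard
            continue
        seen.add(doc_id)
        out.append((score, doc_id))
        if len(out) == k:
            break
    return out

shard_a = [(0.91, "doc1#p3"), (0.52, "doc2#p1")]
shard_b = [(0.88, "doc1#p3"), (0.71, "doc3#p7")]
top = merge_topk([shard_a, shard_b], k=3)
```

Here `doc1#p3` appears in both shards but is reported once, with its higher score.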
Conclusion
With AI-RAG-docuquery I demonstrate how retrieval-augmented generation can be applied to everyday document collections in a transparent and reproducible way. By unifying vector retrieval, flexible LLM integration, and verifiable citations into a desktop-ready package, the project bridges the gap between academic RAG prototypes and practical end-user tools.