A Retrieval-Augmented Generation (RAG) system designed for semantic search and question-answering over academic documents, specifically focused on medieval Jewish magic and historical texts.
Overview
This project implements a complete RAG pipeline that processes PDF documents, creates semantic embeddings, stores them in a vector database, and provides an intelligent question-answering interface. The system combines document retrieval with large language model generation to provide contextually accurate answers based on your document corpus.
Key features:
PDF Document Processing: Automatic extraction and chunking of academic papers
Semantic Search: Vector-based similarity search using sentence transformers
Multi-LLM Support: Compatible with OpenAI GPT, Groq Llama, and Google Gemini
Web Interface: User-friendly FastAPI-based frontend
Persistent Storage: ChromaDB vector database for efficient document retrieval
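The first stage of the pipeline above, splitting extracted PDF text into chunks before embedding, can be sketched as follows. The chunk size, overlap, and function name are illustrative assumptions, not the project's actual settings:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split extracted PDF text into overlapping character chunks.

    chunk_size and overlap are in characters; both defaults here are
    illustrative assumptions, not the project's configured values.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks

# Each chunk would then be embedded (e.g. with a sentence-transformers
# model) and stored in ChromaDB alongside its source metadata.
sample = "word " * 300  # stand-in for text extracted from a PDF
pieces = chunk_text(sample)
print(len(pieces), len(pieces[0]))  # → 4 500
```

Overlapping chunks help preserve context that would otherwise be cut at chunk boundaries, at the cost of some storage redundancy.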
Demo
Web Interface
The web interface provides an intuitive way to query your document collection. Simply type your question and get AI-powered answers based on your academic papers.
Key Features Shown:
🔍 Semantic Search: Natural language queries across your document corpus
📚 Source Attribution: Clear references to specific documents and sections
🎯 Contextual Answers: AI responses grounded in your actual content
⚡ Real-time Processing: Fast response times for interactive research
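Under the hood, semantic search reduces to ranking stored chunk embeddings by similarity to the query embedding. A minimal sketch of that ranking step, using toy 3-dimensional vectors in place of real sentence-transformer embeddings (which have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_chunks(query_vec, chunk_vecs):
    """Return chunk indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
chunks = [[1.0, 0.1, 0.9], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
print(rank_chunks(query, chunks))  # → [0, 2, 1]
```

In the actual system, ChromaDB performs this nearest-neighbor search efficiently over the whole corpus; the top-ranked chunks are then passed to the LLM as context for answer generation.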
Target Audience
This project is intended for:
Academic Researchers studying historical texts and manuscripts
Digital Humanities Scholars working with large document collections
Graduate Students conducting literature reviews and research
Developers interested in implementing RAG systems for domain-specific applications
Prerequisites
Required Knowledge
Basic Python programming
Understanding of virtual environments
Familiarity with command line interfaces
Basic knowledge of machine learning concepts (helpful but not required)
Hardware Requirements
RAM: Minimum 8GB (16GB recommended for larger document collections)
Storage: At least 5GB free space for models and embeddings
CPU: Multi-core processor recommended for faster embedding generation
System Compatibility
Operating Systems: Windows 10/11, macOS, Linux
Python: Version 3.11 or higher
Internet Connection: Required for initial model downloads and API access
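Before installing, you can confirm your interpreter satisfies the 3.11+ requirement with a quick check:

```python
import sys

major, minor = sys.version_info[:2]
if (major, minor) >= (3, 11):
    print(f"Python {major}.{minor} - OK")
else:
    print(f"Python {major}.{minor} - please upgrade to 3.11 or higher")
```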
Installation
1. Clone or Download the Project
cd "path/to/your/projects"

# If using git:
git clone <repository-url>

# Or extract from the zip file