Talk2PDF is a web application for holding interactive conversations with your PDF documents. Upload a PDF and ask questions about its content; the app uses AI models to answer from the document's text and can also explain images within it.
This application is built with Flask for the backend, LangChain for orchestrating language model interactions, and Google's Gemini models for state-of-the-art text and vision understanding.
Features
Interactive Chat Interface: Ask questions in natural language and get responses in real-time.
PDF Content Analysis: Processes the entire text content of your uploaded PDF to provide comprehensive answers.
Image Understanding (Optional): Extracts images from the PDF, analyzes them, and can answer questions about their content.
Deduplication: Detects and skips duplicate images so each one is analyzed only once, saving processing time.
Easy Setup: Run the application with just a few simple commands.
Secure API Key Handling: Prompts for your API key if not found in the environment, avoiding hard-coding.
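As an illustration of the Deduplication feature listed above, here is a minimal sketch of one common approach: hashing each image's raw bytes and skipping any hash already seen. This is an assumption about how such a check could work, not necessarily what Talk2PDF does internally; the function name `unique_images` is hypothetical.

```python
import hashlib
from typing import Iterable, List


def unique_images(images: Iterable[bytes]) -> List[bytes]:
    """Return images in their original order, skipping byte-identical repeats.

    Hypothetical helper: content hashing is assumed here for illustration.
    """
    seen = set()
    unique = []
    for data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # an identical image was already kept
        seen.add(digest)
        unique.append(data)
    return unique
```

A content hash only catches exact copies. Images that were resized or re-encoded would need a perceptual comparison instead.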
How It Works
The application follows a multi-step workflow to process your documents and answer questions intelligently.
User Flow
Upload a PDF file via the web interface.
Choose between Text Only or Text & Image analysis.
Ask questions through the chat interface.
Receive AI-generated answers based on the PDF's content.
Backend Processing (LangChain Flow)
Upload PDF: The user uploads a PDF file through the web interface.
Store File: The file is temporarily stored on the server.
Setup QA System: The core question-answering system is initialized.
Text Splitting: The document’s text is split into smaller chunks.
Generate Embeddings: Text chunks are converted into vector embeddings using Google’s AI models.
Create RetrievalQA Chain: A LangChain RetrievalQA chain is created to retrieve the chunks most relevant to each question.
Ask Question: The user enters a question.
Display Answer: The system retrieves the relevant chunks and generates a human-like answer.
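The splitting and retrieval steps above can be sketched in plain Python. This is a simplified stand-in, not the app's actual code: the real pipeline uses LangChain and Google embeddings, while this sketch substitutes a toy bag-of-words vector for the embedding so it runs without any dependencies. All function names and the chunk sizes are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In the real workflow, the retrieved chunks and the user's question would then be passed to the language model, which writes the final answer.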
Getting Started
Follow these steps to set up and run the application locally.
Prerequisites
Python 3.8 or higher
pip (Python package manager)
Installation
Clone the repository:
git clone https://github.com/nishanthnaa52/Talk2pdf
cd Talk2pdf
Install dependencies:
pip install -r requirements.txt
Configuration
This app uses the Google Gemini API, which requires an API key.
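The prompt-if-missing behavior described under Features could look roughly like the sketch below. The environment variable name `GOOGLE_API_KEY` and the function `ensure_api_key` are assumptions for illustration; check the project's source for the name it actually reads.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_api_key(
    var_name: str = "GOOGLE_API_KEY",  # assumed variable name
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the API key from the environment, prompting for it if absent."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # getpass hides the typed key so it does not echo to the terminal
        key = prompt(f"Enter your {var_name}: ").strip()
        if not key:
            raise ValueError(f"{var_name} is required to run the app")
        os.environ[var_name] = key  # available for the rest of this process only
    return key
```

Because the key is read at runtime and stored only in the process environment, it never needs to appear in the source code or the repository.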