In today's information-driven world, interacting with documents using natural language has become a critical need for both technical and non-technical users. Traditional PDF documents are static and require manual reading and searching, often resulting in inefficiencies. This project introduces an AI-powered PDF Chatbot built with Streamlit and the Google Gemini API, designed to enable real-time, conversational access to PDF content. By combining LLM-based understanding, semantic vector search, and a lightweight web UI, the chatbot enables users to ask questions about any uploaded PDF and receive intelligent, context-aware responses. The system is particularly well-suited for domains like education, legal tech, research, and enterprise documentation.
Methodology
The chatbot's pipeline consists of multiple interconnected stages, each leveraging cutting-edge AI tools and frameworks to deliver seamless interaction with document content.
1. PDF Upload and Text Extraction
Users upload a .pdf file via the Streamlit UI, which is designed for simplicity and speed.
PyPDF2 is used to extract text from each page of the document.
The extracted raw text is sanitized, stripped of metadata, and split into smaller, logically consistent chunks (typically 500–1000 characters).
This chunking ensures that long-form documents are processed in manageable parts, essential for downstream embedding and similarity search.
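The extraction and chunking steps above can be sketched as follows. The `extract_text` and `chunk_text` helper names, and the chunk-size and overlap defaults, are illustrative assumptions rather than the project's exact implementation:

```python
def extract_text(pdf_path: str) -> str:
    """Pull raw text from every page of a PDF using PyPDF2."""
    from PyPDF2 import PdfReader  # deferred import: only needed when extracting
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks of at most `size` characters.

    A small overlap keeps sentences that straddle a boundary available
    in both neighboring chunks for retrieval.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

Overlapping chunks trade a little index size for better recall: a fact split across a chunk boundary still appears intact in at least one chunk.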
2. Text Embedding using Google Generative AI
Each text chunk is passed through the Google Generative AI Embedding API using the embedding-001 model.
The resulting high-dimensional vectors represent the semantic meaning of the text.
These embeddings allow the system to match queries with the most contextually relevant text chunks, even if the exact words don't overlap.
The use of Gemini embeddings provides high-quality semantic vector representations trained on large-scale data.
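Semantic matching between embedding vectors boils down to cosine similarity. The sketch below uses toy 3-d vectors standing in for real `embedding-001` outputs (which would be obtained via the SDK's `genai.embed_content` call); the vectors and function name are illustrative:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embedding-001 outputs.
query_vec   = [0.9, 0.1, 0.0]
chunk_close = [0.8, 0.2, 0.1]   # a semantically related chunk
chunk_far   = [0.0, 0.1, 0.9]   # an unrelated chunk

print(cosine_similarity(query_vec, chunk_close) > cosine_similarity(query_vec, chunk_far))
```

This is why a query like "refund period" can match a chunk saying "returns accepted within 30 days" even with no word overlap: their embedding vectors point in similar directions.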
3. Semantic Vector Search with FAISS
All embeddings are stored in a FAISS (Facebook AI Similarity Search) index.
When a user submits a question, it is also embedded into a vector.
The FAISS engine compares this query vector against the indexed document vectors (e.g., by inner product, which is equivalent to cosine similarity for normalized vectors) to retrieve the top-k most relevant chunks.
This retrieval is extremely fast (milliseconds) and enables semantic-based Q&A, not just keyword-based.
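The retrieval step can be sketched without the FAISS dependency as an exhaustive top-k search; FAISS performs the same comparison with optimized index structures at scale. The `top_k_chunks` helper and the toy 2-d vectors are illustrative assumptions:

```python
import math

def top_k_chunks(query_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cos(query_vec, cv[1]),
                    reverse=True)
    return [c for c, _ in scored[:k]]

chunks = ["refund policy", "shipping times", "warranty terms"]
vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.3]]
print(top_k_chunks([1.0, 0.1], vecs, chunks, k=2))
# → ['refund policy', 'warranty terms']
```

A brute-force scan like this is O(n) per query; FAISS indexes (e.g., `IndexFlatIP` or approximate variants) keep retrieval in the millisecond range even for large document collections.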
4. Answer Generation using Gemini Flash Model
The retrieved document chunks and the user's query are sent to the Gemini 2.0 Flash model.
The model synthesizes an answer based on the retrieved context and produces a natural, conversational response.
Gemini Flash is designed for speed, making it ideal for low-latency, real-time applications like chatbots.
This architecture follows the Retrieval-Augmented Generation (RAG) paradigm, combining search with generation.
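A minimal sketch of the generation step: the retrieved chunks are stitched into a grounded prompt, which is then sent to Gemini. The `build_prompt` helper and its template wording are illustrative assumptions, not the project's exact prompt:

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a RAG prompt: retrieved context first, then the user's question."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# The assembled prompt would then go to Gemini via the SDK, e.g.:
#   model = genai.GenerativeModel("gemini-2.0-flash")
#   answer = model.generate_content(build_prompt(question, chunks)).text
prompt = build_prompt("What is the refund window?",
                      ["Refunds are accepted within 30 days of purchase."])
```

Instructing the model to answer only from the supplied context is what keeps responses grounded in the uploaded PDF rather than the model's general knowledge.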
5. Streamlit Web Interface
Built with Streamlit, the web interface enables:
PDF upload
Real-time Q&A
Error feedback and edge case handling
The UI is lightweight, responsive, and does not require backend infrastructure, making deployment easy on platforms like Streamlit Cloud, Hugging Face Spaces, or Heroku.
Tech Stack
Frontend: Streamlit
LLM & Embeddings: Google Gemini API (embedding-001, Gemini Flash)
Orchestration: LangChain
Vector Store: FAISS
PDF Parsing: PyPDF2
Language: Python 3.10+
Use Cases
Legal Tech: Query legal documents for specific clauses or legal terms
Academic Research: Ask questions about findings, methodology, or results in scientific papers
Business Intelligence: Extract trends, KPIs, and financial summaries from business documents
Education: Enable students to interact with textbooks and notes using natural language
Installation & Usage
Clone the repository and install the required libraries:
git clone https://github.com/alok-more/my-first-chatbot.git
cd my-first-chatbot
pip install -r requirements.txt
Set your Gemini API key in chatbot.py:
GOOGLE_API_KEY = "YOUR_GOOGLE_GEMINI_API_KEY"
Run the application:
streamlit run chatbot.py
Future Work
Integrate chat memory for multi-turn conversations
Add support for other file formats (DOCX, TXT, CSV)
Improve multi-lingual support using Gemini multilingual embeddings
Deploy on cloud infrastructure with persistent vector storage