The flowcharts for RAG implementations provided in this publication are illustrative examples sourced from publicly available references and repositories. They are intended for educational purposes to aid understanding of the concepts discussed.
As an AI researcher focused on Natural Language Processing and Information Retrieval, I noticed that many organizations struggle to efficiently extract information from their growing repositories of PDF documents. Whether it's technical documentation, research papers, or company policies, these valuable knowledge assets often remain underutilized because they are hard to access.
In early 2024, I decided to tackle this challenge by building a streamlined solution that lets users have natural conversations with their PDF documents. My goal was to create an open-source tool that was both effective and accessible, even to users with limited computational resources.
The result was ChatPDF.ai mark-1, a Streamlit-powered application that transforms static PDFs into interactive knowledge sources using language model technology. What makes this project unique is its flexibility: users can choose between two distinct question-answering approaches depending on their needs and available computing resources. The mark-1 version used FastEmbedEmbeddings for embeddings, Chroma for vector storage, and ChatOllama with the Mistral model for the RAG implementation (see the main branch of the project repo: https://github.com/Vvslaxman/rag-norag/tree/main).
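As a rough illustration of how those pieces fit together, here is a minimal sketch of a mark-1 style pipeline; it is not the exact repo code, and the PDF path and question are placeholders:

```python
# Minimal sketch of a mark-1 style pipeline: FastEmbed embeddings + Chroma + Mistral via Ollama.
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.chat_models import ChatOllama
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_community.embeddings import FastEmbedEmbeddings
from langchain_community.vectorstores import Chroma

docs = PDFPlumberLoader("example.pdf").load()  # placeholder PDF path
splits = RecursiveCharacterTextSplitter(chunk_size=1200, chunk_overlap=150).split_documents(docs)

vector_store = Chroma.from_documents(splits, FastEmbedEmbeddings())
llm = ChatOllama(model="mistral")  # Mistral served locally by Ollama
qa = RetrievalQA.from_chain_type(llm, retriever=vector_store.as_retriever())

print(qa.invoke({"query": "What is this document about?"})["result"])
```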
Now let's dive into ChatPDF.ai mark-1!
Modern organizations face several challenges with document management, from information locked in static PDFs to knowledge assets that go largely unread. To address these challenges, I designed ChatPDF.ai around two complementary approaches.
ChatPDF.ai is a Streamlit application that enables users to ask questions about their PDF documents. It offers two different approaches to document question-answering: a Retrieval-Augmented Generation (RAG) pipeline built on Deepseek R1-1.5B, and a direct (non-RAG) answering pipeline built on HuggingFace Flan-T5.
The application follows these main steps: load and parse the uploaded PDFs, split them into chunks, embed the chunks into a vector store, and then retrieve the most relevant chunks to answer each question.
The first approach, Deepseek R1-1.5B (RAG), implements Retrieval-Augmented Generation:
```python
# From test.py - RAG implementation with Deepseek
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vector_store = Chroma.from_documents(splits, embeddings, persist_directory="./chroma_db")
st.session_state.retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 3})

# When generating answers
llm = ChatOllama(model="deepseek-r1:1.5b", temperature=0.3)
qa_chain = RetrievalQA.from_chain_type(llm, retriever=st.session_state.retriever, chain_type="stuff")
response = qa_chain.invoke({"query": prompt})
answer = response["result"]
```
The second approach, HuggingFace Flan-T5 (Non-RAG), uses a more direct LLM answering technique:
```python
# From test.py - HuggingFace Non-RAG implementation
embeddings = HuggingFaceEmbeddings()
faiss_index = FAISS.from_texts(chunks, embeddings)
st.session_state.retriever = faiss_index.as_retriever()

# Prompt template for HuggingFace
prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="Context: {context}\n\nQuestion: {question}\n\nAnswer:"
)

# When generating answers
search_results = st.session_state.retriever.get_relevant_documents(prompt)
context = "\n\n".join([doc.page_content for doc in search_results])
answer = st.session_state.huggingface_chain.run({"context": context, "question": prompt})
```
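The excerpt above calls st.session_state.huggingface_chain without showing how it is built. Here is a minimal sketch of how such a chain could be constructed with LangChain's LLMChain and the Hugging Face Hub; the repo id and generation settings are assumptions, not necessarily the exact values used in test.py:

```python
# Sketch of constructing the Flan-T5 answer chain (assumed wiring; see test.py for the real code).
import os
from langchain.chains import LLMChain
from langchain.llms import HuggingFaceHub
from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="Context: {context}\n\nQuestion: {question}\n\nAnswer:",
)

llm = HuggingFaceHub(
    repo_id="google/flan-t5-large",  # "Flan-T5 large", as named in this article
    huggingfacehub_api_token=os.environ["HUGGINGFACEHUB_API_TOKEN"],
    model_kwargs={"temperature": 0.3, "max_length": 256},  # assumed generation settings
)

huggingface_chain = LLMChain(llm=llm, prompt=prompt_template)
# Later: huggingface_chain.run({"context": context, "question": question})
```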
Directory structure:

```
└── vvslaxman-rag-norag/
    ├── README.md
    ├── app.py
    ├── pyproject.toml
    ├── rag.py
    ├── requirements.txt
    ├── run.sh
    ├── secrets.toml
    └── deepseek_RAG_PDF_Chatbot/
        ├── chatbot.py
        ├── requirements.txt
        ├── test.py
        ├── utils.py
        ├── UI_ss/
        │   ├── Deepseek/
        │   └── HF/
        ├── __pycache__/
        └── chroma_db/
            ├── chroma.sqlite3
            └── a553d286-b93e-45ff-940f-027d1c60a27a/
                ├── data_level0.bin
                ├── header.bin
                ├── length.bin
                └── link_lists.bin
```
The development process was not without challenges. Initially, I experimented with larger models, but quickly realized that accessibility was more important than raw performance. By focusing on the Deepseek R1-1.5B model (which can run on most modern laptops), I found the sweet spot between performance and practicality.
For the embedding pipeline, I initially tested several options before settling on nomic-embed-text through Ollama for the RAG approach. This combination provided excellent semantic understanding while remaining computationally efficient.
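For anyone reproducing this comparison, a quick sanity check of an embedding backend can look something like the sketch below; the sample question is made up, and the dimension note applies to nomic-embed-text specifically:

```python
# Quick sanity check of the Ollama embedding backend before wiring it into the pipeline.
from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")  # requires a running `ollama serve`
vector = embeddings.embed_query("What is the refund policy?")  # hypothetical sample question
print(len(vector))  # nomic-embed-text returns 768-dimensional vectors
```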
One of the key design decisions was implementing a clear step-by-step feedback system during knowledge base creation. Users can see exactly what's happening as their documents are processed:
### Step 1: Loading and parsing PDFs
✅ PDFs loaded successfully!
### Step 2: Splitting documents into chunks
✅ Documents split into chunks!
### Step 3: Creating embeddings using Ollama
✅ Embeddings created using Ollama!
✅ Knowledge Base Created in 5.23 seconds
This transparency builds trust and helps users understand how the system works.
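For readers who want to replicate that experience, a bare-bones version of this kind of step-by-step feedback in Streamlit might look like the sketch below; the actual messages and layout in ChatPDF.ai may differ:

```python
# Illustrative pattern for step-by-step progress feedback in a Streamlit app.
import time
import streamlit as st

start = time.time()

st.markdown("### Step 1: Loading and parsing PDFs")
# ... load the uploaded PDFs here ...
st.success("PDFs loaded successfully!")

st.markdown("### Step 2: Splitting documents into chunks")
# ... split the documents here ...
st.success("Documents split into chunks!")

st.markdown("### Step 3: Creating embeddings using Ollama")
# ... build the vector store here ...
st.success("Embeddings created using Ollama!")

st.success(f"Knowledge Base Created in {time.time() - start:.2f} seconds")
```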
The application uses utility functions defined in utils.py for document processing:
```python
# From utils.py - Document processing
# (imports shown for completeness; exact import paths may differ by LangChain version)
import os
import tempfile

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma


def process_documents(pdfs):
    # Create temporary directory for PDF storage
    with tempfile.TemporaryDirectory() as temp_dir:
        # Save uploaded PDFs to temp directory
        pdf_paths = []
        for pdf in pdfs:
            path = os.path.join(temp_dir, pdf.name)
            with open(path, "wb") as f:
                f.write(pdf.getbuffer())
            pdf_paths.append(path)

        # Load the documents
        documents = []
        for path in pdf_paths:
            loader = PDFPlumberLoader(path)
            documents.extend(loader.load())

        # Split documents into chunks
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1200,
            chunk_overlap=150
        )
        splits = text_splitter.split_documents(documents)

        # Create embeddings and vector store
        embeddings = OllamaEmbeddings(model="nomic-embed-text")
        vector_store = Chroma.from_documents(
            documents=splits,
            embedding=embeddings,
            persist_directory="./chroma_db"
        )
        return vector_store
```
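In the Streamlit front end, this helper would typically be driven by the file uploader roughly as follows; this is a sketch of the wiring, not necessarily the exact test.py code:

```python
# Sketch of wiring process_documents() into the Streamlit UI.
import streamlit as st
from utils import process_documents

pdfs = st.file_uploader("Upload PDFs", type="pdf", accept_multiple_files=True)

if pdfs and st.button("Create Knowledge Base"):
    st.session_state.vector_store = process_documents(pdfs)
    st.session_state.retriever = st.session_state.vector_store.as_retriever(
        search_type="mmr", search_kwargs={"k": 3}
    )
    st.success("Knowledge Base Created!")
```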
| Feature | Deepseek R1-1.5B (RAG) | HuggingFace Flan-T5 (Non-RAG) |
|---|---|---|
| Embeddings | OllamaEmbeddings with nomic-embed-text | HuggingFaceEmbeddings |
| Vector Store | Chroma | FAISS |
| LLM | Deepseek R1:1.5b | Flan-T5 large |
| Retrieval | MMR search with k=3 | Standard retrieval |
| Context Integration | Integrated within RetrievalQA chain | Manual via prompt template |
Install Dependencies:

```bash
pip install -r requirements.txt
```

Environment Setup:

```bash
ollama serve
export HUGGINGFACEHUB_API_TOKEN=your_token_here
```

Run the App:

```bash
streamlit run test.py
```
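One detail worth noting: the Ollama-served models referenced above must be available locally, so on a fresh machine they typically need to be pulled once before starting the app (assuming a standard Ollama installation):

```bash
ollama pull deepseek-r1:1.5b
ollama pull nomic-embed-text
```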
Here's what the application looks like when running:
Main Interface:
The main interface features an approach selector, a PDF uploader, and a chat panel for asking questions.
Knowledge Base Creation:
When creating the knowledge base, users will see step-by-step feedback:
### Step 1: Loading and parsing PDFs
✅ PDFs loaded successfully!
### Step 2: Splitting documents into chunks
✅ Documents split into chunks!
### Step 3: Creating embeddings using Ollama
✅ Embeddings created using Ollama!
✅ Knowledge Base Created in 5.23 seconds
Chat Interaction:
The chat interface shows the running conversation history along with each model-generated answer.
| Model | Knowledge Base Creation Time (10-page PDF) | Query Response Time |
|---|---|---|
| Deepseek R1-1.5B (RAG) | ~15.23 sec | ~18.8 sec |
| HuggingFace Flan-T5 (Non-RAG) | ~24.85 sec | ~22.1 sec |
Note: Times may vary based on document size and system hardware.
During testing, I used ChatPDF.ai with various document sets, including technical documentation, research papers, and company policies.
The results were impressive. Users could ask natural questions and receive relevant, contextual answers in seconds. In these tests the RAG approach consistently produced the better answers, with knowledge base creation taking roughly 15 seconds for a 10-page PDF and query responses averaging around 19 seconds.
The application is structured around these main components:
Session State Management:
```python
# Initialize session state
if "selected_model" not in st.session_state:
    st.session_state.selected_model = "Deepseek R1-1.5B (RAG)"
if "messages" not in st.session_state:
    st.session_state.messages = []
if "vector_store" not in st.session_state:
    st.session_state.vector_store = None
# ...additional state variables
```
Approach Details:
```python
approach_details = {
    "Deepseek R1-1.5B (RAG)": {
        "description": "*Retrieval-Augmented Generation (RAG)* with Deepseek R1-1.5B.",
        "tech_stack": "- *ChatOllama* for answering queries\n- *OllamaEmbeddings* for document embeddings\n- *Chroma Vector Store* for retrieval",
    },
    "HuggingFace Flan-T5 (Non-RAG)": {
        "description": "*Direct LLM Answering (Non-RAG)* with Flan-T5 large.",
        "tech_stack": "- *Hugging Face Flan-T5* for response generation\n- *FAISS Vector Store* for retrieval\n- *LLMChain* for query processing",
    },
}
```
Model Switching Logic:
```python
# Detect approach switch
if new_model != st.session_state.selected_model:
    st.warning(f"You selected a different approach: *{st.session_state.selected_model} → {new_model}*")
    if st.button("Confirm & Switch"):
        st.session_state.selected_model = new_model
        st.session_state.messages = []  # Clear past messages
        st.session_state.vector_store = None
        st.session_state.retriever = None
        st.session_state.qa_chain = None
        st.session_state.huggingface_chain = None
        st.rerun()
```
Two settings worth highlighting are the chunking parameters and the retriever configuration:

```python
# ~1200-character chunks with 150 characters of overlap keep related context together across splits
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1200, chunk_overlap=150)

# MMR (maximal marginal relevance) retrieval returns the 3 most relevant yet diverse chunks per query
retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 3})
```
Early users have provided valuable feedback:
"I've been looking for a way to quickly extract information from our company manuals without reading through hundreds of pages. ChatPDF.ai has been a game-changer for me, even though I did provided other old dummy pdf just for confidentiality ! ." - SDE-1
"The ability to switch between different models depending on the task is incredibly useful. For simple questions, the faster model works great, but for complex queries, the RAG approach gives more detailed answers." - Research Analyst
Note: These flowcharts are not guaranteed to represent exact or production-ready implementations. Always refer to original sources for the most accurate and up-to-date information.
ChatPDF.ai demonstrates that effective document AI doesn't require massive GPU clusters or complex infrastructure. By combining lightweight models with efficient retrieval techniques, we can create practical tools that help people access and understand information locked in their documents.
The code and implementation are open-source, enabling others to build upon this foundation and adapt it to their specific needs. This project represents a step toward democratizing access to advanced NLP techniques and making document intelligence accessible to everyone.
Whether you're a researcher, a business professional, or simply someone with too many PDFs to read, ChatPDF.ai offers a practical way to turn static documents into interactive knowledge sources. The future of document interaction is conversational, and with tools like this, that future is already here.