In the age of AI, interacting with multiple documents has evolved beyond simple reading and searching. Imagine a tool that allows you to chat with multiple PDFs, extracting relevant insights in an interactive manner. The Multi PDF Chatbot does exactly that—empowering users to engage in dynamic conversations with multiple PDFs using LangChain and FAISS for efficient vector-based search.
The Multi PDF Chatbot is a Streamlit-powered application that enables users to upload multiple PDFs and chat with their contents. Whether you're analyzing reports, studying research papers, or reviewing contracts, this tool provides an intelligent and efficient way to extract key insights without manually scanning the documents.
Multiple PDF Uploading
Users can easily upload multiple PDF documents via the Streamlit interface.
Text Extraction & Chunking
The system extracts text from each PDF and divides it into manageable chunks for processing.
Embeddings Creation
These text chunks are converted into vector embeddings using a sentence transformer model, making the text searchable and context-aware.
Vector-Based Retrieval with FAISS
The embeddings are stored in a FAISS (Facebook AI Similarity Search) database, enabling efficient retrieval of relevant sections based on user queries.
Conversational AI Integration
Powered by LangChain, the tool allows users to ask questions about multiple documents and receive contextually relevant responses.
User-Friendly Chat Interface
The application provides an intuitive chat-based interface where users can converse with multiple documents seamlessly.
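To make the embedding and retrieval steps above more concrete, here is a minimal, dependency-free sketch of similarity search. It is not how the app itself works internally: the real application uses a sentence transformer model for embeddings and FAISS for the index, whereas the `embed`, `cosine`, and `top_k` helpers below are hypothetical stand-ins that use plain word counts and a linear scan. The idea is the same, though: turn text into vectors, then return the chunks whose vectors are closest to the query vector.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "the contract ends in december",
    "revenue grew in the third quarter",
    "the contract can be renewed yearly",
]
results = top_k("when does the contract end", chunks, k=2)
print(results)
```

With these sample chunks, the two contract-related sentences rank above the revenue sentence because they share more words with the query. A learned embedding model would also match on meaning rather than only on exact word overlap.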
Before starting, ensure you have Python and pip available, then run the following command to install the necessary dependencies:
pip install streamlit langchain langchain-community langchain-groq faiss-cpu huggingface-hub sentence-transformers pypdf streamlit-chat
At the beginning of your script, import all necessary modules. The import paths below follow the LangChain 0.x package layout; later releases may relocate some of these modules.
import os
import tempfile

import streamlit as st
from streamlit_chat import message
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_groq import ChatGroq
Use Streamlit's secrets management to store API keys securely:
os.environ["GROQ_API_KEY"] = st.secrets["GROQ_API_KEY"]
os.environ["HUGGINGFACEHUB_API_TOKEN"] = st.secrets["HUGGINGFACEHUB_API_TOKEN"]
Create a function that initializes a conversational retrieval chain:
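def create_conversational_chain(vector_store):
    # Groq-hosted LLM; a low temperature keeps answers close to the source text
    llm = ChatGroq(model="llama3-8b-8192", temperature=0.2)

    # Buffer memory keeps the running conversation available to the chain
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

    chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        chain_type='stuff',
        # Retrieve the 2 most similar chunks for each question
        retriever=vector_store.as_retriever(search_kwargs={"k": 2}),
        memory=memory
    )
    return chain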
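The `chain_type='stuff'` setting means the retrieved chunks are all placed ("stuffed") into a single prompt alongside the question. The sketch below illustrates that idea in plain Python. The `build_stuff_prompt` helper and the prompt wording are hypothetical and are not LangChain's actual template; they only show the shape of what the LLM ends up receiving.

```python
def build_stuff_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Concatenate all retrieved chunks into one context block (illustrative only)."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# With k=2, the retriever hands two chunks to the prompt builder
prompt = build_stuff_prompt(
    "When does the contract end?",
    ["The contract ends in December.", "The contract can be renewed yearly."],
)
print(prompt)
```

Because every retrieved chunk goes into one prompt, the values of `k` and the chunk size together determine how much of the model's context window each question consumes.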
Define a function to process user queries and return responses:
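def conversation_chat(query, chain, history):
    # invoke() is the current call style; calling chain(...) directly is deprecated
    result = chain.invoke({"question": query, "chat_history": history})
    # Record the exchange so the UI can replay it
    history.append((query, result["answer"]))
    return result["answer"]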
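One way to check this function without calling a real LLM is to pass in a stub chain. In the sketch below, `FakeChain` is a hypothetical stand-in that simply echoes the question and reports how many earlier turns it received. `conversation_chat` is restated (using `invoke`) so the snippet runs on its own.

```python
class FakeChain:
    """Hypothetical stand-in for the retrieval chain; echoes the question."""

    def invoke(self, inputs: dict) -> dict:
        turns = len(inputs["chat_history"])
        return {"answer": f"echo: {inputs['question']} (previous turns: {turns})"}

    __call__ = invoke  # also support the older chain(...) call style


def conversation_chat(query, chain, history):
    result = chain.invoke({"question": query, "chat_history": history})
    history.append((query, result["answer"]))
    return result["answer"]


history = []
first = conversation_chat("What is in the report?", FakeChain(), history)
second = conversation_chat("Who wrote it?", FakeChain(), history)
print(first)
print(second)
```

The second call sees one previous turn, which shows that each `(query, answer)` pair is appended to `history` and passed along with the next question.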
Set up session state to manage chat history:
def initialize_session_state():
    # (query, answer) pairs passed to the chain
    if 'history' not in st.session_state:
        st.session_state['history'] = []

    # Bot messages shown in the chat window
    if 'generated' not in st.session_state:
        st.session_state['generated'] = ["Hello! Ask me anything about 🤗"]

    # User messages shown in the chat window
    if 'past' not in st.session_state:
        st.session_state['past'] = ["Hey! 👋"]
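Streamlit reruns the whole script on every interaction, so the `not in` checks are what keep the chat from being reset each time. The sketch below mimics that behavior with an ordinary dictionary in place of `st.session_state`. The `state` argument is an assumption made for illustration, so the example can run without Streamlit.

```python
def initialize_session_state(state: dict) -> None:
    """Create defaults only for keys that do not exist yet."""
    state.setdefault("history", [])
    state.setdefault("generated", ["Hello! Ask me anything about 🤗"])
    state.setdefault("past", ["Hey! 👋"])


state = {}
initialize_session_state(state)  # first run: defaults are created

state["past"].append("What is FAISS?")
state["generated"].append("A similarity search library.")

initialize_session_state(state)  # simulated rerun: existing values survive
print(len(state["past"]), len(state["generated"]))
```

After the simulated rerun, both lists still hold two messages, which is the behavior the app relies on to keep the conversation visible.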
Render the conversation history and user input:
def display_chat_history(chain):
    reply_container = st.container()
    container = st.container()

    with container:
        # The form clears the input box after each submission
        with st.form(key='my_form', clear_on_submit=True):
            user_input = st.text_input("Question:", placeholder="Ask about your PDF", key='input')
            submit_button = st.form_submit_button(label='Send ➤')

        if submit_button and user_input:
            with st.spinner('Generating response...'):
                output = conversation_chat(user_input, chain, st.session_state['history'])

            st.session_state['past'].append(user_input)
            st.session_state['generated'].append(output)

    # Replay the full conversation above the input form
    if st.session_state['generated']:
        with reply_container:
            for i in range(len(st.session_state['generated'])):
                message(st.session_state["past"][i], is_user=True, key=str(i) + '_user', avatar_style="thumbs")
                message(st.session_state["generated"][i], key=str(i), avatar_style="bottts", seed="Felix")
Define the main function to run the Streamlit app:
def main():
    # Initialize session state
    initialize_session_state()

    # Initialize Streamlit UI
    st.title("PDF ChatBot :books:")
    st.sidebar.title("Document Processing")
    # Restrict uploads to PDFs so the document list is never built from other file types
    uploaded_files = st.sidebar.file_uploader("Upload files", type="pdf", accept_multiple_files=True)

    if uploaded_files:
        text = []
        for file in uploaded_files:
            file_extension = os.path.splitext(file.name)[1].lower()
            # PyPDFLoader reads from a path, so write the upload to a temporary file first
            with tempfile.NamedTemporaryFile(delete=False) as temp_file:
                temp_file.write(file.read())
                temp_file_path = temp_file.name

            if file_extension == ".pdf":
                loader = PyPDFLoader(temp_file_path)
                text.extend(loader.load())
            # Always clean up the temporary file
            os.remove(temp_file_path)

        # Split text into chunks
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
        text_chunks = text_splitter.split_documents(text)

        # Create embeddings
        embeddings = HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-MiniLM-L6-v2",
            model_kwargs={'device': 'cpu'}
        )

        # Create vector store
        vector_store = FAISS.from_documents(text_chunks, embedding=embeddings)

        # Create the conversation chain
        chain = create_conversational_chain(vector_store)

        # Display chat interface
        display_chat_history(chain)


if __name__ == "__main__":
    main()
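The `chunk_size=500` and `chunk_overlap=50` settings mean each chunk holds up to 500 characters and shares roughly 50 characters with its neighbor, so a sentence cut at a boundary still appears intact in one of the two chunks. The sketch below shows the overlap idea with a fixed-size sliding window. It is a simplification: `chunk_text` is a hypothetical helper, and `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences rather than at arbitrary character offsets.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# 1200 characters of sample text made of repeating digits
sample = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

For the 1200-character sample, the window advances 450 characters at a time and produces three chunks, with the last 50 characters of each chunk repeated at the start of the next.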
Save the script as app.py, then run the following command to start the chatbot:
streamlit run app.py
With the Multi PDF Chatbot, document interaction is no longer limited to passive reading. This AI-powered tool brings intelligence to PDFs, making information retrieval efficient, interactive, and engaging. Whether for research, legal reviews, or corporate document analysis, this project showcases the power of AI in enhancing how we interact with textual data.
Want to try it out? Check out the project on GitHub: Multi PDF Chatbot.