This is a Streamlit application that lets users upload a PDF document and ask questions about its content. The app uses LangChain, an OpenAI chat model (gpt-4o-mini in the code below), and FAISS (Facebook AI Similarity Search) for document retrieval and question answering.
The following libraries are required for this project:
faiss-cpu >= 1.10.0
langchain-community >= 0.3.18
langchain-huggingface >= 0.1.2
langchain-openai >= 0.3.7
langchain >= 0.3.19
openai >= 1.64.0
pypdf >= 5.3.0
sentence-transformers >= 3.4.1
streamlit >= 1.42.2
tiktoken >= 0.9.0
scripts.secret (for storing your OpenAI API key)
scripts.document_loader (for loading and splitting PDF documents)
You can install the necessary dependencies using pip:
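    pip install faiss-cpu langchain-community langchain-huggingface langchain-openai langchain openai pypdf sentence-transformers streamlit tiktoken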
The application runs as follows:
Document Upload
The user can upload a PDF document via the Streamlit interface. The file is saved temporarily on the server, and the document is then processed into text chunks for easier retrieval.
Text Splitting and Vector Database Creation
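Once the document is uploaded, it is split into text chunks, embeddings are generated for those chunks with OpenAI's text-embedding-ada-002 model, and the embeddings are stored in a FAISS vector store.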
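The splitting step can be pictured with a small, library-free sketch. The real splitting happens inside scripts.document_loader, whose code is not shown here, so the function name split_text and the chunk_size and overlap values are illustrative assumptions rather than the project's settings.

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly.

    Hypothetical helper for illustration; not the project's load_document.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


for chunk in split_text("abcdefghij", chunk_size=4, overlap=1):
    print(chunk)
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighboring chunk, which tends to help retrieval.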
Question-Answering System
The system uses Retrieval-Augmented Generation (RAG) to answer user queries.
When a user asks a question, the system retrieves the most relevant chunks from the document using FAISS and passes them to the OpenAI chat model (gpt-4o-mini in the code) to generate an answer grounded in those chunks.
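The retrieval step can be illustrated with a brute-force stand-in for the vector search. This is only a sketch: FAISS uses optimized index structures, the two-dimensional vectors below are made-up toy values, and squared Euclidean distance is an assumed metric (the one the app actually uses depends on how the FAISS index is configured). The names squared_l2 and top_k are hypothetical.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec: list[float], chunk_vecs: list[list[float]],
          chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors lie closest to the query vector."""
    ranked = sorted(range(len(chunks)),
                    key=lambda i: squared_l2(query_vec, chunk_vecs[i]))
    return [chunks[i] for i in ranked[:k]]


# Toy 2-D "embeddings"; real embedding vectors have many more dimensions.
chunks = ["refund policy", "shipping times", "office address"]
chunk_vecs = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
query_vec = [0.8, 0.2]  # pretend embedding of a question about refunds

print(top_k(query_vec, chunk_vecs, chunks, k=2))
```

Only the chunks returned by this step are sent to the language model, which keeps the prompt short even for long documents.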
User Interface
The application presents a title, a PDF file uploader, a text input for the user's question, and the generated answer.
The required libraries are imported to handle document processing, vector database creation, and AI-powered responses:
    from langchain_community.vectorstores import FAISS
    from langchain_openai import OpenAIEmbeddings
    from langchain.chains import create_retrieval_chain
    from langchain_openai import ChatOpenAI
    from langchain.chains.combine_documents import create_stuff_documents_chain
    from langchain_core.prompts import ChatPromptTemplate
    from scripts.secret import OPENAI_KEY
    from scripts.document_loader import load_document
    import streamlit as st
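The import of OPENAI_KEY assumes a scripts/secret.py module that is not shown in this document. One possible way to write it, sketched here as an assumption rather than the project's actual code, is to read the key from the OPENAI_API_KEY environment variable so that it never has to be committed to the repository. The helper name read_openai_key is hypothetical.

```python
import os


def read_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before starting the app."
        )
    return key


# In scripts/secret.py this could then be exposed as:
# OPENAI_KEY = read_openai_key()
```

Failing early with a readable error is usually easier to debug than an authentication failure on the first API call.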
The Streamlit app is initialized, with the title displayed and a file uploader for PDFs:
    st.title("AI-Powered Document Q&A")
    uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")
When a file is uploaded, the following steps are performed:
    temp_file = "./temp.pdf"
    with open(temp_file, "wb") as file:
        file.write(uploaded_file.getvalue())
    chunks = load_document(temp_file)
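A fixed path such as ./temp.pdf is shared by every session, so two users uploading at the same time could overwrite each other's file. The sketch below shows one possible alternative using the standard library's tempfile module. The helper name save_upload_to_temp is hypothetical, and the bytes passed in stand for whatever uploaded_file.getvalue() returns.

```python
import os
import tempfile


def save_upload_to_temp(data: bytes, suffix: str = ".pdf") -> str:
    """Write uploaded bytes to a unique temporary file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so a loader can reopen it by path; the caller removes it afterwards.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        return tmp.name


path = save_upload_to_temp(b"%PDF-1.4 example bytes")
try:
    with open(path, "rb") as fh:
        print(fh.read()[:8])
finally:
    os.remove(path)  # clean up once the document has been processed
```

In the app, the returned path would be passed to load_document in place of the fixed temp_file value.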
Embeddings are generated using OpenAI's model (text-embedding-ada-002), and a FAISS vector store is created to hold these embeddings:
    embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_KEY, model="text-embedding-ada-002")
    vector_db = FAISS.from_documents(chunks, embeddings)
The question-answering chain combines the FAISS retriever with a document-combining chain built by create_stuff_documents_chain. The model used for answering questions is gpt-4o-mini:
    retriever = vector_db.as_retriever()
    llm = ChatOpenAI(model_name="gpt-4o-mini", openai_api_key=OPENAI_KEY)
    # The prompt must contain a {context} placeholder: create_stuff_documents_chain
    # inserts the retrieved chunks there and rejects prompts without it.
    system_prompt = (
        "You are a helpful assistant. Use the given context to answer the question.\n\n"
        "Context: {context}"
    )
    prompt = ChatPromptTemplate.from_messages([("system", system_prompt), ("human", "{input}")])
    question_answer_chain = create_stuff_documents_chain(llm, prompt)
    chain = create_retrieval_chain(retriever, question_answer_chain)
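A "stuff" chain concatenates the retrieved chunks and substitutes the result into a context placeholder in the prompt before the model is called. The library-free sketch below imitates that idea only. The helpers stuff_documents and build_messages are hypothetical, and the placeholder name {context} and the blank-line separator are assumptions made for the illustration.

```python
def stuff_documents(chunks: list[str], separator: str = "\n\n") -> str:
    """Join the retrieved chunks into one context string."""
    return separator.join(chunks)


def build_messages(system_template: str, chunks: list[str],
                   question: str) -> list[tuple[str, str]]:
    """Fill the {context} placeholder and pair the result with the question."""
    system_text = system_template.format(context=stuff_documents(chunks))
    return [("system", system_text), ("human", question)]


template = "Use the given context to answer the question.\n\nContext:\n{context}"
messages = build_messages(template, ["Chunk A.", "Chunk B."], "What does chunk A say?")
for role, text in messages:
    print(f"[{role}] {text}")
```

Because every retrieved chunk is placed in a single prompt, this approach suits a small number of short chunks. Very large contexts would need a different combining strategy.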
The user is prompted to input a question, and the system responds with an answer based on the document's content:
    question = st.text_input("Ask a question about the document:")
    if question:
        response = chain.invoke({"input": question})['answer']
        st.write(response)
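The dictionary returned by a chain built with create_retrieval_chain also carries the retrieved chunks under the "context" key, next to "input" and "answer". The sketch below shows one way those chunks might be summarized as sources for the user. It runs on a stand-in object rather than a real chain result. The FakeDocument class, the summarize_sources helper, and the "page" metadata key are assumptions, since the loader in scripts.document_loader is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a retrieved chunk with text and metadata (illustrative only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(result: dict, max_chars: int = 60) -> list[str]:
    """Build one short line per retrieved chunk from a chain result dictionary."""
    lines = []
    for doc in result.get("context", []):
        page = doc.metadata.get("page", "?")  # "?" when no page number is stored
        snippet = doc.page_content[:max_chars].replace("\n", " ")
        lines.append(f"page {page}: {snippet}")
    return lines


result = {
    "input": "What is the refund window?",
    "context": [FakeDocument("Refunds are accepted within 30 days.", {"page": 3})],
    "answer": "30 days.",
}
print(summarize_sources(result))
```

In the app, each returned line could be shown with st.write below the answer so users can check where it came from.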
To run the app, execute the following command in your terminal:
streamlit run app.py
This will start the Streamlit server and open the app in your browser.