RAG (Retrieval-Augmented Generation) is a framework that enhances language models by retrieving relevant documents from a corpus before generating a response. It consists of two main components: a **retriever**, which finds the documents most relevant to a query, and a **generator**, which conditions the language model's answer on the retrieved context.
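The two stages can be sketched with a toy example. The corpus and the overlap-based scoring below are purely illustrative stand-ins for the embedding-based retriever and the LLM call used later in this project:

```python
# Toy sketch of the retrieve-then-generate loop (illustrative corpus and scoring,
# not the project's actual embedding-based retriever).
corpus = [
    "FAISS builds an index over document embeddings for fast similarity search.",
    "YOLO is a real-time object detection model used in computer vision.",
    "RetrievalQA stuffs retrieved chunks into the prompt of a language model.",
]

def retrieve(query, docs, k=2):
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def generate(query, context):
    """Stand-in for the LLM call: build the augmented prompt it would receive."""
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

ctx = "\n".join(retrieve("What is YOLO used for?", corpus))
prompt = generate("What is YOLO used for?", ctx)
print(prompt)
```

In the real system, `retrieve` is replaced by a vector-similarity search over embeddings and `generate` by an actual model call, but the control flow is the same.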
This project implements a RAG-based assistant that retrieves relevant chunks from a document corpus with FAISS and generates answers with a Hugging Face model (via the transformers library). You can explore the full code and try the assistant in the Google Colab notebook.
FAISS (Facebook AI Similarity Search) is used to create an efficient index of the document corpus for fast retrieval. Here's a snippet of how the index is set up:
```python
# Import paths assume the langchain_community package; older notebooks may
# import the same classes from langchain.embeddings / langchain.vectorstores.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Embed the document chunks and index them with FAISS
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
db = FAISS.from_documents(docs, embeddings)

# Retrieve the 3 most similar chunks for each query
retriever = db.as_retriever(search_kwargs={"k": 3})
```
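The `docs` variable above is a list of pre-split document chunks (in LangChain this is usually produced by a text splitter such as `RecursiveCharacterTextSplitter`). A minimal version of that splitting step, with illustrative window and overlap sizes, looks like:

```python
def split_text(text, chunk_size=300, overlap=50):
    """Split text into overlapping fixed-size chunks (character-based).

    chunk_size and overlap here are illustrative, not the notebook's settings.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars shared
    return chunks

chunks = split_text("word " * 200)  # ~1000 characters of dummy text
print(len(chunks), len(chunks[0]))  # → 4 300
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.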
```python
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

# Example template (the notebook's actual wording may differ): it constrains
# the model to answer only from the retrieved context.
template = """Use the following context to answer the question.
If the answer is not in the context, say you don't know.

Context: {context}
Question: {question}
Answer:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])

# chain_type="stuff": all retrieved chunks are stuffed into a single prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
)
```
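Conceptually, the "stuff" chain type just concatenates the retrieved chunks into the `{context}` slot of the template before a single LLM call. A rough pure-Python sketch of that step (template text is illustrative):

```python
TEMPLATE = "Use the context to answer.\n\nContext: {context}\nQuestion: {question}\nAnswer:"

def stuff_prompt(docs, question, template=TEMPLATE):
    """Mimic the 'stuff' chain: join all chunks and fill the prompt template."""
    context = "\n\n".join(docs)
    return template.format(context=context, question=question)

p = stuff_prompt(["chunk one", "chunk two"], "What is RAG?")
print(p)
```

Because every chunk goes into one prompt, "stuff" is the simplest chain type but is limited by the model's context window; LangChain's other chain types (e.g. map-reduce) exist for corpora that don't fit.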
```python
import gradio as gr

def respond(message, chat_history):
    bot_response = safe_answer(message)
    chat_history.append((message, bot_response))
    return chat_history, chat_history
```
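In the notebook this callback is wired into a Gradio UI (e.g. via `gr.Blocks` with a `gr.Chatbot`). The history-handling logic itself can be exercised without launching the UI by stubbing out `safe_answer`; the factory below is an illustrative helper, not part of the project code:

```python
def make_respond(answer_fn):
    """Build a respond() callback around any answer function (stub-friendly)."""
    def respond(message, chat_history):
        bot_response = answer_fn(message)
        chat_history.append((message, bot_response))
        # Gradio chat callbacks return the updated history for both the
        # Chatbot component and the State component
        return chat_history, chat_history
    return respond

# Exercise the callback with a stubbed answerer instead of the real chain
respond = make_respond(lambda q: f"echo: {q}")
history, _ = respond("hello", [])
print(history)  # → [('hello', 'echo: hello')]
```

Keeping the answer function injectable like this makes the UI layer trivially testable independent of the retrieval chain.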
```python
def safe_answer(question):
    retrieved_docs = retriever.invoke(question)
    print("Top Retrieved Chunks:")
    for i, doc in enumerate(retrieved_docs):
        print(f"\nChunk {i+1}:\n{doc.page_content[:300]}...")
    # Guard against empty or near-empty retrievals before calling the LLM
    if not retrieved_docs or all(len(doc.page_content.strip()) < 10 for doc in retrieved_docs):
        return "I don't have information about this in the provided document."
    # Chain.run is deprecated since langchain 0.1.0; invoke returns a dict
    return qa_chain.invoke({"query": question})["result"].strip()

questions = [
    "How do I add memory to a RAG application?",
    "What is InceptionV3 used for?",
    "How can I use MongoDB to store chat history?",
    "What is YOLO used for in computer vision?",
]

for q in questions:
    print("\n" + "-" * 50)
    print("Question:", q)
    response = safe_answer(q)
    print("Answer:", response)
```
The assistant retrieves the top-k chunks (k = 3) and prints the first 300 characters of each for inspection. If retrieval comes back empty, or every chunk is trivially short, it falls back to a fixed "I don't have information" response; otherwise it stuffs the chunks into the prompt and lets the QA chain generate the answer. Have a look at some of the question-and-answer pairs:
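The fallback condition inside `safe_answer` can be isolated and tested on its own; the helper name below is made up for illustration:

```python
FALLBACK = "I don't have information about this in the provided document."

def needs_fallback(retrieved, min_len=10):
    """True when retrieval is empty or every chunk is shorter than min_len chars."""
    return not retrieved or all(len(c.strip()) < min_len for c in retrieved)

print(needs_fallback([]))                            # → True  (empty retrieval)
print(needs_fallback(["   ", "hi"]))                 # → True  (only trivial chunks)
print(needs_fallback(["a chunk with real content"])) # → False
```

This guard is what keeps the assistant from hallucinating an answer when the corpus simply doesn't cover the question, as the InceptionV3 example below shows.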
```
--------------------------------------------------
Question: How do I add memory to a RAG application?
Top Retrieved Chunks:

Chunk 1:
"publication_description": "\n\nSometime in the last 5 months, I built a RAG application, and after building this RAG application, I realised there was a need to add memory to it before moving it to production. I went on YouTube and searched for videos, but I c...

Chunk 2:
[ { "id": "0CBAR8U8FakE", "username": "3rdson", "license": "none", "title": "How to Add Memory to RAG Applications and AI Agents",...

Chunk 3:
\nlangchain-openai\npymongo\n```\n---\n## Now you are good to go\n\n---\n# What Is Memory and Why Do RAG Applications and AI Agents Need Them?\n\nLet's use ChatGPT as an example. When you ask ChatGPT a question like "Who is the current president of America...

Answer: To add memory to a RAG application, you need to give the RAG application a "brain" by including the following:
1. A database (for storing users' questions, the AI's answers, chat IDs, the user's email, etc.)
2. A function that retrieves a user's previous questions whenever a new question is asked
3. A function that uses the LLM to check if the current question is related to the previous one. If it is, it uses the previous answer to generate a new answer.

--------------------------------------------------
Question: What is InceptionV3 used for?
Top Retrieved Chunks:

Chunk 1:
properly. If an image is too large or too small, resizing it to the required dimensions is necessary for consistent model performance.\n\n3. **Memory and Computational Efficiency:** The shape of the image affects the amount of memory required to store the data. Larger images (higher resolution) req...

Chunk 2:
YOLO has gone through several iterations, with YOLO11 being the latest version as of today. It is widely used in applications like surveillance, autonomous vehicles, and robotics. \n 🔗 [GitHub](https://github.com/ultralytics/ultralytics) | [Docs](https://docs.ultralytics.com/)\n\n3. **...

Chunk 3:
handle a range of challenges, from basic image compression to more complex tasks like anomaly detection and data imputation.\n\nCheck the **Models** section for the github code repository for this publication. <!-- RT_DIVIDER --> :::info{title="Note"}\nAlthough the original MNIST images are in black and whi...

Answer: I don't have information about this in the provided document.
```