RAG (Retrieval-Augmented Generation) is a framework that enhances language models by retrieving relevant documents from a corpus before generating a response. It consists of two main components: a **retriever**, which finds the documents most relevant to a query, and a **generator**, which conditions the language model's answer on the retrieved context.
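The two stages can be sketched with a toy example. The corpus and the overlap-based scoring below are purely illustrative stand-ins for the embedding-based retriever and the LLM call used later in this project:

```python
# Toy sketch of the retrieve-then-generate loop (illustrative corpus and scoring,
# not the project's actual embedding-based retriever).
corpus = [
    "FAISS builds an index over document embeddings for fast similarity search.",
    "YOLO is a real-time object detection model used in computer vision.",
    "RetrievalQA stuffs retrieved chunks into the prompt of a language model.",
]

def retrieve(query, docs, k=2):
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def generate(query, context):
    """Stand-in for the LLM call: build the augmented prompt it would receive."""
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

ctx = "\n".join(retrieve("What is YOLO used for?", corpus))
prompt = generate("What is YOLO used for?", ctx)
print(prompt)
```

In the real system, `retrieve` is replaced by a vector-similarity search over embeddings and `generate` by an actual model call, but the control flow is the same.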
This project implements a RAG-based assistant that retrieves relevant chunks from a document corpus with FAISS and generates answers with a Hugging Face model (via the transformers library). You can explore the full code and try the assistant in the Google Colab notebook.
FAISS (Facebook AI Similarity Search) is used to create an efficient index of the document corpus for fast retrieval. Here's a snippet of how the index is set up:
```python
# Import paths assume the langchain_community package; older notebooks may
# import the same classes from langchain.embeddings / langchain.vectorstores.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Embed the document chunks and index them with FAISS
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
db = FAISS.from_documents(docs, embeddings)

# Retrieve the 3 most similar chunks for each query
retriever = db.as_retriever(search_kwargs={"k": 3})
```
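The `docs` variable above is a list of pre-split document chunks (in LangChain this is usually produced by a text splitter such as `RecursiveCharacterTextSplitter`). A minimal version of that splitting step, with illustrative window and overlap sizes, looks like:

```python
def split_text(text, chunk_size=300, overlap=50):
    """Split text into overlapping fixed-size chunks (character-based).

    chunk_size and overlap here are illustrative, not the notebook's settings.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars shared
    return chunks

chunks = split_text("word " * 200)  # ~1000 characters of dummy text
print(len(chunks), len(chunks[0]))  # → 4 300
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.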
```python
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

# Example template (the notebook's actual wording may differ): it constrains
# the model to answer only from the retrieved context.
template = """Use the following context to answer the question.
If the answer is not in the context, say you don't know.

Context: {context}
Question: {question}
Answer:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])

# chain_type="stuff": all retrieved chunks are stuffed into a single prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
)
```
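Conceptually, the "stuff" chain type just concatenates the retrieved chunks into the `{context}` slot of the template before a single LLM call. A rough pure-Python sketch of that step (template text is illustrative):

```python
TEMPLATE = "Use the context to answer.\n\nContext: {context}\nQuestion: {question}\nAnswer:"

def stuff_prompt(docs, question, template=TEMPLATE):
    """Mimic the 'stuff' chain: join all chunks and fill the prompt template."""
    context = "\n\n".join(docs)
    return template.format(context=context, question=question)

p = stuff_prompt(["chunk one", "chunk two"], "What is RAG?")
print(p)
```

Because every chunk goes into one prompt, "stuff" is the simplest chain type but is limited by the model's context window; LangChain's other chain types (e.g. map-reduce) exist for corpora that don't fit.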
```python
import gradio as gr

def respond(message, chat_history):
    bot_response = safe_answer(message)
    chat_history.append((message, bot_response))
    return chat_history, chat_history
```
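In the notebook this callback is wired into a Gradio UI (e.g. via `gr.Blocks` with a `gr.Chatbot`). The history-handling logic itself can be exercised without launching the UI by stubbing out `safe_answer`; the factory below is an illustrative helper, not part of the project code:

```python
def make_respond(answer_fn):
    """Build a respond() callback around any answer function (stub-friendly)."""
    def respond(message, chat_history):
        bot_response = answer_fn(message)
        chat_history.append((message, bot_response))
        # Gradio chat callbacks return the updated history for both the
        # Chatbot component and the State component
        return chat_history, chat_history
    return respond

# Exercise the callback with a stubbed answerer instead of the real chain
respond = make_respond(lambda q: f"echo: {q}")
history, _ = respond("hello", [])
print(history)  # → [('hello', 'echo: hello')]
```

Keeping the answer function injectable like this makes the UI layer trivially testable independent of the retrieval chain.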
```python
def safe_answer(question):
    retrieved_docs = retriever.invoke(question)
    print("Top Retrieved Chunks:")
    for i, doc in enumerate(retrieved_docs):
        print(f"\nChunk {i+1}:\n{doc.page_content[:300]}...")
    # Guard against empty or near-empty retrievals before calling the LLM
    if not retrieved_docs or all(len(doc.page_content.strip()) < 10 for doc in retrieved_docs):
        return "I don't have information about this in the provided document."
    # Chain.run is deprecated since langchain 0.1.0; invoke returns a dict
    return qa_chain.invoke({"query": question})["result"].strip()

questions = [
    "How do I add memory to a RAG application?",
    "What is InceptionV3 used for?",
    "How can I use MongoDB to store chat history?",
    "What is YOLO used for in computer vision?",
]

for q in questions:
    print("\n" + "-" * 50)
    print("Question:", q)
    response = safe_answer(q)
    print("Answer:", response)
```
The assistant retrieves the top-k chunks (k = 3) and prints the first 300 characters of each for inspection. If retrieval comes back empty, or every chunk is trivially short, it falls back to a fixed "I don't have information" response; otherwise it stuffs the chunks into the prompt and lets the QA chain generate the answer. Have a look at some of the question-and-answer pairs:
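The fallback condition inside `safe_answer` can be isolated and tested on its own; the helper name below is made up for illustration:

```python
FALLBACK = "I don't have information about this in the provided document."

def needs_fallback(retrieved, min_len=10):
    """True when retrieval is empty or every chunk is shorter than min_len chars."""
    return not retrieved or all(len(c.strip()) < min_len for c in retrieved)

print(needs_fallback([]))                            # → True  (empty retrieval)
print(needs_fallback(["   ", "hi"]))                 # → True  (only trivial chunks)
print(needs_fallback(["a chunk with real content"])) # → False
```

This guard is what keeps the assistant from hallucinating an answer when the corpus simply doesn't cover the question, as the InceptionV3 example below shows.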
```
--------------------------------------------------
Question: How do I add memory to a RAG application?
Top Retrieved Chunks:

Chunk 1:
"publication_description": "\n\nSometime in the last 5 months, I built a RAG application, and after building this RAG application, I realised there was a need to add memory to it before moving it to production. I went on YouTube and searched for videos, but I c...

Chunk 2:
[ { "id": "0CBAR8U8FakE", "username": "3rdson", "license": "none", "title": "How to Add Memory to RAG Applications and AI Agents",...

Chunk 3:
\nlangchain-openai\npymongo\n```\n---\n## Now you are good to go\n\n---\n# What Is Memory and Why Do RAG Applications and AI Agents Need Them?\n\nLet's use ChatGPT as an example. When you ask ChatGPT a question like "Who is the current president of America...

Answer: To add memory to a RAG application, you need to give the RAG application a "brain" by including the following:
1. A database (for storing users' questions, the AI's answers, chat IDs, the user's email, etc.)
2. A function that retrieves a user's previous questions whenever a new question is asked
3. A function that uses the LLM to check if the current question is related to the previous one. If it is, it uses the previous answer to generate a new answer.

--------------------------------------------------
Question: What is InceptionV3 used for?
Top Retrieved Chunks:

Chunk 1:
properly. If an image is too large or too small, resizing it to the required dimensions is necessary for consistent model performance.\n\n3. **Memory and Computational Efficiency:** The shape of the image affects the amount of memory required to store the data. Larger images (higher resolution) req...

Chunk 2:
YOLO has gone through several iterations, with YOLO11 being the latest version as of today. It is widely used in applications like surveillance, autonomous vehicles, and robotics. \n 🔗 [GitHub](https://github.com/ultralytics/ultralytics) | [Docs](https://docs.ultralytics.com/)\n\n3. **...

Chunk 3:
handle a range of challenges, from basic image compression to more complex tasks like anomaly detection and data imputation.\n\nCheck the **Models** section for the github code repository for this publication. <!-- RT_DIVIDER --> :::info{title="Note"}\nAlthough the original MNIST images are in black and whi...

Answer: I don't have information about this in the provided document.
```