This project introduces a LangGraph-Based Retrieval-Augmented Generation (RAG) QA Assistant: an intelligent assistant capable of delivering accurate and context-aware answers from LangGraph documentation.
Unlike generic QA bots, this assistant emphasizes privacy-preserving local computation, combining local SentenceTransformer embeddings, FAISS similarity search, and FLAN-T5 generation in a single offline pipeline.
It demonstrates how an offline RAG pipeline can power efficient, explainable, and domain-specific AI assistants.


While LLMs like GPT or Gemini excel in general-purpose reasoning, they lack grounding in project-specific documentation, transparency about where their answers come from, and the privacy guarantees of fully local execution.
The goal of this project is to build a local, explainable RAG-based QA system. Its architecture is composed of the following layers:
| Layer | Functionality |
|---|---|
| Document Loader & Chunker | Splits input text into overlapping chunks (500 chars, 100 overlap) |
| SentenceTransformer Embeddings | Converts text into vector representations |
| FAISS Vector Index | Enables fast top-k retrieval |
| FLAN-T5 Model | Generates structured, context-aware answers |
| Memory Buffer | Maintains short-term conversational history for contextual continuity |
| Evaluation Layer | Computes cosine similarity scores between query and retrieved chunks |
| UI (Gradio) | Enables interactive QA experience with sidebar examples |
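The snippets below reference a handful of module-level constants. The following block is a representative sketch: CHUNK_SIZE and OVERLAP match the figures in the table above, while TOP_K and the model names are assumptions rather than values confirmed by the project.

```python
# Representative configuration; CHUNK_SIZE/OVERLAP come from the table above,
# the remaining values are illustrative assumptions.
CHUNK_SIZE = 500                          # characters per chunk
OVERLAP = 100                             # characters shared between adjacent chunks
TOP_K = 3                                 # chunks retrieved per query (assumed)
EMBED_MODEL_NAME = "all-MiniLM-L6-v2"     # assumed SentenceTransformer checkpoint
GEN_MODEL_NAME = "google/flan-t5-small"   # FLAN-T5-small, as used later
```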
To improve coherence, the text is split into overlapping chunks, ensuring that critical information spanning chunk boundaries isn't lost during retrieval.
```python
def chunk_text(text, size=CHUNK_SIZE, overlap=OVERLAP):
    """Split text into overlapping chunks for better context retention."""
    paragraphs = text.split("\n\n")
    chunks = []
    for para in paragraphs:
        para = para.strip()
        if not para:
            continue
        start = 0
        while start < len(para):
            end = min(len(para), start + size)
            chunks.append(para[start:end])
            start += size - overlap
    return chunks
```
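For intuition, here is a quick check of the overlap behaviour with the default 500/100 settings, using a toy input rather than real documentation:

```python
# A 1,200-character paragraph yields chunks covering [0:500], [400:900], [800:1200].
sample = "x" * 1200
print([len(c) for c in chunk_text(sample)])  # [500, 500, 400]
```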
The system encodes document chunks using a SentenceTransformer model and normalizes the vectors to improve retrieval accuracy. If an existing FAISS index is found, it's reused for faster startup.
```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.preprocessing import normalize

embedder = SentenceTransformer(EMBED_MODEL_NAME)
embeddings = np.array(embedder.encode(docs_list, show_progress_bar=True), dtype="float32")
embeddings = normalize(embeddings, axis=1)  # unit-length vectors so inner product = cosine
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)
```
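The index-reuse behaviour mentioned above isn't shown in the snippet. A minimal sketch of how the index could be persisted and reloaded with FAISS (INDEX_PATH is a hypothetical filename, not the project's actual path) might look like this:

```python
import os
import faiss

INDEX_PATH = "faiss_index.bin"  # hypothetical location for the cached index

if os.path.exists(INDEX_PATH):
    index = faiss.read_index(INDEX_PATH)            # reuse for faster startup
else:
    index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product on normalized vectors
    index.add(embeddings)
    faiss.write_index(index, INDEX_PATH)            # persist for the next run
```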
The assistant uses an in-memory store (chat_memory) to maintain the last three conversation turns, improving contextual continuity.
It also calculates retrieval similarity scores to measure how relevant the fetched context was.
```python
from sklearn.metrics.pairwise import cosine_similarity

chat_memory = []

def evaluate_retrieval_quality(question, context):
    """Calculate cosine similarity between query and retrieved context."""
    q_vec = embedder.encode([question])
    c_vec = embedder.encode([context])
    score = cosine_similarity(q_vec, c_vec)[0][0]
    return round(float(score), 4)
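```

Because both vectors come from the same embedder, the score directly reflects the semantic overlap between the question and the retrieved context. An illustrative call (the strings are made up for this example):

```python
score = evaluate_retrieval_quality(
    "How do I add memory to a LangGraph agent?",
    "LangGraph supports checkpointers that persist graph state between runs...",
)
print(score)  # a value in [-1, 1]; closer to 1 means the retrieved chunk is more on-topic
```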
The assistant constructs a detailed prompt with:
- A strong system instruction (to prevent hallucination)
- The previous chat memory
- The retrieved document chunks as context
It then uses FLAN-T5-small to generate the final answer.
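The tokenizer and model objects used in the generation function below aren't defined in the shown snippets; a typical loading step for FLAN-T5-small with Hugging Face transformers (an assumption about the project's setup) would be:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed loading step for the FLAN-T5-small generator referenced below.
tokenizer = AutoTokenizer.from_pretrained(GEN_MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(GEN_MODEL_NAME)
```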
```python
def generate_answer(question, top_k=TOP_K, max_new_tokens=300):
    # Embed and normalize the query, then retrieve the top-k chunks from FAISS
    q_vec = np.array([embedder.encode(question)], dtype="float32")
    q_vec = normalize(q_vec, axis=1)
    D, I = index.search(q_vec, k=min(top_k, len(docs)))
    context = "\n\n".join([docs[int(idx)] for idx in I[0] if idx < len(docs)])
    retrieval_score = evaluate_retrieval_quality(question, context)

    # Fold the last three turns of conversation memory into the prompt
    conversation_context = " ".join([f"User: {q}\nAI: {a}" for q, a in chat_memory[-3:]])
    prompt = (
        STRONG_SYSTEM_INSTRUCTION + "\n\n"
        f"PREVIOUS CONVERSATION:\n{conversation_context}\n\n"
        f"CONTEXT:\n{context}\n\n"
        f"QUESTION: {question}\n\nAnswer now:"
    )

    # Generate with FLAN-T5 and record the turn for future context
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    chat_memory.append((question, answer))
    return f"{answer}\n\n🧠 Retrieval Similarity Score: {retrieval_score}"
```
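The article doesn't reproduce STRONG_SYSTEM_INSTRUCTION; a representative anti-hallucination instruction (illustrative wording, not the author's exact text) could read:

```python
# Illustrative only; the project's actual STRONG_SYSTEM_INSTRUCTION is not shown.
STRONG_SYSTEM_INSTRUCTION = (
    "You are a LangGraph documentation assistant. Answer ONLY using the "
    "provided CONTEXT. If the context does not contain the answer, reply "
    "'I don't know based on the provided documentation.' Never invent APIs "
    "or features."
)
```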
- Overlapping Context Chunking for better retrieval precision
- FAISS Vector Store for high-speed similarity search
- Semantic Embedding Normalization to reduce drift in vector space
- Conversation Memory (chat_memory) to retain multi-turn context
- Retrieval Quality Scoring to measure answer reliability
- Robust Prompting with hallucination-prevention instructions
- Interactive Gradio UI with sidebar examples for quick evaluation (see the sketch below)
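As noted in the last item above, wiring generate_answer into Gradio takes only a few lines. A minimal sketch follows; the real app's sidebar examples and layout are assumptions on my part:

```python
import gradio as gr

# Minimal wiring; the actual project adds a sidebar with example questions.
demo = gr.Interface(
    fn=generate_answer,
    inputs=gr.Textbox(label="Ask a question about LangGraph"),
    outputs=gr.Textbox(label="Answer"),
    title="LangGraph RAG QA Assistant",
    examples=["What is LangGraph?", "How does a StateGraph work?"],
)

demo.launch()
```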
- Internal Documentation Assistants: for teams using LangGraph or internal AI frameworks
- Developer Onboarding Tools: helping new members learn project architecture quickly
- Enterprise Knowledge Bases: local, secure, RAG-powered assistants for restricted data
- Education & Research: a template for understanding RAG pipelines and retrievers
RAG-Powered LangGraph QA Assistant UI:

The complete working code for this RAG-powered LangGraph QA Assistant can be accessed on GitHub:
Clone or download the repository to run the project locally.
This assistant showcases how LangGraph and RAG principles can be merged to create a powerful, explainable, and privacy-preserving knowledge retrieval system.
It introduces retrieval evaluation, context memory, and modular configuration, offering a strong foundation for context-aware, domain-specific assistants.
By integrating memory management and retrieval evaluation directly into the workflow, this system demonstrates practical strategies to manage conversation history and maintain context, key aspects of building truly interactive, reasoning-aware AI assistants.