This project is a Retrieval-Augmented Generation (RAG) powered AI assistant that answers questions about LangGraph. It combines vector embeddings, FAISS indexing, and a large language model (FLAN-T5) to provide context-aware, accurate answers from a single document source.
The assistant is implemented with Python and provides an interactive Gradio web interface for easy usage.
Semantic Search with Embeddings: Uses SentenceTransformer embeddings to convert document chunks and queries into dense vectors for similarity search.
FAISS Indexing: Leverages FAISS for fast retrieval of the top-k most relevant chunks, enabling real-time semantic search.
RAG Pipeline: The retrieved chunks form the context for a local LLM (FLAN-T5) which generates structured, detailed answers.
Dynamic Chunking: Splits the source document by paragraph and caps each chunk at 500 characters, so the LLM's input limit is respected and retrieval stays precise.
Interactive UI: Built with Gradio, allowing users to ask questions, see instant "Thinking…" feedback, and receive well-structured answers.
Sidebar Examples: Predefined questions guide users and allow reviewers to quickly test the system.
Robustness: Uses L2-normalized embeddings with inner-product (cosine) similarity to ensure accurate retrieval, as illustrated in the sketch below.
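As a quick illustration of the retrieval math: once document and query embeddings are L2-normalized, an inner-product search is equivalent to cosine similarity. The vectors below are dummy values for demonstration only.

```python
import numpy as np

# Dummy embedding vectors standing in for a chunk and a query.
chunk_vec = np.array([0.3, 0.8, 0.5], dtype="float32")
query_vec = np.array([0.2, 0.9, 0.4], dtype="float32")

# L2-normalize so each vector has unit length.
chunk_unit = chunk_vec / np.linalg.norm(chunk_vec)
query_unit = query_vec / np.linalg.norm(query_vec)

# After normalization, the inner product equals the cosine similarity.
cosine = np.dot(chunk_unit, query_unit)
print(f"cosine similarity: {cosine:.4f}")
```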
The system is ideal for exploring LangGraph concepts, including trees, embeddings, RAG principles, Python & AI libraries, and other foundational topics. Reviewers can quickly test the assistant by copying sidebar questions or typing custom queries.
Python 3.x
Sentence Transformers for embeddings (all-MiniLM-L6-v2)
FAISS for vector similarity search
HuggingFace Transformers (FLAN-T5-small) for local LLM generation
Gradio for interactive chat interface
```python
def chunk_text(text, size=500):
    """Split the document by paragraph, capping each chunk at `size` characters."""
    paragraphs = text.split("\n\n")
    chunks = []
    for para in paragraphs:
        para = para.strip()
        if not para:
            continue
        while len(para) > size:
            chunks.append(para[:size])
            para = para[size:]
        if para:
            chunks.append(para)
    return chunks
```
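A minimal usage sketch, assuming the source document lives in a local text file (the filename here is a placeholder, not the project's actual file):

```python
# Hypothetical filename; substitute the document actually used by the project.
with open("langgraph_doc.txt", "r", encoding="utf-8") as f:
    text = f.read()

docs = chunk_text(text)  # list of chunks, each at most 500 characters
print(f"{len(docs)} chunks created")
```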
```python
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Embed every chunk and L2-normalize, so inner-product search equals cosine similarity.
embeddings = np.array(embedder.encode(docs), dtype="float32")
faiss.normalize_L2(embeddings)

index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)
```
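The top-k retrieval step is not shown above; a sketch of how it could look against the same embedder and index follows (the function name and default k are illustrative):

```python
def retrieve(query, k=3):
    """Return the k chunks most similar to the query by cosine similarity."""
    # Embed and L2-normalize the query so inner product == cosine similarity.
    query_vec = np.array(embedder.encode([query]), dtype="float32")
    faiss.normalize_L2(query_vec)

    # FAISS returns similarity scores and the indices of the matching chunks.
    scores, indices = index.search(query_vec, k)
    return [docs[i] for i in indices[0]]

context = "\n\n".join(retrieve("What is LangGraph?"))
```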
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

def generate_answer(question, context):
    """Generate an answer with FLAN-T5, using the retrieved chunks as context."""
    prompt = f"CONTEXT:\n{context}\n\nQUESTION: {question}\nAnswer now:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024)
    outputs = model.generate(**inputs, max_new_tokens=300)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```
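A sketch of how the pieces could be wired into the Gradio interface described above (the layout and example questions are illustrative, not the project's exact UI):

```python
import gradio as gr

def answer_question(question):
    # Retrieve the most relevant chunks and pass them to FLAN-T5 as context.
    context = "\n\n".join(retrieve(question))
    return generate_answer(question, context)

demo = gr.Interface(
    fn=answer_question,
    inputs=gr.Textbox(label="Ask about LangGraph"),
    outputs=gr.Textbox(label="Answer"),
    title="LangGraph QA Assistant",
    examples=["What is LangGraph?", "How does RAG work?"],  # placeholder questions
)

demo.launch()
```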
The complete working code for this RAG-powered LangGraph QA Assistant can be accessed on GitHub:
Clone or download the repository to run the project locally.