RAG Assistant: Publication Documentation
Overview
The RAG Assistant is a Retrieval-Augmented Generation (RAG) system designed to answer questions using a local language model and a vector store. Developed as part of the Agentic AI Developer Certification (AIDC-Module 1), this project addresses the need for an offline, customizable question-answering tool, overcoming limitations like OpenAI API quotas. It's built with Python and open-source libraries, making it accessible for AI enthusiasts and developers.
Purpose: Provide accurate, context-based answers from user-supplied documents.
Target Audience: AI developers, students, and certification candidates.
License: MIT (open-source, permissive use).
ReadyTensor Publication Link (to be updated post-submission)
GitHub Inspiration (for LangChain usage)
Note: Replace with an actual diagram of the RAG pipeline (e.g., retrieval → generation flow).
Technical Details
Components
The RAG Assistant leverages the following technologies:
LangChain: Orchestrates the RAG pipeline (retrieval, generation).
Hugging Face Transformers: Powers the local LLM (e.g., facebook/bart-base).
FAISS: Enables efficient vector storage and similarity search.
HuggingFaceEmbeddings: Uses sentence-transformers/all-MiniLM-L6-v2 for text embeddings.
Workflow
Ingestion: Splits and embeds documents into a FAISS vector store.
Retrieval: Fetches the most relevant document chunk using MMR (Maximal Marginal Relevance).
Generation: Generates answers using the LLM, guided by a structured prompt.
Output: Displays the parsed response via CLI.
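The retrieval step above uses Maximal Marginal Relevance (MMR), which trades off relevance to the query against redundancy with chunks already selected. A minimal pure-Python illustration of the re-ranking idea (independent of LangChain; the vectors and the λ weight below are made up for demonstration):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def mmr(query_vec, doc_vecs, k=1, fetch_k=5, lam=0.5):
    """Greedy MMR: from the fetch_k candidates most similar to the query,
    pick k, each time balancing query relevance against similarity to
    already-picked documents. Returns the chosen indices in pick order."""
    candidates = sorted(range(len(doc_vecs)),
                        key=lambda i: cosine(query_vec, doc_vecs[i]),
                        reverse=True)[:fetch_k]
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j])
                              for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Two near-duplicate chunks plus one distinct chunk: MMR picks the
# most relevant one, then skips its near-duplicate in favor of variety.
docs = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0]]
print(mmr([1.0, 0.5], docs, k=2, fetch_k=3))  # → [1, 2]
```

With `k=1, fetch_k=5` (the values in the snippet below), MMR fetches five candidates but returns only the single best chunk.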
Code Snippet: Core Pipeline
```python
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_huggingface import HuggingFacePipeline

def setup_rag_chain():
    # `load_vector_store`, `llm`, and `parse_answer` are defined elsewhere in src/.
    vector_store = load_vector_store()
    retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1, "fetch_k": 5})
    prompt_template = """
You are a factual assistant. Answer ONLY using the exact information provided in the context below.
Do not add information or generate text outside the context. If the context does not contain the answer,
respond with 'I don't know.'

Context:
{context}

Question: {question}

Answer:
"""
    prompt = PromptTemplate.from_template(prompt_template)

    def format_docs(x):
        print(f"Debug docs type in format_docs: {type(x)}, content: {x}")
        if isinstance(x, list):
            return "\n\n".join(d.page_content for d in x)
        return x

    chain = (
        {"context": RunnableLambda(lambda x: retriever.invoke(x["question"])) | RunnableLambda(format_docs),
         # Extract the question string so the prompt is not rendered with the raw input dict:
         "question": RunnableLambda(lambda x: x["question"])}
        | prompt
        | llm
        | parse_answer
    )
    return chain
```
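The chain above pipes the raw model output through `parse_answer`, which is defined elsewhere in `src/`. A hypothetical sketch of what such a parser might do (the project's actual implementation may differ): causal models often echo the prompt, so keep only the text after the final "Answer:" marker.

```python
def parse_answer(output) -> str:
    """Extract the final answer from raw LLM output.

    Keeps only the text after the last 'Answer:' marker (if present)
    and trims surrounding whitespace.
    """
    text = output if isinstance(output, str) else getattr(output, "content", str(output))
    if "Answer:" in text:
        text = text.rsplit("Answer:", 1)[1]
    return text.strip()

print(parse_answer("Context: ...\nQuestion: ...\nAnswer: The capital of France is Paris."))
# → The capital of France is Paris.
```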
LLM Configuration
Model: facebook/bart-base (switched from distilgpt2 for better context adherence).
Parameters: max_new_tokens=50, temperature=0.2, repetition_penalty=2.0.
Installation: pip install transformers sentencepiece.
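The parameters above can be collected into a single kwargs dict; the wiring sketched in the comments is an assumption about how the pipeline is constructed, not code taken from the project's source:

```python
# Generation settings listed above; a low temperature and a strong
# repetition penalty keep bart-base close to the retrieved context.
generation_kwargs = {
    "max_new_tokens": 50,       # cap on generated answer length
    "temperature": 0.2,         # near-deterministic sampling
    "repetition_penalty": 2.0,  # discourage repeated phrases
}

# Hypothetical wiring (requires `transformers` and `langchain-huggingface`):
# from transformers import pipeline
# from langchain_huggingface import HuggingFacePipeline
# gen = pipeline("text2text-generation", model="facebook/bart-base", **generation_kwargs)
# llm = HuggingFacePipeline(pipeline=gen)
```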
Usage Instructions
Setup
Install dependencies
pip install langchain-huggingface transformers torch sentencepiece
Place your document (e.g., sample.txt) in data/documents/.
Configure config/.env with an OPENAI_API_KEY (optional for local use).
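Loading config/.env is typically done with python-dotenv; a stdlib-only sketch of the same idea (KEY=VALUE lines read into os.environ, assuming the simple one-pair-per-line format):

```python
import os

def load_env(path="config/.env"):
    """Minimal .env loader: one KEY=VALUE per line, '#' starts a comment.
    Existing environment variables are not overridden."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```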
Running the Assistant
Ingest Data
python app.py ingest
Expected Output: "Ingested documents into vector store."
Query Interface
Enter questions like "What is the capital of France?".
Type 'exit' to quit.
Expected Answer: "The capital of France is Paris."
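The query loop can be sketched as below; `answer_fn` stands in for the RAG chain, and the input/output hooks are injectable so the loop can be exercised without a model (the names here are illustrative, not the project's):

```python
def query_loop(answer_fn, read_input=input, write=print):
    """Read questions until the user types 'exit'; print each answer.

    `read_input` and `write` default to the real CLI but can be
    replaced for testing.
    """
    while True:
        question = read_input("Question: ").strip()
        if question.lower() == "exit":
            write("Goodbye!")
            break
        if question:
            write(answer_fn(question))

# Demo with canned input instead of the real keyboard:
scripted = iter(["Name a landmark in Paris", "exit"])
query_loop(lambda q: "Eiffel Tower.", read_input=lambda _prompt: next(scripted))
```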
Note: Replace with a screenshot of the CLI in action.
Sample Questions and Answers
"What is the capital of France?" → "The capital of France is Paris."
"Name a landmark in Paris" → "Eiffel Tower."
"When was the Eiffel Tower completed?" → "1889."
Note: Answers depend on the context in sample.txt.
Challenges and Solutions
Issue: Retrieved entire sample.txt.
Solution: Implemented chunking with RecursiveCharacterTextSplitter and MMR retrieval.
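RecursiveCharacterTextSplitter splits on progressively smaller separators (paragraphs, then sentences, then words) until chunks fit a size budget, keeping some overlap so facts spanning a boundary survive. A simplified fixed-size variant conveys the idea (the sizes below are illustrative, not the project's settings):

```python
def split_text(text, chunk_size=200, chunk_overlap=50):
    """Naive character chunker with overlap. LangChain's
    RecursiveCharacterTextSplitter additionally prefers natural
    boundaries before falling back to a hard character cut like this."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print(split_text("abcdef", chunk_size=4, chunk_overlap=2))  # → ['abcd', 'cdef', 'ef']
```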
Issue: Nonsensical answers with distilgpt2.
Solution: Switched to facebook/bart-base with a stricter prompt.
Issue: TypeError in chain.
Solution: Used RunnableLambda for proper function piping.
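The TypeError arose because plain Python functions do not support the `|` composition operator that LCEL runnables implement; wrapping a function in RunnableLambda gives it that operator. A toy re-implementation of the idea (not LangChain's actual classes):

```python
class Lambda:
    """Minimal stand-in for RunnableLambda: wraps a function and
    supports `|` so wrapped steps compose into a pipeline."""
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # Compose: run self first, then feed the result to `other`.
        return Lambda(lambda value: other.invoke(self.invoke(value)))

retrieve = Lambda(lambda q: ["chunk about " + q])
fmt = Lambda(lambda docs: "\n\n".join(docs))
chain = retrieve | fmt
print(chain.invoke("Paris"))  # → chunk about Paris
```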
Future Enhancements
Add memory to track conversation context.
Integrate a GUI using Tkinter or Flask.
Support multiple LLMs for comparison.
Submission Details
Title: RAG Assistant
Description: A local RAG system for question-answering.
Tags: AIML, python, RAG, GenAI, AgenticAI
Certification Module: AIDC-Module 1
Resources: Upload this .md file, app.py, src/, and sample.txt.
Acknowledgments
Thanks to the xAI community and ReadyTensor support for guidance.
ReadyTensor Support | xAI