This project presents a Retrieval-Augmented Generation (RAG) assistant designed to answer user queries based on the contents of a specific publication. The system integrates text chunking, embeddings, a FAISS vector store, and Groq’s Llama 3.1 language model to produce accurate, context-grounded responses. By retrieving only the most relevant document segments before generating an answer, the assistant minimizes hallucinations and ensures that outputs remain faithful to the source material.
This implementation demonstrates the foundational principles of Agentic AI—combining retrieval, reasoning, and modular pipelines—and serves as the core deliverable for Module 1 of the Ready Tensor Agentic AI Developer Certification Program.
The goal of this project is to design and implement a simple yet effective Retrieval-Augmented Generation (RAG) assistant capable of answering user queries based on a custom document—in this case, a selected research publication. As part of Module 1 of the Ready Tensor Agentic AI Developer Certification Program, this project demonstrates how retrieval, embeddings, and large language models can work together to create grounded, reliable AI assistants.
Traditional language models generate responses purely from learned patterns, which can lead to hallucinations or incomplete answers. RAG addresses this limitation by combining a language model with an external knowledge base. Before producing a response, the system retrieves the most relevant text chunks from the source document and uses them as context for the model. This ensures that the assistant’s answers stay accurate, verifiable, and tied directly to the provided content.
In this project, I implemented a full RAG pipeline using LangChain for orchestration, FAISS for vector storage, HuggingFace embeddings for document encoding, and Groq’s Llama 3.1 model for generation. The assistant is lightweight, easy to run locally, and demonstrates the core building blocks of an agentic workflow—retrieval, reasoning, and structured response generation.
The following sections detail the system architecture, implementation steps, example outputs, and reflections on how this project supports the broader learning objectives of Module 1.
The development of this RAG-based assistant followed a structured, step-by-step approach designed to align with the core concepts taught in Module 1 of the Agentic AI Developer Certification program. The workflow consisted of four major phases: document ingestion, embedding generation, retrieval configuration, and response generation.
A single research publication was selected as the knowledge base for the assistant.
The document was cleaned and converted into a plain-text Markdown (.md) file to ensure consistent parsing.
I used LangChain's RecursiveCharacterTextSplitter to preprocess the text by:
Breaking the publication into small, meaningful chunks
Preserving semantic structure
Ensuring optimal chunk size for embeddings and retrieval
This step improves both search accuracy and LLM response quality.
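The snippet below is a minimal sketch of this chunking step. The file name `publication.md`, the chunk size, and the overlap are illustrative assumptions rather than values taken from the project, and the exact import path can vary with the installed LangChain version.

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the cleaned publication text (file name is illustrative).
with open("publication.md", "r", encoding="utf-8") as f:
    raw_text = f.read()

# Chunk size and overlap are assumed values; tune them for the document at hand.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # maximum characters per chunk
    chunk_overlap=150,  # overlap preserves context across chunk boundaries
)
chunks = splitter.split_text(raw_text)
print(f"Created {len(chunks)} chunks")
```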
To convert text chunks into numerical representations, the project used the all-MiniLM-L6-v2 sentence-transformer embedding model from HuggingFace, which provides:
High-quality semantic embeddings
Lightweight performance suitable for local use
Compatibility with vector stores like FAISS
Each document chunk was transformed into a vector and prepared for storage.
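Continuing from the chunking snippet, the sketch below shows how the chunks could be encoded with all-MiniLM-L6-v2; the `langchain_huggingface` import path is one of several that work depending on the installed LangChain version.

```python
from langchain_huggingface import HuggingFaceEmbeddings

# all-MiniLM-L6-v2 produces 384-dimensional sentence embeddings.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Encode every chunk; in practice the vector store handles this internally.
vectors = embeddings.embed_documents(chunks)
print(len(vectors), len(vectors[0]))  # (number of chunks, 384)
```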
The embeddings were stored in FAISS, an efficient similarity-search library ideal for RAG pipelines.
FAISS allowed:
Fast nearest-neighbor search
Accurate retrieval of the top-k relevant chunks
Seamless integration with LangChain via FAISS.from_texts()
This retrieval layer forms the backbone of the assistant’s grounding mechanism.
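A minimal sketch of building the index with `FAISS.from_texts()`, continuing from the snippets above; persisting the index locally is optional and not part of the original description.

```python
from langchain_community.vectorstores import FAISS

# Build the vector store directly from the chunk strings.
vector_store = FAISS.from_texts(texts=chunks, embedding=embeddings)

# Optionally persist the index so it does not have to be rebuilt each run.
vector_store.save_local("faiss_index")
```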
Using LangChain’s as_retriever() utility, the FAISS store was converted into a retrieval component that:
Accepts user queries
Embeds the query
Finds the most similar document vectors
Returns the most relevant text passages
The retriever ensures that every LLM response is backed by real source material.
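The sketch below turns the store into such a retriever; `k=3` is an assumed value for the number of chunks returned per query, and the example question is illustrative.

```python
# Expose the FAISS store as a retriever that returns the top-k chunks.
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

# The retriever embeds the query and returns the closest document chunks.
docs = retriever.invoke("What is the main idea of this publication?")
for doc in docs:
    print(doc.page_content[:120], "...")
```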
To generate natural, grounded answers, the project used Groq’s Llama-3.1 (8B) model through the ChatGroq interface.
The workflow:
Receive user query
Retrieve top relevant chunks
Provide retrieved context + user query to the LLM
Generate a clear, context-aware final answer
A simple chain was built to encapsulate this process end-to-end.
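A minimal sketch of how such a chain could be wired with LangChain's expression language: the prompt wording and `temperature=0` are assumptions, while the model name corresponds to the Llama-3.1 8B Instant model referenced in this project.

```python
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0)

# Instruct the model to answer only from the retrieved context.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Join the retrieved chunks into one context string.
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is the main idea of this publication?"))
```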
A lightweight CLI (command-line interface) was implemented to:
Accept user questions interactively
Display retrieved, grounded answers
Support rapid testing of multiple queries
This keeps the project simple while demonstrating the essential logic of a RAG assistant.

![methodology.png]
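A minimal interactive loop in this spirit might look like the following, reusing the `rag_chain` built in the previous sketch:

```python
def main():
    print("RAG assistant ready. Type 'exit' to quit.")
    while True:
        query = input("\nQuestion: ").strip()
        if query.lower() in {"exit", "quit"}:
            break
        if not query:
            continue
        # Retrieve context and generate a grounded answer for each question.
        print("\nAnswer:", rag_chain.invoke(query))

if __name__ == "__main__":
    main()
```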
To evaluate the effectiveness of the RAG-based assistant, several experiments were conducted focusing on retrieval quality, LLM response accuracy, and overall system behavior. These experiments helped validate whether the assistant could correctly interpret user questions and retrieve meaningful content from the publication dataset.
The experiments were carried out using:
Dataset: A single research publication converted into structured text
Embedding Model: all-MiniLM-L6-v2
Vector Store: FAISS (L2 similarity search)
LLM: Groq’s Llama-3.1 (8B Instant)
Evaluation Mode: Interactive CLI queries
Each experiment followed a simple cycle:
Input a natural language question
Retrieve relevant document chunks
Generate a grounded response using the RAG chain
Compare the response with the source document for correctness
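This cycle can be scripted as a small harness, sketched below with the example questions listed in the next subsection; printing the retrieved chunks supports the manual comparison against the source document.

```python
test_queries = [
    "What is the main idea of this publication?",
    "Which tasks can VAEs be used for?",
    "Does the paper discuss any limitations?",
]

for q in test_queries:
    docs = retriever.invoke(q)     # step 2: retrieve relevant chunks
    answer = rag_chain.invoke(q)   # step 3: grounded generation
    print(f"\nQ: {q}\nA: {answer}")
    # Step 4: show the retrieved chunks so the answer can be checked
    # against the source document by hand.
    for doc in docs:
        print("  source:", doc.page_content[:80].replace("\n", " "), "...")
```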
Queries were designed to test whether the retriever could locate the correct segments of the publication.
Examples include:
“What is the main idea of this publication?”
“Which tasks can VAEs be used for?”
“Does the paper discuss any limitations?”
Observation:
FAISS consistently retrieved the correct sections containing the required information.
Even when questions were phrased differently from the original text, semantic retrieval performed well.
The assistant was tested with both direct and indirect queries:
| Query Type | Example | Expected Behavior | Actual Behavior |
|---|---|---|---|
| Direct fact-based | “What model is described in the publication?” | Retrieve definition of VAE | Correct and precise |
| Application-related | “Where can this model be used?” | Retrieve use cases | Accurate summaries |
| Interpretation | “Explain the goal of the research.” | Provide high-level overview | Concise, well-structured |
| General query | “What does this paper focus on?” | Contextual summary | Correct summary |
Observation:
The LLM responded clearly and accurately when given the retrieved context.
Hallucination was minimal due to strong grounding.
To evaluate how well the system handles ambiguity, several variations were tested:
Rephrased questions
Longer multi-sentence queries
Vague prompts requiring interpretation
Example Test:
“Can you tell me what problems this method solves?”
→ The system correctly summarized the use cases of VAEs (anomaly detection, compression, etc.).
Because the knowledge base consisted of a single publication encoded with a lightweight embedding model, overall system performance was very fast.
Embedding generation: < 1 second
FAISS retrieval: Instant
Response generation: ~200–300 ms on Groq
These timings suggest the pipeline is responsive enough for interactive, real-time use on small-to-medium datasets.
The CLI interface was tested for:
Input responsiveness
Reusability across multiple queries
Consistent retrieval behavior
The interaction loop performed smoothly, making the assistant easy to test and validate.
The RAG-based assistant successfully demonstrated the core capabilities required for Module 1 of the Agentic AI Developer Certification. The system produced consistent, grounded, and context-aware answers across a variety of test questions.
Similarity search over the FAISS index reliably returned the most relevant publication chunks.
The assistant consistently located correct passages even when user queries were paraphrased or loosely phrased.
Retrieval latency was extremely low, ensuring smooth interaction.
Key Result:
🔹 100% of test queries retrieved semantically relevant publication segments.
Responses generated by the Groq Llama-3.1 model were factual, well-structured, and aligned with the retrieved text.
The system demonstrated strong grounding, minimizing hallucinations.
High-level summaries and conceptual explanations were accurately produced.
Key Result:
🔹 Responses stayed within the context of retrieved content, achieving high factual accuracy.
The entire RAG pipeline executed efficiently:
Embedding creation: <1 second
Retrieval: instant
LLM response generation: 200–300 ms per query
Suitable for real-time interactive use.
Key Result:
🔹 The assistant met performance expectations for lightweight RAG applications.
User interaction experiments confirmed:
Smooth multi-query sessions
Clear and readable responses
Predictable behavior across repeated questions
Overall Result:
🔹 The assistant fulfilled the goals of a foundational RAG-based question-answering system.
This project successfully implements a fully functional Retrieval-Augmented Generation (RAG) assistant capable of answering questions about a research publication with accuracy and reliability. By integrating document ingestion, semantic embeddings, a FAISS vector store, and Groq’s Llama-3.1 model, the system demonstrates how retrieval-grounded reasoning enhances LLM performance.
The work completed in this module establishes a strong foundation for more advanced Agentic AI systems. Future improvements could include:
Multi-document ingestion
Advanced prompting techniques such as ReAct and chain-of-thought (CoT)
Session memory
A simple UI or API-based deployment
Overall, the assistant meets the requirements for Module 1 of the Agentic AI Developer Certification, showing a clear understanding of the RAG pipeline and its practical application in real-world problem solving.