Oct 20, 2025●Creative Commons Attribution (CC BY)

RAG based Legal Assistant

Hemavathy Ramesh

Abstract

This project presents a Retrieval-Augmented Generation (RAG) Agent designed to provide accurate, context-aware responses to legal queries related to the Information Technology Act, Environment Protection Act, and Companies Act of India. The system combines document retrieval with generative AI reasoning to enhance the interpretability and precision of legal information access.

Using vector embeddings and a local knowledge base, the agent retrieves the most relevant legal sections or clauses from official legislative texts and supplements them with natural language explanations generated by an advanced OpenAI GPT model (e.g., GPT-4o-mini). The hybrid approach ensures both factual reliability from retrieval and interpretive depth from generation.

Methodology

The project workflow includes:

Document Processing — Extracting and chunking text from authentic PDF versions of the IT Act, Environment Act, and Companies Act using PyMuPDF or alternative PDF parsers.
Embedding & Storage — Generating semantic embeddings of the documents and storing them in a ChromaDB vector database for efficient similarity search.
Query Handling — When a user asks a legal question, the agent retrieves relevant sections and contextually generates an explanation or interpretation using the LLM.
Response Generation — The final output integrates the retrieved law excerpts and a natural, human-readable summary that maintains legal accuracy.

Results

The Retrieval-Augmented Generation (RAG) based Legal Assistant was successfully developed and tested using three major Indian laws — the Information Technology Act, the Environment Protection Act, and the Companies Act.