In the rapidly expanding field of Artificial Intelligence, researchers and practitioners face an overwhelming volume of academic literature. Staying up-to-date is a significant challenge. This publication introduces a custom-built AI Research Assistant—a tool that leverages Retrieval-Augmented Generation (RAG) to provide concise, accurate, and source-cited answers from a personal library of research papers.
Imagine being able to "talk" to your entire collection of research papers. Instead of manually searching through dozens of PDFs, you can ask a direct question like, "What are the key challenges in multi-agent reinforcement learning?" and receive a synthesized answer with citations pointing to the exact source papers. This dramatically accelerates research and study.
This project serves as a comprehensive, real-world blueprint for building a production-grade RAG application. It follows professional software engineering practices, including a clean architecture, configuration management, and a robust, intelligent user interface. It answers the question: "How do I build a RAG system that is reliable, maintainable, and user-friendly?"
This document details the system's architecture, implementation flow, and future potential, demonstrating a tool that is not only powerful but also trustworthy and easy to use.
The assistant is built on a Retrieval-Augmented Generation (RAG) architecture, which enhances a Large Language Model (LLM) with an external, searchable knowledge base. This prevents the LLM from relying solely on its pre-trained (and potentially outdated) knowledge, resulting in more accurate and contextually relevant answers. The system is composed of two distinct pipelines.
The first, an offline ingestion pipeline, prepares the research papers to be searched: it indexes their content so that relevant passages can be found quickly. A `metadata.json` file is created to map each unique PDF filename to the paper's full title, ensuring user-friendly citations.
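To make the ingestion step concrete, here is a minimal sketch of what such a pipeline can look like. It assumes LangChain's community loaders and vector stores are available alongside the packages installed later in this guide; the folder name, chunk sizes, and embedding model are illustrative, not the project's exact choices.

```python
# Minimal ingestion sketch (illustrative, not the project's exact code).
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

PAPERS_DIR = Path("papers")  # hypothetical folder of downloaded PDFs

# Load every PDF and split it into overlapping chunks for retrieval.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = []
for pdf in PAPERS_DIR.glob("*.pdf"):
    chunks.extend(splitter.split_documents(PyPDFLoader(str(pdf)).load()))

# Embed the chunks locally and persist a FAISS index to disk.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
FAISS.from_documents(chunks, embeddings).save_local("vectorstore")
```

Embedding locally with sentence-transformers keeps ingestion free of API calls; only answer generation needs the Gemini key.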
The second is a real-time query pipeline that runs when a user asks a question. It is designed for both intelligence and efficiency, culminating in a call to the `gemini-1.0-pro` model to generate a synthesized, in-text cited answer.
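As a rough sketch of this retrieve-then-generate step (reusing the FAISS index from the ingestion sketch above; the prompt wording, chunk count, and metadata key are illustrative assumptions):

```python
# Sketch of the retrieval-and-generation step (illustrative names and prompt).
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
retriever = FAISS.load_local(
    "vectorstore", embeddings, allow_dangerous_deserialization=True
).as_retriever(search_kwargs={"k": 4})

def format_context(docs) -> str:
    # Label each chunk with its source so the model can cite it in-text.
    return "\n\n".join(
        f"Source: [{doc.metadata.get('source', 'unknown')}]\n{doc.page_content}"
        for doc in docs
    )

prompt = ChatPromptTemplate.from_template(
    "Answer using only the context below, citing evidence in-text as "
    "(Source: Title).\n\nContext:\n{context}\n\nQuestion: {question}"
)

answer_chain = (
    {"context": retriever | format_context, "question": lambda q: q}
    | prompt
    | ChatGoogleGenerativeAI(model="gemini-1.0-pro")
    | StrOutputParser()
)

print(answer_chain.invoke("What are the key challenges in multi-agent reinforcement learning?"))
```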
To understand how the assistant works, let's trace the journey of a single question from user input to final answer. The question first enters the `RAGPipeline`'s Router. The router makes its first, lightweight call to the Gemini LLM with a specific prompt, asking it to classify the question as either `LIST_RESOURCES` or `RESEARCH_QUESTION`. `RunnableBranch` logic then directs the flow down one of two paths, sketched in code after the list:
- `LIST_RESOURCES`: The system takes a shortcut. It reads the `metadata.json` file and instantly returns a formatted list of all available paper titles. No retrieval or second LLM call occurs.
- `RESEARCH_QUESTION` (default path): The main RAG process begins. For each retrieved chunk, the system uses the metadata map to find the full paper title. It then formats this information into a single, clean context block, with each piece of evidence clearly labeled with its source (e.g., `Source: [Title of Paper]`).
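Here is a minimal sketch of that routing logic, assuming LangChain's `RunnableBranch` and the `answer_chain` from the earlier sketch; the classification prompt and the `metadata.json` layout are illustrative.

```python
# Sketch of the router: one lightweight classification call, then a branch.
import json
from pathlib import Path

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableBranch, RunnableLambda
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.0-pro")

# First, lightweight LLM call: classify the question into one of two labels.
router_prompt = ChatPromptTemplate.from_template(
    "Classify the question as LIST_RESOURCES or RESEARCH_QUESTION. "
    "Reply with the label only.\n\nQuestion: {question}"
)
classify = router_prompt | llm | StrOutputParser()

def list_titles(_inputs: dict) -> str:
    # Shortcut path: read metadata.json and list all paper titles directly,
    # with no retrieval and no second LLM call.
    titles = json.loads(Path("metadata.json").read_text()).values()
    return "Available papers:\n" + "\n".join(f"- {t}" for t in sorted(titles))

branch = RunnableBranch(
    (lambda x: "LIST_RESOURCES" in x["label"], RunnableLambda(list_titles)),
    # Default path: run the full RAG chain (answer_chain above) on the question.
    RunnableLambda(lambda x: answer_chain.invoke(x["question"])),
)

pipeline = {"label": classify, "question": lambda x: x["question"]} | branch
print(pipeline.invoke({"question": "Which papers do you have?"}))
```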
Want to try it yourself? The project is open-source under the MIT License, and you can get it running with a few simple steps.
This guide includes the crucial steps for creating a virtual environment and ensuring all dependencies are installed correctly.
First, clone the project from GitHub to your local machine, replacing `<your-repository-url>` with the actual repository URL:

```bash
git clone <your-repository-url>
cd Ai-Research-RAG-system
```
Next, create and activate a virtual environment. This is a critical best practice: it creates an isolated environment for your project's dependencies.

```bash
# For Windows
python -m venv venv
.\venv\Scripts\activate

# For macOS/Linux
python3 -m venv venv
source venv/bin/activate
```
You should see `(venv)` appear at the beginning of your terminal prompt.
Instead of relying on a `requirements.txt` file that may not exist yet, install all of the required packages directly. This is the step where setup most often goes wrong.
```bash
pip install gradio langchain langchain-google-genai faiss-cpu sentence-transformers pypdf python-dotenv arxiv langchain-huggingface
```
Now that all the packages are installed, you can create the `requirements.txt` file for future use with this one command:
```bash
pip freeze > requirements.txt
```
Create a new file named `.env` in the project root and add your Google Gemini API key to it:

```
GOOGLE_API_KEY="your_api_key_here"
```
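The project's scripts presumably load this key via python-dotenv, which is in the dependency list above. If you want to verify the key is picked up before running anything, a quick check like this works:

```python
# Quick sanity check that the .env file is readable (the project's actual
# loading code may differ; this only confirms the key is visible to Python).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
assert os.getenv("GOOGLE_API_KEY"), "GOOGLE_API_KEY not found; check your .env file"
print("Gemini API key loaded.")
```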
You are now fully set up. Run the scripts in order.
```bash
# Download the papers
python scripts/download_papers.py

# Ingest them into the vector index
python ingest.py

# For the Gradio web interface
python app.py

# Or for the command-line interface
python main.py
```
If you follow these steps and still encounter an issue, the full error message in your terminal is the best starting point for diagnosing it.
While robust, the assistant has several limitations that offer clear opportunities for future enhancement. One clear next step is adding conversation memory (e.g., LangChain's `ConversationBufferMemory`) to allow for multi-turn, contextual conversations, as sketched below.
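For illustration, here is roughly what that enhancement could look like with the class named above. Recent LangChain releases steer users toward newer memory APIs, so treat this as a sketch rather than the project's plan.

```python
# Sketch: buffering prior turns so follow-up questions have context.
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# After each turn, store the exchange...
memory.save_context(
    {"input": "What are the key challenges in multi-agent RL?"},
    {"output": "Non-stationarity, credit assignment, ... (Source: Example Paper)"},
)

# ...and prepend the history to the next prompt sent through the pipeline.
history = memory.load_memory_variables({})["chat_history"]
print(history)
```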
This project successfully demonstrates the creation of a complete, end-to-end AI Research Assistant. By integrating an intelligent routing system with a robust RAG pipeline, the final application is not only functional but also efficient and user-friendly. It stands as both a practical tool for researchers and a professional blueprint for developers, showcasing how modern AI techniques can be applied to solve the real-world problem of information overload. The commitment to a clean architecture, detailed documentation, and user-centric features ensures that this project is a valuable asset for anyone looking to build or understand production-grade RAG systems.