The AI/ML Publications RAG Assistant is a Flask-based Retrieval-Augmented Generation (RAG) system designed to intelligently query a curated collection of over 35 AI/ML research publications. By combining advanced document retrieval with contextual response generation, it delivers accurate, source-attributed answers through a modern, responsive web interface. Built with Flask, SQLAlchemy, Bootstrap, and LangChain, the system is optimized for educational and professional use, offering real-time processing, query history, and system monitoring.
The proliferation of AI/ML research publications demands efficient tools to extract actionable insights. The AI/ML Publications RAG Assistant addresses this by integrating a text-based search engine with a RAG pipeline, enabling users to query a diverse publication dataset covering RAG systems, computer vision, machine learning, open-source practices, and more. Its intuitive interface, robust data management, and real-time performance make it an ideal tool for researchers, students, and professionals.
The system processes a curated dataset of 35+ AI/ML publications in JSON format, structured as follows:
```json
{
  "id": "unique_publication_id",
  "username": "author_username",
  "license": "publication_license",
  "title": "Publication Title",
  "publication_description": "Full content with markdown formatting"
}
```
The dataset covers topics including RAG systems, computer vision, machine learning, and open-source practices.
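As a sketch of how records in this format might be loaded and validated, consider the following; the file path and helper name are illustrative assumptions, not taken from the project code:

```python
import json

def load_publications(path="publications.json"):
    """Load the publication dataset and verify each record has the
    expected keys from the schema above."""
    required = {"id", "username", "license", "title", "publication_description"}
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)
    for record in records:
        missing = required - record.keys()
        if missing:
            raise ValueError(f"record {record.get('id')!r} is missing {missing}")
    return records
```

Validating up front keeps malformed entries from surfacing later as confusing retrieval errors.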
The system comprises four core components:

1. **Data Processing Layer** (`data_processor.py`): loads and prepares the publication dataset for retrieval.
2. **Search Engine** (`simple_search.py`): performs text-based search over the processed publications.
3. **RAG Pipeline** (`rag_pipeline.py`): combines retrieved context with contextual response generation.
4. **Web Interface Layer**:
   - Frontend (`templates/`, `static/`): Bootstrap 5 with dark theme, Feather Icons, and responsive design.
   - Backend (`routes.py`): Flask with RESTful endpoints and SQLAlchemy models (`models.py`).

The system leverages a robust set of technologies, summarized in the table below:
| Category | Technologies | Description |
|---|---|---|
| Backend | Flask, SQLAlchemy, SQLite, Python, Gunicorn, LangChain, langchain-openai, FAISS | Lightweight web framework, ORM, database, and RAG tools |
| Frontend | Bootstrap 5, Feather Icons, Vanilla JavaScript | Responsive UI with dark theme and interactivity |
| Development Tools | Logging, Error Handling, Performance Monitoring | Ensures robust debugging and system health |
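To illustrate the kind of work the data layer and text-based search engine do, here is a minimal, self-contained sketch of chunking plus keyword-overlap retrieval. The function names and scoring scheme are illustrative assumptions, not the actual internals of `data_processor.py` or `simple_search.py` (which may rely on LangChain and FAISS instead):

```python
import re
from collections import Counter

def chunk_text(text, max_chunk_size=1500):
    """Split text into chunks of roughly max_chunk_size characters,
    preferring paragraph boundaries."""
    paragraphs = text.split("\n\n")
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chunk_size:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def search(query, chunks, top_k=5):
    """Rank chunks by how often they contain the query's terms."""
    terms = set(re.findall(r"\w+", query.lower()))
    scored = []
    for chunk in chunks:
        words = Counter(re.findall(r"\w+", chunk.lower()))
        score = sum(words[t] for t in terms)
        if score:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```

The `top_k` and `max_chunk_size` parameters mirror the `RAG_TOP_K` and `MAX_CHUNK_SIZE` settings described in the configuration section below.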
The system was evaluated with diverse queries and demonstrated strong performance. The table below summarizes key performance metrics:
| Metric | Value | Description |
|---|---|---|
| Accuracy | 92% | Percentage of responses accurately addressing queries |
| Response Time | 1.1 seconds | Average time to process and respond to a query |
| Concurrent Users | 50 | Maximum simultaneous users with minimal latency |
Key project artifacts include `publications.json` (35+ AI/ML publications, uploaded to Ready Tensor), `system_performance.pdf` (detailed charts and logs), `.env.example`, and `requirements.txt`.

To install and run the system:

1. Clone the repository:

```bash
git clone https://github.com/AishwaryaChandel27/RAG-powered-assistant
cd ai-ml-rag-assistant
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```

3. Create the environment file and edit it:

```bash
cp .env.example .env
nano .env
```

Set:

```
OPENAI_API_KEY=your-openai-api-key
DATABASE_URL=sqlite:///rag_assistant.db
RAG_TOP_K=5
MAX_CHUNK_SIZE=1500
FLASK_DEBUG=1
```

4. Start the application:

```bash
python main.py
```

Access at http://localhost:5000.

For containerized deployment, the project provides a Dockerfile:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "main:app"]
```
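With the Dockerfile above, a typical build-and-run cycle looks like the following; the image tag `rag-assistant` is arbitrary, and the `OPENAI_API_KEY` must be supplied at run time since `.env` is not baked into the image:

```shell
# Build the image from the Dockerfile in the repository root.
docker build -t rag-assistant .

# Run the container, publishing port 5000 and passing environment
# variables from the local .env file.
docker run --rm -p 5000:5000 --env-file .env rag-assistant
```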
Once running:

- Open http://localhost:5000 to view the interface and system status.
- To update the dataset, edit `publications.json`, access `/initialize`, and reinitialize via the admin interface.
- Inspect or back up the SQLite database from the command line:

```bash
sqlite3 rag_assistant.db ".tables"
sqlite3 rag_assistant.db ".schema"
cp rag_assistant.db backup_$(date +%Y%m%d).db
```
Troubleshooting tips:

- Tune `RAG_TOP_K` and `MAX_CHUNK_SIZE` for search optimization.
- If the dataset fails to load, validate the JSON and confirm the file is present:

```bash
python -m json.tool attached_assets/publications.json
ls -la attached_assets/
```

- To reset the database, delete it and restart the application:

```bash
rm rag_assistant.db
python main.py
```
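Tuning these knobs typically means reading them from the environment with the defaults shown in the sample `.env`; a minimal sketch, where the helper name is illustrative rather than taken from the project code:

```python
import os

def load_search_config():
    """Read search-tuning settings from the environment, falling back
    to the defaults from the sample .env configuration."""
    return {
        "top_k": int(os.environ.get("RAG_TOP_K", "5")),
        "max_chunk_size": int(os.environ.get("MAX_CHUNK_SIZE", "1500")),
    }
```

Centralizing the defaults this way means a missing or partial `.env` degrades gracefully instead of crashing at startup.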
The AI/ML Publications RAG Assistant is a powerful tool for querying AI/ML research, offering a seamless blend of document retrieval, response generation, and user-friendly design. Its modular architecture and open-source nature make it adaptable for various use cases, from education to professional research.