This project presents a Retrieval-Augmented Generation (RAG) assistant designed to explore and answer questions about a single Ready Tensor computer vision publication.
The assistant integrates LangChain, FAISS, and HuggingFace sentence-transformers embeddings with a local LLaMA model (via Ollama) to deliver accurate, publication-grounded responses.
The chosen publication focuses on evaluation metrics in image classification, with special emphasis on the confusion matrix and its role in measuring model performance beyond accuracy. Unlike a single accuracy score, the confusion matrix breaks predictions down by class, showing where a model succeeds and where it fails and providing deeper insight into real-world performance.
This assistant ingests the publication text, builds embeddings, and retrieves relevant context during user interaction. By doing so, it enables readers to explore critical questions such as "What is this publication about?", "What methodology is used?", or "What are the key findings?", all grounded in the ingested material.
2. Objectives
- Build a RAG pipeline with LangChain + FAISS
- Ingest a single Ready Tensor publication as the knowledge base
- Provide an interactive Streamlit UI for natural language queries
- Ensure the model only answers using ingested data (avoiding hallucinations)
- Demonstrate retrieval quality through example queries
3. Methodology
Document Ingestion
Parsed the JSON dataset and extracted the selected publication.
Preprocessed text into enriched fields (title, description, author, etc.).
Split the content into manageable chunks using RecursiveCharacterTextSplitter.
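A minimal sketch of this ingestion step is shown below; the dataset file name, the JSON field names (title, publication_description), and the chunking parameters are assumptions for illustration and should be adapted to the actual schema.

```python
import json

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the dataset and select the single publication used as the knowledge base.
# The file name and field names below are assumptions for illustration.
with open("project_1_publications.json", "r", encoding="utf-8") as f:
    publications = json.load(f)

publication = publications[0]

# Enrich the raw text with metadata fields so each chunk carries context.
document_text = (
    f"Title: {publication['title']}\n"
    f"Description: {publication['publication_description']}"
)

# Split the enriched text into overlapping chunks sized for the embedding model.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(document_text)
```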
Embedding & Storage
Generated embeddings using sentence-transformers/all-MiniLM-L6-v2.
Stored them in a FAISS vector database for fast retrieval.
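The embedding and indexing step could look like the sketch below; the exact LangChain import paths vary between versions, and persisting the index to a local folder is an optional assumption.

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Embed each chunk with the MiniLM sentence-transformer and index it in FAISS.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_texts(chunks, embedding=embeddings)

# Optionally persist the index so it can be reloaded without re-embedding.
vector_store.save_local("faiss_index")
```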
Retrieval + LLM Response
Queries are passed to the retriever to fetch top-k relevant chunks.
Context is combined with the user query and passed into LLaMA (Ollama).
Responses are strictly based on retrieved documents.
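A sketch of this retrieval-and-generation step, reusing the vector_store built above, is given below; the top-k value, the prompt wording, and the Ollama model name (llama3) are assumptions rather than the project's exact configuration.

```python
from langchain.prompts import PromptTemplate
from langchain_community.llms import Ollama

# Turn the FAISS index into a retriever that returns the top-k most similar chunks.
retriever = vector_store.as_retriever(search_kwargs={"k": 4})

# Prompt that restricts the model to the retrieved context only.
prompt = PromptTemplate.from_template(
    "Answer the question using only the context below.\n"
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

llm = Ollama(model="llama3")  # local LLaMA served by Ollama; model name is illustrative

def answer(question: str) -> str:
    docs = retriever.invoke(question)
    context = "\n\n".join(doc.page_content for doc in docs)
    return llm.invoke(prompt.format(context=context, question=question))
```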
User Interface
Implemented in Streamlit.
Provides a text input box and an expandable debug panel that shows the retrieved source chunks.
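A minimal Streamlit sketch along these lines, reusing the retriever, prompt, and llm objects from the previous snippet (widget labels are illustrative):

```python
import streamlit as st

st.title("Ready Tensor Publication Assistant")

# Text input for the user's natural-language question.
question = st.text_input("Ask a question about the publication")

if question:
    docs = retriever.invoke(question)
    context = "\n\n".join(doc.page_content for doc in docs)
    st.write(llm.invoke(prompt.format(context=context, question=question)))

    # Expandable debug panel showing the retrieved source chunks.
    with st.expander("Retrieved sources"):
        for i, doc in enumerate(docs, start=1):
            st.markdown(f"**Chunk {i}**")
            st.write(doc.page_content)
```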
4. Example Queries
What is this publication about?
Who are the authors of this publication?
What methodology or approach is used?
What are the key findings or results?
How does this work compare to previous research?
5. Key Insights from the Publication
The confusion matrix is a critical tool in computer vision tasks for evaluating classification models.
It allows practitioners to analyze both correct and incorrect predictions.
It provides a more nuanced understanding of performance than accuracy alone.
It forms the foundation for other metrics such as precision, recall, and F1-score.
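As a quick illustration of how those metrics derive from the confusion matrix, the snippet below uses scikit-learn on toy binary labels (the data is made up, not taken from the publication):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Toy binary-classification labels, purely for illustration.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# The confusion matrix counts true negatives, false positives,
# false negatives, and true positives.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

# Precision, recall, and F1 are all computed from those four counts.
print("precision:", precision_score(y_true, y_pred))  # tp / (tp + fp)
print("recall:", recall_score(y_true, y_pred))         # tp / (tp + fn)
print("f1:", f1_score(y_true, y_pred))                 # harmonic mean of precision and recall
```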
6. Limitations & Future Work
Current implementation only supports one publication; scaling to multiple publications would require larger storage and more efficient retrieval.
The system depends on local embeddings and Ollama's LLaMA model; adding support for other embedding models or APIs could expand usability.
Future iterations may include session memory and multi-publication exploration.