This project presents a Retrieval-Augmented Generation (RAG) assistant designed to explore and answer questions about a single Ready Tensor computer vision publication.
The assistant integrates LangChain, FAISS, and HuggingFace sentence-transformers embeddings with a local LLaMA model (via Ollama) to deliver accurate, publication-grounded responses.
The chosen publication focuses on evaluation metrics in image classification, with special emphasis on the confusion matrix and its role in measuring model performance beyond accuracy. Unlike a single accuracy score, the confusion matrix breaks predictions down by class, showing where a model succeeds and where it fails and providing deeper insight into real-world performance.
This assistant ingests the publication text, builds embeddings, and retrieves relevant context during user interaction. By doing so, it enables readers to explore critical questions such as "What is this publication about?", "What methodology is used?", or "What are the key findings?", all grounded in the ingested material.
2. Objectives
- Build a RAG pipeline with LangChain + FAISS
- Ingest a single Ready Tensor publication as the knowledge base
- Provide an interactive Streamlit UI for natural language queries
- Ensure the model only answers using ingested data (avoiding hallucinations)
- Demonstrate retrieval quality through example queries
3. Methodology
Document Ingestion
Parsed the JSON dataset and extracted the selected publication.
Preprocessed text into enriched fields (title, description, author, etc.).
Split the content into manageable chunks using RecursiveCharacterTextSplitter.
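A minimal sketch of this ingestion step is shown below; the dataset file name, the JSON field names (title, publication_description), and the chunking parameters are assumptions for illustration and should be adapted to the actual schema.

```python
import json

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the dataset and select the single publication used as the knowledge base.
# The file name and field names below are assumptions for illustration.
with open("project_1_publications.json", "r", encoding="utf-8") as f:
    publications = json.load(f)

publication = publications[0]

# Enrich the raw text with metadata fields so each chunk carries context.
document_text = (
    f"Title: {publication['title']}\n"
    f"Description: {publication['publication_description']}"
)

# Split the enriched text into overlapping chunks sized for the embedding model.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(document_text)
```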
Embedding & Storage
Generated embeddings using sentence-transformers/all-MiniLM-L6-v2.
Stored them in a FAISS vector database for fast retrieval.
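The embedding and indexing step could look like the sketch below; the exact LangChain import paths vary between versions, and persisting the index to a local folder is an optional assumption.

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Embed each chunk with the MiniLM sentence-transformer and index it in FAISS.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_texts(chunks, embedding=embeddings)

# Optionally persist the index so it can be reloaded without re-embedding.
vector_store.save_local("faiss_index")
```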
Retrieval + LLM Response
Queries are passed to the retriever to fetch top-k relevant chunks.
Context is combined with the user query and passed into LLaMA (Ollama).
Responses are strictly based on retrieved documents.
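A sketch of this retrieval-and-generation step, reusing the vector_store built above, is given below; the top-k value, the prompt wording, and the Ollama model name (llama3) are assumptions rather than the project's exact configuration.

```python
from langchain.prompts import PromptTemplate
from langchain_community.llms import Ollama

# Turn the FAISS index into a retriever that returns the top-k most similar chunks.
retriever = vector_store.as_retriever(search_kwargs={"k": 4})

# Prompt that restricts the model to the retrieved context only.
prompt = PromptTemplate.from_template(
    "Answer the question using only the context below.\n"
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

llm = Ollama(model="llama3")  # local LLaMA served by Ollama; model name is illustrative

def answer(question: str) -> str:
    docs = retriever.invoke(question)
    context = "\n\n".join(doc.page_content for doc in docs)
    return llm.invoke(prompt.format(context=context, question=question))
```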
User Interface
Implemented in Streamlit.
Provides a text input box and an expandable debug panel that shows the retrieved source chunks.
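A minimal Streamlit sketch along these lines, reusing the retriever, prompt, and llm objects from the previous snippet (widget labels are illustrative):

```python
import streamlit as st

st.title("Ready Tensor Publication Assistant")

# Text input for the user's natural-language question.
question = st.text_input("Ask a question about the publication")

if question:
    docs = retriever.invoke(question)
    context = "\n\n".join(doc.page_content for doc in docs)
    st.write(llm.invoke(prompt.format(context=context, question=question)))

    # Expandable debug panel showing the retrieved source chunks.
    with st.expander("Retrieved sources"):
        for i, doc in enumerate(docs, start=1):
            st.markdown(f"**Chunk {i}**")
            st.write(doc.page_content)
```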
4. Example Queries
What is this publication about?
Who are the authors of this publication?
What methodology or approach is used?
What are the key findings or results?
How does this work compare to previous research?
5. Key Insights from the Publication
The confusion matrix is a critical tool in computer vision tasks for evaluating classification models.
It allows practitioners to analyze both correct and incorrect predictions.
It provides a more nuanced understanding of performance than accuracy alone.
It forms the foundation for other metrics such as precision, recall, and F1-score.
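As a quick illustration of how those metrics derive from the confusion matrix, the snippet below uses scikit-learn on toy binary labels (the data is made up, not taken from the publication):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Toy binary-classification labels, purely for illustration.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# The confusion matrix counts true negatives, false positives,
# false negatives, and true positives.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

# Precision, recall, and F1 are all computed from those four counts.
print("precision:", precision_score(y_true, y_pred))  # tp / (tp + fp)
print("recall:", recall_score(y_true, y_pred))         # tp / (tp + fn)
print("f1:", f1_score(y_true, y_pred))                 # harmonic mean of precision and recall
```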
6. Limitations & Future Work
Current implementation only supports one publication; scaling to multiple publications would require larger storage and more efficient retrieval.
The system depends on local embeddings and Ollama's LLaMA model; adding support for other embedding models or APIs could expand usability.
Future iterations may include session memory and multi-publication exploration.