This repository presents an AI-powered product analysis pipeline leveraging the Qwen2-VL-2B-Instruct model for analyzing images of packaged and fresh products. The project aims to extract meaningful details like product names, expiry dates, freshness indices, and shelf life, integrating fine-tuning techniques and innovative methodologies to improve performance and scalability.
In industries like retail and e-commerce, efficient product analysis is crucial for inventory management and quality control. Traditional Machine Learning (ML) and Deep Learning (DL) pipelines often struggle with multi-modal data and domain-specific generalization. We propose a novel approach that uses the Qwen2-VL-2B-Instruct model, leveraging its large-scale multi-modal learning capabilities.
Our methodology integrates a robust fine-tuning mechanism using QLoRA on a custom dataset, enabling the model to adapt to niche use cases. By introducing a Gradio-based GUI for real-time experimentation and a CLI for automation, the project ensures usability for developers and researchers. While pre-trained models may falter on domain-specific data, our innovations in data preparation, fine-tuning, and validation demonstrate significant impact, offering a reliable AI-driven solution for product analysis.
Qwen2-VL-2B-Instruct stands out for its exceptional multi-modal capabilities, which are vital for analyzing both textual and visual information. Unlike traditional ML and DL models requiring separate pipelines for images and metadata, Qwen2-VL seamlessly processes both modalities, ensuring:
Traditional ML/DL models would necessitate handcrafted feature extraction or cumbersome ensemble architectures, making them less adaptable to dynamic real-world use cases.
Custom Dataset:
Preprocessing:
Loading the Model:
transformers
library.Fine-Tuning with QLoRA:
Product Analysis Pipeline:
GUI and CLI Integration:
Initial Model Performance:
Impact of Fine-Tuning:
Comparison with Baselines:
I am in learning phase please give your feedbacks and connect with me on below linkedIn the fine tuned model is not ready due to not availablity of paid GPU.
Project Repo: Smart Vision Quality Testing
My LinkedIN and Github:
https://www.linkedin.com/in/pranavmittal07/
https://github.com/pranavmittal07
There are no datasets linked
There are no datasets linked