Abstract
The growing need for automated systems in retail and inventory management has driven interest in technologies that can detect and recognize text on price labels in real time. Traditional Optical Character Recognition (OCR) systems often rely on third-party APIs, which can be costly and introduce latency. To address these challenges, we developed a system that uses YOLOv8 for object detection and PaddleOCR for text recognition. The goal was a low-cost, efficient solution that can be deployed on mobile devices and run in real time. This study outlines the design, implementation, and results of our system, focusing on its potential applications and advantages.
Methodology
System Design: The system operates in two primary stages. First, real-time video is captured on a mobile device, and YOLOv8 detects the price labels within each frame. The detected regions of interest (ROIs) are then passed to PaddleOCR (English variant), which extracts the text from each price label. Because OCR runs only on the cropped regions rather than on the full frame, the recognizer processes far less image data, which helps keep the pipeline fast enough for real-time applications.
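The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the system's actual code: the names `crop`, `read_price_labels`, `detect`, and `recognize` are hypothetical, the frame is modeled as a plain nested list of pixel values, and the detector and recognizer are passed in as callables that stand in for YOLOv8 and PaddleOCR.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[Sequence[int]]   # rows of pixel values (grayscale for simplicity)
Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels, x2/y2 exclusive


def crop(frame: Frame, box: Box) -> List[List[int]]:
    """Cut a region of interest out of a frame, clamping the box to the frame."""
    height = len(frame)
    width = len(frame[0]) if height else 0
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [list(row[x1:x2]) for row in frame[y1:y2]]


def read_price_labels(
    frame: Frame,
    detect: Callable[[Frame], List[Box]],         # stand-in for the YOLOv8 stage
    recognize: Callable[[List[List[int]]], str],  # stand-in for the PaddleOCR stage
) -> List[Tuple[Box, str]]:
    """Stage 1: detect label boxes. Stage 2: run OCR on each cropped ROI only."""
    results = []
    for box in detect(frame):
        roi = crop(frame, box)
        if not roi or not roi[0]:
            continue  # box fell entirely outside the frame
        results.append((box, recognize(roi)))
    return results
```

In a real deployment the two callables would wrap the trained detector and the OCR engine; keeping them as parameters makes it easy to swap either stage or to test the pipeline with stubs.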
Dataset Preparation: To ensure the model’s effectiveness, we created a custom dataset using Roboflow. The dataset was compiled by capturing diverse images of price labels under various conditions, such as different lighting, label designs, and fonts. Manual annotation was performed to label the price regions accurately, enabling the YOLOv8 model to specialize in detecting price labels. The dataset’s diversity aimed to enhance the system’s robustness.
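For illustration, the sketch below converts one manually drawn pixel box into a line of the plain-text label format conventionally used to train YOLO detectors (class index followed by box center, width, and height, all normalized to the image size). The source does not state which export format was used, so this is an assumption; the function name `to_yolo_label` and the example numbers are hypothetical.

```python
from typing import Tuple


def to_yolo_label(
    class_id: int,
    box: Tuple[float, float, float, float],  # (x1, y1, x2, y2) in pixels
    image_width: int,
    image_height: int,
) -> str:
    """Return 'class x_center y_center width height' with values normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    if not (0 <= x1 < x2 <= image_width and 0 <= y1 < y2 <= image_height):
        raise ValueError("box must lie inside the image and have positive area")
    x_center = (x1 + x2) / 2 / image_width
    y_center = (y1 + y2) / 2 / image_height
    width = (x2 - x1) / image_width
    height = (y2 - y1) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: one price label (class 0) in a 640x480 image.
print(to_yolo_label(0, (100, 50, 300, 150), 640, 480))
```

Normalized coordinates keep the annotations valid when images are resized during training.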
Model Training: The YOLOv8 model was trained on the annotated dataset to optimize its detection of price labels. The pre-trained English variant of PaddleOCR was integrated without additional training because of its high baseline accuracy. Both models were configured to prioritize low inference time and computational efficiency, enabling deployment on mobile devices with limited resources.
Deployment and Testing: The system was deployed on a mobile device, utilising its camera for live video input. Testing involved various real-world scenarios to assess the system's accuracy, robustness, and speed. Comparative evaluations against other OCR solutions highlighted PaddleOCR’s superior inference time, making it an optimal choice for real-time price label recognition.
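A comparison of inference times such as the one described above could be set up along the following lines. This is a sketch under stated assumptions rather than the evaluation actually used: `time_per_call_ms` and `rank_by_latency` are hypothetical helpers, and each OCR engine is represented by a plain callable.

```python
import statistics
import time
from typing import Any, Callable, Dict, Iterable, List, Tuple


def time_per_call_ms(
    fn: Callable[[Any], Any], inputs: Iterable[Any], warmup: int = 3
) -> Dict[str, float]:
    """Time fn on every input and summarize the per-call latency in milliseconds."""
    items = list(inputs)
    if not items:
        raise ValueError("at least one input is required")
    for item in items[:warmup]:
        fn(item)  # warm-up calls are not timed (caches, lazy initialization)
    timings = []
    for item in items:
        start = time.perf_counter()
        fn(item)
        timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.fmean(timings),
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
    }


def rank_by_latency(
    engines: Dict[str, Callable[[Any], Any]], inputs: Iterable[Any]
) -> List[Tuple[str, Dict[str, float]]]:
    """Run every engine on the same inputs and sort by median latency, fastest first."""
    items = list(inputs)
    rows = [(name, time_per_call_ms(fn, items)) for name, fn in engines.items()]
    return sorted(rows, key=lambda row: row[1]["median_ms"])
```

Running every engine on the same cropped ROIs and on the target device keeps the comparison fair. As a point of reference, a 30 frames-per-second camera feed leaves roughly 33 ms per frame for detection and recognition combined.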
Results
The system demonstrated the following key outcomes:
Inference Time: Achieved a remarkable processing speed of approximately 2 milliseconds per frame, ensuring seamless real-time operation on mobile platforms.
Accuracy: Delivered high accuracy in detecting and recognizing price labels across various testing conditions, including variations in lighting, font styles, and label designs.
Cost Efficiency: Eliminated reliance on expensive third-party OCR APIs, significantly reducing operational costs.
Robustness: Performed consistently across dynamic environments, demonstrating resilience to common challenges such as poor lighting or unconventional label formats.
Comparative analysis showed that PaddleOCR outperformed the other OCR models evaluated in terms of inference time, while training on the custom dataset improved YOLOv8's detection of price labels, supporting reliable detection and recognition.
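The source does not specify which accuracy metrics were used. As one common way to quantify outcomes like those listed above, the sketch below scores a detected box against a ground-truth box with intersection over union (IoU) and scores recognized text by exact-match rate. Both function names and the example values in the comments are illustrative assumptions.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (0.0 when they do not overlap)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def exact_match_rate(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of labels whose recognized text equals the reference text exactly.

    Surrounding whitespace is ignored; everything else must match.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(p.strip() == e.strip() for p, e in zip(predicted, expected))
    return hits / len(expected)
```

Exact match is a strict measure for prices, since a single misread digit changes the value; a character-level metric could be reported alongside it if partial credit is wanted.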
Conclusion
The integration of YOLOv8 with PaddleOCR offers an effective solution for detecting and recognizing price labels in real time on mobile devices. The system’s low inference time, high accuracy, and cost efficiency position it as a viable tool for retail and inventory management applications. Future enhancements could include expanding the dataset to support multilingual recognition and adapting the system to other text-recognition tasks. This work underscores the potential of combining state-of-the-art object detection and OCR technologies to address real-world challenges.