With the rapid development of autonomous vehicles and the widespread use of mobile devices, this project connects the two fields by creating a mobile application that simulates the perception system of an autonomous vehicle using real-time semantic segmentation. The application lowers the barrier to entry, making it easier for individuals to experiment with and explore autonomous driving technologies on an accessible mobile device, without the need for expensive, dedicated hardware. Leveraging lightweight models such as MobileNet-BiSeNet and MobileNet-UNet, the application achieves a practical balance between speed and accuracy, making it suitable for deployment on resource-constrained mobile devices. Additionally, a custom dataset reflecting Singapore's unique urban environment was created and annotated to enhance the accuracy and robustness of the models. The application integrates multiple segmentation models, giving users the flexibility to compare model performance in real time. The project serves not only as a simulation tool but also as an educational resource, offering insights into the capabilities and challenges of deploying advanced AI technologies on mobile platforms.
Introduction
Motivation
The rapid growth in demand for autonomous driving technologies has highlighted the importance of advanced perception systems that enable vehicles to interpret their environment accurately. Semantic segmentation plays a crucial role in these systems, classifying every pixel in an image to provide a detailed understanding of the surroundings.
Despite advancements in the field, existing solutions are rarely accessible to the general public and are typically built on high-performance hardware, limiting their accessibility and applicability. This project addresses these limitations by implementing semantic segmentation models on mobile platforms, thereby making the perception system of autonomous vehicles more accessible and portable. By using a localized dataset tailored to Singapore's unique road conditions, the project further aims to tackle region-specific challenges, such as dense urban infrastructure and complex traffic scenarios.
Objectives
Development of a Cross-Platform Mobile Application: Design and implement a mobile application capable of performing real-time semantic segmentation, with a user-friendly interface that presents results intuitively.
Integration of Multiple Segmentation Models: Incorporate MobileNet-BiSeNet and MobileNet-UNet, enabling users to explore trade-offs between speed, accuracy, and computational efficiency.
Creation of a Custom Dataset: Annotate images of Singapore's urban environment to improve the models' relevance and accuracy on local roads.
Demonstration of Real-Time AI Capabilities on Mobile Devices: Optimize models for mobile deployment to showcase the feasibility of running computationally intensive tasks on smartphones.
Methodology
Semantic Segmentation Models
Semantic segmentation is essential for autonomous driving systems as it provides pixel-level object classification. The project explored two models:
MobileNet-BiSeNet: A lightweight, dual-path architecture with MobileNet as its backbone; its spatial and context paths balance high-level semantic understanding with fine spatial detail, making it well suited to real-time applications.
MobileNet-UNet: Combines MobileNet's efficient feature extraction with U-Net's encoder-decoder structure, optimized for low-resource environments (a minimal architecture sketch follows this list).
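To make the encoder-decoder design concrete, below is a minimal Keras sketch of a MobileNet-UNet-style model. The backbone version (MobileNetV2), input size, skip-connection layers, and number of classes are illustrative assumptions, not the project's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def mobilenet_unet(input_shape=(224, 224, 3), num_classes=8):
    # MobileNetV2 encoder (assumed backbone version), pretrained on ImageNet.
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False)
    # Encoder activations at decreasing resolutions, used as skip connections.
    skip_names = [
        "block_1_expand_relu",   # 112x112
        "block_3_expand_relu",   # 56x56
        "block_6_expand_relu",   # 28x28
        "block_13_expand_relu",  # 14x14
    ]
    skips = [base.get_layer(n).output for n in skip_names]
    x = base.get_layer("block_16_project").output  # 7x7 bottleneck

    # U-Net-style decoder: upsample, concatenate the skip, then convolve.
    for skip in reversed(skips):
        x = layers.Conv2DTranspose(skip.shape[-1], 3, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(skip.shape[-1], 3, padding="same", activation="relu")(x)

    # Final upsample back to input resolution; per-pixel class logits.
    x = layers.Conv2DTranspose(num_classes, 3, strides=2, padding="same")(x)
    return Model(inputs=base.input, outputs=x)

model = mobilenet_unet()
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```

The skip connections carry the encoder's spatial detail into the decoder, which is what allows a compact backbone to still produce sharp per-pixel predictions.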
Mobile Development Framework
Flutter: Chosen for its cross-platform capabilities and strong rendering performance relative to alternatives such as React Native and Ionic. Flutter's near-native rendering and consistent user interface make it well suited to mobile applications that require real-time processing.
Dataset Preparation
The project created a custom dataset of 161 annotated frames reflecting Singapore's urban landscape, covering objects such as roads, vegetation, and traffic infrastructure. Annotation was performed with the Computer Vision Annotation Tool (CVAT) to ensure high-quality labels for training. Data augmentation techniques, including flipping, brightness adjustments, and distortion, were employed to enhance model robustness (an augmentation sketch follows the figures below).
Figure 1. Example of an annotated frame using CVAT
Figure 2. Data augmentation of annotated frames
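A sketch of the augmentation step is shown below. The albumentations library, file names, and parameter values are assumptions; the report specifies only the transform types (flipping, brightness adjustment, distortion).

```python
import albumentations as A
import cv2

# Assumed pipeline covering the three transform types named above.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),                                  # flipping
    A.RandomBrightnessContrast(brightness_limit=0.2, p=0.5),  # brightness
    A.OpticalDistortion(distort_limit=0.3, p=0.3),            # distortion
])

image = cv2.imread("frame_001.png")                           # hypothetical paths
mask = cv2.imread("frame_001_mask.png", cv2.IMREAD_GRAYSCALE)

# Transforming the image and its mask together keeps the pixel-level
# labels aligned with the augmented image.
out = augment(image=image, mask=mask)
aug_image, aug_mask = out["image"], out["mask"]
```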
Model Training
MobileNet-BiSeNet and MobileNet-UNet were trained on the annotated dataset using TensorFlow in Python. The trained models were then converted to TensorFlow Lite format for integration into the mobile application, as sketched below.
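A minimal sketch of the conversion step follows; the model path and the quantization flag are illustrative assumptions rather than the project's recorded settings.

```python
import tensorflow as tf

model = tf.keras.models.load_model("mobilenet_unet.h5")  # hypothetical path
converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Optional: default optimizations apply post-training quantization,
# shrinking the model for mobile deployment (an assumed setting).
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("mobilenet_unet.tflite", "wb") as f:
    f.write(tflite_model)
```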
Application Development
The mobile application integrates the following features (a per-frame inference sketch follows the list):
Real-time segmentation results for both image uploads and live video feeds.
A model selection feature enabling users to compare the performance of MobileNet-BiSeNet and MobileNet-UNet.
Cross-platform compatibility, ensuring consistent performance on both iOS and Android devices.
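The per-frame inference path can be illustrated with the TensorFlow Lite interpreter in Python. The production app invokes the same .tflite model through a Flutter plugin, so this sketch (with assumed paths and shapes) mirrors the app's behaviour rather than reproducing its Dart code.

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mobilenet_unet.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def segment_frame(frame):
    """frame: float32 array matching the model input, e.g. (224, 224, 3)."""
    interpreter.set_tensor(inp["index"], frame[np.newaxis].astype(np.float32))
    interpreter.invoke()
    logits = interpreter.get_tensor(out["index"])[0]  # (H, W, num_classes)
    return np.argmax(logits, axis=-1)                 # per-pixel class map
```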
Development Pipeline
The project followed a structured pipeline:
Dataset Preparation: Annotating and augmenting images to create a high-quality training dataset.
Model Training: Training MobileNet-BiSeNet and MobileNet-UNet using TensorFlow.
Model Conversion: Optimizing and converting models to TensorFlow Lite format for mobile deployment.
Application Integration: Embedding the models into the Flutter-based mobile application.
Results
The application demonstrated:
Compatibility across iOS and Android platforms.
A user-friendly interface with flexible model selection.
Real-time semantic segmentation with minimal latency.
Figure 3. Home screen of the mobile app
Figure 4. Model selector to choose between different models
Figure 5. Semantic segmentation performed on an uploaded image
Challenges and Limitations
Limited Annotated Data: The small dataset size constrained the models' ability to generalize, resulting in lower accuracy in extreme scenarios such as heavy rain or fog.
Simulation-Only Functionality: The application does not incorporate real-world sensor data like LiDAR or radar, limiting its utility in live autonomous driving scenarios.
Future Work
Integration of Instance Segmentation: Enhance object detection by distinguishing individual objects within the same category, providing a more detailed understanding of the environment.
Model Optimization: Implement techniques such as knowledge distillation and quantization-aware training to improve performance and efficiency (see the sketch after this list).
Dataset Expansion: Collect and annotate a larger dataset covering diverse weather, lighting, and environmental conditions to improve robustness.
Real-Time Data Augmentation: Incorporate adaptive learning and real-time data augmentation to enhance model performance in unfamiliar scenarios.
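As one example, quantization-aware training could be added with the TensorFlow Model Optimization Toolkit; the sketch below is a proposed approach under assumed paths and placeholder data, not something the project has implemented.

```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.models.load_model("mobilenet_unet.h5")  # hypothetical path

# quantize_model wraps supported layers with simulated int8 quantization;
# layers the toolkit does not cover may need explicit per-layer annotation.
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Brief fine-tune so weights adapt to the quantized forward pass;
# random arrays stand in for the project's real images and masks.
dummy_x = np.random.rand(4, 224, 224, 3).astype("float32")
dummy_y = np.random.randint(0, 8, size=(4, 224, 224))
qat_model.fit(dummy_x, dummy_y, epochs=1)
```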
Conclusion
This project demonstrates the feasibility of deploying real-time semantic segmentation on mobile platforms. By addressing localized challenges and leveraging lightweight models, it provides a foundation for further advancements in autonomous driving technologies and related fields. The mobile application developed serves as both a simulation tool and an educational resource, contributing to the broader AI and computer vision community.
References
1. Alsenan A, Ben Youssef B, Alhichri H. MobileUNetV3—A Combined UNet and MobileNetV3 Architecture for Spinal Cord Gray Matter Segmentation. Electronics. 2022;11(15):2388. Available from: https://doi.org/10.3390/electronics11152388
2. Yu C, Wang J, Peng C, Gao C, Yu G, Sang N. BiSeNet: Bilateral Segmentation Network for Real-Time Semantic Segmentation. In: Lecture Notes in Computer Science. 2018. p. 334–49. doi: 10.1007/978-3-030-01261-8_20. Available from: https://link.springer.com/chapter/10.1007/978-3-030-01261-8_20