This paper presents a robust pedestrian tracking system capable of real-time performance. The system combines state-of-the-art deep learning models with GPU acceleration to achieve efficient detection and tracking. YOLOv8 detects pedestrians in each video frame, producing regions of interest; ResNet-18 then extracts an appearance feature that encodes the distinct characteristics of each detected pedestrian; and the Deep SORT algorithm uses these detections and features to track multiple pedestrians across frames. The system is implemented in C++ and CUDA, with model inference optimized using TensorRT for high-speed computation and with CUDA acceleration applied to both the preprocessing and postprocessing stages. It achieves a processing speed of 10 frames per second (fps), demonstrating its effectiveness and efficiency in real-world applications.
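The per-frame flow of this pipeline can be sketched as follows. This is a minimal illustrative outline rather than the paper's actual code: the class names (Yolov8Detector, ResNet18Encoder, DeepSortTracker), the 512-dimensional feature size, and the stub bodies are assumptions standing in for the TensorRT-backed inference described above.

```cpp
// Hypothetical per-frame pipeline sketch; names and stubs are illustrative only.
#include <cstdint>
#include <vector>

// Axis-aligned pedestrian box produced by the detector.
struct Detection {
    float x, y, w, h;   // box in pixel coordinates
    float score;        // detection confidence
};

// A tracked pedestrian with a persistent identity.
struct Track {
    int id;
    Detection box;
};

// Placeholder wrappers; in the described system these would run
// TensorRT-optimized YOLOv8 / ResNet-18 engines with CUDA pre/post-processing.
class Yolov8Detector {
public:
    std::vector<Detection> detect(const uint8_t* frame, int width, int height) {
        return {};  // stub: real version runs the YOLOv8 TensorRT engine
    }
};

class ResNet18Encoder {
public:
    // One appearance embedding per detection (512-d is an assumed size).
    std::vector<std::vector<float>> encode(const uint8_t* frame, int width, int height,
                                           const std::vector<Detection>& dets) {
        return std::vector<std::vector<float>>(dets.size(), std::vector<float>(512, 0.f));
    }
};

class DeepSortTracker {
public:
    // Associates detections with existing tracks using motion and appearance cues.
    std::vector<Track> update(const std::vector<Detection>& dets,
                              const std::vector<std::vector<float>>& feats) {
        return {};  // stub: real version runs Kalman prediction + assignment
    }
};

// One iteration of the detect -> encode -> track loop.
std::vector<Track> processFrame(Yolov8Detector& det, ResNet18Encoder& enc,
                                DeepSortTracker& trk,
                                const uint8_t* frame, int width, int height) {
    auto boxes = det.detect(frame, width, height);          // 1) pedestrian detection
    auto feats = enc.encode(frame, width, height, boxes);   // 2) appearance features
    return trk.update(boxes, feats);                        // 3) identity assignment
}
```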
The proposed pedestrian tracking system achieves a processing speed of 10 fps on standard GPU hardware. The combination of YOLOv8 for detection, ResNet-18 for feature extraction, and Deep SORT for tracking delivers accurate and reliable performance. CUDA and TensorRT optimizations significantly reduce latency in both model inference and the auxiliary preprocessing and postprocessing stages, enabling real-time operation. Experiments demonstrate that the system effectively tracks multiple pedestrians in dynamic and cluttered environments, making it suitable for real-world applications such as surveillance and autonomous navigation.
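As a rough illustration of how such a TensorRT-optimized inference path is typically wired up, the sketch below deserializes a prebuilt engine and enqueues inference on a CUDA stream. The engine file name, buffer shapes, and binding layout are placeholders, not details taken from the paper's implementation.

```cpp
// Minimal sketch: deserialize a prebuilt TensorRT engine and run async inference
// on a CUDA stream. Sizes and paths are illustrative placeholders.
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <vector>

// TensorRT requires a logger; warnings and errors go to stderr.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << "\n";
    }
};

int main() {
    Logger logger;

    // Read a serialized engine built offline (e.g. from a YOLOv8 ONNX export).
    std::ifstream file("yolov8_fp16.engine", std::ios::binary);  // hypothetical path
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                            std::istreambuf_iterator<char>());

    auto* runtime = nvinfer1::createInferRuntime(logger);
    auto* engine  = runtime->deserializeCudaEngine(blob.data(), blob.size());
    auto* context = engine->createExecutionContext();

    // Device buffers for the engine's input/output bindings (sizes are placeholders).
    void* bindings[2] = {nullptr, nullptr};
    cudaMalloc(&bindings[0], 3 * 640 * 640 * sizeof(float));  // preprocessed frame tensor
    cudaMalloc(&bindings[1], 84 * 8400 * sizeof(float));      // raw detector output

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Preprocessing (resize/normalize) and postprocessing (NMS) would run as CUDA
    // kernels on the same stream, keeping the whole per-frame path on the GPU.
    context->enqueueV2(bindings, stream, nullptr);
    cudaStreamSynchronize(stream);

    cudaFree(bindings[0]);
    cudaFree(bindings[1]);
    cudaStreamDestroy(stream);
    return 0;  // TensorRT object cleanup omitted for brevity
}
```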