This paper presents the implementation and training of the YOLOv4 (You Only Look Once, version 4) object detection model on a custom dataset using the Darknet framework. YOLOv4 is known for its balance of speed and accuracy, making it well suited to real-time object detection tasks. This study covers the complete pipeline: environment setup, dataset preparation, configuration adjustments, training, and evaluation. The results demonstrate successful learning of custom object classes, with high detection accuracy and strong qualitative results, validating YOLOv4’s utility in domain-specific computer vision applications.
Object detection remains a pivotal problem in computer vision with real-world applications in security surveillance, autonomous vehicles, industrial inspection, and smart cities. Traditional object detection models often struggle with performance trade-offs between speed and accuracy. YOLOv4, proposed by Bochkovskiy et al., presents a state-of-the-art solution optimized for both real-time inference and high precision.
This study explores the training of YOLOv4 on a custom dataset using Google Colab and the Darknet framework. We aim to demonstrate how domain-specific detection tasks can be efficiently tackled using this open-source technology stack.

2.1 Environment Setup
Platform: Google Colab with GPU support (Tesla T4)
Framework: Darknet compiled with OpenCV, CUDA, and cuDNN support
Source: Cloned from https://github.com/AlexeyAB/darknet
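The setup above amounts to a short sequence of shell commands. A sketch of the usual Colab recipe (the Makefile flags `GPU`, `CUDNN`, and `OPENCV` are the ones defined in the AlexeyAB repository):

```shell
# Clone the Darknet source
git clone https://github.com/AlexeyAB/darknet
cd darknet

# Enable GPU, cuDNN, and OpenCV support in the Makefile
sed -i 's/GPU=0/GPU=1/' Makefile
sed -i 's/CUDNN=0/CUDNN=1/' Makefile
sed -i 's/OPENCV=0/OPENCV=1/' Makefile

# Compile (requires the CUDA toolchain available on the Colab GPU runtime)
make
```

In a Colab notebook cell each line would be prefixed with `!`; the commands are shown here in plain shell form.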

2.2 Dataset Preparation
Data Type: Custom image dataset with corresponding YOLO-format annotations.
Labeling: one .txt file per image, each line holding a class index followed by normalized bounding-box coordinates (center x, center y, width, height), in YOLO format.
Metadata Files:
obj.names for class labels
obj.data for dataset path configuration
train.txt and valid.txt for file path indexing
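The train.txt and valid.txt index files can be generated with a short script. A minimal sketch (the function name, split fraction, and file extension are illustrative, not part of the original pipeline):

```python
import random
from pathlib import Path

def write_split_lists(image_dir, train_file="train.txt", valid_file="valid.txt",
                      valid_fraction=0.1, seed=42):
    """Index dataset images into the train/valid path lists Darknet expects.

    Shuffles deterministically, holds out `valid_fraction` of the images,
    and writes one absolute/relative image path per line to each file.
    """
    images = sorted(str(p) for p in Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n_valid = max(1, int(len(images) * valid_fraction)) if images else 0
    valid, train = images[:n_valid], images[n_valid:]
    Path(train_file).write_text("\n".join(train) + "\n")
    Path(valid_file).write_text("\n".join(valid) + "\n")
    return len(train), len(valid)
```

The paths written here are the same ones referenced by the `train` and `valid` keys in obj.data.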

2.3 Model Configuration
Modified yolov4-custom.cfg to support:
Custom number of classes
Adjusted [convolutional] layer filters before [yolo] layers
Tuned learning rate, batch size, and max_batches
Pretrained weights yolov4.conv.137 used to initialize training
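The cfg edits listed above follow a fixed rule from the AlexeyAB training guide: each [convolutional] layer immediately before a [yolo] layer needs filters = (classes + 5) × 3, and max_batches is typically classes × 2000 with a floor of 6000, with learning-rate steps at 80% and 90% of max_batches. A small helper makes the arithmetic explicit (the function name is illustrative):

```python
def yolo_cfg_params(num_classes):
    """Derive the yolov4-custom.cfg values tied to the class count."""
    # Filters in each [convolutional] layer directly before a [yolo] layer:
    # 3 anchors per scale, each predicting (x, y, w, h, objectness) + classes.
    filters = (num_classes + 5) * 3
    # Recommended training length: 2000 iterations per class, at least 6000.
    max_batches = max(6000, num_classes * 2000)
    # Learning-rate decay steps at 80% and 90% of max_batches.
    steps = (int(max_batches * 0.8), int(max_batches * 0.9))
    return filters, max_batches, steps
```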
3.1 Experiment 1: Training YOLOv4 from pretrained weights on a small custom dataset with 1–3 classes.
Hyperparameters:
Batch size: 64
Subdivisions: 16
Learning rate: 0.001
Checkpointing:
Weights were saved periodically for performance analysis.
Intermediate .weights files were stored, and training progress was visualized with mAP and loss curves.
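The loss curve can be recovered directly from the Darknet console log. A sketch of a parser, assuming the usual progress-line format (`iteration: loss, avg_loss avg loss, ...`); the regular expression may need adjusting if a particular Darknet build prints differently:

```python
import re

# Matches progress lines such as " 1000: 1.289, 1.348 avg loss, 0.001 rate"
# that Darknet prints once per training iteration.
LOSS_RE = re.compile(r"^\s*(\d+):\s*([\d.]+),\s*([\d.]+)\s+avg")

def parse_loss_log(lines):
    """Extract (iteration, average_loss) pairs from a Darknet console log."""
    points = []
    for line in lines:
        m = LOSS_RE.match(line)
        if m:
            points.append((int(m.group(1)), float(m.group(3))))
    return points
```

The resulting pairs can be plotted with any charting library to obtain the loss curve used for monitoring convergence.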

3.2 Transfer Learning
The model leveraged pretrained convolutional weights, enabling faster convergence and better generalization from limited data.
3.3 Checkpointing
Model checkpoints were saved during training (every 100–500 iterations), facilitating performance comparisons and early stopping if necessary.
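An early-stopping criterion of the kind described can be sketched as follows (the function and its `patience` parameter are illustrative, not part of Darknet itself; `map_history` would hold the validation mAP measured at each checkpoint):

```python
def should_stop_early(map_history, patience=5):
    """Return True once the best mAP is `patience` or more evaluations old.

    map_history: list of mAP scores, one per saved checkpoint, in order.
    """
    if not map_history:
        return False
    best_idx = map_history.index(max(map_history))
    return len(map_history) - 1 - best_idx >= patience
```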
The trained YOLOv4 model demonstrated reliable object detection on the custom dataset.
4.1 Evaluation Metrics
Training Loss: converged smoothly to a low, stable value
mAP (mean Average Precision): 0.79
IoU (Intersection over Union): Qualitatively assessed via bounding box overlaps
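IoU for two axis-aligned boxes can be computed directly; a minimal sketch using (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width/height are clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```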
4.2 Inference Time
The final model ran inference in real time with low latency, making it suitable for practical deployment scenarios.
This research confirms the viability of using YOLOv4 for custom object detection tasks with limited datasets. The pipeline from annotation to inference is highly configurable, making YOLOv4 a robust solution for developers and researchers.
Future improvements include:
Advanced data augmentation techniques
Hyperparameter optimization using grid/random search
Comparative analysis with YOLOv5/YOLOv8 and EfficientDet
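Of these, random search is straightforward to sketch. A minimal version (the callback and search-space shapes are illustrative; in practice `train_and_eval` would launch a Darknet training run and return its validation mAP):

```python
import random

def random_search(train_and_eval, space, n_trials=10, seed=0):
    """Random hyperparameter search: sample configs, keep the best by score.

    train_and_eval: callable taking a config dict and returning a score (e.g. mAP).
    space: dict mapping each hyperparameter name to a list of candidate values.
    """
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Sample one candidate value per hyperparameter.
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = train_and_eval(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```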