This repository implements object detection and tracking using computer vision techniques, specifically Kalman Filters and the SORT (Simple Online and Realtime Tracking) algorithm. The project detects objects in a video feed, tracks their movement across frames, and outputs a graph showing how active each object is over time. The current implementation handles a single class only; to support multiple classes, a class similarity check would need to be added to the tracker to avoid confusing tracks that belong to different classes.
This project aims to provide a streamlined approach for detecting and tracking objects in video streams using computer vision. Kalman filters estimate each object's position and velocity while compensating for noisy measurements, and the SORT (Simple Online and Realtime Tracking) algorithm handles real-time tracking of multiple objects.
In addition to tracking, the system outputs a video and a graph to visualize the movement (activity) of tracked objects over time.
Clone the repository and build the container; it runs automatically once the build finishes. The container is based on Ultralytics' latest image, with the additional packages listed in requirements.txt installed automatically.
```bash
git clone https://github.com/Rachelslh/Object-Movement-Detection-and-Tracking.git
cd Object-Movement-Detection-and-Tracking
sh docker.sh
```
There's a config file under the src directory in which you can specify the Ultralytics pretrained model you'd like to use (if you change it from its default value, currently yolo11m, the corresponding model classmap needs to be updated as well), the class confidence threshold, and the class you'd like to work with. I chose cars because the input video I used shows cars moving on a highway.
You can run the object detection and tracking on any video file by specifying the path to the input file and the output path in the config directly.
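For illustration only, here is a rough sketch of what such a config might contain; the key names below are hypothetical, so refer to the actual file under src for the exact fields:

```yaml
# Hypothetical sketch of the config -- the real key names in src/ may differ.
model_name: yolo11m          # Ultralytics pretrained model; changing it requires updating the classmap JSON
confidence_threshold: 0.40   # minimum class confidence for a detection to be kept
target_class: car            # the single class to detect and track
input_video_path: data/input.mp4   # hypothetical path to the input video
output_video_path: output.mp4      # default output location mentioned below
```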
```bash
python infer.py
```
The program will process the video, apply object detection, and track the objects in real time. Detected bounding boxes are drawn around the objects of interest in blue, and their positions and velocities are tracked; the tracks' estimated bounding boxes are drawn in red. All of this is saved in the output video, which is written by default to output.mp4.
You'll also find a plot saved in the main directory under the name Objects_movement.png. The plot shows all the tracks' velocities (I considered using displacement instead, but velocity is also evidence of movement and is estimated directly by the Kalman filter).
Here's the main file structure, excluding additional repo management tools such as the Dockerfile and the pre-commit hook.
```
Object-Movement-Detection-and-Tracking/
│
├── data/            # Contains an input video file and a JSON to specify model classmap
├── src/             # Source directory
│   ├── config.py    # configuration YAML file
│   ├── model.py     # YOLO inference, SORT algorithm implementation, Kalman Filter implementation
│   └── mltypes.py   # Types defined as dataclasses to work with
├── infer.py         # Main inference script, runs the full pipeline
└── README.md        # Project description and usage
```
YOLO11m is used here for detection and classification. Inference runs one frame at a time to mimic input coming from an RTSP stream, taking an average of 150 ms per frame on an MPS (M1) device.
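As a minimal sketch of what frame-by-frame inference with the Ultralytics API looks like (the video path is hypothetical and the loop is illustrative, not the repository's exact code):

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolo11m.pt")                 # pretrained detector (the config's default)
cap = cv2.VideoCapture("data/input.mp4")   # hypothetical path to the input video

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # One frame at a time, mimicking an RTSP-style stream
    results = model(frame, conf=0.4, verbose=False)
    for xyxy, conf, cls in zip(results[0].boxes.xyxy,
                               results[0].boxes.conf,
                               results[0].boxes.cls):
        x1, y1, x2, y2 = map(int, xyxy.tolist())
        cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)  # detections in blue (BGR)
    # ... detections of the target class would be passed to the tracker here ...

cap.release()
```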
From the Ultralytics YOLO11 benchmark results, YOLO11m offers a good precision/latency tradeoff: larger models aren't much more precise but have higher latency, while smaller models have both lower latency and lower precision than the medium one (YOLO11m).
The Kalman Filter is used to predict the next position of each object based on its past state, estimating position and velocity while compensating for noisy measurements.
The noise and uncertainty covariance matrices are left at the defaults created by the KalmanFilter class from the FilterPy package.
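For illustration, here is a minimal constant-velocity filter over a box centre built with FilterPy's default covariance matrices; the repository's actual state vector, dimensions, and noise settings may differ:

```python
import numpy as np
from filterpy.kalman import KalmanFilter

def make_cv_filter(cx, cy, dt=1.0):
    """Constant-velocity filter over the box centre: state = [x, y, vx, vy]."""
    kf = KalmanFilter(dim_x=4, dim_z=2)
    kf.F = np.array([[1, 0, dt, 0],    # state transition: position += velocity * dt
                     [0, 1, 0, dt],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1]], dtype=float)
    kf.H = np.array([[1, 0, 0, 0],     # only the centre position is measured
                     [0, 1, 0, 0]], dtype=float)
    # P, Q, R keep FilterPy's identity defaults here, as noted above.
    kf.x[:2] = np.array([[cx], [cy]])
    return kf

kf = make_cv_filter(320.0, 240.0)
kf.predict()                              # propagate state to the next frame
kf.update(np.array([325.0, 238.0]))       # correct with the new detection's centre
velocity = kf.x[2:4].flatten()            # estimated velocity in pixels/frame
```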
The SORT (Simple Online and Realtime Tracking) algorithm is used for multi-object tracking. On each frame it works by:

- predicting every existing track's new bounding box with its Kalman filter,
- associating the frame's detections with the predicted boxes using IoU and the Hungarian algorithm (see the sketch after this list),
- updating the matched tracks' filters with their assigned detections,
- creating new tracks for unmatched detections and dropping tracks that have gone unmatched for too long.
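A rough sketch of the IoU-based association step, using SciPy's Hungarian solver; the repository's implementation details, thresholds included, may differ:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(detections, predicted_tracks, iou_threshold=0.3):
    """Match detection boxes to predicted track boxes; return matched index pairs."""
    if not detections or not predicted_tracks:
        return []
    cost = np.zeros((len(detections), len(predicted_tracks)))
    for d, det in enumerate(detections):
        for t, trk in enumerate(predicted_tracks):
            cost[d, t] = -iou(det, trk)      # negate: the Hungarian solver minimises cost
    rows, cols = linear_sum_assignment(cost)
    return [(d, t) for d, t in zip(rows, cols) if -cost[d, t] >= iou_threshold]
```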
The code follows OOP principles; ML-oriented types such as the bounding box, detection, and track are defined as dataclasses to support the overall structure of the code.
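As a sketch of what such dataclasses could look like (field names here are hypothetical and not necessarily those used in src/mltypes.py):

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    x1: float
    y1: float
    x2: float
    y2: float

@dataclass
class Detection:
    box: BoundingBox
    confidence: float
    class_id: int

@dataclass
class Track:
    track_id: int
    box: BoundingBox                              # current Kalman-estimated box
    velocity: tuple[float, float] = (0.0, 0.0)    # estimated pixels/frame
    age: int = 0                                  # frames since the track was created
    misses: int = 0                               # consecutive frames without a matched detection
```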
Input video comes from here.
The output video can be viewed here. A class confidence threshold of 0.40 was used for this run.
The plot reflects the high velocities present in the video: the average velocity across all tracked objects is approximately 42 pixels/frame, which aligns with what we see in the input video (cars moving fast on the highway).