The “Crowd Detection with YOLOv5 and SORT Tracking” project is an innovative system designed to identify and track heads in videos using advanced object detection and tracking algorithms. The application utilizes YOLOv5, a state-of-the-art real-time object detection model, to detect heads efficiently in video streams. It integrates SORT (Simple Online and Realtime Tracking), a lightweight and effective tracking algorithm, to maintain unique IDs for detected heads across video frames. The system provides detailed tracking summaries, offering per-minute counts of detected individuals with visualization options, including graphs. Additionally, it features an alarm mechanism that triggers when specific thresholds, such as the total unique detections, are exceeded. Users can also define custom Regions of Interest (ROIs) for focused detection in selected areas of the video. This comprehensive approach combines accuracy, usability, and adaptability, making it well-suited for applications in security, crowd monitoring, and resource management. This report details the methodologies, techniques, and tools employed in the development of this system.
Object detection identifies instances of objects (like people or vehicles) within an image or video.
YOLO (You Only Look Once) is a deep learning model renowned for its speed and accuracy in detecting objects in real time. YOLOv5 combines strong detection capabilities with user-friendly tooling and integration.
i. Single Pass Detection: YOLO detects objects by splitting an image into a grid and predicting bounding boxes and class probabilities for each grid cell in a single pass through the network.
ii. Anchor Boxes: Predefined boxes for different aspect ratios and sizes help refine predictions.
iii. Confidence Scores: Each prediction includes a confidence score, representing the likelihood of an object and its class.
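As a concrete illustration of these outputs, the sketch below loads a pretrained model through torch.hub (the same interface used later in this report) and prints each prediction's bounding box, confidence score, and class index; 'image.jpg' is a placeholder path.

import torch

# Load the small pretrained YOLOv5 variant via torch.hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# One forward pass over an image; 'image.jpg' is a placeholder path
results = model('image.jpg')

# Each row of results.xyxy[0] is one prediction: x1, y1, x2, y2, confidence, class
for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
    print(f"class={int(cls)} conf={conf:.2f} box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")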
i. Real-Time Processing: YOLOv5 is optimized for speed, making it suitable for live video feeds or high FPS applications.
ii. Pretrained Models: It offers multiple variants (small, medium, large, etc.), trained on datasets like COCO, to fit various accuracy and speed requirements.
iii. Transfer Learning: Developers can fine-tune YOLOv5 on custom datasets for specific detection tasks.
iv. Compatibility: Supports PyTorch, TensorFlow, and ONNX for seamless integration.
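The variant and transfer-learning features map directly onto the same torch.hub interface; in the sketch below, 'best.pt' is a placeholder for a checkpoint produced by fine-tuning on a custom dataset.

import torch

# A larger variant trades speed for accuracy
model_m = torch.hub.load('ultralytics/yolov5', 'yolov5m', pretrained=True)

# Weights fine-tuned on a custom dataset; 'best.pt' is a placeholder checkpoint
model_custom = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')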
SORT (Simple Online and Realtime Tracking) is a tracking algorithm that associates detected objects across video frames, maintaining a unique ID for each object. By combining YOLOv5 and SORT, the system achieves robust detection and tracking in dynamic environments.
i. Bounding Box Predictions: Input from YOLOv5 or similar detection models serves as the starting point for SORT.
ii. Data Association: SORT uses a Kalman filter and Hungarian algorithm to predict and match bounding boxes across frames.
iii. Unique IDs: Each tracked object is assigned a unique ID, maintained as long as the object stays visible.
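SORT pairs a Kalman filter (to predict where each track should appear in the next frame) with the Hungarian algorithm (to match those predictions against new detections). The sketch below is an illustrative, simplified version of just the association step, using scipy's linear_sum_assignment as the Hungarian solver; it is not the SORT implementation itself.

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    # Intersection over Union of two [x1, y1, x2, y2] boxes
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_threshold=0.3):
    # Cost matrix of negative IoUs; minimizing total cost maximizes
    # the total overlap across matched track/detection pairs
    cost = np.array([[-iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    # Discard matches whose overlap falls below the threshold
    return [(r, c) for r, c in zip(rows, cols) if -cost[r, c] >= iou_threshold]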
i. Efficiency: It operates in real-time, even on standard hardware, due to minimal computational overhead.
ii. Temporal Consistency: Maintains consistent IDs for objects across multiple frames.
iii. Modular Design: Can be paired with any object detector.
The YOLOv5 model, pretrained on the COCO dataset, is employed for real-time detection of "person" objects. Key aspects include:
Speed: Capable of processing multiple frames per second.
Accuracy: High detection rates for objects in diverse lighting and environments.
Versatility: Works seamlessly on video files and live webcam feeds.
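To illustrate how the detector runs over a video source, the self-contained sketch below opens a file (or webcam) and counts 'person' detections per frame, mirroring the pattern used in the detection loop later in this report; 'crowd.mp4' is a placeholder path.

import cv2
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

cap = cv2.VideoCapture("crowd.mp4")  # pass 0 instead of a path for the webcam
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    results = model(frame)
    # Keep only 'person' detections (COCO class 0)
    people = [p for p in results.xyxy[0].tolist() if int(p[5]) == 0]
    print(f"{len(people)} people in frame")
cap.release()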
The SORT algorithm tracks detected heads across frames (a minimal usage sketch follows this list). Key features include:
Tracking Accuracy: Maintains consistent IDs for objects across video frames.
Efficiency: Minimal computational overhead allows for real-time performance.
Custom ROI: Allows selection of specific regions within the video for focused tracking.
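Assuming the reference implementation from the abewley/sort repository is on the Python path, a minimal usage sketch, tuned with the same parameters this project exposes in its configuration, looks like this:

import numpy as np
from sort import Sort  # reference implementation from abewley/sort

# Parameters mirror this project's configuration defaults
tracker = Sort(max_age=20, min_hits=3, iou_threshold=0.3)

# One row per detection: [x1, y1, x2, y2, confidence]
detections = np.array([[100, 50, 180, 260, 0.91]])

# update() returns rows of [x1, y1, x2, y2, track_id] for active tracks
for x1, y1, x2, y2, track_id in tracker.update(detections):
    print(f"ID {int(track_id)} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")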
An alarm triggers if the total number of unique individuals detected exceeds 250 (the configurable alarm_threshold). The system:
Ensures alarms sound only once every 10 seconds.
Utilizes the playsound library for alarm playback.
Exposes the alarm parameters (threshold and interval) through the configuration file; a debounce sketch follows this list.
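A minimal sketch of this debounce logic, assuming the playsound library and a placeholder alarm.mp3 file (the project's actual alarm code may differ):

import threading
import time
from playsound import playsound

ALARM_FILE = "alarm.mp3"  # placeholder path to the alarm sound
ALARM_INTERVAL = 10       # minimum seconds between playbacks
_last_alarm = 0.0

def play_alarm():
    # Skip the alarm if one sounded within the last ALARM_INTERVAL seconds
    global _last_alarm
    now = time.time()
    if now - _last_alarm < ALARM_INTERVAL:
        return
    _last_alarm = now
    # Play on a worker thread so the GUI loop is never blocked
    threading.Thread(target=playsound, args=(ALARM_FILE,), daemon=True).start()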
The application records per-minute counts of detected individuals, displaying:
Summary: Text-based display of counts.
Graphical View: Line or bar graph representation using matplotlib.
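A minimal sketch of the graphical view, assuming minute_counts maps minute strings to unique-person counts (the values below are illustrative):

import matplotlib.pyplot as plt

# Illustrative data in the same shape the application records
minute_counts = {"10:00": 12, "10:01": 18, "10:02": 9}

plt.bar(list(minute_counts.keys()), list(minute_counts.values()))
plt.xlabel("Minute")
plt.ylabel("Unique people detected")
plt.title("Per-minute crowd counts")
plt.tight_layout()
plt.show()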
The application provides an Edit Config button, which can be used to edit critical variables such as:
i. process_interval (the frame-processing interval: only every Nth frame is passed through the detector).
ii. iou_threshold (the Intersection over Union (IoU) threshold, a critical parameter in object detection. IoU measures the overlap between two bounding boxes; in this application it sets the minimum overlap required for SORT to associate a detection with an existing track, and in evaluation it is the overlap a prediction needs with a ground-truth box to count as a true positive).
iii. max_age (lets the tracker keep an object's ID alive for a few frames even when the object is not detected; if max_age is set to 5, the SORT tracker waits up to 5 frames without detecting a particular object before discarding its track).
iv. min_hits (the minimum number of consecutive frames in which a detection must be associated with a track before the track is considered valid and output as a confirmed track).
v. alarm_threshold (the count of unique detections that triggers an alarm in the application).
vi. alarm_interval (the minimum time, in seconds, between consecutive alarm playbacks).
{ "process_interval": 5, "iou_threshold": 0.3, "max_age": 20, "min_hits": 3, "alarm_threshold": 250, "alarm_interval": 10 }
Visit https://github.com/IDKSAM27/crowd-analyser for the installation process. For more details, see https://github.com/IDKSAM27/crowd-analyser/blob/main/Documentation.txt
YOLOv5: Deep learning-based object detection framework.
SORT: Real-time object tracking algorithm.
OpenCV: For video processing and frame extraction.
Tkinter: GUI for user interaction and control.
Matplotlib: Visualization of per-minute counts.
PyTorch: Backend framework for loading the YOLOv5 model.
Playsound: Library for alarm playback.

The system is implemented in Python, incorporating the following key components:
import torch

# Load the small YOLOv5 variant, pretrained on COCO, via torch.hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
# Global variable to store ROI coordinates
roi_coords = None

# Allow the user to choose a Region of Interest
def select_roi():
    global cap, roi_coords
    if cap is None or not cap.isOpened():
        messagebox.showerror("Error", "Video or webcam is not active.")
        return
    # Read a single frame to select ROI
    ret, frame = cap.read()
    if not ret:
        messagebox.showerror("Error", "Unable to capture frame for ROI selection.")
        return
    # Resize the frame for consistent display
    frame = cv2.resize(frame, (640, 480))
    # OpenCV ROI selection
    roi = cv2.selectROI("Select ROI", frame, showCrosshair=True, fromCenter=False)
    cv2.destroyWindow("Select ROI")
    if roi == (0, 0, 0, 0):
        roi_coords = None
        messagebox.showinfo("Info", "ROI selection cleared.")
    else:
        x, y, w, h = roi
        roi_coords = (x, y, x + w, y + h)
        messagebox.showinfo("Info", f"ROI selected: {roi_coords}")

# Set or clear the ROI
def set_roi(coords):
    global roi_coords
    roi_coords = coords
    messagebox.showinfo("Info", "ROI cleared and reset to full frame.")

# Using the ROI in the detection function
if roi_coords:
    x1, y1, x2, y2 = roi_coords
    frame = frame[y1:y2, x1:x2]  # Crop the frame to the selected ROI
def detect_people():
    try:
        global cap, stop_thread, tracked_ids, current_ids, roi_coords
        print("Starting people detection...")
        frame_count = 0
        # Process every Nth frame, as set in the configuration
        process_interval = config["process_interval"]
        while cap.isOpened() and not stop_thread:
            ret, frame = cap.read()
            if not ret:
                print("End of video reached or error reading the video!")
                break
            frame_count += 1
            if frame_count % process_interval != 0:
                continue
            try:
                frame = cv2.resize(frame, (640, 480))
                # Crop the frame to the selected ROI, if any
                if roi_coords:
                    x1, y1, x2, y2 = roi_coords
                    frame = frame[y1:y2, x1:x2]
                results = model(frame)
                if len(results.xyxy) == 0 or results.xyxy[0].shape[1] < 6:
                    print("Warning: Model output is invalid. Skipping frame.")
                    continue
                # Extract bounding box results for the 'person' class
                people = results.xyxy[0].cpu().numpy()
                people = [p for p in people if int(p[5]) == 0]  # Class 0 corresponds to 'person'
                # Prepare detections for the SORT tracker
                detections = []
                for x1, y1, x2, y2, conf, _ in people:
                    # Map coordinates from the cropped ROI back to the full frame
                    if roi_coords:
                        x1 += roi_coords[0]
                        x2 += roi_coords[0]
                        y1 += roi_coords[1]
                        y2 += roi_coords[1]
                    detections.append([x1, y1, x2, y2, conf])
                # Update the SORT tracker
                tracked_objects = tracker.update(np.array(detections))
                current_ids = set()
                for x1, y1, x2, y2, track_id in tracked_objects:
                    x1, y1, x2, y2 = map(int, [x1, y1, x2, y2])
                    track_id = int(track_id)
                    # Adjust bounding boxes back for cropped frames
                    if roi_coords:
                        x1 -= roi_coords[0]
                        x2 -= roi_coords[0]
                        y1 -= roi_coords[1]
                        y2 -= roi_coords[1]
                    # Draw a bounding box for the head region (top 25% of the person box)
                    head_height = int(0.25 * (y2 - y1))
                    head_y2 = y1 + head_height
                    cv2.rectangle(frame, (x1, y1), (x2, head_y2), (0, 165, 255), 2)
                    cv2.putText(frame, f"ID {track_id}", (x1, y1 - 10),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)
                    # Record the IDs seen in this frame and overall
                    current_ids.add(track_id)
                    tracked_ids.add(track_id)
                # Update GUI labels
                lbl_total_count.config(text=f"Total People Appeared: {len(tracked_ids)}")
                lbl_current_count.config(text=f"Current People in Frame: {len(current_ids)}")
                # Play the alarm once the total unique detections exceed the configured threshold
                if len(tracked_ids) > config["alarm_threshold"]:
                    threading.Thread(target=play_alarm).start()
                # Display the frame in the Tkinter window
                frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                img = Image.fromarray(frame_rgb)
                imgtk = ImageTk.PhotoImage(image=img)
                lbl_video.imgtk = imgtk
                lbl_video.configure(image=imgtk)
                root.update()
                time.sleep(0.03)
            except Exception as inner_e:
                print(f"Error processing frame: {inner_e}")
        cap.release()
        cv2.destroyAllWindows()
        print("Video capture released.")
    except Exception as e:
        print(f"Error in detection loop: {e}")
# Track the per-minute counts for graphing (the plotting itself lives in graph_display.py)
def track_minute_counts():
    # Track the minute counts in a background thread
    global minute_counts, tracked_ids, tracking_active
    while tracking_active:
        # Initialize tracking for the current minute
        current_minute = time.strftime("%Y-%m-%d %H:%M")
        current_minute_ids = set()
        # Sleep for a second to synchronize with the actual minute
        time.sleep(1)
        while time.strftime("%Y-%m-%d %H:%M") == current_minute and tracking_active:
            # Add IDs currently in frame to the minute's ID set
            current_minute_ids.update(current_ids)
            time.sleep(0.5)  # Small delay to avoid excessive CPU usage
        # After the minute is over, save the count and reset
        minute_counts[current_minute] = len(current_minute_ids)
        print(f"{current_minute}: {minute_counts[current_minute]} people detected.")  # Debug output
# Initialize the main application window
root = tk.Tk()
root.title("Head Detection with YOLOv5 and SORT Tracking")
root.geometry("800x660")
root.resizable(width=False, height=False)

# Video display label
lbl_video = tk.Label(root)
lbl_video.pack()

# Labels for tracking statistics
lbl_total_count = tk.Label(root, text="Total People Appeared: 0", font=("Arial", 14))
lbl_total_count.pack()
lbl_current_count = tk.Label(root, text="Current People in Frame: 0", font=("Arial", 14))
lbl_current_count.pack()

# Loading label
lbl_status = tk.Label(root, text="Loading model...", font=("Arial", 14))
lbl_status.pack()

# Frame for buttons
button_frame = tk.Frame(root)
button_frame.pack(pady=10)

# Buttons for actions
btn_open_video = tk.Button(button_frame, text="Open Video", command=open_video)
btn_open_video.pack(side=tk.LEFT, padx=10)
btn_webcam = tk.Button(button_frame, text="Start Webcam", command=start_webcam)
btn_webcam.pack(side=tk.LEFT, padx=10)
btn_stop = tk.Button(button_frame, text="Stop Detection", command=stop_detection)
btn_stop.pack(side=tk.LEFT, padx=10)
btn_display_counts = tk.Button(button_frame, text="Display Minute Counts", command=display_minute_counts)
btn_display_counts.pack(side=tk.LEFT, padx=10)

# ROI buttons
btn_select_roi = tk.Button(button_frame, text="Select ROI", command=select_roi)
btn_select_roi.pack(side=tk.LEFT, padx=10)
btn_clear_roi = tk.Button(button_frame, text="Clear ROI", command=lambda: set_roi(None))
btn_clear_roi.pack(side=tk.LEFT, padx=10)

# Button to edit configuration
btn_edit_config = tk.Button(root, text="Edit Config", command=lambda: open_config_editor(root, config))
btn_edit_config.pack(side=tk.LEFT, anchor=tk.SW, padx=10, pady=10)

# Run the main Tkinter loop
root.mainloop()
The "Crowd Detection with YOLOv5 and SORT Tracking" system effectively combines modern object detection and tracking methods with an intuitive GUI for real-time head detection and analysis. By offering data visualization and alert mechanisms, the system is well-suited for applications in crowd monitoring, security, and resource management.
1. Integration of Additional Models: Testing with larger YOLOv5 variants for enhanced accuracy.
2. Cross-Platform Compatibility: Porting the GUI to web or mobile platforms.
3. Extended Analytics: Providing demographic insights like age or gender detection.
4. Real-Time Streaming: Deployment for real-time monitoring in crowded spaces.
1. YOLOv5 Documentation: https://github.com/ultralytics/yolov5
2. SORT Algorithm: https://github.com/abewley/sort
3. PyTorch Framework: https://pytorch.org
4. OpenCV Library: https://opencv.org
5. Playsound Library: https://pypi.org/project/playsound/
Screenshots:
1. Original frame-size video analysis.
2. ROI selection.
3. Configuration editing.
4. Real-time data visualization.