This project presents an innovative system that integrates advanced vision-language models with geolocation technology to provide a comprehensive emergency response and damage assessment solution. The system detects accidents in real time and evaluates the criticality of each situation using computer vision and natural language processing. Upon detecting an emergency, it identifies the nearest hospital via GPS and alerts emergency services, while also allowing user-initiated service requests. Additionally, the system estimates the cost of damage and connects users with repair service providers, offering a seamless and efficient post-accident experience. By combining real-time detection, criticality evaluation, and end-to-end service facilitation, this solution aims to improve emergency response efficiency and streamline accident recovery.
Accidents pose significant challenges, including delayed response times, a lack of timely criticality assessment, and inefficient post-incident recovery. Current systems often address only a single aspect of accident management, leaving gaps in the overall response. This project addresses these gaps by developing an integrated system that combines accident detection, emergency response, and recovery facilitation.
The system leverages a vision-language model to accurately assess the criticality of accidents, ensuring prompt and appropriate responses. GPS technology is utilized to locate the nearest hospital and facilitate rapid coordination with emergency services, as sketched below. Beyond immediate response, the system offers a damage cost estimation feature and seamlessly connects users with repair service providers, ensuring a complete solution from accident detection to recovery. This project not only enhances safety and response times but also introduces convenience and efficiency into post-accident processes, making it a valuable contribution to accident management systems.
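As a rough illustration of the hospital-localization step, the following sketch finds the facility closest to a GPS fix using a haversine distance over a small hard-coded registry. The hospital names, coordinates, and the nearest_hospital helper are hypothetical placeholders standing in for whatever map or places service the deployed system would actually query.

from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points in kilometres
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Hypothetical hospital registry: name -> (latitude, longitude)
hospitals = {
    "City General Hospital": (12.9716, 77.5946),
    "Riverside Medical Centre": (12.9352, 77.6245),
}

def nearest_hospital(accident_lat, accident_lon):
    # Return the closest hospital and its distance from the accident site
    return min(
        ((name, haversine_km(accident_lat, accident_lon, lat, lon))
         for name, (lat, lon) in hospitals.items()),
        key=lambda item: item[1],
    )

name, dist = nearest_hospital(12.9611, 77.6387)  # hypothetical GPS fix of the accident
print(f"Nearest hospital: {name} ({dist:.1f} km away)")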
This project employs a multi-module approach integrating state-of-the-art technologies for accident detection, criticality assessment, emergency alerting, and post-accident recovery facilitation. The methodology comprises the following key components: (1) a YOLO-based accident detection module operating on video frames, (2) a vision-language model (BLIP) that describes detected frames and assesses criticality, (3) GPS-based hospital localization and emergency alerting, and (4) damage cost estimation with repair service matching.
import os
from ultralytics import YOLO
import cv2
import random

# Load your YOLO model
model = YOLO(r"F:/Projects/Accident_Detection/best(5).pt")  # Path to your YOLO weights

# Input and output video paths
input_video_path = r"F:\Projects\Accident_Detection\Indian_Car_Accident_Narrow_Escapes(720p).mp4"  # Your input video file
output_video_path = "output_video_with_detections.mp4"  # Output video file

# Directory to save accident frames (detected parts only)
accident_folder = r"F:/Projects/Accident_Detection/accident_frames"
os.makedirs(accident_folder, exist_ok=True)

# Generate a random color for each class, with red for the "Accident" class
class_colors = {}
for cls, name in model.names.items():
    if isinstance(name, str) and name.lower() == "accident":  # Ensure name is a string
        class_colors[cls] = (0, 0, 255)  # Red color (BGR format)
    else:
        class_colors[cls] = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))

# Open the input video
cap = cv2.VideoCapture(input_video_path)
fps = int(cap.get(cv2.CAP_PROP_FPS))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # Codec for .mp4
out = cv2.VideoWriter(output_video_path, fourcc, fps, (width, height))

frame_count = 0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    frame_count += 1

    # Perform YOLO detection
    results = model.predict(frame)
    save_accident_frame = False  # Flag to check if "Accident" class is detected

    # Draw detections on the frame
    for result in results:  # Iterate over the Result objects
        for box in result.boxes:  # Iterate over detected boxes
            x1, y1, x2, y2 = map(int, box.xyxy[0])  # Get bounding box coordinates
            conf = box.conf[0]  # Get confidence score
            cls = int(box.cls[0])  # Get class ID
            label = f"{model.names[cls]} {conf:.2f}"

            # Assign color based on class
            color = class_colors[cls]

            # Check if the class is "Accident"
            if model.names[cls].lower() == "accident":
                save_accident_frame = True
                # Crop the detected region (ROI)
                detected_part = frame[y1:y2, x1:x2]
                # Save the cropped part as a separate image
                accident_part_path = os.path.join(accident_folder, f"accident_frame_{frame_count:04d}_roi.jpg")
                cv2.imwrite(accident_part_path, detected_part)

            # Draw bounding box and label
            cv2.rectangle(frame, (x1, y1), (x2, y2), color, 3)  # Bounding box thickness
            font_scale = 0.7
            font_thickness = 2
            text_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, font_scale, font_thickness)[0]
            text_x, text_y = x1, y1 - 10 if y1 - 10 > 10 else y1 + 10
            cv2.rectangle(frame, (text_x, text_y - text_size[1] - 5), (text_x + text_size[0], text_y + 5), color, -1)
            cv2.putText(frame, label, (text_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, font_scale, (255, 255, 255), font_thickness)

    # Write the frame with detections to the output video
    out.write(frame)

    # Display the frame in a window
    cv2.imshow("Video with Detections", frame)

    # Check for the 'q' key to quit early
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
out.release()
cv2.destroyAllWindows()
print(f"Detection complete. Output video saved at: {output_video_path}")
print(f"Accident frames (detected parts) saved in folder: {accident_folder}")
Accident Frame Folder:
import os
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

# Load BLIP model and processor (pretrained model for image captioning)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Directory where accident frames (detected parts) are saved
accident_folder = "F:/Projects/Accident_Detection/accident_frames"
description_folder = "F:/Projects/Accident_Detection/description"
os.makedirs(description_folder, exist_ok=True)  # Create description folder if it does not exist
The analyze_accident_image function processes each image in the accident_folder, generates a description with the BLIP model, assigns a criticality level, and saves the result as a .txt file in the description_folder.

# Function to analyze images with the BLIP model and get a description
def analyze_accident_image(image_path):
    # Open image
    raw_image = Image.open(image_path).convert("RGB")

    # Preprocess the image and pass through the model
    inputs = processor(raw_image, return_tensors="pt")
    out = model.generate(**inputs)

    # Decode the output to get the description
    description = processor.decode(out[0], skip_special_tokens=True)

    # Here you can add additional logic for criticality detection
    if "accident" in description.lower():
        criticality = "High"  # Assuming the model detects something related to accidents
    else:
        criticality = "Normal"

    return description, criticality

# Loop through all saved accident frames and analyze them
for img_file in os.listdir(accident_folder):
    img_path = os.path.join(accident_folder, img_file)
    if img_file.endswith('.jpg'):
        print(f"Analyzing {img_file}...")
        description, criticality = analyze_accident_image(img_path)

        # Create a text file with the description and criticality level
        text_filename = os.path.join(description_folder, f"{img_file.replace('.jpg', '.txt')}")
        with open(text_filename, 'w') as file:
            file.write(f"Description: {description}\n")
            file.write(f"Criticality: {criticality}\n")

        # Displaying output for the user
        print(f"Text file saved: {text_filename}")
        print('-' * 50)
Example Image
After an accident is detected in the video, all relevant accident frames are stored in a separate folder for further analysis. Alongside this, a description of each frame, assessing the criticality of the situation, is saved in another folder. This organized storage aids in evaluating the severity of the incident effectively.
Our model has achieved the following performance metrics:
The confusion matrix provides a detailed breakdown of the model's performance:
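For reference, the sketch below shows one way such a confusion matrix can be computed, assuming one ground-truth and one predicted label per evaluation image. The class set and the label lists here are hypothetical placeholders, not the project's actual evaluation data.

from sklearn.metrics import confusion_matrix

class_names = ["Accident", "Non-Accident"]  # assumed class set for illustration
y_true = ["Accident", "Non-Accident", "Accident", "Accident", "Non-Accident"]   # placeholder labels
y_pred = ["Accident", "Non-Accident", "Non-Accident", "Accident", "Non-Accident"]  # placeholder predictions

cm = confusion_matrix(y_true, y_pred, labels=class_names)
print(cm)  # rows = true class, columns = predicted class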
In this project, we developed a comprehensive accident detection and response system that integrates cutting-edge technologies to enhance safety and streamline emergency responses. The system's architecture combines accident detection, a vision-language model for criticality assessment, GPS-based hospital localization, and an emergency alert mechanism. This approach not only ensures rapid response to accidents but also allows users to request emergency services efficiently.
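As a minimal sketch of the emergency-alert step, the snippet below packages the accident details as a JSON payload and posts it to an HTTP endpoint. The URL, payload fields, and send_emergency_alert helper are hypothetical placeholders; a real deployment might instead use SMS, push notifications, or a dispatcher API.

import requests

def send_emergency_alert(description, criticality, lat, lon, hospital):
    payload = {
        "description": description,
        "criticality": criticality,
        "location": {"lat": lat, "lon": lon},
        "nearest_hospital": hospital,
    }
    # Hypothetical alert endpoint for illustration only
    resp = requests.post("https://example.com/api/emergency-alert", json=payload, timeout=10)
    resp.raise_for_status()
    return resp.status_code

send_emergency_alert("car collision on highway", "High", 12.9611, 77.6387, "City General Hospital")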
Additionally, the system offers valuable post-accident support by providing cost estimates for damages and connecting users with repair service providers tailored to their needs. This feature not only simplifies the recovery process for users but also fosters partnerships with service providers, creating a mutually beneficial ecosystem.
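To illustrate the post-accident support step, the sketch below looks up a cost range from the assessed criticality and suggests repair partners. The cost ranges, provider names, and estimate_damage helper are hypothetical placeholders standing in for a real pricing model and partner database.

# Hypothetical mapping from criticality to an estimated repair cost range
repair_cost_ranges = {
    "Normal": (5_000, 20_000),    # minor body work (illustrative figures)
    "High": (50_000, 200_000),    # major structural damage (illustrative figures)
}

repair_providers = ["QuickFix Garage", "AutoCare Works"]  # hypothetical partner list

def estimate_damage(criticality):
    # Return an estimated cost range and suggested providers for the given criticality
    low, high = repair_cost_ranges.get(criticality, (0, 0))
    return {"estimated_cost_range": (low, high), "suggested_providers": repair_providers}

print(estimate_damage("High"))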