This project detects surgical instruments (nine classes, in this case) in real time through object detection, using the state-of-the-art YOLO11 model from Ultralytics, to help assist doctors with equipment handling during surgery. The research around the model highlights how, and which, computer vision techniques help mitigate the problems surrounding surgical instrument detection.
Data Collection
The data was collected by browsing the internet for each surgical instrument separately and downloading 400-500 images per class.
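As a rough illustration of that collection step, the sketch below bulk-downloads images from a plain text file of URLs gathered while browsing; the file and folder names are hypothetical placeholders, since the actual collection was done manually per class.

```python
# Minimal sketch of bulk-downloading browsed image URLs for one class.
# The URL list file and output folder are hypothetical placeholders.
import pathlib
import requests

url_file = pathlib.Path("urls/scalpel_handle.txt")      # hypothetical URL list for one class
out_dir = pathlib.Path("raw_images/scalpel_handle")
out_dir.mkdir(parents=True, exist_ok=True)

for i, url in enumerate(url_file.read_text().splitlines()):
    try:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        (out_dir / f"scalpel_handle_{i:04d}.jpg").write_bytes(resp.content)
    except requests.RequestException:
        continue  # skip broken links; roughly 400-500 usable images per class
```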
Dataset
The dataset is divided into 9 classes:
Surgical loupe - Surgical loupes are optical instruments designed to magnify the view of small objects or areas.
Abdominal retractor - These retractors are used during deep abdominal procedures to retract incisions or wound edges; the curved, blade-like frame lets the surgeon work through the abdominal cavity.
Retractor (Gelpi) - A precision surgical tool widely used for retraction of both deep and superficial tissues.
Scalpel handle - Holds various blades for making incisions.
Sternum saw - A sternal saw is a bone cutter used to perform a median sternotomy, opening the patient's chest by splitting the breastbone (sternum).
Bone rongeur - A rongeur is a heavy-duty instrument with scoop-shaped, sharp-edged tips used for gouging out bone; it is widely used to open a window in bone or skull to access the tissue underneath.
Forceps - Tissue forceps are surgical tools used to grasp and move delicate tissue during surgery.
Scissors - Surgical scissors facilitate precise and controlled cutting during surgical procedures.
Suction tube - A suction tube is a medical device used to remove fluid, mucus, or other material from the airway or other body cavities.
After the augmentation process, the complete dataset contains a total of 7,777 images.
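For training, Ultralytics expects the classes and split locations to be listed in a dataset YAML. The file below is only a sketch of what that could look like for these nine classes; the paths and class order are assumptions rather than the project's actual configuration.

```yaml
# data.yaml (illustrative sketch; paths and class order are assumed)
path: datasets/surgical-instruments
train: train/images
val: valid/images
test: test/images

names:
  0: surgical_loupe
  1: abdominal_retractor
  2: gelpi_retractor
  3: scalpel_handle
  4: sternum_saw
  5: bone_rongeur
  6: forceps
  7: scissors
  8: suction_tube
```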
The collected image folders were loaded into the Roboflow platform for cleaning, preprocessing, and augmentation.
Cleaning was performed manually during the annotation phase: outlier and otherwise unwanted images were removed. Annotation was carried out over 8 days on 3,546 images.
After annotation, preprocessing steps were added in Roboflow, including image resizing and pixel normalization.
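Those two preprocessing steps can be reproduced outside the platform in a couple of lines; the 640x640 target size below is an assumption based on the usual YOLO input size, not necessarily the setting used in Roboflow.

```python
# Sketch of the two preprocessing steps: resize to a fixed input size
# and scale pixel values into [0, 1]. The 640x640 size is assumed.
import cv2
import numpy as np

img = cv2.imread("raw_images/scalpel_handle/scalpel_handle_0000.jpg")  # hypothetical path
img = cv2.resize(img, (640, 640))        # image resizing
img = img.astype(np.float32) / 255.0     # pixel normalization
```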
Roboflow provides various image augmentation methods as the next step on the platform after preprocessing; the steps added here include blurring, noise addition, cropping, shear, horizontal flip, rotation, and saturation adjustment.
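The same set of augmentations could be approximated offline with the albumentations library; the sketch below is only an approximation, and the probabilities and parameter ranges are assumptions rather than the values configured on Roboflow.

```python
# Offline approximation of the Roboflow augmentations (sketch);
# probabilities and ranges are illustrative, not the platform settings.
import albumentations as A

augment = A.Compose(
    [
        A.Blur(blur_limit=3, p=0.3),                      # blurring
        A.GaussNoise(p=0.3),                              # noise addition
        A.RandomCrop(height=576, width=576, p=0.3),       # cropping
        A.Affine(shear=(-10, 10), p=0.3),                 # shear
        A.HorizontalFlip(p=0.5),                          # flip (horizontal)
        A.Rotate(limit=15, p=0.5),                        # rotation
        A.HueSaturationValue(sat_shift_limit=25, p=0.3),  # saturation
    ],
    # Keep YOLO-format bounding boxes consistent with the transformed image.
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# Usage: augmented = augment(image=img, bboxes=boxes, class_labels=labels)
```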
For this project I used bounding-box annotation rather than polygon annotation (which is intended for segmentation tasks), for the reasons below:
Drawing polygons for surgical instruments is far more complex than drawing bounding boxes. Polygons are shape-specific, so even a small mistake can lead to false predictions, whereas a bounding box only needs the extent and orientation of the object.
Through the research I learned that even within a single class (e.g. retractors) there are further categories, such as Gelpi, Balfour, Gosset, and Weitlaner (self-retaining) retractors, each with different functions and parts. The complexity does not end there: even within a single type, say the Gelpi retractor, shapes and sizes still differ considerably, as the two images of Gelpi retractors below show. The top sections differ while the bottoms are the same; since a polygon is shape-specific, this variation could interfere with training and the final result, whereas a bounding box captures only the overall appearance and not the details, which helps here.
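For reference, a bounding-box annotation exported for YOLO is just one line per object: a class id followed by the normalized centre coordinates and box size. The small sketch below converts a pixel-space box into that format; the class id and coordinates are made-up illustrative values.

```python
# Sketch: convert a pixel-space box (x_min, y_min, x_max, y_max) into the
# YOLO label format "class x_center y_center width height" (normalized 0-1).
def to_yolo_box(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# e.g. a Gelpi retractor (hypothetical class id 2) in a 640x640 image
print(to_yolo_box(2, 120, 80, 420, 560, 640, 640))
```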
YOLO11 from Ultralytics was selected as the preferred model because it is fast to train, highly accurate, and flexible to adapt. Model training, graph preparation, and deployment were all done using the Ultralytics software. The model achieved 90% accuracy across all classes.
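Training and validation both go through the Ultralytics Python API; the sketch below shows a minimal run, where the checkpoint name, epoch count, and image size are assumptions rather than the exact settings used for this model.

```python
# Minimal Ultralytics training sketch; checkpoint, epochs, and imgsz are
# illustrative values, not the project's exact configuration.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")      # pretrained YOLO11 checkpoint (assumed variant)
results = model.train(
    data="data.yaml",           # dataset config (see the YAML sketch above)
    epochs=100,
    imgsz=640,
)
metrics = model.val()           # per-class metrics and result plots
```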
The above results were achieved through a part-by-part training process, which took a total of 4 days to train the complete model instead of roughly 1 day of uninterrupted training. Training in parts comes with several considerations:
Loss of Momentum:
Deep learning models often benefit from momentum, where the optimization algorithm uses information from previous updates to guide the current update.
Interrupted training can disrupt this momentum, potentially slowing down convergence.
Difficulty in Hyperparameter Tuning:
Hyperparameters, like learning rate and batch size, are often tuned for a specific training schedule.
Intermittent training can make it harder to find optimal hyperparameter settings.
Potential for Overfitting:
If the model is not trained to convergence in each part, it may start overfitting the training data, leading to poorer generalization performance.
Resource Constraints:
If you have limited computational resources, training in parts can be a practical solution.
However, careful consideration should be given to the potential drawbacks.
Experimentation:
Partial training can be useful for experimenting with different hyperparameters or architectures without committing to a full training run.
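When training in parts, the Ultralytics resume mechanism restores the optimizer state and learning-rate schedule along with the weights, which limits the momentum and scheduling issues listed above. A minimal sketch, assuming the default run directory layout:

```python
# Resume an interrupted Ultralytics run from its last checkpoint (sketch);
# the run directory path is an assumption (the default layout).
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/last.pt")
model.train(resume=True)  # continues the epoch count, optimizer state, and LR schedule
```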
```python
# for prediction on images
from ultralytics import YOLO
import cv2
from google.colab.patches import cv2_imshow
import supervision as sv

model = YOLO('/content/sample_data/medins.pt')

# Read the image; it stays in BGR, which both YOLO and cv2_imshow expect.
img = cv2.imread('/content/sample_data/th (6).jpeg')

# Run inference and convert the result into supervision detections.
result = model(img)[0]
detections = sv.Detections.from_ultralytics(result)

# Draw boxes and class labels on the image.
box_annotator = sv.BoxAnnotator()
img = box_annotator.annotate(scene=img, detections=detections)
label_annotator = sv.LabelAnnotator()
img = label_annotator.annotate(scene=img, detections=detections)

img = cv2.resize(img, (800, 800))
cv2_imshow(img)
```
```python
# for prediction on video and storing results
from ultralytics import YOLO
import cv2
import supervision as sv

model = YOLO('/content/sample_data/medins.pt')

cap = cv2.VideoCapture('/content/sample_data/istockphoto-1487988244-640_adpp_is.mp4')

# Size the writer from the source video so every frame is accepted.
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
output = cv2.VideoWriter('medins1.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 30, (width, height))

# Create the annotators once instead of on every frame.
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

while True:
    ret, frame = cap.read()
    if not ret:
        break
    result = model(frame)[0]
    detections = sv.Detections.from_ultralytics(result)
    frame = box_annotator.annotate(scene=frame, detections=detections)
    frame = label_annotator.annotate(scene=frame, detections=detections)
    output.write(frame)  # frame stays in BGR, as cv2.VideoWriter expects

cap.release()
output.release()
cv2.destroyAllWindows()
```
Video Result
Link text
This report presents a surgical instrument detection model developed using YOLO11, a state-of-the-art object detection framework. The model was trained on a dataset of surgical instrument images and achieved a commendable accuracy of 90%, indicating that it can accurately identify and localize surgical instruments and tools within complex surgical scenes.
The successful implementation of YOLO11 for surgical detection demonstrates its potential to revolutionize surgical procedures. By providing real-time, accurate object detection, this technology can assist surgeons in various ways, including:
Instrument Tracking: Monitoring the location and status of surgical instruments.
Surgical Workflow Optimization: Identifying potential bottlenecks and inefficiencies in surgical procedures.
Surgical Skill Assessment: Evaluating the performance of surgical trainees.
Augmented Reality: Providing real-time visual guidance to surgeons.
Future work will focus on improving the model's accuracy and robustness, as well as exploring its integration with surgical robots and other medical devices. Additionally, expanding the dataset to include a wider range of surgical procedures and instrument types will enhance the model's generalizability.