Surgical Scene Understanding in Laparoscopic Hysterectomy via Deep Learning
Abstract
This project focuses on enhancing surgical scene understanding for laparoscopic hysterectomy procedures. Hysterectomy is a common surgery that poses significant challenges for surgeons, who must navigate anatomical structures and manipulate surgical instruments guided only by a live camera feed. This project leverages computer vision and deep learning to develop a segmentation model that classifies and highlights surgical instruments and anatomical structures. Through iterative experiments and refinements, the final prototype achieved a mean Intersection over Union (mIoU) of 69% and a DICE coefficient of 72%, contributing to advancements in minimally invasive surgical techniques.
Introduction
Laparoscopic hysterectomy is a widely performed surgical procedure that involves removing the uterus through small abdominal incisions under camera guidance. Challenges include limited depth perception and reliance on narrow fields of view, which can hinder surgical accuracy. By integrating image analysis and machine learning, this project aims to improve the operational efficiency and decision-making capabilities of surgeons.
Aims and Objectives
Develop a deep learning-based segmentation model for surgical instruments and anatomical structures.
Address challenges related to limited visualization and improve procedural efficiency.
Enhance understanding of laparoscopic hysterectomy through iterative model refinements.
Background
Image Segmentation
Classical Methods: Techniques like thresholding and edge detection are computationally efficient but struggle with complex scenes and lighting variations.
Deep Learning Methods: Approaches using neural networks, such as semantic and instance segmentation, offer improved performance by learning hierarchical representations from raw images.
Classical vs. Deep Learning Methods
Classical methods excel in simplicity and low computational cost but struggle with complex, dynamic scenes. Deep learning-based approaches, while computationally intensive, learn high-level semantic detail directly from data, making them well suited to tasks like surgical scene understanding.
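To illustrate the contrast, the snippet below applies two classical techniques (Otsu thresholding and Canny edge detection) to a single frame using OpenCV; the frame path and threshold values are illustrative, not taken from the project.

```python
import cv2

# Load a laparoscopic frame in grayscale (hypothetical path).
frame = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)

# Global Otsu thresholding: a single intensity cutoff for the whole image.
_, mask = cv2.threshold(frame, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Canny edge detection: responds to local intensity gradients rather than regions.
edges = cv2.Canny(frame, 100, 200)

cv2.imwrite("otsu_mask.png", mask)
cv2.imwrite("edges.png", edges)
```

Both techniques rely on global intensity statistics or local gradients, so specular highlights, smoke, and uneven illumination in laparoscopic footage quickly degrade their output, which motivates the learned approach pursued here.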
U-Net Architecture
U-Net, a convolutional encoder-decoder architecture with skip connections, is tailored for tasks requiring precise segmentation. It combines low-level and high-level feature maps for improved accuracy. Skip connections enhance feature propagation and reduce the loss of spatial information.
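A minimal sketch of such an encoder-decoder is shown below, assuming TensorFlow/Keras as the framework; the layer widths, depth, and class count are illustrative placeholders rather than the project's exact configuration.

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions with ReLU, as in the original U-Net.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(256, 256, 3), num_classes=10):  # num_classes is a placeholder
    inputs = layers.Input(input_shape)

    # Encoder: convolution blocks followed by max pooling.
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D()(c2)
    c3 = conv_block(p2, 128)
    p3 = layers.MaxPooling2D()(c3)

    # Bottleneck.
    b = conv_block(p3, 256)

    # Decoder: transpose convolutions with skip connections from the encoder.
    u3 = layers.Conv2DTranspose(128, 2, strides=2, padding="same")(b)
    c4 = conv_block(layers.Concatenate()([u3, c3]), 128)
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(c4)
    c5 = conv_block(layers.Concatenate()([u2, c2]), 64)
    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(c5)
    c6 = conv_block(layers.Concatenate()([u1, c1]), 32)

    # Per-pixel class probabilities via a 1x1 convolution with softmax.
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(c6)
    return Model(inputs, outputs)

model = build_unet()
```

The Concatenate layers are the skip connections: they reinject high-resolution encoder features into the decoder so fine spatial detail is not lost during upsampling.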
Dataset
Source
Dataset: AutoLaparo, comprising 1800 high-resolution annotated images from laparoscopic hysterectomy procedures.
Challenges
Significant class imbalance, with underrepresented categories like "Electric Hook" comprising only 0.2% of instances.
Variations in lighting and presence of smoke further complicate segmentation.
Data Preparation
Image Resizing: Downscaled to 256x256 pixels to optimize computational efficiency while preserving relevant details.
Mask Transformation: Converted labels to one-hot encoded format to standardize input for the segmentation model.
Augmentation: Techniques such as flipping, rotation, and brightness adjustments were applied to diversify the training set.
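A minimal sketch of the preparation steps above, assuming OpenCV, Albumentations, and Keras utilities; the class count and augmentation parameters are placeholders rather than the project's exact settings.

```python
import cv2
import numpy as np
import albumentations as A
from tensorflow.keras.utils import to_categorical

NUM_CLASSES = 10          # placeholder; set to the number of AutoLaparo classes
TARGET_SIZE = (256, 256)

# Augmentations mirroring those described above: flips, rotations, brightness.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=20, p=0.5),
    A.RandomBrightnessContrast(p=0.5),
])

def prepare_sample(image, mask):
    # Downscale image and label mask to 256x256; nearest-neighbour keeps labels intact.
    image = cv2.resize(image, TARGET_SIZE, interpolation=cv2.INTER_AREA)
    mask = cv2.resize(mask, TARGET_SIZE, interpolation=cv2.INTER_NEAREST)

    # Apply the same random augmentation to image and mask.
    augmented = augment(image=image, mask=mask)
    image, mask = augmented["image"], augmented["mask"]

    # Normalise pixel intensities and one-hot encode the integer label mask.
    image = image.astype(np.float32) / 255.0
    mask = to_categorical(mask, num_classes=NUM_CLASSES)
    return image, mask
```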
Methodology
Model Architecture
Encoder: Captures hierarchical features using convolutional layers and max pooling.
Decoder: Upsamples features using transpose convolution layers and integrates spatial details through skip connections.
Final SoftMax Layer: Outputs class probabilities for each pixel.
Iterative Refinements
Added dropout layers to mitigate overfitting.
Used ReLU activations in the hidden layers and SoftMax at the output layer.
Loss Functions
Categorical Cross-Entropy: Baseline loss function.
Weighted Cross-Entropy: Addressed class imbalance by assigning weights to categories based on their frequency.
Focal Loss: Down-weighted easy, well-classified pixels via the alpha and gamma modulation parameters, focusing training on challenging instances.
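A sketch of the weighted cross-entropy and focal loss variants described above, written for TensorFlow/Keras with one-hot targets; the clipping constant and the default alpha and gamma values are illustrative, not the project's tuned settings.

```python
import tensorflow as tf

def weighted_categorical_crossentropy(class_weights):
    # class_weights: one weight per class, e.g. inverse class frequency.
    weights = tf.constant(class_weights, dtype=tf.float32)

    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0)
        # Per-pixel cross-entropy, scaled by the weight of the true class.
        ce = -y_true * tf.math.log(y_pred) * weights
        return tf.reduce_mean(tf.reduce_sum(ce, axis=-1))

    return loss

def categorical_focal_loss(alpha=0.25, gamma=2.0):
    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0)
        ce = -y_true * tf.math.log(y_pred)
        # The (1 - p)^gamma term down-weights easy, well-classified pixels.
        focal = alpha * tf.pow(1.0 - y_pred, gamma) * ce
        return tf.reduce_mean(tf.reduce_sum(focal, axis=-1))

    return loss
```

Either function can be passed directly to model.compile(loss=...) in place of the baseline categorical cross-entropy.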
Optimization Techniques
Adaptive learning rate starting at 0.001, reduced by 80% after five epochs without improvement.
Early stopping implemented to prevent overfitting by halting training after 10 epochs without validation loss improvement.
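These schedules correspond to standard Keras callbacks, assuming validation loss is the monitored quantity; model, train_ds, and val_ds stand in for the model and datasets sketched earlier, and the epoch budget is illustrative.

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.optimizers import Adam

# Learning rate starts at 0.001 and is cut by 80% (factor of 0.2)
# after five epochs without improvement in validation loss.
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=5, min_lr=1e-6)

# Training halts after ten epochs without validation-loss improvement.
early_stop = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)

model.compile(optimizer=Adam(learning_rate=0.001),
              loss="categorical_crossentropy")
model.fit(train_ds, validation_data=val_ds, epochs=100,
          callbacks=[reduce_lr, early_stop])
```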
Evaluation Metrics
Pixel-Wise Accuracy: Measures the fraction of correctly classified pixels, but can be inflated by dominant classes and says little about per-class or boundary quality.
Mean Intersection Over Union (mIoU): Quantifies overlap between predicted and ground truth masks.
Mean DICE Coefficient: Evaluates spatial similarity, offering insights into boundary precision.
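A minimal NumPy sketch of per-class IoU and DICE computed from integer label masks; averaging across classes gives the mean scores reported in the next section.

```python
import numpy as np

def per_class_iou_dice(y_true, y_pred, num_classes):
    """Compute IoU and DICE for each class from integer label masks."""
    ious, dices = [], []
    for c in range(num_classes):
        true_c = (y_true == c)
        pred_c = (y_pred == c)
        intersection = np.logical_and(true_c, pred_c).sum()
        union = np.logical_or(true_c, pred_c).sum()
        total = true_c.sum() + pred_c.sum()
        ious.append(intersection / union if union else np.nan)
        dices.append(2 * intersection / total if total else np.nan)
    return np.array(ious), np.array(dices)

# Classes absent from both prediction and ground truth are ignored in the means:
# mean_iou, mean_dice = np.nanmean(ious), np.nanmean(dices)
```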
Results and Experiments
Performance Metrics
Impact of Augmentation
Augmentation increased mIoU from 59% to 64% and Mean DICE from 65% to 70%, demonstrating its effectiveness in improving generalization.
Class-Specific IoU Scores
Class imbalance led to poor IoU for underrepresented categories like "Electric Hook" (3%). Weighted Cross-Entropy improved this to 28%.
Final Results and Visual Results
The final model achieved a mean IoU of 69% and a mean DICE coefficient of 72%.
Prototype Implementation
Real-Time Segmentation
Processes video frames at 36 FPS using an NVIDIA GTX 1050 Ti GPU.
Outputs segmentation overlays for enhanced surgical guidance.
Prototype Workflow
Input images/videos resized to 256x256 pixels.
Batch processing for real-time predictions.
Outputs RGB segmentation masks overlaid on the original frames at 80% opacity.
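A sketch of such a prototype loop using OpenCV, assuming the Keras model from the earlier sketches; the video path and colour palette are placeholders, and frames are predicted one at a time here for clarity rather than in batches as described above.

```python
import cv2
import numpy as np

# Placeholder colour palette: one BGR colour per class.
PALETTE = np.random.default_rng(0).integers(0, 255, size=(10, 3), dtype=np.uint8)

cap = cv2.VideoCapture("procedure.mp4")   # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Resize to the model's 256x256 input and predict per-pixel class labels.
    small = cv2.resize(frame, (256, 256), interpolation=cv2.INTER_AREA)
    probs = model.predict(small[np.newaxis].astype(np.float32) / 255.0, verbose=0)[0]
    labels = np.argmax(probs, axis=-1).astype(np.uint8)

    # Colourise the mask, upscale to the frame size, and overlay at 80% opacity.
    colour_mask = PALETTE[labels]
    colour_mask = cv2.resize(colour_mask, (frame.shape[1], frame.shape[0]),
                             interpolation=cv2.INTER_NEAREST)
    overlay = cv2.addWeighted(frame, 0.2, colour_mask, 0.8, 0)

    cv2.imshow("segmentation", overlay)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```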
Challenges and Limitations
Class Imbalance: Underrepresented categories had poor segmentation accuracy.
Low Brightness and Smoke: Affected segmentation reliability.
Proximity Issues: Close proximity of tools to the camera reduced segmentation quality.
Future Work
Attention Mechanisms: Enhance U-Net with attention layers to focus on critical regions.
Dataset Expansion: Incorporate more diverse annotated images to improve model robustness.
Temporal Context: Add RNNs to improve video segmentation consistency.
Integrated Solutions: Combine segmentation with surgical phase detection for holistic scene understanding.
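As an illustration of the first direction, an additive attention gate in the style of Attention U-Net could be placed on each skip connection. The sketch below is a possible extension in Keras, not part of the implemented prototype, and assumes the skip and gating tensors share spatial dimensions.

```python
from tensorflow.keras import layers

def attention_gate(skip, gating, inter_channels):
    # Project the encoder skip features and the decoder gating signal into a
    # common space, combine them, and derive per-pixel attention weights.
    theta = layers.Conv2D(inter_channels, 1)(skip)
    phi = layers.Conv2D(inter_channels, 1)(gating)
    attn = layers.Activation("relu")(layers.Add()([theta, phi]))
    attn = layers.Conv2D(1, 1, activation="sigmoid")(attn)
    # Suppress irrelevant regions of the skip connection before concatenation.
    return layers.Multiply()([skip, attn])
```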
Conclusion
This project demonstrates the feasibility of using deep learning for surgical scene understanding in laparoscopic hysterectomy. While the results are respectable, further advances are needed to improve segmentation accuracy and robustness, particularly for underrepresented categories and under challenging imaging conditions.