UnsupervisedCellImageSegmentation
Abstract
This project implements an unsupervised approach to segmenting cell images into three regions: Background, Cytoplasm, and Nuclei. Five clustering techniques (KMeans, MiniBatchKMeans, BisectingKMeans, Birch, and GaussianMixture) are compared by their Jaccard scores on a small dataset of five images. In addition, an ensemble approach, referred to as "Clusterer", aggregates the outputs of the individual algorithms through majority voting to achieve more robust and consistent segmentation. The findings highlight the ensemble's reliability and stability, along with key insights into the strengths and weaknesses of the individual clustering methods.
Methodology
Dataset
The dataset consists of five cell images (12a.jpg to 12e.jpg). Each algorithm's segmentation of an image is compared against a reference segmentation using the Jaccard score to quantify clustering performance.
Clustering Algorithms
- KMeans: Minimizes within-cluster variance by partitioning the data into k clusters.
- MiniBatchKMeans: A faster variant of KMeans, using mini-batches for centroid updates.
- BisectingKMeans: A divisive hierarchical approach that repeatedly splits one cluster into two using KMeans.
- Birch: Builds a Clustering Feature (CF) tree that summarizes the data incrementally, making it efficient for large datasets.
- GaussianMixture: Models data using a mixture of Gaussian distributions to capture complex structures.
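The algorithm names above match scikit-learn estimator classes, so the project likely relies on that library, although this README does not say so. As a dependency-free illustration of the simplest of them, here is a minimal sketch of KMeans (Lloyd's algorithm) on scalar pixel intensities with k = 3, one cluster per target region. Treating pixels as grayscale values and initialising centroids at evenly spaced quantiles are assumptions made for this sketch (scikit-learn's `KMeans` uses k-means++ initialisation by default), and `kmeans_1d` is a hypothetical helper, not the project's code.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's algorithm on scalar values such as grayscale pixel intensities.

    Returns (labels, centroids), with clusters renumbered by ascending centroid.
    """
    ordered = sorted(values)
    # Assumed initialisation: centroids at evenly spaced quantiles of the data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members (kept if empty).
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: the assignments no longer move any centroid
        centroids = new_centroids
    # Renumber clusters so that label 0 has the smallest centroid.
    order = sorted(range(k), key=lambda c: centroids[c])
    rank = {old: new for new, old in enumerate(order)}
    return [rank[lab] for lab in labels], [float(centroids[c]) for c in order]


# Toy "pixels" with three well-separated intensity levels.
labels, centroids = kmeans_1d([10, 12, 11, 100, 105, 98, 200, 210, 205])
print(labels, centroids)
```

Renumbering by ascending centroid only makes the output deterministic; which cluster corresponds to Background, Cytoplasm, or Nuclei still depends on the staining and has to be decided separately.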
Ensemble Method - Clusterer
A majority voting mechanism combines the outputs of all individual clustering algorithms to produce an aggregated result for each image. This ensemble method aims to improve performance stability and address inconsistencies in individual algorithm predictions.
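Cluster IDs are arbitrary (one algorithm's cluster 0 may be another's cluster 2), so a majority vote is only meaningful after the labelings have been aligned. The README does not describe how Clusterer handles this. The sketch below assumes each labeling is aligned to the first one by trying every permutation of the k labels and keeping the one with the most per-pixel agreement, followed by a per-pixel vote. `align_labels` and `majority_vote` are hypothetical names.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, labels, k=3):
    """Relabel `labels` using the permutation of 0..k-1 that agrees most with `reference`."""
    best_perm, best_agreement = tuple(range(k)), -1
    for perm in permutations(range(k)):
        agreement = sum(ref == perm[lab] for ref, lab in zip(reference, labels))
        if agreement > best_agreement:
            best_perm, best_agreement = perm, agreement
    return [best_perm[lab] for lab in labels]


def majority_vote(label_maps, k=3):
    """Per-pixel majority vote over several flattened label maps of the same image.

    Every map is first aligned to the first one, because cluster IDs are arbitrary.
    Ties go to the label seen first at that pixel (Counter.most_common keeps
    first-encountered order among equal counts).
    """
    reference = label_maps[0]
    aligned = [reference] + [align_labels(reference, m, k) for m in label_maps[1:]]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


# Three clusterings of six pixels: b is a relabeled copy of a, c disagrees on one pixel.
a = [0, 0, 1, 1, 2, 2]
b = [2, 2, 0, 0, 1, 1]
c = [0, 0, 1, 2, 2, 2]
print(majority_vote([a, b, c]))  # → [0, 0, 1, 1, 2, 2]
```

Brute-force alignment costs k! permutations, which is trivial for three regions; for larger k, an assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` could do the same job.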
Evaluation Metrics
- Jaccard scores are computed for each algorithm across all images.
- Statistical analysis of scores includes the mean, standard deviation, and quartiles.
- Special attention is given to outlier cases and performance variability.
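The Jaccard score for a class is the number of pixels assigned to that class in both the prediction and the reference, divided by the number assigned to it in either. The README does not state how scores are combined over the three regions. This sketch assumes an unweighted (macro) mean over Background, Cytoplasm, and Nuclei, encoded as labels 0, 1, and 2, and assumes cluster IDs have already been mapped to those classes. Skipping classes absent from both maps mirrors what scikit-learn's `jaccard_score(..., average="macro")` should do when no explicit `labels` are passed.

```python
def jaccard_per_class(y_true, y_pred, labels=(0, 1, 2)):
    """Jaccard index |true ∩ pred| / |true ∪ pred| for each class present in either map."""
    scores = {}
    for c in labels:
        intersection = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:  # a class absent from both maps is skipped rather than scored
            scores[c] = intersection / union
    return scores


def mean_jaccard(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted (macro) mean of the per-class Jaccard scores."""
    per_class = jaccard_per_class(y_true, y_pred, labels)
    return sum(per_class.values()) / len(per_class)


# 0 = Background, 1 = Cytoplasm, 2 = Nuclei (assumed encoding), flattened pixels.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 2]
print(jaccard_per_class(truth, pred))
print(mean_jaccard(truth, pred))
```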
Analysis
Results are summarized and visualized to identify trends, assess consistency, and evaluate the benefits of the ensemble approach.
Results
Performance Comparison
- KMeans achieved the highest average Jaccard score (0.915052) and the highest maximum score (0.931569).
- The ensemble approach (Clusterer) was the most consistent method, with the lowest standard deviation (0.004409), outperforming the individual algorithms in stability.
Algorithm Stability
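- MiniBatchKMeans and BisectingKMeans showed significant variability (standard deviations of 0.097124 and 0.116427, respectively).
- Birch and GaussianMixture provided moderate, stable results (means of 0.900626 and 0.902932; standard deviations of 0.013577 and 0.012334) but lagged slightly behind KMeans.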
Ensemble Insights
- Clusterer improved performance consistency, particularly in cases where individual algorithms diverged.
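- The ensemble's mean Jaccard score (0.911794) was higher than that of every individual algorithm except KMeans (0.915052). On individual images it scored highest on "12b.jpg" and "12d.jpg", but fell short of the best method where one or two algorithms clearly outperformed the rest (e.g., "12e.jpg").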
Outlier Case ("12e.jpg")
- KMeans (0.916694) and GaussianMixture (0.915642) excelled, while MiniBatchKMeans (0.696940) and BisectingKMeans (0.636804) underperformed.
- The ensemble approach (0.906820) fell short of the top-performing individual methods, highlighting the limitations of majority voting in highly variable scenarios.
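The per-image observations in the Ensemble Insights and Outlier Case bullets can be checked directly against the Individual Image Performance Table. This sketch copies those scores and, for each image, computes the spread between the best and worst individual algorithm and how far Clusterer falls below the best one (a negative gap means Clusterer scored highest). Using max minus min as the measure of disagreement is an illustrative choice, not something the README specifies, and `per_image_summary` is a hypothetical helper.

```python
ALGORITHMS = ["KMeans", "MiniBatchKMeans", "BisectingKMeans", "Birch", "GaussianMixture", "Clusterer"]

# Jaccard scores copied from the Individual Image Performance Table.
TABLE = {
    "12a.jpg": [0.931569, 0.925057, 0.866394, 0.912475, 0.913492, 0.916149],
    "12b.jpg": [0.913992, 0.913760, 0.899200, 0.906678, 0.904579, 0.914282],
    "12c.jpg": [0.907294, 0.907246, 0.904665, 0.877766, 0.887136, 0.907246],
    "12d.jpg": [0.905713, 0.908069, 0.907775, 0.899540, 0.893810, 0.914475],
    "12e.jpg": [0.916694, 0.696940, 0.636804, 0.906669, 0.915642, 0.906820],
}


def per_image_summary(table):
    """For each image: spread of individual scores, best individual algorithm, Clusterer's gap to it."""
    summary = {}
    for image, row in table.items():
        scores = dict(zip(ALGORITHMS, row))
        ensemble = scores.pop("Clusterer")
        best = max(scores, key=scores.get)
        summary[image] = {
            "spread": max(scores.values()) - min(scores.values()),
            "best": best,
            "gap_to_best": scores[best] - ensemble,  # negative: Clusterer beat every individual method
        }
    return summary


summary = per_image_summary(TABLE)
most_variable = max(summary, key=lambda image: summary[image]["spread"])
for image, info in summary.items():
    print(f"{image}: spread={info['spread']:.4f} best={info['best']} gap={info['gap_to_best']:+.6f}")
print("most variable image:", most_variable)
```

On these values, "12e.jpg" has by far the largest spread (about 0.28), and the gap is negative for "12b.jpg" and "12d.jpg".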
Summary Tables
See the following tables for detailed results and descriptive statistics:
Individual Image Performance Table
This table provides Jaccard scores for each clustering algorithm on individual images:
Image Name | KMeans | MiniBatchKMeans | BisectingKMeans | Birch | GaussianMixture | Clusterer |
---|---|---|---|---|---|---|
12a.jpg | 0.931569 | 0.925057 | 0.866394 | 0.912475 | 0.913492 | 0.916149 |
12b.jpg | 0.913992 | 0.913760 | 0.899200 | 0.906678 | 0.904579 | 0.914282 |
12c.jpg | 0.907294 | 0.907246 | 0.904665 | 0.877766 | 0.887136 | 0.907246 |
12d.jpg | 0.905713 | 0.908069 | 0.907775 | 0.899540 | 0.893810 | 0.914475 |
12e.jpg | 0.916694 | 0.696940 | 0.636804 | 0.906669 | 0.915642 | 0.906820 |
Statistical Summary Table
This table summarizes the descriptive statistics for Jaccard scores across all images:
Metric | KMeans | MiniBatchKMeans | BisectingKMeans | Birch | GaussianMixture | Clusterer |
---|---|---|---|---|---|---|
Count | 5.000000 | 5.000000 | 5.000000 | 5.000000 | 5.000000 | 5.000000 |
Mean | 0.915052 | 0.870214 | 0.842968 | 0.900626 | 0.902932 | 0.911794 |
Std | 0.010296 | 0.097124 | 0.116427 | 0.013577 | 0.012334 | 0.004409 |
Min | 0.905713 | 0.696940 | 0.636804 | 0.877766 | 0.887136 | 0.906820 |
25% | 0.907294 | 0.907246 | 0.866394 | 0.899540 | 0.893810 | 0.907246 |
50% | 0.913992 | 0.908069 | 0.899200 | 0.906669 | 0.904579 | 0.914282 |
75% | 0.916694 | 0.913760 | 0.904665 | 0.906678 | 0.913492 | 0.914475 |
Max | 0.931569 | 0.925057 | 0.907775 | 0.912475 | 0.915642 | 0.916149 |
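The rows of this table (count, mean, std, min, 25%, 50%, 75%, max) are the same ones pandas' `DataFrame.describe()` produces, so the table was plausibly generated that way, although the README does not say so. The standard-library sketch below reproduces those statistics for one column, assuming pandas' defaults: sample standard deviation (ddof = 1) and linear interpolation between order statistics for the quartiles, which `statistics.stdev` and `statistics.quantiles(..., method="inclusive")` match. The `describe` function here is a hypothetical helper.

```python
import statistics


def describe(scores):
    """Count, mean, sample std, min, quartiles and max for one algorithm's scores."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "count": len(scores),
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),  # sample standard deviation (ddof = 1)
        "min": min(scores),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(scores),
    }


# KMeans column of the Individual Image Performance Table (12a.jpg to 12e.jpg).
kmeans_scores = [0.931569, 0.913992, 0.907294, 0.905713, 0.916694]
for metric, value in describe(kmeans_scores).items():
    print(f"{metric:>5}: {value:.6f}")
```

Run on the KMeans column, this reproduces the KMeans column of the table above after rounding to six decimals.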