UnsupervisedCellImageSegmentation

Abstract

This project implements an unsupervised approach to segmenting cell images into three regions: Background, Cytoplasm, and Nuclei. It compares five clustering algorithms (KMeans, MiniBatchKMeans, BisectingKMeans, Birch, and GaussianMixture) by their Jaccard scores on a small dataset of five images. An ensemble approach (referred to as "Clusterer") then aggregates the outputs of the individual algorithms through majority voting to achieve robust and consistent segmentation. The findings highlight the ensemble's reliability and stability, along with key insights into the strengths and weaknesses of the individual clustering methods.


Methodology

Dataset

The dataset consists of five segmented cell images (12a.jpg through 12e.jpg). The output of each clustering algorithm on each image is scored with the Jaccard index to quantify clustering performance.

Clustering Algorithms

  • KMeans: Partitions the data into k clusters so as to minimize within-cluster variance.
  • MiniBatchKMeans: A faster variant of KMeans, using mini-batches for centroid updates.
  • BisectingKMeans: A hierarchical approach, iteratively splitting clusters using KMeans.
  • Birch: Uses a tree structure for efficient clustering of large datasets.
  • GaussianMixture: Models data using a mixture of Gaussian distributions to capture complex structures.
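To make the KMeans entry above concrete, here is a minimal sketch of Lloyd's algorithm on scalar pixel intensities, written in plain Python. It is not the implementation used in this project: the function name `kmeans_1d`, the evenly spaced initial centroids, and the toy intensity values are assumptions made for illustration only.

```python
def kmeans_1d(values, k=3, iters=50):
    """Lloyd's algorithm on scalar values; returns (centroids, labels).

    Assumption: centroids start evenly spaced between min and max, which
    keeps the sketch deterministic. Library implementations typically use
    a randomized initialization instead. Requires k >= 2.
    """
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: (v - centroids[j]) ** 2)
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels


# Hypothetical grayscale intensities forming three well-separated groups.
pixels = [10, 12, 11, 120, 125, 118, 240, 245, 238]
centroids, labels = kmeans_1d(pixels, k=3)
print(centroids)  # [11.0, 121.0, 241.0]
print(labels)     # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

With this initialization the cluster indices come out ordered by intensity, which is one simple way to keep labels comparable across algorithms before their outputs are combined.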

Ensemble Method - Clusterer

A majority voting mechanism combines the outputs of all individual clustering algorithms to produce an aggregated result for each image. This ensemble method aims to improve performance stability and address inconsistencies in individual algorithm predictions.
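The voting step can be sketched as a per-pixel majority vote over the label maps produced by the individual algorithms. This is an illustrative sketch rather than the project's Clusterer code: the function name `majority_vote` and the toy label maps are hypothetical, and it assumes the cluster labels have already been aligned across algorithms (cluster indices are arbitrary, so label 0 must mean the same region in every map before votes are counted).

```python
from collections import Counter


def majority_vote(label_maps):
    """Fuse several flattened label maps by per-pixel majority vote.

    Assumption: labels are already aligned across algorithms.
    Ties go to the label that appears first among the votes, because
    Counter.most_common orders equal counts by first occurrence.
    """
    n_pixels = len(label_maps[0])
    if any(len(m) != n_pixels for m in label_maps):
        raise ValueError("all label maps must have the same length")
    fused = []
    for votes in zip(*label_maps):  # one tuple of votes per pixel
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Five hypothetical algorithms voting on four pixels.
label_maps = [
    [0, 1, 2, 2],
    [0, 1, 1, 2],
    [0, 2, 1, 2],
    [1, 1, 2, 2],
    [0, 1, 2, 0],
]
fused = majority_vote(label_maps)
print(fused)  # [0, 1, 2, 2]
```

With five voters and three classes, a strict majority is not guaranteed for every pixel, so the tie-breaking rule is a design choice that can affect the fused result.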

Evaluation Metrics

  • Jaccard scores are computed for each algorithm across all images.
  • Statistical analysis of scores includes the mean, standard deviation, and quartiles.
  • Special attention is given to outlier cases and performance variability.
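For reference, the Jaccard score of one class is the size of the intersection of the predicted and reference pixel sets divided by the size of their union. The sketch below computes it per class and averages over classes. The source does not state how scores were averaged across the three regions, so the unweighted (macro) mean, the function names, and the toy labels are assumptions.

```python
def jaccard_per_class(y_true, y_pred, classes=(0, 1, 2)):
    """Return {class: |intersection| / |union|} over flattened label maps.

    Classes absent from both maps are skipped (their union is empty).
    """
    scores = {}
    for c in classes:
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:
            scores[c] = inter / union
    return scores


def mean_jaccard(y_true, y_pred, classes=(0, 1, 2)):
    """Unweighted mean of the per-class Jaccard scores (an assumed choice)."""
    scores = jaccard_per_class(y_true, y_pred, classes)
    return sum(scores.values()) / len(scores)


# Hypothetical reference and predicted labels for six pixels.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
per_class = jaccard_per_class(y_true, y_pred)
print(per_class)                     # class 0: 2/2, class 1: 1/2, class 2: 2/3
print(mean_jaccard(y_true, y_pred))  # (1 + 1/2 + 2/3) / 3 = 13/18
```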

Analysis

Results are summarized and visualized to identify trends, assess consistency, and evaluate the benefits of the ensemble approach.


Results

Performance Comparison

  • KMeans achieved the highest average Jaccard score (0.915052) and the highest maximum score (0.931569).
  • The ensemble approach (Clusterer) demonstrated consistent performance (mean 0.911794, standard deviation 0.004409), outperforming individual algorithms in stability.

Algorithm Stability

  • MiniBatchKMeans and BisectingKMeans showed significant variability, with standard deviations of 0.097124 and 0.116427, respectively.
  • Birch and GaussianMixture provided moderate and stable results but lagged slightly behind KMeans.

Ensemble Insights

  • Clusterer improved performance consistency, particularly in cases where individual algorithms diverged.
  • The ensemble approach generally matched or exceeded the average scores of individual algorithms, except in cases where one algorithm significantly outperformed others (e.g., "12e.jpg").

Outlier Case ("12e.jpg")

  • KMeans and GaussianMixture excelled (0.916694 and 0.915642, respectively), while MiniBatchKMeans and BisectingKMeans underperformed (0.696940 and 0.636804, respectively).
  • The ensemble approach fell short of the top-performing individual methods, highlighting limitations of majority voting in highly variable scenarios.

Summary Tables

See the following tables for detailed results and descriptive statistics:

  1. Individual Image Performance Table
  2. Statistical Summary Table

Individual Image Performance Table

This table provides Jaccard scores for each clustering algorithm on individual images:

| Image Name | KMeans | MiniBatchKMeans | BisectingKMeans | Birch | GaussianMixture | Clusterer |
|---|---|---|---|---|---|---|
| 12a.jpg | 0.931569 | 0.925057 | 0.866394 | 0.912475 | 0.913492 | 0.916149 |
| 12b.jpg | 0.913992 | 0.913760 | 0.899200 | 0.906678 | 0.904579 | 0.914282 |
| 12c.jpg | 0.907294 | 0.907246 | 0.904665 | 0.877766 | 0.887136 | 0.907246 |
| 12d.jpg | 0.905713 | 0.908069 | 0.907775 | 0.899540 | 0.893810 | 0.914475 |
| 12e.jpg | 0.916694 | 0.696940 | 0.636804 | 0.906669 | 0.915642 | 0.906820 |

Statistical Summary Table

This table summarizes the descriptive statistics for Jaccard scores across all images:

| Metric | KMeans | MiniBatchKMeans | BisectingKMeans | Birch | GaussianMixture | Clusterer |
|---|---|---|---|---|---|---|
| Count | 5.000000 | 5.000000 | 5.000000 | 5.000000 | 5.000000 | 5.000000 |
| Mean | 0.915052 | 0.870214 | 0.842968 | 0.900626 | 0.902932 | 0.911794 |
| Std | 0.010296 | 0.097124 | 0.116427 | 0.013577 | 0.012334 | 0.004409 |
| Min | 0.905713 | 0.696940 | 0.636804 | 0.877766 | 0.887136 | 0.906820 |
| 25% | 0.907294 | 0.907246 | 0.866394 | 0.899540 | 0.893810 | 0.907246 |
| 50% | 0.913992 | 0.908069 | 0.899200 | 0.906669 | 0.904579 | 0.914282 |
| 75% | 0.916694 | 0.913760 | 0.904665 | 0.906678 | 0.913492 | 0.914475 |
| Max | 0.931569 | 0.925057 | 0.907775 | 0.912475 | 0.915642 | 0.916149 |
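The summary statistics can be recomputed from the per-image scores with Python's standard library, as sketched below. The sketch assumes the table reports the sample standard deviation (n - 1 in the denominator) and linearly interpolated quartiles. The source does not state these conventions, but they reproduce the tabulated values. The helper name `describe` is hypothetical.

```python
import statistics

# Per-image Jaccard scores copied from the Individual Image Performance Table
# (order: 12a.jpg, 12b.jpg, 12c.jpg, 12d.jpg, 12e.jpg).
scores = {
    "KMeans":          [0.931569, 0.913992, 0.907294, 0.905713, 0.916694],
    "MiniBatchKMeans": [0.925057, 0.913760, 0.907246, 0.908069, 0.696940],
    "BisectingKMeans": [0.866394, 0.899200, 0.904665, 0.907775, 0.636804],
    "Birch":           [0.912475, 0.906678, 0.877766, 0.899540, 0.906669],
    "GaussianMixture": [0.913492, 0.904579, 0.887136, 0.893810, 0.915642],
    "Clusterer":       [0.916149, 0.914282, 0.907246, 0.914475, 0.906820],
}


def describe(values):
    """Descriptive statistics under the assumed conventions."""
    # "inclusive" interpolates linearly between the sorted data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


summary = {name: describe(vals) for name, vals in scores.items()}
for name, stats in summary.items():
    print(f"{name:>16}  mean={stats['mean']:.6f}  std={stats['std']:.6f}")
```

The standard deviations make the stability comparison explicit: the single low score on 12e.jpg dominates the spread for MiniBatchKMeans and BisectingKMeans, while the Clusterer column has the smallest spread of all six.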
