We present a project that applies AI tools, in particular computer vision, to analyze and better understand diabetic retinopathy. The disease is especially concerning: it affects many diabetic patients and is the leading cause of blindness before age 65 in France.
The project's initiative is both scientific and personal: having relatives with diabetes motivates us to propose a concrete solution to help detect and monitor retinopathy through medical imaging.
We detail how we use a transformer, specifically the Swin Transformer, to identify different forms of retinopathy from fundus images, and how attention map extraction and segmentation using Segment Anything Model (SAM) help better localize and understand lesions.
Diabetic retinopathy is a major complication of diabetes, affecting nearly 50% of type 2 diabetic patients. It primarily results from progressive damage to the small blood vessels irrigating the retina. In France, it is the leading cause of blindness in people under 65.
Detection and classification of retinopathy, particularly through fundus image analysis, is both time-consuming and technically challenging. To address this challenge, we propose a pipeline based on computer vision models to assist healthcare professionals in diagnosing and monitoring this disease.
Beyond technological research, this project is particularly meaningful to us, as family members have diabetes and risk developing ocular complications. This personal dimension motivates us to achieve high performance and make our solution accessible and practical in clinical settings.
The project repository is linked here
Our proposed methodology addresses diabetic retinopathy detection and analysis through a three-step pipeline:
(1) Classification,
(2) Attention Visualization,
and (3) Lesion Segmentation.
Below is a concise overview.
The model outputs probabilities for five classes:
0 - No DR
1 - Mild
2 - Moderate
3 - Severe
4 - Proliferative DR
By analyzing these probabilities, we gauge the image’s likelihood of belonging to each severity level.
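As a minimal sketch of this step, the five raw scores produced by the classifier can be turned into probabilities with a softmax and mapped back to the severity labels above (the function names below are illustrative, not the project's actual API):

```python
import numpy as np

DR_CLASSES = ["No DR", "Mild", "Moderate", "Severe", "Proliferative DR"]

def severity_probs(logits):
    """Convert the model's raw 5-class logits into probabilities via softmax."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()            # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def predicted_grade(logits):
    """Return (class index, label, probability) for the most likely grade."""
    p = severity_probs(logits)
    i = int(np.argmax(p))
    return i, DR_CLASSES[i], float(p[i])
```

Inspecting the full probability vector, rather than only the argmax, also makes borderline cases visible (e.g., probability mass split between "Mild" and "Moderate").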
This three-step approach—Swin-based classification, attention-based interpretability, and SAM-driven segmentation—offers an end-to-end solution for early, transparent, and accurate diabetic retinopathy diagnosis. It not only enhances clinical decision-making but also builds trust by showing exactly where and how the model detects disease-related changes.
Training uses the dataset from the APTOS 2019 Blindness Detection challenge. It contains train and test images; we use only 90% of the training images (about 3,300).
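A 90/10 split of the APTOS training set could be done as follows. This is a sketch, assuming the held-out 10% serves as a validation set and that labels live in the challenge's `diagnosis` column; stratifying keeps all five severity grades represented in both subsets:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def split_aptos(df, val_frac=0.10, seed=42):
    """Hold out `val_frac` of the images, stratified by DR grade so that
    every severity level appears in both subsets."""
    return train_test_split(df, test_size=val_frac,
                            stratify=df["diagnosis"], random_state=seed)

# train_df, val_df = split_aptos(pd.read_csv("train.csv"))
```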
We apply the following data augmentation:

```python
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((300, 300)),
    transforms.RandomCrop((224, 224)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.1),
    transforms.GaussianBlur(kernel_size=(3, 3)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```
Note: We avoid flips and rotations because retinal images have a natural orientation; flipping could destroy or confuse anatomically meaningful cues such as the relative position of the optic disc and macula.
Here are the different results obtained by our solution:
This is the overall attention as explained above. But sometimes it is not enough to detect hot areas, so we also visualize the individual attention maps of each layer.
Per-layer attention maps
SAM can also be parameterized to detect only small lesions, and the size of each detected lesion can then be computed automatically.
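Concretely, SAM's `SamAutomaticMaskGenerator` exposes knobs such as `points_per_side` and `min_mask_region_area`, and each mask it returns is a dict carrying an integer `"area"` field (its size in pixels). A hypothetical post-processing helper for the size computation might look like this; the `px_to_mm2` conversion factor depends on the fundus camera and is an assumption:

```python
def small_lesions(masks, max_area_px, px_to_mm2=None):
    """Keep only small SAM mask proposals (candidate lesions), attaching a size.

    `masks` follows SamAutomaticMaskGenerator's output format: a list of
    dicts, each with an integer "area" (mask size in pixels). If the
    pixel-to-mm^2 factor of the camera is known, sizes are reported in
    mm^2; otherwise they stay in pixels.
    """
    lesions = []
    for m in masks:
        if m["area"] <= max_area_px:
            size = m["area"] * px_to_mm2 if px_to_mm2 is not None else m["area"]
            lesions.append({**m, "size": size})
    return lesions
```

Raising `min_mask_region_area` in the generator suppresses noise speckles, while the `max_area_px` threshold above discards large anatomical structures (optic disc, vessels) so that only lesion-sized regions remain.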
Diabetic retinopathy is a pathology that can lead to blindness when not detected and managed in time. Through this project, we've developed a complete solution, from classification to segmentation, supported by interpretation capabilities offered by attention maps.