Brain tumor detection from MRI scans is a crucial task in medical diagnostics. This project presents an automated deep learning approach using the YOLOv8n (nano) object detection model from Ultralytics. The model is fine-tuned on a publicly available, annotated dataset with four classes: Pituitary, Meningioma, Glioma, and No Tumor. It achieves an overall mAP@0.50 of 96.8% and mAP@0.50:0.95 of 79.7% with real-time inference speed, making it a promising tool for aiding radiologists in early diagnosis.
Brain tumors are life-threatening anomalies that require early detection for successful treatment. Manual examination of MRI images can be tedious, error-prone, and inconsistent due to human subjectivity. With the advent of deep learning, particularly object detection algorithms, there is a growing interest in automating tumor detection to enhance diagnostic accuracy and speed.
This study applies YOLOv8n, a lightweight real-time object detection model, to detect and localize brain tumors in MRI images. By leveraging its speed and accuracy, the project aims to create a deployable solution for medical settings where quick diagnosis is essential.
Dataset
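Source: Kaggle - MRI for Brain Tumor with Bounding Boxes (ahmedsorour1/mri-for-brain-tumor-with-bounding-boxes)
Classes: Pituitary, Meningioma, Glioma, No Tumor
Structure:
Each image has a matching .txt label file in YOLO format (one line per box: class x_center y_center width height, normalized to [0, 1]).
Split into Train and Val folders, each with one subfolder per class containing images/ and labels/ directories.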
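The load_data function further below keeps only the first line of each label file, yet the results report 554 instances in 512 images, so some label files hold more than one box. The sketch below reads every line and converts each normalized YOLO box to pixel corner coordinates. It assumes the standard five-field YOLO detection format; the function names are illustrative and not part of the project code.

```python
def yolo_line_to_xyxy(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, x1, y1, x2, y2) in pixels.

    YOLO format: "class x_center y_center width height", coordinates normalized to [0, 1].
    """
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    class_id = int(fields[0])
    xc, yc, w, h = (float(v) for v in fields[1:])
    # Box center +/- half the size, scaled from [0, 1] to pixels
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return class_id, x1, y1, x2, y2


def parse_label_text(text, img_w, img_h):
    """Parse every non-empty line of a label file (one box per line)."""
    return [yolo_line_to_xyxy(line, img_w, img_h)
            for line in text.splitlines() if line.strip()]


# Example: a label file with two boxes for a 640x480 image
label_text = "2 0.5 0.5 0.25 0.5\n0 0.25 0.25 0.1 0.1\n"
for box in parse_label_text(label_text, 640, 480):
    print(box)
```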
Model
YOLOv8n (Nano version) from Ultralytics is used for its lightweight architecture suitable for real-time inference.
Pretrained weights are fine-tuned on the MRI dataset using Ultralytics’ training loop.
Pipeline
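Preprocessing:
Convert images from OpenCV's BGR channel order to RGB
Validate image-label pairs, skipping images whose label file is missing or empty
Training Configuration:
Custom dataset in YOLO format, described by a data.yaml file
Fine-tuned from COCO-pretrained yolov8n.pt weights for 20 epochs at an image size of 640
Trained using Google Colab with GPU acceleration
Evaluation:
Precision, recall, mAP@0.50, and mAP@0.50:0.95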
Class-wise and overall performance
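The evaluation metrics rest on intersection over union (IoU). For mAP@0.50, a prediction counts as a true positive only if it overlaps a not-yet-matched ground-truth box of the same class with IoU of at least 0.5; mAP@0.50:0.95 averages AP over the ten thresholds 0.50, 0.55, ..., 0.95, and mAP is the mean of the per-class AP values. The sketch below computes single-class AP with VOC-style greedy matching and all-point interpolation. It is an illustration of the metric, not Ultralytics' own evaluator, which differs in details such as its interpolation scheme.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_thr=0.5):
    """AP for a single class.

    detections: list of (image_id, confidence, box)
    ground_truths: dict mapping image_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truths.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    tp = []
    # Visit detections from most to least confident
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # True positive only if the best-matching ground truth is not already taken
        if best_iou >= iou_thr and not used[img][best_j]:
            used[img][best_j] = True
            tp.append(1.0)
        else:
            tp.append(0.0)
    cum_tp = np.cumsum(tp)
    recall = cum_tp / n_gt
    precision = cum_tp / np.arange(1, len(tp) + 1)
    # All-point interpolation: take the precision envelope, then the area under the step curve
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([1.0], precision, [0.0]))
    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))


def ap_50_95(detections, ground_truths):
    """Mean AP over the IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean([average_precision(detections, ground_truths, t) for t in thresholds]))


# Toy example: one ground-truth box, one correct and one spurious detection
gts = {"img0": [(0, 0, 10, 10)]}
dets = [("img0", 0.9, (0, 0, 10, 10)), ("img0", 0.8, (50, 50, 60, 60))]
print(average_precision(dets, gts))  # 1.0
```

The toy example scores 1.0 because the only correct detection is also the most confident one; swapping the two confidences would drop AP to 0.5.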
Environment: Google Colab with Python 3, OpenCV, Plotly, and Ultralytics YOLOv8.
Validation Dataset: 512 images with 554 annotated instances (the per-class results below are computed on this split).
Model Version: YOLOv8n fine-tuned from the pretrained yolov8n.pt weights, with the dataset described by a data.yaml file.
Class        Images  Instances  Precision  Recall  mAP@0.50  mAP@0.50:0.95
All             512        554      0.953   0.934     0.968          0.797
Pituitary       135        153      0.937   0.873     0.938          0.762
No Tumor        140        142      0.979   0.988     0.992          0.833
Meningioma       98         98      0.977   1.000     0.983          0.836
Glioma          154        161      0.920   0.876     0.959          0.755
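The All row can be checked against the per-class rows: the unweighted mean of the four class values reproduces the overall precision, recall, mAP@0.50, and mAP@0.50:0.95 to within rounding, which suggests the All row is a macro (class-averaged) figure rather than an instance-weighted one. The short script below performs that check and also derives a per-class F1 score, 2PR/(P + R), from the table. F1 is not part of the original results and is shown only as an illustration.

```python
# Per-class metrics copied from the results table
per_class = {
    "Pituitary":  {"precision": 0.937, "recall": 0.873, "map50": 0.938, "map50_95": 0.762},
    "No Tumor":   {"precision": 0.979, "recall": 0.988, "map50": 0.992, "map50_95": 0.833},
    "Meningioma": {"precision": 0.977, "recall": 1.000, "map50": 0.983, "map50_95": 0.836},
    "Glioma":     {"precision": 0.920, "recall": 0.876, "map50": 0.959, "map50_95": 0.755},
}
# Overall ("All") row as reported
reported_all = {"precision": 0.953, "recall": 0.934, "map50": 0.968, "map50_95": 0.797}


def macro_average(metric):
    """Unweighted mean of one metric over the four classes."""
    return sum(row[metric] for row in per_class.values()) / len(per_class)


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for metric, reported in reported_all.items():
    print(f"{metric}: class mean {macro_average(metric):.4f}, reported {reported:.3f}")

for name, row in per_class.items():
    print(f"{name}: F1 = {f1(row['precision'], row['recall']):.3f}")
```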
Key Observations:
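No Tumor class showed excellent precision (0.979) and recall (0.988).
Meningioma achieved perfect recall (1.000), meaning the model did not miss any of the 98 annotated Meningioma instances.
Pituitary (0.873) and Glioma (0.876) had the lowest recall of the four classes.
Real-time inference speed: ~2.2 ms per image.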
Overall high performance despite using the lightweight YOLOv8n variant.
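The ~2.2 ms figure depends on hardware, so it may be worth re-measuring on the target machine. The sketch below times any callable with time.perf_counter and reports the median after a few untimed warm-up calls; the helper name median_latency_ms is hypothetical. Timing a full Ultralytics call this way includes preprocessing, inference, and postprocessing, so it will usually exceed a pure inference time; Ultralytics also records per-stage timings in each result's speed attribute.

```python
import statistics
import time


def median_latency_ms(fn, inputs, warmup=3):
    """Median wall-clock time of fn(x) in milliseconds over all inputs.

    The first `warmup` inputs are run once beforehand and not timed, so one-off
    costs (e.g. lazy initialization) do not skew the result.
    """
    if not inputs:
        raise ValueError("inputs must not be empty")
    for x in inputs[:warmup]:
        fn(x)
    timings = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

For example, `median_latency_ms(lambda p: model(p, verbose=False), image_paths)`, where image_paths is a hypothetical list of validation image paths.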
# Install necessary libraries
!pip install ultralytics
!pip install plotly
!pip install opencv-python-headless

import os
import cv2
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from ultralytics import YOLO
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import yaml
import shutil
!pip install -q kaggle
from google.colab import files
files.upload()  # Upload your Kaggle API token (kaggle.json)

# Create a Kaggle config folder (-p: no error if it already exists)
!mkdir -p ~/.kaggle
# Copy kaggle.json into the folder
!cp kaggle.json ~/.kaggle/
# Make the token readable only by the owner
!chmod 600 ~/.kaggle/kaggle.json
import kagglehub

# Download the latest version of the dataset
path = kagglehub.dataset_download("ahmedsorour1/mri-for-brain-tumor-with-bounding-boxes")
print("Path to dataset files:", path)
from pathlib import Path

# Paths to data directories, taken from the folder returned by kagglehub
# (under /kaggle/input/ inside a Kaggle notebook, a local cache folder on Colab)
data_root = Path(path)
train_path = data_root / "Train"
val_path = data_root / "Val"

# Check that both directories exist and list their contents
for name, split_path in [("Training", train_path), ("Validation", val_path)]:
    if split_path.is_dir():
        print(f"{name} directory exists: {split_path}")
        print(f"Contents of the {name.lower()} directory:")
        print([p.name for p in split_path.iterdir()])
    else:
        print(f"{name} directory does not exist: {split_path}")
# Class folder names inside Train/ and Val/
classes = ["Glioma", "Meningioma", "No Tumor", "Pituitary"]
from sklearn.model_selection import train_test_split

# Load images (as RGB arrays) and the first label line of each image, per class folder
def load_data(data_path):
    images = []
    labels = []
    for class_label in classes:
        class_path = os.path.join(data_path, class_label, 'images')
        label_path = os.path.join(data_path, class_label, 'labels')
        for img_file in os.listdir(class_path):
            img = cv2.imread(os.path.join(class_path, img_file))
            if img is None:  # not a readable image file
                print(f"Could not read image {img_file}, skipping it.")
                continue
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads images as BGR
            # The label file shares the image's base name, with a .txt extension
            label_file = os.path.splitext(img_file)[0] + '.txt'
            label_file_path = os.path.join(label_path, label_file)
            if os.path.exists(label_file_path):
                with open(label_file_path, 'r') as file:
                    # Only the first box is kept: "class x_center y_center width height"
                    label_data = file.readline().strip().split()
                if len(label_data) > 0:
                    images.append(img)
                    labels.append(label_data)
                else:
                    print(f"Label file {label_file_path} is empty, skipping this image.")
            else:
                print(f"Label file {label_file_path} not found, skipping this image.")
    return images, labels

# Load training and validation data
train_images, train_labels = load_data(train_path)
val_images, val_labels = load_data(val_path)

# Split the validation set: 50% for validation, 50% for test.
# This in-memory split is not used by the YOLO training below, which reads the full Val folder.
val_images, test_images, val_labels, test_labels = train_test_split(
    val_images, val_labels, test_size=0.5, random_state=42)
# EDA - Visualize class distribution
train_counts = [len(os.listdir(os.path.join(train_path, cls, 'images'))) for cls in classes]
val_counts = [len(os.listdir(os.path.join(val_path, cls, 'images'))) for cls in classes]
eda_df = pd.DataFrame({'Class': classes, 'Train': train_counts, 'Validation': val_counts})

fig = go.Figure(data=[
    go.Bar(name='Train', x=eda_df['Class'], y=eda_df['Train']),
    go.Bar(name='Validation', x=eda_df['Class'], y=eda_df['Validation'])
])
fig.update_layout(barmode='group', title='Class Distribution in Training and Validation Sets')
fig.show()
import os
import shutil
from ultralytics import YOLO

# Step 1: Define paths (source_dir is the dataset folder returned by kagglehub above)
source_dir = path
train_source = os.path.join(source_dir, 'Train')
val_source = os.path.join(source_dir, 'Val')
dataset_dir = '/kaggle/working/brain_tumor_dataset'

# Step 2: Create the YOLOv8 directory structure: images/<split> and labels/<split>
for split in ['train', 'val']:
    os.makedirs(f'{dataset_dir}/images/{split}', exist_ok=True)
    os.makedirs(f'{dataset_dir}/labels/{split}', exist_ok=True)

# Step 3: Copy images and labels from the per-class folders into the flat YOLO layout.
# All classes share one folder per split, so file names must be unique across classes.
def prepare_data(source_split_dir, split):
    for class_dir_name in os.listdir(source_split_dir):
        class_path = os.path.join(source_split_dir, class_dir_name)
        # Skip stray files or hidden items; only class directories are processed
        if not os.path.isdir(class_path):
            continue
        img_source_dir = os.path.join(class_path, 'images')
        lbl_source_dir = os.path.join(class_path, 'labels')

        # Copy image files
        if os.path.isdir(img_source_dir):
            for file in os.listdir(img_source_dir):
                if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                    shutil.copy(os.path.join(img_source_dir, file), f'{dataset_dir}/images/{split}/{file}')
        else:
            print(f"Warning: Images directory not found for class '{class_dir_name}' in {source_split_dir}")

        # Copy label files
        if os.path.isdir(lbl_source_dir):
            for file in os.listdir(lbl_source_dir):
                if file.lower().endswith('.txt'):
                    shutil.copy(os.path.join(lbl_source_dir, file), f'{dataset_dir}/labels/{split}/{file}')
        else:
            print(f"Warning: Labels directory not found for class '{class_dir_name}' in {source_split_dir}")
prepare_data(train_source, 'train')
prepare_data(val_source, 'val')

# Step 4: Write data.yaml (the order of `names` must match the class IDs used in the label files)
data_yaml = f"""
path: {dataset_dir}
train: images/train
val: images/val
nc: 4
names: ['Pituitary', 'No Tumor', 'Meningioma', 'Glioma']
"""
with open(f'{dataset_dir}/data.yaml', 'w') as f:
    f.write(data_yaml)

# Step 5: Fine-tune YOLOv8n starting from COCO-pretrained weights.
# After training, Ultralytics validates the best weights on the val split and prints per-class metrics.
model = YOLO('yolov8n.pt')  # You can use yolov8s.pt for a larger model
model.train(data=f'{dataset_dir}/data.yaml', epochs=20, imgsz=640)

# Step 6: Inference example on validation images
val_images_dir = f'{dataset_dir}/images/val'
results = model(val_images_dir, save=True, conf=0.4)  # Adjust conf threshold as needed
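Because prepare_data copies files from every class folder into a single images/<split> and labels/<split> pair, two problems can go unnoticed: files with the same name in different class folders silently overwrite each other, and an image can end up without a label. The hypothetical helpers below (not part of the original notebook) check for both, and also flag class IDs outside the range declared by nc in data.yaml. They could be run right after the prepare_data calls, before model.train.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def find_name_collisions(source_split_dir):
    """Image file names that occur in more than one <class>/images folder."""
    owners = {}
    for img in Path(source_split_dir).glob("*/images/*"):
        if img.suffix.lower() in IMAGE_EXTS:
            # img.parent.parent is the class folder
            owners.setdefault(img.name, []).append(img.parent.parent.name)
    return {name: sorted(dirs) for name, dirs in owners.items() if len(dirs) > 1}


def check_yolo_split(dataset_dir, split, num_classes):
    """Report unmatched image/label files and out-of-range class IDs for one split."""
    img_dir = Path(dataset_dir) / "images" / split
    lbl_dir = Path(dataset_dir) / "labels" / split
    image_stems = {p.stem for p in img_dir.iterdir() if p.suffix.lower() in IMAGE_EXTS}
    label_files = sorted(lbl_dir.glob("*.txt"))
    label_stems = {p.stem for p in label_files}
    bad_ids = []
    for lbl in label_files:
        for line in lbl.read_text().splitlines():
            fields = line.split()
            # First field of each line is the class ID; valid IDs are 0 .. num_classes - 1
            if fields and not 0 <= int(fields[0]) < num_classes:
                bad_ids.append((lbl.name, fields[0]))
    return {
        "images_without_labels": sorted(image_stems - label_stems),
        "labels_without_images": sorted(label_stems - image_stems),
        "out_of_range_class_ids": bad_ids,
    }
```

For example, `find_name_collisions(train_source)` and `check_yolo_split(dataset_dir, 'val', num_classes=4)`. Ultralytics treats images without a label file as background images, so such mismatches do not raise an error during training, which is why an explicit check can help.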
# Display a sample prediction (optional)
from IPython.display import Image, display
import os  # Ensure os is imported if not already

for r in results:
    # Saved prediction images keep the source file name inside r.save_dir
    display(Image(filename=os.path.join(r.save_dir, os.path.basename(r.path))))
    break  # Show just one example
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image
import numpy as np
import os

# Define class names (same order as in data.yaml)
classes = ['Pituitary', 'No Tumor', 'Meningioma', 'Glioma']

# Display detection results with bounding boxes and class labels drawn on each image
def display_samples(images_dir, yolo_model, num_samples=10):
    images = [os.path.join(images_dir, img) for img in os.listdir(images_dir)
              if img.lower().endswith(('.jpg', '.jpeg', '.png'))]
    images = images[:num_samples]  # Limit to first N images
    for img_path in images:
        # Convert to RGB so grayscale MRI slices are not shown with a false-color colormap
        img_array = np.array(Image.open(img_path).convert('RGB'))
        results = yolo_model(img_path)[0]  # Run inference on a single image
        plt.figure(figsize=(8, 8))
        plt.imshow(img_array)
        ax = plt.gca()
        # Draw bounding boxes and labels
        for box in results.boxes:
            x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
            conf = float(box.conf[0])
            cls = int(box.cls[0])
            rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, edgecolor='r', facecolor='none')
            ax.add_patch(rect)
            ax.text(x1, y1 - 10, f"{classes[cls]} {conf:.2f}", color='white', fontsize=12, backgroundcolor='red')
        plt.title('YOLOv8 Detection')
        plt.axis('off')
        plt.show()

# Usage example (after model training and loading)
val_images_dir = f'{dataset_dir}/images/val'
display_samples(val_images_dir, model)
This project demonstrates the feasibility and effectiveness of using YOLOv8n to detect brain tumors in MRI scans. With high precision and recall across all classes and real-time inference, the model is a promising tool for assisting clinical workflows, although it has so far been evaluated on a single public dataset.
Future Work
Expand dataset with 3D MRI slices or volumetric data.
Incorporate segmentation techniques for pixel-level tumor boundaries.
Evaluate on cross-institutional datasets to ensure generalization.