Brain-Tumor-Detection-Using-Yolov8n-model-from-ultralytics

Abstract

Brain tumor detection from MRI scans is a crucial task in medical diagnostics. This project presents an automated deep learning approach based on YOLOv8n (nano), an object detection model from Ultralytics. The model is fine-tuned on a publicly available, annotated dataset with four classes: Pituitary, Meningioma, Glioma, and No Tumor. On the validation set it reaches an overall mAP@0.50 of 96.8% and mAP@0.50:0.95 of 79.7% at real-time inference speed, making it a promising tool for aiding radiologists in early diagnosis.

Introduction

Brain tumors are life-threatening anomalies that require early detection for successful treatment. Manual examination of MRI images can be tedious, error-prone, and inconsistent due to human subjectivity. With the advent of deep learning, particularly object detection algorithms, there is a growing interest in automating tumor detection to enhance diagnostic accuracy and speed.

This study applies YOLOv8n, a lightweight, real-time object detection model, to detect and localize brain tumors in MRI images. By leveraging its speed and accuracy, the project aims to create a deployable solution for medical settings where quick diagnosis is essential.

Methodology

Dataset
Source: Kaggle - MRI for Brain Tumor with Bounding Boxes

Classes: Pituitary, Meningioma, Glioma, No Tumor

Structure:

Each image includes a bounding box label in YOLO format.

Split into training and validation sets.
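
To make the label format concrete: a YOLO label line holds a class id followed by the box center, width, and height, all normalized to the image size. The sketch below converts one such line to pixel corner coordinates. The function name, the `PixelBox` type, and the sample line are illustrative assumptions, not taken from the dataset.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    class_id: int
    x1: float
    y1: float
    x2: float
    y2: float


def yolo_line_to_pixel_box(line: str, img_w: int, img_h: int) -> PixelBox:
    """Convert one 'class xc yc w h' label line (normalized) to pixel corners."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:])
    # Scale the normalized center/size to pixels, then move to corner form.
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return PixelBox(class_id, x1, y1, x2, y2)


# Hypothetical label line for a 640x480 image
box = yolo_line_to_pixel_box("2 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
print(box)
```

The corner form is the one used later when drawing rectangles on the image.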

Model
YOLOv8n (Nano version) from Ultralytics is used for its lightweight architecture suitable for real-time inference.

Pretrained weights are fine-tuned on the MRI dataset using Ultralytics’ training loop.

Pipeline
Preprocessing:

Convert image color formats

Validate image-label pair integrity

Training Configuration:

Custom dataset in YOLO format

Trained using Google Colab with GPU acceleration

Evaluation:

Precision, Recall, mAP metrics

Class-wise and overall performance
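
The image-label integrity check in the preprocessing step can be sketched as a comparison of file stems between an images folder and a labels folder. This is a minimal illustration that assumes each image `name.jpg` is paired with `name.txt`; the function name `find_unpaired` and the extension set are assumptions, not part of the project code.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def find_unpaired(images_dir, labels_dir):
    """Return (image stems without a label file, label stems without an image)."""
    image_stems = {p.stem for p in Path(images_dir).iterdir()
                   if p.suffix.lower() in IMAGE_EXTS}
    label_stems = {p.stem for p in Path(labels_dir).iterdir()
                   if p.suffix.lower() == ".txt"}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Calling it on one class folder's `images` and `labels` subdirectories returns two lists; both being empty means every image has a label file and the reverse.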

Experiments

Environment: Google Colab with Python 3, OpenCV, Plotly, and Ultralytics YOLOv8.

Evaluation Dataset (validation split): 512 images with 554 total annotated instances, matching the "All" row in the Results table.

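Model Version: YOLOv8n initialized from the pretrained yolov8n.pt weights and fine-tuned for 20 epochs at an image size of 640, with the dataset described by a data.yaml file.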

Results

Class        Images   Instances   Precision   Recall   mAP@0.50   mAP@0.50:0.95
All          512      554         0.953       0.934    0.968      0.797
Pituitary    135      153         0.937       0.873    0.938      0.762
No Tumor     140      142         0.979       0.988    0.992      0.833
Meningioma   98       98          0.977       1.000    0.983      0.836
Glioma       154      161         0.920       0.876    0.959      0.755
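
For readers unfamiliar with the two mAP columns: a prediction counts as correct at mAP@0.50 when its intersection over union (IoU) with a ground-truth box of the same class is at least 0.50, and mAP@0.50:0.95 averages the result over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows the IoU computation itself; the function name and the example boxes are illustrative and not taken from the project.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # Corners of the overlapping rectangle (may be empty)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical prediction and ground-truth box, in pixels
predicted = (100, 100, 200, 200)
ground_truth = (150, 100, 250, 200)
score = iou(predicted, ground_truth)
print(f"IoU = {score:.3f}, match at 0.50: {score >= 0.5}")
```

The stricter thresholds explain why the mAP@0.50:0.95 values in the table are lower than the mAP@0.50 values: a box that is roughly in the right place passes at 0.50 but fails at 0.90.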

Key Observations:
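The No Tumor class had the highest precision (0.979) and the second-highest recall (0.988).

Meningioma achieved a recall of 1.000, meaning every annotated Meningioma instance in the evaluation images was detected.

Real-time inference speed: ~2.2 ms per image.

Overall performance is high despite the lightweight YOLOv8n variant; Pituitary and Glioma are the weakest classes, with recall of 0.873 and 0.876.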

# Install necessary libraries
!pip install ultralytics
!pip install plotly
!pip install opencv-python-headless

import os
import cv2
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from ultralytics import YOLO
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import yaml
import shutil

!pip install -q kaggle

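from google.colab import files
files.upload()  # Upload kaggle.json (the Kaggle API token)

# Create the Kaggle config folder (-p avoids an error if it already exists)
!mkdir -p ~/.kaggle
# Copy the uploaded kaggle.json into the folder
!cp kaggle.json ~/.kaggle/
# Restrict the token file to owner read/write
!chmod 600 ~/.kaggle/kaggle.json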

import kagglehub

# Download latest version
path = kagglehub.dataset_download("ahmedsorour1/mri-for-brain-tumor-with-bounding-boxes")

print("Path to dataset files:", path)

from pathlib import Path

# Paths to data directories
# If the `path` printed above differs from this location, build these from `path` instead.
train_path = Path("/kaggle/input/mri-for-brain-tumor-with-bounding-boxes/Train")
val_path = Path("/kaggle/input/mri-for-brain-tumor-with-bounding-boxes/Val")

# Check if directories exist
if train_path.exists() and train_path.is_dir():
    print(f"Training directory exists: {train_path}")
    print("Contents of the training directory:")
    print([p.name for p in train_path.iterdir()])
else:
    print(f"Training directory does not exist: {train_path}")

if val_path.exists() and val_path.is_dir():
    print(f"Validation directory exists: {val_path}")
    print("Contents of the validation directory:")
    print([p.name for p in val_path.iterdir()])
else:
    print(f"Validation directory does not exist: {val_path}")

# Classes
classes = ["Glioma", "Meningioma", "No Tumor", "Pituitary"]

# Function to load images and labels
def load_data(data_path):
    images = []
    labels = []
    for class_label in classes:
        class_path = os.path.join(data_path, class_label, 'images')
        label_path = os.path.join(data_path, class_label, 'labels')
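        for img_file in os.listdir(class_path):
            img = cv2.imread(os.path.join(class_path, img_file))
            if img is None:
                # cv2.imread returns None for unreadable or non-image files
                print(f"Could not read {img_file}, skipping this file.")
                continue
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads images as BGR
            # Swap the image extension (whatever it is) for .txt
            label_file = os.path.splitext(img_file)[0] + '.txt'
            label_file_path = os.path.join(label_path, label_file)
            if os.path.exists(label_file_path):
                with open(label_file_path, 'r') as file:
                    # Only the first bounding box in each label file is read here
                    label_data = file.readline().strip().split()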
                    if len(label_data) > 0:
                        images.append(img)
                        labels.append(label_data)
                    else:
                        print(f"Label file {label_file_path} is empty, skipping this image.")
            else:
                print(f"Label file {label_file_path} not found, skipping this image.")
    return images, labels

from sklearn.model_selection import train_test_split

# Load training and validation data
train_images, train_labels = load_data(train_path)
val_images, val_labels = load_data(val_path)
# Split the validation set: 50% for validation, 50% for test
val_images, test_images, val_labels, test_labels = train_test_split(
    val_images, val_labels, test_size=0.5, random_state=42)

# EDA - Visualize class distribution
train_counts = [len(os.listdir(os.path.join(train_path, cls, 'images'))) for cls in classes]
val_counts = [len(os.listdir(os.path.join(val_path, cls, 'images'))) for cls in classes]
eda_df = pd.DataFrame({'Class': classes, 'Train': train_counts, 'Validation': val_counts})

fig = go.Figure(data=[
    go.Bar(name='Train', x=eda_df['Class'], y=eda_df['Train']),
    go.Bar(name='Validation', x=eda_df['Class'], y=eda_df['Validation'])
])
fig.update_layout(barmode='group', title='Class Distribution in Training and Validation Sets')
fig.show()

import os
import shutil
from ultralytics import YOLO

# Step 1: Define paths
source_dir = '/kaggle/input/mri-for-brain-tumor-with-bounding-boxes'
train_source = os.path.join(source_dir, 'Train')
val_source = os.path.join(source_dir, 'Val')
dataset_dir = '/kaggle/working/brain_tumor_dataset'

# Step 2: Create YOLOv8 structure
for split in ['train', 'val']:
    os.makedirs(f'{dataset_dir}/images/{split}', exist_ok=True)
    os.makedirs(f'{dataset_dir}/labels/{split}', exist_ok=True)

# Step 3: Copy images and labels into the YOLO directory layout
def prepare_data(source_split_dir, split):
    for class_dir_name in os.listdir(source_split_dir):
        # Construct the path to the class directory
        class_path = os.path.join(source_split_dir, class_dir_name)

        # Check if it's actually a directory (skip files or hidden items)
        if os.path.isdir(class_path):
            # Construct paths to the 'images' and 'labels' subdirectories
            img_source_dir = os.path.join(class_path, 'images')
            lbl_source_dir = os.path.join(class_path, 'labels')

            # Check if 'images' directory exists
            if os.path.exists(img_source_dir) and os.path.isdir(img_source_dir):
                # Iterate through image files in the 'images' subdirectory
                for file in os.listdir(img_source_dir):
                    file_path = os.path.join(img_source_dir, file)
                    if file.lower().endswith(('.jpg', '.jpeg', '.png')):
                        # Copy image file to the destination
                        shutil.copy(file_path, f'{dataset_dir}/images/{split}/{file}')
            else:
                print(f"Warning: Images directory not found for class '{class_dir_name}' in {source_split_dir}")

            # Check if 'labels' directory exists
            if os.path.exists(lbl_source_dir) and os.path.isdir(lbl_source_dir):
                # Iterate through label files in the 'labels' subdirectory
                for file in os.listdir(lbl_source_dir):
                    file_path = os.path.join(lbl_source_dir, file)
                    if file.lower().endswith('.txt'):
                        # Copy label file to the destination
                        shutil.copy(file_path, f'{dataset_dir}/labels/{split}/{file}')
            else:
                print(f"Warning: Labels directory not found for class '{class_dir_name}' in {source_split_dir}")


prepare_data(train_source, 'train')
prepare_data(val_source, 'val')

# Step 4: Write data.yaml (the order of `names` must match the class ids used in the label files)
data_yaml = f"""
path: {dataset_dir}
train: images/train
val: images/val
nc: 4
names: ['Pituitary', 'No Tumor', 'Meningioma', 'Glioma']
"""
with open(f'{dataset_dir}/data.yaml', 'w') as f:
    f.write(data_yaml)

# Step 5: Train YOLOv8 model
model = YOLO('yolov8n.pt')  # You can use yolov8s.pt for a larger model
model.train(data=f'{dataset_dir}/data.yaml', epochs=20, imgsz=640)

# Step 6: Inference example on validation images
val_images_dir = f'{dataset_dir}/images/val'
results = model(val_images_dir, save=True, conf=0.4)  # Adjust conf threshold as needed

# Display a sample prediction (optional)
from IPython.display import Image, display
import os # Ensure os is imported if not already

for r in results:
    # Use os.path.join to correctly combine the directory and filename
    display(Image(filename=os.path.join(r.save_dir, os.path.basename(r.path))))
    break  # Show just one example

import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image as PILImage  # aliased so it does not shadow IPython.display.Image
import numpy as np
import os

# Define class names (same order as in data.yaml)
classes = ['Pituitary', 'No Tumor', 'Meningioma', 'Glioma']

# Function to display detection results with bounding boxes drawn on each image
def display_samples(images_dir, yolo_model, num_samples=10):
    images = [os.path.join(images_dir, img) for img in os.listdir(images_dir) if img.lower().endswith(('.jpg', '.jpeg', '.png'))]
    images = images[:num_samples]  # Limit to first N images

    for img_path in images:
        # Convert to RGB so grayscale scans are not shown with a false-color map
        img = PILImage.open(img_path).convert('RGB')
        img_array = np.array(img)

        results = yolo_model(img_path)[0]  # Run inference on a single image
        plt.figure(figsize=(8, 8))
        plt.imshow(img_array)
        ax = plt.gca()

        # Draw bounding boxes and labels
        for box in results.boxes:
            x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
            conf = float(box.conf[0])  # plain Python float for formatting
            cls = int(box.cls[0])

            rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, edgecolor='r', facecolor='none')
            ax.add_patch(rect)
            # results.names maps class ids to the names the model was trained with
            plt.text(x1, y1 - 10, f"{results.names[cls]} {conf:.2f}", color='white', fontsize=12, backgroundcolor='red')

        plt.title('YOLOv8 Detection')
        plt.axis('off')
        plt.show()

# Usage example (after model training and loading)
val_images_dir = f'{dataset_dir}/images/val'
display_samples(val_images_dir, model)
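
The code above uses two different class orderings (alphabetical folder names when loading data, and the `names` list in data.yaml), and only the data.yaml order affects how predictions are labeled. A rough way to confirm that order is to count the class ids that actually occur in each class folder's label files. The sketch below assumes standard YOLO label lines that start with an integer class id; the function names are illustrative.

```python
from collections import Counter
from pathlib import Path


def count_class_ids(labels_dir):
    """Count how often each class id appears across YOLO label files."""
    counts = Counter()
    for label_file in Path(labels_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if fields:  # skip blank lines
                counts[int(fields[0])] += 1
    return counts


def ids_out_of_range(counts, num_classes):
    """Return class ids that fall outside 0..num_classes-1."""
    return sorted(cid for cid in counts if not 0 <= cid < num_classes)
```

Running `count_class_ids` on, for example, the `labels` subfolder of the Glioma training directory should show a single dominant id; that id is the position Glioma needs to occupy in `names`.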

Conclusion

This project demonstrates the feasibility of using YOLOv8n to detect brain tumors in MRI scans. With an overall precision of 0.953, a recall of 0.934, and inference at about 2.2 ms per image, the model is a candidate for assisting in clinical workflows.

Future Work
Expand dataset with 3D MRI slices or volumetric data.

Incorporate segmentation techniques for pixel-level tumor boundaries.

Evaluate on cross-institutional datasets to ensure generalization.
