Unveiling the Power of Deep Learning: Weed Detection and Segmentation with PyTorch
Abstract
Precision agriculture is rapidly evolving to enhance the efficiency and sustainability of modern farming practices. A critical challenge within this domain is the accurate detection and segmentation of weeds, which can significantly impact crop yields and resource management. This publication introduces a deep learning-based solution that utilizes PyTorch to address this challenge effectively. The proposed system integrates two advanced models: a U-Net for image segmentation and a Vision Transformer (ViT) for image classification. The U-Net model excels at precisely identifying and segmenting weed regions within input images, while the ViT model classifies the images as either containing weeds or not. By combining these complementary capabilities, the solution offers a comprehensive approach to weed detection and management, empowering farmers to optimize their crop cultivation practices and achieve better outcomes.
Motivation
The accurate detection and segmentation of weeds in agricultural fields play a pivotal role in enhancing crop yields, minimizing the use of herbicides, and promoting sustainable farming practices. Weeds compete with crops for resources such as light, water, and nutrients, often leading to reduced agricultural productivity. Traditional weed management methods—such as manual identification, mechanical weeding, or rule-based algorithms—face significant limitations. These approaches can be labor-intensive, time-consuming, and prone to errors, particularly in the face of diverse and variable weed populations.
Deep learning techniques have revolutionized many fields, including computer vision, by providing powerful tools for automating complex tasks. The introduction of models like U-Net and Vision Transformers has opened new avenues for improving the accuracy and efficiency of weed detection and segmentation. These models can learn from vast amounts of data, recognizing patterns and features that may be difficult for human experts to discern.
Importance of Accurate Weed Detection
Improved Crop Yields: Weeds can significantly reduce crop yields by competing for essential resources. Accurate detection allows for targeted interventions, ensuring that crops receive the necessary nutrients and light.
Reduced Herbicide Use: Effective weed management can lead to a decrease in herbicide application, which not only reduces costs for farmers but also mitigates environmental impacts. This is particularly important in the context of rising concerns about chemical runoff and its effects on ecosystems.
Enhanced Resource Management: By accurately identifying weed infestations, farmers can optimize their use of water, fertilizers, and other inputs, leading to more sustainable farming practices.
Technological Advancements in Weed Detection
The motivation behind this project is to develop a robust and efficient deep learning-based solution capable of accurately identifying and segmenting weeds in agricultural images. By leveraging the strengths of U-Net for high-resolution image segmentation and ViT for effective image classification, the goal is to create a comprehensive system that supports farmers in making informed decisions regarding weed management. This system aims to facilitate better crop yields, reduce the environmental impact associated with herbicide use, and ultimately foster more sustainable agricultural practices.
U-Net for Image Segmentation
The U-Net architecture is particularly well-suited for tasks requiring precise segmentation of images. Its encoder-decoder structure allows for capturing both high-level context and low-level details, making it effective in distinguishing weed regions from the background. The model's ability to handle varying image resolutions and complex patterns enables it to adapt to diverse agricultural environments.
Encoder-Decoder Structure: The U-Net's architecture consists of a contracting path (encoder) that captures context and a symmetric expanding path (decoder) that enables precise localization. This structure helps in maintaining spatial information, which is critical for accurate segmentation.
Skip Connections: U-Net utilizes skip connections to retain information from earlier layers, improving the model's ability to reconstruct fine details in the segmentation mask. This is essential for accurately delineating weed boundaries.
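To make the encoder-decoder structure and skip connections concrete, here is a minimal single-stage sketch in PyTorch. It is an illustration of the ideas above, not the simplified UNet class used later in this article.

import torch
import torch.nn as nn

# Minimal one-stage U-Net sketch: one encoder block, a bottleneck,
# one decoder block, and a single skip connection.
class MiniUNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(2)                                      # contracting path
        self.bottleneck = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)   # expanding path
        self.dec = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.head = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        e = self.enc(x)                    # encoder features, kept for the skip connection
        b = self.bottleneck(self.pool(e))
        d = self.up(b)
        d = torch.cat([d, e], dim=1)       # skip connection: reuse fine-grained encoder detail
        return self.head(self.dec(d))      # per-pixel class scores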
Vision Transformer (ViT) for Image Classification
The Vision Transformer model represents a shift from traditional convolutional approaches to transformer-based architectures in image classification tasks. By treating images as sequences of patches, ViT leverages self-attention mechanisms to capture relationships across different regions of an image.
Self-Attention Mechanism: This allows the model to weigh the importance of various image patches when making predictions, enabling it to focus on relevant features that distinguish between weed and non-weed images.
Transfer Learning: By utilizing pre-trained ViT models, the system can achieve high classification accuracy with limited training data. Fine-tuning these models for specific agricultural tasks can yield significant improvements in performance.
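As a sketch of the transfer-learning idea, a pre-trained ViT backbone can be loaded with a fresh two-class head using Hugging Face's ViTForImageClassification. The checkpoint matches the one used later in the article; this is an alternative to the custom VIT class defined below, not the article's own model.

from transformers import ViTForImageClassification

# Load a pre-trained backbone and attach a new 2-class head (weed vs. non-weed)
vit = ViTForImageClassification.from_pretrained(
    'google/vit-base-patch16-224-in21k',
    num_labels=2,
)

# Optionally freeze the backbone so only the new classification head is fine-tuned
for param in vit.vit.parameters():
    param.requires_grad = False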
Practical Applications for Farmers
The integration of U-Net and ViT into a single system offers practical benefits for farmers:
Real-Time Monitoring: Farmers can use the system to monitor fields in real-time, identifying weed outbreaks early and allowing for timely intervention.
Decision Support: The system can provide actionable insights, helping farmers decide when and where to apply herbicides or employ mechanical weeding methods.
Resource Optimization: By accurately identifying weed presence, farmers can optimize their input use, leading to cost savings and improved environmental outcomes.
Scalability: The solution can be applied across various scales of farming operations, from small family farms to large commercial agricultural enterprises, making it a versatile tool for modern agriculture.
Future Directions
The development of this deep learning-based solution is just the beginning. There are numerous opportunities for further research and improvement:
Model Enhancement: Continuous improvements in model architectures and training methods can lead to higher accuracy and robustness in diverse agricultural settings.
Data Collection and Annotation: Building larger, well-annotated datasets that include various weed species and environmental conditions can enhance model performance and generalization.
Integration with Other Technologies: Combining this solution with drone technology and IoT sensors can enable comprehensive monitoring and management of agricultural fields, paving the way for fully automated farming practices.
User-Friendly Interfaces: Developing intuitive interfaces for farmers to easily interact with the system will increase adoption and usability, ensuring that the technology serves its intended purpose effectively.
By focusing on these aspects, the proposed solution has the potential to significantly transform weed management practices in agriculture, leading to more efficient, sustainable, and productive farming.
Segmentation with U-Net
The core of the weed detection system is the U-Net architecture, a widely-adopted convolutional neural network (CNN) for semantic segmentation.
The UNet class in the provided code defines the model's structure:
# UNet
class UNet(nn.Module):
    def __init__(self):
        super(UNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)
        self.final_conv = nn.Conv2d(64, 2, kernel_size=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.final_conv(x)
        return x
The forward() method defines the flow of the input image through the network: two 3×3 convolutional layers followed by a final 1×1 convolution that outputs a 2-channel segmentation map. This map is then processed into a binary mask highlighting the segmented weed regions. Note that this snippet is a simplified variant; a full U-Net would also include the encoder-decoder stages and skip connections described earlier.
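As a rough sketch of that post-processing step (assuming model is the UNet defined above and image_tensor is a preprocessed (1, 3, H, W) tensor; both names are placeholders):

import torch

with torch.no_grad():
    logits = model(image_tensor)           # (1, 2, H, W): per-pixel scores for background/weed
    mask = torch.argmax(logits, dim=1)     # (1, H, W): 0 = background, 1 = weed
    binary_mask = mask.squeeze(0).byte()   # binary weed mask for visualization or overlay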
Classification with Vision Transformer (ViT)
To complement the segmentation capabilities, the system integrates a Vision Transformer (ViT) model for image classification.
The VIT class in the code defines the ViT-based classification model:
# ViT
import torch.nn as nn
from transformers import ViTConfig, ViTModel

class VIT(nn.Module):
    def __init__(self, config=ViTConfig(), num_labels=2,
                 model_checkpoint='google/vit-base-patch16-224-in21k'):
        super(VIT, self).__init__()
        self.vit = ViTModel.from_pretrained(model_checkpoint, add_pooling_layer=False)
        self.classifier = nn.Linear(config.hidden_size, num_labels)
        self.pooler = nn.Linear(config.hidden_size, config.hidden_size)
        self.pooler_activation = nn.Tanh()

    def forward(self, x):
        x = self.vit(x)['last_hidden_state']
        x = self.pooler_activation(self.pooler(x[:, 0, :]))  # pool the [CLS] token representation
        output = self.classifier(x)
        return output
The ViT backbone is pre-trained on a large-scale dataset and fine-tuned for the specific task of weed/non-weed classification. The forward() method passes the input image through the ViT backbone, pools the representation of the [CLS] token, and applies a linear classifier to produce the final classification output.
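For illustration, a single image can be classified as follows; transform and class_names are assumed to match the preprocessing and label names defined in the GUI code at the end of the article, and the input file name is hypothetical.

import torch
from PIL import Image

model = VIT()          # or restore a fine-tuned checkpoint with load_state_dict
model.eval()

image = Image.open('field_photo.jpg').convert('RGB')     # hypothetical input image
inputs = transform(image).unsqueeze(0)                    # (1, 3, 224, 224)

with torch.no_grad():
    logits = model(inputs)
    probs = torch.softmax(logits, dim=1)

print(class_names[probs.argmax(dim=1).item()], probs.max().item())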
Training the ViT Model
The training process involves the following steps:
Loss Function and Optimizer: We use Cross Entropy Loss and Stochastic Gradient Descent (SGD) for optimization.
Training Loop: The model is trained for a specified number of epochs; after each epoch, the accumulated training loss and the accuracy on the test set are reported.
import torch
import torch.nn as nn
import torch.optim as optim

# train_loader and test_loader are assumed to be DataLoaders over the
# weed / non-weed image dataset
model = VIT()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

num_epochs = 10
for epoch in range(num_epochs):
    # Training phase
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)

        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Evaluation phase on the test set
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print("Epoch {}/{}: Loss: {:.4f}, Test Accuracy: {:.2f}%".format(
        epoch + 1, num_epochs, running_loss, accuracy))

# Save the trained model
torch.save(model.state_dict(), 'weed_detection_model.pth')
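The saved checkpoint can later be restored for inference, as the GUI code at the end of the article does:

# Restore the trained classifier from the checkpoint saved above
model = VIT()
model.load_state_dict(torch.load('weed_detection_model.pth', map_location=torch.device('cpu')))
model.eval()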
Training the U-Net Model for Segmentation
The U-Net model is trained using a custom dataset class to handle image-mask pairs. Here’s how it’s structured:
U-Net Implementation
class UNet(nn.Module):
    def __init__(self, num_classes=2):
        super(UNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)
        self.final_conv = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.final_conv(x)
        x = x.permute(0, 2, 3, 1)  # Reshape to (batch_size, height, width, num_classes)
        return x
Custom Dataset Class
The custom dataset class handles loading images and corresponding masks:
import os
from PIL import Image
from torch.utils.data import Dataset

class SegmentationDataset(Dataset):
    def __init__(self, image_dir, mask_dir, transform=None):
        self.image_dir = image_dir
        self.mask_dir = mask_dir
        # Sort the file lists so that each image lines up with its mask
        self.image_files = sorted(os.listdir(image_dir))
        self.mask_files = sorted(os.listdir(mask_dir))
        self.transform = transform

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, idx):
        image_path = os.path.join(self.image_dir, self.image_files[idx])
        mask_path = os.path.join(self.mask_dir, self.mask_files[idx])

        image = Image.open(image_path).convert('RGB')
        mask = Image.open(mask_path).convert('L')

        if self.transform:
            image = self.transform(image)
            mask = self.transform(mask)
            # CrossEntropyLoss expects integer class indices of shape (H, W),
            # so collapse the single channel and binarize the mask
            mask = (mask.squeeze(0) > 0.5).long()

        return image, mask
Training Loop for U-Net
The training loop for the U-Net model follows a similar structure to the ViT model but focuses on the segmentation task:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import transforms

# Set the paths to image and mask directories
image_dir = '/content/drive/MyDrive/test (1)/images'
mask_dir = '/content/drive/MyDrive/test (1)/masks'

# Define the desired image and mask sizes
desired_image_size = (768, 432)  # (height, width), as expected by transforms.Resize

# Create the dataset
transform = transforms.Compose([
    transforms.Resize(desired_image_size),
    transforms.ToTensor()
])
dataset = SegmentationDataset(image_dir, mask_dir, transform=transform)

# Create the data loader
batch_size = 4
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# Create the U-Net model
model = UNet()

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 10
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

for epoch in range(num_epochs):
    for images, masks in data_loader:
        images = images.to(device)
        masks = masks.to(device)

        # Forward pass
        outputs = model(images)

        # Reshape the outputs to the (batch_size, num_classes, height, width)
        # layout expected by CrossEntropyLoss
        outputs = outputs.permute(0, 3, 1, 2)

        # Calculate loss
        loss = criterion(outputs, masks)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")

# Save the trained model
torch.save(model.state_dict(), '/content/drive/MyDrive/model.pt')
Evaluation and Visualization
After training, evaluate the ViT classifier's performance using metrics such as accuracy and a classification report. Visualizations of the U-Net's segmentation results can then be generated to provide qualitative insight into model performance.
from sklearn.metrics import classification_report

# `model`, `test_loader`, and `device` refer to the ViT classifier setup
# from the training section above
class_names = ["non-weed", "weed-images"]

# Collect ground-truth labels and model predictions over the test set
all_labels, all_preds = [], []
model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images.to(device))
        all_preds.extend(outputs.argmax(dim=1).cpu().tolist())
        all_labels.extend(labels.tolist())

# Generate classification report
print("Classification Report:")
print(classification_report(all_labels, all_preds, labels=[0, 1],
                            target_names=class_names, zero_division=0))
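The report above covers the ViT classifier. For the U-Net branch, an overlap metric such as Intersection-over-Union (IoU) between predicted and ground-truth masks is a common choice; the helper below is a hypothetical addition, not part of the original code.

import torch

def iou_score(pred_mask, true_mask, eps=1e-6):
    """Intersection-over-Union for binary (H, W) masks with values 0/1."""
    pred = pred_mask.bool()
    true = true_mask.bool()
    intersection = (pred & true).sum().float()
    union = (pred | true).sum().float()
    return ((intersection + eps) / (union + eps)).item()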
End-to-End Workflow
The provided code demonstrates the end-to-end workflow, from loading and preprocessing the input image to performing both segmentation and classification tasks. The key steps are:
Load and preprocess the input image.
Pass the image through the segmentation model (UNet) to obtain the segmentation mask.
Pass the image through the classification model (VIT) to obtain the predicted class and confidence.
Visualizations and Code Snippets
To enhance the understanding of the solution, we can include relevant visualizations and code snippets throughout the article.
For example, we can display the input image, the segmentation mask, and the overlaid result to showcase the performance of the U-Net model:
# Visualize the segmentation results
segmentation_mask = torch.argmax(segmentation_output, dim=1).squeeze().cpu().numpy()

# Convert the input tensor (3, H, W) with values in [0, 1] to a uint8 RGB array
image_np = (image.permute(1, 2, 0).cpu().numpy() * 255).astype(np.uint8)

# Paint predicted weed pixels red and keep the remaining pixels unchanged
red = np.array([255, 0, 0], dtype=np.uint8)
overlay = np.where(segmentation_mask[:, :, None] == 1, red, image_np)

overlaid_image = Image.fromarray(overlay.astype(np.uint8))
overlaid_image.save('segmentation_result.png')
Additionally, we can plot the classification confidence heatmap using the ViT model's output:
import matplotlib.pyplot as plt

# Visualize the classification confidence
class_confidences = torch.softmax(classification_output, dim=1)

plt.figure(figsize=(8, 6))
plt.imshow(class_confidences.detach().cpu().numpy(), cmap='Blues')
plt.colorbar()
plt.title('Classification Confidence Heatmap')
plt.savefig('classification_confidence.png')
By integrating the powerful capabilities of U-Net and Vision Transformer, this solution provides a robust and accurate weed detection and segmentation system, paving the way for more efficient and sustainable precision agriculture practices.
GUI Code
# gui
import gradio as gr
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from transformers import ViTConfig, ViTModel

# Load the segmentation model and classification model
class UNet(nn.Module):
    def __init__(self):
        super(UNet, self).__init__()
        # Define the architecture here
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)
        self.final_conv = nn.Conv2d(64, 2, kernel_size=1)

    def forward(self, x):
        # Implement the forward pass here
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.final_conv(x)
        return x

segmentation_model = UNet()
segmentation_model.load_state_dict(torch.load('model (1).pt', map_location=torch.device('cpu')))
segmentation_model.eval()

class VIT(nn.Module):
    def __init__(self, config=ViTConfig(), num_labels=2,
                 model_checkpoint='google/vit-base-patch16-224-in21k'):
        super(VIT, self).__init__()
        self.vit = ViTModel.from_pretrained(model_checkpoint, add_pooling_layer=False)
        self.classifier = nn.Linear(config.hidden_size, num_labels)
        self.pooler = nn.Linear(config.hidden_size, config.hidden_size)
        self.pooler_activation = nn.Tanh()

    def forward(self, x):
        x = self.vit(x)['last_hidden_state']
        x = self.pooler_activation(self.pooler(x[:, 0, :]))
        output = self.classifier(x)
        return output

classification_model = VIT()
classification_model.load_state_dict(torch.load('weed_detection_model.pth', map_location=torch.device('cpu')))
classification_model.eval()

# Define the class names
class_names = ["non-weed", "weed-images"]

# Define the transformations to apply to the input images
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

# Define the function to preprocess the input image
def preprocess_image(image):
    if isinstance(image, np.ndarray):
        image = Image.fromarray(image)
    image = image.convert("RGB")
    image = transform(image)
    image = image.unsqueeze(0)
    return image

# Define the function to perform the prediction
def predict_image(image):
    # Preprocess the image
    image_tensor = preprocess_image(image)

    # Perform classification using the Vision Transformer model
    with torch.no_grad():
        classification_output = classification_model(image_tensor)
        _, predicted_classes = torch.topk(classification_output, k=2, dim=1)
        confidences = torch.softmax(classification_output, dim=1)[0, predicted_classes]

    # Extract the top predicted class and its confidence
    top_predicted_class = predicted_classes[0, 0].item()
    top_predicted_class_name = class_names[top_predicted_class]
    top_confidence = confidences[0, 0].item()

    # Check if both weed and non-weed classes are present
    if 0 in predicted_classes and 1 in predicted_classes:
        second_predicted_class = predicted_classes[0, 1].item()
        second_predicted_class_name = class_names[second_predicted_class]
        second_confidence = confidences[0, 1].item()
    else:
        second_predicted_class = None
        second_predicted_class_name = None
        second_confidence = None

    # Perform segmentation using the U-Net model
    with torch.no_grad():
        segmentation_output = segmentation_model(image_tensor)

    # Process the segmentation output into a binary weed mask
    binary_mask = segmentation_output.argmax(dim=1).squeeze().cpu().numpy()

    # Undo the normalization so the image displays correctly, then color weed pixels blue
    blue_color = np.array([0, 0, 255], dtype=np.uint8)
    segmented_image = image_tensor.squeeze().permute(1, 2, 0).cpu().numpy()
    segmented_image = (segmented_image * 0.5 + 0.5) * 255
    segmented_image[binary_mask == 1] = blue_color
    segmented_image = Image.fromarray(segmented_image.astype(np.uint8))

    # Return the predicted classes, confidences, and segmented image
    return top_predicted_class_name, top_confidence, second_predicted_class_name, second_confidence, segmented_image

# Define the inputs and outputs for the gradio interface
inputs = gr.Image()
outputs = [
    gr.Textbox(label="Top Predicted Class"),
    gr.Textbox(label="Top Confidence"),
    gr.Textbox(label="Second Predicted Class"),
    gr.Textbox(label="Second Confidence"),
    gr.Image(label="Segmented Image")
]

# Create the gradio interface
gr.Interface(fn=predict_image, inputs=inputs, outputs=outputs).launch()
Full project link below
GitHub