Cassava is one of the most important food crops in Africa, providing a crucial source of carbohydrates to millions. However, cassava plants face threats from viral diseases that can devastate yields, pushing smallholder farmers into food insecurity. Traditional methods for detecting these diseases involve government-funded experts physically inspecting plants, which is costly, labor-intensive, and inaccessible to many farmers. Advances in data science and machine learning (ML) offer a way to make disease detection faster, cheaper, and more accessible.
By allowing farmers to simply take a picture of their cassava plant with a mobile phone and receive a real-time diagnosis, we can dramatically reduce the barriers to protecting crops from disease.
At the core of this solution is the machine learning model. Here are the steps involved in creating an effective model:
Data Collection: The dataset includes images of cassava leaves, annotated by experts from the National Crops Resources Research Institute (NaCRRI) and Makerere University. These images capture real-world farming conditions and reflect what smallholder farmers would encounter when identifying diseases in their fields.
Data Preprocessing: Preparing the dataset is critical. The images need to be standardized (resized to a consistent resolution, with pixel values normalized), augmented with techniques such as rotations, flips, and color changes, and split into training, validation, and testing sets. Data augmentation helps the model generalize to new, unseen images by mimicking the diversity of real-world scenarios.
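A minimal sketch of these preprocessing steps in Python with NumPy, assuming each image is already loaded as a square 224 x 224 RGB array (the article does not state an image size, and resizing is left to an image library). The specific augmentations, the brightness range, and the 70/15/15 stratified split are illustrative assumptions, not the project's actual settings.

```python
import numpy as np


def standardize(image: np.ndarray) -> np.ndarray:
    """Convert uint8 pixel values (0-255) to float32 values in [0, 1]."""
    return image.astype(np.float32) / 255.0


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly flip, rotate and brighten a square H x W x 3 image with values in [0, 1]."""
    if rng.random() < 0.5:
        image = np.fliplr(image)  # horizontal flip
    if rng.random() < 0.5:
        image = np.flipud(image)  # vertical flip
    image = np.rot90(image, k=int(rng.integers(0, 4)))  # rotate by 0, 90, 180 or 270 degrees
    factor = rng.uniform(0.8, 1.2)  # brightness change (assumed range)
    return np.clip(image * factor, 0.0, 1.0)


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Return (train, val, test) index arrays that keep each class's proportion."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    splits = ([], [], [])
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))  # shuffle this class's indices
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        splits[0].extend(idx[:n_train])
        splits[1].extend(idx[n_train:n_train + n_val])
        splits[2].extend(idx[n_train + n_val:])  # remainder goes to the test set
    return tuple(np.array(s, dtype=int) for s in splits)


rng = np.random.default_rng(42)
# Random pixels standing in for a real leaf photo.
fake_image = rng.integers(0, 256, size=(224, 224, 3)).astype(np.uint8)
x = augment(standardize(fake_image), rng)

# 100 hypothetical labels, 20 per class (classes 0-4).
train_idx, val_idx, test_idx = stratified_split([0, 1, 2, 3, 4] * 20)
print(x.shape, len(train_idx), len(val_idx), len(test_idx))
```

Stratifying the split keeps every disease class represented in the validation and test sets, which matters when some classes are much rarer than others.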
Model Selection: For a project targeting low-resource environments (e.g., rural farmers using mobile devices), the model must be efficient and lightweight. Pre-trained convolutional neural networks (CNNs) like MobileNet or EfficientNet are well suited because they balance accuracy and computational efficiency. These models can be fine-tuned on the cassava dataset to improve performance on this specific task.
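Much of MobileNet's efficiency comes from depthwise separable convolutions, which split a standard convolution into a per-channel spatial filter followed by a 1 x 1 convolution that mixes channels. The sketch below counts the weights of one hypothetical layer (3 x 3 kernel, 32 input channels, 64 output channels) both ways; the layer sizes are chosen only for illustration.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k convolution mapping c_in channels to c_out channels (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # spatial filtering, channel by channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_params(3, 32, 64)
separable = depthwise_separable_params(3, 32, 64)
print(standard, separable, round(separable / standard, 3))
```

The ratio works out to 1/c_out + 1/k², here 1/64 + 1/9, so this separable layer needs about 7.9 times fewer weights than the standard one.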
Training: The model is trained to classify cassava leaves into five categories: four diseases or healthy. This involves feeding the model large amounts of labeled data, adjusting its parameters to minimize prediction error (the loss), and then evaluating its performance on unseen validation data. The goal is a model that predicts the correct class with high accuracy on images it has not seen before.
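The training cycle described above (predict, measure the loss, adjust parameters, check on validation data) can be sketched without a deep learning framework. This is a deliberately small stand-in, not the project's code: a linear softmax classifier trained by full-batch gradient descent on hypothetical Gaussian feature vectors rather than cassava images. The learning rate, epoch count, and data are all assumptions; in practice a framework such as Keras or PyTorch runs the same loop through the CNN.

```python
import numpy as np

NUM_CLASSES = 5  # four diseases plus healthy
rng = np.random.default_rng(0)


def make_split(n_per_class: int, dim: int = 16):
    """Hypothetical data: one Gaussian cluster per class in place of real image features."""
    centers = 4.0 * np.eye(NUM_CLASSES, dim)
    y = np.repeat(np.arange(NUM_CLASSES), n_per_class)
    X = centers[y] + rng.normal(size=(len(y), dim))
    return X, y


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(probs, y):
    """Mean negative log-probability assigned to the true class."""
    return float(-np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12)))


def train(X, y, epochs=300, lr=0.1):
    n, d = X.shape
    W = np.zeros((d, NUM_CLASSES))
    b = np.zeros(NUM_CLASSES)
    one_hot = np.eye(NUM_CLASSES)[y]
    losses = []
    for _ in range(epochs):
        probs = softmax(X @ W + b)
        losses.append(cross_entropy(probs, y))
        grad = (probs - one_hot) / n  # gradient of the mean loss w.r.t. the logits
        W -= lr * X.T @ grad          # step the parameters against the gradient
        b -= lr * grad.sum(axis=0)
    return W, b, losses


def accuracy(W, b, X, y):
    return float(np.mean(np.argmax(X @ W + b, axis=1) == y))


X_train, y_train = make_split(200)
X_val, y_val = make_split(50)  # held-out data the model never trains on
W, b, losses = train(X_train, y_train)
val_acc = accuracy(W, b, X_val, y_val)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, validation accuracy {val_acc:.3f}")
```

With all weights starting at zero, the first loss equals log 5, the loss of a uniform guess over five classes; the decrease from there is what "adjusting its parameters to minimize error" means concretely.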
MobileNet achieved a validation accuracy of 71.26 percent. Its precision and recall scores and its confusion matrix show that the model can differentiate between the disease types, although at this accuracy roughly 29 percent of validation images are still misclassified. Additionally, MobileNet’s lightweight design allows for real-time processing, offering inference speeds suitable for mobile use in field conditions.
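Overall accuracy, per-class precision and recall, and the confusion matrix all come from the same table of true versus predicted labels. A minimal NumPy sketch, using ten made-up labels (hypothetical, not the article's results) so the numbers can be checked by hand:

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # count each (true, predicted) pair
    return cm


def per_class_precision_recall(cm):
    tp = np.diag(cm).astype(float)  # correct predictions for each class
    predicted = cm.sum(axis=0)      # column sums: how often each class was predicted
    actual = cm.sum(axis=1)         # row sums: how often each class truly occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall


# Hypothetical labels for five classes (0-4), two validation images each.
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3, 4, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=5)
acc = float(np.trace(cm) / cm.sum())  # diagonal = correctly classified images
precision, recall = per_class_precision_recall(cm)
print(cm)
print("accuracy:", acc)
print("precision:", precision.round(3))
print("recall:", recall.round(3))
```

Because accuracy is a single ratio over all classes, the per-class precision and recall show which classes account for the misclassified images.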