Machine learning for disease detection

Introduction:

Many bacterial, fungal, and viral pathogens, as well as insect pests, pose a constant threat to apple orchards during the growing season. Precise and early identification of diseases and pests is needed for timely management that prevents crop loss. Currently, disease and pest detection in commercial apple orchards relies on manual scouting by crop consultants. However, human scouting is usually time-consuming, expensive, and, in some cases, prone to errors. Computer vision-based models can be trained to detect diseases automatically, with greater efficiency and accuracy than manual scouting (Thapa et al. 2020).

Flow diagram of the steps for developing computer vision models that detect disease from symptoms on annotated apple leaves and fruits.

Plant Pathology Challenge 2020 and dataset:

We created an expert-annotated dataset of major foliar diseases of apples and set up the Kaggle competition ‘Plant Pathology Challenge’ for the Fine‐Grained Visual Categorization (FGVC7) workshop at the Computer Vision and Pattern Recognition conference (CVPR 2020). The dataset consists of RGB images of apple foliar disease captured with a Canon Rebel T5i DSLR and smartphones under various conditions. Images were manually annotated as apple scab, cedar apple rust, healthy, or complex disease (symptoms of multiple diseases on the same leaf). The final dataset contains 3,651 RGB images: 1,200 apple scab, 1,399 cedar apple rust, 187 complex disease, and 865 healthy leaves (Thapa et al. 2020).

Images from the dataset showing symptoms of (a) Apple scab, (b) Cedar apple rust, and (c) Multiple diseases in a single leaf, captured under various noise conditions.
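The class counts above are imbalanced: complex (multiple-disease) leaves make up only about 5% of the images. One common way to compensate during training, not necessarily the approach used in the original work, is to weight each class in the loss by its inverse frequency. The sketch below computes each class's share and a "balanced" weight, total / (n_classes * count); the dictionary keys are hypothetical names chosen for illustration.

```python
# Per-class image counts reported for the Plant Pathology 2020 dataset.
# The key names are illustrative, not taken from the dataset files.
COUNTS = {
    "scab": 1200,
    "rust": 1399,               # cedar apple rust
    "multiple_diseases": 187,   # complex disease symptoms
    "healthy": 865,
}


def class_shares(counts):
    """Return each class's fraction of the whole dataset."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def balanced_weights(counts):
    """Inverse-frequency weights: total / (n_classes * count).

    Rare classes get weights above 1 and common classes below 1, so each
    class contributes roughly equally to a class-weighted loss.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


if __name__ == "__main__":
    shares = class_shares(COUNTS)
    weights = balanced_weights(COUNTS)
    for name in COUNTS:
        print(f"{name:18s} share={shares[name]:.3f} weight={weights[name]:.2f}")
```

With these counts the multiple-disease class receives a weight of about 4.9, while cedar apple rust, the largest class, receives about 0.65.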

The competition ran on Kaggle from March 9 to May 26, 2020. Participants developed machine learning (ML) models to (1) accurately classify each image in the test dataset into a disease category or as a healthy leaf, and (2) accurately distinguish between the many diseases, including leaves that show more than one disease.
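Because the class counts reported above sum exactly to the dataset total, each image appears to carry exactly one of the four labels, with "multiple diseases" treated as its own category. The sketch below reads such labels and makes a stratified hold-out split, so the rare multiple-disease class is represented in validation in proportion to its size. It assumes a CSV with an `image_id` column plus one-hot columns named `healthy`, `multiple_diseases`, `rust`, and `scab`; these column names are an assumption for illustration and should be checked against the files distributed on Kaggle.

```python
import csv
import io
import random
from collections import defaultdict

# Hypothetical label layout: an image_id column plus one 0/1 column per class.
# Check these names against the CSV actually distributed with the dataset.
LABEL_COLUMNS = ["healthy", "multiple_diseases", "rust", "scab"]


def read_labels(csv_text):
    """Return {image_id: class_name}, requiring exactly one positive column per row."""
    labels = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        hot = [c for c in LABEL_COLUMNS if row[c].strip() == "1"]
        if len(hot) != 1:
            raise ValueError(f"{row['image_id']}: expected one label, got {hot}")
        labels[row["image_id"]] = hot[0]
    return labels


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Hold out about val_fraction of each class so class shares match across splits."""
    by_class = defaultdict(list)
    for image_id, cls in sorted(labels.items()):
        by_class[cls].append(image_id)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, val = [], []
    for cls in sorted(by_class):
        ids = by_class[cls]
        rng.shuffle(ids)
        # Keep at least one validation image even for very small classes.
        n_val = max(1, round(len(ids) * val_fraction))
        val.extend(ids[:n_val])
        train.extend(ids[n_val:])
    return train, val


if __name__ == "__main__":
    # Synthetic demo: 40 images, 10 per class.
    rows = ["image_id," + ",".join(LABEL_COLUMNS)]
    for i in range(40):
        onehot = ["1" if i % 4 == k else "0" for k in range(4)]
        rows.append(f"img{i}," + ",".join(onehot))
    labels = read_labels("\n".join(rows))
    train, val = stratified_split(labels, val_fraction=0.2)
    print(len(train), len(val))  # 32 8
```

On the reported counts, a 20% hold-out would keep 37 of the 187 multiple-disease images for validation, rather than leaving that number to chance as a purely random split would.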

At the end of the competition, the three winning teams were selected based on the mean AUC (area under the receiver operating characteristic, or ROC, curve) of their models. In total, 1,345 teams participated and submitted approximately 5,796 entries.
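A minimal sketch of the ranking metric, assuming the mean AUC is the average over classes of each class's one-vs-rest ROC AUC. Per class, AUC equals the Mann-Whitney U statistic divided by the number of (positive, negative) pairs, which the code computes from score ranks (tied scores share their average rank). The function names are hypothetical.

```python
from typing import Sequence


def binary_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (ties get average ranks)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_columnwise_auc(y_true_rows, prob_rows):
    """Average the one-vs-rest AUC of each class column (one-hot y_true rows)."""
    n_classes = len(y_true_rows[0])
    aucs = []
    for c in range(n_classes):
        col_true = [row[c] for row in y_true_rows]
        col_prob = [row[c] for row in prob_rows]
        aucs.append(binary_auc(col_true, col_prob))
    return sum(aucs) / n_classes


if __name__ == "__main__":
    y = [[0, 1], [0, 1], [1, 0], [1, 0]]
    p = [[0.1, 0.9], [0.4, 0.6], [0.35, 0.65], [0.8, 0.2]]
    print(mean_columnwise_auc(y, p))  # 0.75
```

In practice, scikit-learn's `roc_auc_score` with a one-hot `y_true` and `average="macro"` computes the same column-wise average; the version above only makes the calculation explicit.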

The top three mean AUC scores on the private leaderboard were 0.98445, 0.98182, and 0.98089. We also trained an off-the-shelf convolutional neural network (CNN) for disease classification, which achieved 97% accuracy on the test set (Thapa et al. 2020).
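The leaderboard scores (mean AUC) and the 97% test accuracy measure different things, so they are not directly comparable. AUC is threshold-free and rewards ranking positives above negatives within each class; accuracy, assumed here to be computed from the highest-probability class, counts images whose top prediction matches the label. The two-image example below is made up for illustration: its per-class rankings are perfect, giving a mean AUC of 1.0, yet one image is misclassified. The helper names are hypothetical.

```python
def accuracy(y_true_idx, prob_rows):
    """Fraction of images whose highest-probability class is the true class."""
    hits = 0
    for true_idx, probs in zip(y_true_idx, prob_rows):
        pred_idx = max(range(len(probs)), key=probs.__getitem__)
        if pred_idx == true_idx:
            hits += 1
    return hits / len(y_true_idx)


def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    true_idx = [0, 1]
    probs = [[0.4, 0.6], [0.1, 0.9]]  # image 0 is ranked correctly per class but misclassified
    one_hot = [[1 if c == t else 0 for c in range(2)] for t in true_idx]
    aucs = [pairwise_auc([row[c] for row in one_hot], [p[c] for p in probs]) for c in range(2)]
    print("mean AUC:", sum(aucs) / len(aucs))  # mean AUC: 1.0
    print("accuracy:", accuracy(true_idx, probs))  # accuracy: 0.5
```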

 

Dataset and ML model availability:

The foliar image dataset used for the ‘Plant Pathology Challenge 2020’ competition is available on Kaggle.

The ML model is available on GitHub.

 

References:

Thapa R, Zhang K, Snavely N, Belongie S, Khan MA. 2020. The Plant Pathology 2020 challenge dataset to classify foliar disease of apples. Applications in Plant Sciences, 8 (9), e11390.