Computer Vision for Plant Pathology
DOI: 10.1002/aps3.11559
REVIEW ARTICLE

1 Department of Computer Science, University of York, Deramore Lane, York, YO10 5GH, Yorkshire, United Kingdom
2 Centre for Novel Agricultural Products, Department of Biology, University of York, Wentworth Way, York, YO10 5DD, Yorkshire, United Kingdom
3 Department of Biology, University of York, Wentworth Way, York, YO10 5DD, Yorkshire, United Kingdom

Correspondence
Jamie R. Sykes, Department of Computer Science, University of York, Deramore Lane, York, YO10 5GH, Yorkshire, United Kingdom.
Email: jamie.sykes@york.ac.uk

This article is part of the special issue “Resilient botany: Innovation in the face of limited mobility and resources.”

Abstract
Plant pathogens can decimate crops and render the local cultivation of a species unprofitable. In extreme cases this has caused famine and economic collapse. Timing is vital in treating crop diseases, and the use of computer vision for precise disease detection and timing of pesticide application is gaining popularity. Computer vision can reduce labour costs, prevent misdiagnosis of disease, and prevent misapplication of pesticides. Pesticide misapplication is financially costly and can exacerbate pesticide resistance and pollution. Here, we review the application and development of computer vision and machine learning methods for the detection of plant disease. This review goes beyond the scope of previous works to discuss important technical concepts and considerations when applying computer vision to plant pathology. We present new case studies on adapting standard computer vision methods and review techniques for acquiring training data, the use of diagnostic tools from biology, and the inspection of informative features. In addition to an in‐depth discussion of convolutional neural networks (CNNs) and transformers, we also highlight the strengths of methods such as support vector machines and evolved neural networks. We discuss the benefits of carefully curating training data and consider situations where less computationally expensive techniques are advantageous. This includes a comparison of popular model architectures and a guide to their implementation.
KEYWORDS
agronomy, disease detection, machine learning, plant pathology
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2023 The Authors. Applications in Plant Sciences published by Wiley Periodicals LLC on behalf of Botanical Society of America.

Computer vision (CV), typically powered by machine learning (ML), is now used for a variety of tasks in agriculture, botany, and ecology. These tasks include plant health assessments (Patrício and Rieder, 2018), identification of weeds (Wu et al., 2021), identification of drought‐prone areas of land (Ramos‐Giraldo et al., 2020), yield prediction (Sarkate et al., 2013), and detection of defects or bruising in fruits and vegetables (Tripathi and Maktedar, 2020). We are seeing substantial improvement in the efficiency of CV techniques (He et al., 2016; Howard et al., 2017; Zhang et al., 2018) and, at least for now, computational resources continue to become more affordable (Mack, 2011). As a result, CV is becoming available to whole industries, not just areas of highest commercial value; for example, ML has been used with increasing regularity for tasks specific to cocoa (Theobroma cacao L.), such as the exploration and optimisation of aroma profiles (Fuentes et al., 2019), monitoring of cocoa bean fermentation (Parra et al., 2018; Oliveira et al., 2021), and bean quality classification (Mite‐Baidal et al., 2019). While large research and development budgets for areas such as wheat (Triticum aestivum L.) production have allowed for the use of unpiloted aerial vehicle photography to identify disease outbreaks (Su et al., 2018; Chiu et al., 2020) and the use of multispectral satellite photography to monitor outbreaks of yellow rust (Puccinia striiformis) from space (Nagarajan et al., 1984), the application of ML to sectors with fewer financial resources has had to take a different form. Onboard graphics processing units (GPUs) can run large neural networks locally, analysing image data from farm
machinery in real time, while fast internet connections can be used to run the same large models remotely (Grosch, 2018). By contrast, implementation of ML in poorer sectors must rely on older hardware, edge devices, and older‐model smartphones. This means that an emphasis must be placed on the ultra‐low‐cost implementation and high computational efficiency of algorithms. This provides us with an opportunity and motivation to steer the ML field away from brute force computing and toward more nuanced and efficient approaches.

The cultivation of cocoa represents a prime example of a sector that could benefit greatly from non‐intrusive and highly optimized CV disease detection and will be used as an example throughout this review. The International Cocoa Organization estimates that up to 38% of the global cocoa crop is lost to disease annually, with over 1.4 million tonnes of cocoa lost to just three diseases in 2016 (Maddison et al., 1995; Marelli et al., 2019). Additionally, international disease spread has been devastating to this industry in the past and could be again in the future (Phillips‐Mora and Wilkinson, 2007; Meinhardt et al., 2008). Following the loss of a cocoa crop to witches’ broom disease, a plot of land will typically be cleared of forest, and the previous robust agroforestry system will be replaced with a monoculture (Rice and Greenberg, 2000; Meinhardt et al., 2008). This disease is therefore not only capable of devastating the livelihoods of whole communities of cocoa farmers, eliminating 50–90% of their crop (Meinhardt et al., 2008), but it is also destructive to local biodiversity and has a significant negative impact on the carbon capture potential of the land (Kuok Ho and Yap, 2020). Such loss of Amazonian forest is a driver of climate change, causing positive feedback and exacerbating this global crisis (Malhi et al., 2008).

A review from 1986 on the use of systemic fungicides to tackle oomycetes, such as Phytophthora spp., highlights concern about damage to the environment and human health by pesticides such as methyl bromide, which are still in use today (Cohen and Coffey, 1986). These concerns, and those of pesticide resistance (Department of Health, Victoria, 2018), are still present 37 years later. The use of CV and ML for targeted application and calibration of pesticide dose is beginning to have massive beneficial effects in this area across the agriculture industry.

It is estimated that from 2016 to 2026, the number of smartphone users will have doubled from approximately 3.7 billion people to 7.5 billion (Statista, 2022). Therefore, the necessary hardware to run CV models is largely in place, and we now need only develop and deploy the CV models to have great potential for impact with little monetary input. Here we discuss how best to achieve this.

This review is composed of three main sections. Section 1, “Methods in computer vision,” critically reviews a wide variety of relevant techniques in ML and CV model development and testing, and section 2, “Data acquisition and model testing,” discusses techniques for data gathering, data labelling, and model testing. While section 1 focuses on ML theory and comparison of model architectures, section 2 focuses on more practical issues. The final section, “A roadmap to commercial implementation,” includes multiple points that are important to consider prior to choosing an architecture and beginning development.

Several review articles have been published on the topic of CV and deep learning that are applicable to plant pathology (Voulodimos et al., 2018; Weinstein, 2018; Chouhan et al., 2020; Xu et al., 2021). High‐quality works such as Weinstein (2018), which reviews the use of CV in animal ecology, are directly applicable to plant pathology owing to the flexibility of the techniques discussed here. What is missing from these works is a critical review and discussion of the latest and/or less conventional techniques in CV and a discussion of data acquisition and validation. Each of these reviews was published prior to or near the release of Detection Transformer (DETR; Carion et al., 2020), Vision Transformer (ViT; Dosovitskiy et al., 2021), and ConvNeXt (Liu et al., 2022), so naturally these recent landmark methods are not discussed. However, despite all being published after the release of Faster Region‐Based Convolutional Neural Network (Faster R‐CNN; Ren et al., 2016), ResNet (He et al., 2016), and You Only Look Once (YOLO; Redmon et al., 2016), only Xu et al. (2021) mention any of these popular and high‐performing architectures, namely YOLO and region‐based fully convolutional networks, an early predecessor to Faster R‐CNN.

A recent survey (Guo et al., 2022) goes into great detail on the various facets of different attention mechanisms, which are integral to transformer architectures. While this work presents the bleeding edge of CV technology, it does not present the holistic, applied, and data‐centric perspective provided here. Another paper aimed to develop CV models for the classification of cocoa beans, comparing the use of ResNet18, ResNet50, and support vector machines (SVMs; Lopes et al., 2022), while another recent review gives a high‐level discussion of a number of CV studies in agriculture, covering the topics of hyperspectral imaging, the use of unpiloted aerial vehicles, and architectures as recent as ResNeXt (Xie et al., 2017; Tian et al., 2020). However, while the latter of these two papers presents a broad view of CV for plant pathology, providing strong links to many plant taxa, no mention is made by either Lopes et al. (2022) or Tian et al. (2020) of architectures or techniques released after 2017. As such, the fusion of industry‐standard and bleeding‐edge methods in data acquisition, verification, and analysis presented here makes the present review unique among those listed above.

This review provides the reader with an in‐depth understanding of CV for plant pathology and supports the previous works. In doing so, we focus on how best to adapt current methods to provide practical solutions for farmers, agronomists, and botanists without access to high‐performance computational resources. While cocoa agriculture is used as a consistent example throughout, all methods
discussed here are applicable across plant pathology and agriculture, as well as related fields such as plant and animal ecology and forestry.

METHODS IN COMPUTER VISION

Background

Ever since AlexNet was presented at the Conference on Neural Information Processing Systems in 2012, the field of CV has been dominated by CNNs (Krizhevsky et al., 2017). While subsequent updates to CNN architectures have provided dramatic improvements over AlexNet (Liu et al., 2022), it is important to recognise that CNNs are not the only tools at our disposal. Previous work on cocoa disease has assessed the performance of SVMs, random forest regression, and artificial neural networks to identify common diseases in cocoa from standard colour images, hereafter referred to as RGB (red, green, blue) images (Rodriguez et al., 2021). Here it was shown that artificial neural networks are capable of identifying late‐stage disease in RGB images of cocoa, but that training data set size is a limiting factor. Another study applied an SVM to perform pixel‐wise identification of black pod rot in cocoa (Tan et al., 2018). The resulting algorithm showed an impressive ability to detect human‐visible disease symptoms and, given the high computational efficiency of SVMs, it was able to run on low‐powered hardware. Additionally, this model was trained on only 50 images, which is an extremely small training set in CV. However, no mention was made of the ability of these models to detect early disease development or non‐human‐visible symptoms, which will be a central focus of this review.

Vision transformers

In the late 2010s, transformers became the default for natural language processing (Liu et al., 2022), and they are now rapidly gaining popularity in vision‐based tasks. Pure transformer‐based multilayer perceptrons, such as ViT (Dosovitskiy et al., 2021), do away with the convolutional layers of a CNN. Instead, they subdivide and tokenise an image, giving each token a positional embedding, and then pass all of these data to the multi‐head attention mechanism of the network. The main drawbacks of such transformer‐based models are that they require training data sets on the order of millions of images, and they lack the inductive biases of CNNs, such as translational equivariance (Dosovitskiy et al., 2021). In addition, the global structure of objects in an image must be learned from scratch, whereas this is maintained throughout a CNN. However, when pretrained on a large data set and then fine‐tuned on a more modest data set of tens of thousands of images, vision transformers can outcompete CNNs (Dosovitskiy et al., 2021).

Although the requirement for vast training data sets may preclude the use of transformers for many plant pathology projects, there is a middle ground between the popular ResNet architectures and transformer models. Taking inspiration from transformer designs, the highly competitive ResNet architectures were updated to produce a pure CNN that competes well with transformers in many tasks and is reported to outperform the original ResNets by about 3% accuracy on ImageNet (Deng et al., 2009). This family of models is named ConvNeXt and includes models of varying complexity from ConvNeXt Tiny to ConvNeXt Large. Additionally, ConvNeXt uses layer normalisation in place of batch normalisation. This modification could have important benefits for plant pathology projects, as discussed in the “Image, batch, and layer normalisation” section; however, as the ConvNeXt architectures are relatively large (ConvNeXt Tiny: 29 million parameters, ResNet18: 12 million parameters, ResNet50: 26 million parameters), these models too require large and/or complex training data sets to avoid overfitting, and more powerful hardware to run at inference, than the smaller ResNets.

Object detection and semantic segmentation

Bounding box object detection and semantic segmentation are processes by which objects of interest in an image are both classified and located in the image. In these tasks, either a box (bounding box object detection) or a polygon or “mask” (semantic segmentation) is drawn around the object of interest. For an example of semantic segmentation, see Case Study 1 (Box 1).

Semantic segmentation and object detection could help in the accurate manual labelling of disease states in images. In simple image classification with a CNN, a model must learn what features, across the whole image, can be used as true markers of disease. However, annotation of training images with bounding boxes or segmentation masks may be used to focus the attention of the model, thus making training more efficient and reducing overfitting. This beneficial effect might be more pronounced with semantic segmentation than with bounding boxes because the edges of a bounding box may extend beyond the edges of the leaf, pod, or tree in question and thus mislabel parts of neighbouring healthy plants. However, when comparing the ability of Faster R‐CNN and Mask R‐CNN to detect human‐visible signs of insect damage in sweet peppers (Capsicum annuum L.), Faster R‐CNN was shown to have superior accuracy and mean average precision (mAP) (Lin et al., 2020). Here, mAP is defined as the mean over all classes of the per‐class precision, at a given intersection over union. These disparities in performance were contingent on which backbone model architecture (Inception v2, ResNet50, or ResNet101) (Szegedy et al., 2016) was used. When the more complex ResNet101 was used, Faster R‐CNN and Mask R‐CNN performed more similarly, although in this task
FIGURE 1 Application of semantic segmentation with Mask R‐CNN to highlight whole diseased cocoa trees. Example images of trees infected with (A) black pod rot, (B) frosty pod rot, and (C) witches’ broom disease. The percentage scores for each show the degree of confidence in the model's diagnosis: black pod rot = 72%, frosty pod rot = 80%, witches’ broom disease = 93%.
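The simplified mAP definition given above, per‐class precision averaged over classes, with matches decided at a fixed intersection‐over‐union (IoU) threshold, can be made concrete with a short sketch. The box coordinates, class names, and the 0.5 threshold below are hypothetical, and the sketch deliberately omits the confidence‐ranked precision–recall averaging used by benchmarks such as COCO.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def class_precision(preds, truths, thresh=0.5):
    """Fraction of predicted boxes matching any ground-truth box at IoU >= thresh."""
    if not preds:
        return 0.0
    tp = sum(1 for p in preds if any(iou(p, t) >= thresh for t in truths))
    return tp / len(preds)

def mean_average_precision(preds_by_class, truths_by_class, thresh=0.5):
    """Mean of per-class precisions at a fixed IoU threshold."""
    precisions = [class_precision(preds_by_class[c],
                                  truths_by_class.get(c, []), thresh)
                  for c in preds_by_class]
    return sum(precisions) / len(precisions)

# Hypothetical detections for two disease classes in one image
preds = {"black_pod_rot": [(10, 10, 50, 50), (60, 60, 90, 90)],
         "frosty_pod_rot": [(100, 100, 150, 150)]}
truths = {"black_pod_rot": [(12, 12, 52, 52)],
          "frosty_pod_rot": [(100, 100, 150, 150)]}
print(mean_average_precision(preds, truths))  # → 0.75
```

Raising the `thresh` argument shows how mAP falls as the required overlap between predicted and ground‐truth boxes becomes stricter.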
F1 score and the ensemble model produced the best mAP and recall, these improvements were slight. Additionally, F1 is calculated directly from precision and recall, so it seems counterintuitive that the U‐Net approach could have the highest F1, yet the lowest precision and recall. The most noteworthy result here is the consistently superior precision of Mask R‐CNN in this comparison and in another against YOLO (Bharati and Pramanik, 2020; Horzyk and Ergün, 2020). Additionally, in a study comparing the use of U‐Net and Mask R‐CNN to segment images of pomegranate (Punica granatum L.) trees, Mask R‐CNN outperformed the U‐Net in both precision and recall by wide margins (Zhao et al., 2018).

An alternative approach applied an SVM to perform pixel‐wise classification to detect black pod rot in cocoa, with a human expert labelling the diseased pixels in training images (Tan et al., 2018). Like semantic segmentation, this technique achieves the effect of providing the model with additional information on the location of disease in an image, relative to a simple CNN. However, it imposes arbitrary physical boundaries around disease symptoms such as lesions and cankers, and the algorithm is unable to define for itself any symptoms that are not or cannot be identified with human vision. By using semantic segmentation with a CNN backbone, as in Mask R‐CNN or DETR, to segment whole trees, these effects could be avoided, i.e., the model would be able to detect non‐human‐visible symptoms via feature learning and model the effects of hyphae propagating through the plant or systemic changes to a plant's phenotype away from the site of infection.

Variational autoencoders for outlier detection

In addition to discriminative modelling, ML provides several powerful tools for generative modelling. Modelling with generative deep neural networks (DNNs) can aid in gaining an intuitive understanding of the physical laws that led to the creation of the data to be modelled. An example of this is the use of artistic style transfer with generative adversarial networks (Li and Wand, 2016), where specific semantic features in an image can be isolated and utilised. Another popular deep generative model architecture is the variational autoencoder (VAE), which we will focus on here for the task of image data set filtering.

When working with autonomously collected data, for example from camera traps or web‐scraping bots, the acquisition of vast quantities of data is often the easy part of creating a good training data set. Camera traps tend to produce a considerable amount of uninformative data, and the data from naive web‐scraping bots can be badly contaminated with misclassified and irrelevant images; for example, a search for the keyword “Acer” will return many more images of laptops than it will Japanese maple trees, and a search for “black pod rot” will include many images of frosty pod rot, cherelle wilt, and insect damage. Therefore, some level of human supervision is vital in curating training data, and the importance of consulting farmers and researchers in data collection and labelling cannot be overstated. However, manual labelling of a full data set can be extremely costly, and one proposed method to offset some of this cost is the use of VAEs for outlier detection.

A VAE is composed of two neural networks that are trained in parallel. The encoder network projects the image data to a smaller latent vector space, thus compressing it, and the decoder network predicts the original image from this compressed data as best it can.

Generative models tend to generalise to the real world much better than discriminative models, which aim to uncover correlative relationships between data and class labels (Kingma and Welling, 2019). However, deep
generative models are typically considered excessive for classification problems, as they often have higher bias (Banerjee, 2007) and are computationally expensive. Previous works have successfully used VAEs for text classification (Xu et al., 2017; Xu and Tan, 2020), data clustering (Dilokthanakul et al., 2017; Lim et al., 2020), anomaly detection (An and Cho, 2015), recommender systems (Li and She, 2017), and dimensionality reduction (Lin et al., 2020). There are also a limited number of published papers on the use of VAEs for anomaly detection with colour images (Fan et al., 2020).

Here, we consider two methods by which a VAE might be used to detect outlying data in collections of large colour images. To do so, we will use the example of detecting non‐plant images in a web‐scraped collection of plant images for use in building a disease classifier.

Method 1. Distribution of reconstruction loss
Having trained a VAE on only plant images, use this model to compress and decompress all images in the contaminated data set and record the reconstruction loss for each image. Plot the distribution of the loss values and record the most extreme high values as outliers. The assumption here is that the model should “fail” to reconstruct non‐plant images well, as it should be naive to any images that do not show plants.

Method 2. Dimension reduction and clustering
Using the encoder network of a VAE that has been trained on the ImageNet data set, compress the images in the contaminated data set and record the values of the latent space for each image. Reduce the dimensions of the latent space further with principal component analysis, t‐distributed stochastic neighbour embedding (t‐SNE), and/or uniform manifold approximation and projection (UMAP). Plot these reduced data. Outliers/contaminant images may then separate from the clean data.

Nouveau VAE (NVAE) is the product of an effort to carefully craft the encoding network architecture of a VAE, which appears to produce excellent results (Vahdat and Kautz, 2021). After training for just one epoch, this architecture is able to project large colour images onto a latent space and reconstruct them almost perfectly. However, if the aim of using NVAE is to compress image data, this architecture is not appropriate. This is because, using the recommended settings for the CelebA 64 data set (Liu et al., 2015), the latent space produced for an image with dimensions (3,224,224) is (100,224,224), i.e., more than 33 times larger than the original image. Following the authors’ provided instructions to constrain the latent space to be as small as possible without excessively modifying the code, the latent space for this same size of image remains the same (100,224,224). This observation is corroborated in another study where the authors explain how NVAE first expands the data dimensions to a large number of latent spaces before pruning those spaces based on Kullback–Leibler (KL) divergence (Asperti et al., 2021). However, these authors go on to note that, in their use case, NVAE transformed images of size (3,32,32) to a latent space of size (16,16,128) without any subsequent downscaling. It is not surprising, then, that this architecture is able to reconstruct an image so well after just one training epoch, with no pre‐trained weights, as the dimensionality of the data is expanded rather than compressed. Likewise, NVAE is not appropriate for identifying outliers by the distribution of reconstruction errors, as it can reconstruct any image almost perfectly. For example, when we trained NVAE on a data set of 54,124 plant images, it was able to reconstruct any image in the ImageNet data set with similar binary cross‐entropy loss to that of plant images. As an alternative to NVAE, we attempted to use a custom convolutional VAE with a ResNet152 (He et al., 2016) backbone to apply the two methods of outlier detection described above. However, we were unable to get this architecture to function well enough to sufficiently compress the data and reconstruct images with high fidelity.

The paucity of papers published on the subject of outlier detection in colour images with VAEs seems to be due to the inherent difficulty of this task. The large storage and GPU memory requirements of training these models on such high‐dimensional data (Sun et al., 2018) have largely been resolved, although for many projects GPU memory availability will still preclude this technique. Thus far, the inability of the VAE architecture to learn a compression algorithm for large colour images suggests a hard physical limitation that might not be overcome. Moreover, while Maaløe et al. (2019) contest this argument, Nalisnick et al. (2019) argue comprehensively that generative models are not suitable for outlier detection by the reconstruction loss method described above, as these models tend to learn low‐level statistics about data rather than high‐level semantics. As such, they are often unable to differentiate between images that, to the human eye, are obviously different. We describe a successful alternative method of outlier detection in Box 2.

Evolutionary algorithms

The field of CV is currently dominated by handcrafted DNNs with fixed topologies. However, the seldom‐used techniques of evolved neural networks have real potential in the field of plant pathology. Computational efficiency at inference and improved ability to generalise are of paramount importance to models developed for plant pathology in the field. This is because such models must be able to cope with complex and highly variable symptoms and backgrounds, and often must run on low‐powered hardware. Growing neural networks take far longer to train/grow than those with fixed topologies, but this is of minor concern given the efficient parallelisation and the vast computational resources now available for training. The hardware available to farmers in
offer a very low error rate on this simpler problem and has a modest resources for training. For most object‐detection use
very low number of parameters when compared with other cases in plant pathology, Faster R‐CNN will be the optimal
modern architectures (Tables 1 and 2). Overall, it seems that choice. Mask R‐CNN extends Faster R‐CNN by adding the
evolved neural networks are not yet ready to tackle the more ability to predict a mask in a bounding box, enhancing its
difficult problems in plant pathology, and so more work is utility for semantic segmentation tasks. YOLO is most
required in this area. suitable for real‐time object detection but offers lower
precision than Faster R‐CNN. It is not suitable for use in
plant pathology unless inference time is of primary concern.
Architecture comparison and DETR and Deformable DETR present a novel approach to
recommendations object detection and offer competitive results (Zhu et al.,
2021). However, implementing these architectures can be
The field of CV has produced a numerous and diverse set of difficult and they require substantial GPU VRAM for
architectures, each with unique strengths and weaknesses. training.
Here, we will compare these architectures, focusing on their The choice of CV model architecture for a given project
application in image classification, object detection, and depends on a variety of factors, including data set size,
semantic segmentation. Table 2 gives a detailed breakdown signal‐to‐noise ratio, computational resources, mode of
of the pros and cons of each of these architectures, as well deployment, and accuracy requirements. However, at
the number of trainable parameters, which acts as a proxy present, for most use cases in plant pathology, ResNet18,
for model complexity, and the number of giga floating point ConvNeXT Tiny, or Faster R‐CNN will yield the best results
operations (GFLOPS), which gives a sense of computation while minimising computational cost, risk of overfitting,
cost of running inference with these architectures.

ResNet introduced the concept of skip connections, enabling the training of much deeper models. Despite its age, ResNet remains a strong competitor, and ResNet18 is probably still the best choice for most small projects with fewer training examples. EfficientNetV2 (Tan and Le, 2021) is more computationally demanding than equivalent ResNet and ConvNeXT variants, and while it tends to yield high accuracy on large data sets (Dosovitskiy et al., 2021; Liu et al., 2022), we found that it is prone to overfitting, making it a less favourable choice. The key innovation of EfficientNet was to allow the depth, width, and resolution of the model to be scaled by adjusting a single coefficient (Tan and Le, 2020). However, in practice this requires editing the source code, thus rendering such adjustments less than convenient. ConvNeXT is an updated version of ResNet, incorporating several modern features. Unlike EfficientNet, ConvNeXT is easy to scale, making it a promising choice for medium‐ to large‐scale applications, for which it has been shown to give superior performance to ResNet and ViT (Liu et al., 2022). As the first transformer to perform favourably against CNNs for image classification, ViT represents a significant milestone. However, image classification may not be the optimal use case for transformer architectures, and at present ConvNeXT outperforms ViT while requiring less data for training and being less computationally expensive (Dosovitskiy et al., 2021).

Object detection and semantic segmentation architectures

Although more complex than YOLO, and arguably DETR, Faster R‐CNN delivers excellent results and requires only […] and the financial cost of training.

In a comparison of EVAL‐COVID (Gong et al., 2021) with other strong competitors like EVOCNN to detect COVID‐19 with evolved CNNs, it was shown that the overuse of batch normalisation (BN) can be deleterious to the training of DNNs for disease diagnosis. While BN often improves the training time of CNNs and can negate the need for small learning rates and dropout (Ioffe and Szegedy, 2015), its negative effect on the diagnosis of disease was also observed in Case Study 3 (Box 3).

Several state‐of‐the‐art generative models now omit BN entirely, while others replace it with weight normalisation or focus on fine‐tuning the momentum hyperparameter of BN layers (Vahdat and Kautz, 2021). As with simply removing the BN layers of a ResNet, reported above, replacing BN in ResNet with the alternative layer normalisation (LN) also results in worse performance (Wu and He, 2018). However, when the developers of ConvNeXT used LN as opposed to BN in their architecture, they observed that the model had no difficulty in training with this substitution (Liu et al., 2022). The BN momentum hyperparameter is a fixed weight applied to the running mean and variance calculations that are tracked during training and used during the application of BN at inference time. Thus, adjusting the BN momentum will not affect training itself (Vahdat and Kautz, 2021). However, BN can cause the output of a layer to be slightly shifted during evaluation. A proposed solution to this is to optimise the momentum hyperparameter for each data set (Vahdat and Kautz, 2021).

In this section, we have listed a host of reasons why the unnecessary normalisation of data is to be avoided. While BN will shorten the training time for a CNN, it changes the input data in unpredictable ways, thus worsening
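The running‐statistics mechanism that the momentum hyperparameter controls can be sketched in a few lines. This is a minimal, pure‐Python illustration following the convention used by PyTorch's BatchNorm layers (assumed here); the layer's learnable scale and shift parameters are omitted for clarity.

```python
def update_running_stats(running_mean, running_var, batch_mean, batch_var, momentum=0.1):
    """One training-step update of BatchNorm's tracked statistics.

    Under PyTorch's convention, `momentum` weights the *current* batch:
    a larger momentum makes the statistics used at inference time track
    recent batches more closely; a smaller one smooths over the whole
    data set. This is the quantity optimised in Case Study 4.
    """
    new_mean = (1.0 - momentum) * running_mean + momentum * batch_mean
    new_var = (1.0 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var


def batchnorm_inference(x, running_mean, running_var, eps=1e-5):
    """At inference, inputs are shifted and scaled by the tracked statistics.

    If the tracked statistics do not match the evaluation data, the layer's
    output is slightly shifted -- the effect described above.
    """
    return (x - running_mean) / (running_var + eps) ** 0.5
```

Optimising the momentum for each data set, as proposed above, amounts to choosing how quickly these tracked statistics adapt to the data the model will actually see at inference time.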
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
COMPUTER VISION FOR PLANT PATHOLOGY | 9 of 21
TABLE 2 Pros and cons of popular model architectures for image classification, object detection, and semantic segmentation. Ranges of values represent the smallest and largest off‐the‐shelf versions available.

Image classification

ResNet (2015) (a): 12–60 M parameters; 1.8–11.5 GFLOPS
Pros:
- ResNet18 is the smallest and most computationally efficient model here
- ResNet18 is ideal for modestly sized data sets
- ResNet152 performs comparably with transformers such as ViT
- Widely used and tested
Cons:
- Uses batch normalisation, which can introduce instability and inconsistent results

EfficientNetV2
Cons:
- Scaling requires editing the source code
- Evaluation using Grad‐CAM (Selvaraju et al., 2017) showed much overfitting, despite high test scores

ConvNeXT
Cons:
- The smallest off‐the‐shelf configurations are too large for many projects and may overfit
- Potential compatibility issues with conversion to ONNX format

ViT
Cons:
- Requires huge data sets to outperform CNNs
- Computationally expensive to train and run at inference

Object detection

YOLO (b)
Cons:
- Performs poorly on small objects
- Gives the least accurate results of the three architectures listed here
TABLE 2 (Continued)

DETR (b)
Cons:
- Very computationally expensive to train
- Slow to converge in training
- Requires a huge amount of training data
- Can be challenging to implement
- Requires a large batch size to achieve stable training
Note: CNN = convolutional neural network; DETR = Detection Transformer; Faster R‐CNN = Faster Region‐Based Convolutional Neural Network; GELU = Gaussian error linear unit; GFLOPS = giga floating point operations; Mask R‐CNN = Mask Region‐Based Convolutional Neural Network; ResNet = Residual Neural Network; ViT = Vision Transformer; YOLO = You Only Look Once.
(a) Values for the number of trainable parameters (displayed in millions [M]) and GFLOPS were obtained from the PyTorch documentation (PyTorch, 2023).
(b) Values for the number of trainable parameters (M) and GFLOPS for YOLO and DETR were calculated using an image of 224 × 224 pixels for this comparison.
classification results. However, at present, the best off‐the‐shelf CNN that is small enough to run on an older‐model smartphone is ResNet18. So, until a more suitable architecture becomes available, BN is unavoidable. We also show here and in Case Study 4 (Box 4) that optimisation of the BN momentum hyperparameter in ResNet18 leads to a slight improvement in the results of our cocoa disease detection model, that image normalisation should not be included in the training pipeline of a model that aims to make predictions from subtle colour features, and that excessive image input size is deleterious to classification accuracy.

DATA ACQUISITION AND MODEL TESTING

In this section, we review various interdisciplinary methods available for gathering a training data set and developing a suitable model. While the previous section concerned the theory of ML in CV, this section will focus on practicalities with respect to low‐cost solutions.

Obtaining the required training data set

Training an image classifier to a high accuracy in a controlled laboratory environment is often a trivial task. However, such a model may perform poorly when presented with the challenges of the real world (Singh et al., 2020). For example, after training a leaf‐disease classifier on images taken in the field, the model performed with around 68% accuracy when tested against images taken in the lab (Ferentinos, 2018). However, when trained in the lab and tested in the field, the same model architecture performed with about 33% accuracy. This effect is likely due to the plain white background of the lab images causing the model to generalise poorly to real‐world applications. This exemplifies the importance of curating a realistic, high‐quality training data set. By naively training and releasing models that are trained on publicly available data sets, we risk exacerbating the problems of disease misclassification. At low frequencies, the effect of mislabelled, misleading, or uninformative data will have limited effect on the performance of a neural network. This feature of neural networks is largely an artifact of batch gradient descent and the learning rate (Motamedi et al., 2021), which act to greatly buffer the effect of infrequent misclassifications in the training data. At higher frequencies, these sources of error can have more serious consequences. The most obvious solution to this problem is to carefully curate, label, and annotate the training data. However, errors resulting from misclassification can be challenging to eradicate. For example, frosty pod rot, black pod rot, and witches' broom disease in cocoa can all present with black or brown lesions on the pod, and both frosty pod rot and black pod rot can coat a pod in white mycelium. This means that without sufficient training in plant pathology or access to diagnostic tests, one could easily mislabel these diseases. This problem can be solved by two means, which should be used in tandem: (1) paying careful attention to detail and applying detailed knowledge of the pathogen in question, and (2) using tools and techniques from molecular biology and spectroscopy to better inform model development and subsequent disease detection. Such techniques/tools include DNA sequencing, real‐time quantitative PCR (qPCR), loop‐mediated isothermal amplification (LAMP), MultispeQ (PhotosynQ, East Lansing, Michigan, USA), and hyperspectral imaging (HSI).
BOX 3 Case Study 3: Disease detection and normalisation.

BOX 4 Case Study 4: Optimisation of BN momentum and image size for cocoa disease detection.
TABLE 3 Results of an ablation study assessing the effects of image normalisation and batch normalisation on a model's ability to detect plant disease.
Image norm. | Batch norm. | Layer norm. | Train time (min) | Loss (%) | Accuracy (%) | Recall (%) | Precision (%) | F1 (%)
ConvNeXT Tiny
ResNet18
FIGURE 2 Image normalisation in the analysis of cocoa disease using computer vision and machine learning. (A) Original and (B) normalised images of cocoa pods showing various stages of frosty pod rot, black pod rot, and witches' broom disease development. Note the effect of normalisation on one's ability to see disease symptoms. The normalisation of pixel values was carried out with the following mean and standard deviation values: mean: (0.485, 0.456, 0.406), standard deviation: (0.229, 0.224, 0.225).
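The transform shown in Figure 2B is the standard per‐channel shift and rescale; the caption's statistics are the usual ImageNet values. A minimal sketch follows (the helper names are illustrative, not taken from the study's code):

```python
# Channel-wise statistics from the Figure 2 caption (the standard ImageNet values).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalise_pixel(rgb, mean=MEAN, std=STD):
    """Map an RGB pixel (values in [0, 1]) to normalised values.

    Each channel is shifted and rescaled by different amounts, so subtle
    colour relationships between channels are distorted -- one way this
    transform can obscure disease symptoms from both humans and models.
    """
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))


def denormalise_pixel(rgb, mean=MEAN, std=STD):
    """Invert the transform, recovering the original pixel values."""
    return tuple(c * s + m for c, m, s in zip(rgb, mean, std))
```

Because the operation is invertible, no information is destroyed; the argument above is that the *rescaled representation* is a poor fit for models that rely on subtle colour features.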
with ITS is now ubiquitous in the molecular study of fungal ecology and phylogeny, while previous techniques relied on the morphology of fruiting bodies for identification (Horton and Bruns, 2001).

qPCR is used to detect asymptomatic disease across the agricultural industry (Luchi et al., 2020). Traditionally, PCR was unsuitable for portable operations or use in the field (Ray et al., 2017). However, rapid real‐time PCR in the field is now possible (Schaad and Frederick, 2002). Real‐time PCR can also be used to quantify the relative levels of a pathogen in plants (Horevaj et al., 2011). Information from such analyses could be extremely informative when fine‐tuning and assessing the performance of the models discussed here.

LAMP can be used in place of qPCR and has four key benefits: (1) it is considerably cheaper (£211 or $256 USD for 100 samples) because a thermal cycler is not required; (2) it is fast; (3) the reagents do not need to be refrigerated; and (4) like real‐time PCR, there is potential for it to be used in the field. Like qPCR, LAMP can be used to quantify the relative amount of DNA present, as well as simply for detection. If detection is the only goal, colour‐ or turbidity‐based methods can be used to detect DNA presence by visual inspection. A drawback of this method is that any pre‐existing PCR primers cannot be used. This is because, while PCR primers are designed to amplify a specific region of complementary DNA, LAMP primers bind to multiple regions of the target DNA in a way that allows for the simultaneous amplification of multiple regions of the DNA. While universal PCR primers for the ITS region exist, it may be necessary to design LAMP primers or species‐specific PCR primers for ITS or other regions. For a detailed discussion of the use of ITS amplification in fungal ecology and the potential pitfalls of specific ITS primer design, see Horton and Bruns (2001).

If novel primers are to be designed, the region of interest must first be sequenced, and if we aim to identify a currently unknown pathogen with BLAST, all of the DNA in a sample must be sequenced. Sequencing with the Oxford Nanopore Technologies (Oxford, United Kingdom) MinION platform can be an ideal tool for this purpose, offering multiple features: (1) The Oxford Nanopore
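The relative quantification mentioned above is conventionally computed from qPCR cycle‐threshold (Ct) values with the widely used 2^(−ΔΔCt) method. The sketch below is a generic illustration of that calculation, not code from this study; it assumes roughly 100% amplification efficiency for both primer pairs and a stable reference gene, and the Ct values in the test are invented.

```python
def relative_quantity(ct_target_test, ct_ref_test, ct_target_control, ct_ref_control):
    """Fold change in target (e.g., pathogen) DNA in a test sample relative
    to a control sample, via the standard 2^-ddCt calculation.

    Lower Ct means more starting template: each cycle earlier corresponds
    to roughly a doubling of the initial quantity.
    """
    d_ct_test = ct_target_test - ct_ref_test            # normalise to the reference gene
    d_ct_control = ct_target_control - ct_ref_control   # same, for the control sample
    dd_ct = d_ct_test - d_ct_control                    # normalise to the control sample
    return 2.0 ** (-dd_ct)
```

A target crossing threshold three cycles earlier (relative to the reference gene) in the test sample than in the control indicates roughly an eight‐fold higher relative pathogen load, the kind of quantity that could be paired with images when fine‐tuning or assessing a disease‐detection model.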
TABLE 4 Specifications and use cases for the hyperspectral cameras used in the following studies: Okamoto et al. (2007), Feng et al. (2018), Gutiérrez et al. (2019), Pan et al. (2019), Li et al. (2020), and Nguyen et al. (2020).
Make/model | Task | Spectral range (nm) | Spectral bands | Spectral resolution (nm)
Resonon Pika II Vis‐NIR | Mango tree yield estimation | 390–890 | 244 | 2
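Hypercubes from cameras like these contain hundreds of narrow bands, and neighbouring bands are often nearly collinear, which is part of why dimension reduction pays off. A minimal, self‐contained illustration with synthetic per‐pixel reflectance values (the band data are invented):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two spectral bands,
    each given as a list of per-pixel reflectance values."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Adjacent bands in a hypercube tend to be near-duplicates of one another:
band_650nm = [0.12, 0.35, 0.40, 0.22, 0.28]
band_652nm = [0.13, 0.34, 0.41, 0.21, 0.29]  # synthetic, nearly identical
```

Strongly correlated bands carry little independent information, which is what techniques such as principal component analysis exploit when compressing a hypercube before classification.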
problems include the large size of HSI hypercubes, high dimensionality, high intra‐class variability, and high correlation between spectral bands. Many approaches have been taken to analyse these data and, for a long time, SVMs were the most widely used (Yue et al., 2015). Today, DNNs are commonly used to analyse these data as they are particularly well suited to the task of classification with HSI data. DNNs are able to isolate hidden and complex data structures, can utilise a great variety of data types, are flexible in their architectures and the complexity of the mathematical functions they can apply, and are ideally suited to distributed computing (Paoletti et al., 2019). As such, with the addition of dimension‐reduction techniques such as principal component analyses (Yue et al., 2015), the analysis of HSI data with DNNs, although more computationally demanding, becomes little more complex than such analyses of RGB image data.

While the field of CV is advancing at a rapid pace, so too are the fields of molecular biology and spectroscopy. The use of tools and knowledge from these fields will allow projects of various budgets to go beyond the simple application of CNNs to RGB images and, in doing so, model disease in greater detail with tangible biological explications of model behaviour.

Model testing

The black box of DNNs

It is well known how poorly current CV models deal with unexpected edge cases and shifts in test data distribution (Schölkopf et al., 2021). However, in applying CV to plant pathology and agriculture, we encounter more cases than most ML practitioners where the test data does not align well with the training data. These problems arise routinely in CV from the effects of camera blur, image quality, or shifting camera angle. However, in plant pathology we must also contend with the perturbations of weather, climate, plant growth stage, crop variety, a plant's developmental response to growing conditions, and so on. While it remains contentious how robust of a fix techniques such as data augmentations or inductive biases may be to solve the former list of issues (Schölkopf et al., 2021), the latter issues will only be solved by truly understanding how our models are making predictions.

Although DNNs are still considered black box optimisers, much work has been done to understand their various facets and potential foibles. For example, each dense layer of a CNN has been shown to have a distinct role in feature‐level extraction and generalisability (Yosinski et al., 2014), and the output of convolution layers has been visualised to show which physical features in an image were more exaggerated (Zeiler and Fergus, 2014). In a similar study, a host of predefined layer‐wise and neuron‐wise visualisation techniques were applied to a CNN that had been trained on images of plant disease (Toda and Okura, 2019). This work showed that the CNN in question was indeed using visible symptoms of disease that were similar to those used by human experts. Others have sought to learn how best to actively deceive or manipulate a DNN into misclassification. Working within the remit of cybersecurity, it was shown that image classifiers based on SVMs and DNNs could easily be deceived with a simple evasion algorithm (Biggio et al., 2013). This shows how brittle these classifiers can be and highlights the importance of adopting techniques that rely more heavily on causal inference, such as semi‐supervised learning (Peters et al., 2017) or semantic segmentation. It also highlights the importance of rigorous and conciliatory interrogation of models prior to deployment. At present, our methods of model evaluation are widely considered insufficient, and much more work is needed in this area.

Inspecting informative features

A key benefit of the use of CNNs is feature learning. This is the process by which a model will define for itself which features of a data set it considers informative (Voulodimos et al., 2018). In other CV algorithms, an engineer must handcraft descriptive features of a subject
manually, using their expertise and/or diagnostic tools to guide them. In this latter case, pre‐processed data are used rather than raw data, as in a CNN. In the convolution layers of a CNN, however, kernels and attention weights are applied to raw or augmented image data that emphasise informative physical features, and apply inductive biases and self‐attention before these data are passed to the dense layer(s) of the network (O'Mahony et al., 2020). We might assume that these physical features would include those that humans consider to be the obvious visible markers for plant disease, such as the presence of lesions on a leaf. However, it is likely that these networks will also identify markers that humans do not notice or cannot perceive, and may ignore some features that plant pathologists have long considered important. This provides us with the opportunity to learn more about how to identify disease early with human vision, CV, and molecular biology. Time‐series qPCR, transcriptome, or metabolome data can be used to identify the biological markers used by CNNs at the earliest moments of detection. This would allow for the validation of the image features used by the model. Such a biological explanation of the model's informative features would tell us whether the model were making correct inferences for what we may consider to be correct reasons, or whether it were correct for spurious reasons, which would suggest a poor ability to generalise stemming from naive inductive reasoning. Such work may also highlight new ways to identify disease with or without ML, or new ways of combating disease spread through phytosanitation, agrochemistry, or plant breeding.

In recent years, the combination of CNNs and transcriptomics in medical research has seen a surge in popularity. Such studies involve spatial transcriptomics (Chelebian et al., 2021; Xu and McCord, 2021), the identification of non‐small‐cell lung cancer subtypes (Yu et al., 2020), and the elucidation of the various functions of drugs (Meyer et al., 2019). In plant science, CNNs have been applied alongside transcriptomics in the investigation of gene regulation in Arabidopsis thaliana (L.) Heynh. (MacLean, 2019). However, the investigation of the black box nature of CNNs by means of omics appears to be completely absent from the literature.

Attention maps produced by software like Grad‐CAM (Selvaraju et al., 2017; Wang et al., 2019) are another way to inspect informative features of image data. Grad‐CAM produces an explanation for the decision that a model makes about a given image by visually highlighting the informative features of that image. Grad‐CAM is described as “gradient‐based” as it uses the gradient data that is fed into the last convolution layer of a CNN. This allows us to make assessments before the spatial relationships in the data are lost in the fully connected layers (Selvaraju et al., 2017). Alternative “reference‐based” systems, such as DeepLIFT, rely on back‐propagation (Shrikumar et al., 2017) or forward propagation (explanation map) (Ghosal et al., 2018), using a reference image that does not contain the feature of interest. Applying these methods to misclassified images can highlight why a model is performing suboptimally (Toda and Okura, 2019), as results produced with these methods have been shown to be highly correlated with assessments of plant disease made by human experts (Ghosal et al., 2018).

A ROADMAP TO COMMERCIAL IMPLEMENTATION

Once you have developed, trained, and evaluated your model, it is time to begin the process of implementation. However, it is best to have considered and planned this step well ahead of time. There are several decisions made during development that may depend on the intended mode of implementation. For example, if the model is to be run on an edge device or smartphone, computation cost must be kept to a minimum. Likewise, if the model is to be made available via a rented server, reducing computational cost will reduce financial cost. Prior to training, choosing to use architectures such as ResNet18 and MobileNetV3 (Howard et al., 2019) will help to keep computational cost down and, after training, methods such as pruning and quantisation may reduce this cost further. While Google Colab (Google, Mountain View, California, USA; https://colab.research.google.com/) offers free limited access to GPUs for model training, the rental cost of a 16‐GB Nvidia V100 GPU (Nvidia, Santa Clara, California, USA), which would be the minimum needed to train a transformer model or large CNN, is $2.48 USD per hour. As such, developing and training such large models for days, or even weeks, can soon become expensive.

ONNX Runtime (Microsoft Corporation, Redmond, Washington, USA; https://onnxruntime.ai/) offers a huge array of tools to help accelerate, quantise, and deploy trained DNNs. Such models can be incorporated into Android or iOS apps using the phone's built‐in camera, and they can be deployed via the web, on edge devices such as a Raspberry Pi, or in embedded systems for drone mapping or smart irrigation. However, the operator schemas supported by ONNX Runtime must be considered here. For example, ConvNeXT, which uses Gaussian error linear units (GELUs) and stochastic depth, may cause problems as these operators are not yet supported. TensorFlow (Abadi et al., 2015) also offers a pipeline for model deployment, and the PyTorch toolkit for techniques such as quantisation‐aware training and model compression is maturing but presented difficulties when we attempted to use it. By contrast, the ONNX Runtime pipeline is extremely easy to use and supports all popular model formats, including PyTorch, TensorFlow, and scikit‐learn (Pedregosa et al., 2011). While the latest methods of pruning are reported to achieve a 30% reduction in the size of ResNet18 with only a 2% loss in accuracy on ImageNet (Solodskikh et al., 2023), this remains an active area of research, producing inconsistent
results. There is no guarantee that pruning will lessen computational cost. Techniques such as training‐aware pruning show promise but require further research.

For the implementation of object detection or segmentation models, we recommend the Detectron2 library from Facebook (Wu et al., 2019). This library incorporates Faster R‐CNN, Mask R‐CNN, and some new transformer models such as ViTDet, and offers a host of tutorials on the whole process from training to implementation.

CONCLUSIONS

We describe here all of the tools necessary to develop highly optimised and robust ML models that use minimal computational power and provide real benefit to sectors that have more modest budgets. The application of these tools will allow us to break from the common trend in the ML industry, where expensive hardware is employed to develop complex and computationally expensive models to the detriment of improving training data quality.

With the application of off‐the‐shelf architectures to stock data sets, such as the PlantVillage data set (Geetharamani and Pandian, 2019), we can easily achieve prediction accuracy scores in the high 90% range (Thapa et al., 2020). However, such models have little value because they will not generalise to complex real‐world environments due to the simplicity of the training data and a high likelihood of overfitting.

We offer the following recommendations for the development of efficient, inexpensive, and robust CV models for plant pathology.

Garbage in, garbage out: The thoughtless application of advanced models to poorly labelled, simplistic, contaminated, or inappropriately transformed data will yield models that have little value in the field, with slow inference times, poor accuracy, and an inability to generalise. To avoid this fate, we should: (A) where possible, consult with specialists and utilise the invaluable tools from biology, chemistry, and spectroscopy to label data; (B) use the minimum appropriate image input size to improve runtime speed and help avoid overfitting; and (C) avoid needless data transformations such as normalisation, which can alter data in unreliable ways.

The potential in training procedures: Techniques such as semantic segmentation and semi‐supervised learning have potential to lessen both bias and variance in a model's predictions by promoting deductive reasoning over inductive reasoning. Additionally, appropriately scaled CNNs and evolved neural networks offer the potential to produce models with optimised runtime speed and improved generalisation ability.

Robust and conciliatory interrogation of models: While simpler modelling methods, such as SVMs, still have a role to play in modern CV, most of the models we employ for this purpose are exceedingly complicated and are prone to failing in equally complicated ways. Failure of a disease detection model resulting in an outbreak of disease could have very serious consequences. It is therefore vital that we rigorously test the models we develop to ensure that they are not prone to misclassification born of overfitting and naive generalisations. While metrics such as accuracy, F1, area under the receiver operating curve (AUC), recall, and precision are valuable, DNNs are often capable of learning to optimise these summary statistics indirectly, rather than learning to produce reliable predictions. Tools such as confusion matrices, cross validation, and explanation maps go much further in understanding the behaviour of CV models. However, it is important that we invest in the development of new and tailored means of understanding these models, such as the application of omics, as discussed in the section “Inspecting informative features.”

If we apply our wealth of knowledge and proven techniques from botany and agronomy to the acquisition of training data, the development of data‐processing pipelines, and the interrogation of trained models, we can produce applications with game‐changing potential. We are now only 27 years away from a predicted global population of 9.7 billion people (United Nations, 2022). Thus, with the devastating effects of the climate crisis already very much apparent, it is vital that we act now to build robust international infrastructure targeted at securing food supplies and eliminating extreme poverty. The techniques discussed here may enable us, as a community of growers, botanists, and ML developers, to help reduce poverty, improve the relationship between growers and the natural environment, and increase stability in the agriculture industry from the foundation up.

AUTHOR CONTRIBUTIONS
J.R.S. conceived of this review, read and summarized the literature, and wrote the first draft of the manuscript. K.J.D. and D.W.F. continually reviewed and edited the manuscript. All authors approved the final manuscript.

ACKNOWLEDGMENTS
This work was made possible by funding from the Doctoral Centre for Safe, Ethical and Secure Computing at the University of York.

DATA AVAILABILITY STATEMENT
The image data, annotations, and link to the accompanying GitHub repository for Case Study 1 can be found at: https://osf.io/79kx3/?view_only=4a2c1dccee1a4baeb85de5002c702f10. For Case Study 2, the data used to train the initial supervised model, the .csv search terms file for the below web scraper, and the final semi‐supervised model weights can be found at: https://osf.io/h5gj7/?view_only=dbf9f245e21a41e185f5b73e718b4cad. The “contaminated” data used to train the semi‐supervised model were generated using the code at: https://github.com/jrsykes/Google-Image-Scraper. The custom code used to train both the initial model and the final semi‐supervised model can be found at: https://github.com/jrsykes/CocoaReader/blob/main/PlantNotPlant. The custom code and
accompanying Readme.md used to conduct Case Study 3 can Boratyn, G. M., C. Camacho, P. S. Cooper, G. Coulouris, A. Fong, N. Ma,
be found in the following GitHub repository: https://github. T. L. Madden, et al. 2013. BLAST: A more efficient report with
com/jrsykes/CocoaReader. The data for this study were usability improvements. Nucleic Acids Research 41: W29–W33.
https://doi.org/10.1093/nar/gkt282
scraped from the internet using the code in the following Cao, S., and R. Nevatia. 2016. Exploring deep learning based solutions in
GitHub repository: https://github.com/jrsykes/Google-Image- fine grained activity recognition in the wild. In Proceedings of the
Scraper. The location of the accompanying “.csv search terms 2016 23rd International Conference on Pattern Recognition (ICPR),
file” is described below. The custom code to run the sweep in 384–389. Cancun, Mexico. https://doi.org/10.1109/ICPR.2016.
7899664
Case Study 4 can be found in the following GitHub repository:
Carion, N., F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and
https://github.com/jrsykes/CocoaReader/tree/main/CocoaNet. S. Zagoruyko. 2020. End‐to‐end object detection with transformers.
The main script is titled CocoNetsweep_min.sh and the wandb In A. Vedaldi, H. Bischof, T. Brox, J.‐M. Frahm [eds.], Computer
config file is titled CocoaNetSweepConfig_min.yml. The data Vision – ECCV 2020, Lecture Notes in Computer Science, 213–229.
used to generate these results and the full wandb report can be Springer International Publishing, Cham, Switzerland. https://doi.
found at: https://osf.io/2fw6g/?view_only=adc66ba66f83465a org/10.1007/978-3-030-58452-8_13
Chelebian, E., C. Avenel, K. Kartasalo, M. Marklund, A. Tanoglidi,
9e7b111515a60bf2. T. Mirtti, R. Colling, et al. 2021. Morphological features extracted by
AI associated with spatial transcriptomics in prostate cancer. Cancers
ORCID 13: 4837. https://doi.org/10.3390/cancers13194837
Jamie R. Sykes http://orcid.org/0000-0002-0715-8746 Chiu, M. T., X. Xu, Y. Wei, Z. Huang, A. G. Schwing, R. Brunner,
Katherine J. Denby http://orcid.org/0000-0002-7857-6814 H. Khachatrian, et al. 2020. Agriculture‐Vision: A large aerial image
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
How to cite this article: Sykes, J. R., K. J. Denby, and D. W. Franks. 2024. Computer vision for plant pathology: A review with examples from cocoa agriculture. Applications in Plant Sciences 12(2): e11559. https://doi.org/10.1002/aps3.11559