Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Computer Vision for Plant Pathology

Download as pdf or txt
Download as pdf or txt
You are on page 1of 21

Received: 7 February 2023 | Accepted: 13 July 2023

DOI: 10.1002/aps3.11559

REVIEW ARTICLE

Computer vision for plant pathology: A review with examples


from cocoa agriculture

Jamie R. Sykes1 | Katherine J. Denby2 | Daniel W. Franks1,3

1
Department of Computer Science, University of Abstract
York, Deramore Lane, York, YO10 5GH,
Yorkshire, United Kingdom Plant pathogens can decimate crops and render the local cultivation of a species
2 unprofitable. In extreme cases this has caused famine and economic collapse. Timing
Centre for Novel Agricultural Products,
Department of Biology, University of York, is vital in treating crop diseases, and the use of computer vision for precise disease
Wentworth Way, York, YO10 5DD, Yorkshire, detection and timing of pesticide application is gaining popularity. Computer vision
United Kingdom
can reduce labour costs, prevent misdiagnosis of disease, and prevent misapplication
3
Department of Biology, University of York, of pesticides. Pesticide misapplication is both financially costly and can exacerbate
Wentworth Way, York, YO10 5DD, Yorkshire,
United Kingdom
pesticide resistance and pollution. Here, we review the application and development
of computer vision and machine learning methods for the detection of plant disease.
Correspondence This review goes beyond the scope of previous works to discuss important technical
Jamie R. Sykes, Department of Computer Science, concepts and considerations when applying computer vision to plant pathology. We
University of York, Deramore Lane, York, YO10 present new case studies on adapting standard computer vision methods and review
5GH, Yorkshire, United Kingdom.
Email: jamie.sykes@york.ac.uk
techniques for acquiring training data, the use of diagnostic tools from biology, and
the inspection of informative features. In addition to an in‐depth discussion of
This article is part of the special issue “Resilient convolutional neural networks (CNNs) and transformers, we also highlight the
botany: Innovation in the face of limited mobility
and resources.”
strengths of methods such as support vector machines and evolved neural networks.
We discuss the benefits of carefully curating training data and consider situations
where less computationally expensive techniques are advantageous. This includes a
comparison of popular model architectures and a guide to their implementation.

KEYWORDS
agronomy, disease detection, machine learning, plant pathology

Computer vision (CV), typically powered by machine for tasks specific to cocoa (Theobroma cacao L.), such as the
learning (ML), is now used for a variety of tasks in exploration and optimisation of aroma profiles (Fuentes
agriculture, botany, and ecology. These tasks include plant et al., 2019), monitoring of cocoa bean fermentation (Parra
health assessments (Patrício and Rieder, 2018), identifica- et al., 2018; Oliveira et al., 2021), and bean quality
tion of weeds (Wu et al., 2021), identification of drought‐ classification (Mite‐Baidal et al., 2019). While large research
prone areas of land (Ramos‐Giraldo et al., 2020), yield and development budgets for areas such as wheat (Triticum
prediction (Sarkate et al., 2013), and detection of defects or aestivum L.) production have allowed for the use of
bruising in fruits and vegetables (Tripathi and Maktedar, unpiloted aerial vehicle photography to identify disease
2020). We are seeing substantial improvement in the outbreaks (Su et al., 2018; Chiu et al., 2020) and the use of
efficiency of CV techniques (He et al., 2016; Howard et al., multispectral satellite photography to monitor outbreaks of
2017; Zhang et al., 2018) and, at least for now, computa- yellow rust (Puccinia striiformis) from space (Nagarajan
tional resources continue to become more affordable et al., 1984), the application of ML to sectors with fewer
(Mack, 2011). As a result, CV is becoming available to financial resources has had to take a different form.
whole industries, not just areas of highest commercial value; Onboard graphics processing units (GPUs) can run large
for example, ML has been used with increasing regularity neural networks locally, analysing image data from farm

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the
original work is properly cited.
© 2023 The Authors. Applications in Plant Sciences published by Wiley Periodicals LLC on behalf of Botanical Society of America.

Appl. Plant Sci. 2024;12:e11559. wileyonlinelibrary.com/journal/AppsPlantSci | 1 of 21


https://doi.org/10.1002/aps3.11559
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
2 of 21 | COMPUTER VISION FOR PLANT PATHOLOGY

machinery in real time, while fast internet connections can focuses on more practical issues. The final section, “A
be used to run the same large models remotely roadmap to commercial implementation,” includes multiple
(Grosch, 2018). By contrast, implementation of ML in points that are important to consider prior to choosing an
poorer sectors must rely on older hardware, edge devices, architecture and beginning development.
and older‐model smartphones. This means that an emphasis Several review articles have been published on the topic
must be placed on the ultra‐low‐cost implementation and of CV and deep learning that are applicable to plant
high computational efficiency of algorithms. This provides pathology (Voulodimos et al., 2018; Weinstein, 2018;
us with an opportunity and motivation to steer the ML field Chouhan et al., 2020; Xu et al., 2021). High‐quality works
away from brute force computing and toward more such as Weinstein (2018), which reviews the use of CV in
nuanced and efficient approaches. animal ecology, are directly applicable to plant pathology
The cultivation of cocoa represents a prime example of a owing to the flexibility of the techniques discussed here.
sector that could benefit greatly from non‐intrusive and What is missing from these works is a critical review and
highly optimized CV disease detection and will be used as an discussion of the latest and/or less conventional techniques
example throughout this review. The International Cocoa in CV and a discussion of data acquisition and validation.
Organization estimates that up to 38% of the global cocoa Each of these reviews were published prior to or near the
crop is lost to disease annually, with over 1.4 million tonnes release of Detection Transformer (DETR; Carion et al., 2020),
of cocoa lost to just three diseases in 2016 (Maddison Vision Transformer (VIT; Dosovitskiy et al., 2021), and
et al., 1995; Marelli et al., 2019). Additionally, international ConvNeXT (Liu et al., 2022), so naturally these recent
disease spread has been devastating to this industry in the landmark methods are not discussed. However, despite all
past and could be again in the future (Phillips‐Mora and being published after the release of Faster Region‐Based
Wilkinson, 2007; Meinhardt et al., 2008). Following the loss Convolutional Neural Network (Faster R‐CNN; Ren
of a cocoa crop to witches’ broom disease, a plot of land will et al., 2016), ResNet (He et al., 2016), and You Only Look
typically be cleared of forest, and the previous robust Once (YOLO; Redmon et al., 2016), only Xu et al. (2021)
agroforestry system will be replaced with a monoculture mention any of these popular and high‐performing
(Rice and Greenberg, 2000; Meinhardt et al., 2008). This architectures. Those being YOLO and region‐based fully
disease is therefore not only capable of devastating the convolutional networks, an early predecessor to Faster
livelihoods of whole communities of cocoa farmers, elim- R‐CNN.
inating 50–90% of their crop (Meinhardt et al., 2008), but it is A recent survey (Guo et al., 2022) goes into great
also destructive to local biodiversity and has significant detail on the various facets of different attention
negative impact on the carbon capture potential of the land mechanisms, which are integral to transformer architec-
(Kuok Ho and Yap, 2020). Such loss of Amazonian Forest is tures. While this work presents the bleeding edge of CV
a driver of climate change, causing positive feedback and technology, it does not present the holistic, applied, and
exacerbating this global crisis (Malhi et al., 2008). data‐centric perspective provided here. Another paper
A review from 1986 on the use of systemic fungicides to aimed to develop CV models for the classification of
tackle oomycetes, such as Phytophthora spp., highlights cocoa beans, comparing the use of ResNet18, ResNet50,
concern about damage to the environment and human and support vector machines (SVMs; Lopes et al., 2022),
health by pesticides such as methyl bromide, which are still while another recent review gives a high‐level discussion
in use today (Cohen and Coffey, 1986). These concerns, and of a number of CV studies in agriculture, covering the
those of the pesticide resistance (Department of Health, topics of hyperspectral imaging, the use of unpiloted
Victoria, 2018), are still present 37 years later. The use of aerial vehicles, and architectures as recent as ResNeXt
CV and ML for targeted application and calibration of (Xie et al., 2017; Tian et al., 2020). However, while the
pesticide dose are beginning to have massive beneficial latter of these two papers presents a broad view of CV
effects in this area across the agriculture industry. for plant pathology, providing strong links to many
It is estimated that from 2016 to 2026, the number of plant taxa, no mention is made by either Lopes et al.
smartphone users will have doubled from approximately 3.7 (2022) or Tian et al. (2020) of architectures or
billion people to 7.5 billion (Statista, 2022). Therefore, the techniques released after 2017. As such, the fusion of
necessary hardware to run CV models is largely in place, industry standard and bleeding edge methods in data
and we need now only develop and deploy the CV models acquisition, verification, and analysis presented here
to have great potential for impact with little monetary input. make the present review unique among those listed
Here we discuss how best to achieve this. above.
This review is composed of three main sections. This review provides the reader with an in‐depth
Section 1, “Methods in computer vision,” critically reviews understanding of CV for plant pathology and supports the
a wide variety of relevant techniques in ML and CV model previous works. In doing so, we focus on how best to adapt
development and testing, and section 2, “Data acquisition current methods to provide practical solutions for farmers,
and model testing,” discusses techniques for data gathering, agronomists, and botanists without access to high‐
data labelling, and model testing. While section 1 focuses on performance computational resources. While cocoa agricul-
ML theory and comparison of model architectures, section 2 ture is used as a consistent example throughout, all methods
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
COMPUTER VISION FOR PLANT PATHOLOGY | 3 of 21

discussed here are applicable across plant pathology and Although the requirement for vast training data sets
agriculture, as well as related fields such as plant and animal may preclude the use of transformers for many plant
ecology and forestry. pathology projects, there is a middle ground between the
popular ResNet architectures and transformer models.
Taking inspiration from transformer designs, the highly
METHO DS IN COM P U TE R V IS I O N competitive ResNet architectures were updated to produce a
pure CNN that competes well with transformers in many
Background tasks and is reported to outperform the original ResNets by
about 3% accuracy on ImageNet (Deng et al., 2009). This
Ever since AlexNet was presented at the Conference on family of four models is named ConvNeXt and includes
Neural Information Processing Systems in 2012, the field of models of varying complexity from ConvNeXt Tiny to
CV has been dominated by CNNs (Krizhevsky et al., 2017). ConvNeXt Large. Additionally, ConvNeXt uses layer
While subsequent updates to CNN architectures have normalisation in place of batch normalisation. This
provided dramatic improvements over AlexNet (Liu modification could have important benefits for plant
et al., 2022), it is important to recognise that CNNs are pathology projects, as discussed in the “Image, batch, and
not the only tools at our disposal. Previous work on cocoa layer normalisation” section; however, as the ConvNeXt
disease has assessed the performance of SVMs, random architectures are relatively large (ConvNeXt Tiny: 29
forest regression, and artificial neural networks to identify million parameters, ResNet18: 12 million parameters,
common diseases in cocoa from standard colour images, ResNet50: 26 million parameters), these models too require
hereafter referred to as RGB (red, green, blue) images large and/or complex training data sets to avoid overfitting
(Rodriguez et al., 2021). Here it was shown that artificial and more powerful hardware to run at inference than the
neural networks are capable of identifying late‐stage disease smaller ResNets.
in RGB images of cocoa, but that training data set size is a
limiting factor. Another study applied an SVM to perform
pixel‐wise identification of black pod rot in cocoa (Tan Object detection and semantic segmentation
et al., 2018). The resulting algorithm showed an impressive
ability to detect human‐visible disease symptoms and, given Bounding box object detection and semantic segmentation
the high computational efficiency of SVMs, it was able to are processes by which objects of interest in an image are
run on low‐powered hardware. Additionally, this model was both classified and located in the image. In these tasks,
trained on only 50 images, which is an extremely small either a box (bounding box object detection) or a polygon
training set in CV. However, no mention was made of the or “mask” (semantic segmentation) is drawn around the
ability of these models to detect early disease development object of interest. For an example of semantic segmentation,
or non‐human‐visible symptoms, which will be a central see Case Study 1 (Box 1).
focus of this review. Semantic segmentation and object detection could help
in the accurate manual labelling of disease states in images.
In simple image classification with a CNN, a model must
Vision transformers learn what features, across the whole image, can be used as
true markers of disease. However, annotation of training
In the early 2010s, transformers become the default for images with bounding boxes or segmentation masks may be
natural language processing (Liu et al., 2022), and they are used to focus the attention of the model, thus making
now rapidly gaining popularity in vision‐based tasks. Pure training more efficient and reducing overfitting. This
transformer‐based multilayer perceptrons, such as ViT beneficial effect might be more pronounced with semantic
(Dosovitskiy et al., 2021), do away with the convolutional segmentation than bounding boxes because the edges of a
layers of a CNN. Instead, they subdivide and tokenise an bounding box may extend beyond the edges of the leaf, pod,
image, giving each token a positional embedding, and or tree in question and thus mislabel parts of neighbouring
then pass all of these data to the multi‐head attention healthy plants. However, when comparing the ability of
mechanism of the network. The main drawbacks of such Faster R‐CNN and Mask R‐CNN to detect human‐visible
transformer‐based models are that they require training signs of insect damage in sweet peppers (Capsicum annuum
data sets on the order of millions of images, and they lack L.), Faster R‐CNN was shown to have superior accuracy and
the inductive biases of CNNs, such as translational mean average precision (mAP) (Lin et al., 2020). Here, mAP
equivariance (Dosovitskiy et al., 2021). In addition, the is defined as the mean precision over all classes of the mean
global structure of objects in an image must be learned from per‐class precision, with a given intersection over union.
scratch, whereas this is maintained throughout a CNN. These disparities in performance were contingent on which
However, when pretrained on a large data set and then backbone model architecture (Inception v2, ResNet50, or
fine‐tuned on a more modest data set of tens of thousands ResNet101) (Szegedy et al., 2016) was used. When the more
of images, vision transformers can outcompete CNNs complex ResNet101 was used, Faster R‐CNN and Mask
(Dosovitskiy et al., 2021). R‐CNN performed more similarly, although in this task
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
4 of 21 | COMPUTER VISION FOR PLANT PATHOLOGY

training images, or when classifying images of plant disease


BOX 1 Case Study 1: Semantic segmentation for with similar characteristics, such as black pod rot in cocoa
cocoa disease detection. caused by Phytophthora megakarya or P. palmivora.
Furthermore, while P. megakarya and P. palmivora can be
distinguished by eye, Lasiodiplodia species, of which three
In this case study, we applied Mask R‐CNN to segment are known to infect cocoa, can present with identical
images of diseased cocoa trees. The training data set morphological characteristics. This means that traditional
consisted of 186 images of black pod rot, 121 images of classification techniques are insufficient and molecular
frosty pod rot, and 63 images of witches’ broom disease. identification techniques must be used in their place
The model was trained, starting with the “mask rcnn R 50
(Huda‐Shakirah et al., 2022). The development of CV
FPN 3x” weights, for 1000 epochs.
technologies that can make such difficult distinctions would
The preliminary results from this case study were
somewhat encouraging. However, although the selected have important implications for all areas of agriculture and
positive results in Figure 1 show that this model has the botany for two reasons. First, while P. megakarya and P.
potential to perform well, these results are not representa- palmivora are managed in the same way, different species of
tive of the full testing set. The average precision per class Lasiodiplodia are not (Khanzada et al., 2005). Thus, the
was 4.29, 13.45, and 30 for black pod root, frosty pod root, failure of a model to distinguish between species of
and witches’ broom disease, respectively, i.e., the model Phytophthora is not critical for effective disease manage-
performed acceptably on witches’ broom disease, despite ment, but failure to distinguish between species of
the low number of training images, but poorly on most Lasiodiplodia is. Second, cosmopolitan pathogens such as
cases of black pod root and frosty pod root. Phytophthora spp. and Lasiodiplodia spp. have extremely
Notwithstanding the potential theoretical benefits dis- wide host ranges, infecting many commercially important
cussed above, manual annotation of a full training data set
crops. Lasiodiplodia theoromae alone attacks over 189 plant
with masks is extremely laborious. So, without the promise
species across 60 families (Salvatore et al., 2020), while the
of improved results relative to a simple CNN, this
additional effort may not be worthwhile. Despite this, the growing list of described Phytophthora (aka “plant
favourable preliminary results in this study and one other destroyer”) species is currently 116 entries long (Kroon
(Zhao et al., 2018) mean that, with the incorporation of et al., 2012).
automated annotation tools and/or semi‐supervised learn- Transformer‐based object detection models such as DETR
ing, semantic segmentation shows promise as an avenue of (Carion et al., 2020) are also now available and contend well
research for CV in plant pathology. with Faster R‐CNN when trained on the huge Common
Objects in Context (COCO; Lin et al., 2014) benchmark data
set. The key benefit of DETR is that it predicts bounding box
coordinates directly, negating the need for the region proposal
Faster R‐CNN performed best with the simpler architectures network of Faster R‐CNN. Faster R‐CNN's region proposal
(Lin et al., 2020). However, it should be noted that average network has issues trying to identify overlapping objects
precision is not directly comparable between bounding box because of the non‐max suppression algorithm, which was
detection and semantic segmentation models. This is for two removed from YOLO in version 3 (Horzyk and Ergün, 2020).
reasons: (1) it is easier to achieve a given intersection over However, DETR has problems detecting small objects, and has
union with a bounding box as this task is less precise than a very long convergence time. These defects are said to be
segmentation, and (2) Mask R‐CNN simply adds the ability resolved in Deformable DETR (Zhu et al., 2021), although we
to predict a mask in a box predicted by Faster R‐CNN, so encountered significant difficulty in retraining Deformable
segmentation is additive in this case. As such, the results of DETR due to prevailing bugs in the code and so were unable
Lin et al. (2020) should be considered accordingly. to confirm these benefits.
Object detection and semantic segmentation are typi- In segmenting instances of nuclei in microscopy images,
cally performed using Faster R‐CNN (Ren et al., 2016), Mask R‐CNN was compared with the U‐Net architecture
Mask R‐CNN (He et al., 2017), or YOLO (Redmon (Ronneberger et al., 2015), which was designed for medical
et al., 2016). However, these architectures have also been image segmentation. Here, the two techniques were shown to
combined with other methods, such as SVMs, to confirm or give similar mAP, F1, and recall scores (Vuola et al., 2019),
deny the presence of an object in a proposed region although Mask R‐CNN scored 0.812 for precision, while the
(Voulodimos et al., 2018). For example, SVMs have been U‐Net scored only 0.68. A subsequent ensemble approach
used in conjunction with Mask R‐CNN in automated ML was then described, which shares the outputs of the two
pipelines to identify defects in machined parts (Huang independently trained architectures to exploit the U‐Net's
et al., 2019). Additionally, when facing a classification purportedly superior F1 scores (+0.057), in tandem with
problem with high intraclass variance, low interclass Mask R‐CNN's high mAP, precision, and recall. The
variance, and insufficient training examples, the application ensemble model produced comparable, if slightly higher,
of SVMs to features learned by a CNN from ImageNet can mAP (+0.016), F1 (+0.056), and recall (+0.037) scores
improve results relative to a CNN alone (Cao and compared with Mask R‐CNN, but the precision was 0.087
Nevatia, 2016). This may prove useful in projects with few lower. Although the U‐Net was reported to produce the best
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
COMPUTER VISION FOR PLANT PATHOLOGY | 5 of 21

F I G U R E 1 Application of semantic segmentation with Mask R‐CNN to highlight whole diseased cocoa trees. Example images of trees infected with (A)
black pod rot, (B) frosty pod rot, and (C) witches’ broom disease. The percentage scores for each show the degree of confidence in the model's diagnosis:
black pod root = 72%, frosty pod root = 80%, witches’ broom disease = 93%.

F1 score and the ensemble model produced the best mAP and Modelling with generative deep neural networks (DNNs)
recall, these improvements were slight. Additionally, F1 is can aid in gaining an intuitive understanding of the
calculated directly from precision and recall, so it seems physical laws that led to the creation of the data to be
counterintuitive that the U‐Net approach could have the modelled. An example of this is the use of artistic style
highest F1, yet the lowest precision and recall. The most transfer with generative adversarial networks (Li and
noteworthy result here is the consistently superior precision of Wand, 2016), where specific semantic features in an
Mask R‐CNN in this comparison and in another against image can be isolated and utilised. Another popular deep
YOLO (Bharati and Pramanik, 2020; Horzyk and Ergün, 2020). generative model architecture is the variational auto-
Additionally, in a study comparing the use of U‐Net and Mask encoder (VAE), which we will focus on here for the task
R‐CNN to segment images of pomegranate (Punica granatum of image data set filtering.
L.) trees, Mask R‐CNN outperformed the U‐Net in both When working with autonomously collected data, for
precision and recall by wide margins (Zhao et al., 2018). example from camera traps or web‐scraping bots, the
An alternative approach applied an SVM to perform acquisition of vast quantities of data is often the easy part
pixel‐wise classification to detect black pod rot in cocoa, of creating a good training data set. Camera traps tend to
with a human expert labelling the diseased pixels in produce a considerable amount of uninformative data and
training images (Tan et al., 2018). Like semantic the data from naive web‐scraping bots can be badly
segmentation, this technique achieves the effect of contaminated with misclassified and irrelevant images; for
providing the model with additional information on the example, a search for the keyword “Acer” will return many
location of disease in an image, relative to a simple CNN. more images of laptops than it will Japanese maple trees, and
However, it imposes arbitrary physical boundaries around a search for “black pod rot” will include many images of
disease symptoms such as lesions and cankers, and the frosty pod rot, cherelle wilt, and insect damage. Therefore,
algorithm is unable to define for itself any symptoms that some level of human supervision is vital in curating training
are not or cannot be identified with human vision. By data, and the importance of consulting farmers and
using semantic segmentation with a CNN backbone, like researchers in data collection and labelling cannot be
in Mask R‐CNN or DETR, to segment whole trees, these overstated. However, manual labelling of a full data set can
effects could be avoided, i.e., the model would be able to be extremely costly, and a potential method to offset some of
detect non‐human‐visible symptoms via feature learning this cost is said to be the use of VAEs for outlier detection.
and model the effects of hyphae propagating through the A VAE is composed of two neural networks that are
plant or systemic changes to a plant's phenotype away trained in parallel. The encoder network projects the image
from the site of infection. data to a smaller latent vector space, thus compressing it,
and the decoder network predicts the original image from
this compressed data as best it can.
Variational autoencoders for outlier detection Generative models tend to generalise to the real world
much better than discriminative models, which aim to
In addition to discriminative modelling, ML provides uncover correlative relationships between data and class
several powerful tools for generative modelling. labels (Kingma and Welling, 2019). However, deep
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
6 of 21 | COMPUTER VISION FOR PLANT PATHOLOGY

generative models are typically considered excessive for However, these authors go on to note that, in their use case,
classification problems, as they often have higher bias NVAE transformed images of size (3,32,32) to a latent space
(Banerjee, 2007) and are computationally expensive. of size (16,16,128) without any subsequent downscaling. It
Previous works have successfully used VAEs for text is not surprising then that this architecture is able to
classification (Xu et al., 2017; Xu and Tan, 2020), data reconstruct an image so well after just one training epoch,
clustering (Dilokthanakul et al., 2017; Lim et al., 2020), with no pre‐trained weights, as the dimensionality of the
anomaly detection (An and Cho, 2015), recommender data is expanded rather than compressed. Likewise, NVAE
systems (Li and She, 2017), and dimensionality reduction is not appropriate for identifying outliers by the distribution
(Lin et al., 2020). There are also a limited number of of reconstruction errors as it can reconstruct any image
published papers on the use of VAEs for anomaly detection almost perfectly. For example, when we trained NVAE on a
with colour images (Fan et al., 2020). data set of 54,124 plant images, it was able to reconstruct
Here, we consider two methods for which a VAE might any image in the ImageNet data set with similar binary
be used to detect outlying data in collections of large colour cross‐entropy loss to that of plant images. As an alternative
images. To do so, we will use the example of detecting non‐ to NVAE, we attempted to use a custom convolutional VAE
plant images in a web‐scraped collection of plant images for with a ResNet152 (He et al., 2016) backbone to apply the
use in building a disease classifier. two methods of outlier detection described above. However,
we were unable to get this architecture to function well
Method 1. Distribution of reconstruction loss enough to sufficiently compress the data and reconstruct
Having trained a VAE on only plant images, use this images with high fidelity.
model to compress and decompress all images in the The paucity of papers published on the subject of
contaminated data set and record the reconstruction loss for outlier detection in colour images with VAEs seems to be
each image. Plot the distribution of the loss values and due to the inherent difficulty of this task. The high
record the most extreme high values as outliers. The dimensions of such data and the large storage and GPU
assumption here is that the model should “fail” to memory requirements that training these models on such
reconstruct non‐plant images well, as it should be naive to data necessitates (Sun et al., 2018) has largely been
any images that do not show plants. resolved, although for many projects GPU memory
availability will still preclude this technique. Thus far,
Method 2. Dimension reduction and clustering the inability of the VAE architecture to learn a compres-
Using the encoder network of a VAE that has been sion algorithm for large colour images suggests a hard
trained on the ImageNet data set, compress the images in physical limitation that might not be overcome. More-
the contaminated data set and record the values of the latent over, while Maalø et al. (2019) contest this argument,
space for each image. Reduce the dimensions of the latent Nalisnick et al. (2019) argue comprehensively that
space further with principal component analysis, t‐ generative models are not suitable for outlier detection
distributed stochastic neighbour embedding (t‐SNE), and/ by the reconstruction loss method described above, as
or uniform manifold approximation and projection these models tend to learn low‐level statistics about data
(UMAP). Plot these reduced data. Outliers/contaminant rather than high‐level semantics. As such, they are often
images may then separate from the clean data. unable to differentiate between images that, to the human
Nouveau VAE (NVAE) is the product of an effort to eye, are obviously different. We describe a successful
carefully craft the encoding network architecture of a VAE, alternative method of outlier detection in Box 2.
which appears to produce excellent results (Vahdat and
Kautz, 2021). After training for just one epoch, this
architecture is able to project large colour images onto a Evolutionary algorithms
latent space and reconstruct them almost perfectly.
However, if the aim of using NVAE is to compress image The field of CV is currently dominated by handcrafted
data, this architecture is not appropriate. This is because, DNNs with fixed topologies. However, the seldom‐used
using the recommended settings for the CelebA 64 data set techniques of evolved neural networks have real potential
(Liu et al., 2015), the latent space produced for an image in the field of plant pathology. Computational efficiency
with dimensions (3,224,224) is (100,224,224), i.e., more than at inference and improved ability to generalise are of
33 times larger than the original image. Following the paramount importance to models developed for plant
authors’ provided instructions to constrain the latent space pathology in the field. This is because such models must
to be as small as possible without excessively modifying the be able to cope with complex and highly variable
code, the latent space for this same size of image remains symptoms and backgrounds, and often must run on
the same (100,224,224). This observation is corroborated in low‐powered hardware. Growing neural networks take far
another study where the authors explain how NVAE first longer to train/grow than those with fixed topologies, but
expands the data dimensions to a large number of this is of minor concern given the efficient parallelisation
latent spaces before pruning those spaces based on and the vast computational resources now available
Kullback–Leibler (KL) divergence (Asperti et al., 2021). for training. The hardware available to farmers in
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
COMPUTER VISION FOR PLANT PATHOLOGY | 7 of 21

Evolving neural networks have been shown to be


BOX 2 Case Study 2: Semi‐supervised learning highly effective in producing neural networks with a high
for outlier detection. degree of modularity (Amer and Maul, 2019). This
increased modularity is said to be the result of applying
a cost to the number of connections, which both reduces
As an alternative to using a variational autoencoder for computational cost and promotes evolvability as the
outlier detection, we trained a semi‐supervised binary sharing of modular units between parents is made
Outlier‐NoneOutlier (in this case, “plant” or “non‐plant”)
simpler. It is also said that such modularity helps these
classifier, which achieved near‐perfect results. We used the
models to generalise better as each modular unit is
ResNet18 architecture and initially trained it on a manually
curated data set of 57,228 plant images and an equal‐sized capable of independent generalisation (Schmidt and
random subset of the ImageNet data set, which constituted Bandar, 1998). With evolutionary algorithms, one can
the non‐plant images. We then continued training using the also promote diverse populations of networks with
below algorithm and the contaminated data set of 96,692 techniques such as niching (Shir, 2012), and use of
images. non‐elitism strategies can allow for the simultaneous
exploration of fitness valleys and local optima without
getting stuck there (Dang et al., 2021). While elitism
while nRelabledImages > 0 do follows the biologically implausible assumption that the
train model fittest individual/network will always survive to repro-
duce, non‐elitism allows weaker individuals to explore
for image in ContaminatedImages do
fitness valleys, which may lead them to undiscovered
classify image maxima.
if ClassificationConfidence ≥ 99% then While a direct comparison of evolved neural networks
with popular CNN architectures could not be found,
label image
Table 1 shows an indirect comparison between a recent
add image to training set method for evolving neural networks (EVOCNN) and two
end if
popular CNNs, ResNet18 and VGG16; EVOCNN appears
to perform very well in this comparison. However, the error
end for rate for these models was calculated when trained on the
end while Fashion‐MNIST data set, while the top 1 and top 5 accuracy
was produced using ImageNet. Fashion‐MNIST, which is
composed of 28 × 28‐pixel greyscale images of clothing
(Xiao et al., 2017), is not a challenging proposition for
During this process, 1376 non‐plant images and 44,212 plant modern CNNs and is not reflective of real‐world plant
images from the contaminated data set were correctly labelled pathology problems. Additionally, it should be noted that,
by the model. After the first round of semi‐supervised training
in the EVOCNN paper (Sun et al., 2020), the number of
was completed, images that this model classified with >99%
parameters of VGG16 is misreported as 26 million, rather
confidence were manually reviewed. Incorrectly labelled
images were manually re‐labelled and a second round of than the 138 million listed in the Torchvision documenta-
semi‐supervised training was begun. After the first round of tion (PyTorch, 2023). This suggests that VGG16 would have
semi‐supervised training, classification of images as “plant” massively overfit to the Fashion‐MNIST data, making this
with >99% confidence was >99% accurate but classification of an inappropriate comparison. However, EVOCNN does
images as “non‐plant” with >99% confidence was only about
50% accurate. After the second round of semi‐supervised
training, the model performed with >99% accuracy and F1
score for both classes, thus showing a clear superiority in this T A B L E 1 Test results of three architectures trained on two data sets,
providing an indirect comparison. ResNet18 was trained only on
technique's ability to identify contaminant images over the
ImageNet with the top 1 and top 5 classification accuracies shown.
VAE approaches. This is in addition to its ease of
EVOCNN was trained only on Fashion‐MNIST with the percent error
implementation, reduced training time, and low computa- shown. VGG16 was trained on both data sets. Results were taken from Sun
tional requirements. After training, the model was used to et al. (2020) and the PyTorch documentation (PyTorch, 2023). Boldfaced
classify all 96,692 images in the contaminated data set. text signifies the best result per metric.
Top 1 Top 5
accuracy accuracy No. of
Architecture (%) (%) Error (%) parameters
low‐income sectors, such as cocoa, cassava (Manihot
esculenta Crantz), or coffee (Coffea L. spp.) cultivation, is ResNet18 69.758 89.078 — 11.7 M
restricting. This restriction means that producing a model VGG16 71.59 90.38 13.78 138 Ma
that is optimised for runtime speed at inference is a vital
EVOCNN — — 7.28 6.52 M
factor, and growing neural networks with evolutionary
a
algorithms may be an ideal way to achieve this. Number of parameters for VGG16 was misreported by Sun et al. (2020) as 26 million.
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
8 of 21 | COMPUTER VISION FOR PLANT PATHOLOGY

offer a very low error rate on this simpler problem and has a modest resources for training. For most object‐detection use
very low number of parameters when compared with other cases in plant pathology, Faster R‐CNN will be the optimal
modern architectures (Tables 1 and 2). Overall, it seems that choice. Mask R‐CNN extends Faster R‐CNN by adding the
evolved neural networks are not yet ready to tackle the more ability to predict a mask in a bounding box, enhancing its
difficult problems in plant pathology, and so more work is utility for semantic segmentation tasks. YOLO is most
required in this area. suitable for real‐time object detection but offers lower
precision than Faster R‐CNN. It is not suitable for use in
plant pathology unless inference time is of primary concern.
Architecture comparison and DETR and Deformable DETR present a novel approach to
recommendations object detection and offer competitive results (Zhu et al.,
2021). However, implementing these architectures can be
The field of CV has produced a numerous and diverse set of difficult and they require substantial GPU VRAM for
architectures, each with unique strengths and weaknesses. training.
Here, we will compare these architectures, focusing on their The choice of CV model architecture for a given project
application in image classification, object detection, and depends on a variety of factors, including data set size,
semantic segmentation. Table 2 gives a detailed breakdown signal‐to‐noise ratio, computational resources, mode of
of the pros and cons of each of these architectures, as well deployment, and accuracy requirements. However, at
the number of trainable parameters, which acts as a proxy present, for most use cases in plant pathology, ResNet18,
for model complexity, and the number of giga floating point ConvNeXT Tiny, or Faster R‐CNN will yield the best results
operations (GFLOPS), which gives a sense of computation while minimising computational cost, risk of overfitting,
cost of running inference with these architectures. and the financial cost of training.

Image classification architectures Image, batch, and layer normalisation

ResNet introduced the concept of skip connections, enabling In a comparison of EVAL‐COVID (Gong et al., 2021) with
the training of much deeper models. Despite its age, ResNet other strong competitors like EVOCNN to detect COVID‐
remains a strong competitor, and ResNet18 is probably still the 19 with evolved CNNs, it was shown that the overuse of
best choice for most small projects with fewer training batch normalisation (BN) can be deleterious to the training
examples. EfficientNetV2 (Tan and Le, 2021) is more of DNNs for disease diagnosis. While BN often improves
computationally demanding than equivalent ResNet and the training time of CNNs and can negate the need for small
ConvNeXT variants, and while it tends to yield high accuracy learning rates and dropout (Ioffe and Szegedy, 2015), its
on large data sets (Dosovitskiy et al., 2021; Liu et al., 2022), we negative effect on the diagnosis of disease was also observed
found that it is prone to overfitting, making it a less favourable in Case Study 3 (Box 3).
choice. The key innovation of EfficientNet was to allow the Several state‐of‐the‐art generative models now omit BN
depth, width, and resolution of the model to be scaled by entirely, while others replace it with weight normalisation or
adjusting a single coefficient (Tan and Le, 2020). However, in focus on fine‐tuning the momentum hyperparameter of BN
practice this requires editing the source code, thus rendering layers (Vahdat and Kautz, 2021). As with simply removing
such adjustments less than convenient. ConvNeXT is an the BN layers of a ResNet, reported above, replacing BN in
updated version of ResNet, incorporating several modern ResNet with the alternative layer normalisation (LN) also
features. Unlike EfficientNet, ConvNeXT is easy to scale, results in worse performance (Wu and He, 2018). However,
making it a promising choice for medium‐ to large‐scale when the developers of ConvNeXt used LN as opposed to
applications, for which it has been shown to give superior BN in their architecture, they observed that the model had
performance to ResNet and VIT (Liu et al., 2022). As the first no difficulty in training with this substitution (Liu
transformer to perform favourably against CNNs for image et al., 2022). The BN momentum hyperparameter is a fixed
classification, VIT represents a significant milestone. However, weight applied to the running mean and variance calcula-
image classification may not be the optimal use case for tions that are tracked during training and used during the
transformer architectures, and at present ConvNeXT outper- application of BN at inference time. Thus, adjusting the BN
forms VIT while requiring less data for training and being less momentum will not affect the effect training (Vahdat and
computationally expensive (Dosovitskiy et al., 2021). Kautz, 2021). However, BN can cause the output of a layer
to be slightly shifted during evaluation. A proposed solution
to this is to optimise the momentum hyperparameter for
Object detection and semantic segmentation each data set (Vahdat and Kautz, 2021).
architectures In this section, we have listed a host of reasons why the
unnecessary normalisation of data is to be avoided. While
Although more complex than YOLO, and arguably DETR, BN will shorten the training time for a CNN, it changes
Faster R‐CNN delivers excellent results and requires only the input data in unpredictable ways, thus worsening
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
COMPUTER VISION FOR PLANT PATHOLOGY | 9 of 21

T A B L E 2 Pros and cons of popular model architectures for image classification, object detection, and semantic segmentation. Ranges of values
represent the smallest and largest off‐the‐shelf versions available.
Architecture No. of parameters GFLOPS Pros and cons
a
Image classification ResNet (2015) 12–60 M 1.8–11.5 Pros
ResNet18 is the smallest and most computationally efficient
model here
ResNet18 is ideal for modestly sized data sets
ResNet152 performs comparably with transformers such
as VIT
Widely used and tested

Cons
Uses batch normalisation, which can introduce instability and
inconsistent results

EfficientNet‐ 22–119 M 8.4–56.1 Pros


V2a (2019) Allows the depth, width, and resolution of the model to be
scaled with a single coefficient

Cons
Scaling requires editing the source code
Evaluation using Grad‐CAM (Selvaraju et al., 2017) showed
much overfitting, despite high test scores

ConvNeXTa (2022) 29–198 M 4.5–34.3 Pros


Reported to outperform any architecture here and requires
much less data than VIT
Scaled easily by editing the convolutional block settings
Incorporates several modern features such as GELU, stochastic
depth, and layer normalisation

Cons
The smallest off‐the‐shelf configurations are too large for many
projects and may overfit
Potential compatibility issues with conversion to ONNX format

ViTa (2021) 87–634 M 17.6–1016 Pros


If trained on millions of images, VIT may slightly outperform
ResNet152

Cons
Requires huge data sets to outperform CNNs
Computationally expensive to train and run at inference

Object detection and Faster R‐CNNa (2015) 44 M 280.4 Pros


semantic segmentation Generally gives higher mean average precision than YOLO
Performs better than YOLO on small objects

Mask R‐CNNa (2017) 46 M 333.6 Cons


More computationally expensive than YOLO
Does poorly when objects overlap

YOLOb (2016) 7M 1.01 Pros


Extremely fast inference time
Fast to train
Very easy to implement

Cons
Performs poorly on small objects
Gives the least accurate results of the three architectures
listed here

(Continues)
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
10 of 21 | COMPUTER VISION FOR PLANT PATHOLOGY

TABLE 2 (Continued)

Architecture No. of parameters GFLOPS Pros and cons


b
DETR (2021) 40 M 11.2 Pros
Negates the need for region proposal and non‐max suppression
Performs better than Faster R‐CNN and YOLO for overlapping
objects
As opposed to classification, transformers such as DETR show
promise in object detection
Faster at inference than Faster R‐CNN

Cons
Very computationally expensive to train
Slow to converge in training
Requires huge amount of training data
Can be challenging to implement
Requires a large batch size to achieve stable training

Note: CNN = convolutional neural network; DETR = Detection Transformer; Faster R‐CNN = Faster Region‐Based Convolutional Neural Network; GELU = Gaussian error linear
unit; GFLOPS = giga floating point operations; Mask R‐CNN = Mask Region‐Based Convolutional Neural Network; ResNet = Residual Neural Network; ViT = Visual
Transformer; YOLO = You Only Look Once.
a
Values for the number of trainable parameters (displayed in millions [M]) and GFLOPS were obtained from the PyTorch documentation (PyTorch, 2023).
b
Values for the number of trainable parameters (M) and GFLOPS for YOLO and DETR were calculated using an image 224 × 224 pixels for this comparison.

classification results. However, at present, the best off‐the‐ accuracy. This effect is likely due to the plain white
shelf CNN that is small enough to run on an older‐model background of the lab images causing the model to
smartphone is ResNet18. So, until a more suitable architec- generalise poorly to real‐world applications. This ex-
ture becomes available, BN is unavoidable. We also show emplifies the importance of curating a realistic, high‐
here and in Case Study 4 (Box 4) that optimisation of the quality training data set. By naively training and releasing
BN momentum hyperparameter in ResNet18 leads to a models that are trained on publicly available data sets, we
slight improvement in the results of our cocoa disease risk exacerbating the problems of disease mis-
detection model, that image normalisation should not be classification. At low frequencies, the effect of misla-
included in the training pipeline of a model that aims to belled, misleading, or uninformative data will have
make predictions from subtle colour features, and that limited effect on the performance of a neural network.
excessive image input size is deleterious to classification This feature of neural networks is largely an artifact of
accuracy. batch gradient descent and the learning rate (Motamedi
et al., 2021), which act to greatly buffer the effect of
infrequent misclassifications in the training data. At
DATA ACQUISITION A ND MODEL higher frequencies, these sources of error can have more
TESTING serious consequences. The most obvious solution to this
problem is to carefully curate, label, and annotate the
In this section, we review various interdisciplinary methods training data. However, errors resulting from mis-
available for gathering a training data set and developing a classification can be challenging to eradicate. For
suitable model. While the previous section concerned the example, frosty pod rot, black pod rot, and witches’
theory of ML in CV, this section will focus on practicalities broom disease in cocoa can all present with black or
with respect to low‐cost solutions. brown lesions on the pod, and both frosty pod root and
black pod root can both coat a pod in white mycelium.
This means that without sufficient training in plant
Obtaining the required training data set pathology or access to diagnostic tests, one could easily
mislabel these diseases. This problem can be solved by
Training an image classifier to a high accuracy in a two means, which should be used in tandem: (1) by
controlled laboratory environment is often a trivial task. paying careful attention to detail and applying detailed
However, such a model may perform poorly when knowledge of the pathogen in question, and (2) using
presented with the challenges of the real world (Singh tools and techniques from molecular biology and
et al., 2020). For example, after training a leaf‐disease spectroscopy to better inform model development and
classifier on images taken in the field, the model subsequent disease detection. Such techniques/tools
performed with around 68% accuracy when tested include DNA sequencing, real‐time quantitative PCR
against images taken in the lab (Ferentinos, 2018). (qPCR), loop‐mediated isothermal amplification
However, when trained in the lab and tested in the field, (LAMP), MultispeQ (PhotosynQ, East Lansing, Michi-
the same model architecture performed with about 33% gan, USA), and hyperspectral imaging (HSI).
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
COMPUTER VISION FOR PLANT PATHOLOGY | 11 of 21

BOX 3 Case Study 3: Disease detection and BOX 4 Case Study 4: Optimisation of BN
normalisation. momentum and image size for cocoa disease
detection.

Here, we conducted an ablation study with ResNet18 and


ConvNeXt Tiny (Table 3) to assess the effects of image While training a cocoa disease detection model, we ran a
normalisation (IN), batch normalisation (BN), and layer hyperparameter optimisation sweep using the Weights and
normalisation (LN) in disease detection. Using BN in Biases platform (WANDB; San Francisco, California, USA),
ResNet18 increased the training speed by 2.39 times, while which included the BN momentum hyperparameter and
IN slowed training by 1.74 times. IN did not affect training image input size (Figure 3). The model architecture used was
time in ConvNeXt Tiny. We also found that BN improved ResNet18 (He et al., 2016), and the data set included the
stability in training, as assessed by plots of training and following four classes: black pod rot, frosty pod rot, healthy
validation loss. However, IN decreased the F1 score by 0.76% cocoa, and witches’ broom disease with a 90:10 training:va-
and 0.34% in ConvNeXt and ResNet18, respectively, and lidation set split and a training set size of 271, 266, 436, and
increased overfitting. The removal of BN in ResNet18 92 images, respectively. One hundred models were trained
decreased the F1 by 1.92%, but ConvNeXt (in which BN is with these hyperparameters randomly sampled from pre-
replaced with LN) had an F1 score 2.84% higher than defined ranges (image size: 124:1224 pixels, BN mom.: 0, 10e‐
ResNet18 with BN. Therefore, simply deactivating the BN 5:0.9). We also used WANDB to run a random forest
layers in ResNet18 led to worse results in every metric. regression with the validation F1 as the dependent variable
However, the use of LN instead of BN in ConvNeXt appears and the two hyperparameters as independent variables. From
to have had no deleterious effect. this, an importance score was calculated for each hyperpara-
The removal of the IN transformation, which occurs meter on a scale of 0–1. The highest‐performing model
prior to data input, improved the performance of both scored validation F1 = 0.75 and AUC = 0.87. Additionally,
model architectures for the purpose of disease detection in the per‐class F1 score for healthy cocoa was 0.88, showing a
all metrics, including training time and overfitting. These strong ability to detect non‐specific disease. While the
results are unsurprising if we consider the effect of image importance of image size (0.694) is not surprising, the BN
normalisation shown in Figure 2. Here, IN distorts the momentum score (0.306) is quite low. This casts doubt on
colour of the cocoa pods and obscures much of large lesions the assertion above that optimisation of BN momentum can
that are clearly visible in the original images. This effect may have much impact in lessening the deleterious effects of BN.
not prevent a CNN from identifying these objects as cocoa However, this result and that of the optimised BN
pods or trees by their shape, but it does obscure many subtle momentum value (0.001) (Figure 3A) suggests that this
disease symptoms that are necessary for the detection of hyperparameter should be optimised, rather than relying on
early disease development. The above ablation study was the default value of 0.1. Training the same model with a BN
conducted with a data set of late‐stage disease, from which momentum set at 0.1 yielded an F1 score of 0.737, i.e., a 1.3%
the Figure 2 images were sampled. So, if early disease decrease relative to the optimised value.
detection were required, the differences between these This study also provides an optimised image input size for
methods may have been more pronounced. Additionally, mid‐ to late‐stage disease detection, using ResNet18, of 277 ×
BN has been observed to introduce unacceptable levels of 227 pixels (Figure 3B), although this should be optimised for
error when the batch size is small (Wu and He, 2018). This each use case. Image compression was previously said to have
is an important issue to consider for generative models, CV minor effects on disease detection (Barbedo et al., 2016), while
models with video or high‐resolution images, or when elsewhere it is suggested that image compression should be
computational resources are limited. avoided completely for small symptoms (Barbedo, 2016) or
kept above an arbitrary 1 megapixel (1000 × 1000 pixels)
(Steddom et al., 2005). However, with the present data set,
which contains images of diseases at varying degrees of
progression, using an image size greater than 277 × 277 was
Tools from molecular biology
deleterious to the validation F1 score. This is in addition to the
reduced image size providing faster runtime in training and
DNA sequencing is now commonly used for the identifica- inference, and a reduction in overfitting.
tion of cryptic species (Bickford et al., 2007; Ovaskainen
et al., 2010) and plant pathogens (O'Donnell et al., 2015)
and is an invaluable tool. Once sequenced, reads can be used
to search previously categorised sequences with the Basic
Local Alignment Search Tool (BLAST) from the National RNA genes, which are both highly conserved across taxa
Center for Biotechnology Information (Boratyn et al., 2013) and highly variable between species. Such regions of the
to identify a sample at the level of species or another genome can be amplified using PCR or LAMP to detect a
taxonomic group. However, if we know which pathogen(s) pathogen or identify it with relatively low cost and high
we expect to detect, sequencing the whole genome is accuracy. The ITS region is often sequenced on its own for
excessive. Rather, we can use loci such as the internal near–species level identification or in concert with other loci
transcribed spacer (ITS) region of the nuclear ribosomal for better specificity (Horton and Bruns, 2001). Such work
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
12 of 21 | COMPUTER VISION FOR PLANT PATHOLOGY

TABLE 3 Results of an ablation study assessing the effects of image normalisations and batch normalisation on a model's ability to detect plant disease.
Image norm. Batch norm. Layer norm. Train time (min) Loss (%) Accuracy (%) Recall (%) Precision (%) F1 (%)
ConvNeXt Tiny

No — Yes 1344 0.290 88.25 88.25 88.82 88.14

Yes — Yes 1368 0.322 84.51 87.51 88.14 87.38

ResNet18

No Yes — 739 0.361 85.41 85.41 86.17 85.30

Yes Yes — 1088 0.380 85.14 85.18 85.68 84.96

No No — 1764 0.412 83.49 83.49 84.05 83.38

F I G U R E 2 Image normalisation in the analysis of cocoa disease using computer vision and machine learning. (A) Original and (B) normalised images
of cocoa pods showing various stages of frosty pod rot, black pod rot, and witches’ broom disease development. Note the effect of normalisation on one's
ability to see disease symptoms. The normalisation of pixel values was carried out with the following means and variance values: mean: (0.485, 0.456, 0.406),
variance: (0.229, 0.224, 0.225).

with ITS is now ubiquitous in the molecular study of fungal detection. If detection is the only goal, colour‐ or turbidity‐
ecology and phylogeny, while previous techniques relied on based methods can be used to detect DNA presence by
the morphology of fruiting bodies for identification (Horton visual inspection. A drawback of this method is that any
and Bruns, 2001). pre‐existing PCR primers cannot be used. This is because,
qPCR is used to detect asymptomatic disease across the while PCR primers are designed to amplify a specific region
agricultural industry (Luchi et al., 2020). Traditionally, PCR of complementary DNA, LAMP primers bind to multiple
was unsuitable for portable operations or use in the field regions of the target DNA in a way that allows for the
(Ray et al., 2017). However, rapid real‐time PCR in the field simultaneous amplification of multiple regions of the DNA.
is now possible (Schaad and Frederick, 2002). Real‐time While universal PCR primers for the ITS region exist, it
PCR can also be used to quantify the relative levels of a may be necessary to design LAMP primers or species‐
pathogen in plants (Horevaj et al., 2011). Information from specific PCR primers for ITS or other regions. For a detailed
such analyses could be extremely informative when fine‐ discussion of the use of ITS amplification in fungal ecology
tuning and assessing the performance of the models and the potential pitfalls of specific ITS primer design, see
discussed here. Horton and Bruns (2001).
LAMP can be used in place of qPCR and has four key If novel primers are to be designed, the region of
benefits: (1) it is considerably cheaper (£211or $256 USD for interest must first be sequenced, and if we aim to identify
100 samples) because a thermal cycler is not required; (2) it a currently unknown pathogen with BLAST, all of the
is fast; (3) the reagents do not need to be refrigerated; and DNA in a sample must be sequenced. Sequencing with the
(4) like real‐time PCR, there is potential for it to be used in Oxford Nanopore Technology (Oxford, United Kingdom)
the field. Like qPCR, LAMP can be used to quantify the MinION platform can be an ideal tool for this purpose,
relative amount of DNA present, as well as simply for offering multiple features: (1) The Oxford Nanopore
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
COMPUTER VISION FOR PLANT PATHOLOGY | 13 of 21

Spectroscopy and HSI


Although not capable of specific disease diagnosis, the Multi-
speQ is an important low‐cost tool to consider in the context of
disease detection in the absence of visible symptoms. This
handheld plant phenotyping device can be used to indicate the
non‐specific presence of plant disease at an extremely low cost
(Kuhlgert et al., 2016). The MultispeQ operates similarly to
photospectroscopy and measures environmental conditions
such as light intensity, temperature, and humidity. It can also
be used to measure photosystem II quantum yield, which is an
indicator of plant health, and to detect non‐photochemical
exciton quenching, which has been shown to have a significant
negative correlation with disease index (Kuhlgert et al., 2016).
A highly informative technique that we can utilise in the
prediction of plant disease with CV is to sample more
continuously from the electromagnetic spectrum with HSI.
As with the MultispeQ, HSI enables the detection of
changes in the chemical composition of biological tissue in
terms of conditions such as ripeness or disease status change
(Bock et al., 2010). The term “spectral signature” is used to
describe the pattern of electromagnetic radiation reflected
by a subject. However, particularly in the case of biology,
the term “signature” is misleading as biological samples
often have highly heterogeneous reflectance spectra (Bock
et al., 2010). All of the above‐mentioned CV studies applied
ML techniques to RGB images. RGB images capture three
discrete bands of the visible spectrum from 400–700 nm.
Black‐and‐white digital images have two spatial dimensions
and a single dimension that describes the darkness of each
pixel on a scale of 0–255, whereas RGB images have three
F I G U R E 3 Results of a hyperparameter optimisation sweep training colour dimensions represented by values between 0–255,
100 ResNet18 models for disease detection in cocoa trees. This was each describing the intensity of red, green, or blue light.
performed with variable batch normalisation (Batchnorm) momentum (A) Hyperspectral images, however, store a more complete
and a varying square image input size (B). The optimisation sweep
randomly sampled from distributions of the two variables concurrently.
reflectance spectrum for each pixel, while also maintaining
Beginning with the ImageNet1KV2 weights, the models were trained on a spatial relationships. The spectral range of these images can
data set of 1065 images of the following four classes: black pod rot (271 be as wide as 300–2500 nm (Goetz et al., 1985).
images), frosty pod rot (266), witches’ broom disease (92), and healthy Although the applications of hyperspectral photography
cocoa (436). The optimised validation F1 score was 0.75. have long been explored by the National Aeronautics and
Space Administration, this technology is only now becoming
affordable for use in industries such as agriculture. Commer-
Technology field sequencing and library preparation kit cially available cameras capable of capturing data from the
allows for sequencing in the field, immediately after tissue 300–2500 nm range remain expensive, and more typically
samples are gathered, which eliminates the need for cold‐ the cameras used only sample 330–1100 nm (Table 4). Despite
chain storage to avoid sample degradation. (2) It allows the reduced spectral range of the cheaper cameras, they still
for high‐quality sequencing in countries where Illumina provide orders of magnitude more data than RGB cameras,
(San Diego, California, USA) sequencing is not available. although much of these data are highly correlated.
(3) It is slightly cheaper than using the Illumina platform. The uptake of HSI has recently exploded in a host of
(4) The long read length eliminates amplification bias fields including archaeology, art conservation, food safety,
(Goodwin et al., 2015). The avoidance of amplification medicine, and crime scene investigation (Lu and Fei, 2014).
bias is important for gene expression quantification, Typical applications of HSI in agriculture include the
which is relevant to the discussion in the section estimation of yield (Gutiérrez et al., 2019; Li et al., 2020),
“Inspecting informative features.” On the other hand, assessment of vigour (Feng et al., 2018), remote weed
the MinION 1B requires a high‐spec computer and, at a identification (Okamoto et al., 2007), nutrient status (Nguyen
cost of £98 or $119 USD per sample excluding library et al., 2020), and disease monitoring (Pan et al., 2019).
preparation, the use of this platform also remains too The analysis of HSI data presents problems that are familiar
expensive for many projects. to ML engineers and nowadays are solved routinely. These
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
14 of 21 | COMPUTER VISION FOR PLANT PATHOLOGY

T A B L E 4 Specifications and use cases for the hyperspectral cameras used in the following studies: Okamoto et al. (2007), Feng et al. (2018), Gutiérrez
et al. (2019), Pan et al. (2019), Li et al. (2020), and Nguyen et al. (2020).
Make/modela Task Spectral range (nm) Spectral bands Spectral resolution (nm)
Resonon Pika II Vis‐NIR Mango tree yield estimation 390–890 244 2

Headwall Nano‐Hyperspec w/pushbroom Potato yield estimation 400–1000 272 6

Specim ImSpector N17E Maize kernel vigour 874–1734 NA 5


assessment

Specim ImSpector V10 Weed identification 400–1000 240 10

BaySpec OCI‐UAV‐1000 w/pushbroom Nutrient assessment in rice 460–983 116 5

Specim ImSpector V10E Disease monitoring in pears 328–1115 1002 2.8


a
Manufacturer locations are: Resonon (Bozeman, Montana, USA), Headwall Photonics (Bolton, Massachusetts, USA), Specim (Oulu, Finland), BaySpec (San Jose,
California, USA).

problems include the large size of HSI hypercubes, high augmentations or inductive biases may be to solve the
dimensionality, high intra‐class variability, and high correlation former list of issues (Schölkopf et al., 2021), the latter issues
between spectral bands. Many approaches have been taken to will only be solved by truly understanding how our models
analyse these data and, for a long time, SVMs were the most are making predictions.
widely used (Yue et al., 2015). Today, DNNs are commonly Although DNNs are still considered black box
used to analyse these data as they are particularly well suited to optimisers, much work has been done to understand their
the task of classification with HSI data. DNNs are able to isolate various facets and potential foibles. For example, each
hidden and complex data structures, can utilise a great variety of dense layer of a CNN has been shown to have distinct role
data types, are flexible in their architectures and the complexity in feature‐level extraction and generalisability (Yosinski
of the mathematical functions they can apply, and are ideally et al., 2014), and the output of convolution layers have
suited to distributed computing (Paoletti et al., 2019). As such, been visualised to show which physical features in an
with the addition of dimension‐reduction techniques such as image were more exaggerated (Zeiler and Fergus, 2014).
principal component analyses (Yue et al., 2015), the analysis of In a similar study, a host of predefined layer‐wise and
HSI data with DNNs, although more computationally demand- neuron‐wise visualisation techniques were applied to a
ing, becomes little more complex than such analyses of RGB CNN that had been trained on images of plant disease
image data. (Toda and Okura, 2019). This work showed that the CNN
While the field of CV is advancing at a rapid pace, so too in question was indeed using visible symptoms of disease
are the fields of molecular biology and spectroscopy. The that were similar to those used by human experts. Others
use of tools and knowledge from these fields will allow have sought to learn how best to actively deceive or
projects of various budgets to go beyond the simple manipulate a DNN into misclassification. Working within
application of CNNs to RGB images and, in doing so, the remit of cybersecurity, it was shown that image
model disease in greater detail with tangible biological classifiers based on SVMs and DNNs could easily be
explications of model behaviour. deceived with a simple evasion algorithm (Biggio
et al., 2013). This shows how brittle these classifiers can
be and highlights the importance of adopting techniques
Model testing that rely more heavily on causal inference, such as semi‐
supervised learning (Peters et al., 2017) or semantic
The black box of DNNs segmentation. It also highlights the importance of
rigorous and conciliatory interrogation of models prior
It is well known how poorly current CV models deal with to deployment. At present, our methods of model
unexpected edge cases and shifts in test data distribution evaluation are widely considered insufficient, and much
(Schölkopf et al., 2021). However, in applying CV to plant more work is needed in this area.
pathology and agriculture, we encounter more cases than
most ML practitioners where the test data does not align
well with the training data. These problems arise routinely Inspecting informative features
in CV from the effects of camera blur, image quality, or
shifting camera angle. However, in plant pathology we must A key benefit of the use of CNNs is feature learning. This
also contend with the perturbations of weather, climate, is the process by which a model will define for itself
plant growth stage, crop variety, a plant's developmental which features of a data set it considers informative
response to growing conditions, and so on. While it remains (Voulodimos et al., 2018). In other CV algorithms, an
contentious how robust of a fix techniques such as data engineer must handcraft descriptive features of a subject
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
COMPUTER VISION FOR PLANT PATHOLOGY | 15 of 21

manually, using their expertise and/or diagnostic tools to using a reference image that does not contain the feature of
guide them. In this latter case, pre‐processed data are interest. Applying these methods to misclassified images can
used rather than raw data, as in a CNN. In the highlight why a model is performing suboptimally (Toda
convolution layers of a CNN however, kernels and and Okura, 2019), as results produced with these methods
attention weights are applied to raw or augmented image have been shown to be highly correlated with assessments of
data that emphasise informative physical features, and plant disease made by human experts (Ghosal et al., 2018).
apply inductive biases and self‐attention before these
data are passed to the dense layer(s) of the network
(O'Mahony et al., 2020). We might assume that these A RO A D M A P T O C O M M E R C I A L
physical features would include those that humans IMPLEMENTATION
consider to be the obvious visible markers for plant
disease, such as the presence of lesions on a leaf. Once you have developed, trained, and evaluated your
However, it is likely that these networks will also identify model, it is time to begin the process of implementation.
markers that humans do not notice or cannot perceive, However, it is best to have considered and planned this step
and may ignore some features that plant pathologists well ahead of time. There are several decisions made during
have long considered important. This provides us with development that may depend on the intended mode of
the opportunity to learn more about how to identify implementation. For example, if the model is to be run on
disease early with human vision, CV, and molecular an edge device or smartphone, computation cost must be
biology. Time‐series qPCR, transcriptome, or metabo- kept to a minimum. Likewise, if the model is to be made
lome data can be used to identify the biological markers available via a rented server, reducing computational cost
used by CNNs at the earliest moments of detection. This will reduce financial cost. Prior to training, choosing to use
would allow for the validation of the image features used architectures such as ResNet18 and MobileNetV3 (Howard
by the model. Such a biological explanation of the et al., 2019) will help to keep computational cost down and,
model's informative features would tell us whether the after training, methods such as pruning and quantisation
model were making correct inferences for what we may reduce this cost further. While Google Colab (Google,
consider to be correct reasons, or whether it were correct Mountain View, California, USA; https://colab.research.
for spurious reasons, which would suggest a poor ability google.com/) offers free limited access to GPUs for model
to generalise stemming from naive inductive reasoning. training, the rental cost of a 16‐GB Nvidia V100 GPU
Such work may also highlight new ways to identify (Nvidia, Santa Clara, California, USA), which would be the
disease with or without ML or new ways of combating minimum needed to train a transformer model or large
disease spread through phytosanitation, agrochemistry, CNN, is $2.48 USD per hour. As such, developing and
or plant breeding. training such large models for days, or even weeks, can soon
In recent years, the combination of CNNs and become expensive.
transcriptomics in medical research has seen a surge in ONNX Runtime (Microsoft Corporation, Redmond,
popularity. Such studies involve spatial transcriptomics Washington, USA; https://onnxruntime.ai/) offers a huge
(Chelebian et al., 2021; Xu and McCord, 2021), the array of tools to help accelerate, quantise, and deploy
identification of non‐small‐cell lung cancer subtypes (Yu trained DNNs. Such models can be incorporated into
et al., 2020), and the elucidation of the various functions of Android or iOS apps using the phone's built‐in camera,
drugs (Meyer et al., 2019). In plant science, CNNs have been and they can be deployed via the web, on edge devices
applied alongside transcriptomics in the investigation of such as a Raspberry Pi, or in embedded systems for drone
gene regulation in Arabidopsis thaliana (L.) Heynh. mapping or smart irrigation. However, the operator
(MacLean, 2019). However, the investigation of the black schemas supported by ONNX Runtime must be con-
box nature of CNNs by means of omics appears to be sidered here. For example, ConvNeXT, which uses
completely absent from the literature. Gaussian error linear units (GELUs) and stochastic
Attention maps produced by software like Grad‐CAM depth, may cause problems as these operators are not
(Selvaraju et al., 2017; Wang et al., 2019) are another way to yet supported. TensorFlow (Abadi et al., 2015) also offers
inspect informative features of image data. Grad‐CAM a pipeline for model deployment, and the PyTorch toolkit
produces an explanation for the decision that a model for techniques such as quantisation aware training and
makes about a given image by visually highlighting the model compression is maturing but presented difficulties
informative features of that image. Grad‐CAM is described when we attempted to use it. By contrast, the ONNX
as “gradient‐based” as it uses the gradient data that is fed Runtime pipeline is extremely easy to use and supports
into the last convolution layer of a CNN. This allows us to all popular model formats, including PyTorch, Tensor-
make assessments before the spatial relationships in the data Flow, and SciKit Learn (Pedregosa et al., 2011). While the
are lost in the fully connected layers (Selvaraju et al., 2017). latest methods of pruning are reported to achieve a 30%
Alternative “reference‐based” systems, such as DeepLIFT, reduction in the size of ResNet18 with only a 2% loss in
rely on back‐propagation (Shrikumar et al., 2017) or accuracy on ImageNet (Solodskikh et al., 2023), this
forward propagation (explanation map) (Ghosal et al., 2018), remains an active area of research, producing inconstant
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
16 of 21 | COMPUTER VISION FOR PLANT PATHOLOGY

results. There is no guarantee that pruning will lessen detection model resulting in an outbreak of disease could
computational cost. Techniques such as training‐aware have very serious consequences. It is therefore vital that we
pruning show promise but require further research. rigorously test the models we develop to ensure that they are
For the implementation of object detection or segmen- not prone to misclassification born of overfitting and naive
tation models, we recommend the Detectron2 library from generalisations. While metrics such as accuracy, F1, area
Facebook (Wu et al., 2019). This library incorporates Faster under the receiver operating curve (AUC), recall, and
R‐CNN, Mask R‐CNN, and some new transformer models precision are valuable, DNNs are often capable of learning
such as ViTDet, and offers a host of tutorials on the whole to optimise these summary statistics indirectly, rather than
process from training to implementation. learning to produce reliable predictions. Tools such as
confusion matrices, cross validation, and explanation maps
go much further in understanding the behaviour of CV
CO NC LU SIONS models. However, it is important that we invest in the
development of new and tailored means of understanding
We describe here all of the tools necessary to develop highly these models, such as the application of omics, as discussed
optimised and robust ML models that use minimal in the section “Inspecting informative features.”
computational power and provide real benefit to sectors If we apply our wealth of knowledge and proven
that have more modest budgets. The application of these techniques from botany and agronomy to the acquisition of
tools will allow us to break from the common trend in the training data, the development of data‐processing pipelines,
ML industry, where expensive hardware is employed to and the interrogation of trained models, we can produce
develop complex and computationally expensive models to applications with game‐changing potential. We are now
the detriment of improving training data quality. only 27 years away from a predicted global population of 9.7
With the application of off‐the‐shelf architectures to billion people (United Nations, 2022). Thus, with the
stock data sets, such as the PlantVillage data set devastating effects of the climate crisis already very much
(Geetharamani and Pandian, 2019), we can easily achieve apparent, it is vital that we act now to build robust
prediction accuracy scores in the high 90% range (Thapa international infrastructure targeted at securing food
et al., 2020). However, such models have little value because supplies and eliminating extreme poverty. The techniques
they will not generalise to complex real‐world environments discussed here may enable us, as a community of growers,
due to the simplicity of the training data and a high botanists, and ML developers, to help reduce poverty,
likelihood of overfitting. improve the relationship between growers and the natural
We offer the following recommendations for the environment, and increase stability in the agriculture
development of efficient, inexpensive, and robust CV industry from the foundation up.
models for plant pathology.
Garbage in, garbage out: The thoughtless application of AUTHOR CONTRIBUTIONS
advanced models to poorly labelled, simplistic, contami- J.R.S. conceived of this review, read and summarized the
nated, or inappropriately transformed data will yield models literature, and wrote the first draft of the manuscript. K.J.D.
that have little value in the field, with slow inference times, and D.W.F. continually reviewed and edited the manuscript.
poor accuracy, and an inability to generalise. To avoid this All authors approved the final manuscript.
fate, we should: (A) where possible, consult with specialists
and utilise the invaluable tools from biology, chemistry, and ACKNOWLEDGMENTS
spectroscopy to label data; (B) use the minimum appropri- This work was made possible by funding from the Doctoral
ate image input size to improve runtime speed and help Centre for Safe, Ethical and Secure Computing at the
avoid overfitting; and (C) avoid needless data transforma- University of York.
tions such as normalisation, which can alter data in
unreliable ways. DA TA AVAILABILITY STATEME NT
The potential in training procedures: Techniques such The image data, annotations, and link to the accompanying
as semantic segmentation and semi‐supervised learning GitHub repository for Case Study 1 can be found at: https://
have potential to lessen both bias and variance in a model's osf.io/79kx3/?view_only=4a2c1dccee1a4baeb85de5002c702f10.
predictions by promoting deductive reasoning over induc- For Case Study 2, the data used to train the initial supervised
tive reasoning. Additionally, appropriately scaled CNNs and model, the .csv search terms file for the below web scraper, and
evolved neural networks offer the potential to produce the final semi‐supervised model weights can be found at:
models with optimised runtime speed and improved https://osf.io/h5gj7/?view_only=dbf9f245e21a41e185f5b73e718
generalisation ability. b4cad. The “contaminated” data used to train the semi‐
Robust and conciliatory interrogation of models: supervised model were generated using the code at: https://
While simpler modelling methods, such as SVMs, still have github.com/jrsykes/Google-Image-Scraper. The custom code
a role to play in modern CV, most of the models we employ used to train both the initial model and the final semi‐
for this purpose are exceedingly complicated and are prone supervised model can be found at: https://github.com/jrsykes/
to failing in equally complicated ways. Failure of a disease CocoaReader/blob/main/PlantNotPlant. The custom code and
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
COMPUTER VISION FOR PLANT PATHOLOGY | 17 of 21

accompanying Readme.md used to conduct Case Study 3 can Boratyn, G. M., C. Camacho, P. S. Cooper, G. Coulouris, A. Fong, N. Ma,
be found in the following GitHub repository: https://github. T. L. Madden, et al. 2013. BLAST: A more efficient report with
com/jrsykes/CocoaReader. The data for this study were usability improvements. Nucleic Acids Research 41: W29–W33.
https://doi.org/10.1093/nar/gkt282
scraped from the internet using the code in the following Cao, S., and R. Nevatia. 2016. Exploring deep learning based solutions in
GitHub repository: https://github.com/jrsykes/Google-Image- fine grained activity recognition in the wild. In Proceedings of the
Scraper. The location of the accompanying “.csv search terms 2016 23rd International Conference on Pattern Recognition (ICPR),
file” is described below. The custom code to run the sweep in 384–389. Cancun, Mexico. https://doi.org/10.1109/ICPR.2016.
7899664
Case Study 4 can be found in the following GitHub repository:
Carion, N., F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and
https://github.com/jrsykes/CocoaReader/tree/main/CocoaNet. S. Zagoruyko. 2020. End‐to‐end object detection with transformers.
The main script is titled CocoNetsweep_min.sh and the wandb In A. Vedaldi, H. Bischof, T. Brox, J.‐M. Frahm [eds.], Computer
config file is titled CocoaNetSweepConfig_min.yml. The data Vision – ECCV 2020, Lecture Notes in Computer Science, 213–229.
used to generate these results and the full wandb report can be Springer International Publishing, Cham, Switzerland. https://doi.
found at: https://osf.io/2fw6g/?view_only=adc66ba66f83465a org/10.1007/978-3-030-58452-8_13
Chelebian, E., C. Avenel, K. Kartasalo, M. Marklund, A. Tanoglidi,
9e7b111515a60bf2. T. Mirtti, R. Colling, et al. 2021. Morphological features extracted by
AI associated with spatial transcriptomics in prostate cancer. Cancers
ORCID 13: 4837. https://doi.org/10.3390/cancers13194837
Jamie R. Sykes http://orcid.org/0000-0002-0715-8746 Chiu, M. T., X. Xu, Y. Wei, Z. Huang, A. G. Schwing, R. Brunner,
Katherine J. Denby http://orcid.org/0000-0002-7857-6814 H. Khachatrian, et al. 2020. Agriculture‐Vision: A large aerial image
database for agricultural pattern analysis. In Proceedings of the IEEE/
CVF Conference on Computer Vision and Pattern Recognition,
REFERENCES 2828–2838. Seattle, Washington, USA.
Abadi, M., A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, Chouhan, S. S., U. P. Singh, and S. Jain. 2020. Applications of computer
G. S. Corrado, et al. 2015. TensorFlow: Large‐scale machine learning vision in plant pathology: A survey. Archives of Computational
on heterogeneous distributed systems. Available at https://www. Methods in Engineering 27: 611–632. https://doi.org/10.1007/s11831-
tensorflow.org/ [accessed 29 November 2023]. 019-09324-0
Amer, M., and T. Maul. 2019. A review of modularization techniques in Cohen, Y., and M. D. Coffey. 1986. Systemic fungicides and the control of
artificial neural networks. Artificial Intelligence Review 52: 527–561. oomycetes. Annual Review of Phytopathology 24: 311–338. https://doi.
https://doi.org/10.1007/s10462-019-09706-7 org/10.1146/annurev.py.24.090186.001523
An, J., and S. Cho. 2015. Variational autoencoder based anomaly Dang, D.‐C., A. Eremeev, and P. K. Lehre. 2021. Escaping local optima with
detection using reconstruction probability. Special Lecture on IE 2. non‐elitist evolutionary algorithms. Proceedings of the AAAI
SNU Data Mining Center, Seoul National University, Seoul, Conference on Artificial Intelligence 35: 12275–12283.
Republic of Korea. Deng, J., W. Dong, R. Socher, L.‐J. Li, K. Li, and L. Fei‐Fei. 2009. ImageNet:
Asperti, A., D. Evangelista, and E. Loli Piccolomini. 2021. A survey on A large‐scale hierarchical image database. In IEEE Conference on
variational autoencoders from a green AI perspective. SN Computer Computer Vision and Pattern Recognition, 248–255. Miami, Florida,
Science 2: 301. https://doi.org/10.1007/s42979-021-00702-9 USA. https://doi.org/10.1109/CVPR.2009.5206848
Banerjee, A. 2007. An analysis of logistic models: Exponential family Department of Health, Victoria. 2018. Methyl bromide use in Victoria,
connections and online performance. In Proceedings of the 2007 Community factsheet. Website: https://www.health.vic.gov.au/
SIAM International Conference on Data Mining, 204–215. Society for publications/methyl-bromide-use-in-victoria-community-factsheet
Industrial and Applied Mathematics, Minneapolis, Minnesota, USA. [accessed 26 June 2023].
https://doi.org/10.1137/1.9781611972771.19 Dilokthanakul, N., P. A. M. Mediano, M. Garnelo, M. C. H. Lee, H. Salimbeni,
Barbedo, J. G. A. 2016. A review on the main challenges in automatic plant K. Arulkumaran, and M. Shanahan. 2017. Deep unsupervised clustering
disease identification based on visible range images. Biosystems with Gaussian mixture variational autoencoders. arXiv 1611.02648
Engineering 144: 52–60. https://doi.org/10.1016/j.biosystemseng.2016. [Preprint]. Posted 13 January 2017 [accessed 28 October 2023]. Available
01.017 from: https://doi.org/10.48550/arXiv.1611.02648
Barbedo, J. G. A., L. V. Koenigkan, and T. T. Santos. 2016. Identifying multiple Dosovitskiy, A., L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai,
plant diseases using digital image processing. Biosystems Engineering 147: T. Unterthiner, M. Dehghani, et al. 2021. An image is worth 16×16
104–116. https://doi.org/10.1016/j.biosystemseng.2016.03.012 words: Transformers for image recognition at scale. arXiv 2010.11929
Bharati, P., and A. Pramanik. 2020. Deep learning techniques—R‐CNN to [Preprint]. Posted 3 June 2021 [accessed 28 October 2023]. Available
Mask R‐CNN: A survey. In A. K. Das, J. Nayak, B. Naik, S. K. Pati, and from: https://doi.org/10.48550/arXiv.2010.11929.
D. Pelusi [eds.], Computational intelligence in pattern recognition, Fan, Y., G. Wen, D. Li, S. Qiu, M. D. Levine, and F. Xiao. 2020. Video
advances in intelligent systems and computing, 657–668. Springer, anomaly detection and localization via Gaussian Mixture Fully
Singapore. https://doi.org/10.1007/978-981-13-9042-5_56 Convolutional Variational Autoencoder. Computer Vision and Image
Bickford, D., D. J. Lohman, N. S. Sodhi, P. K. L. Ng, R. Meier, K. Winker, Understanding 195: 102920. https://doi.org/10.1016/j.cviu.2020.102920
K. K. Ingram, and I. Das. 2007. Cryptic species as a window on Feng, L., S. Zhu, C. Zhang, Y. Bao, X. Feng, and Y. He. 2018. Identification
diversity and conservation. Trends in Ecology & Evolution 22: of maize kernel vigor under different accelerated aging times using
148–155. https://doi.org/10.1016/j.tree.2006.11.004 hyperspectral imaging. Molecules 23: 3078. https://doi.org/10.3390/
Biggio, B., I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, molecules23123078
G. Giacinto, and F. Roli. 2013. Evasion attacks against machine Ferentinos, K. P. 2018. Deep learning models for plant disease detection
learning at test time. In C. Salinesi, M. C. Norrie, and Ó. Pastor [eds.], and diagnosis. Computers and Electronics in Agriculture 145: 311–318.
Advanced Information Systems Engineering, 387–402. Springer, https://doi.org/10.1016/j.compag.2018.01.009
Berlin, Germany. https://doi.org/10.1007/978-3-642-40994-3_25 Fuentes, S., G. Chacon, D. D. Torrico, A. Zarate, and C. Gonzalez Viejo.
Bock, C. H., G. H. Poole, P. E. Parker, and T. R. Gottwald. 2010. Plant 2019. Spatial variability of aroma profiles of cocoa trees obtained
disease severity estimated visually, by digital photography and image through computer vision and machine learning modelling: A cover
analysis, and by hyperspectral imaging. Critical Reviews in Plant photography and high spatial remote sensing application. Sensors 19:
Sciences 29: 59–107. https://doi.org/10.1080/07352681003617285 3054. https://doi.org/10.3390/s19143054
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
18 of 21 | COMPUTER VISION FOR PLANT PATHOLOGY

Geetharamani, G., and A. Pandian. 2019. Identification of plant leaf Khanzada, M., A. Lodhi, and S. Shahzad. 2005. Chemical control of
diseases using a nine‐layer deep convolutional neural network. Lasiodiplodia theobromae, the causal agent of mango decline in
Computers & Electrical Engineering 76: 323–338. https://doi.org/10. Sindh. Pakistan Journal of Botany 37: 1023–1030.
1016/j.compeleceng.2019.04.011 Kingma, D. P., and M. Welling. 2019. An introduction to variational
Ghosal, S., D. Blystone, A. K. Singh, B. Ganapathysubramanian, A. Singh, and autoencoders. Foundations and Trends in Machine Learning 12:
S. Sarkar. 2018. An explainable deep machine vision framework for plant 307–392. https://doi.org/10.1561/2200000056
stress phenotyping. Proceedings of the National Academy of Sciences, USA Krizhevsky, A., I. Sutskever, and G. E. Hinton. 2017. ImageNet classification
115: 4613–4618. https://doi.org/10.1073/pnas.1716999115 with deep convolutional neural networks. Communications of the ACM
Goetz, A. F. H., G. Vane, J. E. Solomon, and B. N. Rock. 1985. Imaging 60: 84–90. https://doi.org/10.1145/3065386
spectrometry for earth remote sensing. Science 228: 1147–1153. Kroon, L. P. N. M., H. Brouwer, A. de Cock, and F. Govers. 2012. The
https://doi.org/10.1126/science.228.4704.1147 genus Phytophthora anno 2012. Phytopathology 102: 348–364. https://
Gong, Y., Y. Sun, D. Peng, P. Chen, Z. Yan, and K. Yang. 2021. Analyze doi.org/10.1094/PHYTO-01-11-0025
COVID‐19 CT images based on evolutionary algorithm with dynamic Kuhlgert, S., G. Austic, R. Zegarac, I. Osei‐Bonsu, D. Hoh,
searching space. Complex and Intelligent Systems 7: 3195–3209. M. I. Chilvers, M. G. Roth, et al. 2016. MultispeQ Beta: A tool
https://doi.org/10.1007/s40747-021-00513-8 for large‐scale plant phenotyping connected to the open Photo-
Goodwin, S., J. Gurtowski, S. Ethe‐Sayers, P. Deshpande, M. C. Schatz, and synQ network. Royal Society Open Science 3: 160592. https://doi.
W. R. McCombie. 2015. Oxford Nanopore sequencing, hybrid error org/10.1098/rsos.160592
correction, and de novo assembly of a eukaryotic genome. Genome Kuok Ho, D. T., and P. S. Yap. 2020. A systematic review of slash‐and‐burn
Research 25: 1750–1756. https://doi.org/10.1101/gr.191395.115 agriculture as an obstacle to future‐proofing climate change.
Grosch, K. 2018. John Deere – Bringing AI to agriculture [online]. Website Proceedings of the 4th International Conference on Climate Change
https://digital.hbs.edu/platform-rctom/submission/john-deere- 4(1): 1–19. https://doi.org/10.17501/2513258X.2020.4101
bringing-ai-to-agriculture/ [accessed 16 May 2022]. Li, B., X. Xu, L. Zhang, J. Han, C. Bian, G. Li, J. Liu, and L. Jin. 2020.
Guo, M.‐H., T.‐X. Xu, J.‐J. Liu, Z.‐N. Liu, P.‐T. Jiang, T.‐J. Mu, S.‐H. Zhang, Above‐ground biomass estimation and yield prediction in potato by
et al. 2022. Attention mechanisms in computer vision: A survey. using UAV‐based RGB and hyperspectral imaging. ISPRS Journal of
Computational Visual Media 8: 331–368. https://doi.org/10.1007/s41095- Photogrammetry and Remote Sensing 162: 161–172. https://doi.org/
022-0271-y 10.1016/j.isprsjprs.2020.02.013
Gutiérrez, S., A. Wendel, and J. Underwood. 2019. Ground based Li, C., and M. Wand. 2016. Precomputed real‐time texture synthesis with
hyperspectral imaging for extensive mango yield estimation. Markovian Generative adversarial networks. In B. Leibe, J. Matas, N.
Computers and Electronics in Agriculture 157: 126–135. https://doi. Sebe, and M. Welling [eds.], Computer Vision – ECCV 2016, Lecture
org/10.1016/j.compag.2018.12.041 Notes in Computer Science, 702–716. Springer International
He, K., X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image Publishing, Cham, Switzerland. https://doi.org/10.1007/978-3-319-
recognition. In Proceedings of the IEEE Conference on Computer 46487-9_43
Vision and Pattern Recognition, 770–778. Las Vegas, Nevada, USA. Li, X., and J. She. 2017. Collaborative variational autoencoder for
He, K., G. Gkioxari, P. Dollar, and R. Girshick. 2017. Mask R‐CNN. In recommender systems. In Proceedings of the 23rd ACM SIGKDD
Proceedings of the IEEE International Conference on Computer International Conference on Knowledge Discovery and Data Mining,
Vision, 2961–2969. Venice, Italy. KDD ’17, 305–314. Association for Computing Machinery, New
Horevaj, P., E. Milus, and B. Bluhm. 2011. A real‐time qPCR assay to York, New York, USA. https://doi.org/10.1145/3097983.3098077
quantify Fusarium graminearum biomass in wheat kernels. Journal of Lim, K.‐L., X. Jiang, and C. Yi. 2020. Deep clustering with variational
Applied Microbiology 111: 396–406. https://doi.org/10.1111/j.1365- autoencoder. IEEE Signal Processing Letters 27: 231–235. https://doi.
2672.2011.05049.x org/10.1109/LSP.2020.2965328
Horton, T. R., and T. D. Bruns. 2001. The molecular revolution in Lin, E., S. Mukherjee, and S. Kannan. 2020. A deep adversarial variational
ectomycorrhizal ecology: Peeking into the black‐box. Molecular autoencoder model for dimensionality reduction in single‐cell RNA
Ecology 10: 1855–1871. https://doi.org/10.1046/j.0962-1083.2001.01333.x sequencing analysis. BMC Bioinformatics 21: 64. https://doi.org/10.
Horzyk, A., and E. Ergün. 2020. YOLOv3 precision improvement by the 1186/s12859-020-3401-5
weighted centers of confidence selection. In 2020 International Joint Lin, T. Y., M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár,
Conference on Neural Networks (IJCNN), 1–8. Glasgow, United and C. L. Zitnick. 2014. Microsoft COCO: Common objects in
Kingdom. https://doi.org/10.1109/IJCNN48605.2020.9206848 context. In Computer Vision–ECCV 2014: 13th European Confer-
Howard, A. G., M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, ence, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V
M. Andreetto, and H. Adam. 2017. MobileNets: Efficient convolu- 13, 740–755. Springer, Cham, Switzerland.
tional neural networks for mobile vision applications. arXiv Liu, Z., P. Luo, X. Wang, and X. Tang. 2015. Deep learning face attributes
1704.04861 [Preprint]. Posted 17 April 2017 [accessed 28 October in the wild. In Proceedings of the IEEE International Conference on
2023]. Available from: https://doi.org/10.48550/arXiv.1704.04861 Computer Vision, 3730–3738. Santiago, Chile.
Howard, A., M. Sandler, G. Chu, L.‐C. Chen, B. Chen, M. Tan, W. Wang, Liu, Z., H. Mao, C.‐Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie. 2022. A
et al. 2019. Searching for MobileNetV3. arXiv 1905.02244 [Preprint]. ConvNet for the 2020s. arXiv 2201.03545 [Preprint]. Posted 2 March
Posted 20 November 2019 [accessed 28 October 2023]. Available 2022 [accessed 28 October 2023]. Available from: https://doi.org/10.
from: https://doi.org/10.48550/arXiv.1905.02244 48550/arXiv.2201.03545
Huang, H., Z. Wei, and L. Yao. 2019. A novel approach to component Lopes, J. F., V. da Costa, D. F. Barbin, L. J. P. Cruz‐Tirado, V. Baeten, and
assembly inspection based on Mask R‐CNN and support vector S. Barbon Jr. 2022. Deep computer vision system for cocoa
machines. Information 10: 282. https://doi.org/10.3390/info10090282 classification. Multimedia Tools and Applications 81: 41059–41077.
Huda‐Shakirah, A. R., N. M. I. Mohamed Nor, L. Zakaria, Y.‐H. Leong, and https://doi.org/10.1007/s11042-022-13097-3
M. H. Mohd. 2022. Lasiodiplodia theobromae as a causal pathogen of Lu, G., and B. Fei. 2014. Medical hyperspectral imaging: A review. Journal of
leaf blight, stem canker, and pod rot of Theobroma cacao in Malaysia. Biomedical Optics 19: 010901. https://doi.org/10.1117/1.JBO.19.1.010901
Scientific Reports 12: 8966. https://doi.org/10.1038/s41598-022- Luchi, N., R. Ioos, and A. Santini. 2020. Fast and reliable molecular
13057-9 methods to detect fungal pathogens in woody plants. Applied
Ioffe, S., and C. Szegedy. 2015. Batch normalization: Accelerating deep Microbiology and Biotechnology 104: 2453–2468. https://doi.org/10.
network training by reducing internal covariate shift. In Proceedings 1007/s00253-020-10395-4
of the 32nd International Conference on Machine Learning, 448–456. Maalø, L., M. Fraccaro, V. Liévin, and O. Winther. 2019. BIVA: A very
Lille, France. deep hierarchy of latent variables for generative modelling. arXiv
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
COMPUTER VISION FOR PLANT PATHOLOGY | 19 of 21

1902.02102 [Preprint]. Posted 6 November 2019 [accessed 28 vision. Journal of Food Composition and Analysis 97: 103771. https://
October 2023]. Available from: https://doi.org/10.48550/arXiv. doi.org/10.1016/j.jfca.2020.103771
1902.02102 O'Mahony, N., S. Campbell, A. Carvalho, S. Harapanahalli,
Mack, C. A. 2011. Fifty years of Moore's law. IEEE Transactions on G. V. Hernandez, L. Krpalkova, D. Riordan, and J. Walsh. 2020.
Semiconductor Manufacturing 24: 202–207. https://doi.org/10.1109/ Deep learning vs. traditional computer vision. In K. Arai and S.
TSM.2010.2096437 Kapoor [eds.], Advances in Computer Vision, Advances in
MacLean, D. 2019. A convolutional neural network for predicting Intelligent Systems and Computing, 128–144. Springer Interna-
transcriptional regulators of genes in Arabidopsis transcriptome data tional Publishing, Cham, Switzerland. https://doi.org/10.1007/978-
reveals classification based on positive regulatory interactions. 3-030-17795-9_10
biorXiv 618926 [Preprint]. Posted 28 April 2019 [accessed 28 October Ovaskainen, O., J. Nokso‐Koivisto, J. Hottola, T. Rajala, T. Pennanen,
2023]. Available from: https://doi.org/10.1101/618926 H. Ali‐Kovero, O. Miettinen, et al. 2010. Identifying wood‐inhabiting
Maddison, A. C., G. Macias, C. Moreira, R. Arias, and R. Neira. 1995. fungi with 454 sequencing – What is the probability that BLAST gives
Cocoa production in Ecuador in relation to dry‐season escape from the correct species? Fungal Ecology 3: 274–283. https://doi.org/10.
pod rot caused by Crinipellis perniciosa and Moniliophthora roreri. 1016/j.funeco.2010.01.001
Plant Pathology 44: 982–998. https://doi.org/10.1111/j.1365-3059. Pan, T.‐T., E. Chyngyz, D.‐W. Sun, J. Paliwal, and H. Pu. 2019.
1995.tb02657.x Pathogenetic process monitoring and early detection of pear black
Malhi, Y., J. T. Roberts, R. A. Betts, T. J. Killeen, W. Li, and C. A. Nobre. spot disease caused by Alternaria alternata using hyperspectral
2008. Climate change, deforestation, and the fate of the Amazon. imaging. Postharvest Biology and Technology 154: 96–104. https://doi.
Science 319: 169–172. https://doi.org/10.1126/science.1146961 org/10.1016/j.postharvbio.2019.04.005
Marelli, J.‐P., D. I. Guest, B. A. Bailey, H. C. Evans, J. K. Brown, M. Junaid, Paoletti, M. E., J. M. Haut, J. Plaza, and A. Plaza. 2019. Deep learning
R. W. Barreto, et al. 2019. Chocolate under threat from old and new classifiers for hyperspectral imaging: A review. ISPRS Journal of
cacao diseases. Phytopathology 109: 1331–1343. https://doi.org/10. Photogrammetry and Remote Sensing 158: 279–317. https://doi.org/
1094/PHYTO-12-18-0477-RVW 10.1016/j.isprsjprs.2019.09.006
Meinhardt, L. W., J. Rincones, B. A. Bailey, M. C. Aime, G. W. Griffith, Parra, P., T. Negrete, J. Llaguno, and N. Vega. 2018. Computer vision
D. Zhang, and G. A. G. Pereira. 2008. Moniliophthora perniciosa, the techniques applied in the estimation of the cocoa beans fermentation
causal agent of witches’ broom disease of cacao: What's new from this grade. In 2018 IEEE ANDESCON, 1–10. Santiago de Cali, Colombia.
old foe? Molecular Plant Pathology 9: 577–588. https://doi.org/10. https://doi.org/10.1109/ANDESCON.2018.8564569
1111/j.1364-3703.2008.00496.x Patrício, D. I., and R. Rieder. 2018. Computer vision and artificial
Meyer, J. G., S. Liu, I. J. Miller, J. J. Coon, and A. Gitter. 2019. Learning intelligence in precision agriculture for grain crops: A systematic
drug functions from chemical structures with convolutional neural review. Computers and Electronics in Agriculture 153: 69–81. https://
networks and random forests. Journal of Chemical Information and doi.org/10.1016/j.compag.2018.08.001
Modeling 59: 4438–4449. https://doi.org/10.1021/acs.jcim.9b00236 Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel,
Mite‐Baidal, K., E. Solís‐Avilés, T. Martínez‐Carriel, A. Marcillo‐Plaza, M. Blondel, et al. 2011. Scikit‐learn: Machine learning in Python.
E. Cruz‐Ibarra, and W. Baque‐Bustamante. 2019. Analysis of Journal of Machine Learning Research 12: 2825–2830.
computer vision algorithms to determine the quality of fermented Peters, J., D. Janzing, and B. Schölkopf. 2017. Elements of causal inference:
cocoa (Theobroma cacao): Systematic literature review. In R. Foundations and learning algorithms. MIT Press, Cambridge,
Valencia‐García, G. Alcaraz‐Mármol, J. del Cioppo‐Morstadt, N. Massachusetts, USA.
Vera‐Lucio, M. Bucaram‐Leverone [eds.], ICT for Agriculture and Phillips‐Mora, W., and M. J. Wilkinson. 2007. Frosty pod of cacao: A
Environment, Advances in Intelligent Systems and Computing, disease with a limited geographic range but unlimited potential for
79–87. Springer International Publishing, Cham, Switzerland. damage. Phytopathology 97: 1644–1647. https://doi.org/10.1094/
https://doi.org/10.1007/978-3-030-10728-4_9 PHYTO-97-12-1644
Motamedi, M., N. Sakharnykh, and T. Kaldewey. 2021. A data‐centric PyTorch. 2023. Models and pre‐trained weights. Website https://pytorch.
approach for training deep neural networks with less data. arXiv org/vision/main/models [accessed 21 June 2023].
2110.03613 [Preprint]. Posted 29 October 2021 [accessed 28 October Ramos‐Giraldo, P., C. Reberg‐Horton, A. M. Locke, S. Mirsky, and
2023]. Available from: https://doi.org/10.48550/arXiv.2110.03613 E. Lobaton. 2020. Drought stress detection using low‐cost computer
Nagarajan, S., G. Seibold, J. Kranza, E. E. Saari, and L. M. Joshi. 1984. vision systems and machine learning techniques. IT Professional 22:
Monitoring wheat rust epidemics with the Landsat‐2 satellite. 27–29. https://doi.org/10.1109/MITP.2020.2986103
Phytopathology 74: 585. https://doi.org/10.1094/Phyto-74-585 Ray, M., A. Ray, S. Dash, A. Mishra, K. G. Achary, S. Nayak, and S. Singh.
Nalisnick, E., A. Matsukawa, Y. W. Teh, D. Gorur, and 2017. Fungal disease detection in plants: Traditional assays, novel
B. Lakshminarayanan. 2019. Do deep generative models know what diagnostic techniques and biosensors. Biosensors and Bioelectronics
they don't know? arXiv 1810.09136 [Preprint]. Posted 24 February 87: 708–723. https://doi.org/10.1016/j.bios.2016.09.032
2019 [accessed 28 October 2023]. Available from: https://doi.org/10. Redmon, J., S. Divvala, R. Girshick, and A. Farhadi. 2016. You only look
48550/arXiv.1810.09136 once: Unified, real‐time object detection. In Proceedings of the IEEE
Nguyen, H. D. D., V. Pan, C. Pham, R. Valdez, K. Doan, and C. Nansen. Conference on Computer Vision and Pattern Recognition, 779–788.
2020. Night‐based hyperspectral imaging to study association of Las Vegas, Nevada, USA.
horticultural crop leaf reflectance and nutrient status. Computers and Ren, S., K. He, R. Girshick, and J. Sun. 2016. Faster R‐CNN: Towards real‐
Electronics in Agriculture 173: 105458. https://doi.org/10.1016/j. time object detection with region proposal networks. arXiv
compag.2020.105458 1506.01497 [Preprint]. Posted 6 Jan 2016 [accessed 28 October
O'Donnell, K., T. J. Ward, V. A. R. G. Robert, P. W. Crous, D. M. Geiser, 2023]. Available from: https://doi.org/10.48550/arXiv.1506.01497
and S. Kang. 2015. DNA sequence‐based identification of Fusarium: Rice, R. A., and R. Greenberg. 2000. Cacao cultivation and the conservation
Current status and future directions. Phytoparasitica 43: 583–595. of biological diversity. Ambio 29: 167–173. https://doi.org/10.1579/
https://doi.org/10.1007/s12600-015-0484-z 0044-7447-29.3.167
Okamoto, H., T. Murata, T. Kataoka, and S. I. Hata. 2007. Plant Rodriguez, C., O. Alfaro, P. Paredes, D. Esenarro, and F. Hilario. 2021.
classification for weed detection using hyperspectral imaging with Machine learning techniques in the detection of cocoa (Theobroma
wavelet analysis. Weed Biology and Management 7: 31–37. https://doi. cacao L.) diseases. Annals of the Romanian Society for Cell Biology 25:
org/10.1111/j.1445-6664.2006.00234.x 7732–7741.
Oliveira, M. M., B. V. Cerqueira, S. Barbon, and D. F. Barbin. 2021. Ronneberger, O., P. Fischer, and T. Brox. 2015. U‐Net: Convolutional
Classification of fermented cocoa beans (cut test) using computer networks for biomedical image segmentation. In Medical Image
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
20 of 21 | COMPUTER VISION FOR PLANT PATHOLOGY

Computing and Computer‐Assisted Intervention–MICCAI 2015: Published 11 September 2020 [accessed 21 June 2023]. Available
18th International Conference, Munich, Germany, October 5–9, from: https://doi.org/10.48550/arXiv.1905.11946
2015, Proceedings, Part III 18, 234–241. Springer International Tan, M., and Q. Le. 2021. EfficientNetV2: Smaller models and faster
Publishing, Cham, Switzerland. training. In Proceedings of the International Conference on Machine
Salvatore, M. M., A. Andolfi, and R. Nicoletti. 2020. The thin line between Learning, 10096–10106.
pathogenicity and endophytism: The case of Lasiodiplodia theobro- Tan, D. S., R. N. Leong, A. F. Laguna, C. A. Ngo, A. Lao, D. M. Amalin, and
mae. Agriculture 10: 488. https://doi.org/10.3390/agriculture10100488 D. G. Alvindia. 2018. AuToDiDAC: Automated tool for disease
Sarkate, R. S., N. V. Kalyankar, and P. B. Khanale. 2013. Application of detection and assessment for cacao black pod rot. Crop Protection
computer vision and color image segmentation for yield prediction 103: 98–102. https://doi.org/10.1016/j.cropro.2017.09.017
precision. In Proceedings of the 2013 International Conference on Thapa, R., K. Zhang, N. Snavely, S. Belongie, and A. Khan. 2020. The Plant
Information Systems and Computer Networks, 9–13. Mathura, India. Pathology Challenge 2020 data set to classify foliar disease of apples.
https://doi.org/10.1109/ICISCON.2013.6524164 Applications in Plant Sciences 8: e11390. https://doi.org/10.1002/aps3.
Schaad, N. W., and R. D. Frederick. 2002. Real‐time PCR and its 11390
application for rapid plant disease diagnostics. Canadian Journal of Tian, H., T. Wang, Y. Liu, X. Qiao, and Y. Li. 2020. Computer vision technology
Plant Pathology 24: 2 50–2 58. https://doi.org/10.1 080/ in agricultural automation: A review. Information Processing in Agriculture
07060660209507006 7: 1–19. https://doi.org/10.1016/j.inpa.2019.09.006
Schmidt, A., and Z. Bandar. 1998. Modularity: A concept for new neural Toda, Y., and F. Okura. 2019. How convolutional neural networks diagnose
network architectures. In Proceedings of the IASTED International plant disease. Plant Phenomics 2019: 9237136. https://doi.org/10.
Conference of Computer Systems and Applications, 26–29. Irbid, Jordan. 34133/2019/9237136
Schölkopf, B., F. Locatello, S. Bauer, N. R. Ke, N. Kalchbrenner, A. Goyal, and Tripathi, M. K., and D. D. Maktedar. 2020. A role of computer vision in
Y. Bengio. 2021. Toward causal representation learning. Proceedings of the fruits and vegetables among various horticulture products of
IEEE 109: 612–634. https://doi.org/10.1109/JPROC.2021.3058954 agriculture fields: A survey. Information Processing in Agriculture 7:
Selvaraju, R. R., M. Cogswell, A. Das, R. Vedantam, D. Parikh, and 183–203. https://doi.org/10.1016/j.inpa.2019.07.003
D. Batra. 2017. Grad‐CAM: Visual explanations from deep networks United Nations. 2022. World Population Prospects 2022. Website https://
via gradient‐based localization. In Proceedings of the IEEE Interna- population.un.org/wpp/Graphs/Probabilistic/POP/TOT/900 [ac-
tional Conference on Computer Vision, 618–626. Venice, Italy. cessed 18 May 2022].
Shir, O. M. 2012. Niching in evolutionary algorithms. In G. Rozenberg, T. Vahdat, A., and J. Kautz. 2021. NVAE: A deep hierarchical variational
Bäck, and J. N. Kok [eds.], Handbook of natural computing, autoencoder. arXiv 2007.03898 [Preprint]. Posted 8 January 2021
1035–1069. Springer, Berlin, Germany. https://doi.org/10.1007/978- [accessed 28 October 2023]. Available from: https://doi.org/10.48550/
3-540-92910-9_32 arXiv.2007.03898
Shrikumar, A., P. Greenside, and A. Kundaje. 2017. Learning important Voulodimos, A., N. Doulamis, A. Doulamis, and E. Protopapadakis. 2018. Deep
features through propagating activation differences. In Proceedings of learning for computer vision: A brief review. Computational Intelligence
the 34th International Conference on Machine Learning, 3145–3153. and Neuroscience 2018: e7068349. https://doi.org/10.1155/2018/7068349
Sydney, Australia. Vuola, A. O., S. U. Akram, and J. Kannala. 2019. Mask‐RCNN and U‐Net
Singh, D., N. Jain, P. Jain, P. Kayal, S. Kumawat, and N. Batra. 2020. ensembled for nuclei segmentation. In 2019 IEEE 16th International
PlantDoc: A dataset for visual plant disease detection. In Proceedings Symposium on Biomedical Imaging (ISBI 2019), 208–212. Venice,
of the 7th ACM IKDD Conference on Data Sciences (CoD) and 25th Italy. https://doi.org/10.1109/ISBI.2019.8759574
Conference on Management of Data (COMAD), 249–253. Associa- Wang, L., Z. Wu, S. Karanam, K.‐C. Peng, and R. Vikram Singh. 2019.
tion for Computing Machinery, New York, New York, USA. https:// Sharpen Focus: Learning with attention separability and consistency.
doi.org/10.1145/3371158.3371196 arXiv 1811.07484 [Preprint]. Posted 7 August 2019 [accessed 28 October
Solodskikh, K., A. Kurbanov, R. Aydarkhanov, I. Zhelavskaya, Y. Parfenov, 2023]. Available from: https://doi.org/10.48550/arXiv.1811.07484
D. Song, and S. Lefkimmiatis. 2023. Integral neural networks. In Weinstein, B. G. 2018. A computer vision for animal ecology. Journal of
Proceedings of the IEEE/CVF Conference on Computer Vision and Animal Ecology 87: 533–545. https://doi.org/10.1111/1365-2656.12780
Pattern Recognition, 16113–16122. Vancouver, Canada. Wu, Y., and K. He. 2018. Group normalization. In Proceedings of the European
Statista. 2022. Number of smartphone mobile network subscriptions Conference on Computer Vision (ECCV), 3–19. Munich, Germany.
worldwide from 2016 to 2022, with forecasts from 2023 to 2028. Wu, Y., A. Kirillov, F. Massa, W.‐Y. Lo, and R. Girshick. 2019. Detectron2.
Website: https://www.statista.com/statistics/330695/number-of- Website https://github.com/facebookresearch/detectron2 [accessed
smartphone-users-worldwide/ [accessed 16 May 2022]. 28 October 2023].
Steddom, K., M. McMullen, B. Schatz, and C.M. Rush. 2005. Comparing image Wu, Z., Y. Chen, B. Zhao, X. Kang, and Y. Ding. 2021. Review of weed
format and resolution for assessment of foliar diseases of wheat. Plant detection methods based on computer vision. Sensors 21: 3647.
Health Progress 6: 11. https://doi.org/10.1094/PHP-2005-0516-01-RS https://doi.org/10.3390/s21113647
Su, J., C. Liu, M. Coombes, X. Hu, C. Wang, X. Xu, Q. Li, et al. 2018. Wheat Xiao, H., K. Rasul, and R. Vollgraf. 2017. Fashion‐MNIST: A novel image
yellow rust monitoring by learning from multispectral UAV aerial dataset for benchmarking machine learning algorithms. arXiv
imagery. Computers and Electronics in Agriculture 155: 157–166. 1708.07747 [Preprint]. Posted 15 September 2017 [accessed 28
https://doi.org/10.1016/j.compag.2018.10.017 October 2023]. Available from: https://doi.org/10.48550/arXiv.1708.
Sun, J., X. Wang, N. Xiong, and J. Shao, 2018. Learning sparse representation 07747
with variational auto‐encoder for anomaly detection. IEEE Access 6: Xie, S., R. Girshick, P. Dollár, Z. Tu, and K. He. 2017. Aggregated residual
33353–33361. https://doi.org/10.1109/ACCESS.2018.2848210 transformations for deep neural networks. arXiv 1611.05431
Sun, Y., B. Xue, M. Zhang, and G. G. Yen. 2020. Evolving deep [Preprint]. Posted 11 April 2017 [accessed 28 October 2023].
convolutional neural networks for image classification. IEEE Available from: https://doi.org/10.48550/arXiv.1611.05431
Transactions on Evolutionary Computation 24: 394–407. https://doi. Xu, W., H. Sun, C. Deng, and Y. Tan. 2017. Variational autoencoder for
org/10.1109/TEVC.2019.2916183 semi‐supervised text classification. In Thirty‐First AAAI Conference
Szegedy, C., V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. 2016. on Artificial Intelligence. San Francisco, California, USA.
Rethinking the inception architecture for computer vision. In Xu, W., and Y. Tan. 2020. Semisupervised text classification by variational
Proceedings of the IEEE Conference on Computer Vision and autoencoder. IEEE Transactions on Neural Networks and Learning
Pattern Recognition, 2818–2826. Las Vegas, Nevada, USA. Systems 31: 295–308. https://doi.org/10.1109/TNNLS.2019.2900734
Tan, M., and Q. V. Le. 2020. EfficientNet: Rethinking model scaling Xu, S., J. Wang, W. Shou, T. Ngo, A.‐M. Sadick, and X. Wang. 2021.
for convolutional neural networks. arXiv 1905.11946 [Preprint]. Computer vision techniques in construction: A critical review.
21680450, 2024, 2, Downloaded from https://bsapubs.onlinelibrary.wiley.com/doi/10.1002/aps3.11559 by Cochrane Colombia, Wiley Online Library on [27/10/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
COMPUTER VISION FOR PLANT PATHOLOGY | 21 of 21

Archives of Computational Methods in Engineering 28: 3383–3397. Proceedings of the IEEE Conference on Computer Vision and Pattern
https://doi.org/10.1007/s11831-020-09504-3 Recognition, 6848–6856. Salt Lake City, Utah, USA.
Xu, Y., and R. P. McCord. 2021. CoSTA: Unsupervised convolutional Zhao, T., Y. Yang, H. Niu, D. Wang, and Y. Chen. 2018. Comparing U‐Net
neural network learning for spatial transcriptomics analysis. BMC convolutional network with mask R‐CNN in the performances of
Bioinformatics 22: 397. pomegranate tree canopy segmentation. In Multispectral, Hyperspec-
Yosinski, J., J. Clune, Y. Bengio, and H. Lipson. 2014. How transferable are tral, and Ultraspectral Remote Sensing Technology, Techniques and
features in deep neural networks? arXiv 1411.1792 [Preprint]. Posted Applications VII, 210–218. Honolulu, Hawaii, USA. https://doi.org/
6 November 2014 [accessed 28 October 2023]. Available from: 10.1117/12.2325570
https://doi.org/10.48550/arXiv.1411.1792 Zhu, X., W. Su, L. Lu, B. Li, X. Wang, and J. Dai. 2021. Deformable
Yu, K.‐H., F. Wang, G. J. Berry, C. Ré, R. B. Altman, M. Snyder, and DETR: Deformable transformers for end‐to‐end object detection.
I. S. Kohane. 2020. Classifying non‐small cell lung cancer types and arXiv 2010.04159 [Preprint]. Posted 18 March 2021 [accessed 28
transcriptomic subtypes using convolutional neural networks. Journal October 2023]. Available from: https://doi.org/10.48550/arXiv.
of the American Medical Informatics Association 27: 757–769. https:// 2010.04159
doi.org/10.1093/jamia/ocz230
Yue, J., W. Zhao, S. Mao, and H. Liu, 2015. Spectral–spatial classification of
hyperspectral images using deep convolutional neural networks.
Remote Sensing Letters 6: 468–477. https://doi.org/10.1080/2150704X.
2015.1047045
Zeiler, M. D., and R. Fergus. 2014. Visualizing and understanding How to cite this article: Sykes, J. R., K. J. Denby, and
convolutional networks. In D. Fleet, T. Pajdla, B. Schiele, and T. D. W. Franks. 2024. Computer vision for plant
Tuytelaars [eds.], Computer Vision – ECCV 2014, Lecture Notes in pathology: A review with examples from cocoa
Computer Science, 818–833. Springer International Publishing, agriculture. Applications in Plant Sciences 12(2):
Cham, Switzerland. https://doi.org/10.1007/978-3-319-10590-1_53
e11559. https://doi.org/10.1002/aps3.11559
Zhang, X., X. Zhou, M. Lin, and J. Sun. 2018. ShuffleNet: An extremely
efficient convolutional neural network for mobile devices. In

You might also like