1 College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China
2 College of Information Management, Nanjing Agricultural University, Nanjing 210095, China
3 Department of Computing and Software, McMaster University, Hamilton, ON L8S 4L8, Canada
* Correspondence: baidi000@njau.edu.cn (D.B.); haifeng.lin@njfu.edu.cn (H.L.); Tel.: +86-25-8542-7827 (H.L.)
Abstract: Diseases and insect pests of tea leaves cause huge economic losses to the tea industry every year, so their accurate identification is important. Convolutional neural networks (CNNs) can automatically extract features from images of tea leaves suffering from insect and disease infestation. However, photographs of tea tree leaves taken in a natural environment suffer from problems such as leaf shading, variable illumination, and small object size, and traditional CNNs affected by these problems cannot achieve satisfactory recognition performance. To address this challenge, we propose YOLO-Tea, an improved model based on You Only Look Once version 5 (YOLOv5). Firstly, we integrated the mixed self-attention and convolution module (ACmix) and the convolutional block attention module (CBAM) into YOLOv5 to allow our proposed model to better focus on tea tree leaf diseases and insect pests. Secondly, to enhance the feature extraction capability of our model, we replaced the spatial pyramid pooling fast (SPPF) module in the original YOLOv5 with the receptive field block (RFB) module. Finally, we reduced the resource consumption of our model by incorporating a global context network (GCNet), which is essential especially when the model operates on resource-constrained edge devices. When compared to YOLOv5s, our proposed YOLO-Tea improved by 0.3%–15.0% on all test metrics. YOLO-Tea's AP0.5, APTLB, and APGMB outperformed those of Faster R-CNN by 5.5%, 1.8%, and 7.0%, and those of SSD by 7.7%, 7.8%, and 5.2%, respectively. YOLO-Tea has shown promising potential for application in real-world tea disease detection systems.
Keywords: tea leaf diseases; object detection; computer vision; deep learning
techniques for pest detection and extraction by constructing an automated detection and
extraction system for the estimation of pest densities in rice fields. Barbedo et al. [4]
presented a method for identifying plant diseases based on color transformations, color
histograms, and a pair-based classification system. However, the accuracy for identifying
multiple plant diseases fluctuated between 40% and 80% when tested. Zhang et al. [5]
proposed a leaf-image-based cucumber disease identification method using K-means clustering
and sparse representation classification. Hossain et al. [6] developed a system for image
processing using a support vector machine (SVM) classifier for disease identification. It
was able to identify and classify brown spot and algal leaf diseases from healthy leaves.
Sun et al. [7] proposed a new method combining simple linear iterative cluster (SLIC) and
SVM to achieve accurate extraction of tea tree leaf disease saliency maps in a complex
background context. To summarize, the classical machine learning methods (e.g., random
forests and support vector machines) for plant disease detection require manual extraction
of plant leaf disease features. The manually extracted features may not be the essential
characteristics of the crop disease, which will significantly affect the precision of the
disease diagnosis.
With the development of deep learning techniques, more and more researchers are
investigating using deep learning to detect crop leaf diseases and insect pests. Recent
development of image recognition technologies has led to the widespread use of convo-
lutional neural networks (CNN) in deep learning for automatic image classification and
recognition of plant diseases [8]. Chen et al. [8] proposed a CNN model called LeafNet to
automatically extract features of tea tree diseases from images. Hu et al. [9] proposed a
low-shot learning method. They utilized SVM to separate diseased spots in diseased tea
photos to remove background interference and a modified C-DCGAN to solve insufficient
samples. Hu et al. [1] proposed a tea disease detection method based on the CIFAR10-quick
model with the addition of a multiscale feature extraction module and depthwise separable
convolution. Jiang et al. [10] used CNN to extract rice leaf disease image features before
using SVM to classify and predict specific diseases. The CNN-based tea leaf disease iden-
tification method outperforms traditional machine learning methods [1,11]. In the above
method, the researchers used CNNs to automatically extract crop-disease-specific features
instead of manually extracting them. While the above methods have performed well on crop disease images, they focus solely on either crop disease image identification or classification.
In recent years, deep-learning-based image detection networks have been divided
into two-stage and one-stage detection networks [12]. Faster region-based convolutional
neural networks (Faster R-CNN) [13] is one of the more representative two-stage detection
networks. Zhou et al. [14] proposed a rice disease detection algorithm based on Faster
R-CNN and FCM-KM fusion and achieved relatively good performance. Although the
detection accuracy of Faster R-CNN is good, the detection speed is slow and therefore
cannot meet real-time requirements. One-stage detection networks are more efficient than two-stage ones, although they are less accurate. You Only Look Once (YOLO) [15],
Single Shot MultiBox Detector (SSD) [16], and RetinaNet [17] are representatives of one-
stage detection networks. Among them, the YOLO family is widely used in agriculture due to its ability to detect targets efficiently and accurately. Tian et al. [18] designed a system
based on YOLOv3 [19] that can detect apples at three different stages in the orchard
in real time. Roy et al. [20] designed a high-performance real-time fine-grained target
detection framework that can address obstacles such as dense distribution and irregular
morphology, which is based on an improvement of YOLOv4 [21]. Sun et al. [22] proposed
a novel concept for the synergistic use of the YOLOv4 deep learning network for individual tree crown (ITC)
segmentation and a computer graphics algorithm for refinement of the segmentation results
involving overlapping tree crowns. Dai et al. [23] developed a crop leaf disease detection
method called YOLOv5-CAcT based on YOLOv5. To the best of our knowledge, the YOLO
family has already been widely used in the detection of leaf diseases and insect pests in agriculture.
Figure 1. Some representative samples of our dataset. (a) Tea leaf blight; (b) tea leaf blight; (c) green mirid bug; (d) green mirid bug.
2.2. YOLO-Tea
2.2.1. YOLOv5
YOLOv5 is a member of the YOLO series presented by the Ultralytics LLC team.
Depending on the network width and depth, YOLOv5 can be classified as YOLOv5n,
YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. In this paper, our proposed YOLO-Tea tea
disease and pest detection model is an improvement of YOLOv5s. The YOLOv5s model reaches an image inference speed of 455 FPS on some researchers' hardware, and this advantage has led to its wide use among scholars [29].
As shown in Figure 2, YOLOv5s-6.1 is divided into four parts: input, backbone, neck,
and head. Cross-stage partial 1 (CSP1) and cross-stage partial 2 (CSP2) in the backbone
and neck of YOLOv5 are designed with reference to the cross-stage partial network (CSPNet) [30] structure. CSP1 is used for feature extraction in the backbone section. CSP2 is
used for feature fusion in the neck section. In the backbone, there is a spatial pyramid
pooling fast (SPPF) module in addition to the CSP1 module. The function of the SPPF
module is to extract the global information of the detection target. In the neck, YOLOv5
uses the path aggregation network (PANet) [31] structure. The PANet structure not only merges the extracted semantic features with the location features, but also fuses the backbone features with the head, so that the model obtains richer feature information.
Finally, the head consists of three branches, with feature maps of different sizes used to
detect target objects of different sizes.
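For readers who prefer code, the SPPF module mentioned above can be sketched as follows in PyTorch; the channel split and the chained 5 × 5 max-pooling follow the commonly used YOLOv5 implementation, but the class and variable names here are illustrative rather than taken from the released code.

```python
import torch
import torch.nn as nn

class SPPFSketch(nn.Module):
    """SPPF-style block (sketch): one 1x1 channel reduction, three chained
    5x5 max-pools, concatenation, and a 1x1 fusion convolution."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.reduce = nn.Conv2d(c_in, c_hidden, kernel_size=1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.fuse = nn.Conv2d(4 * c_hidden, c_out, kernel_size=1)

    def forward(self, x):
        x = self.reduce(x)
        y1 = self.pool(x)      # chaining the same pooling layer cheaply
        y2 = self.pool(y1)     # approximates larger and larger pooling windows,
        y3 = self.pool(y2)     # gathering global context at several scales
        return self.fuse(torch.cat((x, y1, y2, y3), dim=1))
```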
Figure 3. The details of CBAM. (a) The structure of CAM. (b) The structure of SAM. (c) The structure of CBAM.

2.2.3. ACmix
Convolution and self-attention are two powerful techniques for representation learning [24]. Pan et al. [24] proposed ACmix, a mixed model that enjoys the benefits of both self-attention and convolution. The structure of ACmix is shown in Figure 4. In the first stage, the input features are projected by three 1 × 1 convolutions and reshaped into N segments each, yielding an intermediate feature set containing 3 × N feature maps. In the second stage, this feature set is fed into two paths: a self-attention path and a convolution path. In the self-attention path, the first-stage features are divided into N groups, each containing three 1 × 1 convolutional output features; the corresponding three feature maps are used as queries, keys, and values, following the traditional multi-headed self-attention module. In the convolution path, for a convolution kernel of size k, the first-stage feature maps are transformed into k² feature maps by a light fully connected layer, and features are then generated by shifting and aggregation. Finally, the feature maps output by the self-attention path and the convolution path are summed, and their relative intensities are controlled by two learnable scalars.
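A highly simplified PyTorch sketch of this two-path design is given below. It keeps only the shared 1 × 1 projections, a self-attention path, a convolution path, and the two learnable blending scalars, and it omits the shift-and-aggregate trick of the original ACmix; the class name, head count, and layer choices are our assumptions, not details from [24].

```python
import torch
import torch.nn as nn

class ACmixLite(nn.Module):
    """Illustrative sketch of the ACmix idea: shared 1x1 projections feed a
    self-attention path and a convolution path, whose outputs are blended by
    two learnable scalars. `channels` must be divisible by `heads`."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.proj = nn.Conv2d(channels, 3 * channels, 1)          # stage 1: q, k, v projections
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.conv = nn.Conv2d(3 * channels, channels, 3, padding=1)  # convolution path
        self.alpha = nn.Parameter(torch.ones(1))                   # weight of attention path
        self.beta = nn.Parameter(torch.ones(1))                    # weight of convolution path

    def forward(self, x):
        b, c, h, w = x.shape
        qkv = self.proj(x)                                          # (B, 3C, H, W)
        q, k, v = (t.flatten(2).transpose(1, 2) for t in qkv.chunk(3, dim=1))  # (B, HW, C)
        att, _ = self.attn(q, k, v)                                 # self-attention path
        att = att.transpose(1, 2).reshape(b, c, h, w)
        conv = self.conv(qkv)                                       # conv path reuses projections
        return self.alpha * att + self.beta * conv                  # blend with learnable scalars
```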
2.2.4. Receptive Field Block
Liu et al. [26] proposed the receptive field block (RFB), inspired by the receptive fields of human vision, to enhance the feature extraction capability of the network. The RFB module is composed of several convolutional branches with kernels of different sizes. Each branch combines convolution kernels of different scales with dilated convolutions of different dilation rates, so that the receptive field of each branch is enlarged to a different degree. The structure of RFB borrows from Inception, adding dilated convolutions to the Inception design, which effectively increases the receptive field.
The structure of RFB is shown in Figure 5. First, the number of parameters is reduced by a 1 × 1 convolution for dimensionality reduction. Second, 1 × 1, 3 × 3, and 5 × 5 convolutions are performed to simulate receptive fields of different scales. Third, the 1 × 1, 3 × 3, and 5 × 5 convolution kernels are connected to 3 × 3 dilated convolutions with dilation rates of 1, 3, and 5, respectively. Finally, the outputs of all branches are concatenated to fuse features of different scales and improve the network model's ability to represent targets of different sizes. In addition, the RFB module adopts the shortcut connection from ResNet, which effectively mitigates gradient vanishing and improves the training performance of the network.
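A compact PyTorch sketch of an RFB-style block with the three branches and the ResNet-style shortcut described above is shown below; the per-branch channel split and the omission of normalization and activation layers inside the branches are simplifications of ours.

```python
import torch
import torch.nn as nn

class RFBBlockSketch(nn.Module):
    """RFB-style block (sketch): parallel branches with 1x1, 3x3, and 5x5
    kernels feeding 3x3 dilated convolutions (dilation 1, 3, 5), concatenated
    and fused, plus a ResNet-style shortcut."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_mid = c_out // 4  # per-branch width after 1x1 reduction (illustrative)

        def branch(k, dilation):
            return nn.Sequential(
                nn.Conv2d(c_in, c_mid, 1),                                        # reduce channels
                nn.Conv2d(c_mid, c_mid, k, padding=k // 2),                        # k x k receptive field
                nn.Conv2d(c_mid, c_mid, 3, padding=dilation, dilation=dilation),   # dilated 3x3
            )

        self.b1 = branch(1, 1)
        self.b3 = branch(3, 3)
        self.b5 = branch(5, 5)
        self.fuse = nn.Conv2d(3 * c_mid, c_out, 1)   # concatenate and fuse branches
        self.shortcut = nn.Conv2d(c_in, c_out, 1)    # shortcut as in ResNet
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)
        return self.act(self.fuse(y) + self.shortcut(x))
```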
Figure 6. The methods for integrating GCNet. (a) The structure of GCNet. (b) The CSP1 structure of the fused GCNet.
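Because the prose of the GCNet subsection is not reproduced in this excerpt, the following minimal PyTorch sketch illustrates a global context (GC) block in the spirit of GCNet [27], which is what the CSP1GC module fuses into CSP1: softmax attention over spatial positions pools a single global context vector, which is passed through a 1 × 1 bottleneck transform and added back to every position. The bottleneck ratio and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GlobalContextBlockSketch(nn.Module):
    """GC block (sketch): context modeling via softmax-weighted spatial pooling,
    a bottleneck transform, and broadcast addition back onto the input."""
    def __init__(self, channels, ratio=16):
        super().__init__()
        hidden = max(channels // ratio, 1)
        self.mask = nn.Conv2d(channels, 1, kernel_size=1)   # per-position attention logits
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, 1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        weights = torch.softmax(self.mask(x).view(b, 1, h * w), dim=-1)            # (B, 1, HW)
        context = torch.bmm(weights, x.view(b, c, h * w).transpose(1, 2))          # (B, 1, C)
        context = context.transpose(1, 2).view(b, c, 1, 1)                         # (B, C, 1, 1)
        return x + self.transform(context)   # add the same global context to every position
```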
2.2.6. The Structure of YOLO-Tea
The structure of YOLO-Tea, obtained by improving YOLOv5s, is shown in Figure 7. Firstly, we replaced the CSP1 modules at layers 2, 4, 6, and 8 of YOLOv5 with the CSP1GC module. The CSP1GC module is not only lighter than the original CSP1 module, but also allows for a better global focus on tea disease and pest targets. Secondly, in order to solve the problem of missing feature information due to the low pixel count of small tea leaf disease and pest targets, we added the ACmix module to the backbone section and the CBAM module to the neck of YOLOv5. The ACmix module was only added at layer 9 in the backbone section due to the high parameter overhead of ACmix. The CBAM module is lighter, so it was added at layers 19, 23, and 27 in the neck section. Thirdly, we replaced the SPPF module in the original YOLOv5 with the RFB module in order to obtain better global information on tea disease and pest targets. In the head, 20 × 20, 40 × 40, and 80 × 80 feature maps are output, which are used to detect large, medium, and small targets, respectively. Each of these three feature maps contains three anchors, so there are a total of nine anchors in YOLO-Tea, corresponding to the three detection heads for large, medium, and small targets.
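For readability, the placements described in this subsection can also be summarized as plain data; the indices and sizes below are copied from the text, and the snippet is a reading aid rather than the actual model definition.

```python
# Summary of YOLO-Tea's changes to YOLOv5s, as described in Section 2.2.6.
yolo_tea_changes = {
    "backbone": {
        "CSP1 -> CSP1GC (GCNet-fused)": [2, 4, 6, 8],  # lighter, better global focus
        "ACmix added at layer": [9],                   # only once, due to its parameter overhead
        "SPPF replaced by": "RFB",
    },
    "neck": {"CBAM added at layers": [19, 23, 27]},
    "head": {"feature maps": [(20, 20), (40, 40), (80, 80)], "anchors per map": 3},
}

# Three detection heads x three anchors per feature map = nine anchors in total.
total_anchors = len(yolo_tea_changes["head"]["feature maps"]) * yolo_tea_changes["head"]["anchors per map"]
assert total_anchors == 9
```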
The precision (P) rate represents the ratio of targets detected correctly by the model to
all detected targets. The formula for calculating the precision rate is shown in Equation (1).
In the formula, TP means that the prediction is tea leaf blight or green mirid bug and the
prediction is correct, and FP means that the prediction is tea leaf blight or green mirid bug
and the prediction is incorrect.
P = \frac{TP}{TP + FP} \quad (1)
Recall (R) indicates the proportion of targets correctly predicted by the model as a
percentage of all targets. The formula for calculating the recall rate is shown in Equation (2).
FN indicates that a tea leaf blight or green mirid bug target is present but the model fails to detect it.
R = \frac{TP}{TP + FN} \quad (2)
Average precision (AP) is the area enclosed by the axes below the precision–recall
curve, which is the curve plotted with precision as the y-axis and recall as the x-axis. When
additional bounding boxes are accepted (i.e., recall increases because the class-probability threshold is lowered), the precision values trace out the precision–recall curve. A strong model
can sustain high precision as recall rises [34]. Typically, the intersection over union (IoU)
threshold is set at 0.5. In general, higher AP represents better model performance. Note
that AP0.5 in the Microsoft COCO evaluation metrics is equivalent to mAP@0.5. mAP@0.5
is the arithmetic mean of the APs for all target categories. The formulas for calculating AP
and mAP are shown in Equations (3) and (4).
AP = \int_{0}^{1} P(r)\, dr \quad (3)

mAP = \frac{1}{C} \sum_{i=1}^{C} AP_i \quad (4)
Average recall (AR) represents twice the area under the recall–IoU (R–IoU) curve. The value of AR is interpreted similarly to that of AP, with higher values representing
better model performance. The formula for calculating AR is shown in Equation (5).
AR = 2 \int_{0.5}^{0.95} R(IoU)\, dIoU \quad (5)
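The evaluation metrics above can be computed directly from detection counts and sampled curves. The following NumPy sketch implements Equations (1)–(5) under the assumption that per-threshold precision/recall values and per-IoU recall values are already available; the function names are ours.

```python
import numpy as np

def precision(tp, fp):
    """Eq. (1): fraction of detected targets that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Eq. (2): fraction of ground-truth targets that are detected."""
    return tp / (tp + fn)

def _trapezoid(y, x):
    """Trapezoidal approximation of the area under y(x)."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x) / 2.0))

def average_precision(p_values, r_values):
    """Eq. (3): area under the precision-recall curve (points sorted by recall)."""
    order = np.argsort(r_values)
    return _trapezoid(np.asarray(p_values)[order], np.asarray(r_values)[order])

def mean_average_precision(ap_per_class):
    """Eq. (4): arithmetic mean of per-class AP values (mAP@0.5 when IoU = 0.5)."""
    return float(np.mean(ap_per_class))

def average_recall(r_at_iou, ious=np.arange(0.50, 1.00, 0.05)):
    """Eq. (5): twice the area under the recall-IoU curve between IoU 0.5 and 0.95."""
    return 2.0 * _trapezoid(r_at_iou, ious)
```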
3. Results
3.1. Training
The experiment settings in this paper are shown in Table 3. Some of the main training
parameters for the tea disease and pest detection model were set as shown in Table 4.
Model AP0.5 APS APM APL AR0.5:0.95 ARS ARM ARL APTLB APGMB
YOLOv5s (baseline) 71.7 31.1 53.6 70.1 50.2 44.6 52.6 72.1 69.3 74.1
YOLOv5s + CBAM 72.4 31.4 54.4 75.3 51.6 42.2 55.4 76.1 73.3 76.3
YOLOv5s + GCnet 73.0 34.2 54.6 72.4 54.0 45.4 57.5 73.3 66.0 80.1
YOLOv5 + RFB 73.7 33.1 53.9 71.2 52.4 44.9 55.4 67.5 71.2 76.1
YOLOv5 + ACmix 73.7 31.2 53.8 72.1 51.1 41.7 54.9 72.4 70.6 76.8
YOLOv5 + CBAM + ACmix 75.0 33.0 54.2 79.9 52.7 39.8 58.1 80.0 71.2 78.7
YOLOv5 + CBAM + ACmix + RFB 75.8 32.8 56.7 82.5 54.6 44.6 58.5 84.5 71.7 79.9
YOLOv5 + CBAM + ACmix + RFB + GCnet 79.3 34.2 57.3 85.1 54.7 44.9 57.4 85.7 73.7 82.6
(YOLO-Tea) (+7.6) (+3.1) (+3.7) (+15.0) (+4.5) (+0.3) (+4.8) (+13.6) (+4.4) (+8.5)
Table 6. The data from the comparative experiments.
Table 7. The data of the resource consumption experiment.
Model Parameters Model Size (MB)
YOLOv5s 7025023 13.7
YOLOv5s + GCnet 6262778 13.0 (−0.7)
YOLOv5 + CBAM + ACmix + RFB 8722131 17.1
YOLO-Tea 7959886 15.6 (−1.5)
3.3. Comparison
The comparison experiments in Section 3.2 show that YOLO-Tea’s AP0.5 , APTLB , and
APGMB not only improve by 7.6%, 4.4%, and 8.5%, respectively, over the native YOLOv5s,
but they also improve over both the Faster R-CNN model and the SSD model. Among
them, Faster R-CNN, as a two-stage detection model, achieved AP0.5, APTLB, and APGMB values 2.1%, 2.6%, and 1.5% higher than YOLOv5s, respectively, but YOLO-Tea's AP0.5, APTLB, and APGMB were still 5.5%, 1.8%, and 7.0% higher than those of Faster R-CNN. The results of these comparative experiments also demonstrate the effectiveness of the design of our proposed YOLO-Tea model.
In Experiment 2 of the ablation experiments in Section 3.2, we added the CBAM mod-
ule to the YOLOv5s model. Although the ARS of the model with the CBAM added was
reduced by 2.4% compared to YOLOv5s, all other metrics improved by 0.3%–5.2%. This demonstrates the effectiveness of including the CBAM module in YOLOv5s.
Similarly, Experiment 3 of the ablation experiment showed that combining YOLOv5s with
the GCnet module results in a significant improvement in model performance. Experiment
4 of the ablation experiment showed that replacing the SPPF module in YOLOv5s with the
RFB module can also result in effective performance improvements. Experiment 5 of the
ablation experiment demonstrated that adding the ACmix module to YOLOv5s improved
performance significantly.
In Experiment 6 of the ablation experiment, we not only added CBAM to YOLOv5s, but also added the ACmix module. All test metrics of this improved model were better than those of YOLOv5s, except for ARS, which decreased by 4.8% compared to YOLOv5s. This proved that the YOLOv5s model with the added CBAM module remained effective when combined with the ACmix module. Similarly, Experiment 7 demonstrated that adding the CBAM and ACmix modules to YOLOv5s and replacing the SPPF module with the RFB module leads to further performance improvements.
Finally, in Experiment 8 of the ablation experiment, our proposed YOLO-Tea showed
an improvement of 0.3%–15.0% in all test data compared to YOLOv5s. Furthermore, as can
be seen from the precision–recall curves shown in Figure 8, YOLO-Tea has an even higher
precision as recall rises. Figures 9 and 10 show a comparison between the YOLO-Tea and
the YOLOv5s results of detection.
In Figure 9a, YOLOv5s detected 21 targets of tea leaf blight disease. However, these
21 tea leaf blight targets contained one false detection and one duplicate detection. There
were also two missed targets for green mirid bug infestation. A total of 20 targets for the
tea leaf blight disease and two targets for the green mirid bug pest were correctly identified
by YOLO-Tea in Figure 9b. In Figure 9c, YOLOv5s detected seven tea leaf blight disease
targets correctly, but missed one green mirid bug target and detected two tea leaf blight
disease targets incorrectly. In Figure 9d, YOLO-Tea correctly detected four tea leaf blight disease targets and one green mirid bug pest target; however, three tea leaf blight disease targets were missed.
In Figure 10a, while YOLOv5s correctly detected four green mirid bug infestation targets, it failed to detect two green mirid bug infestation targets. In Figure 10b, YOLO-Tea correctly detected four targets of green mirid bug infestation. In Figure 10c, while YOLOv5s correctly detected four tea leaf blight targets, it failed to detect two tea leaf blight targets and missed one green mirid bug target. In Figure 10d, YOLO-Tea correctly detected six tea leaf blight disease targets and one green mirid bug infestation target.
Figure 9. Comparison of model detection results. (a) YOLOv5s detection results. (b) YOLO-Tea detection results. (c) YOLOv5s detection results. (d) YOLO-Tea detection results.
4. Discussion
Due to various characteristics such as texture, shape, and color, diseases and insect
pests of tea tree leaves are hard to accurately detect. The leaves of tea trees are also much
smaller than those of other crops, which makes it difficult to detect disease on such a
small target. The YOLOv5s model’s performance fell short of what was needed for our
subsequent study in the face of these issues. As a result, we enhanced the YOLOv5s model
in numerous ways.
First, as YOLOv5 is not able to focus effectively on small tea leaf targets, we added ACmix and CBAM to make our model focus on them more effectively.
ACmix fuses convolution and self-attention, two powerful techniques in computer vision.
The detection performance of models with ACmix is improved by 1.3%–2.0%, in terms of
AP0.5, APTLB, and APGMB. However, ACmix has a large number of parameters, so it cannot be added to the model in large numbers if the model is to remain real-time. CBAM
is a lightweight module obtained by concatenating the channel attention module and the
spatial attention module. The parameters of CBAM are smaller than those of ACmix, so
we added it at three sites in the model (Figure 7). The addition of CBAM improved the
performance of the model by 0.7%–4.0% in terms of AP0.5 , APTLB , and APGMB .
Second, RFB is a combination of convolutional kernels of different sizes and dilated
convolution, with the creators of RFB believing that larger convolutional kernels have a
larger receptive field. We will tentatively refer to the YOLOv5 model with the addition
of ACmix and CBAM as Model A. Replacing the SPPF module in Model A with the RFB
module improves performance by 0.5%–1.2%.
Thirdly, the addition of ACmix and CBAM and the replacement of SPPF with RFB improved the performance of YOLOv5 (7,025,023 parameters), but the resulting increase in the number of parameters (to 8,722,131) reduced the real-time performance of the model in detecting tea diseases and insect pests. We therefore chose GCnet to
improve the cross-stage partial 1 (CSP1) module in YOLOv5. YOLOv5 with GCnet had a
performance increase of 0.8%–1.9%, in terms of AP0.5 , APTLB , and APGMB . The number of
parameters for YOLOv5 of the fused GCnet was reduced to 6,262,778. We tentatively refer
to YOLOv5 with the addition of ACmix, CBAM, and the replacement of SPPF with RFB
as Model B. Model B with GCnet had a performance increase of 2.0%–3.5%, in terms of
AP0.5 , APTLB , and APGMB . The number of parameters for Model B of the fused GCnet was
reduced to 7,959,886. The model size of Model B after incorporating the GCNet was also
reduced from 17.1 MB to 15.6 MB. The experiments show that the improved CSP1 based on
GCnet not only reduces the number of parameters to make the model more lightweight,
but also improves the performance of the model in detecting tea diseases and insect pests.
We eventually developed the YOLO-Tea tea tree leaf disease and pest detection model
through a series of improvements. We will first further enhance YOLO-Tea’s performance
in subsequent studies. This is due to the fact that we discovered through ablation practice
that YOLO-Tea still has shortcomings in the detection of smaller tea tree leaf diseases and
insect pests. YOLO-Tea’s APS and ARS only improved by 3.1% and 0.3% compared to
YOLOv5. These two figures show that there is still room for improvement in the detection
of very small targets with YOLO-Tea. Second, based on the quantity and concentration of the diseases and pests detected, we will also create a system for assessing pesticide dosage, which will give tea farmers a reference for pesticide dosing. Third, motivated by Lin's two deep learning bus route planning
applications [35,36], we also intend to create a deep learning model for planning individual
drones for pesticide spraying on tea plantations in our subsequent research. In addition,
the method proposed by Xue et al. [37] allows direct modeling of the detailed distribution
of canopy radiation at the plot scale. In our opinion, the method proposed by Xue et al. [37]
may be a useful aid to our subsequent continued research on tea diseases and insect pests.
5. Conclusions
The yield and quality of tea leaves are significantly impacted by tea diseases. The
precise control of tea diseases is facilitated by high-precision automatic detection and
identification. However, because of the illumination, small targets, and occlusions in
natural environments, deep learning methods tend to have low detection accuracy. In order
to address these issues, we enhanced the YOLOv5s model in this paper.
First, we added the ACmix and CBAM modules to address the issue of false detections caused by small targets. We then improved the retention of global information about small tea disease and insect pest targets by replacing the SPPF module with the RFB module. Finally, GCnet was combined with YOLOv5s to reduce the number of parameters and boost performance.
To prove that our proposed YOLO-Tea model performs better than YOLOv5s, Faster R-CNN, and SSD, ablation experiments and comparison experiments were carried out. The experimental results show that our model has great potential to be used in real-world tea disease monitoring applications.
The dataset used in this paper was captured in good afternoon light and does not yet take into account early morning or poor nighttime lighting conditions. Further research will be conducted in the future to address early morning and late-night conditions. In addition, the CCD camera and other equipment may be affected by the high ambient temperature of the afternoon working environment. In future research, we will address these issues to further improve the performance of our model.
Author Contributions: Z.X. devised the programs and drafted the initial manuscript and contributed
to writing embellishments. R.X. helped with data collection, data analysis, and revised the manuscript.
H.L. and D.B. designed the project and revised the manuscript. All authors have read and agreed to
the published version of the manuscript.
Funding: This work was funded by The Jiangsu Modern Agricultural Machinery Equipment and
Technology Demonstration and Promotion Project (NJ2021-19) and The Nanjing Modern Agricultural
Machinery Equipment and Technological Innovation Demonstration Projects (NJ [2022]09).
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
CNNs Convolutional neural networks
YOLOv5 You Only Look Once version 5
CBAM Convolutional block attention module
SPPF Spatial pyramid pooling fast
GCNet Global context network
SLIC Simple linear iterative cluster
SVM Support vector machine
Faster R-CNN Faster region-based convolutional neural networks
SSD Single Shot Multibox Detector
ACmix Self-attention and convolution
CSP1 Cross-stage partial 1
CSP2 Cross-stage partial 2
CSPNet Cross-stage partial networks
CAM Channel attention module
SAM Spatial attention module
RFB Receptive field block
SE Squeeze-and-excitation
SNL Simplified Nonlocal
NL Nonlocal
P Precision
R Recall
AP Average precision
References
1. Hu, G.; Yang, X.; Zhang, Y.; Wan, M. Identification of tea leaf diseases by using an improved deep convolutional neural network.
Sustain. Comput. Inform. Syst. 2019, 24, 100353. [CrossRef]
2. Bao, W.; Fan, T.; Hu, G.; Liang, D.; Li, H. Detection and identification of tea leaf diseases based on AX-RetinaNet. Sci. Rep. 2022,
12, 2183. [CrossRef] [PubMed]
3. Miranda, J.L.; Gerardo, B.D.; Tanguilig, B.T., III. Pest detection and extraction using image processing techniques. Int. J. Comput.
Commun. Eng. 2014, 3, 189. [CrossRef]
4. Barbedo, J.G.A.; Koenigkan, L.V.; Santos, T.T. Identifying multiple plant diseases using digital image processing. Biosyst. Eng.
2016, 147, 104–116. [CrossRef]
5. Zhang, S.; Wu, X.; You, Z.; Zhang, L. Leaf image-based cucumber disease recognition using sparse representation classification.
Comput. Electron. Agric. 2017, 134, 135–141. [CrossRef]
6. Hossain, S.; Mou, R.M.; Hasan, M.M.; Chakraborty, S.; Razzak, M.A. Recognition and detection of tea leaf’s diseases using
support vector machine. In Proceedings of the 2018 IEEE 14th International Colloquium on Signal Processing & Its Applications
(CSPA), Penang, Malaysia, 9–10 March 2018.
7. Sun, Y.; Jiang, Z.; Zhang, L.; Dong, W.; Rao, Y. SLIC_SVM based leaf diseases saliency map extraction of tea plant. Comput.
Electron. Agric. 2019, 157, 102–109. [CrossRef]
8. Chen, J.; Liu, Q.; Gao, L. Visual tea leaf disease recognition using a convolutional neural network model. Symmetry 2019, 11, 343.
[CrossRef]
9. Hu, G.; Wu, H.; Zhang, Y.; Wan, M. A low shot learning method for tea leaf’s disease identification. Comput. Electron. Agric. 2019,
163, 104852. [CrossRef]
10. Jiang, F.; Lu, Y.; Chen, Y.; Cai, D.; Li, G. Image recognition of four rice leaf diseases based on deep learning and support vector
machine. Comput. Electron. Agric. 2020, 179, 105824. [CrossRef]
11. Sun, X.; Mu, S.; Xu, Y.; Cao, Z.; Su, T. Image recognition of tea leaf diseases based on convolutional neural network. In Proceedings
of the 2018 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), Jinan, China, 14–17 December 2018.
12. Jiao, L.; Zhang, F.; Liu, F.; Yang, S.; Li, L.; Feng, Z.; Qu, R. A survey of deep learning-based object detection. IEEE Access 2019, 7,
128837–128868. [CrossRef]
13. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural
Inf. Process. Syst. 2015, 28. [CrossRef]
14. Zhou, G.; Zhang, W.; Chen, A.; He, M.; Ma, X. Rapid detection of rice disease based on FCM-KM and faster R-CNN fusion. IEEE
Access 2019, 7, 143190–143206. [CrossRef]
15. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
16. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of
the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016.
17. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International
Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
18. Tian, Y.; Yang, G.; Wang, Z.; Wang, H.; Li, E.; Liang, Z. Apple detection during different growth stages in orchards using the
improved YOLO-V3 model. Comput. Electron. Agric. 2019, 157, 417–426. [CrossRef]
19. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
20. Roy, A.M.; Bose, R.; Bhaduri, J. A fast accurate fine-grain object detection model based on YOLOv4 deep neural network. Neural
Comput. Appl. 2022, 34, 3895–3921. [CrossRef]
21. Bochkovskiy, A.; Wang, C.; Liao, H.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
22. Sun, C.; Huang, C.; Zhang, H.; Chen, B.; An, F.; Wang, L.; Yun, T. Individual tree crown segmentation and crown width extraction
from a heightmap derived from aerial laser scanning data using a deep learning framework. Front. Plant Sci. 2022, 13, 914974.
[CrossRef]
23. Dai, G.; Fan, J. An industrial-grade solution for crop disease image detection tasks. Front. Plant Sci. 2022, 13, 921057. [CrossRef]
24. Pan, X.; Ge, C.; Lu, R.; Song, S.; Chen, G.; Huang, Z.; Huang, G. On the integration of self-attention and convolution. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022.
25. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference
on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
26. Liu, S.; Huang, D. Receptive field block net for accurate and fast object detection. In Proceedings of the European Conference on
Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
27. Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. Gcnet: Non-local networks meet squeeze-excitation networks and beyond. In Proceedings
of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019.
28. Lu, Y.H.; Qiu, F.; Feng, H.Q.; Li, H.B.; Yang, Z.C.; Wyckhuys, K.A.G.; Wu, K.M. Species composition and seasonal abundance of
pestiferous plant bugs (Hemiptera: Miridae) on Bt cotton in China. Crop Prot. 2008, 27, 465–472. [CrossRef]
29. Qian, J.; Lin, H. A Forest Fire Identification System Based on Weighted Fusion Algorithm. Forests 2022, 13, 1301. [CrossRef]
30. Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability
of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA,
USA, 14–19 June 2020.
31. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
32. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
33. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
34. Khasawneh, N.; Fraiwan, M.; Fraiwan, L. Detection of K-complexes in EEG signals using deep transfer learning and YOLOv3.
Clust. Comput. 2022, 1–11. [CrossRef]
35. Lin, H.; Tang, C. Intelligent Bus Operation Optimization by Integrating Cases and Data Driven Based on Business Chain and
Enhanced Quantum Genetic Algorithm. IEEE Trans. Intell. Transp. Syst. 2021, 23, 9869–9882. [CrossRef]
36. Lin, H.; Tang, C. Analysis and optimization of urban public transport lines based on multiobjective adaptive particle swarm
optimization. IEEE Trans. Intell. Transp. Syst. 2021, 23, 16786–16798. [CrossRef]
37. Xue, X.; Jin, S.; An, F.; Zhang, H.; Fan, J.; Eichhorn, M.P.; Jin, C.; Chen, B.; Jiang, L.; Yun, T. Shortwave radiation calculation for
forest plots using airborne LiDAR data and computer graphics. Plant Phenom. 2022, 2022, 9856739. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.