Computational Intelligence and Neuroscience, 2017
Research Article
Deep Learning for Plant Identification in Natural Environment
Copyright © 2017 Yu Sun et al. This is an open access article distributed under the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Plant image identification has become an interdisciplinary focus in both botanical taxonomy and computer vision. We present the first plant image dataset collected by mobile phone in natural scenes, containing 10,000 images of 100 ornamental plant species on the Beijing Forestry University campus. A 26-layer deep learning model consisting of 8 residual building blocks is designed for large-scale plant classification in the natural environment. The proposed model achieves a recognition rate of 91.78% on the BJFU100 dataset, demonstrating that deep learning is a promising technology for smart forestry.
1. Introduction

Automatic plant image identification is the most promising solution to bridging the botanical taxonomic gap, and it receives considerable attention in both the botany and computer vision communities. As machine learning technology advances, increasingly sophisticated models have been proposed for automatic plant identification. With the popularity of smartphones and the emergence of the Pl@ntNet mobile apps [1], millions of plant photos have been acquired. Mobile-based automatic plant identification is essential to real-world social-based ecological surveillance [2], invasive exotic plant monitoring [3], ecological science popularization, and so on. Improving the performance of mobile-based plant identification models therefore attracts increasing attention from scholars and engineers.

Many efforts have been devoted to extracting local characteristics of the leaf, flower, or fruit. Most researchers use variations in leaf characteristics as a comparative tool for studying plants, and several leaf datasets, including the Swedish leaf dataset, the Flavia dataset, and the ICL dataset, serve as standard benchmarks. In [4], Söderkvist extracted shape characteristics and moment features of leaves and analyzed 15 different Swedish tree classes using back propagation to train a feed-forward neural network. In [5], Fu et al. chose the local contrast and other parameters to describe the characteristics of the pixels surrounding the veins; an artificial neural network was used to segment the veins from the rest of the leaf, and the experiments show that the neural network is effective in identifying vein images. Li et al. [6] proposed an efficient leaf vein extraction method that combines the snakes technique with cellular neural networks and obtained satisfactory leaf segmentation results. He and Huang used a probabilistic neural network as a classifier to identify plant leaf images, achieving better identification accuracy than a BP neural network [7]. In 2013, the idea of leaf recognition in natural images was proposed, using a contour segmentation algorithm based on a polygonal leaf model to obtain the contour image [8]. As deep learning became a hot spot in image recognition, Liu and Kan proposed combining texture features with shape characteristics, using a deep belief network as the classifier [9]. Zhang et al. designed a deep learning system with an eight-layer convolutional neural network to identify leaf images and achieved a higher recognition rate. Other researchers focus on flowers. Nilsback and Zisserman proposed a bag-of-visual-words method to describe color, shape, texture, and other characteristics [10]. In [11], Zhang et al. combined Haar-like features with SIFT features of flower images, encoding them with nonnegative sparse coding and classifying them with the k-nearest neighbor method. In [12], Wang et al. proposed a method for recognizing pickable roses based on a BP neural network. Studies identifying plants by their fruit are relatively rare; Li et al. proposed a multifeature integration method using preference aiNET as the recognition algorithm [13].
After many years of continued exploration in plant recognition technology, dedicated mobile applications such as LeafSnap [14], Pl@ntNet [1], and Microsoft Garage's Flower Recognition app [15] can now be conveniently used to identify plants.

Although research on automatic plant taxonomy has yielded fruitful results, these models are still far from meeting the requirements of a fully automated ecological surveillance scenario [3]. The aforementioned datasets lack mobile-based plant images acquired in natural scenes, which vary greatly in contributors, cameras, areas, periods of the year, individual plants, and so on. Traditional classification models rely heavily on preprocessing to eliminate complex backgrounds and enhance the desired features. Moreover, handcrafted feature engineering is incapable of dealing with large-scale datasets consisting of unconstrained images.

To overcome the aforementioned challenges, and inspired by the deep learning breakthrough in image recognition, we acquired the BJFU100 dataset by mobile phone in the natural environment. The proposed dataset contains 10,000 images of 100 ornamental plant species on the Beijing Forestry University campus. A 26-layer deep learning model consisting of 8 residual building blocks is designed for uncontrolled plant identification. The proposed model achieves a recognition rate of 91.78% on the BJFU100 dataset.

2. Proposed BJFU100 Dataset and Deep Learning Model

Deep learning architectures are formed by multiple linear and nonlinear transformations of input data, with the goal of yielding more abstract and discriminative representations [16]. These methods have dramatically improved the state of the art in speech recognition, visual object recognition, object detection, and many other domains such as drug discovery and genomics [17]. The deep convolutional neural network proposed in [18] demonstrated outstanding performance in the large-scale image classification task of ILSVRC-2012 [19]. The model was trained on more than one million images and achieved a winning top-5 test error rate of 15.3% over 1,000 classes, almost halving the error rate of the best competing approaches. This success has brought about a revolution in computer vision [17]. Recent progress in the field has advanced the feasibility of deep learning applications for solving complex, real-world problems [20].

2.1. BJFU100 Dataset. The BJFU100 dataset is collected from natural scenes with mobile devices. It consists of 100 species of ornamental plants on the Beijing Forestry University campus. Each category contains one hundred different photos acquired by smartphone in the natural environment. The smartphone is equipped with a prime lens of 28 mm equivalent focal length and an RGB sensor of 3120 × 4208 resolution. Tall arbors were photographed from a low angle at ground level, as shown in Figures 1(a)–1(d). Low shrubs were shot from a high angle, as shown in Figures 1(e)–1(h). Other ornamental plants were photographed from a level angle. Subjects may vary in size by an order of magnitude (i.e., some images show only a leaf, others an entire plant from a distance), as shown in Figures 1(i)–1(l).

2.2. The Deep Residual Network. As network depth increases, traditional methods do not improve accuracy as expected but instead introduce problems such as vanishing gradients and degradation. The residual network (ResNet) introduces skip connections that allow information (from the input or learned in earlier layers) to flow more directly into the deeper layers [23, 24]. With increasing depth, ResNets give better function approximation capabilities as they gain more parameters, and they successfully mitigate the vanishing gradient and degradation problems.
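For readers who want to reproduce the data handling, the following is a minimal sketch of how a per-species image collection such as BJFU100 (Section 2.1) could be read from disk and split into the 80/20 training/test partition described later in Section 3.1. The folder layout, file extensions, and helper names are assumptions, not part of the released dataset tooling:

```python
# Sketch: load a dataset organized as <root>/<species_name>/<photo>.jpg into
# arrays, resizing each photo to 224x224 as in Section 3.1, and split each
# class 80/20 into training and test sets. Paths and names are assumptions.
import os
import numpy as np
from keras.preprocessing.image import img_to_array, load_img


def load_bjfu100(root="BJFU100"):
    x_train, y_train, x_test, y_test = [], [], [], []
    classes = sorted(os.listdir(root))                  # 100 species folders
    for label, species in enumerate(classes):
        files = sorted(os.listdir(os.path.join(root, species)))
        for i, name in enumerate(files):                # 100 photos per species
            img = load_img(os.path.join(root, species, name),
                           target_size=(224, 224))      # rescale to 224x224
            arr = img_to_array(img)
            if i < 80:                                  # first 80 photos -> training
                x_train.append(arr)
                y_train.append(label)
            else:                                       # remaining 20 -> test
                x_test.append(arr)
                y_test.append(label)
    return (np.array(x_train), np.array(y_train),
            np.array(x_test), np.array(y_test))
```

In practice the per-class split could equally be randomized; the deterministic split here only keeps the sketch short.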
Figure 2: (a) A basic building block. (b) A “bottleneck” building block of deep residual networks.
Figure 3: Overall architecture of the ResNet26 model (feature maps of 112 × 112 × 64, 56 × 56 × 64, 28 × 28 × 128, 14 × 14 × 256, and 7 × 7 × 512, followed by average pooling to 1 × 1 × 512).
Deep residual networks built from residual units have shown compelling accuracy and nice convergence behavior on several large-scale image recognition tasks, such as the ImageNet [23] and MS COCO [25] competitions.

2.2.1. Residual Building Blocks. The residual structural unit utilizes shortcut connections together with identity mappings. Shortcut connections are connections that skip one or more layers; the desired underlying mapping can then be realized by a feed-forward neural network with shortcut connections. The building block illustrated in Figure 2 is defined as
\[ y = F(x, \{W_i\}) + x, \qquad F = W_2\,\sigma(W_1 x), \qquad \sigma(a) = \max(0, a), \tag{1} \]
where \(x\) and \(y\) are the input and output vectors of the stacked layers, respectively. The function \(F(x, \{W_i\})\) represents the residual mapping that needs to be learned, \(\sigma(a)\) denotes the ReLU activation [26], and the biases are omitted to simplify the notation. The dimensions of \(x\) and \(F\) must be equal to perform the element-wise addition. If this is not the case, a linear projection \(W_s\) is applied to match the dimensions of \(x\) and \(F\):
\[ y = F(x, \{W_i\}) + W_s x. \tag{2} \]

The baseline building block is shown in Figure 2(a): a shortcut connection is added to each pair of 3 × 3 convolutions. To limit the training time of deeper nets, a bottleneck building block is designed as in Figure 2(b). Its three layers are 1 × 1, 3 × 3, and 1 × 1 convolutions, where the 1 × 1 layers are responsible for reducing and then restoring the dimensions, leaving the 3 × 3 layer a bottleneck with smaller input/output dimensions [23]. Bottleneck building blocks use fewer parameters while obtaining more layers of abstraction.

The overall network architecture of our 26-layer ResNet, referred to as ResNet26, is depicted in Figure 3. As Figure 3 shows, the model is mainly built from bottleneck building blocks. The input image is fed into a 7 × 7 convolution layer and a 3 × 3 max pooling layer, followed by 8 bottleneck building blocks. When the dimensions increase, a 1 × 1 convolution is used to match the dimensions (the linear projection \(W_s\) of (2)); this 1 × 1 convolution enriches the level of abstraction and reduces the time complexity. The network ends with global average pooling, a fully connected layer, and a softmax layer. We adopt batch normalization (BN) [27] right after each convolution layer and before the ReLU [26] activation. Downsampling is performed by the first convolution layer, the max pooling layer, and the 3rd, 5th, and 7th bottleneck building blocks.
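To make the construction above concrete, the following is a minimal sketch of a bottleneck building block with the optional projection shortcut of (2), and of how eight such blocks could be stacked into a 26-layer network, written against the Keras 2.x API the paper builds on [28]. Filter counts, the bottleneck ratio, and helper names are our reading of Figures 2 and 3, not the authors' released code:

```python
# Sketch of the bottleneck block of Figure 2(b) and of a ResNet26-style stack
# of 8 such blocks (Figure 3). Filter widths and stage boundaries are assumed.
from keras import backend as K
from keras.layers import (Activation, BatchNormalization, Conv2D, Dense,
                          GlobalAveragePooling2D, Input, MaxPooling2D, add)
from keras.models import Model


def bottleneck_block(x, out_filters, stride=1):
    """1x1 -> 3x3 -> 1x1 convolutions with BN before each ReLU, plus a shortcut."""
    inner = out_filters // 4                        # reduced "bottleneck" width (assumed ratio)
    y = Conv2D(inner, 1, strides=stride, padding="same")(x)
    y = Activation("relu")(BatchNormalization()(y))
    y = Conv2D(inner, 3, padding="same")(y)         # 3x3 conv on the reduced dimensions
    y = Activation("relu")(BatchNormalization()(y))
    y = Conv2D(out_filters, 1, padding="same")(y)   # restore dimensions
    y = BatchNormalization()(y)
    shortcut = x
    if stride != 1 or K.int_shape(x)[-1] != out_filters:
        # Projection shortcut W_s x of (2) when the dimensions change.
        shortcut = Conv2D(out_filters, 1, strides=stride, padding="same")(x)
        shortcut = BatchNormalization()(shortcut)
    return Activation("relu")(add([y, shortcut]))   # y = F(x, {W_i}) + x, cf. (1)


def build_resnet26(num_classes=100, input_shape=(224, 224, 3)):
    inputs = Input(shape=input_shape)
    x = Conv2D(64, 7, strides=2, padding="same")(inputs)     # 7x7 convolution, stride 2
    x = Activation("relu")(BatchNormalization()(x))
    x = MaxPooling2D(3, strides=2, padding="same")(x)        # 3x3 max pooling, stride 2
    # Eight bottleneck blocks; downsampling at the 3rd, 5th, and 7th blocks.
    for i, filters in enumerate([64, 64, 128, 128, 256, 256, 512, 512]):
        x = bottleneck_block(x, filters, stride=2 if i in (2, 4, 6) else 1)
    x = GlobalAveragePooling2D()(x)
    outputs = Dense(num_classes, activation="softmax")(x)
    return Model(inputs=inputs, outputs=outputs)
```

Calling `build_resnet26().summary()` should list 26 weight layers (the initial 7 × 7 convolution, 24 convolutions inside the 8 bottleneck blocks, and the final fully connected layer), matching the name ResNet26.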
Figure 4: Evolution of classification accuracy in the test set.

Figure 5: Test accuracy of the ResNet18, ResNet34, ResNet50 [23], and ResNet26 models. The proposed ResNet26 outperforms the best reference ResNet by 2.51%.
3. Experiments and Results

3.1. Implementation and Preprocessing. The model implementation is based on the open source deep learning framework Keras [28]. All experiments were conducted on an Ubuntu 16.04 Linux server with a 3.40 GHz i7-3770 CPU (16 GB memory) and a GTX 1070 GPU (8 GB memory). The 100 samples of each class are split into 80 training samples and 20 test samples. Compared with conventional classification methods, data preprocessing for deep learning approaches is much simpler. In this paper, the inputs to the network are RGB color images: each image only needs to be rescaled to 224 × 224 pixels, and then every pixel value is divided by 255.

3.2. Training Algorithm. During the back propagation phase, the model parameters are trained by the stochastic gradient descent (SGD) algorithm, with the categorical cross-entropy loss function as the optimization objective. The SGD update can be expressed as
\[ \delta_x = w_{x+1}\left(\sigma\!\left(w_{x+1} \cdot c_x + b_{x+1}\right) \circ \mathrm{up}\left(\delta_{x+1}\right)\right), \qquad \Delta w_x = -\eta \cdot \sum_{i,j}\left(\delta_x \circ \mathrm{down}\left(S_{x-1}\right)\right), \tag{3} \]
where \(\delta_x\) is the sensitivity, \(w_{x+1}\) is the multiplicative bias, \(\circ\) denotes element-wise multiplication, up is upsampling, down is downsampling, \(\Delta w_x\) represents the weight update of the layer, and \(\eta\) is the learning rate. The cross-entropy loss is defined as
\[ L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right), \tag{4} \]
where \(f_j\) is the \(j\)th element of the classification score vector \(f\).

After some preliminary training experiments, the base learning rate is set to 0.001 and is gradually reduced at each epoch; the decay rate is \(10^{-6}\) and the momentum is 0.9. Figure 4 shows the training process of the ResNet26 model. Test accuracy improves quickly from the first epochs and stabilizes after 40 epochs.
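A minimal sketch of the preprocessing and optimizer settings just described, again against the Keras 2.x API [28]. It assumes `x_train`, `y_train`, `x_test`, and `y_test` come from a loading step such as the earlier sketch; the batch size and epoch count are illustrative assumptions rather than values reported in the paper:

```python
# Sketch of the training setup of Sections 3.1-3.2: per-pixel division by 255,
# categorical cross-entropy (eq. (4)), and SGD with lr=0.001, decay=1e-6,
# momentum=0.9. build_resnet26 refers to the earlier architecture sketch.
from keras.optimizers import SGD
from keras.utils import to_categorical

x_train = x_train.astype("float32") / 255.0      # per-pixel division by 255
x_test = x_test.astype("float32") / 255.0
y_train = to_categorical(y_train, num_classes=100)
y_test = to_categorical(y_test, num_classes=100)

model = build_resnet26(num_classes=100)
model.compile(optimizer=SGD(lr=0.001, decay=1e-6, momentum=0.9),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train,
          validation_data=(x_test, y_test),
          batch_size=32,                          # batch size is not stated in the paper
          epochs=50)                              # accuracy stabilizes after ~40 epochs
```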
3.3. Results Analysis. To find the best deep residual network, a series of experiments was conducted on the BJFU100 dataset. Figure 5 compares the test accuracy of the proposed ResNet26 model with the original ResNet models of 18, 34, and 50 layers [23] designed for ImageNet. ResNet18, ResNet34, and ResNet50 yield test accuracies of 89.27%, 88.28%, and 86.15%, respectively. The proposed ResNet26 reaches 91.78% accuracy, an improvement of 2.51% over the best reference model.

ResNet26 offers the best tradeoff between model capacity and optimization difficulty. For a dataset of BJFU100's size, ResNet26 contains enough trainable parameters to learn the discriminative features, which prevents underfitting. Compared with larger models, ResNet26 converges quickly and robustly during SGD optimization, which helps prevent overfitting and avoids getting stuck in poor local optima.

4. ResNet26 on the Flavia Dataset

To show the effectiveness of the proposed ResNet26 model, a series of experiments was performed on the publicly available Flavia leaf dataset [29]. It comprises 1,907 images of 1600 × 1200 pixels in 32 categories; some samples are shown in Figure 6. We randomly select 80% of the dataset for training and 20% for testing. All images are doubled and resized to 224 × 224 pixels; each pixel value is divided by the maximum value, and the mean value of the data is subtracted.

The training algorithm is exactly the same as that applied to the BJFU100 dataset. Figure 7 shows the training process of the ResNet26 model: test accuracy improves quickly from the first epochs and stabilizes after 30 epochs.

The test accuracy of each model is estimated by 10-fold cross-validation, as visualized in Figure 8. ResNet18, ResNet34, and ResNet50 achieve test accuracies of 99.44%, 98.95%, and 98.60%, respectively. The proposed ResNet26 reaches 99.65% accuracy.
Figure 6: Example images of the Flavia dataset.

Figure 8: Test accuracy of the ResNet18, ResNet34, ResNet50 [23], and ResNet26 models on the Flavia dataset. The proposed ResNet26 outperforms the best reference ResNet by 0.21%.

Table 1: Recognition rate comparison on the Flavia dataset (test accuracy, %).
This is an improvement of 0.21% over the best reference ResNet. Table 1 summarizes our result and other previously published results on the Flavia leaf dataset [29]. The ResNet26 model achieves a 0.28% improvement over the best-performing method.

5. Conclusion

The BJFU100 dataset, the first acquired with mobile devices and containing 10,000 images of 100 plant species, provides a data cornerstone for further plant identification studies. We continue to expand the BJFU100 dataset with wider coverage of species and seasons. The dataset is open to the academic community and is available at http://pan.baidu.com/s/1jILsypS. This work also studied a deep learning approach that automatically discovers the representations needed for classification, allowing the use of a unified end-to-end pipeline for recognizing plants in the natural environment. The proposed ResNet26 model achieves 91.78% accuracy on the test set, demonstrating that deep learning is a promising technology for large-scale plant classification in natural environments.

In future work, the BJFU100 database will be expanded with more plant species at different phases of the life cycle and with more detailed annotations, and the deep learning model will be improved accordingly.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors' Contributions

Yu Sun and Yuan Liu contributed equally to this work.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities: YX2014-17 and TD2014-01.

References

[1] A. Joly, H. Goëau, P. Bonnet et al., "Interactive plant identification based on social image data," Ecological Informatics, vol. 23, pp. 22–34, 2014.
[2] H. Goëau, P. Bonnet, and A. Joly, "LifeCLEF plant identification task 2015," in Proceedings of the Conference and Labs of the Evaluation Forum (CLEF '15), 2015.
[3] H. Goëau, P. Bonnet, and A. Joly, "Plant identification in an open-world (LifeCLEF 2016)," in CLEF Working Notes, 2016.
[4] O. Söderkvist, Computer Vision Classification of Leaves from Swedish Trees, 2001.
[5] H. Fu, Z. Chi, J. Chang, and C. Fu, "Extraction of leaf vein features based on artificial neural network—studies on the living plant identification I," Chinese Bulletin of Botany, vol. 21, pp. 429–436, 2003.
[6] Y. Li, Q. Zhu, Y. Cao, and C. Wang, "A leaf vein extraction method based on snakes technique," in Proceedings of the International Conference on Neural Networks and Brain (ICNN&B '05), pp. 885–888, 2005.
[7] P. He and L. Huang, "Feature extraction and recognition of plant leaf," Journal of Agricultural Mechanization Research, vol. 6, p. 52, 2008.
[8] G. Cerutti, L. Tougne, J. Mille, A. Vacavant, and D. Coquin, "Understanding leaves in natural images—a model-based approach for tree species identification," Computer Vision and Image Understanding, vol. 117, no. 10, pp. 1482–1501, 2013.
[9] N. Liu and J.-M. Kan, "Plant leaf identification based on the multi-feature fusion and deep belief networks method," Journal of Beijing Forestry University, vol. 38, no. 3, pp. 110–119, 2016.
[10] M.-E. Nilsback and A. Zisserman, "Delving deeper into the whorl of flower segmentation," Image and Vision Computing, vol. 28, no. 6, pp. 1049–1062, 2010.
[11] C. Zhang, J. Liu, C. Liang, Q. Huang, and Q. Tian, "Image classification using Harr-like transformation of local features with coding residuals," Signal Processing, vol. 93, no. 8, pp. 2111–2118, 2013.
[12] Y. J. Wang, Y. W. Zhang, D. L. Wang, X. Yin, and W. J. Zeng, "Recognition algorithm of edible rose image based on neural network," Journal of China Agricultural University, vol. 19, no. 4, pp. 180–186, 2014.
[13] X. Li, L. Li, Z. Gao, J. Zhou, and S. Min, "Image recognition of camellia fruit based on preference for aiNET multi-features integration," Transactions of the Chinese Society of Agricultural Engineering, vol. 28, no. 14, pp. 133–137, 2012.
[14] N. Kumar, P. N. Belhumeur, A. Biswas et al., "Leafsnap: a computer vision system for automatic plant species identification," in Computer Vision—ECCV 2012, pp. 502–516, 2012.
[15] https://www.microsoft.com/en-us/research/project/flowerreco/.
[16] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
[17] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
[19] http://www.image-net.org/challenges/LSVRC/2012/.
[20] B. Huval, T. Wang, S. Tandon et al., "An empirical evaluation of deep learning on highway driving," https://arxiv.org/abs/1504.01716.
[21] A. Kulkarni, H. Rai, K. Jahagirdar, and P. Upparamani, "A leaf recognition technique for plant classification using RBPNN and Zernike moments," International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, pp. 984–988, 2013.
[22] C. Sari, C. B. Akgül, and B. Sankur, "Combination of gross shape features, Fourier descriptors and multiscale distance matrix for leaf recognition," in Proceedings of the 55th International Symposium (ELMAR '13), pp. 23–26, September 2013.
[23] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '16), pp. 770–778, Las Vegas, Nev, USA, June 2016.
[24] K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," in Proceedings of the European Conference on Computer Vision, pp. 630–645, 2016.
[25] J. Dai, K. He, and J. Sun, "Instance-aware semantic segmentation via multi-task network cascades," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '16), pp. 3150–3158, Las Vegas, Nev, USA, June 2016.
[26] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 807–814, June 2010.
[27] S. Ioffe and C. Szegedy, "Batch normalization: accelerating deep network training by reducing internal covariate shift," https://arxiv.org/abs/1502.03167.
[28] Keras, https://keras.io/.
[29] S. G. Wu, F. S. Bao, E. Y. Xu, Y.-X. Wang, Y.-F. Chang, and Q.-L. Xiang, "A leaf recognition algorithm for plant classification using probabilistic neural network," in Proceedings of the 2007 IEEE International Symposium on Signal Processing and Information Technology, pp. 11–16, Giza, Egypt, December 2007.