Bird Image Retrieval and Recognition Using A Deep Learning Platform
June 4, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2918274
ABSTRACT Birdwatching is a common hobby, but identifying bird species requires the assistance of bird
books. To provide birdwatchers with a handy tool to admire the beauty of birds, we developed a deep learning
platform to assist users in recognizing 27 species of birds endemic to Taiwan using a mobile app named
the Internet of Birds (IoB). Bird images were learned by a convolutional neural network (CNN) to localize
prominent features in the images. First, we established and generated a bounded region of interest to refine
the shapes and colors of the object granularities and subsequently balanced the distribution of bird species.
Then, a skip connection method was used to linearly combine the outputs of the previous and current layers
to improve feature extraction. Finally, we applied the softmax function to obtain a probability distribution
of bird features. The learned parameters of bird features were used to identify pictures uploaded by mobile
users. The proposed CNN model with skip connections achieved a higher accuracy of 99.00% compared
with 93.98% from a CNN and 89.00% from the SVM for the training images. As for the test dataset,
the average sensitivity, specificity, and accuracy were 93.79%, 96.11%, and 95.37%, respectively.
INDEX TERMS Bird image recognition, convolutional neural network, deep learning, mobile app.
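The skip connection and softmax steps named in the abstract can be illustrated with a minimal sketch. It assumes that "linearly combine" means an element-wise sum, with unit weights, of a layer's input (the previous layer's output) and its own ReLU output. The function names `dense`, `skip_block`, and `softmax` and the toy weights are hypothetical and are not taken from the proposed model.

```python
import math
from typing import List


def relu(v: List[float]) -> List[float]:
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def dense(v: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def skip_block(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Assumed skip connection: current layer output plus previous layer output.

    The layer must preserve the vector length so the two outputs can be summed.
    """
    current = relu(dense(x, weights, bias))
    return [c + p for c, p in zip(current, x)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # toy weights, for illustration only
    features = skip_block([1.0, -2.0], identity, [0.0, 0.0])
    print(features, softmax(features))
```

In this sketch the negative input component still reaches the output through the skip path even though the ReLU zeroes it in the current layer, which is the property a skip connection is meant to preserve.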
Ultimately, information obtained from a bird image uploaded by an end-user, captured using a mobile
camera, can be navigated through the client–server architecture to retrieve information and predict
bird species from the trained model stored on the server. This process facilitates efficient
correlation of fine-grained object parts and autonomous bird identification from captured images and
can contribute considerable, valuable information regarding bird species.
The remainder of this paper is organized as follows. Section II briefly reviews related approaches
for fine-grained visual categorization. Section III describes the various types of datasets used for
feature extraction. Section IV focuses on the deep learning model and its features used in object
part models, and describes the correlation between part localization and fine-grained feature
extraction. Section IV also describes various correlation requirements, such as data augmentation,
for excellent performance, localization, segmentation, identification of subcategories, as well as
the requirement of a classifier for effective object prediction. The experimental results and
analysis of the datasets are presented in Section V. Section VI summarizes the discussion and
limitations of the study. Conclusions and directions for future study are provided in Section VII.
digital cameras in smartphones are the most pervasive tools used for recognizing the salient
features of physical objects, enabling users to detect and identify objects and share related
knowledge. Birds present in a flock are often deeply colorful; therefore, identification at a glance
is challenging for both birdwatchers and onlookers because of birds’ ambiguous semantic features
[19]. To address this problem, an information retrieval model for birdwatching has been proposed
that uses deep neural networks to localize and clearly describe bird features with the aid of an
Android smartphone [20], [21].
III. DATA ACQUISITION
Feature extraction is vital to the classification of relevant information and the differentiation
of bird species. We combined bird data from the Internet of Birds (IoB) and an Internet bird dataset
to learn the bird species.
D. FEATURE EXTRACTION
Extracting features from raw input images is the primary task when extracting relevant and
descriptive information for fine-grained object recognition [36]–[38]. However, because of semantic
and intraclass variance, feature extraction remains challenging. We separately extracted the
features in relevant positions for each part of an image and subsequently learned the parts of the
model features that were mapped directly to the corresponding parts. The features were calculated
using ReLU 5 and ReLU 6. Localization was used to find object parts defined by bounding box
coordinates and their dimensions (width and height) in the image [39]. For the localization task, an
intersection over union score >0.5 was set for our model. An FC layer with a ReLU was used to predict
the location of bounding box Bx. Subsequent steps of the learning algorithm were for learning the
map of the feature vectors of the input image, deciding whether the region fit an object class of
interest, and then classifying the expected output with the correct labels in the image. For a given
image, feature vectors represent the probability of target object centrality in
In this subsection, we explain how a high-resolution smartphone camera is used to identify and
classify bird information [40] based on deep learning. To complete the semantic bird search task, we
established a client–server architecture to bridge the communication gap between the cloud and
mobile device over a network. The entire setup was executed in the following manner:
• Raw bird images were distilled to remove irrelevant parts and learned by the CNN to yield
parameters on the GPU platform. Subsequently, a TF inference model [41] was developed in the
workstation for deployment in the smartphone.
• The output was detected using an Android app platform or through the web.
On the workstation/server side, the following segments were considered. The TF backend session
model for object detection was prepared to save the TF computation graphs of input, output, weight,
and bias as graph_def text files (tfdroid.pbtxt), which comprised the entire architecture of the
model. The CNN architecture was trained to load the raw input data of bird images using Keras [42]
callbacks with the predefined parameters into TF format to fit the model for inference. After
training the model, the parameters of all
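The intersection over union (IoU) criterion used for the localization task in the feature extraction subsection can be illustrated with a short sketch. It assumes axis-aligned boxes given by a top-left corner plus width and height, matching the "coordinates and dimensions" description. The `Box` type and the `iou` and `is_localized` helpers are hypothetical names introduced here, not part of the authors' code.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box (assumed layout: top-left corner plus size)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    overlap_w = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    overlap_h = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    intersection = overlap_w * overlap_h
    union = a.w * a.h + b.w * b.h - intersection
    return intersection / union if union > 0 else 0.0


def is_localized(predicted: Box, ground_truth: Box, threshold: float = 0.5) -> bool:
    """Accept a predicted box when its IoU with the ground truth exceeds the threshold."""
    return iou(predicted, ground_truth) > threshold


if __name__ == "__main__":
    truth = Box(0, 0, 4, 4)
    print(is_localized(Box(0, 0, 4, 3), truth))  # large overlap
    print(is_localized(Box(3, 3, 4, 4), truth))  # small overlap
```

The default threshold of 0.5 mirrors the score stated in the text; a stricter threshold would demand tighter boxes at the cost of rejecting more predictions.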
Performance comparison of the three models for the training
