Proceedings of the Fourth International Conference on Communication and Electronics Systems (ICCES 2019)

IEEE Conference Record # 45898; IEEE Xplore ISBN: 978-1-7281-1261-9

Image Segmentation using Convolutional Neural Network for Image Annotation

S. B. Nemade (1)
Department of Computer Science and Engineering
Walchand College of Engineering
Sangli, India
sangita.nemade@walchandsangli.ac.in

S. P. Sonavane (2)
Department of Information Technology
Walchand College of Engineering
Sangli, India
shefali.sonavane@walchandsangli.ac.in

Abstract: Due to rapid advancement in digital display, communication and storage devices, effective techniques are needed to organize, index, retrieve and annotate large image databases. Image segmentation finds application in various areas of image processing and computer vision. Inferring low-level features from a given image is a challenging task in unstructured regions. Most annotation and retrieval algorithms fail to consider region semantics. This paper proposes convolutional neural network (CNN) based image segmentation for the image annotation application. The proposed CNN includes pixel-based prediction of the regions, which is applied to obtain low-level image features. The algorithm uses image region information based on the precise color distribution within the image. Experimental results demonstrate improved image segmentation using CNN.

Keywords: Image Segmentation, Convolutional Neural Network (CNN), Automatic Image Annotation.

I. INTRODUCTION
With the advancement of digital capturing technologies, storage devices and communication networks, the number of digital images has increased rapidly. Hence, effective techniques are necessary for organizing, indexing and searching these images. Concept-based and content-based image retrieval (CBIR) are the two major categories of techniques that address these problems [1]. However, one of the most significant issues in a CBIR system is the semantic gap. The semantic gap is the discrepancy between the interpretation of extracted image features and the interpretation of image content by the Human Visual System (HVS). Therefore, over the last few years, more research has focused on an important part of concept-based retrieval, i.e. automatic image annotation.

The purpose of automatic image annotation (AIA) is to allocate textual labels to an image that clearly describe the content or objects in the image. Many researchers have developed various methods and frameworks for AIA, but more research has been carried out on complete image annotation than on the semantics of the regions. Region-based image annotation methods have achieved more accurate semantic information than global image annotation [2]. Region-based annotation focuses on each independent region of an image, making the visual features more accurate in representing a particular semantic concept. Therefore, image regions need to be analyzed and labeled to attain accurate image annotation. Image segmentation is used to detect boundaries and objects in images. If the underlying segmentation is inaccurate, image annotation will not provide good results [3]. Good image segmentation aims at segmenting the image correctly and labeling each segment with the respective semantic class. Since image segmentation is the foundation of region annotation, effective image segmentation is needed for accurate region-based image annotation.

AIA methods can roughly be classified into five categories [4][5]: (i) Nearest neighbor models, (ii) Generative models, (iii) Discriminative models, (iv) Tag completion-based AIA models and (v) Deep learning based models. Nearest neighbor models assume that images with similar features have a high probability of sharing similar labels. Models of this kind include Joint Equal Contribution (JEC) [6] and Tag Propagation (TagProp) [7]. Generative models emphasize inferring the correlations between image features and semantic concepts. They find the probability of a label by computing the joint probability of image features and labels from training samples. Some of the generative models are the Cross-Media Relevance Model [8], the Multiple Bernoulli Relevance Model [9] and Latent Dirichlet Allocation [10]. Discriminative models consider AIA as a classification problem in which each label/keyword is treated as a class. Researchers have designed such models for class prediction, for example SML [11] and SVM [12]. Tag completion-based AIA methods predict and automatically fill in the missing labels for any given image and at the same time correct noisy tags. They include models such as the Subspace Clustering and Matrix Completion model [13] and the TMC model [14]. Deep learning based models solve the AIA task using feature representations based on deep learning. For image annotation, CNN is used for robust feature generation. Several models are used for this, such as the CNN-RNN framework used in the RIA model [15] and the Deep Multiple Instance Learning (DMIL) model [16].

The AIA framework basically consists of steps such as image segmentation, feature extraction and classification. Image segmentation is one of the essential modules in AIA. Images containing natural scenes have unstructured regions that require computationally complex analysis for segmentation. Several segmentation algorithms proposed in the literature use various image elements such as colors, textures, shapes and object compositions [17]–[18]. Image segmentation aims at segmenting the image correctly since it is the foundation of region annotation. Many image segmentation algorithms,

978-1-7281-1261-9/19/$31.00 ©2019 IEEE 838


Authorized licensed use limited to: Indian Institute of Technology Gandhinagar. Downloaded on June 05,2024 at 09:07:47 UTC from IEEE Xplore. Restrictions apply.

such as JSEG [17] and NCUT [18] have been designed and are commonly used in image annotation. However, they are ineffective in segmenting regions bearing similar color distributions. Texture, color and shape are the important features within the image. These features play an important role in the effective segmentation of objects or regions. Colors are observed by the human eye and are the fundamental feature that differentiates objects. Therefore, a color enhanced segmentation algorithm using CNN is proposed. The algorithm attempts to differentiate objects bearing similar color by processing each of the R, G and B color spaces separately, assisted by edge detection.

Convolutional networks are popular architectures for image segmentation. Long et al. [19] developed fully convolutional networks that are pre-trained on ImageNet [20] and classify each pixel. For semantic segmentation, a contextual hierarchical model was developed by Seyedhosseini et al. [21] for learning contextual information in a hierarchical framework.

The main contribution of this paper is a CNN for image segmentation for the application of image annotation. The algorithm uses image region information based on the precise color distribution within the image. The paper is organized as follows. Section II outlines the proposed methodology. Section III presents the experimental results obtained after training and classification of images. The conclusion and future scope are given in Section IV.

II. COLOUR ENHANCED SEGMENTATION ALGORITHM USING CNN
A CNN is made up of one or more convolution layers with pooling layers, followed by one or more fully connected layers. The CNN architecture is designed to take advantage of the two-dimensional structure of the input image. This is achieved through local connections and associated weights followed by pooling (average or max pooling), resulting in translation-invariant features. With the same number of hidden units, a fully connected network needs more parameters than a CNN. Therefore, it is easier to train a CNN than a fully connected network [20].

The paper proposes a framework for supervised image segmentation based on candidate regions. The framework consists of two phases: learning and testing. Fig. 1 shows the CNN architecture employed in this algorithm.

A. Convolutional Layer
In the proposed algorithm, each layer of image data is a three-dimensional array of size p x q x r where p and q are the spatial dimensions of the image and r is the feature, channel or color dimension. Predicting each pixel's class independently of its neighbors results in better region / class prediction accuracy. It is mostly observed that objects have smooth boundaries and well defined shapes, unlike the background, which tends to consist of shapeless regions. The convolutional layer starts with a smoothing operation to predict a standard pixel with a kernel size of 5 x 5. The smoothing improves the algorithm by suppressing each pixel that has a low probability of being part of an object, and thereby improves accuracy. The weights of the kernel are initialized to 1/25 (one twenty-fifth) and are updated in the learning / training process so as to predict objects from image pixels, as given in equation (1). The convolution operation is represented by equation (2) and is applied to each color space (R, G and B) separately.

[Fig. 1 flowchart: Image Colour Space (RGB) → Convolution Layer (5 x 5 kernel, Cnt = 1) → ReLU Layer → Max Pooling Layer (2 x 2 kernel, Cnt = Cnt + 1) → decision Cnt <= 3 (YES: repeat the learning / training loop; NO: continue) → Fully Connected Activation Layer (1 x 1024) → Classification, Color Region Merging & Edge Detection → Segmented Image with Boundaries]

Fig. 1. CNN architecture for image segmentation.
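The smoothing convolution described above can be illustrated with a minimal NumPy sketch. It is not the authors' MATLAB implementation: the function names `conv2d_valid` and `smooth_rgb` are hypothetical, the 284 x 284 x 3 input size is the one quoted in the implementation details, and the kernel is kept at its initial value of 1/25 (no training is performed here).

```python
import numpy as np

def conv2d_valid(channel, kernel):
    """Slide the kernel over one channel with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = channel.shape[0] - kh + 1
    out_w = channel.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    # Accumulate one shifted, weighted copy of the channel per kernel entry.
    for m in range(kh):
        for n in range(kw):
            out += kernel[m, n] * channel[m:m + out_h, n:n + out_w]
    return out

def smooth_rgb(image):
    """Apply a 5x5 kernel with all weights 1/25 to R, G and B separately."""
    # Initial weights only; in the paper these are updated during training.
    kernel = np.full((5, 5), 1.0 / 25.0)
    channels = [conv2d_valid(image[:, :, c], kernel) for c in range(3)]
    return np.stack(channels, axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((284, 284, 3))   # stand-in for a resized RGB image
    smoothed = smooth_rgb(image)
    print(smoothed.shape)               # (280, 280, 3)
```

With no padding, each 5 x 5 convolution shrinks the spatial size by 4 pixels, which is why a 284 x 284 channel becomes 280 x 280.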


$$w(m, n) = \frac{1}{25}, \qquad m, n = 1, \dots, 5 \tag{1}$$

$$y_c(i, j) = \sum_{m=1}^{5} \sum_{n=1}^{5} w(m, n)\, x_c(i + m - 1,\; j + n - 1) \tag{2}$$

Where,
$w(m, n)$ are the initial weights of the kernel that are updated during the training process,
$x_c$ is the input image in color channel $c \in \{R, G, B\}$,
$y_c(i, j)$ is the output of the convolutional layer.

B. Rectified Linear Units (ReLU) Layer
There are many activation functions available for neural network models. Deep learning models mostly use ReLU as the activation function because it reduces the probability of vanishing gradients. For every positive value of x, it returns the same value (x), but if it receives a zero or negative value of x, zero is returned [19]. Therefore, mathematically, ReLU can be represented using equation (3).

$$f(x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases} \tag{3}$$

C. Max Pooling Layer
The accumulation should lead the network towards correct pixel-level assignments for each region within the image, so that the segmentation task can be performed properly. An evident accumulation task is to take the average of all pixel positions under consideration. In the max pooling layer, the kernel size is set to 2 x 2, which provides accumulation of 4 pixel values. It increases the score of the pixels, which further helps in classification. The operation performed in max pooling can be represented using equation (4). The advantage of this accumulation is that in the training procedure, pixels with similar scores have a similar weight, while maintaining the goal of predicting similar pixels / regions.

$$p_c(i, j) = \max_{m, n \in \{0, 1\}} f\big(y_c(2i - m,\; 2j - n)\big) \tag{4}$$

D. Implementation Details
Images of varying size are standardized by resizing them to 284 x 284 x 3. The three-stage convolutional neural network is illustrated in Fig. 1. The input given to the network is of size 284 x 284 x 3. Three filters of size 5 x 5 are used in the first convolutional stage. The activation size is 280 x 280, as no padding is used. These activations are normalized across neighboring feature maps. With a stride of one, non-overlapping spatial pooling is applied over 2 x 2 local regions.

The second and third stages are similar to the first, except that the size of the input data is 140 x 140 and 68 x 68 respectively, whereas the number of filters remains 3. The entire network is trained by backpropagation using the softmax function to normalize the logistic regression loss against the predicted scores. The biases are initialized to the constant one in the first convolutional layer, as well as in the remaining fully connected layers. After each mini-batch of 50 images, the network updates all weights. The training starts with a learning rate greater than 0.05 (5%), and reduces it to 0.01 (1%) when the performance on the evaluation set no longer improves. The algorithm processes the R, G and B color spaces separately through the CNN. Thus each object is marked / identified in every color space and effectively backed by edge detection. Finally, the results for the R, G and B color spaces are merged, which effectively segments the objects.

III. EXPERIMENTAL RESULTS
The data set used in this experiment is obtained from the Corel-10k database for segmentation purposes [22]. This dataset contains 10,000 images in 100 categories. The training set contains 10k images. The task is to segment the image into its various regions, which include foreground, objects and background. The images come from different sources such as natural scenes and objects such as persons, guns, ships, toys and balloons; images with multiple objects are also selected. The images from the database are selected randomly to train the proposed CNN. A test image that is excluded from training is taken from the same dataset in order to validate the algorithm. The experiments are conducted on a computer with a 2.8 GHz i7 processor and 8 GB RAM, and MATLAB version 9.6 is employed for the implementation of the proposed CNN. The step-by-step result obtained after the training and the classification of the balloon image is shown in Fig. 2.

Fig. 2. (a) Original image, (b) first channel output of CNN, (c) second channel output of CNN, (d) third channel output of CNN, (e)-(g) result of edge detection using Canny after each layer, (h) result of color region merging, (i) segmented image.
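As a rough check of the layer sizes quoted in the implementation details, the following Python sketch implements ReLU, 2 x 2 non-overlapping max pooling, and the size arithmetic for three conv + pool stages. It assumes 5 x 5 convolutions without padding at stride 1 and a pooling step of 2; the helper names (`relu`, `max_pool_2x2`, `stage_sizes`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def relu(x):
    """Return x for positive entries and 0 otherwise."""
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling of a single 2-D feature map."""
    h2, w2 = x.shape[0] // 2, x.shape[1] // 2
    # Group pixels into 2x2 blocks, then take the maximum of each block.
    blocks = x[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

def stage_sizes(size=284, kernel=5, pool=2, stages=3):
    """Spatial size at the input, after convolution and after pooling, per stage."""
    sizes = []
    for _ in range(stages):
        conv_out = size - kernel + 1      # no padding, stride 1
        pool_out = conv_out // pool       # non-overlapping pooling
        sizes.append((size, conv_out, pool_out))
        size = pool_out
    return sizes

if __name__ == "__main__":
    for stage, (inp, conv, pooled) in enumerate(stage_sizes(), start=1):
        print(f"stage {stage}: input {inp}, after conv {conv}, after pool {pooled}")
```

Under these assumptions the stage inputs come out as 284, 140 and 68, matching the sizes given in the text. The final 32 x 32 map holds 1024 values, which is at least consistent with the 1 x 1024 fully connected layer in Fig. 1, although the paper does not state that connection explicitly.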


Segmentation results obtained after training and classification are illustrated in Fig. 3. In the learning phase, images are applied to the CNN sequentially, which updates the weights of the filter in each layer. It is proposed to start with a learning rate greater than 0.05 (5%), and to reduce it to 0.01 (1%) when the performance on the evaluation set no longer improves. These updated weights are applied in the testing phase, which performs segmentation. The results indicate that the proposed image segmentation algorithm using CNN segments regions within the images. It clearly segments foreground, objects and background in the images. The algorithm has been further improved so that it predicts the map of an image at its original size without any post-processing. Finally, results are obtained as multiple separated regions of an image.

The metrics commonly used for evaluating image segmentation are variations of pixel accuracy. Let $n_{ij}$ be the number of pixels of class i predicted to belong to class j and let $t_i$ be the total number of pixels of class i. The pixel accuracy (pa) is given by equation (5) and tabulated in Tables I & II for images containing single and multiple objects respectively.

$$pa = \frac{\sum_i n_{ii}}{\sum_i t_i} \times 100 \tag{5}$$

The obtained experimental results are compared with early hierarchical contexts learned by CNN [23] and with image segmentation based on CNN and conditional random field [24], as shown in Tables III & IV respectively. The technique in [23] is applied for single-object segmentation whereas the technique in [24] is applied for multiple objects.

It is observed that the proposed technique is suitable for segmenting images containing both single and multiple objects. Careful observation of the resultant images shows that the proposed algorithm effectively segments the objects based on color space. Each object consists of some uniform pattern of color, which is made from a combination of the R, G and B color spaces. Processing each color space separately evaluates each object with a uniform distribution, followed by edge detection. Finally, merging each color space with the edges enhances the major boundaries of the object and suppresses the minor regions within the image.

Fig. 3(c) and 3(d) show some minor boundaries within the object and highlight the major boundaries.

Table I. Pixel accuracy for images containing a single object

Sr. No.   Image Type (Single Object)   Pixel Accuracy (%)
1         Balloon                      98
2         Gun                          90
3         Santa                        91

[Fig. 3 panels, each showing the original image and the segmented image: (a) Balloon, (b) Gun, (c) Santa, (d) Doll and toy, (e) Many balloons, (f) Many objects]

Fig. 3. Segmentation results for images containing single and multiple objects.
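The pixel accuracy of equation (5) can be computed from a confusion matrix, as in the following Python sketch. The function name `pixel_accuracy` and the toy label maps are illustrative assumptions, not the authors' evaluation code; the per-image accuracies in the last lines are the values reported in Tables I and II.

```python
import numpy as np

def pixel_accuracy(true_labels, pred_labels, num_classes):
    """Percentage of pixels whose predicted class equals the true class."""
    true_labels = np.asarray(true_labels).ravel()
    pred_labels = np.asarray(pred_labels).ravel()
    # n[i, j]: number of pixels of class i predicted to belong to class j.
    n = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(n, (true_labels, pred_labels), 1)
    t = n.sum(axis=1)                     # t[i]: total pixels of class i
    return 100.0 * np.trace(n) / t.sum()

if __name__ == "__main__":
    # Toy 2x2 label maps: three of the four pixels are classified correctly.
    truth = [[0, 0], [1, 1]]
    pred = [[0, 1], [1, 1]]
    print(pixel_accuracy(truth, pred, num_classes=2))    # 75.0

    # Averages of the per-image accuracies reported in Tables I and II.
    single = [98, 90, 91]
    multiple = [93, 86, 92]
    print(sum(single) / len(single))                     # 93.0
    print(round(sum(multiple) / len(multiple), 2))       # 90.33
```

The two averages reproduce the 93 and 90.33 figures that the paper lists for the proposed method in Tables III and IV.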


Table II. Pixel accuracy for images containing multiple objects

Sr. No.   Image Type (Multiple Objects)   Pixel Accuracy (%)
1         Doll and toy                    93
2         Many Balloons                   86
3         Many Objects                    92

Table III. Comparison of pixel accuracy for single object

Image Type      Pixel Accuracy (%) with Proposed Method   Pixel Accuracy (%) with technique in [23]
Single object   93 (average)                              86.83

Table IV. Comparison of pixel accuracy for multiple objects

Image Type         Pixel Accuracy (%) with Proposed Method   Pixel Accuracy (%) with technique in [24]
Multiple objects   90.33 (average)                           91.7

IV. CONCLUSION
This paper focuses on CNN based image segmentation for the image annotation application. The proposed CNN includes a prediction of the region using image pixels, which is applied to obtain low-level image features. The algorithm uses image region information based on the precise color distribution within the image. Experimental results demonstrate improved image segmentation using CNN. Pixel accuracy is calculated to measure the performance of the algorithm in predicting and segmenting foreground, background and objects in the image. The mean filter at the convolution layer and the accumulation at the max pooling layer effectively improve the accuracy of the segmentation algorithm. To smooth the boundaries or ensure consistent segmentation, more sophisticated post-processing can be applied. Further, the algorithm can be extended to region-based segmentation for semantic segmentation and automatic image annotation applications.

REFERENCES
[1] J. Zhang, Y. Mu, S. Feng, K. Li, Y. Yuan, Chin-Hui Lee, “Image region annotation based on segmentation and semantic correlation analysis,” IET Image Processing, vol. 12, no. 8, pp. 1331-1337, Mar. 2018.
[2] Y. Wang, T. Mei, S.G. Gong, X.S Hua, “Combining global, regional and contextual features for automatic image annotation,” Elsevier Pattern Recognition, vol. 42, no. 2, pp. 259-266, Feb. 2009.
[3] A. Morales-González, E García-Reyes, LE Sucar, “Image Annotation by a Hierarchical and Iterative Combination of Recognition and Segmentation,” World Scientific International Journal of Pattern Recognition and Artificial Intelligence, vol. 32, no. 01, pp. 1860014, Feb. 2018.
[4] X. Jing, F. Wu, Z. Li, R. Hu, D. Zhang, “Multi-Label Dictionary Learning for Image Annotation,” IEEE Trans. Image Processing, vol. 25, no. 6, pp. 2712-2725, June 2016.
[5] Q. Cheng, Q. Zhang, P. Fu, C. Tu, S. Li, “A survey and analysis on automatic image annotation,” Elsevier Pattern Recognition, vol. 79, pp. 242-259, Feb. 2018.
[6] A. Makadia, V. Pavlovic, and S. Kumar, “A new baseline for image annotation,” In Proc. 10th Eur. Conf. Comput. Vis., pp. 316–329, 2008.
[7] M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, “TagProp: Discriminative metric learning in nearest neighbor models for image autoannotation,” In Proc. IEEE 12th Int. Conf. Comput. Vis., pp. 309–316, 2009.
[8] J. Jeon, V. Lavrenko, and R. Manmatha, “Automatic image annotation and retrieval using cross-media relevance models,” In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’03, pp. 119–126, 2003.
[9] S. L. Feng, R. Manmatha and V. Lavrenko, “Multiple bernoulli relevance models for image and video annotation,” In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR’04, pp. 1002–1009, 2004.
[10] D. Putthividhya, H. Attias and S. Nagarajan, “Topic regression multi-modal latent Dirichlet allocation for image annotation,” In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR’10, pp. 3408–3415, 2010.
[11] G. Carneiro, A. B. Chan, P. J. Moreno, and N. Vasconcelos, “Supervised learning of semantic classes for image annotation and retrieval,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 3, pp. 394–410, Mar. 2007.
[12] Y. Verma and C. Jawahar, “Exploring svm for image annotation in presence of confusing labels,” In Proceedings of the 24th British Machine Vision Conference, 2013.
[13] Y. Hou, Z. Lin, “Image tag completion and refinement by subspace clustering and matrix completion,” Vis. Commun. Image Process. (VCIP), pp. 1–4, 2015.
[14] L. Wu, R. Jin, A.K. Jain, “Tag completion for image retrieval,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 3, pp. 716, 2013.
[15] Jin, H. Nakayama, “Annotation order matters: recurrent image annotator for arbitrary length image tagging,” International Conference on Pattern Recognition, pp. 2452–2457, 2017.
[16] J. Wu, Y. Yu, C. Huang, K. Yu, “Deep multiple instance learning for image classification and auto-annotation,” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3460–3469, 2015.
[17] Deng Y., Manjunath B.S., Shin H, “Color image segmentation,” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, USA, pp. 446-451, 1999.
[18] C. T. Zahn, “Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters,” in IEEE Transactions on Computers, vol. C-20, no. 1, pp. 68-86, 1971.
[19] J. Long, E. Shelhamer and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, pp. 3431-3440, 2015.
[20] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” In Advances in neural information processing systems, pp. 1097-1105, 2012.
[21] Seyedhosseini, Mojtaba, and Tolga Tasdizen, “Semantic Image Segmentation with Contextual Hierarchical Models,” IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 5, pp. 951-64, 2016.
[22] Guang-Hai Liu, Jing-Yu Yang, etc, “Content-based image retrieval using computational visual attention model,” Pattern Recognition, vol. 48, no. 8, pp. 2554-2566, 2015.
[23] Zifeng Wu, et. al, “Early Hierarchical Contexts Learned by Convolutional Networks for Image Segmentation,” IEEE 22nd International Conference on Pattern Recognition, Stockholm, Sweden, pp. 1538–1543, 2014.
[24] Hu Tao, et. al, “Image Semantic Segmentation Based on Convolutional Neural Network and Conditional Random Field,”


IEEE 10th International Conference on Advanced Computational Intelligence (ICACI), China, pp. 568–572, 2018.
