Abstract
A Content-Based Image Retrieval (CBIR) method analyzes the content of an image and extracts features that describe it, also called image annotations (or image labels). A machine learning (ML) algorithm is commonly used to obtain the annotations, but this is a time-consuming process. In addition, the semantic gap is another problem in image labeling. To overcome the first difficulty, we use the Google Cloud Vision API, which saves much computational time. To resolve the second problem, we define a transformation method that maps undefined terms by using WordNet. In the experiments, the 4,952 test images of the well-known Pascal VOC 2007 dataset are labeled with the Cloud Vision API through an implementation in the R language. At most ten labels are kept for each image, and only those with scores over 50. Moreover, we compare the Cloud Vision API with well-known ML algorithms. The API yields a mean average precision (mAP) of 42.4% on the 4,952 images, which is better than three well-known ML algorithms. Hence, this work could be extended to other image datasets and could serve as a benchmark method when evaluating performance.
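The labeling pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R implementation. It assumes scores on a 0 to 100 scale, replaces the WordNet lookup with a small hand-written dictionary (`TERM_TO_CLASS`, hypothetical), and uses non-interpolated average precision, which may differ from the evaluation protocol used in the paper. All function names are illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for a WordNet lookup: maps a free-form label to a
# dataset class name. A real system would query WordNet instead.
TERM_TO_CLASS: Dict[str, str] = {"puppy": "dog", "kitten": "cat", "sedan": "car"}


def select_labels(scored: List[Tuple[str, float]],
                  min_score: float = 50.0,
                  max_labels: int = 10) -> List[Tuple[str, float]]:
    """Keep labels scoring above min_score, best first, at most max_labels."""
    kept = [(label, score) for label, score in scored if score > min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_labels]


def map_to_classes(labels: List[Tuple[str, float]],
                   classes: Set[str]) -> Dict[str, float]:
    """Map labels to known classes, keeping the best score per class.

    Labels that are neither a class name nor in TERM_TO_CLASS are dropped.
    """
    best: Dict[str, float] = {}
    for label, score in labels:
        name = label.lower()
        target = name if name in classes else TERM_TO_CLASS.get(name)
        if target in classes:
            best[target] = max(score, best.get(target, 0.0))
    return best


def average_precision(ranked_relevance: List[bool]) -> float:
    """Non-interpolated AP over relevance flags sorted by descending score."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """Mean of the per-class AP values (0.0 when there are no classes)."""
    if not per_class_ap:
        return 0.0
    return sum(per_class_ap.values()) / len(per_class_ap)


# Toy example with made-up labels and scores for a single image.
classes = {"dog", "cat", "car"}
raw = [("Puppy", 92.0), ("Dog", 88.0), ("Grass", 70.0), ("Sedan", 41.0)]
mapped = map_to_classes(select_labels(raw), classes)
```

In this toy run, "Sedan" is discarded by the score threshold, "Grass" is discarded because it maps to no class, and "Puppy" and "Dog" collapse into the single class `dog`.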
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Chen, SH., Chen, YH. (2017). A Content-Based Image Retrieval Method Based on the Google Cloud Vision API and WordNet. In: Nguyen, N., Tojo, S., Nguyen, L., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2017. Lecture Notes in Computer Science, vol 10191. Springer, Cham. https://doi.org/10.1007/978-3-319-54472-4_61
DOI: https://doi.org/10.1007/978-3-319-54472-4_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54471-7
Online ISBN: 978-3-319-54472-4
eBook Packages: Computer Science (R0)