survey

Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement, and Retrieval

Authors:

Tiberio Uricchio,

Lamberto Ballan,

Cees G. M. Snoek,

Alberto Del Bimbo

ACM Computing Surveys (CSUR), Volume 49, Issue 1

Article No.: 14, Pages 1 - 39

https://doi.org/10.1145/2906152

Published: 06 June 2016

Abstract

Where previous reviews on content-based image retrieval emphasize what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. A comprehensive treatise of three closely linked problems (i.e., image tag assignment, refinement, and tag-based image retrieval) is presented. While existing works vary in terms of their targeted tasks and methodology, they rely on the key functionality of tag relevance, that is, estimating the relevance of a specific tag with respect to the visual content of a given image and its social context. By analyzing what information a specific method exploits to construct its tag relevance function and how such information is exploited, this article introduces a two-dimensional taxonomy to structure the growing literature, understand the ingredients of the main works, clarify their connections and differences, and recognize their merits and limitations. For a head-to-head comparison with the state of the art, a new experimental protocol is presented, with training sets containing 10,000, 100,000, and 1 million images, and an evaluation on three test sets, contributed by various research groups. Eleven representative works are implemented and evaluated. Putting all this together, the survey aims to provide an overview of the past and foster progress for the near future.

Index Terms

Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement, and Retrieval
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published In

ACM Computing Surveys Volume 49, Issue 1

March 2017

705 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/2911992

Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering/University of Florida/Gainesville, FL

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2016

Accepted: 01 March 2016

Revised: 01 December 2015

Received: 01 March 2015

Published in CSUR Volume 49, Issue 1

Qualifiers

Survey
Research
Refereed

Funding Sources

Research Funds of Renmin University of China
NSFC
SRF for ROCS, SEM
SRFDP
STW STORY project, Telecom Italia PhD
Dutch national program COMMIT
AQUIS-CH
EC's FP7
Fundamental Research Funds for the Central Universities
Tuscany Region (Italy)
