Abstract
The great success of online social platforms for creating, sharing and tagging user-generated media has led to strong interest from the multimedia and computer vision communities in methods and techniques for annotating and searching social media. Visual content similarity, geo-tags and tag co-occurrence, together with social connections and comments, can be exploited to perform tag suggestion as well as content classification and clustering, and to enable more effective semantic indexing and retrieval of visual data. However, the relatively low quality of these metadata must be overcome: user-produced tags and annotations are known to be ambiguous, imprecise and/or incomplete, excessively personalized and limited. At the same time, methods must take into account the ‘web-scale’ quantity of media and the fact that social network users continuously add new images and create new terms. We review state-of-the-art approaches to automatic annotation and tag refinement for social images, also considering the temporal patterns of tag usage, and discuss extensions to tag suggestion and localization in web video sequences.
Notes
Alexa Internet, Inc. http://www.alexa.com
Google Trends. http://www.google.com/trends
Available on request at: http://www.micc.unifi.it/ballan/research/tag-webvideos/
Source code and dataset metadata are available from http://www.micc.unifi.it/uricchio/
Cite this article
Ballan, L., Bertini, M., Uricchio, T. et al. Data-driven approaches for social image and video tagging. Multimed Tools Appl 74, 1443–1468 (2015). https://doi.org/10.1007/s11042-014-1976-4
DOI: https://doi.org/10.1007/s11042-014-1976-4