survey

When Location Meets Social Multimedia: A Survey on Vision-Based Recognition and Mining for Geo-Social Multimedia Analytics

Published: 26 March 2015

Abstract

With the growing popularity of multimedia sharing platforms such as Facebook and Flickr, recent years have witnessed explosive growth in geographical tags on social multimedia content. This trend enables a wide variety of emerging applications, for example, mobile location search, landmark recognition, scene reconstruction, and tourism recommendation, which range from pure research prototypes to commercial systems. In this article, we give a comprehensive survey of these applications, covering recent advances in the recognition and mining of geographically aware social multimedia. We review related work from the past decade on location recognition, scene summarization, tourism suggestion, 3D building modeling, mobile visual search, and city navigation. Finally, we discuss potential challenges, future topics, and open issues related to geo-social multimedia computing, recognition, mining, and analytics.




          Published In

          ACM Transactions on Intelligent Systems and Technology, Volume 6, Issue 1
          April 2015
          255 pages
          ISSN: 2157-6904
          EISSN: 2157-6912
          DOI: 10.1145/2745393
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 26 March 2015
          Accepted: 01 December 2013
          Revised: 01 October 2013
          Received: 01 August 2013
          Published in TIST Volume 6, Issue 1


          Author Tags

          1. Algorithms
          2. Internet
          3. image analysis
          4. knowledge representation
          5. multimedia systems

          Qualifiers

          • Survey
          • Refereed

          Funding Sources

          • National Natural Science Foundation of China
