Open access

Learning Multi-context Aware Location Representations from Large-scale Geotagged Images

Authors:

Rajiv Ratn Shah,

Roger Zimmermann

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 899 - 907

https://doi.org/10.1145/3474085.3475268

Published: 17 October 2021

Abstract

With the ubiquity of sensor-equipped smartphones, multimedia documents uploaded to the Internet are commonly associated with GPS coordinates. Using such geotags as an additional feature is intuitively appealing for improving the performance of location-aware applications. However, raw GPS coordinates are fine-grained location indicators that carry no semantic information. Existing methods for semantic geotag encoding mostly extract hand-crafted, application-specific location representations that depend heavily on large-scale supplementary data and thus cannot run efficiently on mobile devices. In this paper, we present a machine-learning-based approach, termed GPS2Vec+, which learns rich location representations by capitalizing on worldwide geotagged images. Once trained, the model no longer depends on the auxiliary data, so it encodes geotags efficiently through inference alone. We extract visual and semantic knowledge from image content and user-generated tags, and transfer this information to locations by using geotagged images as a bridge. To adapt to different application domains, we further present an attention-based fusion framework that estimates the importance of the learnt location representations under different contexts for effective feature fusion. Our location representations yield significant performance improvements over state-of-the-art geotag encoding methods on image classification and venue annotation.
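
The attention-based fusion idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything here is an assumption for illustration only: the function names (`attention_fuse`, `softmax`, `dot`), the use of dot-product scoring against a context `query` vector, and the toy two-dimensional representations are hypothetical and are not taken from GPS2Vec+.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(
    context_vectors: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse several location representations into one vector.

    Assumption: each representation (e.g. one learnt from visual
    content, one from user tags) is scored by its dot product with a
    task-specific query vector; the scores are normalised with softmax
    and used as weights in a weighted sum.
    """
    scores = [dot(v, query) for v in context_vectors]
    weights = softmax(scores)
    dim = len(context_vectors[0])
    fused = [
        sum(w * v[i] for w, v in zip(weights, context_vectors))
        for i in range(dim)
    ]
    return fused, weights


# Hypothetical toy inputs: two representations of the same location.
visual_repr = [1.0, 0.0]
semantic_repr = [0.0, 1.0]
task_query = [2.0, 0.0]  # this task is assumed to favour the visual context

fused, weights = attention_fuse([visual_repr, semantic_repr], task_query)
print(weights, fused)
```

In this toy setup the weights sum to one and the representation better aligned with the query receives the larger weight, which is the general behaviour an importance-estimating fusion step is meant to provide.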

Index Terms

Learning Multi-context Aware Location Representations from Large-scale Geotagged Images
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Information systems
  1. Information retrieval
    1. Document representation
  2. World Wide Web
    1. Web applications
      1. Social networks

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen (University of Electronic Science and Technology of China, China)
Yueting Zhuang (Zhejiang University, China)
John R. Smith (IBM, USA)

Program Chairs:
Yang Yang (University of Electronic Science and Technology of China, China)
Pablo Cesar (CWI & TU Delft, The Netherlands)
Florian Metze (Facebook, Inc., USA)
Balakrishnan Prabhakaran (University of Texas at Dallas, USA)

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Singapore Ministry of Education

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
