DOI: 10.1145/3474085.3475268
research-article
Open access

Learning Multi-context Aware Location Representations from Large-scale Geotagged Images

Published: 17 October 2021

Abstract

With the ubiquity of sensor-equipped smartphones, multimedia documents uploaded to the Internet commonly carry GPS coordinates. Using such geotags as an additional feature is intuitively appealing for improving the performance of location-aware applications. However, raw GPS coordinates are fine-grained location indicators that carry no semantic information. Existing methods for geotag semantic encoding mostly extract hand-crafted, application-specific location representations that depend heavily on large-scale supplementary data and thus cannot run efficiently on mobile devices. In this paper, we present a machine-learning-based approach, termed GPS2Vec+, which learns rich location representations by capitalizing on worldwide geotagged images. Once trained, the model no longer depends on the auxiliary data, so it encodes geotags efficiently through inference alone. We extract visual and semantic knowledge from image content and user-generated tags, and transfer this information to locations by using geotagged images as a bridge. To adapt to different application domains, we further present an attention-based fusion framework that estimates the importance of the learnt location representations under different contexts for effective feature fusion. Our location representations yield significant performance improvements over state-of-the-art geotag encoding methods on image classification and venue annotation.
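
The attention-based fusion idea in the abstract can be illustrated with a minimal sketch. This is not the GPS2Vec+ architecture, which the abstract does not specify. It assumes each context (for example, one representation learnt from image content and one from user tags) yields a fixed-length vector for a location, and that a hypothetical task-specific `query` vector scores each one by dot product. The names `softmax`, `attention_fuse`, and `query` are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    representations: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse several location representations of equal length.

    Assumption: importance is a dot product with a task-specific query
    vector. The paper may estimate importance differently.
    """
    # One relevance score per context-specific representation.
    scores = [sum(r * q for r, q in zip(rep, query)) for rep in representations]
    weights = softmax(scores)
    dim = len(representations[0])
    # Weighted sum of the representations, computed per dimension.
    fused = [
        sum(w * rep[d] for w, rep in zip(weights, representations))
        for d in range(dim)
    ]
    return fused, weights


# Hypothetical toy vectors standing in for two learnt representations.
visual_repr = [1.0, 0.0]
tag_repr = [0.0, 1.0]
fused, weights = attention_fuse([visual_repr, tag_repr], query=[2.0, 0.0])
print(weights, fused)
```

In this toy setup the query aligns with the first representation, so that representation receives the larger weight. In a trained model the query (or a small scoring network in its place) would be learnt per application domain.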

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. attention-based fusion
  2. geo-aware applications
  3. location representations
  4. pre-trained neural networks

Qualifiers

  • Research-article

Funding Sources

  • Singapore Ministry of Education

Conference

MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
