research-article

Modeling locations with social media

Published: 01 February 2013

Abstract

In this paper we focus on the locations, explicit and implicit, in users' descriptions of their surroundings. We propose a statistical language modeling approach to identifying locations in arbitrary text, and investigate several ways to estimate the models based on term frequency and user frequency. Geotagged public photos on Flickr serve as convenient ground truth. Our results show that, using only a photo's tags, we can predict its location within a one-kilometer by one-kilometer cell with 17% accuracy, and within a three-kilometer radius around such a cell with 40% accuracy. This is significantly better than the state of the art. Further, we examine several estimation strategies that leverage the physical proximity of places, and show that for sparsely represented locations, smoothing from the immediate neighborhood improves results. We also show that estimation strategies based on user frequency are much more reliable than approaches based on raw term frequency.
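The abstract describes ranking map cells by how well each cell's language model explains a photo's tags. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes photos have already been assigned to integer grid cells of roughly 1 km × 1 km. It also assumes the cell models are smoothed with a Dirichlet prior against the whole collection, which is one common choice in language-model retrieval that the abstract does not specify. The neighborhood step simply adds a fraction `lam` of the eight adjacent cells' counts. The names `build_counts`, `neighbors`, `smooth_with_neighbors`, `rank_cells`, `mu` and `lam` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_counts(photos, by_user=True):
    """Count tags per grid cell.

    photos: iterable of (user_id, cell, tags); cell is an (i, j) index into a
    grid of roughly 1 km x 1 km cells (assumed to be assigned beforehand).
    by_user=False -> raw term frequency: every tag occurrence counts.
    by_user=True  -> user frequency: a tag counts at most once per user per cell.
    """
    counts = defaultdict(Counter)
    seen = set()
    for user, cell, tags in photos:
        for tag in tags:
            if by_user:
                if (user, cell, tag) in seen:
                    continue
                seen.add((user, cell, tag))
            counts[cell][tag] += 1
    return counts


def neighbors(cell):
    """The eight cells adjacent to `cell` in the grid."""
    i, j = cell
    return [(i + di, j + dj)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)]


def smooth_with_neighbors(counts, lam=0.5):
    """Hypothetical neighborhood smoothing: own counts + lam * adjacent counts."""
    smoothed = {}
    for cell, own in counts.items():
        mixed = Counter({t: float(c) for t, c in own.items()})
        for nb in neighbors(cell):
            for t, c in counts.get(nb, {}).items():
                mixed[t] += lam * c
        smoothed[cell] = mixed
    return smoothed


def rank_cells(query_tags, counts, mu=1000.0):
    """Rank cells by Dirichlet-smoothed query log-likelihood log P(tags | cell)."""
    collection = Counter()
    for c in counts.values():
        collection.update(c)
    total = sum(collection.values())
    if not total:
        return []
    scores = {}
    for cell, c in counts.items():
        size = sum(c.values())
        score = 0.0
        for t in query_tags:
            p_bg = collection[t] / total  # background (collection) probability
            if p_bg == 0.0:
                continue  # tag unseen in every cell: log(0) everywhere, so skip it
            score += math.log((c[t] + mu * p_bg) / (size + mu))
        scores[cell] = score
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    photos = [
        ("u1", (0, 0), ["eiffel", "tower", "paris"]),
        ("u1", (0, 0), ["eiffel", "tower"]),
        ("u2", (0, 0), ["eiffel", "paris"]),
        ("u3", (5, 5), ["beach", "sunset"]),
        ("u4", (5, 5), ["beach"]),
    ]
    uf = build_counts(photos, by_user=True)
    print(rank_cells(["eiffel", "paris"], uf)[0])
```

With `by_user=True`, a single user who tags hundreds of photos "eiffel" in one cell contributes a count of 1 rather than hundreds. That is one plausible reading of why the abstract finds user-frequency estimates more reliable than raw term frequency. Calling `rank_cells` on the output of `smooth_with_neighbors` gives sparsely tagged cells some evidence from their surroundings.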



          Information

          Published In

          Information Retrieval, Volume 16, Issue 1
          Feb 2013
          90 pages

          Publisher

          Kluwer Academic Publishers

          United States

          Publication History

          Published: 01 February 2013
          Accepted: 24 February 2012
          Received: 23 March 2011

          Author Tags

          1. Language models
          2. Geographic context
          3. Geotagging
          4. User-generated content
          5. Flickr


          Bibliometrics & Citations

          Bibliometrics

          Article Metrics

          • Downloads (last 12 months): 0
          • Downloads (last 6 weeks): 0
          Reflects downloads up to 08 Feb 2025


          Citations

          Cited By

          • (2020) Webly Supervised Semantic Embeddings for Large Scale Zero-Shot Learning. Computer Vision – ACCV 2020, pp. 514–531. DOI: 10.1007/978-3-030-69544-6_31. Online publication date: 30-Nov-2020.
          • (2019) Fine-grained Geolocation of Tweets in Temporal Proximity. ACM Transactions on Information Systems, 37(2), pp. 1–33. DOI: 10.1145/3291059. Online publication date: 11-Jan-2019.
          • (2018) Exploiting User and Venue Characteristics for Fine-Grained Tweet Geolocation. ACM Transactions on Information Systems, 36(3), pp. 1–34. DOI: 10.1145/3156667. Online publication date: 2-Feb-2018.
          • (2017) A spatial, temporal and sentiment based framework for indexing and clustering in twitter blogosphere. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 32(5), pp. 3619–3632. DOI: 10.3233/JIFS-169297. Online publication date: 1-Jan-2017.
          • (2017) Point of interest mining with proper semantic annotation. Multimedia Tools and Applications, 76(22), pp. 23435–23457. DOI: 10.1007/s11042-016-4114-7. Online publication date: 1-Nov-2017.
          • (2016) Data-Driven Transit Network Design From Mobile Phone Trajectories. IEEE Transactions on Intelligent Transportation Systems, 17(6), pp. 1724–1733. DOI: 10.1109/TITS.2015.2496783. Online publication date: 26-May-2016.
          • (2016) A survey on Flickr multimedia research challenges. Engineering Applications of Artificial Intelligence, 51(C), pp. 71–91. DOI: 10.1016/j.engappai.2016.01.006. Online publication date: 1-May-2016.
          • (2016) Predicting celebrity attendees at public events using stock photo metadata. Multimedia Tools and Applications, 75(4), pp. 2145–2167. DOI: 10.1007/s11042-014-2399-y. Online publication date: 1-Feb-2016.
          • (2014) Text-based twitter user geolocation prediction. Journal of Artificial Intelligence Research, 49(1), pp. 451–500. DOI: 10.5555/2655713.2655726. Online publication date: 1-Jan-2014.
          • (2014) Dynamic location models. Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, pp. 1231–1234. DOI: 10.1145/2600428.2609552. Online publication date: 3-Jul-2014.