DOI: 10.1145/3178876.3186078

Strategies for Geographical Scoping and Improving a Gazetteer

Published: 10 April 2018

Abstract

Many applications that use geographical databases (a.k.a. gazetteers) rely on the accuracy of the information in the database. However, poor data quality is an issue when data is integrated from multiple sources with different quality constraints and sometimes with little information about the sources. One major consequence of this is that the geographical scope of a location and/or its position may not be known or may not be accurate. In this paper, we study the problem of detecting the scope of locations in a geographical database and its applications in identifying inconsistencies and improving the quality of a gazetteer. We develop novel strategies, including probabilistic and geometric approaches, to accurately derive the geographical scope of places based on the spatial hierarchy of a gazetteer as well as other public information (such as area) that may be available. We show how the boundary information derived here can be useful in identifying inconsistencies, enhancing the location hierarchy and improving the applications that rely on gazetteers. Our experimental evaluation on two public-domain gazetteers reveals that the proposed approaches significantly outperform, in terms of the accuracy of the geographical bounding boxes, a baseline that is based on the parent-child relationship of a gazetteer. Among applications, we show that the boundary information derived here can move more than 20% of locations in a public gazetteer to better positions in the hierarchy and that the accuracy of those moves is over 90%.
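
To make the parent-child baseline mentioned in the abstract concrete, here is a minimal sketch of how a bounding box could be derived from a gazetteer hierarchy. It is an illustration under stated assumptions, not the paper's implementation: the `Place` class, the `bounding_box` and `contains` helpers, and the coordinates are all hypothetical, and the box is simply the smallest latitude/longitude rectangle covering a place and its descendants.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical record type: a gazetteer entry with a point position and children.
@dataclass
class Place:
    name: str
    lat: float
    lon: float
    children: List["Place"] = field(default_factory=list)

Box = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)

def bounding_box(place: Place) -> Box:
    """Smallest lat/lon rectangle covering a place and all of its descendants."""
    min_lat = max_lat = place.lat
    min_lon = max_lon = place.lon
    for child in place.children:
        c_min_lat, c_min_lon, c_max_lat, c_max_lon = bounding_box(child)
        min_lat = min(min_lat, c_min_lat)
        min_lon = min(min_lon, c_min_lon)
        max_lat = max(max_lat, c_max_lat)
        max_lon = max(max_lon, c_max_lon)
    return (min_lat, min_lon, max_lat, max_lon)

def contains(box: Box, lat: float, lon: float) -> bool:
    """True if the point lies inside the box (edges included)."""
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

# Made-up coordinates, for illustration only.
region = Place("region", 55.0, -115.0, [
    Place("town_a", 53.5, -113.5),
    Place("town_b", 51.0, -114.0, [Place("hamlet_b1", 50.5, -114.5)]),
])
box = bounding_box(region)
```

A point that falls outside the box of its recorded parent, as tested by `contains`, is the kind of inconsistency the abstract describes. This naive minimum/maximum ignores longitude wraparound at the antimeridian and is sensitive to a single misplaced child, which is one reason more robust probabilistic and geometric estimates are of interest.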

Cited By

  • (2021) To Honor our Heroes: Analysis of the Obituaries of Australians Killed in Action in WWI and WWII. 2020 25th International Conference on Pattern Recognition (ICPR), 6965-6972. DOI: 10.1109/ICPR48806.2021.9413145. Online publication date: 10-Jan-2021

Published In

WWW '18: Proceedings of the 2018 World Wide Web Conference
April 2018
2000 pages
ISBN: 9781450356398
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. gazetteer improvement
  2. geographical scoping
  3. geotagging

Qualifiers

  • Research-article

Funding Sources

  • Natural Sciences and Engineering Research Council of Canada

Conference

WWW '18
Sponsor:
  • IW3C2
WWW '18: The Web Conference 2018
April 23 - 27, 2018
Lyon, France

Acceptance Rates

WWW '18 Paper Acceptance Rate 170 of 1,155 submissions, 15%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

  • Downloads (Last 12 months): 70
  • Downloads (Last 6 weeks): 17
Reflects downloads up to 16 Oct 2024
