What's missing in geographical parsing?

Authors: Mohammad Taher Pilehvar, Nut Limsopatham, Nigel Collier

Language Resources and Evaluation, Volume 52, Issue 2

Pages 603 - 623

https://doi.org/10.1007/s10579-017-9385-8

Published: 01 June 2018

Abstract

Geographical data can be obtained by converting place names from free-format text into geographical coordinates. The ability to geo-locate events in textual reports represents a valuable source of information in many real-world applications such as emergency responses, real-time social media geographical event analysis, understanding location instructions in auto-response systems and more. However, geoparsing is still widely regarded as a challenge because of domain language diversity, place name ambiguity, metonymic language and limited leveraging of context, as we show in our analysis. Results to date, whilst promising, are on laboratory data and, unlike in wider NLP, are often not cross-compared. In this study, we evaluate and analyse the performance of a number of leading geoparsers on a number of corpora and highlight the challenges in detail. We also publish an automatically geotagged Wikipedia corpus to alleviate the dearth of (open source) corpora in this domain.

Published In

Language Resources and Evaluation Volume 52, Issue 2

June 2018

304 pages

ISSN:1574-020X

Copyright © 2018 Springer Science+Business Media B.V., part of Springer Nature.

Publisher

Springer-Verlag

Berlin, Heidelberg

Cited By

Mai G, Huang W, Sun J, Song S, Mishra D, Liu N, Gao S, Liu T, Cong G, Hu Y, Cundy C, Li Z, Zhu R, Lao N (2024). On the Opportunities and Challenges of Foundation Models for GeoAI (Vision Paper). ACM Transactions on Spatial Algorithms and Systems, 10:2, pp. 1-46. Online publication date: 1-Jul-2024.
https://dl.acm.org/doi/10.1145/3653070
Hu X, Purves R, Moncla L, Kersten J, Stock K (2024). 2nd International Workshop on Geographic Information Extraction from Texts (GeoExT 2024). Advances in Information Retrieval, pp. 437-441. Online publication date: 24-Mar-2024.
https://dl.acm.org/doi/10.1007/978-3-031-56069-9_60
Hu X, Zhou Z, Li H, Hu Y, Gu F, Kersten J, Fan H, Klan F (2023). Location Reference Recognition from Texts: A Survey and Comparison. ACM Computing Surveys, 56:5, pp. 1-37. Online publication date: 27-Nov-2023.
https://dl.acm.org/doi/10.1145/3625819
Hu X, Hu Y, Resch B, Kersten J (2023). Geographic Information Extraction from Texts (GeoExT). Advances in Information Retrieval, pp. 398-404. Online publication date: 2-Apr-2023.
https://dl.acm.org/doi/10.1007/978-3-031-28241-6_44
Bauman T, Omer I, Dalyot S (2022). Constructing Place Representations from Human-Generated Descriptions in Hebrew. Web and Wireless Geographical Information Systems, pp. 51-60. Online publication date: 28-Apr-2022.
https://dl.acm.org/doi/10.1007/978-3-031-06245-2_5
Fize J, Roche M, Teisseire M (2020). Could spatial features help the matching of textual data? Intelligent Data Analysis, 24:5, pp. 1043-1064. Online publication date: 1-Jan-2020.
https://dl.acm.org/doi/10.3233/IDA-194749
Bartlett R, Purves R, Jones C (2019). Local geographic information storing and querying using Elasticsearch. Proceedings of the 13th Workshop on Geographic Information Retrieval, pp. 1-4. Online publication date: 28-Nov-2019.
https://dl.acm.org/doi/10.1145/3371140.3371144
Wang J, Hu Y (2019). Are we there yet? Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Geospatial Humanities, pp. 1-6. Online publication date: 5-Nov-2019.
https://dl.acm.org/doi/10.1145/3356991.3365470
Cardoso A, Martins B, Estima J (2019). Using Recurrent Neural Networks for Toponym Resolution in Text. Progress in Artificial Intelligence, pp. 769-780. Online publication date: 3-Sep-2019.
https://dl.acm.org/doi/10.1007/978-3-030-30244-3_63
Hu Y (2018). EUPEG. Proceedings of the 12th Workshop on Geographic Information Retrieval, pp. 1-2. Online publication date: 6-Nov-2018.
https://dl.acm.org/doi/10.1145/3281354.3281357
