What's missing in geographical parsing?

Published: 01 June 2018

Abstract

Geographical data can be obtained by converting place names in free-format text into geographical coordinates. The ability to geo-locate events in textual reports is a valuable source of information in many real-world applications, such as emergency response, real-time geographical event analysis of social media, and understanding location instructions in auto-response systems. However, geoparsing is still widely regarded as a challenge because of domain language diversity, place name ambiguity, metonymic language and limited leveraging of context, as we show in our analysis. Results to date, whilst promising, are reported on laboratory data and, unlike in wider NLP, are often not cross-compared. In this study, we evaluate and analyse the performance of several leading geoparsers on a number of corpora and highlight the challenges in detail. We also publish an automatically geotagged Wikipedia corpus to alleviate the dearth of (open source) corpora in this domain.
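
To make the task concrete, the following is a minimal sketch of the two steps the abstract describes: finding place names in text, then converting them to coordinates. It is not the method of any geoparser evaluated in the study. The toy gazetteer, its rough coordinates and population figures, and the names `Place`, `geotag`, `resolve` and `geoparse` are all hypothetical, and the "pick the most populous candidate" rule is only one simple baseline assumed here for illustration.

```python
import re
from typing import NamedTuple


class Place(NamedTuple):
    name: str
    region: str
    lat: float
    lon: float
    population: int


# Hypothetical toy gazetteer; coordinates and populations are rough,
# illustrative values, not authoritative data.
GAZETTEER = {
    "paris": [
        Place("Paris", "France", 48.8566, 2.3522, 2_100_000),
        Place("Paris", "Texas, USA", 33.6609, -95.5555, 25_000),
    ],
    "london": [
        Place("London", "UK", 51.5074, -0.1278, 8_900_000),
        Place("London", "Ontario, Canada", 42.9849, -81.2453, 400_000),
    ],
}


def geotag(text):
    """Step 1: return (toponym, character offset) for each capitalised
    token that appears in the gazetteer."""
    return [
        (m.group(0), m.start())
        for m in re.finditer(r"\b[A-Z][a-z]+\b", text)
        if m.group(0).lower() in GAZETTEER
    ]


def resolve(toponym):
    """Step 2: choose one candidate per toponym. This assumed baseline
    ignores context and simply prefers the most populous place."""
    return max(GAZETTEER[toponym.lower()], key=lambda p: p.population)


def geoparse(text):
    """Run both steps and return (toponym, offset, resolved Place)."""
    return [(name, offset, resolve(name)) for name, offset in geotag(text)]


if __name__ == "__main__":
    for name, offset, place in geoparse("A tornado struck Paris, Texas."):
        print(name, offset, place.region, place.lat, place.lon)
```

Because the baseline ignores context, it resolves "Paris, Texas" to Paris, France. This is a small instance of the place name ambiguity and limited leveraging of context that the abstract identifies as open challenges.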

Published In

Language Resources and Evaluation, Volume 52, Issue 2
June 2018
304 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Geocoding
  2. Geoparsing
  3. Geotagging
  4. NED
  5. NEL
  6. NER
  7. NLP

Qualifiers

  • Article

Cited By

  • (2024) On the Opportunities and Challenges of Foundation Models for GeoAI (Vision Paper). ACM Transactions on Spatial Algorithms and Systems, 10:2 (1-46). DOI: 10.1145/3653070. Online publication date: 1-Jul-2024
  • (2024) 2nd International Workshop on Geographic Information Extraction from Texts (GeoExT 2024). Advances in Information Retrieval (437-441). DOI: 10.1007/978-3-031-56069-9_60. Online publication date: 24-Mar-2024
  • (2023) Location Reference Recognition from Texts: A Survey and Comparison. ACM Computing Surveys, 56:5 (1-37). DOI: 10.1145/3625819. Online publication date: 27-Nov-2023
  • (2023) Geographic Information Extraction from Texts (GeoExT). Advances in Information Retrieval (398-404). DOI: 10.1007/978-3-031-28241-6_44. Online publication date: 2-Apr-2023
  • (2022) Constructing Place Representations from Human-Generated Descriptions in Hebrew. Web and Wireless Geographical Information Systems (51-60). DOI: 10.1007/978-3-031-06245-2_5. Online publication date: 28-Apr-2022
  • (2020) Could spatial features help the matching of textual data? Intelligent Data Analysis, 24:5 (1043-1064). DOI: 10.3233/IDA-194749. Online publication date: 1-Jan-2020
  • (2019) Local geographic information storing and querying using Elasticsearch. Proceedings of the 13th Workshop on Geographic Information Retrieval (1-4). DOI: 10.1145/3371140.3371144. Online publication date: 28-Nov-2019
  • (2019) Are we there yet? Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Geospatial Humanities (1-6). DOI: 10.1145/3356991.3365470. Online publication date: 5-Nov-2019
  • (2019) Using Recurrent Neural Networks for Toponym Resolution in Text. Progress in Artificial Intelligence (769-780). DOI: 10.1007/978-3-030-30244-3_63. Online publication date: 3-Sep-2019
  • (2018) EUPEG. Proceedings of the 12th Workshop on Geographic Information Retrieval (1-2). DOI: 10.1145/3281354.3281357. Online publication date: 6-Nov-2018
