DOI: 10.1145/3149858.3149865
research-article
Open access

A deeply annotated testbed for geographical text analysis: The Corpus of Lake District Writing

Published: 07 November 2017

Abstract

This paper describes the development of an annotated corpus that forms a challenging testbed for geographical text analysis methods. This dataset, the Corpus of Lake District Writing (CLDW), consists of 80 manually digitised and annotated texts (comprising over 1.5 million word tokens). The texts were originally composed between 1622 and 1900, and they represent a range of genres and authors. Collectively, the texts in the CLDW constitute an indicative sample of writing about the English Lake District between the early seventeenth century and the early twentieth century. The corpus is annotated more deeply than is currently possible with vanilla Named Entity Recognition, Disambiguation and geoparsing. This is especially true of the geographical information the corpus contains, since we have undertaken not only to link different historical and spelling variants of place-names, but also to identify and differentiate geographical features such as waterfalls, woodlands, farms or inns. In addition, we illustrate the potential of the corpus as a gold standard by evaluating the results of three different NLP libraries and geoparsers on its contents. In this evaluation, standard NER processing of the texts by the different NLP libraries produces many false positives and false negatives, which underlines the value of the gold standard as a testbed.
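As a rough illustration of what evaluating NER output against a gold standard involves, the following minimal Python sketch counts false positives and false negatives by exact span matching and derives precision, recall and F1 from them. It is not the evaluation procedure used in the paper: the `Span` type, the `evaluate` function, the labels and the toy offsets are all hypothetical and invented for this example, and none of the data is drawn from the CLDW.

```python
from typing import Dict, NamedTuple, Set


class Span(NamedTuple):
    """Hypothetical annotation: character offsets plus an entity label."""
    start: int
    end: int
    label: str


def evaluate(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Score predicted spans against gold spans using exact matching.

    A prediction counts as correct only if its offsets and its label
    are both identical to a gold annotation.
    """
    true_pos = len(gold & predicted)
    false_pos = len(predicted - gold)   # predicted, but not in the gold standard
    false_neg = len(gold - predicted)   # in the gold standard, but missed
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "tp": true_pos,
        "fp": false_pos,
        "fn": false_neg,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data invented for illustration; not taken from the CLDW.
gold = {Span(0, 10, "PLACE"), Span(25, 33, "PLACE"), Span(40, 52, "PLACE")}
predicted = {
    Span(0, 10, "PLACE"),    # correct
    Span(25, 33, "PERSON"),  # right offsets, wrong label
    Span(60, 66, "PLACE"),   # no matching gold annotation
}

scores = evaluate(gold, predicted)
print(scores)
```

Exact matching is the strictest choice: a place-name tagged with the right offsets but the wrong label counts as both a false positive and a false negative. A more lenient scheme, such as partial overlap or label-agnostic matching, would yield different counts.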

    Published In

    GeoHumanities '17: Proceedings of the 1st ACM SIGSPATIAL Workshop on Geospatial Humanities
    November 2017
    60 pages
    ISBN:9781450354967
    DOI:10.1145/3149858
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. corpus
    2. onomastics
    3. spatial humanities
    4. toponyms

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SIGSPATIAL'17
    Acceptance Rates

    Overall Acceptance Rate 15 of 21 submissions, 71%

    Cited By

    • (2024) Exploring Qualitative Geographies in Large Volumes of Digital Text: Placing Tourists, Travelers, and Inhabitants in the English Lake District. Annals of the American Association of Geographers 114:9 (1985-2009). DOI: 10.1080/24694452.2024.2369593. Online publication date: 15-Jul-2024
    • (2024) ‘Spatializing’ Travel Narratives in the Belgrade Forest Project: Grounded Methods and Reflexive Strategies for Interdisciplinary Collaboration. Journal of Map & Geography Libraries 19:1-2 (22-54). DOI: 10.1080/15420353.2024.2328208. Online publication date: 10-Apr-2024
    • (2024) A survey on geocoding: algorithms and datasets for toponym resolution. Language Resources and Evaluation. DOI: 10.1007/s10579-024-09730-2. Online publication date: 10-Jun-2024
    • (2023) Exploring explorers, travellers, and tourists: digital humanities approaches in North America and the United Kingdom. Mondes du tourisme. DOI: 10.4000/tourisme.6376. Online publication date: 15-Dec-2023
    • (2023) Towards an Extensible Framework for Understanding Spatial Narratives. Proceedings of the 7th ACM SIGSPATIAL International Workshop on Geospatial Humanities (1-10). DOI: 10.1145/3615887.3627761. Online publication date: 13-Nov-2023
    • (2022) Deep mapping middletown. Proceedings of the 6th ACM SIGSPATIAL International Workshop on Geospatial Humanities (28-31). DOI: 10.1145/3557919.3565815. Online publication date: 1-Nov-2022
    • (2022) The Value of Diary Writing. Everyday Mobilities in Nineteenth- and Twentieth-Century British Diaries (21-53). DOI: 10.1007/978-3-031-12684-0_2. Online publication date: 20-Oct-2022
    • (2021) Deep Learning for Toponym Resolution: Geocoding Based on Pairs of Toponyms. ISPRS International Journal of Geo-Information 10:12 (818). DOI: 10.3390/ijgi10120818. Online publication date: 2-Dec-2021
    • (2020) Semantically geo-annotating an ancient Greek "travel guide" Itineraries, Chronotopes, Networks, and Linked Data. Proceedings of the 4th ACM SIGSPATIAL Workshop on Geospatial Humanities (1-9). DOI: 10.1145/3423337.3429433. Online publication date: 3-Nov-2020
    • (2019) Places of the Holocaust: Towards a model of GIS of place. Transactions in GIS 24:4 (842-857). DOI: 10.1111/tgis.12583. Online publication date: 30-Sep-2019
