DOI:10.1145/3589132.3625579
short-paper

The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps

Published: 22 December 2023
    Abstract

    Scanned historical maps in libraries and archives are valuable repositories of geographic data that often do not exist elsewhere. Despite the potential of machine learning tools like the Google Vision APIs for automatically transcribing text from these maps into machine-readable formats, they do not work well with large-sized images (e.g., high-resolution scanned documents), cannot infer the relation between the recognized text and other datasets, and are challenging to integrate with post-processing tools. This paper introduces the mapKurator system, an end-to-end system integrating machine learning models with a comprehensive data processing pipeline. mapKurator empowers automated extraction, post-processing, and linkage of text labels from large numbers of large-dimension historical map scans. The output data, comprising bounding polygons and recognized text, is in the standard GeoJSON format, making it easily modifiable within Geographic Information Systems (GIS). The proposed system allows users to quickly generate valuable data from large numbers of historical maps for in-depth analysis of the map content and, in turn, encourages map findability, accessibility, interoperability, and reusability (FAIR principles). We deployed the mapKurator system and enabled the processing of over 60,000 maps and over 100 million text/place names in the David Rumsey Historical Map collection. We also demonstrated a seamless integration of mapKurator with a collaborative web platform to enable accessing automated approaches for extracting and linking text labels from historical map scans and collective work to improve the results.
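
The abstract states that the output pairs bounding polygons with recognized text in GeoJSON. The following is a minimal sketch of what such a record could look like, not the mapKurator schema: the property name `text`, the helper names `make_feature` and `make_collection`, and the sample coordinates and label are all assumptions made for illustration.

```python
import json


def make_feature(polygon, text):
    """Build one GeoJSON Feature for a single text label.

    polygon: list of (x, y) vertices of the bounding polygon.
    text: the recognized string (property name "text" is hypothetical).
    """
    ring = [list(pt) for pt in polygon]
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # a GeoJSON linear ring must be closed
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"text": text},
    }


def make_collection(labels):
    """Wrap (polygon, text) pairs in a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [make_feature(poly, text) for poly, text in labels],
    }


# Placeholder label: coordinates and text are made up for this sketch.
labels = [([(10, 10), (60, 10), (60, 25), (10, 25)], "Salem")]
collection = make_collection(labels)
print(json.dumps(collection, indent=2))
```

A file written this way can be opened as a vector layer in GIS software that reads GeoJSON, which is the kind of downstream editing the abstract describes.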

    Published In

    SIGSPATIAL '23: Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems
    November 2023
    686 pages
    ISBN:9798400701689
    DOI:10.1145/3589132

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automatic system
    2. historical maps
    3. text spotter
    4. linked data

    Qualifiers

    • Short-paper

    Conference

    SIGSPATIAL '23

    Acceptance Rates

    Overall Acceptance Rate 220 of 1,116 submissions, 20%
