A Computer Vision Approach for Detecting Discrepancies in Map Textual Labels

Authors:

Abdulrahman Salama,

Mahmoud Elkamhawy,

Abdeltawab Hendawi,

Vashutosh Agrawal,

Ravi Prakash

SSDBM '23: Proceedings of the 35th International Conference on Scientific and Statistical Database Management

Article No.: 12, Pages 1 - 9

https://doi.org/10.1145/3603719.3603722

Published: 27 August 2023

Abstract

Maps provide many kinds of information. An important example is textual labels such as the names of cities, neighborhoods, and streets. Although we treat this information as fact, and despite the massive effort providers make to continuously improve its accuracy, the data is far from perfect. Discrepancies in the textual labels rendered on the map are one of the major sources of inconsistency across map providers. These discrepancies can significantly affect the reliability of derived information and of decision-making processes. Thus, it is important to validate the accuracy and consistency of such data. Most providers treat this data as proprietary and do not make it available to the public, so we cannot compare the data directly. To address these challenges, we introduce a novel computer vision-based approach that automatically extracts labels and classifies them by their visual characteristics, which indicate each label's category under the formatting conventions of the specific map provider. Based on the extracted data, we measure the degree of discrepancy across map providers. We consider three map providers: Bing Maps, Google Maps, and OpenStreetMap. The neural network we develop classifies the text labels with an accuracy of up to 93% across all providers. We use our system to analyze randomly selected regions in four markets: the USA, Germany, France, and Brazil. Experimental results and statistical analysis reveal the amount of discrepancy across map providers per region. We calculate the Jaccard distance between the extracted text sets for each pair of map providers, which represents the discrepancy percentage. Discrepancy percentages as high as 90% were found in some markets.
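The Jaccard distance mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the case and whitespace normalization step, and the sample labels are all assumptions made for illustration. The distance is computed as 1 - |A ∩ B| / |A ∪ B| over the two label sets.

```python
def normalize(label: str) -> str:
    """Lowercase and collapse whitespace (an assumed preprocessing step)."""
    return " ".join(label.lower().split())


def jaccard_distance(a, b) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of labels."""
    set_a = {normalize(x) for x in a}
    set_b = {normalize(x) for x in b}
    union = set_a | set_b
    if not union:
        return 0.0  # two empty label sets are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


# Hypothetical labels extracted from the same region for two providers.
provider_a = ["Main Street", "Oak Avenue", "Riverside", "Elm Street"]
provider_b = ["main street", "Oak Avenue", "Hillcrest"]

distance = jaccard_distance(provider_a, provider_b)
print(f"Discrepancy: {distance:.0%}")  # 2 shared labels out of 5 distinct -> 60%
```

A distance of 0 means both providers show exactly the same labels for the region, and a distance of 1 means they share none.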

Index Terms

A Computer Vision Approach for Detecting Discrepancies in Map Textual Labels
1. Computing methodologies
  1. Artificial intelligence
  2. Machine learning
2. Information systems
  1. Information systems applications

Index terms have been assigned to the content through auto-classification.

Information

Published In

SSDBM '23: Proceedings of the 35th International Conference on Scientific and Statistical Database Management

July 2023

232 pages

ISBN: 9798400707469

DOI: 10.1145/3603719

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 August 2023

Qualifiers

Research-article
Research
Refereed limited

Conference

SSDBM 2023

SSDBM 2023: 35th International Conference on Scientific and Statistical Database Management

July 10 - 12, 2023

Los Angeles, CA, USA

Acceptance Rates

Overall Acceptance Rate 56 of 146 submissions, 38%
