DOI: 10.1145/3485447.3512026
research-article

Geospatial Entity Resolution

Published: 25 April 2022

Abstract

Geospatial databases are at the core of an ever-increasing number of services. Building and maintaining them remains challenging because information from multiple providers must be merged. Entity Resolution (ER) consists of finding entity mentions from different sources that refer to the same real-world entity. In geospatial ER, entities are often represented using different schemes and are subject to incomplete information and inaccurate locations, making ER and deduplication daunting tasks. While tremendous advances have been made in traditional entity resolution and natural language processing, geospatial data integration approaches still rely heavily on static similarity measures and human-designed rules. To link geospatial data automatically, a unified representation of entities with heterogeneous attributes and their geographical context is needed. To this end, we propose Geo-ER, a joint framework that combines Transformer-based language models, which have been successfully applied in ER, with a novel learning-based architecture that represents the geospatial character of an entity. Unlike existing solutions, Geo-ER does not rely on predefined rules and can capture information from surrounding entities to make accurate, context-based predictions. Extensive experiments on eight real-world datasets demonstrate the effectiveness of our solution over state-of-the-art methods. Moreover, Geo-ER proves robust in settings where no training data is available for a specific city.
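
For illustration only, the kind of static, rule-based matching that the abstract contrasts with Geo-ER can be sketched as below. This is not the Geo-ER method and is not taken from the paper: the record layout (`name`, `lat`, `lon`), the helper names, and both thresholds are assumptions chosen for the example. The sketch combines a haversine distance with a string-similarity ratio and declares a match only when both hand-set thresholds are met.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Static string similarity in [0, 1] on lower-cased, stripped names."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def rule_based_match(e1, e2, max_dist_m=100.0, min_name_sim=0.8):
    """Hand-designed rule: match if the entities are close AND similarly named.

    The thresholds are hypothetical; in practice they would be tuned by hand
    per dataset, which is the limitation the abstract points to.
    """
    dist = haversine_m(e1["lat"], e1["lon"], e2["lat"], e2["lon"])
    sim = name_similarity(e1["name"], e2["name"])
    return dist <= max_dist_m and sim >= min_name_sim


# Hypothetical records from two providers describing the same place.
a = {"name": "Cafe Roma", "lat": 45.7640, "lon": 4.8357}
b = {"name": "Café Roma", "lat": 45.7641, "lon": 4.8358}
# Same name, but several kilometres away.
c = {"name": "Cafe Roma", "lat": 45.8000, "lon": 4.8357}

same_place = rule_based_match(a, b)
different_place = rule_based_match(a, c)
```

A rule like this ignores neighbouring entities and depends entirely on its fixed thresholds, so it illustrates why the abstract argues for a learned representation that also uses geographical context.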


      Published In

      WWW '22: Proceedings of the ACM Web Conference 2022
      April 2022
      3764 pages
      ISBN: 9781450390965
      DOI: 10.1145/3485447
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Entity resolution
      2. geospatial data
      3. graph attention
      4. neighbourhood embedding
      5. neural networks

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      WWW '22
      WWW '22: The ACM Web Conference 2022
      April 25 - 29, 2022
      Virtual Event, Lyon, France

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

      Cited By

      • (2024) Toward Practical Entity Alignment Method Design: Insights from New Highly Heterogeneous Knowledge Graph Datasets. Proceedings of the ACM Web Conference 2024, 2325-2336. DOI: 10.1145/3589334.3645720. Online publication date: 13-May-2024
      • (2024) MultiMatch: Low-Resource Generalized Entity Matching Using Task-Conditioned Hyperadapters in Multitask Learning. Big Data Analytics and Knowledge Discovery, 51-65. DOI: 10.1007/978-3-031-68323-7_4. Online publication date: 26-Aug-2024
      • (2023) AdapterEM: Pre-trained Language Model Adaptation for Generalized Entity Matching using Adapter-tuning. Proceedings of the 27th International Database Engineered Applications Symposium, 140-147. DOI: 10.1145/3589462.3589498. Online publication date: 5-May-2023
      • (2023) Mining Geospatial Relationships from Text. Proceedings of the ACM on Management of Data 1:1, 1-26. DOI: 10.1145/3588947. Online publication date: 30-May-2023
      • (2022) PromptEM. Proceedings of the VLDB Endowment 16:2, 369-378. DOI: 10.14778/3565816.3565836. Online publication date: 1-Oct-2022
