Deep Entity Matching: Challenges and Opportunities

Published: 06 January 2021


Entity matching refers to the task of determining whether two different representations refer to the same real-world entity. It continues to be a prevalent problem for many organizations where data resides in different sources and duplicates need to be identified and managed. The term “entity matching” also loosely refers to the broader problem of determining whether two heterogeneous representations of different entities should be associated with each other. This broader problem has an even wider scope of applications, from determining the subsidiaries of companies to matching jobs to job seekers, and its outcomes can have significant consequences.
In this article, we first report on our recent system DITTO, an example of a modern entity matching system based on pre-trained language models. Then, we summarize recent solutions that apply deep learning and pre-trained language models to the entity matching task. Finally, we discuss research directions beyond entity matching, including the promise of synergistically integrating the blocking and entity matching steps, the need to examine methods that alleviate the steep training data requirements typical of deep learning and pre-trained language models, and the importance of generalizing entity matching solutions to handle the broader entity matching problem, which leads to an even more pressing need to explain matching outcomes.
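To make the task concrete, the following is a minimal sketch of a two-step pipeline in which a blocking step narrows the candidate pairs and a matching step then decides each pair. It is not DITTO's method: a simple Jaccard similarity over tokens stands in for the pre-trained language model, and the function names, the blocking key (first token of a `name` attribute), and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations


def tokens(record):
    """Lowercased word tokens drawn from all attribute values of a record."""
    return {tok for value in record.values() for tok in str(value).lower().split()}


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def block_key(record):
    """Hypothetical blocking key: first token of the 'name' attribute."""
    name = str(record.get("name", "")).lower().split()
    return name[0] if name else ""


def match_pairs(records, threshold=0.5):
    """Return index pairs that share a block and meet the similarity threshold."""
    # Blocking: group record indices by key so only same-block pairs are compared.
    blocks = {}
    for idx, rec in enumerate(records):
        blocks.setdefault(block_key(rec), []).append(idx)
    # Matching: a stand-in similarity test (assumption, not a learned model).
    matches = []
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if jaccard(tokens(records[i]), tokens(records[j])) >= threshold:
                matches.append((i, j))
    return matches


# Illustrative records, invented for this sketch.
records = [
    {"name": "Acme Corp", "city": "New York"},
    {"name": "Acme Corp.", "city": "New York"},
    {"name": "Globex Inc", "city": "Springfield"},
]
```

Calling `match_pairs(records)` compares only the two "Acme" records, since the third falls into a different block. A learned matcher would replace the `jaccard` test while leaving the blocking structure unchanged, which is one reason the two steps are often treated separately.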




March 2021


Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 January 2021
Accepted: 01 October 2020
Revised: 01 October 2020
Received: 01 October 2020
Published in JDIQ Volume 13, Issue 1


Author Tags

  1. Entity matching
  2. data integration
  3. deep learning
  4. entity resolution
  5. pre-trained language models


