Online entity resolution using an oracle

D Firmani, B Saha, D Srivastava - Proceedings of the VLDB Endowment, 2016 - dl.acm.org
Entity resolution (ER) is the task of identifying all records in a database that refer to the same underlying entity. This is an expensive task that can take a significant amount of money and time, so the end-user may want to make decisions during the process rather than waiting for the task to be completed. We formalize an online version of the entity resolution task and use an oracle that correctly labels matching and non-matching pairs through queries. In this setting, we design algorithms that seek to maximize progressive recall, and develop a novel analysis framework for prior proposals on entity resolution with an oracle that goes beyond their worst-case guarantees. Finally, we provide both theoretical and experimental analysis of the proposed algorithms.
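The abstract describes the setting but not the algorithms themselves. The following is a minimal Python sketch of that setting under our own assumptions, not the authors' method. The perfect oracle is simulated from a hypothetical ground-truth entity id per record. Candidate pairs carry hypothetical match scores. Recall is recorded after every oracle query, which is one plausible reading of "progressive recall". The sketch illustrates one simple strategy: query the most likely matches first, and skip any pair whose label already follows from earlier answers (a match by transitivity, or a non-match between two clusters already known to differ). The names `OracleER` and `progressive_recall` are illustrative only.

```python
from math import comb
from typing import Dict, Hashable, List, Set, Tuple


class OracleER:
    """Online ER against a perfect oracle (illustrative sketch, not the paper's algorithm)."""

    def __init__(self, truth: Dict[Hashable, int]):
        # Hypothetical ground-truth entity id per record; used only to simulate the oracle.
        self.truth = truth
        self.parent = {r: r for r in truth}
        self.size = {r: 1 for r in truth}
        # Keyed by cluster root: records known NOT to belong to that cluster.
        self.non_match: Dict[Hashable, Set[Hashable]] = {r: set() for r in truth}
        self.queries = 0

    def find(self, r: Hashable) -> Hashable:
        # Union-find lookup with path halving.
        while self.parent[r] != r:
            self.parent[r] = self.parent[self.parent[r]]
            r = self.parent[r]
        return r

    def oracle(self, u: Hashable, v: Hashable) -> bool:
        # Each call costs one query; the answer is always correct.
        self.queries += 1
        return self.truth[u] == self.truth[v]

    def resolve(self, u: Hashable, v: Hashable) -> None:
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return  # match already implied by transitivity: no query
        if any(self.find(x) == rv for x in self.non_match[ru]):
            return  # non-match already implied by an earlier answer: no query
        if self.oracle(u, v):
            if self.size[ru] < self.size[rv]:
                ru, rv = rv, ru
            self.parent[rv] = ru  # union by size
            self.size[ru] += self.size[rv]
            self.non_match[ru] |= self.non_match.pop(rv)
        else:
            # Record the negative answer on both clusters so either side can infer it.
            self.non_match[ru].add(v)
            self.non_match[rv].add(u)

    def matched_pairs(self) -> int:
        # Matching pairs recovered so far; the non_match keys are exactly the current roots.
        return sum(comb(self.size[r], 2) for r in self.non_match)


def progressive_recall(scored: List[Tuple[Hashable, Hashable, float]],
                       truth: Dict[Hashable, int]) -> List[float]:
    """Resolve pairs in descending score order; return recall after each oracle query."""
    counts: Dict[int, int] = {}
    for entity in truth.values():
        counts[entity] = counts.get(entity, 0) + 1
    total = max(sum(comb(c, 2) for c in counts.values()), 1)  # avoid division by zero
    er = OracleER(truth)
    curve: List[float] = []
    for u, v, _score in sorted(scored, key=lambda t: t[2], reverse=True):
        before = er.queries
        er.resolve(u, v)
        if er.queries > before:  # only actual queries advance the curve
            curve.append(er.matched_pairs() / total)
    return curve


if __name__ == "__main__":
    # Hypothetical records: a, b, c are one entity; d, e are another.
    truth = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2}
    # Hypothetical match scores for candidate pairs.
    scored = [("a", "b", 0.9), ("b", "c", 0.8), ("a", "c", 0.7),
              ("d", "e", 0.6), ("a", "d", 0.2), ("b", "e", 0.1)]
    print(progressive_recall(scored, truth))  # → [0.25, 0.75, 1.0, 1.0]
```

In this toy run, the pairs (a, c) and (b, e) are never sent to the oracle, because their labels are inferred from earlier answers. Four queries therefore recover all four matching pairs. An ordering that pushes recall up earlier yields a larger area under this curve. Treating that area as the objective is our assumption about how progressive recall is scored.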