research-article

Deep learning for blocking in entity matching: a design space exploration

Authors:

Saravanan Thirumuruganathan,

Mourad Ouzzani,

AnHai Doan

Proceedings of the VLDB Endowment, Volume 14, Issue 11

Pages 2459 - 2472

https://doi.org/10.14778/3476249.3476294

Published: 01 July 2021

Abstract

Entity matching (EM) finds data instances that refer to the same real-world entity. Most EM solutions perform blocking and then matching. Many works have applied deep learning (DL) to matching, but far fewer have applied DL to blocking. These blocking works are also limited in that they consider only a simple form of DL, and some of them require labeled training data. In this paper, we develop the DeepBlocker framework, which significantly advances the state of the art in applying DL to blocking for EM. We first define a large space of DL solutions for blocking, which contains solutions of varying complexity and subsumes most previous works. Next, we develop eight representative solutions in this space. These solutions do not require labeled training data and exploit recent advances in DL (e.g., sequence modeling, transformers, self-supervision). We empirically determine which solutions perform best on which kinds of datasets (structured, textual, or dirty). We show that the best of these eight solutions outperform the best existing DL solution and the best existing non-DL solutions (including a state-of-the-art industrial non-DL solution) on dirty and textual data, and are comparable on structured data. Finally, we show that combining the best DL and non-DL solutions can perform even better, suggesting a new avenue for research.
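
To make the blocking step concrete, the following is a minimal sketch of one common pattern for embedding-based blocking: map each tuple to a vector, then keep, for every tuple in one table, only its k most similar tuples in the other table as candidate pairs for the matcher. It is an illustration under stated assumptions, not the DeepBlocker implementation. Character-trigram count vectors stand in for learned tuple embeddings, and the function names (`trigram_vector`, `cosine`, `top_k_block`) and the sample tuples are hypothetical.

```python
import math
from collections import Counter


def trigram_vector(text):
    """Stand-in 'embedding': sparse character-trigram counts (assumption: not a learned DL model)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_block(table_a, table_b, k=1):
    """For each tuple i in table_a, emit (i, j) for its k most similar tuples j in table_b."""
    vecs_b = [trigram_vector(t) for t in table_b]
    candidates = []
    for i, tup in enumerate(table_a):
        vec_a = trigram_vector(tup)
        ranked = sorted(
            range(len(table_b)),
            key=lambda j: cosine(vec_a, vecs_b[j]),
            reverse=True,
        )
        candidates.extend((i, j) for j in ranked[:k])
    return candidates


# Hypothetical sample data: each tuple is flattened to a single string.
table_a = ["Apple iPhone 12 64GB black", "Sony WH-1000XM4 headphones"]
table_b = [
    "sony wh1000xm4 wireless headphone",
    "iphone 12 (64 gb) - black, Apple",
    "Dell XPS 13 laptop",
]
pairs = top_k_block(table_a, table_b, k=1)
```

Only the pairs in `pairs` would be passed on to the matcher, instead of all |A| x |B| combinations. In a DL-based blocker, the trigram vectors would be replaced by learned tuple embeddings, and the exhaustive scan by a nearest-neighbor index.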

Index Terms

Deep learning for blocking in entity matching: a design space exploration
1. Computing methodologies
  1. Artificial intelligence
  2. Machine learning
    1. Machine learning approaches
2. Information systems
  1. Information systems applications

Published In

Proceedings of the VLDB Endowment, Volume 14, Issue 11

July 2021, 732 pages

ISSN: 2150-8097

Editors: Xin Luna Dong (Amazon), Felix Naumann (HPI, University of Potsdam)

Publisher: VLDB Endowment
