research-article
Public Access

A Declarative Framework for Linking Entities

Published: 18 July 2016

Abstract

We introduce and develop a declarative framework for entity linking and, in particular, for entity resolution. As in some earlier approaches, our framework is based on a systematic use of constraints. However, the constraints we adopt are link-to-source constraints, unlike in earlier approaches where source-to-link constraints were used to dictate how to generate links. Our approach makes it possible to focus entirely on the intended properties of the outcome of entity linking, thus separating the constraints from any procedure of how to achieve that outcome. The core language consists of link-to-source constraints that specify the desired properties of a link relation in terms of source relations and built-in predicates such as similarity measures. A key feature of the link-to-source constraints is that they employ disjunction, which enables the declarative listing of all the reasons two entities should be linked. We also consider extensions of the core language that capture collective entity resolution by allowing interdependencies among the link relations.
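To make the idea of a disjunctive link-to-source constraint concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the framework's concrete syntax. The record fields, the three disjuncts, the 0.5 Jaccard threshold, and the helper names (`jaccard`, `reasons_for`, `satisfies_constraint`) are all hypothetical. The sketch encodes one constraint of the shape L(x, y) → r1 ∨ r2 ∨ r3: a link is admissible only if at least one disjunct ("reason") holds for the pair, and the list of satisfied disjuncts is what a reason-counting semantics would count.

```python
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, str]
Link = Tuple[str, str]

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity; a stand-in for a built-in similarity predicate."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

# Hypothetical disjuncts of one link-to-source constraint
#   L(x, y) -> same_name(x, y) OR similar_name_same_city(x, y) OR same_tax_id(x, y)
# Each disjunct is one "reason" that can justify the link L(x, y).
REASONS: List[Tuple[str, Callable[[Record, Record], bool]]] = [
    ("same_name", lambda x, y: x["name"].lower() == y["name"].lower()),
    ("similar_name_same_city",
     lambda x, y: jaccard(x["name"], y["name"]) >= 0.5 and x["city"] == y["city"]),
    ("same_tax_id",
     lambda x, y: x.get("tax_id") is not None and x.get("tax_id") == y.get("tax_id")),
]

def reasons_for(x: Record, y: Record) -> List[str]:
    """Names of the disjuncts that hold for the pair (x, y)."""
    return [name for name, holds in REASONS if holds(x, y)]

def satisfies_constraint(links: Set[Link], left: Dict[str, Record],
                         right: Dict[str, Record]) -> bool:
    """A link relation satisfies the constraint iff every link has at least one reason."""
    return all(reasons_for(left[a], right[b]) for a, b in links)

# Hypothetical source relations (made-up records).
left = {
    "a1": {"name": "Acme Holdings Inc", "city": "Springfield", "tax_id": "T-001"},
    "a2": {"name": "Globex", "city": "Shelbyville"},
}
right = {
    "b1": {"name": "Acme Holdings", "city": "Springfield", "tax_id": "T-001"},
    "b2": {"name": "Globex", "city": "Capital City"},
}

print(reasons_for(left["a1"], right["b1"]))  # ['similar_name_same_city', 'same_tax_id']
print(satisfies_constraint({("a1", "b1"), ("a2", "b2")}, left, right))  # True
print(satisfies_constraint({("a1", "b2")}, left, right))  # False
```

The constraint only restricts which links may appear; it never says how to produce them. That mirrors the separation between the intended outcome and any linking procedure that the abstract describes.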
We identify a class of “good” solutions for entity-linking specifications, which we call maximum-value solutions and which capture the strength of a link by counting the reasons that justify it. We study natural algorithmic problems associated with these solutions, including the problem of enumerating the “good” solutions and the problem of finding the certain links, which are the links that appear in every “good” solution. We show that these problems are tractable for the core language but may become intractable once we allow interdependencies among the link relations. We also make some surprising connections between our declarative framework, which is deterministic, and probabilistic approaches such as ones based on Markov Logic Networks.
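The notions of maximum-value solutions and certain links can be illustrated with a small brute-force sketch in Python. It rests on several assumptions that do not come from the abstract. First, each candidate link carries a weight equal to its number of satisfied reasons. Second, the value of a solution is the sum of those weights. Third, a hypothetical one-to-one matching constraint holds, so each entity appears in at most one link. The helper names (`is_one_to_one`, `max_value_solutions`, `certain_links`) are illustrative. The search is exponential and only makes the definitions concrete; it is not the polynomial-time procedure that the abstract says exists for the core language.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Link = Tuple[str, str]

def is_one_to_one(links) -> bool:
    """Assumed matching constraint: each entity on either side appears in at most one link."""
    lefts = [a for a, _ in links]
    rights = [b for _, b in links]
    return len(set(lefts)) == len(lefts) and len(set(rights)) == len(rights)

def max_value_solutions(weights: Dict[Link, int]) -> Tuple[int, List[FrozenSet[Link]]]:
    """Enumerate every one-to-one set of justified candidate links and keep those whose
    total value (sum of reason counts) is maximal. Exponential; for illustration only."""
    candidates = [link for link, w in weights.items() if w > 0]  # links with >= 1 reason
    best, solutions = -1, []
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            if not is_one_to_one(subset):
                continue
            value = sum(weights[link] for link in subset)
            if value > best:
                best, solutions = value, [frozenset(subset)]
            elif value == best:
                solutions.append(frozenset(subset))
    return best, solutions

def certain_links(solutions: List[FrozenSet[Link]]) -> FrozenSet[Link]:
    """Links that appear in every maximum-value solution."""
    return frozenset.intersection(*solutions) if solutions else frozenset()

# Hypothetical reason counts for candidate links.
weights = {
    ("a1", "b1"): 1, ("a1", "b2"): 1,
    ("a2", "b1"): 1, ("a2", "b2"): 1,
    ("a3", "b3"): 2,
}
best, sols = max_value_solutions(weights)
print(best)                 # 4
print(len(sols))            # 2
print(certain_links(sols))  # frozenset({('a3', 'b3')})
```

In this toy instance, the two ways of pairing a1 and a2 with b1 and b2 tie, so neither pairing is certain, while ("a3", "b3") appears in both maximum-value solutions. Because all weights are positive, adding a compatible link always raises the value, so every maximum-value solution here is also maximal under the assumed constraint. Under the same one-to-one assumption, finding one maximum-value solution amounts to maximum-weight bipartite matching. The sketch covers a single link relation only and says nothing about the collective (interdependent) case.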





Published In

ACM Transactions on Database Systems, Volume 41, Issue 3
August 2016
247 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/2966276

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 July 2016
Accepted: 01 February 2016
Revised: 01 December 2015
Received: 01 August 2015
Published in TODS Volume 41, Issue 3


Author Tags

  1. Entity linking
  2. Markov logic networks
  3. certain links
  4. constraints
  5. maximum-probability worlds
  6. maximum-value solutions

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • NSF


Article Metrics

  • Downloads (Last 12 months): 77
  • Downloads (Last 6 weeks): 13
Reflects downloads up to 30 Aug 2024.


Cited By

  • (2023) Combining global and local merges in logic-based entity resolution. Proceedings of the 20th International Conference on Principles of Knowledge Representation and Reasoning, 742-746. DOI: 10.24963/kr.2023/74. Online publication date: 2-Sep-2023.
  • (2023) A framework for combining entity resolution and query answering in knowledge bases. Proceedings of the 20th International Conference on Principles of Knowledge Representation and Reasoning, 229-239. DOI: 10.24963/kr.2023/23. Online publication date: 2-Sep-2023.
  • (2023) REPLACE. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 3132-3139. DOI: 10.24963/ijcai.2023/349. Online publication date: 19-Aug-2023.
  • (2023) Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V. Proceedings of the VLDB Endowment 16(6), 1587-1600. DOI: 10.14778/3583140.3583169. Online publication date: 20-Apr-2023.
  • (2022) LACE: A Logical Approach to Collective Entity Resolution. Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 379-391. DOI: 10.1145/3517804.3526233. Online publication date: 12-Jun-2022.
  • (2022) Saga: A Platform for Continuous Construction and Serving of Knowledge at Scale. Proceedings of the 2022 International Conference on Management of Data, 2259-2272. DOI: 10.1145/3514221.3526049. Online publication date: 10-Jun-2022.
  • (2021) Real-Time Entity Resolution by Forest-Based Indexing in Database Systems with Vertical Fragmentations. Proceedings of the 5th International Conference on Computer Science and Application Engineering, 1-5. DOI: 10.1145/3487075.3487142. Online publication date: 19-Oct-2021.
  • (2021) Deep Entity Matching. Journal of Data and Information Quality 13(1), 1-17. DOI: 10.1145/3431816. Online publication date: 6-Jan-2021.
  • (2020) Learning Over Dirty Data Without Cleaning. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 1301-1316. DOI: 10.1145/3318464.3389708. Online publication date: 11-Jun-2020.
  • (2020) A collective entity linking algorithm with parallel computing on large-scale knowledge base. The Journal of Supercomputing 76(2), 948-963. DOI: 10.1007/s11227-019-03046-7. Online publication date: 1-Feb-2020.
