Research article, CIKM '11 Conference Proceedings
DOI: 10.1145/2063576.2063825

Context-based entity description rule for entity resolution

Published: 24 October 2011

Abstract

In this paper, we consider the entity resolution (ER) problem, which is to identify objects that refer to the same real-world entity. Prior work on ER involves expensive similarity comparison and clustering approaches. Additionally, the quality of entity resolution may be low due to insufficient information. To address these problems, we present a novel ER framework, context-based entity description (CED), which exploits the context information of data objects to aid entity resolution. In our framework, each entity is described by a set of CEDs. During entity resolution, objects are compared only with CEDs to determine their corresponding entity. Additionally, we propose efficient algorithms for CED discovery and CED-based entity resolution. We experimentally evaluated our CED-based ER algorithm on real DBLP datasets, and the results show that our algorithm achieves both high precision and high recall and outperforms existing methods.
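The idea that each entity is described by a set of CEDs, and that an object is compared only with those CEDs rather than with every other object, can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes that an object is an author occurrence on a DBLP-style record, that its context is the set of co-author names, and that a CED is a set of context values which must all appear in an object's context. The discovery rule used here (a single context value that, among entities sharing a name, co-occurs with exactly one entity) is a deliberately simple stand-in for the paper's CED discovery algorithm. The names `Record`, `discover_ceds`, `resolve`, and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

# name -> entity id -> set of CEDs (hypothetical index layout)
CedIndex = Dict[str, Dict[str, Set[FrozenSet[str]]]]


@dataclass(frozen=True)
class Record:
    """An object to resolve; the schema is an assumption for illustration."""
    name: str                  # ambiguous reference, e.g. an author name
    context: FrozenSet[str]    # context values, e.g. co-author names


def discover_ceds(labeled: Iterable[Tuple[Record, str]]) -> CedIndex:
    """Simplified, hypothetical CED discovery from labeled records.

    A context value becomes a one-item CED for an entity when, among the
    entities sharing the same name, it co-occurs with that entity only.
    """
    owners: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for record, entity_id in labeled:
        for value in record.context:
            owners[(record.name, value)].add(entity_id)

    index = defaultdict(lambda: defaultdict(set))
    for (name, value), entity_ids in owners.items():
        if len(entity_ids) == 1:           # discriminative for one entity
            (entity_id,) = entity_ids
            index[name][entity_id].add(frozenset({value}))
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {name: dict(by_entity) for name, by_entity in index.items()}


def resolve(record: Record, index: CedIndex) -> Optional[str]:
    """Compare a record only with the CEDs of same-name entities.

    Returns the single entity whose CED is contained in the record's
    context, or None when no CED matches or several entities match.
    """
    hits = {
        entity_id
        for entity_id, entity_ceds in index.get(record.name, {}).items()
        if any(ced <= record.context for ced in entity_ceds)
    }
    return next(iter(hits)) if len(hits) == 1 else None


# Hypothetical training data: two different people named "J. Smith".
training = [
    (Record("J. Smith", frozenset({"A. Lee", "B. Chen"})), "smith-db"),
    (Record("J. Smith", frozenset({"A. Lee"})), "smith-db"),
    (Record("J. Smith", frozenset({"C. Diaz", "B. Chen"})), "smith-bio"),
]
index = discover_ceds(training)

print(resolve(Record("J. Smith", frozenset({"A. Lee", "E. Wu"})), index))  # smith-db
print(resolve(Record("J. Smith", frozenset({"B. Chen"})), index))          # None
```

In this toy setting "B. Chen" co-occurs with both entities, so it yields no CED and a record whose only context is "B. Chen" stays unresolved. Returning `None` for unmatched or ambiguous records trades some recall for precision; how the paper itself handles such cases is not stated in the abstract.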

      Published In

      CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
      October 2011
      2712 pages
      ISBN:9781450307178
      DOI:10.1145/2063576
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. context-based
      2. data cleaning
      3. entity resolution

      Conference

      CIKM '11

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 3
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 20 Feb 2025

      Cited By

      • (2021) GNEM: A Generic One-to-Set Neural Entity Matching Framework. Proceedings of the Web Conference 2021, pp. 1686-1694. DOI: 10.1145/3442381.3450119. Online publication date: 19-Apr-2021
      • (2020) A node resistance-based probability model for resolving duplicate named entities. Scientometrics. DOI: 10.1007/s11192-020-03585-4. Online publication date: 13-Jul-2020
      • (2018) Emergence of Big Data Research in Operations Management, Information Systems, and Healthcare: Past Contributions and Future Roadmap. Production and Operations Management 27(9), pp. 1724-1735. DOI: 10.1111/poms.12833. Online publication date: 1-Sep-2018
      • (2018) An effective weighted rule-based method for entity resolution. Distributed and Parallel Databases 36(3), pp. 593-612. DOI: 10.1007/s10619-018-7240-6. Online publication date: 1-Sep-2018
      • (2017) Entity reconciliation in big data sources. Expert Systems with Applications: An International Journal 80(C), pp. 14-27. DOI: 10.1016/j.eswa.2017.03.010. Online publication date: 1-Sep-2017
      • (2015) Rule-Based Method for Entity Resolution. IEEE Transactions on Knowledge and Data Engineering 27(1), pp. 250-263. DOI: 10.1109/TKDE.2014.2320713. Online publication date: 1-Jan-2015
