Article (DOI: 10.1145/1142473.1142599)

Record linkage: similarity measures and algorithms

Published: 27 June 2006
    Abstract

    This tutorial provides a comprehensive and cohesive overview of the key research results in the area of record linkage methodologies and algorithms for identifying approximate duplicate records, and available tools for this purpose. It encompasses techniques introduced in several communities including databases, information retrieval, statistics and machine learning. It aims to identify similarities and differences across the techniques as well as their merits and limitations.

      Published In

      SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data
      June 2006
      830 pages
      ISBN:1595934340
      DOI:10.1145/1142473

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. approximate join
      2. data quality

      Qualifiers

      • Article

      Conference

      SIGMOD/PODS06

      Acceptance Rates

      Overall Acceptance Rate 785 of 4,003 submissions, 20%

      Cited By

      • (2023) Authenticating q-Gram-Based Similarity Search Results for Outsourced String Databases. Mathematics 11:9 (2128). DOI: 10.3390/math11092128. Online publication date: 1-May-2023
      • (2023) Análisis de calidad de los datos en las estadísticas públicas y privadas, ante la implementación del Big Data. Ciencias Administrativas (119). DOI: 10.24215/23143738e119. Online publication date: 14-Mar-2023
      • (2023) Entity alignment via graph neural networks: a component-level study. World Wide Web 26:6 (4069-4092). DOI: 10.1007/s11280-023-01221-8. Online publication date: 29-Nov-2023
      • (2023) Introduction to Entity Alignment. Entity Alignment (3-13). DOI: 10.1007/978-981-99-4250-3_1. Online publication date: 26-Oct-2023
      • (2022) Entity Resolution Algorithm Based on Locality Sensitive Hash and Fuzzy Join. Hans Journal of Data Mining 12:03 (280-296). DOI: 10.12677/HJDM.2022.123028. Online publication date: 2022
      • (2022) Saga: A Platform for Continuous Construction and Serving of Knowledge at Scale. Proceedings of the 2022 International Conference on Management of Data (2259-2272). DOI: 10.1145/3514221.3526049. Online publication date: 10-Jun-2022
      • (2022) Machine Learning and Data Cleaning: Which Serves the Other? Journal of Data and Information Quality 14:3 (1-11). DOI: 10.1145/3506712. Online publication date: 21-Jul-2022
      • (2022) Diversified Subgraph Query Generation with Group Fairness. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (686-694). DOI: 10.1145/3488560.3498525. Online publication date: 11-Feb-2022
      • (2022) Subgraph Query Generation with Fairness and Diversity Constraints. 2022 IEEE 38th International Conference on Data Engineering (ICDE) (3106-3118). DOI: 10.1109/ICDE53745.2022.00278. Online publication date: May-2022
      • (2022) Cross-Lingual Text Reuse Detection at sentence level for English–Urdu language pair. Computer Speech and Language 75:C. DOI: 10.1016/j.csl.2022.101381. Online publication date: 1-Sep-2022
