Article
DOI: 10.1145/1142473.1142599

Record linkage: similarity measures and algorithms

Published: 27 June 2006

Abstract

This tutorial provides a comprehensive and cohesive overview of the key research results on record linkage methodologies and algorithms for identifying approximate duplicate records, as well as the tools available for this purpose. It covers techniques introduced in several communities, including databases, information retrieval, statistics, and machine learning, and aims to identify the similarities and differences across these techniques, along with their merits and limitations.
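To make "similarity measures" and "approximate duplicates" concrete, here is a minimal sketch of one common approach: q-gram Jaccard similarity, with an inverted q-gram index used to generate candidate pairs (an approximate self-join). This is an illustration under stated assumptions, not the tutorial's own method: the function names (`qgrams`, `jaccard`, `approximate_duplicates`), q = 3, the `#`/`$` padding characters, and the 0.5 threshold are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations


def qgrams(s: str, q: int = 3) -> set[str]:
    """Return the set of q-grams of s (lowercased), padded with '#' and '$'
    so that the first and last characters also appear in q-grams.
    Padding characters are an illustrative assumption."""
    padded = "#" * (q - 1) + s.lower() + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def approximate_duplicates(records: list[str], threshold: float = 0.5, q: int = 3):
    """Return sorted index pairs (i, j), i < j, whose q-gram Jaccard
    similarity is >= threshold (hypothetical helper; threshold must be > 0).

    Candidates come from an inverted index over q-grams, so records that
    share no q-gram (similarity 0) are never compared."""
    grams = [qgrams(r, q) for r in records]
    index: dict[str, list[int]] = defaultdict(list)
    for rid, gs in enumerate(grams):
        for g in gs:
            index[g].append(rid)  # posting lists are built in ascending rid order
    candidates = set()
    for rids in index.values():
        candidates.update(combinations(rids, 2))  # every pair has i < j
    return sorted(
        (i, j) for i, j in candidates if jaccard(grams[i], grams[j]) >= threshold
    )


if __name__ == "__main__":
    records = ["John Smith", "Jon Smith", "Mary Jones", "John Smyth"]
    print(approximate_duplicates(records))  # [(0, 1), (0, 3)]
```

On the sample records, "John Smith" and "Jon Smith" share 9 of 14 distinct q-grams (about 0.64), and "John Smith" and "John Smyth" share 9 of 15 (0.6), so both pairs are reported; "Jon Smith" and "John Smyth" share only 6 of 17 (about 0.35) and are not. The inverted index only avoids comparing pairs with no q-gram in common; frequent q-grams still yield long posting lists, which is why practical approximate-join algorithms add further filtering that this sketch omits.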

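Cited By

  • (2023) Authenticating q-Gram-Based Similarity Search Results for Outsourced String Databases. Mathematics 11:9 (2128). DOI: 10.3390/math11092128. Online publication date: 1-May-2023.
  • (2023) Análisis de calidad de los datos en las estadísticas públicas y privadas, ante la implementación del Big Data [Analysis of data quality in public and private statistics in light of the implementation of Big Data]. Ciencias Administrativas (119). DOI: 10.24215/23143738e119. Online publication date: 14-Mar-2023.
  • (2023) Entity alignment via graph neural networks: a component-level study. World Wide Web 26:6 (4069-4092). DOI: 10.1007/s11280-023-01221-8. Online publication date: 29-Nov-2023.
  • (2023) Introduction to Entity Alignment. In Entity Alignment (3-13). DOI: 10.1007/978-981-99-4250-3_1. Online publication date: 26-Oct-2023.
  • (2022) Entity Resolution Algorithm Based on Locality Sensitive Hash and Fuzzy Join. Hans Journal of Data Mining 12:03 (280-296). DOI: 10.12677/HJDM.2022.123028. Online publication date: 2022.
  • (2022) Saga: A Platform for Continuous Construction and Serving of Knowledge at Scale. Proceedings of the 2022 International Conference on Management of Data (2259-2272). DOI: 10.1145/3514221.3526049. Online publication date: 10-Jun-2022.
  • (2022) Machine Learning and Data Cleaning: Which Serves the Other? Journal of Data and Information Quality 14:3 (1-11). DOI: 10.1145/3506712. Online publication date: 21-Jul-2022.
  • (2022) Diversified Subgraph Query Generation with Group Fairness. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (686-694). DOI: 10.1145/3488560.3498525. Online publication date: 11-Feb-2022.
  • (2022) Subgraph Query Generation with Fairness and Diversity Constraints. 2022 IEEE 38th International Conference on Data Engineering (ICDE) (3106-3118). DOI: 10.1109/ICDE53745.2022.00278. Online publication date: May-2022.
  • (2022) Cross-Lingual Text Reuse Detection at sentence level for English–Urdu language pair. Computer Speech and Language 75:C. DOI: 10.1016/j.csl.2022.101381. Online publication date: 1-Sep-2022.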

    Published In

    SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data
    June 2006
    830 pages
    ISBN:1595934340
    DOI:10.1145/1142473

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. approximate join
    2. data quality

    Qualifiers

    • Article

    Conference

    SIGMOD/PODS06

    Acceptance Rates

    Overall acceptance rate: 785 of 4,003 submissions (20%)

    Article Metrics

    • Downloads (last 12 months): 35
    • Downloads (last 6 weeks): 1
    Download counts reflect data up to 22 Sep 2024.
