DOI: 10.1007/978-3-031-62362-2_46
Article

The Five Generations of Entity Resolution on Web Data

Published: 17 June 2024

Abstract

    Entity Resolution is a core data integration task, and a large body of work has sought to improve its effectiveness and time efficiency. This tutorial provides a comprehensive overview of the field, organizing the relevant methods into five main generations. The first targets Veracity in the context of structured data with a clean schema. The second extends the focus to Volume as well, leveraging multi-core or massive parallelization to process large-scale datasets. The third addresses the additional challenge of Variety, targeting voluminous, noisy, semi-structured, and highly heterogeneous data from the Semantic Web. The fourth also tackles Velocity, so as to process data collections of continuously increasing volume. The most recent works belong to the fifth generation, which involves pre-trained (large) language models that rely heavily on external knowledge to address all four Vs with high effectiveness.

    Published In

    Web Engineering: 24th International Conference, ICWE 2024, Tampere, Finland, June 17–20, 2024, Proceedings
    Jun 2024
    485 pages
    ISBN: 978-3-031-62361-5
    DOI: 10.1007/978-3-031-62362-2

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Author Tags

    1. Entity Resolution
    2. Data Integration
    3. LLMs
