- Article, June 2024
EdgER: Entity Resolution at the Edge for Next Generation Web Systems
Abstract: Thanks to the advances of emerging technologies like Edge and Cloud Computing and microservice development, web architectures are evolving to support Big Data platforms fully. The decentralization of the traditional cloud-centric approach pushes ...
- research-article, May 2024
On tuning parameters guiding similarity computations in a data deduplication pipeline for customers records
Abstract: Data stored in information systems are often erroneous. Duplicate data are one of the typical error types. To discover and handle duplicates, so-called deduplication methods are applied. They are complex and time-consuming algorithms. In data ...
- research-article, March 2024
Connected Components for Scaling Partial-order Blocking to Billion Entities
Journal of Data and Information Quality (JDIQ), Volume 16, Issue 1, Article No.: 9, Pages 1–29. https://doi.org/10.1145/3646553
In entity resolution, blocking pre-partitions data for further processing by more expensive methods. Two entity mentions are in the same block if they share identical or related blocking-keys. Previous work has sometimes related blocking keys by grouping ...
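To make the blocking idea in the entry above concrete, here is a minimal Python sketch, not the paper's partial-order method. Records that share a blocking key land in the same block, and keys declared related are merged into a single block by computing connected components with a union-find structure. The key function `blocking_key` (first three letters of a surname) and the list of related key pairs are hypothetical assumptions for illustration.

```python
from collections import defaultdict


def blocking_key(record):
    """Hypothetical blocking key: first three letters of the surname, lowercased."""
    return record["surname"][:3].lower()


class UnionFind:
    """Disjoint-set forest with path halving, used to find connected components of keys."""

    def __init__(self):
        self.parent = {}  # maps each key to its parent key

    def find(self, x):
        self.parent.setdefault(x, x)  # unseen keys start as their own root
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_blocks(records, related_keys):
    """Return blocks of record ids. Records with identical keys share a block, and
    keys connected (directly or transitively) through related_keys are merged."""
    uf = UnionFind()
    keys = {rid: blocking_key(rec) for rid, rec in records.items()}
    for a, b in related_keys:
        uf.union(a, b)
    blocks = defaultdict(set)
    for rid, key in keys.items():
        blocks[uf.find(key)].add(rid)
    return list(blocks.values())


records = {
    1: {"surname": "Smith"},
    2: {"surname": "Smithe"},
    3: {"surname": "Smyth"},
    4: {"surname": "Jones"},
}
# Hypothetical relation: the keys "smi" and "smy" are treated as related.
blocks = build_blocks(records, [("smi", "smy")])
print(sorted(sorted(b) for b in blocks))
```

Without the related pair, "Smyth" would sit alone in its own block; declaring "smi" and "smy" related merges the two key groups into one connected component, so records 1, 2 and 3 are compared together while record 4 stays separate.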
- review-article, March 2024
GSM: A generalized approach to Supervised Meta-blocking for scalable entity resolution
Abstract: Entity Resolution (ER) constitutes a core data integration task that relies on Blocking in order to tame its quadratic time complexity. Schema-agnostic blocking achieves very high recall, requires no domain knowledge and applies to data of any ...
Highlights:
- Formalization of meta-blocking as a probabilistic classification task.
- A supervised meta-blocking algorithm that requires only 50 examples for training.
- Four new weighting schemes that enhance the meta-blocking performance.
- ...
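The entry above describes meta-blocking as a way to prune the comparisons that schema-agnostic blocking produces. The sketch below shows the classic unsupervised form of that idea, under stated assumptions, and is not GSM's supervised classifier. It builds a blocking graph over co-blocked entity pairs, weights each edge by the number of blocks the pair shares (the Common Blocks Scheme), and keeps only edges whose weight reaches the mean edge weight (one variant of Weighted Edge Pruning). The blocks are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def blocking_graph_cbs(blocks):
    """Blocking graph: an edge joins every pair of entity ids that co-occur in at
    least one block; its weight is the number of shared blocks (CBS)."""
    weights = defaultdict(int)
    for block in blocks:
        for a, b in combinations(sorted(block), 2):
            weights[(a, b)] += 1
    return dict(weights)


def weighted_edge_pruning(weights):
    """Keep only edges whose weight is at least the mean edge weight."""
    if not weights:
        return set()
    threshold = sum(weights.values()) / len(weights)
    return {pair for pair, w in weights.items() if w >= threshold}


# Hypothetical schema-agnostic (e.g. token) blocks over entity ids 1..4.
blocks = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 4}]
weights = blocking_graph_cbs(blocks)
candidates = weighted_edge_pruning(weights)
print(weights)
print(sorted(candidates))
```

Here the four blocks imply eight in-block comparisons over five distinct pairs; the mean edge weight is 1.6, so only (1, 2) and (2, 3) survive. Per its highlights, GSM instead treats the pruning decision as a probabilistic classification task learned from a small number of labeled examples.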
- research-article, February 2024
Linking Entities across Relations and Graphs
ACM Transactions on Database Systems (TODS), Volume 49, Issue 1, Article No.: 2, Pages 1–50. https://doi.org/10.1145/3639363
This article proposes a notion of parametric simulation to link entities across a relational database 𝒟 and a graph G. Taking functions and thresholds for measuring vertex closeness, path associations, and important properties as parameters, parametric ...
- Article, July 2023
Data Integration Landscapes: The Case for Non-optimal Solutions in Network Diffusion Models
Computational Science – ICCS 2023, Jul 2023, Pages 494–508. https://doi.org/10.1007/978-3-031-35995-8_35
Abstract: The successful application of computational models presupposes access to accurate, relevant, and representative datasets. The growth of public data, and the increasing practice of data sharing and reuse, emphasises the importance of data ...
- research-article, May 2023
Towards automatic Privacy-Preserving Record Linkage: A Transfer Learning based classification step
Data & Knowledge Engineering (DAKE), Volume 145, Issue C, May 2023. https://doi.org/10.1016/j.datak.2023.102180
Abstract: Privacy-Preserving Record Linkage (PPRL) intends to identify records that match the same real-world entities across disparate data sources while preserving the privacy of the individual entities. To identify matching records across different data ...
- research-article, April 2023
Transformer-based Denoising Adversarial Variational Entity Resolution
Journal of Intelligent Information Systems (JIIS), Volume 61, Issue 2, Oct 2023, Pages 631–650. https://doi.org/10.1007/s10844-022-00773-x
Abstract: Entity resolution (ER), precisely identifying different representations of the same real-world entities, is critical for data integration. The ER question has been studied for many years, and many methods have been proposed to solve it. Although ...
- research-article, January 2023
Adaptive deep learning for entity resolution by risk analysis
Knowledge-Based Systems (KNBS), Volume 260, Issue C, Jan 2023. https://doi.org/10.1016/j.knosys.2022.110118
Abstract: The state-of-the-art performance on entity resolution (ER) has been achieved by deep learning. However, deep models usually need to be trained on large quantities of accurately labeled training data, and cannot be easily tuned towards ...
- research-article, September 2022
Towards hierarchical affiliation resolution: framework, baselines, dataset
International Journal on Digital Libraries (IJDL), Volume 23, Issue 3, Sep 2022, Pages 267–288. https://doi.org/10.1007/s00799-022-00326-1
Abstract: Author affiliations provide key information when attributing academic performance like publication counts. So far, such measures have been aggregated either manually or only to top-level institutions, such as universities. Supervised affiliation ...
- research-article, June 2022
The role of transitive closure in evaluating blocking methods for dirty entity resolution
Journal of Intelligent Information Systems (JIIS), Volume 58, Issue 3, Jun 2022, Pages 561–590. https://doi.org/10.1007/s10844-021-00676-3
Abstract: Entity resolution (ER) is a process that identifies duplicate records referring to a real-world entity and links them together in one or more datasets. As a first step toward reducing the number of required record comparisons, blocking methods ...
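Blocking, as in the entry above, is usually judged by how many comparisons it saves and how many true duplicates it keeps. The Python sketch below computes two common measures, reduction ratio and pair completeness. It also illustrates one way transitive closure can matter in dirty ER: true matches detected inside blocks are closed transitively, which can recover duplicate pairs that were never compared directly. The data, the perfect-matcher assumption, and this particular evaluation setup are illustrative assumptions; the truncated abstract does not show the paper's actual protocol.

```python
from itertools import combinations


def candidate_pairs(blocks):
    """Distinct record pairs compared inside at least one block."""
    return {p for block in blocks for p in combinations(sorted(block), 2)}


def closure_pairs(pairs):
    """All pairs implied by the transitive closure of the given match pairs,
    computed as connected components with a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), []).append(x)
    return {p for c in clusters.values() for p in combinations(sorted(c), 2)}


def reduction_ratio(n_records, pairs):
    """Fraction of the n(n-1)/2 brute-force comparisons avoided by blocking."""
    total = n_records * (n_records - 1) // 2
    return 1 - len(pairs) / total


def pair_completeness(found, true_matches):
    """Fraction of true duplicate pairs present in `found`."""
    return len(found & true_matches) / len(true_matches)


# Hypothetical data: six records; {0, 1, 2} and {3, 4} are true duplicate clusters.
n = 6
blocks = [{0, 1}, {1, 2}, {3, 4}, {5}]
true_matches = {(0, 1), (0, 2), (1, 2), (3, 4)}  # all pairs within each true cluster
cand = candidate_pairs(blocks)
detected = cand & true_matches  # assumption: a perfect matcher on candidate pairs

print("reduction ratio:", reduction_ratio(n, cand))
print("pair completeness (direct):", pair_completeness(detected, true_matches))
print("pair completeness (transitive closure):",
      pair_completeness(closure_pairs(detected), true_matches))
```

With six records, 15 brute-force comparisons shrink to 3 candidate pairs, a reduction ratio of 0.8. Direct pair completeness is 0.75 because (0, 2) is never compared, whereas the transitive closure of the detected matches (0, 1) and (1, 2) recovers it and raises pair completeness to 1.0.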
- research-article, May 2022
Accurate privacy-preserving record linkage for databases with missing values
Abstract: Privacy-preserving record linkage is the process of matching records that refer to the same entity across sensitive databases held by different organisations. This process is often challenging because no unique entity identifiers, such ...
Highlights:
- Privacy-preserving record linkage (PPRL) aims to link sensitive data across databases.
- research-article, May 2022
Multi-layer data integration technique for combining heterogeneous crime data
Information Processing and Management: an International Journal (IPRM), Volume 59, Issue 3, May 2022. https://doi.org/10.1016/j.ipm.2022.102879
Abstract: Analysis of publicly available human and drug trafficking crime data faces the challenge of finding a comprehensive dataset that includes a sufficiently large number of crime incidents. Our proposed methodology attempts to address this ...
Highlights:
- Novel multi-layer data integration technique for combining crime datasets.
- ...
- research-article, April 2022
Geospatial Entity Resolution
WWW '22: Proceedings of the ACM Web Conference 2022, April 2022, Pages 3061–3070. https://doi.org/10.1145/3485447.3512026
A geospatial database is today at the core of an ever-increasing number of services. Building and maintaining it remains challenging due to the need to merge information from multiple providers. Entity Resolution (ER) consists of finding entity mentions ...
- Article, April 2022
Information Networks Based Multi-semantic Data Embedding for Entity Resolution
Database Systems for Advanced Applications, Apr 2022, Pages 20–35. https://doi.org/10.1007/978-3-031-00129-1_2
Abstract: Entity resolution (ER) is an ongoing topic in data integration and data governance, which attracts considerable attention from multiple research fields. Recently, deep learning techniques have been substantially applied to entity resolution. We ...
- research-article, February 2022
Cost-effective crowdsourced join queries for entity resolution without prior knowledge
Future Generation Computer Systems (FGCS), Volume 127, Issue C, Feb 2022, Pages 240–251. https://doi.org/10.1016/j.future.2021.09.008
Abstract: The join query, which finds matching pairs from two object sets, is a fundamental operation in computer systems and helps to solve many real problems, e.g., entity resolution. In this paper, we address the problem of join queries by ...
Highlights:
- To leverage crowdsourcing to obtain matching relationships in join queries.
- ...
- research-article, January 2022
Towards deep entity resolution via soft schema matching
Neurocomputing (NEUROC), Volume 471, Issue C, Jan 2022, Pages 107–117. https://doi.org/10.1016/j.neucom.2021.10.106
Abstract: Entity resolution (ER) plays a key role in data preprocessing. ER identifies records corresponding to the same real-world entity. Recent years have witnessed a growing trend of deep learning based ER (deep ER). However, previous deep ...
- research-article, January 2022
Active deep learning on entity resolution by risk sampling
Knowledge-Based Systems (KNBS), Volume 236, Issue C, Jan 2022. https://doi.org/10.1016/j.knosys.2021.107729
Abstract: While the state-of-the-art performance on entity resolution (ER) has been achieved by deep learning, its effectiveness depends on large quantities of accurately labeled training data. To alleviate the data labeling burden, Active ...
- research-article, December 2021
Real-Time Entity Resolution by Forest-Based Indexing in Database Systems with Vertical Fragmentations
CSAE '21: Proceedings of the 5th International Conference on Computer Science and Application Engineering, October 2021, Article No.: 67, Pages 1–5. https://doi.org/10.1145/3487075.3487142
Entity resolution (ER) is the process of identifying and matching which tuples/records in a dataset/relation refer to the same real-world entity. Real-time ER is a challenge for large datasets. Schema decomposition is of importance in (distributed) ...