short-paper

FairER: Entity Resolution With Fairness Constraints

Authors:

Vasilis Efthymiou,

Kostas Stefanidis,

Evaggelia Pitoura,

Vassilis Christophides

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pages 3004 - 3008

https://doi.org/10.1145/3459637.3482105

Published: 30 October 2021

Abstract

There is an urgent call to detect and prevent "biased data" at the earliest possible stage of the data pipelines used to build automated decision-making systems. In this paper, we focus on controlling data bias in entity resolution (ER) tasks, which aim to discover and unify records/descriptions from different data sources that refer to the same real-world entity. We formally define the ER problem with fairness constraints, ensuring that all groups of entities have similar chances of being resolved. We then introduce FairER, a greedy algorithm that solves this problem for fairness criteria based on equal matching decisions. Our experiments show that FairER achieves accuracy similar to or higher than that of two baseline methods over 7 datasets, while guaranteeing minimal bias.
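
To make the idea of a greedy, fairness-constrained matcher concrete, here is a minimal Python sketch. It is not the authors' FairER implementation. It assumes that candidate pairs already carry a match score and a binary protected-group flag, that matches are one-to-one, and that "equal matching decisions" is approximated by taking turns between the two groups. The names `Candidate`, `fair_greedy_top_k`, and `group_gap` are hypothetical and introduced only for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    left_id: str      # record id in the first source
    right_id: str     # record id in the second source
    score: float      # match score from some matcher (assumed given)
    protected: bool   # True if the pair belongs to the protected group (assumed given)


def fair_greedy_top_k(candidates: List[Candidate], k: int) -> List[Candidate]:
    """Greedily pick up to k one-to-one matches, taking turns between groups."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    # One score-ordered queue per group.
    queues = {
        True: [c for c in ranked if c.protected],
        False: [c for c in ranked if not c.protected],
    }
    used_left, used_right = set(), set()
    selected: List[Candidate] = []
    turn = True  # assumption: the protected group picks first
    while len(selected) < k and (queues[True] or queues[False]):
        # Fall back to the other group if the current one is exhausted.
        group = turn if queues[turn] else not turn
        cand = queues[group].pop(0)
        if cand.left_id in used_left or cand.right_id in used_right:
            continue  # breaks the one-to-one assumption; same group tries again
        selected.append(cand)
        used_left.add(cand.left_id)
        used_right.add(cand.right_id)
        turn = not group
    return selected


def group_gap(selected: List[Candidate]) -> float:
    """Absolute difference between the two groups' shares of the selected matches."""
    if not selected:
        return 0.0
    n_protected = sum(c.protected for c in selected)
    return abs(2 * n_protected - len(selected)) / len(selected)


# Toy input: the non-protected pairs score higher, so a plain top-k would favour them.
candidates = [
    Candidate("a1", "b1", 0.95, False),
    Candidate("a2", "b2", 0.90, False),
    Candidate("a3", "b3", 0.85, False),
    Candidate("a4", "b4", 0.80, True),
    Candidate("a1", "b5", 0.75, True),   # conflicts with ("a1", "b1")
    Candidate("a6", "b6", 0.70, True),
]
result = fair_greedy_top_k(candidates, k=4)
print([(c.left_id, c.right_id) for c in result], group_gap(result))
```

On this toy input, a plain top-4 by score would return three non-protected pairs and one protected pair, whereas the turn-taking variant returns two of each. How the trade-off against accuracy behaves on real data is exactly what the paper's experiments examine; the sketch makes no claim about that.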

Supplementary Material

MP4 File (CIKM21-rgsp2685.mp4)

In this video, we present our short paper "FairER: Entity Resolution with Fairness Constraints". In this paper, we introduce the notion of fairness-aware entity resolution, and we present FairER, a greedy algorithm that solves an instance of this problem for group fairness criteria based on equal matching decisions. Our experimental results over seven datasets show that FairER achieves accuracy that is not only comparable to, but sometimes even higher than, that of two baseline methods, while guaranteeing minimal bias. All the resources used in this work are publicly available at https://github.com/vefthym/fairER.

Cited By

Shahbazi N, Wang J, Miao Z, Bhutani N. (2024). Fairness-Aware Data Preparation for Entity Matching. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 3476-3489. Online publication date: 13-May-2024.
https://doi.org/10.1109/ICDE60146.2024.00268
Shahbazi N, Danevski N, Nargesian F, Asudeh A, Srivastava D. (2023). Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching. Proceedings of the VLDB Endowment 16:11, 3279-3292. Online publication date: 1-Jul-2023.
https://dl.acm.org/doi/10.14778/3611479.3611525
Efthymiou V, Ioannou E, Karvounis M, Koubarakis M, Maciejewski J, Nikoletos K, Papadakis G, Skoutas D, Velegrakis Y, Zeakis A. (2023). Self-configured Entity Resolution with pyJedAI. 2023 IEEE International Conference on Big Data (BigData), 339-343. Online publication date: 15-Dec-2023.
https://doi.org/10.1109/BigData59044.2023.10386556

Index Terms

FairER: Entity Resolution With Fairness Constraints
1. Information systems
  1. Data management systems
    1. Information integration
      1. Entity resolution

Published In

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

October 2021

4966 pages

ISBN:9781450384469

DOI:10.1145/3459637

General Chairs: Gianluca Demartini (The University of Queensland, Australia), Guido Zuccon (The University of Queensland, Australia)

Program Chairs: J. Shane Culpepper (RMIT University, Australia), Zi Huang (The University of Queensland, Australia), Hanghang Tong (University of Illinois at Urbana-Champaign, USA)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

Hellenic Foundation for Research and Innovation

Conference

CIKM '21

CIKM '21: The 30th ACM International Conference on Information and Knowledge Management

November 1 - 5, 2021

Queensland, Virtual Event, Australia

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

Total Citations: 6
Total Downloads: 202
Downloads (Last 12 months): 53
Downloads (Last 6 weeks): 0

Reflects downloads up to 30 Aug 2024

Cited By

Ghassabi S, Behkamal B, Milani M. (2023). Leveraging Knowledge Graphs for Matching Heterogeneous Entities and Explanation. 2023 IEEE International Conference on Big Data (BigData), 2910-2919. Online publication date: 15-Dec-2023.
https://doi.org/10.1109/BigData59044.2023.10386157
Makri C, Karakasidis A, Pitoura E. (2022). Towards a more Accurate and Fair SVM-based Record Linkage. 2022 IEEE International Conference on Big Data (Big Data), 4691-4699. Online publication date: 17-Dec-2022.
https://doi.org/10.1109/BigData55660.2022.10020514
Nilforoushan S, Wu Q, Milani M. (2022). Entity Matching with AUC-Based Fairness. 2022 IEEE International Conference on Big Data (Big Data), 5068-5075. Online publication date: 17-Dec-2022.
https://doi.org/10.1109/BigData55660.2022.10020293
