DOI: 10.1145/3459637.3482105
Short paper

FairER: Entity Resolution With Fairness Constraints

Published: 30 October 2021

Abstract

There is an urgent call to detect and prevent "biased data" at the earliest possible stage of the data pipelines used to build automated decision-making systems. In this paper, we focus on controlling data bias in entity resolution (ER), the task of discovering and unifying records/descriptions from different data sources that refer to the same real-world entity. We formally define the ER problem with fairness constraints, which ensure that all groups of entities have similar chances of being resolved. We then introduce FairER, a greedy algorithm that solves this problem for fairness criteria based on equal matching decisions. Our experiments on seven datasets show that FairER achieves accuracy similar to or higher than that of two baseline methods, while guaranteeing minimal bias.
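The abstract does not spell out FairER's exact procedure, so the following is only a minimal Python sketch of the general idea of greedy, fairness-constrained matching, under assumptions that are not taken from the paper: every candidate pair already has a matcher score, every pair carries a hypothetical `protected` group label, matching is one-to-one, and "equal matching decisions" is approximated by steering the protected group's share of accepted matches toward a hypothetical `target` ratio. All names (`Candidate`, `fair_greedy_match`, `protected_share`, `threshold`, `target`) are illustrative; this is not the authors' implementation, which is published at https://github.com/vefthym/fairER.

```python
# Hypothetical illustration of greedy fairness-constrained matching,
# NOT FairER's actual algorithm. Schema and parameters are assumptions.
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate record pair scored by some matcher (hypothetical schema)."""
    left: str        # record id from the first source
    right: str       # record id from the second source
    score: float     # matching score; higher means more likely a match
    protected: bool  # assumed group label: True if the pair is in the protected group


def fair_greedy_match(candidates, threshold=0.5, target=0.5):
    """Greedy one-to-one matching that balances accepted matches across two groups.

    While the protected group's share of accepted matches is below `target`
    (which includes the very first pick), its best remaining candidate is tried
    first; otherwise the best remaining non-protected candidate is tried. If the
    preferred group has no usable candidate left, the other group is used.
    """
    # One best-first queue per group, keeping only pairs above the threshold.
    queues = {
        group: deque(sorted(
            (c for c in candidates if c.protected == group and c.score >= threshold),
            key=lambda c: c.score,
            reverse=True,
        ))
        for group in (True, False)
    }
    used_left, used_right = set(), set()
    counts = {True: 0, False: 0}
    matches = []

    def pop_valid(group):
        # Discard pairs whose records are already matched (one-to-one constraint).
        queue = queues[group]
        while queue:
            c = queue.popleft()
            if c.left not in used_left and c.right not in used_right:
                return c
        return None

    while queues[True] or queues[False]:
        total = counts[True] + counts[False]
        prefer_protected = counts[True] / max(total, 1) < target
        pick = pop_valid(prefer_protected) or pop_valid(not prefer_protected)
        if pick is None:
            break  # both queues exhausted by conflicting pairs
        used_left.add(pick.left)
        used_right.add(pick.right)
        counts[pick.protected] += 1
        matches.append(pick)
    return matches


def protected_share(matches):
    """Fraction of accepted matches that belong to the protected group."""
    return sum(m.protected for m in matches) / len(matches) if matches else 0.0


pairs = [
    Candidate("a1", "b1", 0.95, False),
    Candidate("a2", "b2", 0.90, False),
    Candidate("a2", "b3", 0.80, True),  # competes with (a2, b2) for record a2
    Candidate("a4", "b4", 0.75, True),
]
fair = fair_greedy_match(pairs)
print([(m.left, m.right) for m in fair])  # [('a2', 'b3'), ('a1', 'b1'), ('a4', 'b4')]
```

In this toy example, the fairness preference lets the protected pair (a2, b3) claim record a2 before the higher-scoring non-protected pair (a2, b2), so two of the three matches are protected. A plain score-ordered greedy pass would keep (a1, b1), (a2, b2), (a4, b4) instead, with only one protected match out of three; the sketch gives up some matching score to reduce that imbalance.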

Supplementary Material

MP4 File (CIKM21-rgsp2685.mp4)
In this video, we present our short paper "FairER: Entity Resolution with Fairness Constraints". We introduce the notion of fairness-aware entity resolution and present FairER, a greedy algorithm that solves an instance of this problem for group fairness criteria based on equal matching decisions. Our experimental results over seven datasets show that FairER achieves accuracy comparable to, and sometimes even higher than, that of two baseline methods, while guaranteeing minimal bias. All resources used in this work are publicly available at https://github.com/vefthym/fairER.



    Information

    Published In

    CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
    October 2021
    4966 pages
    ISBN:9781450384469
    DOI:10.1145/3459637
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. algorithmic fairness
    2. entity resolution

    Qualifiers

    • Short-paper

    Conference

    CIKM '21

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Article Metrics

    • Downloads (last 12 months): 53
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 30 Aug 2024

    Cited By

    • (2024) Fairness-Aware Data Preparation for Entity Matching. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 3476-3489. DOI: 10.1109/ICDE60146.2024.00268. Online publication date: 13-May-2024.
    • (2023) Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching. Proceedings of the VLDB Endowment, Vol. 16, No. 11, 3279-3292. DOI: 10.14778/3611479.3611525. Online publication date: 1-Jul-2023.
    • (2023) Self-configured Entity Resolution with pyJedAI. 2023 IEEE International Conference on Big Data (BigData), 339-343. DOI: 10.1109/BigData59044.2023.10386556. Online publication date: 15-Dec-2023.
    • (2023) Leveraging Knowledge Graphs for Matching Heterogeneous Entities and Explanation. 2023 IEEE International Conference on Big Data (BigData), 2910-2919. DOI: 10.1109/BigData59044.2023.10386157. Online publication date: 15-Dec-2023.
    • (2022) Towards a more Accurate and Fair SVM-based Record Linkage. 2022 IEEE International Conference on Big Data (Big Data), 4691-4699. DOI: 10.1109/BigData55660.2022.10020514. Online publication date: 17-Dec-2022.
    • (2022) Entity Matching with AUC-Based Fairness. 2022 IEEE International Conference on Big Data (Big Data), 5068-5075. DOI: 10.1109/BigData55660.2022.10020293. Online publication date: 17-Dec-2022.
