Estimation of Fair Ranking Metrics with Incomplete Judgments

Published: 03 June 2021

Abstract

There is increasing attention to evaluating the fairness of search systems' ranking decisions. Fair ranking metrics typically consider the membership of items in particular groups, identified by protected attributes such as gender or ethnicity. To date, these metrics assume that protected-attribute labels are available for every item. In practice, however, the protected attributes of individuals are rarely available, limiting the application of fair ranking metrics in large-scale systems. To address this problem, we propose a sampling strategy and estimation technique for four fair ranking metrics. We formulate a robust, unbiased estimator that can operate even with a very limited number of labeled items. We evaluate our approach using both simulated and real-world data. Our experimental results demonstrate that our method can estimate this family of fair ranking metrics and provides a robust, reliable alternative to exhaustive or random data annotation.
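The page does not reproduce the paper's estimator, so the following is an illustrative sketch only: a Horvitz-Thompson-style (inverse-probability-weighted) estimate of an exposure-based group-fairness quantity computed from a partial sample of group labels. The function names, the geometric rank discount `gamma`, and the choice of metric are assumptions made for illustration, not the paper's actual formulation.

```python
def exposure_weights(n, gamma=0.9):
    """Geometric rank discount: the item at rank i (0-based) receives
    exposure gamma**i, so higher-ranked items get more attention."""
    return [gamma ** i for i in range(n)]

def ht_group_exposure(ranking, labels, pi, gamma=0.9):
    """Horvitz-Thompson (inverse-probability) estimate of the total
    exposure received by the protected group, using only the sampled,
    labeled items.

    ranking -- item ids in rank order
    labels  -- dict: item id -> 1 if the item belongs to the protected
               group, 0 otherwise; present only for sampled items
    pi      -- dict: item id -> inclusion probability with which the
               item was selected for labeling
    """
    weights = exposure_weights(len(ranking), gamma)
    estimate = 0.0
    for rank, item in enumerate(ranking):
        if item in labels:  # judged item: reweight by 1 / inclusion prob.
            estimate += weights[rank] * labels[item] / pi[item]
    return estimate

# Toy example: "a" and "c" were sampled and labeled as protected;
# "b" is unjudged and contributes nothing to the estimate.
est = ht_group_exposure(["a", "b", "c"], {"a": 1, "c": 1},
                        {"a": 0.5, "c": 1.0})  # est ~ 2.81
```

Because each labeled item is reweighted by the reciprocal of its inclusion probability, the estimate is unbiased over the sampling distribution even when most items are unjudged, which is the general idea behind estimating ranking metrics from incomplete judgments.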



    Published In

    WWW '21: Proceedings of the Web Conference 2021
    April 2021
    4054 pages
    ISBN:9781450383127
    DOI:10.1145/3442381

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. evaluation
    2. fair ranking
    3. fairness
    4. information retrieval

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

WWW '21: The Web Conference 2021
April 19 – 23, 2021
Ljubljana, Slovenia

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

• Downloads (last 12 months): 71
• Downloads (last 6 weeks): 4
Reflects downloads up to 21 Sep 2024

    Cited By

• (2024) A Personalized Framework for Consumer and Producer Group Fairness Optimization in Recommender Systems. ACM Transactions on Recommender Systems 2(3), 1–24. DOI: 10.1145/3651167. Online publication date: 5-Jun-2024.
• (2024) Measuring Commonality in Recommendation of Cultural Content to Strengthen Cultural Citizenship. ACM Transactions on Recommender Systems 2(1), 1–32. DOI: 10.1145/3643138. Online publication date: 7-Mar-2024.
• (2023) Fairness in Recommender Systems: Evaluation Approaches and Assurance Strategies. ACM Transactions on Knowledge Discovery from Data 18(1), 1–37. DOI: 10.1145/3604558. Online publication date: 10-Aug-2023.
• (2023) A Versatile Framework for Evaluating Ranked Lists in Terms of Group Fairness and Relevance. ACM Transactions on Information Systems 42(1), 1–36. DOI: 10.1145/3589763. Online publication date: 18-Aug-2023.
• (2023) A Survey on the Fairness of Recommender Systems. ACM Transactions on Information Systems 41(3), 1–43. DOI: 10.1145/3547333. Online publication date: 7-Feb-2023.
• (2023) The Role of Relevance in Fair Ranking. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2650–2660. DOI: 10.1145/3539618.3591933. Online publication date: 19-Jul-2023.
• (2023) Trustworthy Algorithmic Ranking Systems. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 1240–1243. DOI: 10.1145/3539597.3572723. Online publication date: 27-Feb-2023.
• (2023) BigBasket Fairness Analysis for Searched Outputs. 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), 1–6. DOI: 10.1109/ICCCNT56998.2023.10307520. Online publication date: 6-Jul-2023.
• (2023) A unifying and general account of fairness measurement in recommender systems. Information Processing and Management 60(1). DOI: 10.1016/j.ipm.2022.103115. Online publication date: 1-Jan-2023.
• (2023) Fairness in recommender systems: research landscape and future directions. User Modeling and User-Adapted Interaction 34(1), 59–108. DOI: 10.1007/s11257-023-09364-z. Online publication date: 24-Apr-2023.
    • Show More Cited By
