Estimation of Fair Ranking Metrics with Incomplete Judgments

Published: 03 June 2021

Abstract

There is increasing attention to evaluating the fairness of search systems' ranking decisions. Fair ranking metrics typically consider the membership of items in particular groups, identified by protected attributes such as gender or ethnicity. To date, these metrics assume that protected-attribute labels are available for every item. In practice, however, the protected attributes of individuals are rarely available, limiting the application of fair ranking metrics in large-scale systems. To address this problem, we propose a sampling strategy and estimation technique for four fair ranking metrics. We formulate a robust, unbiased estimator that can operate even with a very limited number of labeled items. We evaluate our approach using both simulated and real-world data. Our experimental results demonstrate that our method can estimate this family of fair ranking metrics and provides a robust, reliable alternative to exhaustive or random data annotation.
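The page does not reproduce the paper's estimator, so the following is an illustrative sketch only: a Horvitz-Thompson-style (inverse-probability-weighted) estimate of an exposure-based group-fairness quantity computed from a partial sample of group labels. The function names, the geometric rank discount `gamma`, and the choice of metric are assumptions made for illustration, not the paper's actual formulation.

```python
def exposure_weights(n, gamma=0.9):
    """Geometric rank discount: the item at rank i (0-based) receives
    exposure gamma**i, so higher-ranked items get more attention."""
    return [gamma ** i for i in range(n)]

def ht_group_exposure(ranking, labels, pi, gamma=0.9):
    """Horvitz-Thompson (inverse-probability) estimate of the total
    exposure received by the protected group, using only the sampled,
    labeled items.

    ranking -- item ids in rank order
    labels  -- dict: item id -> 1 if the item belongs to the protected
               group, 0 otherwise; present only for sampled items
    pi      -- dict: item id -> inclusion probability with which the
               item was selected for labeling
    """
    weights = exposure_weights(len(ranking), gamma)
    estimate = 0.0
    for rank, item in enumerate(ranking):
        if item in labels:  # judged item: reweight by 1 / inclusion prob.
            estimate += weights[rank] * labels[item] / pi[item]
    return estimate

# Toy example: "a" and "c" were sampled and labeled as protected;
# "b" is unjudged and contributes nothing to the estimate.
est = ht_group_exposure(["a", "b", "c"], {"a": 1, "c": 1},
                        {"a": 0.5, "c": 1.0})  # est ~ 2.81
```

Because each labeled item is reweighted by the reciprocal of its inclusion probability, the estimate is unbiased over the sampling distribution even when most items are unjudged, which is the general idea behind estimating ranking metrics from incomplete judgments.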



    Published In

    WWW '21: Proceedings of the Web Conference 2021
    April 2021
    4054 pages
    ISBN:9781450383127
    DOI:10.1145/3442381

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. evaluation
    2. fair ranking
    3. fairness
    4. information retrieval

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

WWW '21: The Web Conference 2021
April 19 – 23, 2021
Ljubljana, Slovenia

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

• Downloads (last 12 months): 71
• Downloads (last 6 weeks): 4
Reflects downloads up to 21 Sep 2024

    Cited By

• (2024) A Personalized Framework for Consumer and Producer Group Fairness Optimization in Recommender Systems. ACM Transactions on Recommender Systems 2(3), 1–24. DOI: 10.1145/3651167. Online publication date: 5-Jun-2024.
• (2024) Measuring Commonality in Recommendation of Cultural Content to Strengthen Cultural Citizenship. ACM Transactions on Recommender Systems 2(1), 1–32. DOI: 10.1145/3643138. Online publication date: 7-Mar-2024.
• (2023) Fairness in Recommender Systems: Evaluation Approaches and Assurance Strategies. ACM Transactions on Knowledge Discovery from Data 18(1), 1–37. DOI: 10.1145/3604558. Online publication date: 10-Aug-2023.
• (2023) A Versatile Framework for Evaluating Ranked Lists in Terms of Group Fairness and Relevance. ACM Transactions on Information Systems 42(1), 1–36. DOI: 10.1145/3589763. Online publication date: 18-Aug-2023.
• (2023) A Survey on the Fairness of Recommender Systems. ACM Transactions on Information Systems 41(3), 1–43. DOI: 10.1145/3547333. Online publication date: 7-Feb-2023.
• (2023) The Role of Relevance in Fair Ranking. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2650–2660. DOI: 10.1145/3539618.3591933. Online publication date: 19-Jul-2023.
• (2023) Trustworthy Algorithmic Ranking Systems. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 1240–1243. DOI: 10.1145/3539597.3572723. Online publication date: 27-Feb-2023.
• (2023) BigBasket Fairness Analysis for Searched Outputs. 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), 1–6. DOI: 10.1109/ICCCNT56998.2023.10307520. Online publication date: 6-Jul-2023.
• (2023) A unifying and general account of fairness measurement in recommender systems. Information Processing and Management 60(1). DOI: 10.1016/j.ipm.2022.103115. Online publication date: 1-Jan-2023.
• (2023) Fairness in recommender systems: research landscape and future directions. User Modeling and User-Adapted Interaction 34(1), 59–108. DOI: 10.1007/s11257-023-09364-z. Online publication date: 24-Apr-2023.
    • Show More Cited By
