DOI: 10.1145/3487572.3487573
research-article

The 2021 RecSys Challenge Dataset: Fairness is not optional

Published: 22 November 2021

    Abstract

    After the success of the RecSys 2020 Challenge, we describe a new and larger dataset released in conjunction with the ACM RecSys Challenge 2021. This year’s dataset is not only larger (~1B data points, a 5-fold increase), but for the first time it takes the fairness aspects of the challenge into consideration. Unlike many static datasets, considerable effort went into keeping the dataset in sync with the Twitter platform: if a user deleted their content, the same content was promptly removed from the dataset too. In this paper, we introduce the dataset and the challenge, highlighting some of the issues that arise when creating recommender systems at Twitter scale.

    Published In

    ACM Other conferences
    RecSysChallenge '21: Proceedings of the Recommender Systems Challenge 2021
    October 2021
    43 pages
    ISBN:9781450386937
    DOI:10.1145/3487572
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. engagement prediction
    2. fairness challenge
    3. large-scale dataset
    4. personalization
    5. recommender system
    6. twitter

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    RecSysChallenge 2021

    Acceptance Rates

    Overall Acceptance Rate 11 of 15 submissions, 73%

    Cited By

    • (2024) Skewed perspectives: examining the influence of engagement maximization on content diversity in social media feeds. Journal of Computational Social Science 7:1 (721-739). DOI: 10.1007/s42001-024-00255-w. Online publication date: 20-Mar-2024
    • (2024) Algorithmic Amplification of Politics and Engagement Maximization on Social Media. Complex Networks & Their Applications XII (131-142). DOI: 10.1007/978-3-031-53503-1_11. Online publication date: 29-Feb-2024
    • (2023) RecSys Challenge 2023 Dataset: Ads Recommendations in Online Advertising. Proceedings of the Recommender Systems Challenge 2023 (1-3). DOI: 10.1145/3626221.3627283. Online publication date: 19-Sep-2023
    • (2023) RecSys Challenge 2023: Deep Funnel Optimization with a Focus on User Privacy. Proceedings of the 17th ACM Conference on Recommender Systems (1217-1220). DOI: 10.1145/3604915.3610508. Online publication date: 14-Sep-2023
    • (2021) User Engagement Modeling with Deep Learning and Language Models. Proceedings of the Recommender Systems Challenge 2021 (22-27). DOI: 10.1145/3487572.3487604. Online publication date: 1-Oct-2021
