DOI: 10.1145/3593013.3594019
Research article · Open access

Representation in AI Evaluations

Published: 12 June 2023

    Abstract

    Calls for representation in artificial intelligence (AI) and machine learning (ML) are widespread, with "representation" or "representativeness" generally understood to be both an instrumentally and an intrinsically beneficial quality of an AI system, and central to fairness concerns. But what does it mean for an AI system to be "representative"? Each element of the AI lifecycle is geared towards its own goals and its own effect on the system, and therefore requires its own analysis of what kind of representation is best. In this work we untangle the benefits of representation in AI evaluations to develop a framework that guides an AI practitioner or auditor towards the creation of representative ML evaluations. Representation, however, is not a panacea. We further lay out the limitations and tensions of instrumentally representative datasets, such as the necessity of data existence and access, surveillance versus expectations of privacy, and implications for foundation models and power. This work sets the stage for a research agenda on representation in AI, one that extends beyond instrumentally valuable representation in evaluations towards refocusing on, and empowering, impacted communities.
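    As a purely illustrative aid (not code or a procedure from the paper), the sketch below shows one way an instrumentally representative evaluation might be checked in practice: comparing an evaluation set's subgroup composition against hypothetical reference population shares, and reporting accuracy disaggregated by subgroup. All names here (REFERENCE_SHARES, representation_gap, the toy groups) are assumptions introduced for illustration.

```python
# Purely illustrative sketch; not code from the paper. It assumes a simple
# evaluation set where each example carries a subgroup label, and hypothetical
# reference shares describing the population the system will affect.
from collections import Counter

# Hypothetical reference shares (assumption for illustration only).
REFERENCE_SHARES = {"group_a": 0.55, "group_b": 0.30, "group_c": 0.15}


def representation_gap(eval_groups, reference_shares):
    """Subgroup share in the evaluation set minus its reference share."""
    counts = Counter(eval_groups)
    total = sum(counts.values())
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }


def disaggregated_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately for each subgroup."""
    per_group = {}
    for group in set(groups):
        idx = [i for i, g in enumerate(groups) if g == group]
        per_group[group] = sum(y_true[i] == y_pred[i] for i in idx) / len(idx)
    return per_group


if __name__ == "__main__":
    # Toy evaluation set: subgroup labels, gold labels, and model predictions.
    groups = ["group_a"] * 6 + ["group_b"] * 3 + ["group_c"] * 1
    y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]
    print("Representation gaps:", representation_gap(groups, REFERENCE_SHARES))
    print("Per-group accuracy:", disaggregated_accuracy(y_true, y_pred, groups))
```

    The point of the sketch is only that disaggregated reporting, together with an explicit comparison to a chosen reference population, is one operational reading of a "representative" evaluation; choosing that reference, and the limits of representation itself, are the questions the paper takes up.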

    Supplemental Material

    PDF File: Appendix


    Cited By

    • (2024) The Illusion of Artificial Inclusion. Proceedings of the CHI Conference on Human Factors in Computing Systems, 1–12. DOI: 10.1145/3613904.3642703. Online publication date: 11-May-2024.
    • (2023) Ethical considerations for responsible data curation. Proceedings of the 37th International Conference on Neural Information Processing Systems, 55320–55360. DOI: 10.5555/3666122.3668537. Online publication date: 10-Dec-2023.
    • (2023) WormGPT: A Large Language Model Chatbot for Criminals. 2023 24th International Arab Conference on Information Technology (ACIT), 1–6. DOI: 10.1109/ACIT58888.2023.10453752. Online publication date: 6-Dec-2023.
    • (2023) Artificial Intelligence in the Colonial Matrix of Power. Philosophy & Technology 36(4). DOI: 10.1007/s13347-023-00687-8. Online publication date: 15-Dec-2023.


        Information & Contributors

        Information

        Published In

        cover image ACM Other conferences
        FAccT '23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency
        June 2023
        1929 pages
        ISBN:9798400701924
        DOI:10.1145/3593013
        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery, New York, NY, United States

        Publication History

        Published: 12 June 2023

        Author Tags

        1. datasets
        2. machine learning evaluation
        3. responsible AI

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        FAccT '23

        Article Metrics

        • Downloads (last 12 months): 1,595
        • Downloads (last 6 weeks): 178

        Reflects downloads up to 11 Aug 2024.

