Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3689904.3694710acmconferencesArticle/Chapter ViewAbstractPublication PageseaamoConference Proceedingsconference-collections
research-article
Open access

Auditing Gender Presentation Differences in Text-to-Image Models

Published: 29 October 2024 Publication History

Abstract

Text-to-image models, which can generate high-quality images based on textual input, have recently enabled various content-creation tools. Despite significantly affecting a wide range of downstream applications, the distributions of these generated images are still not fully understood, especially when it comes to the potential stereotypical attributes of different genders. In this work, we propose a paradigm (Gender Presentation Differences) that utilizes fine-grained self-presentation attributes to study how gender is presented differently in text-to-image models. By probing gender indicators in the input text (e.g., “a woman” or “a man”), we quantify the frequency differences of presentation-centric attributes (e.g., “a shirt” and “a dress”) through human annotation and introduce a novel metric: GEP.1 Furthermore, we propose an automatic method to estimate such differences. The automatic GEP metric based on our approach yields a higher correlation with human annotations than that based on existing CLIP scores, consistently across three state-of-the-art text-to-image models. Finally, we demonstrate the generalization ability of our metrics in the context of gender stereotypes. We will publicly release our code/data.

Supplemental Material

PDF File
Appendix

References

[1]
American Psychological Association. 2015. Guidelines for psychological practice with transgender and gender nonconforming people. American psychologist 70, 9 (2015), 832–864.
[2]
Hritik Bansal, Da Yin, Masoud Monajatipoor, and Kai-Wei Chang. 2022. How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions?arxiv:2210.15230 [cs.CL]
[3]
Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, and Aylin Caliskan. 2022. Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale. arxiv:2211.03759 [cs.CL]
[4]
Tim Brooks, Aleksander Holynski, and Alexei A. Efros. 2022. InstructPix2Pix: Learning to Follow Image Editing Instructions. arxiv:2211.09800 [cs.CV]
[5]
Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, and Dilip Krishnan. 2023. Muse: Text-To-Image Generation via Masked Generative Transformers. arxiv:2301.00704 [cs.CV]
[6]
Jaemin Cho, Abhay Zala, and Mohit Bansal. 2022. DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Models. arxiv:2202.04053 [cs.CV]
[7]
Colin Conwell and Tomer Ullman. 2022. Testing Relational Understanding in Text-Guided Image Generation. arxiv:2208.00005 [cs.CV]
[8]
Ming Ding, Wendi Zheng, Wenyi Hong, and Jie Tang. 2022. CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers. arxiv:2204.14217 [cs.CV]
[9]
Kathleen C. Fraser, Isar Nejadgholi, and Svetlana Kiritchenko. 2023. A Friendly Face: Do Text-to-Image Systems Rely on Stereotypes when the Input is Under-Specified?. In The AAAI-23 Workshop on Creative AI Across Modalities. https://openreview.net/forum?id=UqvWNBQKf5
[10]
Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, and Yezhou Yang. 2022. Benchmarking Spatial Relationships in Text-to-Image Generation. In ArXiv.
[11]
Sophia Gu, Christopher Clark, and Aniruddha Kembhavi. 2022. I Can’t Believe There’s No Images! Learning Visual Tasks Using only Language Data. ArXiv abs/2211.09778 (2022).
[12]
Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. 2022. Prompt-to-Prompt Image Editing with Cross Attention Control. arxiv:2208.01626 [cs.CV]
[13]
Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. 2021. Clipscore: A reference-free evaluation metric for image captioning. arXiv preprint arXiv:2104.08718 (2021).
[14]
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Vol. 30. Curran Associates, Inc.https://proceedings.neurips.cc/paper/2017/file/8a1d694707eb0fefe65871369074926d-Paper.pdf
[15]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. arxiv:2006.11239 [cs.LG]
[16]
Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. 2022. Imagic: Text-Based Real Image Editing with Diffusion Models. arxiv:2210.09276 [cs.CV]
[17]
Maurice G Kendall. 1938. A new measure of rank correlation. Biometrika 30, 1/2 (1938), 81–93.
[18]
Saehoon Kim, Sanghun Cho, Chiheon Kim, Doyup Lee, and Woonhyuk Baek. 2021. minDALL-E on Conceptual Captions. https://github.com/kakaobrain/minDALL-E.
[19]
Evelina Leivada, Elliot Murphy, and Gary Marcus. 2022. DALL-E 2 Fails to Reliably Capture Common Syntactic Processes. ArXiv abs/2210.12889 (2022).
[20]
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In European conference on computer vision. Springer, 740–755.
[21]
Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. 2022. RePaint: Inpainting using Denoising Diffusion Probabilistic Models. arxiv:2201.09865 [cs.CV]
[22]
Douglas Martin, Jacqui Hutchison, Gillian Slessor, James Urquhart, Sheila J Cunningham, and Kenny Smith. 2014. The spontaneous formation of stereotypes via cumulative cultural evolution. Psychological Science 25, 9 (2014), 1777–1786.
[23]
Brian W Matthews. 1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Structure 405, 2 (1975), 442–451.
[24]
Mary Norris. 2019. Female trouble: The debate over "woman" as an adjective. https://www.newyorker.com/culture/comma-queen/female-trouble-the-debate-over-woman-as-an-adjective
[25]
David Nukrai, Ron Mokady, and Amir Globerson. 2022. Text-Only Training for Image Captioning using Noise-Injected CLIP. ArXiv abs/2211.00575 (2022).
[26]
OpenAI. 2022. Reducing bias and improving safety in dall·e 2. https://openai.com/blog/reducing-bias-and-improving-safety-in-dall-e-2/
[27]
Dong Huk Park, Samaneh Azadi, Xihui Liu, Trevor Darrell, and Anna Rohrbach. 2021. Benchmark for compositional text-to-image synthesis. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1).
[28]
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2012. Scikit-learn: Machine Learning in Python. arxiv:1201.0490 [cs.LG]
[29]
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. arxiv:2103.00020 [cs.CV]
[30]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 21, 140 (2020), 1–67. http://jmlr.org/papers/v21/20-074.html
[31]
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical Text-Conditional Image Generation with CLIP Latents. arxiv:2204.06125 [cs.CV]
[32]
Royi Rassin, Shauli Ravfogel, and Yoav Goldberg. 2022. DALLE-2 is Seeing Double: Flaws in Word-to-Concept Mapping in Text2Image Models. arXiv preprint arXiv:2210.10606 (2022).
[33]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2021. High-Resolution Image Synthesis with Latent Diffusion Models. arxiv:2112.10752 [cs.CV]
[34]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-Resolution Image Synthesis With Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10684–10695.
[35]
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. arxiv:2205.11487 [cs.CV]
[36]
Morgan Klaus Scheuerman, Jacob M Paul, and Jed R Brubaker. 2019. How computers see gender: An evaluation of gender classification in commercial facial analysis services. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–33.
[37]
Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. 2022. LAION-5B: An open large-scale dataset for training next generation image-text models. arxiv:2210.08402 [cs.CV]
[38]
Candice Schumann, Susanna Ricco, Utsav Prabhu, Vittorio Ferrari, and Caroline Pantofaru. 2021. A step toward more inclusive people annotations for fairness. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. 916–925.
[39]
Robyn Speer. 2022. rspeer/wordfreq: v3.0. https://doi.org/10.5281/zenodo.7199437
[40]
Robyn Speer, Joshua Chin, and Catherine Havasi. 2016. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. arxiv:1612.03975 [cs.CL]
[41]
Lukas Struppek, Dominik Hintersdorf, and Kristian Kersting. 2022. The Biased Artist: Exploiting Cultural Biases via Homoglyphs in Text-Guided Image Generation Models. ArXiv abs/2209.08891 (2022).
[42]
Aaron Van Den Oord, Oriol Vinyals, 2017. Neural discrete representation learning. Advances in neural information processing systems 30 (2017).
[43]
Wikipedia. 2022. Gender expression. https://en.wikipedia.org/wiki/Gender_expression
[44]
Wenying Wu, Pavlos Protopapas, Zheng Yang, and Panagiotis Michalatos. 2020. Gender Classification and Bias Mitigation in Facial Images. In 12th ACM Conference on Web Science (Southampton, United Kingdom) (WebSci ’20). Association for Computing Machinery, New York, NY, USA, 106–114. https://doi.org/10.1145/3394231.3397900
[45]
Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, and Yonghui Wu. 2022. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation. arxiv:2206.10789 [cs.CV]
[46]
Lal Zimman. 2019. Trans self-identification and the language of neoliberal selfhood: Agency, power, and the limits of monologic discourse. International Journal of the Sociology of Language 2019, 256 (2019), 147–175.

Cited By

View all
  • (2024)Large-Scale Reinforcement Learning for Diffusion ModelsComputer Vision – ECCV 202410.1007/978-3-031-73036-8_1(1-17)Online publication date: 21-Nov-2024

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
EAAMO '24: Proceedings of the 4th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization
October 2024
221 pages
ISBN:9798400712227
DOI:10.1145/3689904
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 October 2024

Check for updates

Author Tags

  1. AI Ethics
  2. Gender Bias
  3. Text-to-Image models

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Google Research Collabs program

Conference

EAAMO '24
Sponsor:

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)126
  • Downloads (Last 6 weeks)104
Reflects downloads up to 25 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Large-Scale Reinforcement Learning for Diffusion ModelsComputer Vision – ECCV 202410.1007/978-3-031-73036-8_1(1-17)Online publication date: 21-Nov-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media