Research article (public access). DOI: 10.1145/3479986.3479992

Quantifying the Gap: A Case Study of Wikidata Gender Disparities

Published: 15 October 2021

Abstract

Much prior research has found gender bias in peer production systems like Wikipedia and OpenStreetMap. This bias affects both women’s participation in these platforms and the content about women on them. We investigated the gender content gap in Wikidata, where less than 22% of items that represent people are about women. We asked: what is the source of this bias? Specifically, does it originate from the actions of Wikidata editors or from external factors; that is, does it simply reflect existing real-world gender bias? We conducted a quantitative case study that found: (i) the most popular categories of people included in Wikidata represent male-dominated professions, such as American football; (ii) within a selected set of professions for which we could obtain gender distribution data, Wikidata is no more biased than the real world: men and women are included at similar percentages, and the quality of items representing men and women is also similar. We provide possible explanations for our findings and implications for addressing the Wikidata content gap.
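The comparison described in finding (ii) can be illustrated with a minimal sketch. It assumes that, for one profession, we have the number of Wikidata items about women, the total number of items about people, and a real-world share of women taken from some external source. The names `ProfessionCounts`, `women_share`, and `representation_gap` are hypothetical, and the numbers are placeholders rather than figures from the paper; this is not the authors' method, only one simple way to express the comparison.

```python
from dataclasses import dataclass


@dataclass
class ProfessionCounts:
    """Counts for one profession (hypothetical structure, not from the paper)."""
    name: str
    wikidata_women: int          # Wikidata items about women in this profession
    wikidata_total: int          # all Wikidata items about people in this profession
    baseline_women_share: float  # real-world share of women, in [0, 1]


def women_share(women: int, total: int) -> float:
    """Fraction of items that are about women."""
    if total <= 0:
        raise ValueError("total must be positive")
    return women / total


def representation_gap(p: ProfessionCounts) -> float:
    """Wikidata share minus real-world share.

    A negative value means women are under-represented in Wikidata
    relative to the baseline; a value near zero means Wikidata mirrors it.
    """
    return women_share(p.wikidata_women, p.wikidata_total) - p.baseline_women_share


# Placeholder numbers for illustration only; they are NOT results from the paper.
example = ProfessionCounts(
    name="example profession",
    wikidata_women=30,
    wikidata_total=200,
    baseline_women_share=0.18,
)
gap = representation_gap(example)
print(f"{example.name}: gap = {gap:+.3f}")
```

A gap close to zero across professions would correspond to the abstract's claim that Wikidata is no more biased than the real world. Judging whether a given gap is meaningful would need a statistical test, which this sketch leaves out.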

Published In

OpenSym '21: Proceedings of the 17th International Symposium on Open Collaboration
September 2021
136 pages
ISBN:9781450385008
DOI:10.1145/3479986
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Wikidata
  2. peer-production
  3. structured data

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

OpenSym 2021

Acceptance Rates

Overall Acceptance Rate 108 of 195 submissions, 55%

Article Metrics

  • Downloads (last 12 months): 340
  • Downloads (last 6 weeks): 40

Reflects downloads up to 13 Jan 2025.

Cited By

  • (2024) Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 1433-1445. DOI: 10.1145/3630106.3658981. Online publication date: 3-Jun-2024
  • (2024) How Contentious Terms About People and Cultures are Used in Linked Open Data. Proceedings of the ACM Web Conference 2024, 4523-4533. DOI: 10.1145/3589334.3648140. Online publication date: 13-May-2024
  • (2024) Assessing knowledge organization systems from a gender perspective: Wikipedia taxonomy and Wikidata ontologies. Journal of Documentation 80:7, 124-147. DOI: 10.1108/JD-11-2023-0230. Online publication date: 5-Apr-2024
  • (2024) How have you modelled my gender? Reconstructing the history of gender representation in Wikidata. Internet Histories, 1-17. DOI: 10.1080/24701475.2024.2431798. Online publication date: 17-Dec-2024
  • (2023) Wikipedia gender gap: a scoping review. El Profesional de la información. DOI: 10.3145/epi.2023.nov.17. Online publication date: 16-Dec-2023
  • (2023) Quantifying the Gap: The Gender Gap in French Writers’ Wikidata. Journal of Cultural Analytics 8:2. DOI: 10.22148/001c.74068. Online publication date: 11-May-2023
  • (2022) An Analysis of Content Gaps Versus User Needs in the Wikidata Knowledge Graph. The Semantic Web – ISWC 2022, 354-374. DOI: 10.1007/978-3-031-19433-7_21. Online publication date: 23-Oct-2022
