Impact of Gender Debiased Word Embeddings in Language Modeling

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2019)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13451)

Abstract

Gender, race and social biases have recently been identified as clear examples of unfairness in applications of Natural Language Processing. A key path towards fairness is to understand, analyse and interpret our data and algorithms. Recent studies have shown that the human-generated data used in training is an evident source of these biases. In addition, current algorithms have been shown to amplify the biases present in the data.

To further address these concerns, in this paper we study how a state-of-the-art recurrent neural language model behaves when trained on data that under-represents females, using pre-trained standard and debiased word embeddings. Results show that, when trained on unbalanced data, language models inherit higher bias when using pre-trained embeddings than when using embeddings trained within the task. Moreover, results show that, on the same data, language models inherit lower bias when using debiased pre-trained embeddings than when using standard pre-trained embeddings.
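
As a rough illustration of what "debiased word embeddings" can mean, the sketch below shows one widely used idea: removing from a word vector its component along a gender direction, so that a gender-neutral word no longer leans towards either side. This is an assumption about the general family of techniques, not a description of the procedure used in this paper. The three-dimensional toy vectors and the helper names (dot, normalize, neutralize) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def neutralize(word_vec, direction):
    """Remove the component of word_vec along a unit-length bias direction."""
    proj = dot(word_vec, direction)
    return [w - proj * d for w, d in zip(word_vec, direction)]


# Toy 3-d vectors, invented for illustration only (not real embeddings).
toy = {
    "he": [1.0, 0.2, 0.1],
    "she": [-1.0, 0.2, 0.1],
    "programmer": [0.4, 0.9, 0.3],
}

# Assumed gender direction: the normalized difference of one definitional pair.
gender_direction = normalize([a - b for a, b in zip(toy["he"], toy["she"])])

# Projection onto the gender direction before and after neutralizing.
bias_before = dot(toy["programmer"], gender_direction)
debiased = neutralize(toy["programmer"], gender_direction)
bias_after = dot(debiased, gender_direction)
```

In this toy setting the projection of "programmer" onto the gender direction drops to zero while its other coordinates are left unchanged. Real debiasing pipelines typically estimate the direction from many word pairs and apply further steps, which this sketch omits.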

Notes

  1. https://github.com/google-research-datasets/gap-coreference

  2. https://github.com/tolga-b/debiaswe

  3. https://github.com/salesforce/awd-lstm-lm

Acknowledgments

This work is supported in part by the AGAUR through the FI PhD Scholarship; the Spanish Ministerio de Economía y Competitividad, the European Regional Development Fund and the Agencia Estatal de Investigación, through the postdoctoral senior grant Ramón y Cajal, the contract TEC2015-69266-P (MINECO/FEDER,EU) and the contract PCIN-2017-079 (AEI/MINECO).

Author information

Corresponding author

Correspondence to Christine Basta.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Basta, C., Costa-jussà, M.R. (2023). Impact of Gender Debiased Word Embeddings in Language Modeling. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_25

  • DOI: https://doi.org/10.1007/978-3-031-24337-0_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24336-3

  • Online ISBN: 978-3-031-24337-0

  • eBook Packages: Computer Science, Computer Science (R0)
