Abstract
Companies operating internet platforms are developing artificial intelligence tools for content moderation purposes. This paper discusses technologies developed to measure the ‘toxicity’ of text-based content. The research builds upon queer linguistic studies that have indicated the use of ‘mock impoliteness’ as a form of interaction employed by LGBTQ people to cope with hostility. Automated analyses that disregard this pro-social function may, contrary to their intended design, actually reinforce harmful biases. This paper uses ‘Perspective’, an AI technology developed by Jigsaw (formerly Google Ideas), to measure the toxicity levels of tweets from prominent drag queens in the United States. The research indicated that Perspective rated a significant number of drag queen Twitter accounts as having higher levels of toxicity than those of white nationalists. The qualitative analysis revealed that Perspective was unable to properly consider social context when measuring toxicity and failed to recognize cases in which words that might conventionally be seen as offensive conveyed different meanings in LGBTQ speech.
Availability of Data and Material
Due to Twitter’s Developer Policy, which provides rules and guidelines for developers who interact with Twitter’s applications and content, the authors decided not to publish the CSV dataset. The policy sets forth several restrictions limiting what may be disclosed in downloadable datasets. It also provides that any third party with access to the dataset must adhere to Twitter’s ToS, Privacy Policy, Developer Agreement, and Developer Policy, which the authors would not be in a position to guarantee if the dataset were publicly available.
Code Availability
The Python source code of the algorithms developed for the research is available on GitHub and may be accessed at the following link: https://github.com/internetlab-br/ai_content_moderation.
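As the notes below indicate, the published code queries Perspective’s Comment Analyzer endpoint with the Requests library. For readers who want the shape of that call, here is a minimal sketch based on Perspective’s public API reference; the API_KEY value and the toxicity_score helper are placeholders of ours, not code taken from the authors’ repository.

```python
import requests

# Perspective's Comment Analyzer endpoint, per the API reference cited in the notes.
API_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"
API_KEY = "YOUR_API_KEY"  # placeholder: a valid Perspective API key is required

def toxicity_score(text: str) -> float:
    """Return Perspective's summary TOXICITY score (between 0.0 and 1.0) for text."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(API_URL, params={"key": API_KEY}, json=payload)
    response.raise_for_status()
    body = response.json()
    return body["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```

A score near 1.0 means the model considers the text very likely to be perceived as toxic; the paper’s comparisons between accounts rest on per-tweet scores like this one.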
Notes
Algorithms may be defined as ‘encoded procedures for transforming input data into a desired output, based on specified calculations’ (Gillespie, 2014: 167). They are designed to store and analyze data, apply mathematical formulas to it, and produce new information as a result.
Available at: https://www.perspectiveapi.com/#/.
Available at: https://www.tweepy.org/.
Available at: https://pypi.org/project/emoji/.
Available at: https://requests.kennethreitz.org/en/master/.
Available at: https://developer.twitter.com/en/developer-terms/agreement-and-policy.html (accessed 17 September 2019). Link on Perma.cc: [https://perma.cc/RZM2-4LYW].
For more information, see Perspective’s API reference on GitHub: https://github.com/conversationai/perspectiveapi/blob/master/api_reference.md.
The source code is available at: https://github.com/internetlab-br/ai_content_moderation.
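The notes above also name the collection and preprocessing libraries (Tweepy and the emoji package). As a rough illustration of how the tweet-gathering step might look with those tools, consider the sketch below; the credentials are placeholders, fetch_clean_tweets is a hypothetical helper of ours, and the demojize step stands in for whatever emoji handling the published code actually performs.

```python
import tweepy
import emoji

# Placeholder credentials: real values come from a Twitter developer account.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def fetch_clean_tweets(screen_name: str, limit: int = 200) -> list:
    """Collect a user's recent tweets and normalize emojis to text shortcodes."""
    tweets = []
    cursor = tweepy.Cursor(
        api.user_timeline, screen_name=screen_name, tweet_mode="extended"
    )
    for status in cursor.items(limit):
        # demojize() rewrites each emoji as a ':shortcode:' token so that the
        # tweet can be submitted to Perspective as plain text.
        tweets.append(emoji.demojize(status.full_text))
    return tweets
```

Under this sketch, each cleaned tweet would then be passed to a scoring function such as the toxicity_score example above.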
References
Adams, C. J. (2019). Tune: Control the comments you see. Medium. https://medium.com/jigsaw/tune-control-the-comments-you-see-b10cc807a171. Accessed 31 Oct 2020.
Alexander, J. (2020). YouTube bans Stefan Molyneux, David Duke, Richard Spencer, and more for hate speech. The Verge. https://www.theverge.com/2020/6/29/21307303/youtube-bans-molyneux-duke-richard-spencer-conduct-hate-speech. Accessed 31 Oct 2020.
BBC (2017). YouTube ‘made wrong call’ on Syrian videos. https://www.bbc.com/news/technology-41023234. Accessed 31 Oct 2020.
Bogage, J., & Scott, E. (2020). Twitter permanently bans former KKK leader David Duke. Washington Post. https://www.washingtonpost.com/technology/2020/07/31/twitter-david-duke-ban/. Accessed 31 Oct 2020.
Curtis, S. (2015). Facebook, Google, and Twitter block ‘hash list’ of child porn images. Telegraph. https://www.telegraph.co.uk/technology/internet-security/11794180/Facebook-Google-and-Twitter-to-block-hash-list-of-child-porn-images.html. Accessed 31 Oct 2020.
Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated hate speech detection and the problem of offensive language. In Proceedings of the 11th international AAAI conference on web and social media, https://arxiv.org/abs/1703.04009.
Duarte, N., Llanso, E., & Loup, A. (2017). Mixed messages? The limits of automated social media content analysis. Center for Democracy and Technology. https://cdt.org/insight/mixed-messages-the-limits-of-automated-social-media-content-analysis. Accessed 31 Oct 2020.
Felbo, B., Mislove, A., Søgaard, A., Rahwan, I., & Lehmann, S. (2017). Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In Proceedings of the 2017 conference on empirical methods in natural language processing, https://arxiv.org/abs/1708.00524.
García, L., Moctezuma, D., & Muñiz, V. (2019). A contextualized word representation approach for irony detection. In Proceedings of the Iberian languages evaluation forum 2019, https://ceur-ws.org/Vol-2421/IroSvA_paper_5.pdf.
Gillespie, T. (2014). The relevance of algorithms. In T. Gillespie, P. J. Boczowski, & K. A. Foot (Eds.), Media technologies: Essays on communication, materiality and society (pp. 167–193). Cambridge: The MIT Press.
Gröndahl, T., Pajola, L., Juuti, M., Conti, M., & Asokan, N. (2018). All you need is “love”: Evading hate speech detection. In Proceedings of the 11th ACM workshop on artificial intelligence and security, https://doi.org/10.1145/3270101.3270103.
Hazarika, D., Poria, S., Gorantla, S., Cambria, E., Zimmermann, R., & Mihalcea, R. (2018). CASCADE: Contextual sarcasm detection in online discussion forums. arXiv ahead of print 16 May 2018. https://arxiv.org/abs/1805.06413.
Heisterkamp, B. L., & Alberts, J. K. (2000). Control and desire: Identity formation through teasing among gay men and lesbians. Communication Studies, 51(4), 388–403.
Hosseini, H., Kannan, S., Zhang, B., & Poovendran, R. (2017). Deceiving Google’s Perspective API built for detecting toxic comments. arXiv ahead of print 27 February 2017. https://arxiv.org/abs/1702.08138.
Johnson, E. P. (1995). SNAP! culture: A different kind of “reading.” Text and Performance Quarterly, 15(2), 122–142.
Jones, R. G. (2007). Drag queens, drama queens and friends: Drama and performance as a solidarity-building function in a gay male friendship circle. Kaleidoscope, 6(1), 61–84.
Keller, D. (2018). Internet platforms: Observations on speech, danger and money. Hoover Institution. https://www.hoover.org/sites/default/files/research/docs/keller_webreadypdf_final.pdf. Accessed 31 Oct 2020.
Levin, S. (2017). Civil rights groups urge Facebook to fix ‘racially biased’ moderation system. The Guardian. https://www.theguardian.com/technology/2017/jan/18/facebook-moderation-racial-bias-black-lives-matter. Accessed 31 Oct 2020.
Lessig, L. (2006). Code and other laws of cyberspace—version 2.0. New York: Basic Books.
Lux, D., & Mess, L. M. H. (2019). Facebook’s hate speech policies censor marginalized users. Wired. https://www.wired.com/story/facebooks-hate-speech-policies-censor-marginalized-users/. Accessed 31 Oct 2020.
McKinnon, S. (2017). “Building a thick skin for each other”: The use of ‘reading’ as an interactional practice of mock impoliteness in drag queen backstage talk. Journal of Language and Sexuality, 6(1), 90–127.
Mishra, P., Del Tredici, M., Yannakoudakis, H., & Shutova, E. (2019). Author profiling for hate speech detection. arXiv ahead of print 14 February 2019. https://arxiv.org/abs/1902.06734.
Murray, S. O. (1979). The art of gay insulting. Anthropological Linguistics, 21(5), 211–223.
Nadali, S., Murad, M., & Sharef, N. (2016). Sarcastic tweets detection based on sentiment hashtags analysis. Advanced Science Letters, 22(4), 400–407.
O’Brien, L. (2019). Twitter still has a white nationalist problem. HuffPost. https://www.huffpostbrasil.com/entry/twitter-white-nationalist-problem_n_5cec4d28e4b00e036573311d?ri18n=true. Accessed 31 Oct 2020.
Perel, M., & Elkin-Koren, N. (2016). Accountability in algorithmic copyright enforcement. Stanford Technology Law Review, 19(3), 473–533.
Perez, J. (2011). Word play, ritual insult, and volleyball in Peru. Journal of Homosexuality, 58(6), 834–847.
Rangwani, H., Kulshreshtha, D., & Singh, A. (2018). NLPRL-IITBHU at SemEval-2018 task 3: Combining linguistic features and emoji pre-trained CNN for irony detection in Tweets. In Proceedings of the 12th international workshop on semantic evaluation, https://www.aclweb.org/anthology/S18-1104.pdf.
Sap, M., Card, D., Gabriel, S., Choi, Y., & Smith, N. A. (2019). The risk of racial bias in hate speech detection. In Proceedings of the 57th annual meeting of the association for computational linguistics, https://www.aclweb.org/anthology/P19-1163.pdf.
Solon, O. (2017). Facebook asks users for nude photos in project to combat ‘revenge porn’. The Guardian. https://www.theguardian.com/technology/2017/nov/07/facebook-revenge-porn-nude-photos. Accessed 31 Oct 2020.
Suzor, N. (2018). Digital constitutionalism: Using the rule of law to evaluate the legitimacy of governance by platforms. Social Media + Society. https://doi.org/10.1177/2056305118787812
Taylor, J., Peignon, M., & Chen, Y. (2017). Surfacing contextual hate speech words within social media. arXiv ahead of print 28 November 2017. https://arxiv.org/abs/1711.10093.
Yang, F., Peng, X., Ghosh, G., Shilon, R., Ma, H., Moore, E., & Predovic, G. (2019). Exploring deep multimodal fusion of text and photo for hate speech classification. In Proceedings of the third workshop on abusive language online, https://www.aclweb.org/anthology/W19-3502.pdf.
Acknowledgements
The authors are grateful to Timothy Rosenberger for his editing and review support, and to Ester Borges, Clarice Tavares, and Victor Pavarin Tavares for their research support.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Ethics declarations
Conflict of interest
There are no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Dias Oliva, T., Antonialli, D. M., & Gomes, A. Fighting Hate Speech, Silencing Drag Queens? Artificial Intelligence in Content Moderation and Risks to LGBTQ Voices Online. Sexuality & Culture 25, 700–732 (2021). https://doi.org/10.1007/s12119-020-09790-w