2024
Innovative Approaches to Enhancing Safety and Ethical AI Interactions in Digital Environments
Zachary Yang
Proceedings of the 20th Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems
Ensuring safe online environments is a formidable challenge, but nonetheless an important one as people are now chronically online. The increasing online presence of people, paired with the prevalence of harmful content such as toxicity, hate speech, misinformation, and disinformation across social media platforms and video games, calls for stronger detection and prevention methods. My research interests primarily lie in applied natural language processing for social good. Previously, I focused on measuring partisan polarization on social media during the COVID-19 pandemic and its societal impacts. Currently, at Ubisoft La Forge, I am dedicated to enhancing player safety within in-game chat systems by developing methods to detect toxicity, evaluating the biases in these detection systems, and assessing the current ecological state of online interactions. Additionally, I am engaged in simulating social media environments using LLMs to ethically test detection methods, evaluate the effectiveness of current mitigation strategies, and potentially introduce new, successful strategies. My suggested topics for discussion:
1. Understanding and mitigating social harms through high-fidelity simulated social media environments
2. Enhancing safety in online environments such as in-game chats (text and speech)
3. Personification of LLM agents
4. Ethically simulating social media sandbox environments at scale with LLM agents
5. Re-balancing the playing field between good and bad actors: strategies for countering societal-scale manipulation
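As a rough illustration of the sandbox-simulation idea mentioned above, the sketch below runs a few rounds of persona-conditioned LLM agents posting into a shared feed, whose output a detection model could then be tested on. The personas, prompts, model name (gpt-4o-mini), and use of the OpenAI client are illustrative assumptions, not the author's actual framework.

```python
# A minimal, purely illustrative sketch of a sandboxed social media simulation
# with persona-conditioned LLM agents. Not the author's framework; the model
# name, personas, and prompts are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()          # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o-mini"      # any chat-capable model would do for the sketch

personas = [
    "a cheerful gamer who posts about co-op strategies",
    "a provocative user who escalates arguments",
]
feed = ["Patch 1.2 just dropped, thoughts?"]  # seed post

def agent_reply(persona: str, feed: list[str]) -> str:
    """Ask one persona-conditioned agent to reply to the recent feed."""
    prompt = "Recent posts:\n" + "\n".join(feed[-5:]) + "\nWrite one short reply."
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": f"You are {persona}."},
                  {"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

for _ in range(3):                 # a few simulation rounds
    for persona in personas:
        feed.append(agent_reply(persona, feed))

# The generated feed can then be scored offline by any detection model before
# mitigation strategies (moderation, counter-speech, etc.) are evaluated on it.
```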
An Evaluation of Language Models for Hyperpartisan Ideology Detection in Persian Twitter
Sahar Omidi Shayegan | Isar Nejadgholi | Kellin Pelrine | Hao Yu | Sacha Levy | Zachary Yang | Jean-François Godbout | Reihaneh Rabbany
Proceedings of the 2nd Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC-COLING 2024
Large Language Models (LLMs) have shown significant promise in various tasks, including identifying the political beliefs of English-speaking social media users from their posts. However, assessing LLMs for this task in non-English languages remains unexplored. In this work, we ask to what extent LLMs can predict the political ideologies of users in Persian social media. To answer this question, we first acknowledge that political parties are not well defined among Persian users, and therefore simplify the task to hyperpartisan ideology detection. We create a new benchmark and show the potential and limitations of both open-source and commercial LLMs in classifying the hyperpartisan ideologies of users. We compare these models with smaller fine-tuned models, both on the Persian language (ParsBERT) and on translated data (RoBERTa), showing that they considerably outperform generative LLMs in this task. We further demonstrate that the performance of the generative LLMs degrades when classifying users based on their tweets instead of their bios, and even when tweets are added as additional information, whereas the smaller fine-tuned models are robust and achieve similar performance across all classes. This study is a first step toward political ideology detection in Persian Twitter, with implications for future research on the dynamics of ideologies in Persian social media.
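To make the fine-tuned baseline concrete, here is a minimal sketch of training a ParsBERT sequence classifier for binary hyperpartisan ideology detection. The checkpoint name (HooshvareLab/bert-base-parsbert-uncased), hyperparameters, and data format are assumptions for illustration, not the paper's exact setup.

```python
# A minimal sketch (not the paper's actual pipeline) of fine-tuning a ParsBERT
# classifier for binary hyperpartisan ideology detection from user bios/tweets.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL = "HooshvareLab/bert-base-parsbert-uncased"  # a public ParsBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Hypothetical records: each example is a user's bio (or bio + recent tweets)
# with a binary hyperpartisan label (0 = one pole, 1 = the other).
train = Dataset.from_dict({
    "text": ["...Persian bio text...", "...Persian bio plus tweets..."],
    "label": [0, 1],
})

def tokenize(batch):
    # Pad to a fixed length so the default collator can batch the examples.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

train = train.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="parsbert-hyperpartisan",
                           per_device_train_batch_size=16,
                           num_train_epochs=3),
    train_dataset=train,
)
trainer.train()
```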
2023
Towards Detecting Contextual Real-Time Toxicity for In-Game Chat
Zachary Yang | Nicolas Grenon-Godbout | Reihaneh Rabbany
Findings of the Association for Computational Linguistics: EMNLP 2023
Real-time toxicity detection in online environments poses a significant challenge due to the increasing prevalence of social media and gaming platforms. We introduce ToxBuster, a simple and scalable model that reliably detects toxic content in real time for a line of chat by including chat history and metadata. ToxBuster consistently outperforms conventional toxicity models across popular multiplayer games, including Rainbow Six Siege, For Honor, and DOTA 2. We conduct an ablation study to assess the importance of each model component and explore ToxBuster's transferability across datasets. Furthermore, we showcase ToxBuster's efficacy in post-game moderation, successfully flagging 82.1% of chat-reported players at a precision level of 90.0%. Additionally, we show how an additional 6% of unreported toxic players can be proactively moderated.
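As a minimal sketch of the general idea described above, the snippet below classifies the current chat line while conditioning on prior lines and speaker tags packed into a single transformer input. The checkpoint (bert-base-uncased), special tokens, and input layout are assumptions for illustration, not ToxBuster's actual architecture or training.

```python
# Illustrative sketch: score one chat line for toxicity given its chat history
# and speaker metadata. Not ToxBuster's implementation; the untrained
# classification head below is meaningless until fine-tuned on labeled chat.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = non-toxic, 1 = toxic

def build_input(history, current, sep=" [SEP] "):
    """Concatenate speaker-tagged prior lines with the line to classify."""
    context = sep.join(f"{speaker}: {text}" for speaker, text in history)
    return context + sep + f"{current[0]}: {current[1]}"

# Hypothetical in-game chat excerpt.
history = [("player_1", "gg everyone"), ("player_2", "push mid now")]
current = ("player_3", "you are all useless")

inputs = tokenizer(build_input(history, current),
                   truncation=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(f"toxic probability: {probs[0, 1].item():.3f}")
```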
Unveiling Identity Biases in Toxicity Detection: A Game-Focused Dataset and Reactivity Analysis Approach
Josiane Van Dorpe | Zachary Yang | Nicolas Grenon-Godbout | Grégoire Winterstein
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Identity biases commonly arise from annotated datasets, can be propagated in language models, and can cause further harm to marginalized groups. Existing bias benchmarking datasets mainly focus on gender or racial biases and are designed to pinpoint which class a model is biased towards. They are also not designed for the gaming industry, a concern for models built for toxicity detection in video games' chat. We propose a dataset and a method to highlight oversensitive terms using reactivity analysis and the model's performance. We test our dataset against ToxBuster, a language model developed by Ubisoft and fine-tuned for toxicity detection on multiplayer video games' written chat, and against Perspective API. We find that these toxicity models often automatically tag terms related to a community's identity as toxic, which prevents members of already marginalized groups from making their presence known or having a mature, normal conversation. Through this process, we have generated an interesting list of terms that trigger the models to varying degrees, along with insights on establishing a baseline through human annotations.
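As an illustration of a reactivity-style probe, the sketch below measures how much an off-the-shelf toxicity classifier's score shifts when identity terms are substituted into otherwise neutral templates. The templates, probe terms, classifier checkpoint (unitary/toxic-bert), and scoring are assumptions for illustration, not the paper's exact protocol or term list.

```python
# Illustrative reactivity probe: compare a toxicity classifier's scores on
# neutral templates with an identity term vs. a neutral control word. A large
# positive shift suggests the model over-reacts to the identity term itself.
from transformers import pipeline

# Any public toxicity classifier works for the sketch; this is not the model
# evaluated in the paper.
clf = pipeline("text-classification", model="unitary/toxic-bert")

templates = ["I am {}.", "My friend is {}.", "We play with a {} teammate."]
identity_terms = ["gay", "muslim", "trans"]   # hypothetical probe terms
baseline_term = "new"                         # neutral control word

def toxicity(text: str) -> float:
    # Ask for all label scores and read the 'toxic' one (per that model's config).
    scores = clf(text, top_k=None, function_to_apply="sigmoid")
    return next(s["score"] for s in scores if s["label"] == "toxic")

for term in identity_terms:
    shifts = [toxicity(t.format(term)) - toxicity(t.format(baseline_term))
              for t in templates]
    print(f"{term}: mean toxicity-score shift = {sum(shifts) / len(shifts):.3f}")
```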