Perceptions of Moderators as a Large-Scale Measure of Online Community Governance

Authors: Galen Weld, Leon Leibmann, Amy X. Zhang, Tim Althoff

Abstract: Millions of online communities are governed by volunteer moderators, who shape their communities by setting and enforcing rules, recruiting additional moderators, and participating in the community themselves. These moderators must regularly make decisions about how to govern, yet it is challenging to determine what governance strategies are most successful, as measuring the 'success' of governance is complex and nuanced. Furthermore, the incredible diversity in community topic, size, and membership all but guarantees that there is no 'one-size-fits-all' solution for community governance. In this work, we measure governance by assessing how community members publicly discuss their own moderators. We quantify perceptions of moderators through 1.89 million labeled posts and comments made on reddit over an 18-month period, and relate these perceptions to characteristics of community governance and to different actions that community moderators can take. We identify key differences between different types of communities, and highlight promising strategies for moderator teams. Amongst other findings, we show that positive perceptions of moderators are associated with other measures of community health, and that strict rule enforcement is perceived more favorably for certain topics, such as news communities, than others. We investigate what kinds of moderators have the most positive impact on the community when they join the mod team, and find that moderators who are active community members before and during their mod tenures result in the largest improvement of community members' perceptions of moderators. We make all our models, datasets, and code public.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 16 pages, 12 figures

arXiv:2111.05835 [pdf, other]

What Makes Online Communities 'Better'? Measuring Values, Consensus, and Conflict across Thousands of Subreddits

Authors: Galen Weld, Amy X. Zhang, Tim Althoff

Abstract: Making online social communities 'better' is a challenging undertaking, as online communities are extraordinarily varied in their size, topical focus, and governance. As such, what is valued by one community may not be valued by another. However, community values are challenging to measure as they are rarely explicitly stated. In this work, we measure community values through the first large-scale survey of community values, including 2,769 reddit users in 2,151 unique subreddits. Through a combination of survey responses and a quantitative analysis of public reddit data, we characterize how these values vary within and across communities. Amongst other findings, we show that community members disagree about how safe their communities are, that longstanding communities place 30.1% more importance on trustworthiness than newer communities, and that community moderators want their communities to be 56.7% less democratic than non-moderator community members. These findings have important implications, including suggesting that care must be taken to protect vulnerable community members, and that participatory governance strategies may be difficult to implement. Accurate and scalable modeling of community values enables research and governance which is tuned to each community's different values. To this end, we demonstrate that a small number of automatically quantifiable features capture a significant yet limited amount of the variation in values between communities with a ROC AUC of 0.667 on a binary classification task. However, substantial variation remains, and modeling community values remains an important topic for future work. We make our models and data public to inform community design and governance.

Submitted 9 May, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

Comments: 12 pages, 8 figures, 4 tables; to appear at ICWSM 2022

arXiv:2109.05152 [pdf, other]

Making Online Communities 'Better': A Taxonomy of Community Values on Reddit

Authors: Galen Weld, Amy X. Zhang, Tim Althoff

Abstract: Many researchers studying online communities seek to make them better. However, beyond a small set of widely-held values, such as combating misinformation and abuse, determining what 'better' means can be challenging, as community members may disagree, values may be in conflict, and different communities may have differing preferences as a whole. In this work, we present the first study that elicits values directly from members across a diverse set of communities. We survey 212 members of 627 unique subreddits and ask them to describe their values for their communities in their own words. Through iterative categorization of 1,481 responses, we develop and validate a comprehensive taxonomy of community values, consisting of 29 subcategories within nine top-level categories, enabling principled, quantitative study of community values by researchers. Using our taxonomy, we reframe existing research problems, such as managing influxes of new members, as tensions between different values, and we identify understudied values, such as those regarding content quality and community size. We call for greater attention to vulnerable community members' values, and we make our codebook public for use in future research.

Submitted 20 September, 2023; v1 submitted 10 September, 2021; originally announced September 2021.

Comments: to appear at ICWSM 2024

arXiv:2104.13490 [pdf, other]

Leveraging Community and Author Context to Explain the Performance and Bias of Text-Based Deception Detection Models

Authors: Galen Weld, Ellyn Ayton, Tim Althoff, Maria Glenski

Abstract: Deceptive news posts shared in online communities can be detected with NLP models, and much recent research has focused on the development of such models. In this work, we use characteristics of online communities and authors -- the context of how and where content is posted -- to explain the performance of a neural network deception detection model and identify sub-populations who are disproportionately affected by model accuracy or failure. We examine who is posting the content, and where the content is posted to. We find that while author characteristics are better predictors of deceptive content than community characteristics, both characteristics are strongly correlated with model performance. Traditional performance metrics such as F1 score may fail to capture poor model performance on isolated sub-populations such as specific authors, and as such, more nuanced evaluation of deception detection models is critical.

Submitted 27 April, 2021; originally announced April 2021.

arXiv:2102.08537 [pdf, other]

Political Bias and Factualness in News Sharing across more than 100,000 Online Communities

Authors: Galen Weld, Maria Glenski, Tim Althoff

Abstract: As civil discourse increasingly takes place online, misinformation and the polarization of news shared in online communities have become ever more relevant concerns with real world harms across our society. Studying online news sharing at scale is challenging due to the massive volume of content which is shared by millions of users across thousands of communities. Therefore, existing research has largely focused on specific communities or specific interventions, such as bans. However, understanding the prevalence and spread of misinformation and polarization more broadly, across thousands of online communities, is critical for the development of governance strategies, interventions, and community design. Here, we conduct the largest study of news sharing on reddit to date, analyzing more than 550 million links spanning 4 years. We use non-partisan news source ratings from Media Bias/Fact Check to annotate links to news sources with their political bias and factualness. We find that, compared to left-leaning communities, right-leaning communities have 105% more variance in the political bias of their news sources, and more links to relatively-more biased sources, on average. We observe that reddit users' voting and re-sharing behaviors generally decrease the visibility of extremely biased and low factual content, which receives 20% fewer upvotes and 30% fewer exposures from crossposts than more neutral or more factual content. This suggests that reddit is more resilient to low factual content than Twitter. We show that extremely biased and low factual content is very concentrated, with 99% of such content being shared in only 0.5% of communities, giving credence to the recent strategy of community-wide bans and quarantines.

Submitted 9 May, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: 12 pages, 7 figures. Published at ICWSM 2021

arXiv:2009.09961 [pdf, other]

Adjusting for Confounders with Text: Challenges and an Empirical Evaluation Framework for Causal Inference

Authors: Galen Weld, Peter West, Maria Glenski, David Arbour, Ryan Rossi, Tim Althoff

Abstract: Causal inference studies using textual social media data can provide actionable insights on human behavior. Making accurate causal inferences with text requires controlling for confounding which could otherwise impart bias. Recently, many different methods for adjusting for confounders have been proposed, and we show that these existing methods disagree with one another on two datasets inspired by previous social media studies. Evaluating causal methods is challenging, as ground truth counterfactuals are almost never available. Presently, no empirical evaluation framework for causal methods using text exists, and as such, practitioners must select their methods without guidance. We contribute the first such framework, which consists of five tasks drawn from real world studies. Our framework enables the evaluation of any causal inference method using text. Across 648 experiments and two datasets, we evaluate every commonly used causal inference method and identify their strengths and weaknesses to inform social media researchers seeking to use such methods, and guide future improvements. We make all tasks, data, and models public to inform applications and encourage additional research.

Submitted 6 May, 2022; v1 submitted 21 September, 2020; originally announced September 2020.

Comments: to appear at ICWSM 2022

Showing 1–6 of 6 results for author: Weld, G