-
"There Has To Be a Lot That We're Missing": Moderating AI-Generated Content on Reddit
Authors:
Travis Lloyd,
Joseph Reagle,
Mor Naaman
Abstract:
Generative AI has begun to alter how we work, learn, communicate, and participate in online communities. How might our online communities be changed by generative AI? To start addressing this question, we focused on online community moderators' experiences with AI-generated content (AIGC). We performed fifteen in-depth, semi-structured interviews with community moderators on the social sharing site Reddit to understand their attitudes towards AIGC and how their communities are responding. Our study finds communities are choosing to enact rules restricting use of AIGC for both ideological and practical reasons. We find that, despite the absence of foolproof tools for detecting AIGC, moderators were able to somewhat limit the disruption caused by this new phenomenon by working with their communities to clarify norms about AIGC use. However, moderators found enforcing AIGC restrictions challenging, and had to rely on time-intensive and inaccurate detection heuristics in their efforts. Our results highlight the importance of supporting community autonomy and self-determination in the face of this sudden technological change, and suggest potential design solutions that may help.
Submitted 2 July, 2024; v1 submitted 21 November, 2023;
originally announced November 2023.
-
The Role of Inclusion, Control, and Ownership in Workplace AI-Mediated Communication
Authors:
Kowe Kadoma,
Marianne Aubin Le Quere,
Jenny Fu,
Christin Munsch,
Danae Metaxa,
Mor Naaman
Abstract:
Given large language models' (LLMs) increasing integration into workplace software, it is important to examine how biases in the models may impact workers. For example, stylistic biases in the language suggested by LLMs may cause feelings of alienation and result in increased labor for individuals or groups whose style does not match. We examine how such writer-style bias impacts inclusion, control, and ownership over the work when co-writing with LLMs. In an online experiment, participants wrote hypothetical job promotion requests using either hesitant or self-assured autocomplete suggestions from an LLM and reported their subsequent perceptions. We found that the style of the AI model did not impact perceived inclusion. However, individuals with higher perceived inclusion did perceive greater agency and ownership, an effect more strongly impacting participants of minoritized genders. Feelings of inclusion mitigated a loss of control and agency when accepting more AI suggestions.
Submitted 26 February, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Trustworthiness Evaluations of Search Results: The Impact of Rank and Misinformation
Authors:
Sterling Williams-Ceci,
Michael Macy,
Mor Naaman
Abstract:
Users rely on search engines for information in critical contexts, such as public health emergencies. Understanding how users evaluate the trustworthiness of search results is therefore essential. Research has identified rank and the presence of misinformation as factors impacting perceptions and click behavior in search. Here, we elaborate on these findings by measuring the effects of rank and misinformation, as well as warning banners, on the perceived trustworthiness of individual results in search. We conducted three online experiments (N=3196) using Covid-19-related queries to address this question. We show that although higher-ranked results are clicked more often, they are not more trusted. We also show that misinformation did not change trust in accurate results below it. However, a warning about unreliable sources backfired, decreasing trust in accurate information but not misinformation. This work addresses concerns about how people evaluate information in search, and illustrates the dangers of generic prevention approaches.
Submitted 19 September, 2023;
originally announced September 2023.
-
Co-Writing with Opinionated Language Models Affects Users' Views
Authors:
Maurice Jakesch,
Advait Bhat,
Daniel Buschek,
Lior Zalmanson,
Mor Naaman
Abstract:
If large language models like GPT-3 preferably produce a particular point of view, they may influence people's opinions on an unknown scale. This study investigates whether a language-model-powered writing assistant that generates some opinions more often than others impacts what users write - and what they think. In an online experiment, we asked participants (N=1,506) to write a post discussing whether social media is good for society. Treatment group participants used a language-model-powered writing assistant configured to argue that social media is good or bad for society. Participants then completed a social media attitude survey, and independent judges (N=500) evaluated the opinions expressed in their writing. Using the opinionated language model affected the opinions expressed in participants' writing and shifted their opinions in the subsequent attitude survey. We discuss the wider implications of our results and argue that the opinions built into AI language technologies need to be monitored and engineered more carefully.
Submitted 1 February, 2023;
originally announced February 2023.
-
Stop the [Image] Steal: The Role and Dynamics of Visual Content in the 2020 U.S. Election Misinformation Campaign
Authors:
Hana Matatov,
Mor Naaman,
Ofra Amir
Abstract:
Images are powerful. Visual information can attract attention, improve persuasion, trigger stronger emotions, and is easy to share and spread. We examine the characteristics of the popular images shared on Twitter as part of "Stop the Steal", the widespread misinformation campaign during the 2020 U.S. election. We analyze the spread of the forty most popular images shared on Twitter as part of this campaign. Using a coding process, we categorize and label the images according to their type, content, origin, and role, and perform a mixed-method analysis of these images' spread on Twitter. Our results show that popular images include both photographs and text rendered as image. Only very few of these popular images included alleged photographic evidence of fraud; and none of the popular photographs had been manipulated. Most images reached a significant portion of their total spread within several hours from their first appearance, and both popular- and less-popular accounts were involved in various stages of their spread.
Submitted 5 September, 2022;
originally announced September 2022.
-
Estimating Exposure to Information on Social Networks
Authors:
Buddhika Nettasinghe,
Kowe Kadoma,
Mor Naaman,
Vikram Krishnamurthy
Abstract:
This paper considers the problem of estimating exposure to information in a social network. Given a piece of information (e.g., a URL of a news article on Facebook, a hashtag on Twitter), our aim is to find the fraction of people on the network who have been exposed to it. The exact value of exposure to a piece of information is determined by two features: the structure of the underlying social network and the set of people who shared the piece of information. Often, both features are not publicly available (i.e., access to the two features is limited only to the internal administrators of the platform) and difficult to be estimated from data. As a solution, we propose two methods to estimate the exposure to a piece of information in an unbiased manner: a vanilla method which is based on sampling the network uniformly and a method which non-uniformly samples the network motivated by the Friendship Paradox. We provide theoretical results which characterize the conditions (in terms of properties of the network and the piece of information) under which one method outperforms the other. Further, we outline extensions of the proposed methods to dynamic information cascades (where the exposure needs to be tracked in real-time). We demonstrate the practical feasibility of the proposed methods via experiments on multiple synthetic and real-world datasets.
Submitted 13 July, 2022;
originally announced July 2022.
-
Human heuristics for AI-generated language are flawed
Authors:
Maurice Jakesch,
Jeffrey Hancock,
Mor Naaman
Abstract:
Human communication is increasingly intermixed with language generated by AI. Across chat, email, and social media, AI systems suggest words, complete sentences, or produce entire conversations. AI-generated language is often not identified as such but presented as language written by humans, raising concerns about novel forms of deception and manipulation. Here, we study how humans discern whether verbal self-presentations, one of the most personal and consequential forms of language, were generated by AI. In six experiments, participants (N = 4,600) were unable to detect self-presentations generated by state-of-the-art AI language models in professional, hospitality, and dating contexts. A computational analysis of language features shows that human judgments of AI-generated language are hindered by intuitive but flawed heuristics such as associating first-person pronouns, use of contractions, or family topics with human-written language. We experimentally demonstrate that these heuristics make human judgment of AI-generated language predictable and manipulable, allowing AI systems to produce text perceived as "more human than human." We discuss solutions, such as AI accents, to reduce the deceptive potential of language generated by AI, limiting the subversion of human intuition.
Submitted 14 March, 2023; v1 submitted 14 June, 2022;
originally announced June 2022.
-
Characterizing Alternative Monetization Strategies on YouTube
Authors:
Yiqing Hua,
Manoel Horta Ribeiro,
Robert West,
Thomas Ristenpart,
Mor Naaman
Abstract:
One of the key emerging roles of the YouTube platform is providing creators the ability to generate revenue from their content and interactions. Alongside tools provided directly by the platform, such as revenue-sharing from advertising, creators co-opt the platform to use a variety of off-platform monetization opportunities. In this work, we focus on studying and characterizing these alternative monetization strategies. Leveraging a large longitudinal YouTube dataset of popular creators, we develop a taxonomy of alternative monetization strategies and a simple methodology to detect their usage automatically. We then proceed to characterize the adoption of these strategies. First, we find that the use of external monetization is expansive and increasingly prevalent, used in 18% of all videos, with 61% of channels using one such strategy at least once. Second, we show that the adoption of these strategies varies substantially among channels of different kinds and popularity, and that channels that establish these alternative revenue streams often become more productive on the platform. Lastly, we investigate how potentially problematic channels -- those that produce Alt-lite, Alt-right, and Manosphere content -- leverage alternative monetization strategies, finding that they employ a more diverse set of such strategies significantly more often than a carefully chosen comparison set of channels. This finding complicates YouTube's role as a gatekeeper, since the practice of excluding policy-violating content from its native on-platform monetization may not be effective. Overall, this work provides an important step toward broadening the understanding of the monetary incentives behind content creation on YouTube.
Submitted 6 October, 2022; v1 submitted 18 March, 2022;
originally announced March 2022.
-
Characterizing Reddit Participation of Users Who Engage in the QAnon Conspiracy Theories
Authors:
Kristen Engel,
Yiqing Hua,
Taixiang Zeng,
Mor Naaman
Abstract:
Widespread conspiracy theories may significantly impact our society. This paper focuses on the QAnon conspiracy theory, a consequential conspiracy theory that started on and disseminated successfully through social media. Our work characterizes how Reddit users who have participated in QAnon-focused subreddits engage in activities on the platform, especially outside their own communities. Using a large-scale Reddit moderation action against QAnon-related activities in 2018 as the source, we identified 13,000 users active in the early QAnon communities. We collected the 2.1 million submissions and 10.8 million comments posted by these users across all of Reddit from October 2016 to January 2021. The majority of these users were only active after the emergence of the QAnon Conspiracy theory and decreased in activity after Reddit's 2018 QAnon ban. A qualitative analysis of a sample of 915 subreddits where the "QAnon-enthusiastic" users were especially active shows that they participated in a diverse range of subreddits, often of unrelated topics to QAnon. However, most of the users' submissions were concentrated in subreddits that have sympathetic attitudes towards the conspiracy theory, characterized by discussions that were pro-Trump, or emphasized unconstricted behavior (often anti-establishment and anti-interventionist). Further study of a sample of 1,571 of these submissions indicates that most consist of links from low-quality sources, bringing potential harm to the broader Reddit community. These results point to the likelihood that the activities of early QAnon users on Reddit were dedicated and committed to the conspiracy, providing implications on both platform moderation design and future research.
Submitted 16 March, 2023; v1 submitted 14 March, 2022;
originally announced March 2022.
-
Dataset and Case Studies for Visual Near-Duplicates Detection in the Context of Social Media
Authors:
Hana Matatov,
Mor Naaman,
Ofra Amir
Abstract:
The massive spread of visual content through the web and social media poses both challenges and opportunities. Tracking visually-similar content is an important task for studying and analyzing social phenomena related to the spread of such content. In this paper, we address this need by building a dataset of social media images and evaluating visual near-duplicates retrieval methods based on image retrieval and several advanced visual feature extraction methods. We evaluate the methods using a large-scale dataset of images we crawl from social media and their manipulated versions we generated, presenting promising results in terms of recall. We demonstrate the potential of this method in two case studies: one that shows the value of creating systems supporting manual content review, and another that demonstrates the usefulness of automatic large-scale data analysis.
Submitted 14 March, 2022;
originally announced March 2022.
-
Increasing Adversarial Uncertainty to Scale Private Similarity Testing
Authors:
Yiqing Hua,
Armin Namavari,
Kaishuo Cheng,
Mor Naaman,
Thomas Ristenpart
Abstract:
Social media and other platforms rely on automated detection of abusive content to help combat disinformation, harassment, and abuse. One common approach is to check user content for similarity against a server-side database of problematic items. However, this method fundamentally endangers user privacy. Instead, we target client-side detection, notifying only the users when such matches occur to warn them against abusive content. Our solution is based on privacy-preserving similarity testing. Existing approaches rely on expensive cryptographic protocols that do not scale well to large databases and may sacrifice the correctness of the matching. To contend with this challenge, we propose and formalize the concept of similarity-based bucketization (SBB). With SBB, a client reveals a small amount of information to a database-holding server so that it can generate a bucket of potentially similar items. The bucket is small enough for efficient application of privacy-preserving protocols for similarity. To analyze the privacy risk of the revealed information, we introduce a framework for measuring an adversary's confidence in inferring a predicate about the client input correctly. We develop a practical SBB protocol for image content, and evaluate its client privacy guarantee with real-world social media data. We then combine SBB with various similarity protocols, showing that the combination with SBB provides a speedup of at least 29x on large-scale databases compared to that without, while retaining correctness of over 95%.
Submitted 4 October, 2021; v1 submitted 3 September, 2021;
originally announced September 2021.
-
Trend Alert: How a Cross-Platform Organization Manipulated Twitter Trends in the Indian General Election
Authors:
Maurice Jakesch,
Kiran Garimella,
Dean Eckles,
Mor Naaman
Abstract:
Political organizations worldwide keep innovating their use of social media technologies. In the 2019 Indian general election, organizers used a network of WhatsApp groups to manipulate Twitter trends through coordinated mass postings. We joined 600 WhatsApp groups that support the Bharatiya Janata Party, the right-wing party that won the general election, to investigate these campaigns. We found evidence of 75 hashtag manipulation campaigns in the form of mobilization messages with lists of pre-written tweets. Building on this evidence, we estimate the campaigns' size, describe their organization and determine whether they succeeded in creating controlled social media narratives. Our findings show that the campaigns produced hundreds of nationwide Twitter trends throughout the election. Centrally controlled but voluntary in participation, this hybrid configuration of technologies and organizational strategies shows how profoundly online tools transform campaign politics. Trend alerts complicate the debates over the legitimate use of digital tools for political participation and may have provided a blueprint for participatory media manipulation by a party with popular support.
Submitted 2 September, 2021; v1 submitted 27 April, 2021;
originally announced April 2021.
-
Artificial intelligence in communication impacts language and social relationships
Authors:
Jess Hohenstein,
Dominic DiFranzo,
Rene F. Kizilcec,
Zhila Aghajari,
Hannah Mieczkowski,
Karen Levy,
Mor Naaman,
Jeff Hancock,
Malte Jung
Abstract:
Artificial intelligence (AI) is now widely used to facilitate social interaction, but its impact on social relationships and communication is not well understood. We study the social consequences of one of the most pervasive AI applications: algorithmic response suggestions ("smart replies"). Two randomized experiments (n = 1036) provide evidence that a commercially-deployed AI changes how people interact with and perceive one another in pro-social and anti-social ways. We find that using algorithmic responses increases communication efficiency, use of positive emotional language, and positive evaluations by communication partners. However, consistent with common assumptions about the negative implications of AI, people are evaluated more negatively if they are suspected to be using algorithmic responses. Thus, even though AI can increase communication efficiency and improve interpersonal perceptions, it risks changing users' language production and continues to be viewed negatively.
Submitted 10 February, 2021;
originally announced February 2021.
-
VoterFraud2020: a Multi-modal Dataset of Election Fraud Claims on Twitter
Authors:
Anton Abilov,
Yiqing Hua,
Hana Matatov,
Ofra Amir,
Mor Naaman
Abstract:
The wide spread of unfounded election fraud claims surrounding the U.S. 2020 election had resulted in undermining of trust in the election, culminating in violence inside the U.S. Capitol. Under these circumstances, it is critical to understand the discussions surrounding these claims on Twitter, a major platform where the claims were disseminated. To this end, we collected and released the VoterFraud2020 dataset, a multi-modal dataset with 7.6M tweets and 25.6M retweets from 2.6M users related to voter fraud claims. To make this data immediately useful for a diverse set of research projects, we further enhance the data with cluster labels computed from the retweet graph, each user's suspension status, and the perceptual hashes of tweeted images. The dataset also includes aggregate data for all external links and YouTube videos that appear in the tweets. Preliminary analyses of the data show that Twitter's user suspension actions mostly affected a specific community of voter fraud claim promoters, and expose the most common URLs, images and YouTube videos shared in the data.
Submitted 26 April, 2021; v1 submitted 20 January, 2021;
originally announced January 2021.
-
Characterizing Twitter Users Who Engage in Adversarial Interactions against Political Candidates
Authors:
Yiqing Hua,
Mor Naaman,
Thomas Ristenpart
Abstract:
Social media provides a critical communication platform for political figures, but also makes them easy targets for harassment. In this paper, we characterize users who adversarially interact with political figures on Twitter using mixed-method techniques. The analysis is based on a dataset of 400 thousand users' 1.2 million replies to 756 candidates for the U.S. House of Representatives in the two months leading up to the 2018 midterm elections. We show that among moderately active users, adversarial activity is associated with decreased centrality in the social graph and increased attention to candidates from the opposing party. When compared to users who are similarly active, highly adversarial users tend to engage in fewer supportive interactions with their own party's candidates and express negativity in their user profiles. Our results can inform the design of platform moderation mechanisms to support political figures countering online harassment.
Submitted 9 May, 2020;
originally announced May 2020.
-
Towards Measuring Adversarial Twitter Interactions against Candidates in the US Midterm Elections
Authors:
Yiqing Hua,
Thomas Ristenpart,
Mor Naaman
Abstract:
Adversarial interactions against politicians on social media such as Twitter have significant impact on society. In particular they disrupt substantive political discussions online, and may discourage people from seeking public office. In this study, we measure the adversarial interactions against candidates for the US House of Representatives during the run-up to the 2018 US general election. We gather a new dataset consisting of 1.7 million tweets involving candidates, one of the largest corpora focusing on political discourse. We then develop a new technique for detecting tweets with toxic content that are directed at any specific candidate. Such a technique allows us to more accurately quantify adversarial interactions towards political candidates. Further, we introduce an algorithm to induce candidate-specific adversarial terms to capture more nuanced adversarial interactions that previous techniques may not consider toxic. Finally, we use these techniques to outline the breadth of adversarial interactions seen in the election, including offensive name-calling, threats of violence, posting discrediting information, attacks on identity, and adversarial message repetition.
Submitted 9 May, 2020;
originally announced May 2020.
-
When Do People Trust Their Social Groups?
Authors:
Xiao Ma,
Justin Cheng,
Shankar Iyer,
Mor Naaman
Abstract:
Trust facilitates cooperation and supports positive outcomes in social groups, including member satisfaction, information sharing, and task performance. Extensive prior research has examined individuals' general propensity to trust, as well as the factors that contribute to their trust in specific groups. Here, we build on past work to present a comprehensive framework for predicting trust in groups. By surveying 6,383 Facebook Groups users about their trust attitudes and examining aggregated behavioral and demographic data for these individuals, we show that (1) an individual's propensity to trust is associated with how they trust their groups, (2) smaller, closed, older, more exclusive, or more homogeneous groups are trusted more, and (3) a group's overall friendship-network structure and an individual's position within that structure can also predict trust. Last, we demonstrate how group trust predicts outcomes at both individual and group level such as the formation of new friendship ties.
Submitted 13 May, 2019;
originally announced May 2019.
-
Understanding Image Quality and Trust in Peer-to-Peer Marketplaces
Authors:
Xiao Ma,
Lina Mezghani,
Kimberly Wilber,
Hui Hong,
Robinson Piramuthu,
Mor Naaman,
Serge Belongie
Abstract:
As any savvy online shopper knows, second-hand peer-to-peer marketplaces are filled with images of mixed quality. How does image quality impact marketplace outcomes, and can quality be automatically predicted? In this work, we conducted a large-scale study on the quality of user-generated images in peer-to-peer marketplaces. By gathering a dataset of common second-hand products (~75,000 images) and annotating a subset with human-labeled quality judgments, we were able to model and predict image quality with decent accuracy (~87%). We then conducted two studies focused on understanding the relationship between these image quality scores and two marketplace outcomes: sales and perceived trustworthiness. We show that image quality is associated with a higher likelihood that an item will be sold, though other factors such as view count were better predictors of sales. Nonetheless, we show that high-quality user-generated images selected by our models outperform stock imagery in eliciting perceptions of trust from users. Our findings can inform the design of future marketplaces and guide potential sellers to take better product images.
Submitted 26 November, 2018;
originally announced November 2018.
-
Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies
Authors:
Max Grusky,
Mor Naaman,
Yoav Artzi
Abstract:
We present NEWSROOM, a summarization dataset of 1.3 million articles and summaries written by authors and editors in newsrooms of 38 major news publications. Extracted from search and social media metadata between 1998 and 2017, these high-quality summaries demonstrate high diversity of summarization styles. In particular, the summaries combine abstractive and extractive strategies, borrowing words and phrases from articles at varying rates. We analyze the extraction strategies used in NEWSROOM summaries against other datasets to quantify the diversity and difficulty of our new data, and train existing methods on the data to evaluate its utility and challenges.
Submitted 17 May, 2020; v1 submitted 30 April, 2018;
originally announced April 2018.
-
Web-Based VR Experiments Powered by the Crowd
Authors:
Xiao Ma,
Megan Cackett,
Leslie Park,
Eric Chien,
Mor Naaman
Abstract:
We build on the increasing availability of Virtual Reality (VR) devices and Web technologies to conduct behavioral experiments in VR using crowdsourcing techniques. A new recruiting and validation method allows us to create a panel of eligible experiment participants recruited from Amazon Mechanical Turk. Using this panel, we ran three different crowdsourced VR experiments, each reproducing one of three VR illusions: place illusion, embodiment illusion, and plausibility illusion. Our experience and worker feedback on these experiments show that conducting Web-based VR experiments using crowdsourcing is already feasible, though some challenges, including scale, remain. Such crowdsourced VR experiments on the Web have the potential to finally support replicable VR experiments with diverse populations at a low cost.
Submitted 26 February, 2018; v1 submitted 22 February, 2018;
originally announced February 2018.
-
Understanding Musical Diversity via Online Social Media
Authors:
Minsu Park,
Ingmar Weber,
Mor Naaman,
Sarah Vieweg
Abstract:
Musicologists and sociologists have long been interested in patterns of music consumption and their relation to socioeconomic status. In particular, the Omnivore Thesis examines the relationship between these variables and the diversity of music a person consumes. Using data from social media users of Last.fm and Twitter, we design and evaluate a measure that reasonably captures diversity of musical tastes. We use that measure to explore associations between musical diversity and variables that capture socioeconomic status, demographics, and personal traits such as openness and degree of interest in music (into-ness). Our musical diversity measure can provide a useful means for studies of musical preferences and consumption. Also, our study of the Omnivore Thesis provides insights that extend previous survey and interview-based studies.
Submitted 17 May, 2017; v1 submitted 9 April, 2016;
originally announced April 2016.
-
A Data-driven Study of View Duration on YouTube
Authors:
Minsu Park,
Mor Naaman,
Jonah Berger
Abstract:
Video watching has emerged as one of the most frequent media activities on the Internet. Yet, little is known about how users watch online video. Using two distinct YouTube datasets, a set of random YouTube videos crawled from the Web and a set of videos watched by participants tracked by a Chrome extension, we examine whether and how indicators of collective preferences and reactions are associated with view duration of videos. We show that video view duration is positively associated with the video's view count, the number of likes per view, and the negative sentiment in the comments. These metrics and reactions have significant predictive power over the duration for which individuals watch a video. Our findings provide a more precise understanding of user engagement with video content in social media beyond view count.
Submitted 17 May, 2017; v1 submitted 28 March, 2016;
originally announced March 2016.
-
On the Accuracy of Hyper-local Geotagging of Social Media Content
Authors:
David Flatow,
Mor Naaman,
Ke Eddie Xie,
Yana Volkovich,
Yaron Kanza
Abstract:
Social media users share billions of items per year, only a small fraction of which is geotagged. We present a data-driven approach for identifying non-geotagged content items that can be associated with a hyper-local geographic area by modeling the location distributions of hyper-local n-grams that appear in the text. We explore the trade-off between accuracy, precision and coverage of this method. Further, we explore differences across content received from multiple platforms and devices, and show, for example, that content shared via different sources and applications produces significantly different geographic distributions, and that it is best to model and predict location for items according to their source. Our findings show the potential and the bounds of a data-driven approach to geotag short social media texts, and offer implications for all applications that use data-driven approaches to locate content.
Submitted 1 February, 2015; v1 submitted 4 September, 2014;
originally announced September 2014.