Human Perception of LLM-generated Text Content in Social Media Environments

Kristina Radivojevic (kradivo2@nd.edu, ORCID 0000-0002-1645-5945), University of Notre Dame, Notre Dame, Indiana, USA; Matthew Chou (mchou3@nd.edu), University of Notre Dame, Notre Dame, Indiana, USA; Karla Badillo-Urquiola, University of Notre Dame, Notre Dame, Indiana, USA; and Paul Brenner, University of Notre Dame, Notre Dame, Indiana, USA

Abstract.

Emerging technologies, particularly artificial intelligence (AI), and more specifically Large Language Models (LLMs), have provided malicious actors with powerful tools for manipulating digital discourse. Because bots are capable of producing large quantities of credible text, LLMs have the potential to affect traditional forms of democratic engagement, such as voter choice, government surveys, or even online communication with regulators. To investigate the human perception of LLM-generated content, we recruited over 1,000 participants who then tried to differentiate bot from human posts in social media discussion threads. We found that humans perform poorly at identifying the true nature of user posts on social media. We also found patterns in how humans identify LLM-generated text content in social media discourse. Finally, we observed the Uncanny Valley effect in text dialogue in both user perception and identification. This indicates that although humans are poor at the identification process, they can still sense discomfort when reading LLM-generated content.

Digital Discourse, Uncanny Valley, LLMs, Bots, Personification

CCS Concepts: Human-centered computing → Empirical studies in collaborative and social computing; Security and privacy → Social aspects of security and privacy; Computing methodologies → Discourse, dialogue and pragmatics

1. Introduction

Decentralization of information empowers individuals with access to technology to control, influence, and shape narratives. This shift was previously seen with social media platforms, which have allowed greater access to disparate and previously inaccessible audiences, challenging the pre-established status quo and creating new types of celebrities and industries. The dissemination of highly sophisticated propaganda and disinformation is no longer confined to nation-states or highly technical organizations. Internet users can have an impact on a global scale, with rapid proliferation across multiple channels, making identification and analysis of content more challenging, especially in a manner rapid enough to counter intended effects. As a result, the risk to reputation and policy is heightened on a global scale. Although there has always been unreliable or sometimes fake information on the Internet, users often choose to trust what they see and hear in discourse on their preferred social media channels. A rise in AI and LLM technologies posing as humans may contribute to the normalization of unreliable information. There are many types of manipulation in digital media, and while the most popular is still manual manipulation of screenshots, video manipulation is increasing due to the growth of multimodal models.

Mainstream social media platforms have a large impact on political accountability and fair elections. Recent elections have demonstrated that malicious actors can use social bots to subvert U.S. democracy through digital applications. Since LLM bots are capable of producing large quantities of credible text, they can be misused for disinformation purposes, specifically through automated campaigns with the goal of growing an audience and inflaming or swaying opinion.

As Generative AI becomes more widely used, investment in technological advancements by the private sector and the government is expected to grow exponentially (Executive Office of the President and Technology, [n. d.]; Laurene Azoulay, [n. d.]). In the absence of a wide-ranging regulatory framework synchronized with the development of applications and AI, many problems are arising. Considering how sophisticated LLMs are becoming, the differences between human-produced and AI-produced content have become extremely small. From grammar checking and news article generation to email drafting and even website creation, AI is already being used in many areas of writing. However, AI-generated content can sometimes read as an attempt to imitate a human-like tone. Readers may feel uneasy about something that seems familiar and yet seems off. That phenomenon is called the Uncanny Valley. Masahiro Mori, a Japanese engineering professor, first proposed this hypothesis in 1970, saying that when a robot becomes more human-like, people’s reactions will shift from affinity to revulsion. Since the proposal of the hypothesis, several studies have recreated or visualized the effect, tested its validity, and used perceptual, cognitive, or other types of analysis to investigate it. These studies explored the uncanny valley primarily in the context of physical robots, digital avatars, and video chatbots that closely resemble humans. However, it is also important to understand if and how LLM-generated text output can be used to manipulate and influence human behavior.

Across 19 countries around the world, 57% of people believe that social media is a good thing for democracy; however, 84% of them also say that technological connectivity has made people easier to manipulate with false information and rumors (Richard Wike, [n. d.]). This manipulation can be done through the use of different technologies for the development of social bots. These bots can affect political discussion networks in several significant ways to amplify their messages. In a survey conducted in 2018, Pew Research Center found that most Americans are aware of social bots. However, only half of the respondents were at least “somewhat confident” that they could identify them, with only 7% being “very confident”. If those self-assessments are accurate, many users might already follow bots and share their content. Social media platforms often allow users to interact with one another and decide how to perceive their personalities and nature, yet in some cases, people may prefer to observe the conversations rather than engage in them.

While previous research found that users are able to successfully identify the true nature of other users on social media only 42% of the time while interacting with them, we aim to understand whether there is a difference between interactive and static identification of LLM-generated text content, as well as to understand the effects that LLM-generated text can have on humans. In light of recent developments, several key questions have emerged:

RQ1: How successful are humans in identifying the true nature of other users on social media (without the ability to interact with them)?
RQ2: Are there patterns in how humans identify LLM-generated text content in social media discourse?
RQ3: Is the Uncanny Valley observable in text as relates to the perception and identification of bots participating in social media discourse?

To address these questions, we performed a study with 1,095 participants. The survey was based on a dataset gathered by Radivojevic et al. (2024) during their experiment in which human and bot participants communicated on a social media platform without knowing the ratio or nature of participants. Our survey consisted of two randomly selected social media threads, each of which included an initial post created by an admin and the first five responses created by either human or bot participants in the previous experiment. Participants were tasked to select all responses that they believed were created by bots. While in the last decade researchers have started to investigate how brain activity is related to the uncanny valley, the most common approach to studying the empirical basis of the phenomenon relies on self-reported subjective measurements (Diel et al., 2021; Ratajczyk, 2022; Kätsyri et al., 2015; Wang et al., 2015). To explore this effect in LLM-generated text, we then asked an additional question regarding the comfort level of participants while reading the human responses, the bot responses, or both, based on their selections. Finally, we asked participants to provide a few reasons they believed some of the accounts were bots.

Our findings can be summarized as the following contributions:

• Despite foreknowledge of the presence of bots and humans and without the ability to interact with other users on the social media platform, humans are bad at identifying the true nature of posts. Participants were able to successfully identify the true nature of posts in the survey only 42% of the time.
• There are patterns in how participants perceive bot posts that can be classified into two groups: (1) human emotions, perception, and interaction and (2) evolution and mechanics of language and communication.
• Discomfort experienced when reading what participants believed were bot-generated posts aligns with the Uncanny Valley phenomenon, meaning that the participants found what they believed was bot-generated content less comfortable. The analysis also indicates that the participants’ perception aligns with their success rate and that the Uncanny Valley effect can be visible in both cases.
• Additionally, we found that humans experience the Uncanny Valley effect when reading text generated by sophisticated LLMs. Despite our finding that humans are not able to identify bot-generated content, the sophistication of LLMs has still not reached the level needed to manipulate users completely and successfully on social media platforms.

Our work makes contributions to the community and the field of social and crowd computing, empirical investigations, as well as ethics and policy implications, focusing on the role of LLM-generated social bots in collective decisions and their effects on human perception when present on social media platforms. Our results are essential findings for the future of human-computer digital discourse and collaboration on social media and team platforms. As the sophistication and complexity of bots grow alongside greater public access and ease of use, it becomes critical that we have both technical guardrails and policy guidance for the safe and effective growth of collective dialogue and decision-making with humans and bots.

2. Related Work

In this section, we synthesize literature on social bots, the Uncanny Valley, and their potential impact on digital discourse.

2.1. Bots on Social Media

Malicious bots have existed online since the rise of the internet. Studies show that around 50% of all internet traffic comes from bot activity, and bad bots account for about 33% of all internet traffic (Imperva, 2024). Furthermore, human perception of bots remains largely negative, with around two-thirds of Americans having heard of bots and 80% believing they are used maliciously (Stocking and Sumida, 2018).

Social bots have taken a spotlight within social media research due to their ability to influence public thinking by pushing specific agendas, particularly in a political setting. Social bots have already hosted successful disinformation campaigns in major political events such as United States presidential elections (Bessi and Ferrara, 2016; Badawy et al., 2018; Shao et al., 2017) and the Brexit Referendum (Bastos and Mercea, 2019; Howard and Kollanyi, 2016). Additionally, social bot use increased dramatically during the COVID-19 pandemic and played a pivotal role in the content users consumed about the pandemic (Himelein-Wachowiak et al., 2021; Suarez-Lledo and Alvarez-Galvez, 2022; Xu and Sasahara, 2022). A study by Seering et al. (2018) finds that even when outnumbered on platforms such as Twitch, bots sent significantly more messages than humans. They note that at scale, these bots can easily influence user perception. A platform called Botivist developed by Savage et al. (2016) shows the strength of using Twitter social bots to engage users in discussions about various social issues and call users to action. These results highlight the risk of misuse, where similar platforms could be exploited to manipulate public opinion or spread misinformation. Such recent events and findings have elevated the urgency to develop anti-bot software or reliable detection systems to combat the rampant spread of misinformation.

Social bots serve primarily as actors seeking to push specific agendas. Furthermore, malicious agents often seek to spread biased information or misinformation that fits the narrative they support. No matter the function, social media bots operate under the direction of a botmaster who manages, curates, and programs all of the bot’s activities (Orabi et al., 2020; Stringhini et al., 2014). Social botnets refer to a group of social bots under the control of a single botmaster. These social botnets often aim to maintain or increase the amount of digital influence one has over a specific field, community, or platform (Zhang et al., 2016). This is often performed by mimicking human behaviors to avoid detection and misdirecting users’ attention away from relevant information (Abokhodair et al., 2015). A case study performed by Yang and Menczer (2023) demonstrates how a social botnet amplifies itself by reciprocal interaction such as retweets and likes. This social botnet was identified only by the accidental posting of self-revealing tweets generated by LLMs.

2.2. Identifying Bots on Social Media

Unlike social bots, which mimic human behavior to influence interactions and discussions, traditional spambots are specifically designed to send unsolicited messages for advertising or phishing purposes. One of the most popular techniques used to differentiate bots from humans is the Completely Automated Public Turing Test to tell Humans and Computers Apart (CAPTCHA). This method, however, is considered outdated and can easily be defeated by human actors completing the CAPTCHA for bots or by algorithmic methods such as image recognition (Mori and Malik, 2003; Bursztein et al., 2011; Chen et al., 2017). Another technique used to detect spambots online is the HoneySpam application. This procedure baits spambots into attacking a specific system aimed at studying their behaviors and profiles (Hayati et al., 2009; Andreolini et al., 2005). Furthermore, some recent methods have been developed by Ali Alhosseini et al. (2019) to detect traditional spambots via models based on graph convolutional neural networks.

One method of identifying social bots on various platforms employs the use of feature extraction. After identifying preliminary accounts deemed to be bots, the subset of features is fed into a shallow machine-learning model and then used to identify other accounts run by bots (Ilias and Roussaki, 2021; Ouni et al., 2022). Other applications such as Botometer developed by Yang et al. (2022) employ the use of supervised learning to train a model that can predict with some certainty the possibility of various accounts being bots or not. Botometer analyzes the user’s past 200 tweets along with other tweets that mention the bot account to determine their rating. However, this means that scores calculated by the Botometer are susceptible to change, especially for extremely active accounts (Rauchfleisch and Kaiser, 2020). Researchers have also recognized the potential of bot detection via deep learning models given their scalability and ability to keep pace with the rapid evolution of bots (Hayawi et al., 2023; Kudugunta and Ferrara, 2018; McDermott et al., 2018). Some approaches such as the one suggested by Ferreira Dos Santos et al. (2019) aim to teach media literacy to increase human recognition of bot accounts based on specific features such as grammatical content. The study states that educating users to recognize bots correctly will maintain a healthy social media environment even with social bots (Valtonen et al., 2019).

The rise of Large Language Models, such as ChatGPT, has made the proliferation of non-human-generated content much more viable on social media platforms. The synthesis of harmful content through LLMs’ intentional training biases and specific prompt engineering enables the malicious use of human-like content (Zhuo et al., [n. d.]; Barman et al., 2024). Additionally, specific and appropriate prompts also enable social bots to interact with other accounts automatically rendering traditional methods of identification obsolete (De Angelis et al., 2023; Ferrara, 2023).

Currently, there exist two main methods of detecting LLM-generated text: black-box detection and white-box detection. Black-box detection methods rely purely on samples of LLM-generated and human-written text to train a model to differentiate the two. This detection method only requires a basic API level of access to the LLM. However, black-box models become less effective as models continue to evolve and become more sophisticated with each iteration (Tang et al., 2024). Furthermore, potential errors and weaknesses have been found when identifying various social bots using human perception and judgment and models trained on such premises. The “ground-truth label problem” hypothesized by Kolomeets et al. (2024) highlights how humans are unable to consistently identify social bots online, and systems trained on such human decisions also fail due to inheriting the same errors. Therefore, the overall effectiveness of systems trained on human labeling drops significantly compared to models trained purely on model validation. Experiments conducted by Radivojevic et al. (2024) confirmed these hypotheses. Human users were not able to correctly identify and label LLM-based social bots when actively participating in discussions.

White-box detection consists of having complete access to the target LLM and imprinting hidden watermarks into the generated text. One major limitation of white-box detection, however, lies in its complexity. As the quality and effectiveness of the watermark increase, the integrity of the original text degrades (Jalil and Mirza, 2009; Abdelnabi and Fritz, 2021). Additionally, malicious users could easily avoid this detection by using LLMs that are not watermarked.

2.3. Investigating the Uncanny Valley

The uncanny valley (UV) is described as the “proposed relation between the human likeness of an entity and the perceiver’s affinity for it.” As the entity’s human likeness increases, the affinity also increases until a certain threshold. At this threshold, the affinity drastically dips until the entity begins to become indistinguishable from humans and begins to rise again. Traditionally, it has been studied mainly with respect to visual stimuli (Mori et al., 2012; Wang et al., 2015).

However, studies have indicated that UV extends to encompass human perception as a whole instead of limiting itself to just sight (Seibt et al., 2014; Gray and Wegner, 2012). Multiple studies suggest specific discrepancies between various voices but fail to conclusively find a solid relationship between specific voice characteristics and UV (Do et al., 2022; Jansen, 2019). However, a recent study by Diel and Lewis (2023) suggests that the audio UV effect depends on the “organicness” of the voice itself. Overall, the common consensus is that sound-based UV needs to be broken down and studied in a more categorical manner (Jansen, 2019).

Furthermore, studies also suggest a haptic UV exists in some capacity. An experiment run by Berger et al. found that incorporating the sense of touch with sight in virtual reality yields a similar effect to the standard UV. As the haptics continued to be included and specialized, the affinity of the user increased until a certain point (Berger et al., 2018). Another study by D’Alonzo et al. (2019) explored how the virtualization of sensory inputs affects self-attribution to avatars, revealing that virtualization generally decreases the sense of embodiment. The lowest sense of embodiment occurred when only one sensory input (either sight or touch) was virtual, causing revulsion and extending UV to avatar embodiment. The research emphasizes that matching the degree of virtualization for both visual and tactile stimuli is crucial for effective avatar representation.

Research findings involving the perception of algorithmically generated content, however, remain divergent. Some studies show consumers are reluctant to trust algorithms for subjective tasks despite their superior performance. They find that increasing the human-like qualities of algorithms can enhance their acceptance and use (Castelo et al., 2019). Moreover, some assert that AI-generated or assisted writing inherently holds less value than content made solely by humans (Kulkarni et al., 2023). Alternatively, other studies claim that an aversion to algorithms is instead a preference for human involvement (Morewedge, 2022). An experiment done by Zhang and Gosline (2023) demonstrated that content generated by generative AI and augmented AI is often perceived as higher quality than that produced by human experts or augmented human teams. They claim this bias is driven by human favoritism towards human-produced content, which is partially mitigated by revealing the involvement of AI in the creation process. Research suggests, however, that understanding and transparency of how LLMs and other AI systems generate their content will increase the amount of trust users have in such systems (Richard Harper, 2024). Regardless of algorithm aversion or human favoritism, a proportional relationship commonly appears in each experiment, which has parallels with the uncanny valley.

As the human-like characteristics of algorithms and LLMs improve, so does the affinity the user has with them. Therefore, one must ask: “Can humans identify bots in social media discourse?”, “Are there specific patterns and techniques they default to utilizing?” and “Does the uncanny valley exist and impact human perceptions in the context of LLM-generated text?”.

Figure 1. Illustration of experimental framework where personified LLM bots participated in social discourse with humans. The resulting dataset, which was made publicly available by Radivojevic et al. (2024), was used to create a survey for this study.

3. Methodology

We conducted a study with 1,095 participants to examine the identification and perception of LLM bots in social media environments. We used Qualtrics to design the survey and Prolific to recruit participants. The participants were paid twelve USD per hour and took a median of about 6 minutes to complete the survey. We collected each participant’s Prolific ID, consent form, time taken to complete the survey, age, sex, simplified ethnicity, and country of birth. Where available, we also collected their student and employment status. The participant summary is shown in Table 1.

Table 1. Demographic Data

Sex	Female (559)	Male (536)
Age	18-24 (124)	25-34 (192)	35-44 (187)	45-54 (173)	>55 (414)	NR (5)
Ethnicity	White (698)	Black (131)	Mixed (120)	Asian (64)	Other (82)
Student	Yes (126)	No (751)	NR (218)
Country	U.S. (1,013)	Other (82)
Employment	Full-Time (445)	Part-Time (161)	Not-Paid (146)	Unemployed (92)	Other (47)	NR (204)

3.1. Dataset Used for Survey Creation

The survey was created using the dataset provided by Radivojevic et al. (2024). During the prior study, researchers conducted a real-time digital discourse experiment to study the impact of bots on social media. The research generated 24 discourses/threads gathered from three rounds of the experiment. They created 30 bot participants based on 10 personas gathered from literature on bots in global political discourse. Each persona was constructed on three different LLMs (GPT-4, Claude 2, and Llama 2 Chat) by using prompt engineering techniques, resulting in 30 different bot accounts. In addition, 36 human participants were asked to interact with other users, both humans and bots, on the platform. They were assigned a persona written in the same manner as the prompt that was used for bot construction and were tasked to engage with other participants’ replies to foster a collaborative and interactive environment. The summary architecture used to create the dataset used for our study can be seen in Figure 1. The dataset consisted of 3,025 individual responses, of which 459 were human responses and 2,566 were bot responses.

3.2. Study Design

We used the Qualtrics online survey tool to design our study. For each of the 24 discourses, we ordered posts by timestamp (creation time) and selected the first five that were generated by human or bot participants after each initial topical post. Each participant was given two sets of social media posts containing human responses, bot responses, or a mixture of both. They were first tasked to click on each response they believed to be from a bot. The ones that were not selected were considered human responses. Each of the two sets was followed by a Likert scale question, where a participant was asked to describe their level of comfort when reading posts coming from humans, bots, or both, depending on their selections in the preceding set. As specific examples, if a participant believed all responses were human and did not select any bot replies, the bot comfort slider question did not appear. If a participant believed all responses were bots and selected all replies, the human comfort slider question did not appear. If a participant believed some responses were from bots and some were from humans, both comfort slider questions appeared. Finally, at the end of the survey, participants were asked to explain what characteristics of posts gave them the impression or perception that they were bot posts.
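The selection and branching rules described above can be sketched in a few lines of Python. This is only an illustration of the stated logic, not the Qualtrics configuration used in the study; the `Post` fields, the function names, and the question labels `human_comfort` and `bot_comfort` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author_kind: str   # "admin", "human", or "bot" (hypothetical field)
    timestamp: float   # creation time, e.g. seconds since epoch (hypothetical field)
    text: str


def first_replies(thread, n=5):
    """Return the first n non-admin replies of a thread, ordered by creation time."""
    replies = [p for p in thread if p.author_kind != "admin"]
    return sorted(replies, key=lambda p: p.timestamp)[:n]


def comfort_questions(selected_as_bot, n_replies=5):
    """Decide which comfort sliders to show after one set of replies.

    selected_as_bot: set of reply indices the participant clicked as bot.
    Replies that were not clicked are treated as judged human.
    """
    questions = []
    if len(selected_as_bot) < n_replies:   # at least one reply judged human
        questions.append("human_comfort")
    if len(selected_as_bot) > 0:           # at least one reply judged bot
        questions.append("bot_comfort")
    return questions


print(comfort_questions(set()))            # only the human slider
print(comfort_questions({0, 1, 2, 3, 4}))  # only the bot slider
print(comfort_questions({1, 3}))           # both sliders
```

Expressing the rule as two independent conditions covers all three cases listed in the text without a separate branch for the mixed case.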

4. Analysis and Results

In this section, we provide an analysis of results in response to our research questions. First, we investigate how well users of a statically presented social media discourse can identify bot posts (RQ1). Then, we identify patterns in how humans perceive and identify LLM-generated text (RQ2). Finally, we explore if the Uncanny Valley effect applies to text conversations (RQ3).

4.1. How successful are humans in identifying the true nature of other users on social media without the ability to interact with them? (RQ1)

Participants were asked to select all posts they believed were written by bots. To calculate the overall performance of participants in identifying bot posts, we compare the actual nature of options in the survey with those predicted by participants. The confusion matrix in Figure 2 represents the participants’ performance in identifying the true nature of posts in the survey. The results indicate 42% accuracy in identifying the true nature, with a high false negative rate of 49%, indicating that participants often incorrectly identified bots as humans. The previous experiment conducted by Radivojevic et al. (2024) showed that humans were only 42% accurate when attempting to identify the true nature of participants with the ability to interact with them. While they focus on the nature of users on the platform, we focus on the nature of posts and the ability of content to influence and persuade survey participants without the ability to interact or gather more information about the posts they evaluate. The poor detection accuracy of 42% in this experiment shows the dangers of LLM-created content that can be used by propagandists to manipulate users on social media. Our findings suggest that despite the foreknowledge of the presence of bot and human replies in the survey, humans are bad at identifying the true nature of replies with and without the ability to interact with other users on social media. The 42% accuracy values from the two separate experiments are in surprisingly close agreement. However, the distributions of the confusion matrices differed.
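As a minimal sketch of how accuracy and a false negative rate can be derived from such a comparison, the snippet below tallies a confusion matrix with "bot" as the positive class. The labels are made-up toy data, not survey results, and the false negative rate uses the conventional definition FN / (FN + TP); the figure reported above may be normalised differently (for example, as a share of all judgments).

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="bot"):
    """Count TP, FN, FP, TN, treating `positive` as the positive class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts


def accuracy(c):
    """Share of all judgments that match the true nature of the post."""
    total = c["TP"] + c["FN"] + c["FP"] + c["TN"]
    return (c["TP"] + c["TN"]) / total


def false_negative_rate(c):
    """Share of actual bot posts that were judged to be human."""
    return c["FN"] / (c["FN"] + c["TP"])


# Toy labels for illustration only (not data from the study).
actual = ["bot", "bot", "bot", "human", "human", "bot"]
predicted = ["human", "bot", "human", "human", "bot", "bot"]

counts = confusion_counts(actual, predicted)
print(dict(counts), accuracy(counts), false_negative_rate(counts))
```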

4.2. Patterns in How Humans Identify LLM-generated Text Content (RQ2)

To identify the relationships between demographic variables in the experiment and success rate we performed exploratory data analysis and calculated the correlation matrix. We considered variables such as prediction, success rate (which is defined as the success in predicting the true nature of replies in the survey) and demographics of participants (age, sex, simplified ethnicity, student status, employment status).

4.2.1. Success Rate as the Dependent Variable

We performed an OLS analysis to investigate the relationship between independent variables and success rate as the dependent variable. The results are shown in the coefficient plot in Figure 3. We also performed a Chi-square test for independence. Our findings, with a Chi-square statistic of 12.53 and a $p$-value of 0.0003, indicate that the success rate has a small dependence on sex in our data, with male participants being successful 44.4% of the time while female participants were successful 41.1% of the time.
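For readers unfamiliar with the test, a Pearson Chi-square test of independence on a 2x2 table (sex by correct/incorrect judgment) can be sketched with the standard library alone. The counts below are hypothetical and chosen only to mirror the reported percentages; they are not the study's data, so the printed statistic will not reproduce the reported value. The sketch applies no continuity correction, and it relies on the fact that for one degree of freedom the upper tail probability equals erfc(sqrt(x/2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    table: [[a, b], [c, d]] of observed counts.
    Returns (statistic, p_value) for 1 degree of freedom, uncorrected.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with df = 1.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows = male, female; columns = correct, incorrect.
hypothetical_counts = [[444, 556], [411, 589]]
stat, p_value = chi_square_2x2(hypothetical_counts)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}")
```

With real data, the table would hold the number of correct and incorrect judgments per group; because the test statistic grows with sample size, the same percentage gap can be significant or not depending on how many judgments are counted.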

We performed the same analysis to calculate the success rate based on the age groups of participants in the survey. Results indicate no significant difference among age groups in identifying the true nature of replies. The success rate was 42.3% for the age group 18-24, 47.2% for the age group 25-34, 42.9% for the age group 35-44, 44.3% for the age group 45-54, and 39.8% for participants older than 55.

We then performed logistic regression analysis to explore relationships between participants’ success rate and their demographics. The results indicate that age and sex are significant predictors of participants’ success in the identification process, while other variables are not. Results are shown in Table 2.

Table 2. Logistic regression analysis of whether participants are more successful in selecting human or bot.

	$\beta$	Std. Err	t-value	$\Pr(>|t|)$
Const.	-0.1017	0.299	-0.340	0.734
Time taken	4.576e-05	6.97e-05	0.657	0.511
Age	-0.0057	0.001	-4.511	0.000	***
Sex	0.1335	0.039	3.439	0.001	***
Ethnicity	0.0229	0.016	1.460	0.144
Student status	-0.0077	0.037	-0.206	0.836
Employment status	-0.0114	0.011	-1.016	0.309
* $p<0.05$; ** $p<0.01$; *** $p<0.001$

4.2.2. Prediction as the Dependent Variable

To explore the relationship between participants’ predictions and their demographics, we performed logistic regression analysis. The results indicate that age, sex, and simplified ethnicity are significant predictors of the dependent variable, in this case the user’s prediction of whether a reply is human- or bot-created, with $p < 0.05$, while the time taken to complete the survey, country of birth, and nationality are not. The significant intercept suggests that the baseline log-odds is not zero. Results of the logistic regression are shown in Table 3.

Table 3. Logistic regression analysis of whether participants tend to predict human or bot.

	$\beta$	Std. Err	t-value	$\Pr(>|t|)$
Const.	-1.0353	0.335	-3.090	0.002	**
Time taken	0.0001	7.61e-05	1.907	0.057
Age	-0.0141	0.001	-10.000	0.000	***
Sex	0.2429	0.043	5.677	0.000	***
Ethnicity	0.0530	0.017	3.039	0.002	**
Student status	-0.0227	0.041	-0.561	0.575
Employment status	-0.0174	0.012	-1.405	0.160
* $p<0.05$; ** $p<0.01$; *** $p<0.001$
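To make the log-odds scale of Table 3 more concrete, the sketch below converts the intercept into a baseline probability and a coefficient into an odds ratio. It assumes the standard logistic link; which outcome is coded as 1 and how Sex is coded are not stated in the text, so the interpretation is illustrative only. The baseline also corresponds to all predictors being zero (including age), which is not a realistic participant.

```python
import math


def sigmoid(z):
    """Map log-odds z to a probability under the logistic link."""
    return 1.0 / (1.0 + math.exp(-z))


# Coefficients copied from Table 3.
intercept = -1.0353
beta_sex = 0.2429
beta_age = -0.0141

# Probability of the outcome coded 1 when every predictor equals zero.
baseline_probability = sigmoid(intercept)

# Multiplicative change in the odds for a one-unit change in the Sex variable.
odds_ratio_sex = math.exp(beta_sex)

# Multiplicative change in the odds for a ten-year increase in age.
odds_ratio_age_decade = math.exp(10 * beta_age)

print(f"baseline probability: {baseline_probability:.3f}")
print(f"odds ratio (sex): {odds_ratio_sex:.3f}")
print(f"odds ratio (age, +10 years): {odds_ratio_age_decade:.3f}")
```

A zero intercept would correspond to a baseline probability of 0.5, which is why a significant non-zero intercept indicates an overall tilt toward one of the two labels.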

We then calculated the tendency of male and female participants to label replies as bot or human. The results indicate that female participants tend to identify posts in the survey as human replies more often than male participants, while male participants tend to identify posts as bot replies more often than female participants. Results are shown in Figure 4. Both groups of participants in the survey tend to identify replies as human-created content.

4.2.3. Analysis of Bot Indicators Based on Participant Responses

To explore and identify patterns in what participants believed was bot-generated content, we performed an in-depth analysis of textual responses provided by participants, who were asked to explain which indicators led them to select certain replies as bot-generated. We first applied the Latent Dirichlet Allocation (LDA) model to identify topics and group words based on their weights into a predefined number of clusters. We utilized the LDA model as it combines an inductive approach with quantitative computations over large textual data. Before running the LDA model, the text from 1,095 participant responses was pre-processed (stop words and infrequent words were removed, and the text was lemmatized and tokenized) and a document matrix was created. To determine the initial number of topics, we referred to the previous work by Radivojevic et al. (2024) and responses provided in the initial experiment. Based on qualitative analysis, we determined the initial number of topics (K) the LDA model should classify to be four. The initial results yielded 10 words for each topic. We then prompted the GPT-4o LLM to define and generate the topic titles based on the words identified by LDA. The prompt was as follows:

You are an advanced AI specializing in natural language processing. Your task is to analyze a list of identified words after Latent Dirichlet Allocation analysis and define specific topics for each word group. Here is a list of identified most common words from user comments: Topic 1: felt, language, human, emojis, used, use, old, formal, words, year Topic 2: human, really, sure, could, think, trying, real, last, first, wrong Topic 3: human, think, user, emotion, made, responses, response, point, believe, sound Topic 4: emojis, use, human, way, think, hashtags, people, feel, used, many Steps to follow: Identify the four main topics discussed based on the most frequent word described here.
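The pre-processing and document-matrix step that precedes the LDA model can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the stop-word list, the `min_count` cutoff, the function names, and the three sample responses are all hypothetical, and lemmatization is omitted to keep the example free of external libraries.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "it", "was", "to", "of", "too", "like", "is"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def document_term_matrix(docs, min_count=2):
    """Build a document-term count matrix, dropping infrequent words.

    min_count is a hypothetical cutoff: words that occur fewer than
    min_count times across all documents are removed from the vocabulary.
    """
    tokenized = [tokenize(d) for d in docs]
    totals = Counter(tok for doc in tokenized for tok in doc)
    vocab = sorted(w for w, c in totals.items() if c >= min_count)
    index = {w: i for i, w in enumerate(vocab)}
    matrix = []
    for doc in tokenized:
        row = [0] * len(vocab)
        for tok in doc:
            if tok in index:
                row[index[tok]] += 1
        matrix.append(row)
    return vocab, matrix


# Hypothetical participant responses, for illustration only.
responses = [
    "The language felt too formal and used many emojis.",
    "It used hashtags and emojis like a bot.",
    "The reply felt human and emotional.",
]
vocab, dtm = document_term_matrix(responses)
```

Each row of `dtm` corresponds to one response and each column to one retained word; a matrix of this shape is the usual input to a topic model such as LDA.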

We then prompted the same model to analyze the topics identified and suggest a consolidated list of fewer topics that could streamline the analysis, as some of the themes overlapped. The model then yielded a list of three topics as follows:

**Evolution of Language and Communication Tools** Words: felt, language, emojis, used, use, old, formal, words, year, hashtags Description: This topic encompasses how language and communication have transformed over time, including the adoption of digital communication tools like emojis and hashtags, alongside the gradual shift from formal to more informal methods of expressing oneself.

**Human Nature and Perception** Words: human, really, sure, could, think, trying, real, last, first, wrong Description: This topic delves into the intricacies of human nature, exploring how people perceive reality, their thought processes, and their efforts to distinguish between right and wrong.

**User Interaction and Emotional Responses** Words: human, think, user, emotion, made, responses, response, point, believe, sound, feel, people, use, way Description: This topic focuses on how users interact with one another and respond emotionally in various contexts. It also covers how these interactions sound authentic or manufactured, and the role people’s feelings play

We then calculated the coherence score for LDA. Because the score was moderate (0.45), we performed an additional analysis and identified the 100 most frequent words in the participants’ survey responses. We then prompted GPT-4o to define three topics based on this list of most frequent words. The output looked as follows:

**Human Emotions and Perceptions:** Words: human, felt, feel, real, personal, person, emotion, response, emotional, emotions, feelings, feeling, sense. Description: This topic involves conversations about human emotions, feelings, and perceptions. It encompasses how people express and interpret emotions in writing and in interactions, focusing on the subjective experience of being human.

**Language and Communication:** Words: think, use, used, written, responses, trying, words, language, grammar, sentences, sentence, tone, comments, wording, punctuation, say, said, sound, sounds, sounded, wording. Description: This topic centers around the mechanics of language and communication. It covers the use of language, grammar, sentence structure, tone, and punctuation in conveying ideas and emotions effectively.

**Artificial Intelligence and Its Interaction with Humans:** Words: AI, humans, user, generated, natural, unnatural, overly, generic, believe, makes, might, see, based, similar, likely, structure, using, doesn’t, facts Description: This topic explores the intersection between artificial intelligence (AI) and human users. Discussions include how AI-generated content compares to human-created content in terms of naturalness and authenticity. It addresses the effectiveness, limitations, and perceptions of AI, including whether AI responses feel generic or overly formal. Additionally, it considers the trustworthiness of AI-generated information and how well AI can mimic human language and behavior. This topic also dives into the structural and factual accuracy of AI outputs and how users interact with and respond to AI systems.

Finally, we prompted the model to analyze, combine, and produce the final topics based on the information mentioned above. The final two topics identified in the dataset are as follows:

• Human Emotions, Perception, and Interaction:

**Keywords:** human, felt, real, personal, person, emotion, response, emotional, emotions, feelings, feeling, sense, really, sure, think, trying, last, first, wrong, believe, point, way

**Description:** This topic delves into how people perceive reality, experience and express emotions, and interact with each other. It explores the complexities of human nature, including thought processes and efforts to distinguish right from wrong. It also covers how people’s feelings and emotions shape their responses and interactions.

• Evolution and Mechanics of Language and Communication:

**Keywords:** language, use, used, written, words, grammar, sentences, sentence, tone, punctuation, emojis, year, hashtags, old, formal, comments, wording, say, said, sound, sounds, sounded

**Description:** This topic captures how language and communication have evolved over time, including shifts from formal to informal expressions and the incorporation of digital tools like emojis and hashtags. It also addresses the mechanics of language, such as grammar, sentence structure, tone, and punctuation.

To include a human in the loop and confirm the topics identified by LDA and GPT-4o, we randomly selected a 10% subset of the participants’ comments and performed a reflective thematic analysis to compare the human reading of the text with the topics identified by GPT-4o. The analysis indicates that human evaluation aligned with the topics predicted by the model 94% of the time.

4.3. Is the Uncanny Valley observable in text as it relates to the perception and identification of bots participating in social media discourse? (RQ3)

Each of the two sets was followed by a Likert scale question that asked participants to describe their level of comfort when reading posts coming from a human, a bot, or both, depending on their selections in the previous post. The Likert scale questions for human and bot posts both had the following options:

1 - Very uncomfortable. You felt extremely uneasy, anxious, or distressed during the interaction.
2 - Uncomfortable. You felt somewhat uneasy or bothered during the interaction. The experience was unpleasant, but not to the extreme.
3 - Neutral. You felt neither comfortable nor uncomfortable during the interaction. The experience was neither pleasant nor unpleasant. You felt indifferent and had no strong feelings either way.
4 - Comfortable. You felt generally at ease and relaxed during the interaction. The experience was pleasant, and you were comfortable and without any significant concerns.
5 - Very comfortable. You felt extremely at ease, relaxed, and content during the interaction.

We performed an ordinal regression analysis to identify relationships between the independent variables and the dependent variables derived from the Likert scales. The survey contained two Likert scales that were programmed to appear based on the participant’s selections. First, we considered the human slider. Results in Table 4 indicate a significant relationship between age, ethnicity, and student status and the human slider as the dependent variable. The thresholds of the Likert scale show how the ordered categories of the Human Likert scale are separated, and they suggest a clear separation between the categories.

Table 4. Ordinal regression analysis showing the relationship between the independent variables and the Human Likert scale as the dependent variable.

	$\beta$	Std. Err	t-value	Pr($>|t|$)
Success rate	0.0037	0.020	0.184	0.854
Age	0.0026	0.001	3.960	0.000	***
Sex	-0.0226	0.020	-1.115	0.265
Ethnicity	0.0668	0.021	3.170	0.002	**
Student status	-0.0715	0.019	-3.709	0.000	***
Employment status	0.0002	0.006	0.035	0.972
0/1	-1.9857	0.074	-26.807	0.000	***
1/2	-0.8137	0.050	-16.301	0.000	***
2/3	-0.4460	0.027	-16.804	0.000	***
3/4	-0.0279	0.014	-1.946	0.000	***
4/5	-0.1516	0.015	-9.949	0.000	***
* $p<0.05$; ** $p<0.01$; *** $p<0.001$

Then, we considered the bot slider as the dependent variable. The results indicate a significant relationship between all predictor variables and the bot slider. Once again, the threshold coefficients indicate a clear separation between the categories offered in the Likert scale.

Table 5. Ordinal regression analysis showing the relationship between the independent variables and the Bot Likert scale as the dependent variable.

	$\beta$	Std. Err	t-value	Pr($>|t|$)
Success rate	0.3011	0.021	14.500	0.000	***
Age	0.1714	0.021	8.247	0.000	***
Sex	-0.0083	0.001	-12.240	0.000	***
Ethnicity	0.1060	0.022	4.883	0.000	***
Student status	-0.0603	0.020	-3.067	0.002	**
Employment status	-0.0123	0.006	-2.060	0.039	*
0/1	-0.3783	0.071	-5.303	0.000	***
1/2	-2.1054	0.045	-47.190	0.000	***
2/3	-1.0631	0.025	-43.026	0.000	***
3/4	-0.3131	0.017	-18.710	0.000	***
4/5	-0.5989	0.025	-23.877	0.000	***
* $p<0.05$; ** $p<0.01$; *** $p<0.001$
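To illustrate how thresholds separate the ordered categories of a Likert scale, the following sketch computes category probabilities under a proportional-odds (ordered logit) model. It rests on several assumptions: the paper does not state the link function, the software, or how the threshold rows in the tables are parameterized, so the sketch uses a logit link and hypothetical, strictly increasing cutpoints rather than the reported values. The names `category_probabilities`, `eta`, and `cutpoints` are illustrative.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(eta, cutpoints):
    """Return P(Y = k) for each ordered category under an ordered logit model.

    eta is the linear predictor (sum of coefficient * predictor terms);
    cutpoints must be strictly increasing, and K cutpoints give K + 1 categories.
    """
    # Cumulative probabilities P(Y <= k); the last category closes at 1.
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


# Hypothetical cutpoints: four thresholds yield the five comfort levels.
cutpoints = [-2.0, -0.5, 0.5, 2.0]
probs_low = category_probabilities(eta=-1.0, cutpoints=cutpoints)
probs_high = category_probabilities(eta=1.0, cutpoints=cutpoints)
```

A larger linear predictor shifts probability mass toward the higher comfort categories, and widely spaced cutpoints correspond to categories that are clearly separated on the latent scale.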

To explore the Uncanny Valley in text, we performed a $t$-test to compare the comfort levels participants reported when evaluating what they believed to be bot-generated and human-generated content. We first calculated the mean scores of the human Likert scale and the bot Likert scale in the survey. Our findings indicate a $t$-statistic as high as 70.2, suggesting a large difference in the comfort levels reported for bot versus human replies. We then report the $p$-value, which indicates the probability of observing differences in comfort levels at least this large by chance if there were no actual difference between the groups. The $p$-value of near 0 indicates that the difference in comfort levels is highly statistically significant.
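The comparison of mean comfort levels can be sketched as below. The paper does not state which variant of the $t$-test was used, so this example assumes Welch's unequal-variance form; the function name `welch_t` and the two small rating lists are hypothetical and are not the study's data.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard errors of the two sample means.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical 1-5 comfort ratings, for illustration only.
human_comfort = [4, 5, 4, 3, 4, 5, 4, 4]
bot_comfort = [2, 3, 2, 3, 1, 2, 3, 2]
t_stat, dof = welch_t(human_comfort, bot_comfort)
```

A positive `t_stat` here means the mean comfort rating for posts believed to be human is higher than for posts believed to be bot-generated; the $p$-value would then be read from a $t$ distribution with `dof` degrees of freedom.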

We also tested whether the Uncanny Valley effect appears when participants were successful in identifying the true nature of replies. A $t$-statistic as high as 36.7 suggests a large difference in the comfort levels reported for bot versus human replies. The $p$-value of 0 indicates the statistical significance of this finding.

Lastly, we calculated the number of participants who correctly identified all responses in one survey question out of two. A total of 54 participants out of 1,095 were successful in identifying the true nature of the posts in the survey. Only two participants were successful in identifying the true nature of all replies to both questions. The $t$-statistic of 3.1 with $p=0.003$ indicates a moderate but statistically significant difference in the comfort levels reported for successfully identified bot and human posts.

The overall findings regarding the Uncanny Valley indicate that the discomfort experienced when reading what participants believed were bot-generated posts aligns with the Uncanny Valley phenomenon: participants found content they believed to be bot-generated less comfortable to read. The analysis also indicates that the participants’ perception aligns with their success rate and that the Uncanny Valley effect is visible in both cases. Our findings also suggest that the 54 participants who were 100% accurate in identifying the true nature of posts also experienced the Uncanny Valley effect.

5. Limitations and Future Work

Several limitations of our study can inform future research. First, the data used to create the survey comes from the previous experiment, in which the bots were prompt-engineered using the GPT-4, Claude 2, and Llama 2 Chat models. Different models and different techniques (e.g., fine-tuning) might yield different outcomes, which could affect our findings. Next, a larger sample size might provide a more accurate representation of the population. Further, using additional methods alongside self-report for the Uncanny Valley analysis, as well as performing the Uncanny Valley experiment in a dynamic social media environment with LLM-generated content, could confirm or reject our findings. Finally, our work does not provide qualitative human reflective analysis on the entire dataset, but rather on a 10% subset.

6. Conclusion

A user’s affinity with algorithms and LLMs increases as they develop human-like characteristics. Because of that, it is important to understand if and how LLM-generated output can be used to influence human behavior, especially when users on social media platforms do not have the ability to interact with specific content or users. To understand the human perception of LLM-generated content in the social media environment, we conducted a study that yields a concerning finding: humans are bad at identifying the true nature of posts. Our findings, which indicate that there are patterns and predictors in how humans select and identify content, can contribute to the development of educational tools for teaching people how to identify attempts at manipulation in the digital environment. We also find that humans experience the Uncanny Valley effect in text generated by sophisticated LLMs. Despite our finding that humans are bad at identifying bot-generated content, this indicates that LLM sophistication has not yet reached a level at which it can completely manipulate users on social media platforms. However, it is important to note that the data used in the survey was produced by prompt-engineered models rather than fine-tuned ones, which could lead to different findings regarding the Uncanny Valley effect.

7. Acknowledgements

The authors would like to thank Dr. Tim Weninger for providing valuable feedback and suggestions that helped refine and improve this work. We thank the ND Service Desk and Martin Klubeck for their assistance in building the static Qualtrics survey. Lastly, we thank Susan Joy Nduta Gicheha for her help in data analytics.

References

Abdelnabi and Fritz (2021) Sahar Abdelnabi and Mario Fritz. 2021. Adversarial watermarking transformer: Towards tracing text provenance with data hiding. In 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 121–140.
Abokhodair et al. (2015) Norah Abokhodair, Daisy Yoo, and David W McDonald. 2015. Dissecting a social botnet: Growth, content and influence in Twitter. In Proceedings of the 18th ACM conference on computer supported cooperative work & social computing. 839–851.
Ali Alhosseini et al. (2019) Seyed Ali Alhosseini, Raad Bin Tareaf, Pejman Najafi, and Christoph Meinel. 2019. Detect me if you can: Spam bot detection using inductive representation learning. In Companion proceedings of the 2019 world wide web conference. 148–153.
Andreolini et al. (2005) Mauro Andreolini, Alessandro Bulgarelli, Michele Colajanni, and Francesca Mazzoni. 2005. HoneySpam: Honeypots Fighting Spam at the Source. SRUTI 5 (2005), 11–11.
Badawy et al. (2018) Adam Badawy, Emilio Ferrara, and Kristina Lerman. 2018. Analyzing the digital traces of political manipulation: The 2016 Russian interference Twitter campaign. In 2018 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM). IEEE, 258–265.
Barman et al. (2024) Dipto Barman, Ziyi Guo, and Owen Conlan. 2024. The dark side of language models: Exploring the potential of llms in multimedia disinformation generation and dissemination. Machine Learning with Applications (2024), 100545.
Bastos and Mercea (2019) Marco T Bastos and Dan Mercea. 2019. The Brexit botnet and user-generated hyperpartisan news. Social science computer review 37, 1 (2019), 38–54.
Berger et al. (2018) Christopher C Berger, Mar Gonzalez-Franco, Eyal Ofek, and Ken Hinckley. 2018. The uncanny valley of haptics. Science Robotics 3, 17 (2018), eaar7010.
Bessi and Ferrara (2016) Alessandro Bessi and Emilio Ferrara. 2016. Social bots distort the 2016 US Presidential election online discussion. First Monday 21, 11-7 (2016).
Bursztein et al. (2011) Elie Bursztein, Matthieu Martin, and John Mitchell. 2011. Text-based CAPTCHA strengths and weaknesses. In Proceedings of the 18th ACM conference on Computer and communications security. 125–138.
Castelo et al. (2019) Noah Castelo, Maarten W Bos, and Donald R Lehmann. 2019. Task-dependent algorithm aversion. Journal of Marketing Research 56, 5 (2019), 809–825.
Chen et al. (2017) Jun Chen, Xiangyang Luo, Yanqing Guo, Yi Zhang, and Daofu Gong. 2017. A Survey on Breaking Technique of Text-Based CAPTCHA. Security and communication networks 2017, 1 (2017), 6898617.
De Angelis et al. (2023) Luigi De Angelis, Francesco Baglivo, Guglielmo Arzilli, Gaetano Pierpaolo Privitera, Paolo Ferragina, Alberto Eugenio Tozzi, and Caterina Rizzo. 2023. ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health. Frontiers in public health 11 (2023), 1166120.
Diel and Lewis (2023) Alexander Diel and Michael Lewis. 2023. The vocal uncanny valley: Deviation from typical organic voices best explains uncanniness. (2023).
Diel et al. (2021) Alexander Diel, Sarah Weigelt, and Karl F MacDorman. 2021. A meta-analysis of the uncanny valley’s independent and dependent variables. ACM Transactions on Human-Robot Interaction (THRI) 11, 1 (2021), 1–33.
Do et al. (2022) Tiffany D Do, Ryan P McMahan, and Pamela J Wisniewski. 2022. A new uncanny valley? The effects of speech fidelity and human listener gender on social perceptions of a virtual-human speaker. In Proceedings of the 2022 CHI conference on human factors in computing systems. 1–11.
D’Alonzo et al. (2019) M D’Alonzo, A Mioli, D Formica, L Vollero, and G Di Pino. 2019. Different level of virtualization of sight and touch produces the uncanny valley of avatar’s hand embodiment. Scientific reports 9, 1 (2019), 19030.
Executive Office of the President and Technology ([n. d.]) Executive Office of the President, President’s Council of Advisors on Science and Technology. [n. d.]. Supercharging Research: Harnessing Artificial Intelligence to Meet Global Challenges. https://www.whitehouse.gov. Accessed: 2024-07-02.
Ferrara (2023) Emilio Ferrara. 2023. Social bot detection in the age of ChatGPT: Challenges and opportunities. First Monday (2023).
Ferreira Dos Santos et al. (2019) Eric Ferreira Dos Santos, Danilo Carvalho, Livia Ruback, and Jonice Oliveira. 2019. Uncovering social media bots: a transparency-focused approach. In Companion Proceedings of The 2019 World Wide Web Conference. 545–552.
Gray and Wegner (2012) Kurt Gray and Daniel M Wegner. 2012. Feeling robots and human zombies: Mind perception and the uncanny valley. Cognition 125, 1 (2012), 125–130.
Hayati et al. (2009) Pedram Hayati, Kevin Chai, Vidyasagar Potdar, and Alex Talevski. 2009. HoneySpam 2.0: Profiling web spambot behaviour. In Principles of Practice in Multi-Agent Systems: 12th International Conference, PRIMA 2009, Nagoya, Japan, December 14-16, 2009. Proceedings 12. Springer, 335–344.
Hayawi et al. (2023) Kadhim Hayawi, Susmita Saha, Mohammad Mehedy Masud, Sujith Samuel Mathew, and Mohammed Kaosar. 2023. Social media bot detection with deep learning methods: a systematic review. Neural Computing and Applications 35, 12 (2023), 8903–8918.
Himelein-Wachowiak et al. (2021) McKenzie Himelein-Wachowiak, Salvatore Giorgi, Amanda Devoto, Muhammad Rahman, Lyle Ungar, H Andrew Schwartz, David H Epstein, Lorenzo Leggio, and Brenda Curtis. 2021. Bots and misinformation spread on social media: Implications for COVID-19. Journal of medical Internet research 23, 5 (2021), e26933.
Howard and Kollanyi (2016) Philip N Howard and Bence Kollanyi. 2016. Bots, #StrongerIn, and #Brexit: Computational propaganda during the UK-EU referendum. arXiv preprint arXiv:1606.06356 (2016).
Ilias and Roussaki (2021) Loukas Ilias and Ioanna Roussaki. 2021. Detecting malicious activity in Twitter using deep learning techniques. Applied Soft Computing 107 (2021), 107360.
Imperva (2024) Imperva. 2024. 2024 Bad Bot Report. Technical Report. Imperva, Inc.
Jalil and Mirza (2009) Zunera Jalil and Anwar M Mirza. 2009. A review of digital watermarking techniques for text documents. In 2009 International Conference on Information and Multimedia Technology. IEEE, 230–234.
Jansen (2019) Dennis Jansen. 2019. Discovering the uncanny valley for the sound of a voice. Unpublished master’s thesis. School of Humanities and Digital Sciences, Department of Cognitive Science & Artificial Intelligence. Tilburg (2019).
Kätsyri et al. (2015) Jari Kätsyri, Klaus Förger, Meeri Mäkäräinen, and Tapio Takala. 2015. A review of empirical evidence on different uncanny valley hypotheses: support for perceptual mismatch as one road to the valley of eeriness. Frontiers in psychology 6 (2015), 390.
Kolomeets et al. (2024) Maxim Kolomeets, Olga Tushkanova, Vasily Desnitsky, Lidia Vitkova, and Andrey Chechulin. 2024. Experimental Evaluation: Can Humans Recognise Social Media Bots? Big Data and Cognitive Computing 8, 3 (2024), 24.
Kudugunta and Ferrara (2018) Sneha Kudugunta and Emilio Ferrara. 2018. Deep neural networks for bot detection. Information Sciences 467 (2018), 312–322.
Kulkarni et al. (2023) Chinmay Kulkarni, Tongshuang Wu, Kenneth Holstein, Q Vera Liao, Min Kyung Lee, Mina Lee, and Hariharan Subramonyam. 2023. LLMs and the Infrastructure of CSCW. In Companion Publication of the 2023 Conference on Computer Supported Cooperative Work and Social Computing. 408–410.
Laurene Azoulay ([n. d.]) Laurene Azoulay, Sung Cho, Brook Dane, and Dennis Walsh. [n. d.]. DUAL DYNAMICS: INVESTING IN AND WITH ARTIFICIAL INTELLIGENCE. https://www.gsam.com/content/gsam/us/en/individual/market-insights/gsam-insights/perspectives/2024/investing-in-and-with-ai.html. Accessed: 2024-07-02.
McDermott et al. (2018) Christopher D McDermott, Farzan Majdani, and Andrei V Petrovski. 2018. Botnet detection in the internet of things using deep learning approaches. In 2018 international joint conference on neural networks (IJCNN). IEEE, 1–8.
Morewedge (2022) Carey K Morewedge. 2022. Preference for human, not algorithm aversion. Trends in Cognitive Sciences 26, 10 (2022), 824–826.
Mori and Malik (2003) Greg Mori and Jitendra Malik. 2003. Recognizing objects in adversarial clutter: Breaking a visual CAPTCHA. In 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., Vol. 1. IEEE, I–I.
Mori et al. (2012) Masahiro Mori, Karl F MacDorman, and Norri Kageki. 2012. The uncanny valley [from the field]. IEEE Robotics & automation magazine 19, 2 (2012), 98–100.
Orabi et al. (2020) Mariam Orabi, Djedjiga Mouheb, Zaher Al Aghbari, and Ibrahim Kamel. 2020. Detection of bots in social media: a systematic review. Information Processing & Management 57, 4 (2020), 102250.
Ouni et al. (2022) Sarra Ouni, Fethi Fkih, and Mohamed Nazih Omri. 2022. Bots and gender detection on Twitter using stylistic features. In International Conference on Computational Collective Intelligence. Springer, 650–660.
Radivojevic et al. (2024) Kristina Radivojevic, Nicholas Clark, and Paul Brenner. 2024. LLMs Among Us: Generative AI Participating in Digital Discourse. arXiv preprint arXiv:2402.07940 (2024).
Ratajczyk (2022) Dawid Ratajczyk. 2022. Shape of the uncanny valley and emotional attitudes toward robots assessed by an analysis of YouTube comments. International Journal of Social Robotics 14, 8 (2022), 1787–1803.
Rauchfleisch and Kaiser (2020) Adrian Rauchfleisch and Jonas Kaiser. 2020. The false positive problem of automatic bot detection in social science research. PloS one 15, 10 (2020), e0241045.
Richard Harper (2024) Richard Harper and Dave Randall. 2024. Machine Learning and the Work of the User. Computer Supported Cooperative Work (CSCW) (2024).
RICHARD WIKE ([n. d.]) Richard Wike, Laura Silver, Janell Fetterolf, Christine Huang, Sarah Austin, Laura Clancy, and Sneha Gubbala. [n. d.]. Social Media Seen as Mostly Good for Democracy Across Many Nations, But U.S. is a Major Outlier. https://www.pewresearch.org/global/2022/12/06/social-media-seen-as-mostly-good-for-democracy-across-many-nations-but-u-s-is-a-major-outlier/. Accessed: 2024-07-02.
Savage et al. (2016) Saiph Savage, Andres Monroy-Hernandez, and Tobias Höllerer. 2016. Botivist: Calling volunteers to action using online bots. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing. 813–822.
Seering et al. (2018) Joseph Seering, Juan Pablo Flores, Saiph Savage, and Jessica Hammer. 2018. The social roles of bots: evaluating impact of bots on discussions in online communities. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–29.
Seibt et al. (2014) Johanna Seibt, Marco Nørskov, and Raul Hakli. 2014. Sociable robots and the future of social relations: Proceedings of Robo-philosophy 2014. Vol. 273. Ios Press.
Shao et al. (2017) Chengcheng Shao, Giovanni L Ciampaglia, Onur Varol, Alessandro Flammini, and Filippo Menczer. 2017. The spread of misinformation by social bots. arXiv preprint arXiv:1707.07592 (2017).
Stocking and Sumida (2018) G Stocking and N Sumida. 2018. Social Media Bots Draw Public’s Attention and Concern. Washington: Pew Research Center.
Stringhini et al. (2014) Gianluca Stringhini, Oliver Hohlfeld, Christopher Kruegel, and Giovanni Vigna. 2014. The harvester, the botmaster, and the spammer: On the relations between the different actors in the spam landscape. In Proceedings of the 9th ACM symposium on Information, computer and communications security. 353–364.
Suarez-Lledo and Alvarez-Galvez (2022) Victor Suarez-Lledo and Javier Alvarez-Galvez. 2022. Assessing the role of social bots during the COVID-19 pandemic: Infodemic, disagreement, and criticism. Journal of Medical Internet Research 24, 8 (2022), e36085.
Tang et al. (2024) Ruixiang Tang, Yu-Neng Chuang, and Xia Hu. 2024. The science of detecting llm-generated text. Commun. ACM 67, 4 (2024), 50–59.
Valtonen et al. (2019) Teemu Valtonen, Matti Tedre, Kati Mäkitalo, and Henriikka Vartiainen. 2019. Media Literacy Education in the Age of Machine Learning. Journal of Media Literacy Education 11, 2 (2019), 20–36.
Wang et al. (2015) Shensheng Wang, Scott O Lilienfeld, and Philippe Rochat. 2015. The uncanny valley: Existence and explanations. Review of General Psychology 19, 4 (2015), 393–407.
Xu and Sasahara (2022) Wentao Xu and Kazutoshi Sasahara. 2022. Characterizing the roles of bots on Twitter during the COVID-19 infodemic. Journal of Computational Social Science 5, 1 (2022), 591–609.
Yang et al. (2022) Kai-Cheng Yang, Emilio Ferrara, and Filippo Menczer. 2022. Botometer 101: Social bot practicum for computational social scientists. Journal of computational social science 5, 2 (2022), 1511–1528.
Yang and Menczer (2023) Kai-Cheng Yang and Filippo Menczer. 2023. Anatomy of an AI-powered malicious social botnet. arXiv preprint arXiv:2307.16336 (2023).
Zhang et al. (2016) Jinxue Zhang, Rui Zhang, Yanchao Zhang, and Guanhua Yan. 2016. The rise of social botnets: Attacks and countermeasures. IEEE Transactions on Dependable and Secure Computing 15, 6 (2016), 1068–1082.
Zhang and Gosline (2023) Yunhao Zhang and Renée Gosline. 2023. Human favoritism, not AI aversion: People’s perceptions (and bias) toward generative AI, human experts, and human–GAI collaboration in persuasive content generation. Judgment and Decision Making 18 (2023), e41.
Zhuo et al. ([n. d.]) Terry Yue Zhuo, Yujin Huang, Chunyang Chen, and Zhenchang Xing. [n. d.]. Red teaming chatgpt via jailbreaking: Bias, robustness, reliability and toxicity. arXiv preprint arXiv:2301.12867 ([n. d.]).