Search | arXiv e-print repository

Building Knowledge-Guided Lexica to Model Cultural Variation

Authors: Shreya Havaldar, Salvatore Giorgi, Sunny Rai, Thomas Talhelm, Sharath Chandra Guntuku, Lyle Ungar

Abstract: Cultural variation exists between nations (e.g., the United States vs. China), but also within regions (e.g., California vs. Texas, Los Angeles vs. San Francisco). Measuring this regional cultural variation can illuminate how and why people think and behave differently. Historically, it has been difficult to computationally model cultural variation due to a lack of training data and scalability co… ▽ More Cultural variation exists between nations (e.g., the United States vs. China), but also within regions (e.g., California vs. Texas, Los Angeles vs. San Francisco). Measuring this regional cultural variation can illuminate how and why people think and behave differently. Historically, it has been difficult to computationally model cultural variation due to a lack of training data and scalability constraints. In this work, we introduce a new research problem for the NLP community: How do we measure variation in cultural constructs across regions using language? We then provide a scalable solution: building knowledge-guided lexica to model cultural variation, encouraging future work at the intersection of NLP and cultural understanding. We also highlight modern LLMs' failure to measure cultural variation or generate culturally varied language. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted at NAACL 2024

arXiv:2402.11477 [pdf, other]

Studying Differential Mental Health Expressions in India

Authors: Khushi Shelat, Sunny Rai, Devansh R Jain, Kishen Sivabalan, Young Min Cho, Maitreyi Redkar, Samindara Sawant, Sharath Chandra Guntuku

Abstract: Psychosocial stressors and the symptomatology of mental disorders vary across cultures. However, current understandings of mental health expressions on social media are predominantly derived from studies in WEIRD (Western, Educated, Industrialized, Rich, and Democratic) contexts. In this paper, we analyze mental health posts on Reddit made by individuals in India, to identify variations in online… ▽ More Psychosocial stressors and the symptomatology of mental disorders vary across cultures. However, current understandings of mental health expressions on social media are predominantly derived from studies in WEIRD (Western, Educated, Industrialized, Rich, and Democratic) contexts. In this paper, we analyze mental health posts on Reddit made by individuals in India, to identify variations in online depression language specific to the Indian context compared to users from the Rest of the World (ROW). Unlike in Western samples, we observe that mental health discussions in India additionally express sadness, use negation, are present-focused, and are related to work and achievement. Illness is uniquely correlated to India, indicating the association between depression and physical health in Indian patients. Two clinical psychologists validated the findings from social media posts and found 95% of the top 20 topics associated with mental health discussions as prevalent in Indians. Significant linguistic variations in online mental health-related language in India compared to ROW, emphasize the importance of developing precision-targeted interventions that are culturally appropriate. △ Less

Submitted 16 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

arXiv:2402.11333 [pdf, other]

Social Norms in Cinema: A Cross-Cultural Analysis of Shame, Pride and Prejudice

Authors: Sunny Rai, Khushang Jilesh Zaveri, Shreya Havaldar, Soumna Nema, Lyle Ungar, Sharath Chandra Guntuku

Abstract: Social emotions such as shame and pride reflect social sanctions or approvals in society. In this paper, we examine how expressions of shame and pride vary across cultures and harness them to extract unspoken normative expectations across cultures. We introduce the first cross-cultural shame/pride emotions movie dialogue dataset, obtained from ~5.4K Bollywood and Hollywood movies, along with over… ▽ More Social emotions such as shame and pride reflect social sanctions or approvals in society. In this paper, we examine how expressions of shame and pride vary across cultures and harness them to extract unspoken normative expectations across cultures. We introduce the first cross-cultural shame/pride emotions movie dialogue dataset, obtained from ~5.4K Bollywood and Hollywood movies, along with over 10K implicit social norms. Our study reveals variations in expressions of social emotions and social norms that align with known cultural tendencies observed in the United States and India -- e.g., Hollywood movies express shame predominantly toward self whereas Bollywood movies express shame predominantly toward others. Similarly, Bollywood shames non-conformity in gender roles, and takes pride in collective identity, while Hollywood shames lack of accountability, and takes pride in ethical behavior. More importantly, women face more prejudice across cultures and are sanctioned for similar social norms. △ Less

Submitted 16 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

arXiv:2401.05254 [pdf, other]

Language-based Valence and Arousal Expressions between the United States and China: a Cross-Cultural Examination

Authors: Young-Min Cho, Dandan Pang, Stuti Thapa, Garrick Sherman, Lyle Ungar, Louis Tay, Sharath Chandra Guntuku

Abstract: Although affective expressions of individuals have been extensively studied using social media, research has primarily focused on the Western context. There are substantial differences among cultures that contribute to their affective expressions. This paper examines the differences between Twitter (X) in the United States and Sina Weibo posts in China on two primary dimensions of affect - valence… ▽ More Although affective expressions of individuals have been extensively studied using social media, research has primarily focused on the Western context. There are substantial differences among cultures that contribute to their affective expressions. This paper examines the differences between Twitter (X) in the United States and Sina Weibo posts in China on two primary dimensions of affect - valence and arousal. We study the difference in the functional relationship between arousal and valence (so-called V-shaped) among individuals in the US and China and explore the associated content differences. Furthermore, we correlate word usage and topics in both platforms to interpret their differences. We observe that for Twitter users, the variation in emotional intensity is less distinct between negative and positive emotions compared to Weibo users, and there is a sharper escalation in arousal corresponding with heightened emotions. From language features, we discover that affective expressions are associated with personal life and feelings on Twitter, while on Weibo such discussions are about socio-political topics in the society. These results suggest a West-East difference in the V-shaped relationship between valence and arousal of affective expressions on social media influenced by content differences. Our findings have implications for applications and theories related to cultural differences in affective expressions. △ Less

Submitted 30 July, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: preview

arXiv:2310.17017 [pdf, other]

An Integrative Survey on Mental Health Conversational Agents to Bridge Computer Science and Medical Perspectives

Authors: Young Min Cho, Sunny Rai, Lyle Ungar, João Sedoc, Sharath Chandra Guntuku

Abstract: Mental health conversational agents (a.k.a. chatbots) are widely studied for their potential to offer accessible support to those experiencing mental health challenges. Previous surveys on the topic primarily consider papers published in either computer science or medicine, leading to a divide in understanding and hindering the sharing of beneficial knowledge between both domains. To bridge this g… ▽ More Mental health conversational agents (a.k.a. chatbots) are widely studied for their potential to offer accessible support to those experiencing mental health challenges. Previous surveys on the topic primarily consider papers published in either computer science or medicine, leading to a divide in understanding and hindering the sharing of beneficial knowledge between both domains. To bridge this gap, we conduct a comprehensive literature review using the PRISMA framework, reviewing 534 papers published in both computer science and medicine. Our systematic review reveals 136 key papers on building mental health-related conversational agents with diverse characteristics of modeling and experimental design techniques. We find that computer science papers focus on LLM techniques and evaluating response quality using automated metrics with little attention to the application while medical papers use rule-based conversational agents and outcome metrics to measure the health outcomes of participants. Based on our findings on transparency, ethics, and cultural heterogeneity in this review, we provide a few recommendations to help bridge the disciplinary divide and enable the cross-disciplinary development of mental health conversational agents. △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted in EMNLP 2023 Main Conference, camera ready

arXiv:2308.15352 [pdf]

Historical patterns of rice farming explain modern-day language use in China and Japan more than modernization and urbanization

Authors: Sharath Chandra Guntuku, Thomas Talhelm, Garrick Sherman, Angel Fan, Salvatore Giorgi, Liuqing Wei, Lyle H. Ungar

Abstract: We used natural language processing to analyze a billion words to study cultural differences on Weibo, one of China's largest social media platforms. We compared predictions from two common explanations about cultural differences in China (economic development and urban-rural differences) against the less-obvious legacy of rice versus wheat farming. Rice farmers had to coordinate shared irrigation… ▽ More We used natural language processing to analyze a billion words to study cultural differences on Weibo, one of China's largest social media platforms. We compared predictions from two common explanations about cultural differences in China (economic development and urban-rural differences) against the less-obvious legacy of rice versus wheat farming. Rice farmers had to coordinate shared irrigation networks and exchange labor to cope with higher labor requirements. In contrast, wheat relied on rainfall and required half as much labor. We test whether this legacy made southern China more interdependent. Across all word categories, rice explained twice as much variance as economic development and urbanization. Rice areas used more words reflecting tight social ties, holistic thought, and a cautious, prevention orientation. We then used Twitter data comparing prefectures in Japan, which largely replicated the results from China. This provides crucial evidence of the rice theory in a different nation, language, and platform. △ Less

Submitted 29 August, 2023; originally announced August 2023.

Comments: Includes Supplemental Materials

arXiv:2307.01370 [pdf, other]

Multilingual Language Models are not Multicultural: A Case Study in Emotion

Authors: Shreya Havaldar, Sunny Rai, Bhumika Singhal, Langchen Liu, Sharath Chandra Guntuku, Lyle Ungar

Abstract: Emotions are experienced and expressed differently across the world. In order to use Large Language Models (LMs) for multilingual tasks that require emotional sensitivity, LMs must reflect this cultural variation in emotion. In this study, we investigate whether the widely-used multilingual LMs in 2023 reflect differences in emotional expressions across cultures and languages. We find that embeddi… ▽ More Emotions are experienced and expressed differently across the world. In order to use Large Language Models (LMs) for multilingual tasks that require emotional sensitivity, LMs must reflect this cultural variation in emotion. In this study, we investigate whether the widely-used multilingual LMs in 2023 reflect differences in emotional expressions across cultures and languages. We find that embeddings obtained from LMs (e.g., XLM-RoBERTa) are Anglocentric, and generative LMs (e.g., ChatGPT) reflect Western norms, even when responding to prompts in other languages. Our results show that multilingual LMs do not successfully learn the culturally appropriate nuances of emotion and we highlight possible research directions towards correcting this. △ Less

Submitted 9 July, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: Accepted to WASSA at ACL 2023

arXiv:2305.18164 [pdf, other]

Generative Adversarial Networks based Skin Lesion Segmentation

Authors: Shubham Innani, Prasad Dutande, Ujjwal Baid, Venu Pokuri, Spyridon Bakas, Sanjay Talbar, Bhakti Baheti, Sharath Chandra Guntuku

Abstract: Skin cancer is a serious condition that requires accurate diagnosis and treatment. One way to assist clinicians in this task is using computer-aided diagnosis (CAD) tools that automatically segment skin lesions from dermoscopic images. We propose a novel adversarial learning-based framework called Efficient-GAN (EGAN) that uses an unsupervised generative network to generate accurate lesion masks.… ▽ More Skin cancer is a serious condition that requires accurate diagnosis and treatment. One way to assist clinicians in this task is using computer-aided diagnosis (CAD) tools that automatically segment skin lesions from dermoscopic images. We propose a novel adversarial learning-based framework called Efficient-GAN (EGAN) that uses an unsupervised generative network to generate accurate lesion masks. It consists of a generator module with a top-down squeeze excitation-based compound scaled path, an asymmetric lateral connection-based bottom-up path, and a discriminator module that distinguishes between original and synthetic masks. A morphology-based smoothing loss is also implemented to encourage the network to create smooth semantic boundaries of lesions. The framework is evaluated on the International Skin Imaging Collaboration (ISIC) Lesion Dataset 2018. It outperforms the current state-of-the-art skin lesion segmentation approaches with a Dice coefficient, Jaccard similarity, and Accuracy of 90.1%, 83.6%, and 94.5%, respectively. We also design a lightweight segmentation framework (MGAN) that achieves comparable performance as EGAN but with an order of magnitude lower number of training parameters, thus resulting in faster inference times for low compute resource settings. △ Less

Submitted 31 July, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: Accepted in Nature Scientific Reports

arXiv:2302.00669 [pdf, ps, other]

Detecting Histologic & Clinical Glioblastoma Patterns of Prognostic Relevance

Authors: Bhakti Baheti, Sunny Rai, Shubham Innani, Garv Mehdiratta, Sharath Chandra Guntuku, MacLean P. Nasrallah, Spyridon Bakas

Abstract: Glioblastoma is the most common and aggressive malignant adult tumor of the central nervous system, with a grim prognosis and heterogeneous morphologic and molecular profiles. Since adopting the current standard-of-care treatment 18 years ago, no substantial prognostic improvement has been noticed. Accurate prediction of patient overall survival (OS) from histopathology whole slide images (WSI) in… ▽ More Glioblastoma is the most common and aggressive malignant adult tumor of the central nervous system, with a grim prognosis and heterogeneous morphologic and molecular profiles. Since adopting the current standard-of-care treatment 18 years ago, no substantial prognostic improvement has been noticed. Accurate prediction of patient overall survival (OS) from histopathology whole slide images (WSI) integrated with clinical data using advanced computational methods could optimize clinical decision-making and patient management. Here, we focus on identifying prognostically relevant glioblastoma characteristics from H&E stained WSI & clinical data relating to OS. The exact approach for WSI capitalizes on the comprehensive curation of apparent artifactual content and an interpretability mechanism via a weakly supervised attention-based multiple-instance learning algorithm that further utilizes clustering to constrain the search space. The automatically placed patterns of high diagnostic value classify each WSI as representative of short or long-survivors. Further assessment of the prognostic relevance of the associated clinical patient data is performed both in isolation and in an integrated manner, using XGBoost and SHapley Additive exPlanations (SHAP). Identifying tumor morphological & clinical patterns associated with short and long OS will enable the clinical neuropathologist to provide additional relevant prognostic information to the treating team and suggest avenues of biological investigation for understanding and potentially treating glioblastoma. △ Less

Submitted 15 May, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

arXiv:2205.09416 [pdf, other]

A Weakly-Supervised Iterative Graph-Based Approach to Retrieve COVID-19 Misinformation Topics

Authors: Harry Wang, Sharath Chandra Guntuku

Abstract: The COVID-19 pandemic has been accompanied by an `infodemic' -- of accurate and inaccurate health information across social media. Detecting misinformation amidst dynamically changing information landscape is challenging; identifying relevant keywords and posts is arduous due to the large amount of human effort required to inspect the content and sources of posts. We aim to reduce the resource cos… ▽ More The COVID-19 pandemic has been accompanied by an `infodemic' -- of accurate and inaccurate health information across social media. Detecting misinformation amidst dynamically changing information landscape is challenging; identifying relevant keywords and posts is arduous due to the large amount of human effort required to inspect the content and sources of posts. We aim to reduce the resource cost of this process by introducing a weakly-supervised iterative graph-based approach to detect keywords, topics, and themes related to misinformation, with a focus on COVID-19. Our approach can successfully detect specific topics from general misinformation-related seed words in a few seed texts. Our approach utilizes the BERT-based Word Graph Search (BWGS) algorithm that builds on context-based neural network embeddings for retrieving misinformation-related posts. We utilize Latent Dirichlet Allocation (LDA) topic modeling for obtaining misinformation-related themes from the texts returned by BWGS. Furthermore, we propose the BERT-based Multi-directional Word Graph Search (BMDWGS) algorithm that utilizes greater starting context information for misinformation extraction. In addition to a qualitative analysis of our approach, our quantitative analyses show that BWGS and BMDWGS are effective in extracting misinformation-related content compared to common baselines in low data resource settings. Extracting such content is useful for uncovering prevalent misconceptions and concerns and for facilitating precision public health messaging campaigns to improve health behaviors. △ Less

Submitted 19 May, 2022; originally announced May 2022.

Comments: accepted at CySoc2022

arXiv:2202.01802 [pdf, other]

Different Affordances on Facebook and SMS Text Messaging Do Not Impede Generalization of Language-Based Predictive Models

Authors: Tingting Liu, Salvatore Giorgi, Xiangyu Tao, Sharath Chandra Guntuku, Douglas Bellew, Brenda Curtis, Lyle Ungar

Abstract: Adaptive mobile device-based health interventions often use machine learning models trained on non-mobile device data, such as social media text, due to the difficulty and high expense of collecting large text message (SMS) data. Therefore, understanding the differences and generalization of models between these platforms is crucial for proper deployment. We examined the psycho-linguistic differen… ▽ More Adaptive mobile device-based health interventions often use machine learning models trained on non-mobile device data, such as social media text, due to the difficulty and high expense of collecting large text message (SMS) data. Therefore, understanding the differences and generalization of models between these platforms is crucial for proper deployment. We examined the psycho-linguistic differences between Facebook and text messages, and their impact on out-of-domain model performance, using a sample of 120 users who shared both. We found that users use Facebook for sharing experiences (e.g., leisure) and SMS for task-oriented and conversational purposes (e.g., plan confirmations), reflecting the differences in the affordances. To examine the downstream effects of these differences, we used pre-trained Facebook-based language models to estimate age, gender, depression, life satisfaction, and stress on both Facebook and SMS. We found no significant differences in correlations between the estimates and self-reports across 6 of 8 models. These results suggest using pre-trained Facebook language models to achieve better accuracy with just-in-time interventions. △ Less

Submitted 23 May, 2023; v1 submitted 3 February, 2022; originally announced February 2022.

Comments: Accepted to the 17th International AAAI Conference on Web and Social Media (ICWSM), 2023

arXiv:2110.15726 [pdf, other]

Social Media Reveals Urban-Rural Differences in Stress across China

Authors: Jesse Cui, Tingdan Zhang, Kokil Jaidka, Dandan Pang, Garrick Sherman, Vinit Jakhetiya, Lyle Ungar, Sharath Chandra Guntuku

Abstract: Modeling differential stress expressions in urban and rural regions in China can provide a better understanding of the effects of urbanization on psychological well-being in a country that has rapidly grown economically in the last two decades. This paper studies linguistic differences in the experiences and expressions of stress in urban-rural China from Weibo posts from over 65,000 users across… ▽ More Modeling differential stress expressions in urban and rural regions in China can provide a better understanding of the effects of urbanization on psychological well-being in a country that has rapidly grown economically in the last two decades. This paper studies linguistic differences in the experiences and expressions of stress in urban-rural China from Weibo posts from over 65,000 users across 329 counties using hierarchical mixed-effects models. We analyzed phrases, topical themes, and psycho-linguistic word choices in Weibo posts mentioning stress to better understand appraisal differences surrounding psychological stress in urban and rural communities in China; we then compared them with large-scale polls from Gallup. After controlling for socioeconomic and gender differences, we found that rural communities tend to express stress in emotional and personal themes such as relationships, health, and opportunity while users in urban areas express stress using relative, temporal, and external themes such as work, politics, and economics. These differences exist beyond controlling for GDP and urbanization, indicating a fundamentally different lifestyle between rural and urban residents in very specific environments, arguably having different sources of stress. We found corroborative trends in physical, financial, and social wellness with urbanization in Gallup polls. △ Less

Submitted 3 November, 2021; v1 submitted 19 October, 2021; originally announced October 2021.

Comments: Accepted at AAAI Conference on Web and Social Media (ICWSM) 2022

arXiv:2104.05121 [pdf, other]

Detecting COVID-19 and Community Acquired Pneumonia using Chest CT scan images with Deep Learning

Authors: Shubham Chaudhary, Sadbhawna, Vinit Jakhetiya, Badri N Subudhi, Ujjwal Baid, Sharath Chandra Guntuku

Abstract: We propose a two-stage Convolutional Neural Network (CNN) based classification framework for detecting COVID-19 and Community-Acquired Pneumonia (CAP) using the chest Computed Tomography (CT) scan images. In the first stage, an infection - COVID-19 or CAP, is detected using a pre-trained DenseNet architecture. Then, in the second stage, a fine-grained three-way classification is done using Efficie… ▽ More We propose a two-stage Convolutional Neural Network (CNN) based classification framework for detecting COVID-19 and Community-Acquired Pneumonia (CAP) using the chest Computed Tomography (CT) scan images. In the first stage, an infection - COVID-19 or CAP, is detected using a pre-trained DenseNet architecture. Then, in the second stage, a fine-grained three-way classification is done using EfficientNet architecture. The proposed COVID+CAP-CNN framework achieved a slice-level classification accuracy of over 94% at identifying COVID-19 and CAP. Further, the proposed framework has the potential to be an initial screening tool for differential diagnosis of COVID-19 and CAP, achieving a validation accuracy of over 89.3% at the finer three-way COVID-19, CAP, and healthy classification. Within the IEEE ICASSP 2021 Signal Processing Grand Challenge (SPGC) on COVID-19 Diagnosis, our proposed two-stage classification framework achieved an overall accuracy of 90% and sensitivity of .857, .9, and .942 at distinguishing COVID-19, CAP, and normal individuals respectively, to rank first in the evaluation. Code and model weights are available at https://github.com/shubhamchaudhary2015/ct_covid19_cap_cnn △ Less

Submitted 11 April, 2021; originally announced April 2021.

Comments: Top Ranked Model Paper at the ICASSP 2021 COVID-19 Grand Challenge

arXiv:2011.03983 [pdf, other]

Detecting Emerging Symptoms of COVID-19 using Context-based Twitter Embeddings

Authors: Roshan Santosh, H. Andrew Schwartz, Johannes C. Eichstaedt, Lyle H. Ungar, Sharath C. Guntuku

Abstract: In this paper, we present an iterative graph-based approach for the detection of symptoms of COVID-19, the pathology of which seems to be evolving. More generally, the method can be applied to finding context-specific words and texts (e.g. symptom mentions) in large imbalanced corpora (e.g. all tweets mentioning #COVID-19). Given the novelty of COVID-19, we also test if the proposed approach gener… ▽ More In this paper, we present an iterative graph-based approach for the detection of symptoms of COVID-19, the pathology of which seems to be evolving. More generally, the method can be applied to finding context-specific words and texts (e.g. symptom mentions) in large imbalanced corpora (e.g. all tweets mentioning #COVID-19). Given the novelty of COVID-19, we also test if the proposed approach generalizes to the problem of detecting Adverse Drug Reaction (ADR). We find that the approach applied to Twitter data can detect symptom mentions substantially before being reported by the Centers for Disease Control (CDC). △ Less

Submitted 8 November, 2020; originally announced November 2020.

Comments: In proceedings of EMNLP 2020 (Empirical Methods in NLP) workshop on COVID-19

arXiv:2009.00596 [pdf, other]

Twitter Corpus of the #BlackLivesMatter Movement And Counter Protests: 2013 to 2021

Authors: Salvatore Giorgi, Sharath Chandra Guntuku, McKenzie Himelein-Wachowiak, Amy Kwarteng, Sy Hwang, Muhammad Rahman, Brenda Curtis

Abstract: Black Lives Matter (BLM) is a decentralized social movement protesting violence against Black individuals and communities, with a focus on police brutality. The movement gained significant attention following the killings of Ahmaud Arbery, Breonna Taylor, and George Floyd in 2020. The #BlackLivesMatter social media hashtag has come to represent the grassroots movement, with similar hashtags counte… ▽ More Black Lives Matter (BLM) is a decentralized social movement protesting violence against Black individuals and communities, with a focus on police brutality. The movement gained significant attention following the killings of Ahmaud Arbery, Breonna Taylor, and George Floyd in 2020. The #BlackLivesMatter social media hashtag has come to represent the grassroots movement, with similar hashtags counter protesting the BLM movement, such as #AllLivesMatter, and #BlueLivesMatter. We introduce a data set of 63.9 million tweets from 13.0 million users from over 100 countries which contain one of the following keywords: BlackLivesMatter, AllLivesMatter, and BlueLivesMatter. This data set contains all currently available tweets from the beginning of the BLM movement in 2013 to 2021. We summarize the data set and show temporal trends in use of both the BlackLivesMatter keyword and keywords associated with counter movements. Additionally, for each keyword, we create and release a set of Latent Dirichlet Allocation (LDA) topics (i.e., automatically clustered groups of semantically co-occuring words) to aid researchers in identifying linguistic patterns across the three keywords. △ Less

Submitted 7 June, 2022; v1 submitted 1 September, 2020; originally announced September 2020.

Comments: Published at the 16th International AAAI Conference on Web and Social Media (ICWSM) 2022

arXiv:2008.02449 [pdf, other]

Studying Politeness across Cultures Using English Twitter and Mandarin Weibo

Authors: Mingyang Li, Louis Hickman, Louis Tay, Lyle Ungar, Sharath Chandra Guntuku

Abstract: Modeling politeness across cultures helps to improve intercultural communication by uncovering what is considered appropriate and polite. We study the linguistic features associated with politeness across US English and Mandarin Chinese. First, we annotate 5,300 Twitter posts from the US and 5,300 Sina Weibo posts from China for politeness scores. Next, we develop an English and Chinese politeness… ▽ More Modeling politeness across cultures helps to improve intercultural communication by uncovering what is considered appropriate and polite. We study the linguistic features associated with politeness across US English and Mandarin Chinese. First, we annotate 5,300 Twitter posts from the US and 5,300 Sina Weibo posts from China for politeness scores. Next, we develop an English and Chinese politeness feature set, `PoliteLex'. Combining it with validated psycholinguistic dictionaries, we then study the correlations between linguistic features and perceived politeness across cultures. We find that on Mandarin Weibo, future-focusing conversations, identifying with a group affiliation, and gratitude are considered to be more polite than on English Twitter. Death-related taboo topics, lack of or poor choice of pronouns, and informal language are associated with higher impoliteness on Mandarin Weibo compared to English Twitter. Finally, we build language-based machine learning models to predict politeness with an F1 score of 0.886 on Mandarin Weibo and a 0.774 on English Twitter. △ Less

Submitted 24 August, 2020; v1 submitted 6 August, 2020; originally announced August 2020.

Comments: Accepted for CSCW 2020. To be published in PACM HCI

arXiv:1904.02671 [pdf, other]

Studying Cultural Differences in Emoji Usage across the East and the West

Authors: Sharath Chandra Guntuku, Mingyang Li, Louis Tay, Lyle H. Ungar

Abstract: Global acceptance of Emojis suggests a cross-cultural, normative use of Emojis. Meanwhile, nuances in Emoji use across cultures may also exist due to linguistic differences in expressing emotions and diversity in conceptualizing topics. Indeed, literature in cross-cultural psychology has found both normative and culture-specific ways in which emotions are expressed. In this paper, using social med… ▽ More Global acceptance of Emojis suggests a cross-cultural, normative use of Emojis. Meanwhile, nuances in Emoji use across cultures may also exist due to linguistic differences in expressing emotions and diversity in conceptualizing topics. Indeed, literature in cross-cultural psychology has found both normative and culture-specific ways in which emotions are expressed. In this paper, using social media, we compare the Emoji usage based on frequency, context, and topic associations across countries in the East (China and Japan) and the West (United States, United Kingdom, and Canada). Across the East and the West, our study examines a) similarities and differences on the usage of different categories of Emojis such as People, Food \& Drink, Travel \& Places etc., b) potential mapping of Emoji use differences with previously identified cultural differences in users' expression about diverse concepts such as death, money emotions and family, and c) relative correspondence of validated psycho-linguistic categories with Ekman's emotions. The analysis of Emoji use in the East and the West reveals recognizable normative and culture specific patterns. This research reveals the ways in which Emojis can be used for cross-cultural communication. △ Less

Submitted 4 April, 2019; originally announced April 2019.

Comments: ICWSM 2019

arXiv:1904.02670 [pdf, other]

What Twitter Profile and Posted Images Reveal About Depression and Anxiety

Authors: Sharath Chandra Guntuku, Daniel Preotiuc-Pietro, Johannes C. Eichstaedt, Lyle H. Ungar

Abstract: Previous work has found strong links between the choice of social media images and users' emotions, demographics and personality traits. In this study, we examine which attributes of profile and posted images are associated with depression and anxiety of Twitter users. We used a sample of 28,749 Facebook users to build a language prediction model of survey-reported depression and anxiety, and vali… ▽ More Previous work has found strong links between the choice of social media images and users' emotions, demographics and personality traits. In this study, we examine which attributes of profile and posted images are associated with depression and anxiety of Twitter users. We used a sample of 28,749 Facebook users to build a language prediction model of survey-reported depression and anxiety, and validated it on Twitter on a sample of 887 users who had taken anxiety and depression surveys. We then applied it to a different set of 4,132 Twitter users to impute language-based depression and anxiety labels, and extracted interpretable features of posted and profile pictures to uncover the associations with users' depression and anxiety, controlling for demographics. For depression, we find that profile pictures suppress positive emotions rather than display more negative emotions, likely because of social media self-presentation biases. They also tend to show the single face of the user (rather than show her in groups of friends), marking increased focus on the self, emblematic for depression. Posted images are dominated by grayscale and low aesthetic cohesion across a variety of image features. Profile images of anxious users are similarly marked by grayscale and low aesthetic cohesion, but less so than those of depressed users. Finally, we show that image features can be used to predict depression and anxiety, and that multitask learning that includes a joint modeling of demographics improves prediction performance. Overall, we find that the image attributes that mark depression and anxiety offer a rich lens into these conditions largely congruent with the psychological literature, and that images on Twitter allow inferences about the mental health status of users. △ Less

Submitted 4 April, 2019; originally announced April 2019.

Comments: ICWSM 2019

arXiv:1811.07430 [pdf, other]

Understanding and Measuring Psychological Stress using Social Media

Authors: Sharath Chandra Guntuku, Anneke Buffone, Kokil Jaidka, Johannes Eichstaedt, Lyle Ungar

Abstract: A body of literature has demonstrated that users' mental health conditions, such as depression and anxiety, can be predicted from their social media language. There is still a gap in the scientific understanding of how psychological stress is expressed on social media. Stress is one of the primary underlying causes and correlates of chronic physical illnesses and mental health conditions. In this… ▽ More A body of literature has demonstrated that users' mental health conditions, such as depression and anxiety, can be predicted from their social media language. There is still a gap in the scientific understanding of how psychological stress is expressed on social media. Stress is one of the primary underlying causes and correlates of chronic physical illnesses and mental health conditions. In this paper, we explore the language of psychological stress with a dataset of 601 social media users, who answered the Perceived Stress Scale questionnaire and also consented to share their Facebook and Twitter data. Firstly, we find that stressed users post about exhaustion, losing control, increased self-focus and physical pain as compared to posts about breakfast, family-time, and travel by users who are not stressed. Secondly, we find that Facebook language is more predictive of stress than Twitter language. Thirdly, we demonstrate how the language based models thus developed can be adapted and be scaled to measure county-level trends. Since county-level language is easily available on Twitter using the Streaming API, we explore multiple domain adaptation algorithms to adapt user-level Facebook models to Twitter language. We find that domain-adapted and scaled social media-based measurements of stress outperform sociodemographic variables (age, gender, race, education, and income), against ground-truth survey-based stress measurements, both at the user- and the county-level in the U.S. Twitter language that scores higher in stress is also predictive of poorer health, less access to facilities and lower socioeconomic status in counties. We conclude with a discussion of the implications of using social media as a new tool for monitoring stress levels of both individuals and counties. △ Less

Submitted 4 April, 2019; v1 submitted 18 November, 2018; originally announced November 2018.

Comments: Accepted for publication in the proceedings of ICWSM 2019

arXiv:1606.06873 [pdf, other]

Personality, Culture, and System Factors - Impact on Affective Response to Multimedia

Authors: Sharath Chandra Guntuku, Michael James Scott, Gheorghita Ghinea, Weisi Lin

Abstract: Whilst affective responses to various forms and genres of multimedia content have been well researched, precious few studies have investigated the combined impact that multimedia system parameters and human factors have on affect. Consequently, in this paper we explore the role that two primordial dimensions of human factors - personality and culture - in conjunction with system factors - frame ra… ▽ More Whilst affective responses to various forms and genres of multimedia content have been well researched, precious few studies have investigated the combined impact that multimedia system parameters and human factors have on affect. Consequently, in this paper we explore the role that two primordial dimensions of human factors - personality and culture - in conjunction with system factors - frame rate, resolution, and bit rate - have on user affect and enjoyment of multimedia presentations. To this end, a two-site, cross-cultural study was undertaken, the results of which produced three predictve models. Personality and Culture traits were shown statistically to represent 5.6% of the variance in positive affect, 13.6% in negative affect and 9.3% in enjoyment. The correlation between affect and enjoyment, was significant. Predictive modeling incorporating human factors showed about 8%, 7% and 9% improvement in predicting positive affect, negative affect and enjoyment respectively when compared to models trained only on system factors. Results and analysis indicate the significant role played by human factors in influencing affect that users experience while watching multimedia. △ Less

Submitted 18 July, 2016; v1 submitted 22 June, 2016; originally announced June 2016.

arXiv:1307.7464 [pdf]

Real-time Peer-to-Peer Botnet Detection Framework based on Bayesian Regularized Neural Network

Authors: Sharath Chandra Guntuku, Pratik Narang, Chittaranjan Hota

Abstract: Over the past decade, the Cyberspace has seen an increasing number of attacks coming from botnets using the Peer-to-Peer (P2P) architecture. Peer-to-Peer botnets use a decentralized Command & Control architecture. Moreover, a large number of such botnets already exist, and newer versions- which significantly differ from their parent bot- are also discovered practically every year. In this work, th… ▽ More Over the past decade, the Cyberspace has seen an increasing number of attacks coming from botnets using the Peer-to-Peer (P2P) architecture. Peer-to-Peer botnets use a decentralized Command & Control architecture. Moreover, a large number of such botnets already exist, and newer versions- which significantly differ from their parent bot- are also discovered practically every year. In this work, the authors propose and implement a novel hybrid framework for detecting P2P botnets in live network traffic by integrating Neural Networks with Bayesian Regularization. Bayesian Regularization helps in achieving better generalization of the dataset, thereby enabling the detection of botnet activity even of those bots which were never used in training the Neural Network. Hence such a framework is suitable for detection of newer and unseen botnets in live traffic of a network. This was verified by testing the Framework on test data unseen to the Detection module (using untrained botnet dataset), and the authors were successful in detecting this activity with an accuracy of 99.2 %. △ Less

Submitted 29 July, 2013; originally announced July 2013.

Showing 1–21 of 21 results for author: Guntuku, S C