
Showing 1–49 of 49 results for author: Jurgens, D

Searching in archive cs. Search in all archives.
  1. arXiv:2407.02472  [pdf, other]

    cs.CL

    ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions

    Authors: Chan Young Park, Shuyue Stella Li, Hayoung Jung, Svitlana Volkova, Tanushree Mitra, David Jurgens, Yulia Tsvetkov

    Abstract: This study introduces ValueScope, a framework leveraging language models to quantify social norms and values within online communities, grounded in social science perspectives on normative structures. We employ ValueScope to dissect and analyze linguistic and stylistic expressions across 13 Reddit communities categorized under gender, politics, science, and finance. Our analysis provides a quantit…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: First three authors contributed equally. 33 pages. In submission

  2. arXiv:2406.09264  [pdf, other]

    cs.HC cs.AI cs.CL

    Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

    Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

    Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve th…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 56 pages

  3. arXiv:2405.13272  [pdf, other]

    cs.CL cs.CY

    A Multilingual Similarity Dataset for News Article Frame

    Authors: Xi Chen, Mattia Samory, Scott Hale, David Jurgens, Przemyslaw A. Grabowicz

    Abstract: Understanding the writing frame of news articles is vital for addressing social issues, and thus has attracted notable attention in the fields of communication studies. Yet, assessing such news article frames remains a challenge due to the absence of a concrete and unified standard dataset that considers the comprehensive nuances within news content. To address this gap, we introduce an extended…

    Submitted 21 May, 2024; originally announced May 2024.

  4. arXiv:2405.02411  [pdf, other]

    cs.CL

    The Call for Socially Aware Language Technologies

    Authors: Diyi Yang, Dirk Hovy, David Jurgens, Barbara Plank

    Abstract: Language technologies have made enormous progress, especially with the introduction of large language models (LLMs). On traditional tasks such as machine translation and sentiment analysis, these models perform at near-human level. These advances can, however, exacerbate a variety of issues that models have traditionally struggled with, such as bias, evaluation, and risks. In this position paper,…

    Submitted 3 May, 2024; originally announced May 2024.

  5. arXiv:2405.00948  [pdf, other]

    cs.CL

    Modeling Empathetic Alignment in Conversation

    Authors: Jiamin Yang, David Jurgens

    Abstract: Empathy requires perspective-taking: empathetic responses require a person to reason about what another has experienced and communicate that understanding in language. However, most NLP approaches to empathy do not explicitly model this alignment process. Here, we introduce a new approach to recognizing alignment in empathetic speech, grounded in Appraisal Theory. We introduce a new dataset of ove…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Camera-ready version for NAACL 2024

  6. arXiv:2405.00280  [pdf, other]

    cs.SI cs.CY cs.IR

    Global News Synchrony and Diversity During the Start of the COVID-19 Pandemic

    Authors: Xi Chen, Scott A. Hale, David Jurgens, Mattia Samory, Ethan Zuckerman, Przemyslaw A. Grabowicz

    Abstract: News coverage profoundly affects how countries and individuals behave in international relations. Yet, we have little empirical evidence of how news coverage varies across countries. To enable studies of global news coverage, we develop an efficient computational methodology that comprises three components: (i) a transformer model to estimate multilingual news similarity; (ii) a global event ident…

    Submitted 30 April, 2024; originally announced May 2024.

  7. arXiv:2312.02118  [pdf, other]

    cs.CL cs.CY cs.SI

    When it Rains, it Pours: Modeling Media Storms and the News Ecosystem

    Authors: Benjamin Litterer, David Jurgens, Dallas Card

    Abstract: Most events in the world receive at most brief coverage by the news media. Occasionally, however, an event will trigger a media storm, with voluminous and widespread coverage lasting for weeks instead of days. In this work, we develop and apply a pairwise article similarity model, allowing us to identify story clusters in corpora covering local and national online news, and thereby create a compre…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Findings of EMNLP 2023; 16 pages; 12 figures; 4 tables

  8. arXiv:2311.10054  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC cs.LG

    Is "A Helpful Assistant" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts

    Authors: Mingqian Zheng, Jiaxin Pei, David Jurgens

    Abstract: Prompting serves as the major way humans interact with Large Language Models (LLM). Commercial AI systems commonly define the role of the LLM in system prompts. For example, ChatGPT uses "You are a helpful assistant" as part of the default system prompt. But is "a helpful assistant" the best role for LLMs? In this study, we present a systematic evaluation of how social roles in system prompts affe…

    Submitted 16 November, 2023; originally announced November 2023.

  9. arXiv:2311.09730  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC cs.LG

    Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks

    Authors: Huaman Sun, Jiaxin Pei, Minje Choi, David Jurgens

    Abstract: Human perception of language depends on personal backgrounds like gender and ethnicity. While existing studies have shown that large language models (LLMs) hold values that are closer to certain societal groups, it is unclear whether their prediction behaviors on subjective NLP tasks also exhibit a similar bias. In this study, leveraging the POPQUORN dataset which contains annotations of diverse d…

    Submitted 16 November, 2023; originally announced November 2023.

  10. arXiv:2311.09718  [pdf, other]

    cs.CL cs.AI

    You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments

    Authors: Bangzhao Shu, Lechen Zhang, Minje Choi, Lavinia Dunagan, Lajanugen Logeswaran, Moontae Lee, Dallas Card, David Jurgens

    Abstract: The versatility of Large Language Models (LLMs) on natural language understanding tasks has made them popular for research in social sciences. To properly understand the properties and innate personas of LLMs, researchers have performed studies that involve using prompts in the form of questions that ask LLMs about particular opinions. In this study, we take a cautionary step back and examine whet…

    Submitted 1 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Camera-ready version for NAACL 2024. First two authors contributed equally

  11. arXiv:2311.09130  [pdf, other]

    cs.CL

    Social Meme-ing: Measuring Linguistic Variation in Memes

    Authors: Naitian Zhou, David Jurgens, David Bamman

    Abstract: Much work in the space of NLP has used computational methods to explore sociolinguistic variation in text. In this paper, we argue that memes, as multimodal forms of language comprised of visual templates and text, also exhibit meaningful social variation. We construct a computational pipeline to cluster individual instances of memes into templates and semantic variables, taking advantage of their…

    Submitted 15 November, 2023; originally announced November 2023.

  12. arXiv:2308.09270  [pdf, other]

    cs.SI

    Profile Update: The Effects of Identity Disclosure on Network Connections and Language

    Authors: Minje Choi, Daniel M. Romero, David Jurgens

    Abstract: Our social identities determine how we interact and engage with the world surrounding us. In online settings, individuals can make these identities explicit by including them in their public biography, possibly signaling a change to what is important to them and how they should be viewed. Here, we perform the first large-scale study on Twitter that examines behavioral changes following identity si…

    Submitted 17 August, 2023; originally announced August 2023.

  13. arXiv:2307.15176  [pdf, other]

    cs.AI cs.CL cs.LG stat.ME

    RCT Rejection Sampling for Causal Estimation Evaluation

    Authors: Katherine A. Keith, Sergey Feldman, David Jurgens, Jonathan Bragg, Rohit Bhattacharya

    Abstract: Confounding is a significant obstacle to unbiased estimation of causal effects from observational data. For settings with high-dimensional covariates -- such as text data, genomics, or the behavioral social sciences -- researchers have proposed methods to adjust for confounding by adapting machine learning methods to the goal of causal estimation. However, empirical evaluation of these adjustment…

    Submitted 31 January, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: Code and data at https://github.com/kakeith/rct_rejection_sampling

    Journal ref: Transactions on Machine Learning Research (TMLR) 2023

  14. arXiv:2307.02763  [pdf, other]

    cs.CL cs.CY

    Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships

    Authors: David Jurgens, Agrima Seth, Jackson Sargent, Athena Aghighi, Michael Geraci

    Abstract: Understanding interpersonal communication requires, in part, understanding the social context and norms in which a message is said. However, current methods for identifying offensive content in such communication largely operate independent of context, with only a few approaches considering community norms or prior conversation as context. Here, we introduce a new approach to identifying inappropr…

    Submitted 6 July, 2023; originally announced July 2023.

    Comments: ACL 2023, 18 pages, 8 figures, 11 tables

  15. arXiv:2307.02758  [pdf, other]

    cs.CL

    Exploring Linguistic Style Matching in Online Communities: The Role of Social Context and Conversation Dynamics

    Authors: Aparna Ananthasubramaniam, Hong Chen, Jason Yan, Kenan Alkiek, Jiaxin Pei, Agrima Seth, Lavinia Dunagan, Minje Choi, Benjamin Litterer, David Jurgens

    Abstract: Linguistic style matching (LSM) in conversations can be reflective of several aspects of social influence such as power or persuasion. However, how LSM relates to the outcomes of online communication on platforms such as Reddit is an unknown question. In this study, we analyze a large corpus of two-party conversation threads in Reddit where we identify all occurrences of LSM using two types of sty…

    Submitted 26 August, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: Equal contributions from authors 1-9 (AA, HC, JY, KA, JP, AS, LD, MC, BL)

  16. arXiv:2306.06826  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC cs.LG

    When Do Annotator Demographics Matter? Measuring the Influence of Annotator Demographics with the POPQUORN Dataset

    Authors: Jiaxin Pei, David Jurgens

    Abstract: Annotators are not fungible. Their demographics, life experiences, and backgrounds all contribute to how they label data. However, NLP has only recently considered how annotator identity might influence their decisions. Here, we present POPQUORN (the POtato-Prolific dataset for QUestion-Answering, Offensiveness, text Rewriting, and politeness rating with demographic Nuance). POPQUORN contains 45,0…

    Submitted 28 August, 2023; v1 submitted 11 June, 2023; originally announced June 2023.

  17. Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark

    Authors: Minje Choi, Jiaxin Pei, Sagar Kumar, Chang Shu, David Jurgens

    Abstract: Large language models (LLMs) have been shown to perform well at a variety of syntactic, discourse, and reasoning tasks. While LLMs are increasingly deployed in many forms including conversational agents that interact with humans, we lack a grounded benchmark to measure how well LLMs understand social language. Here, we introduce a new theory-driven benchmark, SocKET, that contains 58 NLP…

    Submitted 7 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Camera-ready version for EMNLP'23. First two authors contributed equally

  18. arXiv:2304.03797  [pdf, other]

    cs.SI cs.CL

    Bridging Nations: Quantifying the Role of Multilinguals in Communication on Social Media

    Authors: Julia Mendelsohn, Sayan Ghosh, David Jurgens, Ceren Budak

    Abstract: Social media enables the rapid spread of many kinds of information, from memes to social movements. However, little is known about how information crosses linguistic boundaries. We apply causal inference techniques on the European Twitter network to quantify multilingual users' structural role and communication influence in cross-lingual information exchange. Overall, multilinguals play an essenti…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: ICWSM 2023 (please cite accordingly); see https://github.com/juliamendelsohn/bridging-nations for data, models, and code

  19. Analyzing the Engagement of Social Relationships During Life Event Shocks in Social Media

    Authors: Minje Choi, David Jurgens, Daniel M. Romero

    Abstract: Individuals experiencing unexpected distressing events, shocks, often rely on their social network for support. While prior work has shown how social networks respond to shocks, these studies usually treat all ties equally, despite differences in the support provided by different social relationships. Here, we conduct a computational analysis on Twitter that examines how responses to online shocks…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: Accepted to ICWSM 2023. 12 pages, 5 figures, 5 tables

  20. arXiv:2212.08620  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC cs.LG

    POTATO: The Portable Text Annotation Tool

    Authors: Jiaxin Pei, Aparna Ananthasubramaniam, Xingyao Wang, Naitian Zhou, Jackson Sargent, Apostolos Dedeloudis, David Jurgens

    Abstract: We present POTATO, the Portable text annotation tool, a free, fully open-sourced annotation system that 1) supports labeling many types of text and multimodal data; 2) offers easy-to-configure features to maximize the productivity of both deployers and annotators (convenient templates for common ML/NLP tasks, active learning, keypress shortcuts, keyword highlights, tooltips); and 3) supports a hig…

    Submitted 23 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: EMNLP 2022 DEMO

  21. arXiv:2210.16604  [pdf, other]

    cs.CL

    A Critical Reflection and Forward Perspective on Empathy and Natural Language Processing

    Authors: Allison Lahnala, Charles Welch, David Jurgens, Lucie Flek

    Abstract: We review the state of research on empathy in natural language processing and identify the following issues: (1) empathy definitions are absent or abstract, which (2) leads to low construct validity and reproducibility. Moreover, (3) emotional empathy is overemphasized, skewing our focus to a narrow subset of simplified tasks. We believe these issues hinder research progress and argue that current…

    Submitted 29 October, 2022; originally announced October 2022.

    Comments: To appear at Findings of EMNLP 2022

  22. arXiv:2210.13001  [pdf, other]

    cs.CL cs.CY cs.LG

    Modeling Information Change in Science Communication with Semantically Matched Paraphrases

    Authors: Dustin Wright, Jiaxin Pei, David Jurgens, Isabelle Augenstein

    Abstract: Whether the media faithfully communicate scientific information has long been a core issue to the science community. Automatically identifying paraphrased scientific findings could enable large-scale tracking and analysis of information changes in the science communication process, but this requires systems to understand the similarity between scientific information across multiple domains. To thi…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: In EMNLP 2022; 25 pages; 11 figures; 6 tables

  23. arXiv:2210.01108  [pdf, other]

    cs.CL cs.CY cs.LG

    SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis

    Authors: Jiaxin Pei, Vítor Silva, Maarten Bos, Yozon Liu, Leonardo Neves, David Jurgens, Francesco Barbieri

    Abstract: We propose MINT, a new Multilingual INTimacy analysis dataset covering 13,372 tweets in 10 languages including English, French, Spanish, Italian, Portuguese, Korean, Dutch, Chinese, Hindi, and Arabic. We benchmarked a list of popular multilingual pre-trained language models. The dataset is released along with the SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis (https://sites.google.com/u…

    Submitted 3 February, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis

  24. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  25. arXiv:2204.09885  [pdf, other]

    cs.CL

    An Attention-Based Model for Predicting Contextual Informativeness and Curriculum Learning Applications

    Authors: Sungjin Nam, David Jurgens, Gwen Frishkoff, Kevyn Collins-Thompson

    Abstract: Both humans and machines learn the meaning of unknown words through contextual information in a sentence, but not all contexts are equally helpful for learning. We introduce an effective method for capturing the level of contextual informativeness with respect to a given target word. Our study makes three main contributions. First, we develop models for estimating contextual informativeness, focus…

    Submitted 9 November, 2023; v1 submitted 21 April, 2022; originally announced April 2022.

  26. arXiv:2204.03067  [pdf, other]

    cs.CL

    ByT5 model for massively multilingual grapheme-to-phoneme conversion

    Authors: Jian Zhu, Cong Zhang, David Jurgens

    Abstract: In this study, we tackle massively multilingual grapheme-to-phoneme conversion through implementing G2P models based on ByT5. We have curated a G2P dataset from various sources that covers around 100 languages and trained large-scale multilingual G2P models based on ByT5. We found that ByT5 operating on byte-level inputs significantly outperformed the token-based mT5 model in terms of multilingual…

    Submitted 4 July, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: Interspeech 2022

  27. arXiv:2202.04842  [pdf, other]

    cs.SI cs.CL cs.CY physics.soc-ph

    Networks and Identity Drive Geographic Properties of the Diffusion of Linguistic Innovation

    Authors: Aparna Ananthasubramaniam, David Jurgens, Daniel M. Romero

    Abstract: Adoption of cultural innovation (e.g., music, beliefs, language) is often geographically correlated, with adopters largely residing within the boundaries of relatively few well-studied, socially significant areas. These cultural regions are often hypothesized to be the result of either (i) identity performance driving the adoption of cultural innovation, or (ii) homophily in the networks underlyin…

    Submitted 10 February, 2022; originally announced February 2022.

    ACM Class: J.4; I.6.3; K.4

  28. arXiv:2110.04419  [pdf, other]

    cs.CL

    Detecting Community Sensitive Norm Violations in Online Conversations

    Authors: Chan Young Park, Julia Mendelsohn, Karthik Radhakrishnan, Kinjal Jain, Tushar Kanakagiri, David Jurgens, Yulia Tsvetkov

    Abstract: Online platforms and communities establish their own norms that govern what behavior is acceptable within the community. Substantial effort in NLP has focused on identifying unacceptable behaviors and, recently, on forecasting them before they occur. However, these efforts have largely focused on toxicity as the sole form of community norm violation. Such focus has overlooked the much larger set o…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: Findings of EMNLP 2021

  29. arXiv:2110.03876  [pdf, other]

    cs.CL cs.SD eess.AS

    Phone-to-audio alignment without text: A Semi-supervised Approach

    Authors: Jian Zhu, Cong Zhang, David Jurgens

    Abstract: The task of phone-to-audio alignment has many applications in speech research. Here we introduce two Wav2Vec2-based models for both text-dependent and text-independent phone-to-audio alignment. The proposed Wav2Vec2-FS, a semi-supervised model, directly learns phone-to-audio alignment through contrastive learning and a forward sum loss, and can be coupled with a pretrained phone recognizer to achi…

    Submitted 3 February, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: ICASSP 2022

  30. arXiv:2109.14776  [pdf, other]

    cs.CL cs.CY cs.SI

    Measuring Sentence-Level and Aspect-Level (Un)certainty in Science Communications

    Authors: Jiaxin Pei, David Jurgens

    Abstract: Certainty and uncertainty are fundamental to science communication. Hedges have widely been used as proxies for uncertainty. However, certainty is a complex construct, with authors expressing not only the degree but the type and aspects of uncertainty in order to give the reader a certain impression of what is known. Here, we introduce a new study of certainty that models both the level and the as…

    Submitted 11 October, 2021; v1 submitted 29 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021 Main Conference

  31. arXiv:2109.12212  [pdf, other]

    cs.CL cs.CV cs.CY

    An animated picture says at least a thousand words: Selecting Gif-based Replies in Multimodal Dialog

    Authors: Xingyao Wang, David Jurgens

    Abstract: Online conversations include more than just text. Increasingly, image-based responses such as memes and animated gifs serve as culturally recognized and often humorous responses in conversation. However, while NLP has broadened to multimodal models, conversational dialog systems have largely focused only on generating text replies. Here, we introduce a new dataset of 1.56M text-gif conversation tu…

    Submitted 29 September, 2021; v1 submitted 24 September, 2021; originally announced September 2021.

    Comments: Findings of EMNLP 2021; 30 pages

  32. arXiv:2109.11061  [pdf, other]

    cs.CL cs.CY

    Using Sociolinguistic Variables to Reveal Changing Attitudes Towards Sexuality and Gender

    Authors: Sky CH-Wang, David Jurgens

    Abstract: Individuals signal aspects of their identity and beliefs through linguistic choices. Studying these choices in aggregate allows us to examine large-scale attitude shifts within a population. Here, we develop computational methods to study word choice within a sociolinguistic lexical variable -- alternate words used to express the same concept -- in order to test for change in the United States tow…

    Submitted 22 September, 2021; originally announced September 2021.

    Comments: Proceedings of EMNLP 2021

  33. arXiv:2109.03158  [pdf, other]

    cs.CL

    Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles

    Authors: Jian Zhu, David Jurgens

    Abstract: An individual's variation in writing style is often a function of both social and personal attributes. While structured social variation has been extensively studied, e.g., gender based variation, far less is known about how to characterize individual styles due to their idiosyncratic nature. We introduce a new approach to studying idiolects through a massive cross-author comparison to identify an…

    Submitted 10 September, 2021; v1 submitted 7 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021 main conference

  34. arXiv:2107.00414  [pdf, other]

    cs.CL

    MultiCite: Modeling realistic citations requires moving beyond the single-sentence single-label setting

    Authors: Anne Lauscher, Brandon Ko, Bailey Kuehl, Sophie Johnson, David Jurgens, Arman Cohan, Kyle Lo

    Abstract: Citation context analysis (CCA) is an important task in natural language processing that studies how and why scholars discuss each others' work. Despite decades of study, traditional frameworks for CCA have largely relied on overly-simplistic assumptions of how authors cite, which ignore several important phenomena. For instance, scholarly papers often contain rich discussions of cited work that s…

    Submitted 31 July, 2021; v1 submitted 1 July, 2021; originally announced July 2021.

  35. More than Meets the Tie: Examining the Role of Interpersonal Relationships in Social Networks

    Authors: Minje Choi, Ceren Budak, Daniel M. Romero, David Jurgens

    Abstract: Topics in conversations depend in part on the type of interpersonal relationship between speakers, such as friendship, kinship, or romance. Identifying these relationships can provide a rich description of how individuals communicate and reveal how relationships influence the way people share information. Using a dataset of more than 9.6M dyads of Twitter users, we show how relationship types infl…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: Accepted to ICWSM 2021

  36. arXiv:2104.06999  [pdf, other]

    cs.CL

    Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media

    Authors: Sayan Ghosh, Dylan Baker, David Jurgens, Vinodkumar Prabhakaran

    Abstract: Online social media platforms increasingly rely on Natural Language Processing (NLP) techniques to detect abusive content at scale in order to mitigate the harms it causes to their users. However, these techniques suffer from various sampling and association biases present in training data, often resulting in sub-par performance on content relevant to marginalized groups, potentially furthering di…

    Submitted 29 September, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: Proceedings of the 7th Workshop on Noisy User-generated Text (W-NUT)

  37. Modeling Framing in Immigration Discourse on Social Media

    Authors: Julia Mendelsohn, Ceren Budak, David Jurgens

    Abstract: The framing of political issues can influence policy and public opinion. Even though the public plays a key role in creating and spreading frames, little is known about how ordinary people on social media frame political issues. By creating a new dataset of immigration-related tweets labeled for multiple framing typologies from political communication theory, we develop supervised models to detect…

    Submitted 13 April, 2021; originally announced April 2021.

    Comments: Accepted at NAACL 2021 (camera-ready), Annotation codebook, data, models, and code available at https://github.com/juliamendelsohn/framing

    ACM Class: I.2.7; J.4; K.4.2

  38. arXiv:2104.05010  [pdf, other]

    cs.CL cs.SI

    The structure of online social networks modulates the rate of lexical change

    Authors: Jian Zhu, David Jurgens

    Abstract: New words are regularly introduced to communities, yet not all of these words persist in a community's lexicon. Among the many factors contributing to lexical change, we focus on the understudied effect of social networks. We conduct a large-scale analysis of over 80k neologisms in 4420 online communities across a decade. Using Poisson regression and survival analysis, our study demonstrates that…

    Submitted 11 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

  39. arXiv:2102.08368  [pdf, other]

    cs.CY cs.CL cs.SI

    Conversations Gone Alright: Quantifying and Predicting Prosocial Outcomes in Online Conversations

    Authors: Jiajun Bao, Junjie Wu, Yiming Zhang, Eshwar Chandrasekharan, David Jurgens

    Abstract: Online conversations can go in many directions: some turn out poorly due to antisocial behavior, while others turn out positively to the benefit of all. Research on improving online spaces has focused primarily on detecting and reducing antisocial behavior. Yet we know little about positive outcomes in online conversations and how to increase them-is a prosocial outcome simply the lack of antisoci…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: Accepted for Publication at the Web Conference 2021; 12 pages

  40. arXiv:2011.05910  [pdf, other]

    cs.CL cs.AI

    Audrey: A Personalized Open-Domain Conversational Bot

    Authors: Chung Hoon Hong, Yuan Liang, Sagnik Sinha Roy, Arushi Jain, Vihang Agarwal, Ryan Draves, Zhizhuo Zhou, William Chen, Yujian Liu, Martha Miracky, Lily Ge, Nikola Banovic, David Jurgens

    Abstract: Conversational Intelligence requires that a person engage on informational, personal and relational levels. Advances in Natural Language Understanding have helped recent chatbots succeed at dialog on the informational level. However, current techniques still lag for conversing with humans on a personal level and fully relating to them. The University of Michigan's submission to the Alexa Prize Gra…

    Submitted 11 November, 2020; originally announced November 2020.

  41. arXiv:2011.03020  [pdf, other]

    cs.CL cs.CY cs.SI

    Quantifying Intimacy in Language

    Authors: Jiaxin Pei, David Jurgens

    Abstract: Intimacy is a fundamental aspect of how we relate to others in social settings. Language encodes the social information of intimacy through both topics and other more subtle cues (such as linguistic hedging and swearing). Here, we introduce a new computational framework for studying expressions of intimacy in language with an accompanying dataset and deep learning model for accurately predicti…

    Submitted 5 November, 2020; originally announced November 2020.

    Comments: EMNLP 2020

  42. arXiv:2009.01896  [pdf, other]

    cs.CY

    Author Mentions in Science News Reveal Widespread Disparities Across Name-inferred Ethnicities

    Authors: Hao Peng, Misha Teplitskiy, David Jurgens

    Abstract: Media outlets play a key role in spreading scientific knowledge to the general public and raising the profile of researchers among their peers. Yet, how journalists choose to present researchers in their stories is poorly understood. Using a comprehensive dataset of 223,587 news stories from 288 U.S. outlets reporting on 100,486 research papers across all areas of science, we investigate if the au…

    Submitted 22 January, 2024; v1 submitted 3 September, 2020; originally announced September 2020.

    Comments: 68 pages, 8 figures, 11 tables

  43. arXiv:1906.01738  [pdf, other]

    cs.SI cs.CL cs.CY

    A Just and Comprehensive Strategy for Using NLP to Address Online Abuse

    Authors: David Jurgens, Eshwar Chandrasekharan, Libby Hemphill

    Abstract: Online abusive behavior affects millions and the NLP community has attempted to mitigate this problem by developing technologies to detect abuse. However, current methods have largely focused on a narrow definition of abuse to the detriment of victims who seek both validation and solutions. In this position paper, we argue that the community needs to make three substantive changes: (1) expanding our s…

    Submitted 6 June, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

    Comments: 9 pages; Accepted to be published at ACL 2019

  44. arXiv:1905.05961  [pdf, other]

    cs.CY cs.CL cs.CV cs.LG

    Demographic Inference and Representative Population Estimates from Multilingual Social Media Data

    Authors: Zijian Wang, Scott A. Hale, David Adelani, Przemyslaw A. Grabowicz, Timo Hartmann, Fabian Flöck, David Jurgens

    Abstract: Social media provide access to behavioural data at an unprecedented scale and granularity. However, using these data to understand phenomena in a broader population is difficult due to their non-representativeness and the bias of statistical inference tools towards dominant languages and groups. While demographic attribute inference could be used to mitigate such bias, current techniques are almos…

    Submitted 15 May, 2019; originally announced May 2019.

    Comments: 12 pages, 10 figures, Proceedings of the 2019 World Wide Web Conference (WWW '19)

    Journal ref: Proceedings of the 2019 World Wide Web Conference (WWW '19), May 13–17, 2019, San Francisco, CA, USA

  45. arXiv:1904.04176  [pdf, other]

    cs.SI cs.CY

    Smart, Responsible, and Upper Caste Only: Measuring Caste Attitudes through Large-Scale Analysis of Matrimonial Profiles

    Authors: Ashwin Rajadesingan, Ramaswami Mahalingam, David Jurgens

    Abstract: Discriminatory caste attitudes currently stigmatize millions of Indians, subjecting individuals to prejudice in all aspects of life. Governmental incentives and societal movements have attempted to counter these attitudes, yet accurate measurements of public opinions on caste are not yet available for understanding whether progress is being made. Here, we introduce a novel approach to measure publ…

    Submitted 8 April, 2019; originally announced April 2019.

    Comments: 12 pages; Accepted to be published at ICWSM'19

  46. Are All Successful Communities Alike? Characterizing and Predicting the Success of Online Communities

    Authors: Tiago Cunha, David Jurgens, Chenhao Tan, Daniel Romero

    Abstract: The proliferation of online communities has created exciting opportunities to study the mechanisms that explain group success. While a growing body of research investigates community success through a single measure -- typically, the number of members -- we argue that there are multiple ways of measuring success. Here, we present a systematic study to understand the relations between these success…

    Submitted 18 March, 2019; originally announced March 2019.

    Comments: To appear at The Web Conference 2019

  47. arXiv:1901.11162  [pdf, other]

    cs.SI cs.CY

    Still out there: Modeling and Identifying Russian Troll Accounts on Twitter

    Authors: Jane Im, Eshwar Chandrasekharan, Jackson Sargent, Paige Lighthammer, Taylor Denby, Ankit Bhargava, Libby Hemphill, David Jurgens, Eric Gilbert

    Abstract: There is evidence that Russia's Internet Research Agency attempted to interfere with the 2016 U.S. election by running fake accounts on Twitter - often referred to as "Russian trolls". In this work, we: 1) develop machine learning models that predict whether a Twitter account is a Russian troll within a set of 170K control accounts; and, 2) demonstrate that it is possible to use this model to find…

    Submitted 30 January, 2019; originally announced January 2019.

  48. arXiv:1609.00435  [pdf, other]

    cs.CL cs.DL

    Citation Classification for Behavioral Analysis of a Scientific Field

    Authors: David Jurgens, Srijan Kumar, Raine Hoover, Dan McFarland, Dan Jurafsky

    Abstract: Citations are an important indicator of the state of a scientific field, reflecting how authors frame their work, and influencing uptake by future scholars. However, our understanding of citation behavior has been limited to small-scale manual citation analysis. We perform the largest behavioral study of citations to date, analyzing how citations are both framed and taken up by scholars in one e…

    Submitted 1 September, 2016; originally announced September 2016.

  49. arXiv:1404.7152  [pdf, other]

    cs.SI physics.soc-ph

    Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization

    Authors: Ryan Compton, David Jurgens, David Allen

    Abstract: Geographically annotated social media is extremely valuable for modern information retrieval. However, when researchers can only access publicly-visible data, one quickly finds that social media users rarely publish location information. In this work, we provide a method which can geolocate the overwhelming majority of active Twitter users, independent of their location sharing preferences, using…

    Submitted 3 March, 2015; v1 submitted 28 April, 2014; originally announced April 2014.

    Comments: 9 pages, 8 figures, accepted to IEEE BigData 2014, Compton, Ryan, David Jurgens, and David Allen. "Geotagging one hundred million twitter accounts with total variation minimization." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 2014

    MSC Class: 68T99 ACM Class: G.1.6; H.2.8; H.3.4