Showing 1–50 of 71 results for author: Danforth, C M

Searching in archive cs.
  1. arXiv:2307.08580  [pdf, other]

    physics.soc-ph cs.CL

    The Resume Paradox: Greater Language Differences, Smaller Pay Gaps

    Authors: Joshua R. Minot, Marc Maier, Bradford Demarest, Nicholas Cheney, Christopher M. Danforth, Peter Sheridan Dodds, Morgan R. Frank

    Abstract: Over the past decade, the gender pay gap has remained steady with women earning 84 cents for every dollar earned by men on average. Many studies explain this gap through demand-side bias in the labor market represented through employers' job postings. However, few studies analyze potential bias from the worker supply-side. Here, we analyze the language in millions of US workers' resumes to investi…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 24 pages, 15 figures

  2. arXiv:2306.06794  [pdf, other]

    cs.CL cs.AI

    A blind spot for large language models: Supradiegetic linguistic information

    Authors: Julia Witte Zimmerman, Denis Hudon, Kathryn Cramer, Jonathan St. Onge, Mikaela Fudolig, Milo Z. Trujillo, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Large Language Models (LLMs) like ChatGPT reflect profound changes in the field of Artificial Intelligence, achieving a linguistic fluency that is impressively, even shockingly, human-like. The extent of their current and potential capabilities is an active area of investigation by no means limited to scientific researchers. It is common for people to frame the training data for LLMs as "text" or…

    Submitted 16 May, 2024; v1 submitted 11 June, 2023; originally announced June 2023.

    Comments: 21 pages, 6 figures, 3 tables. Accepted at IC2S2 2024. arXiv admin note: text overlap with arXiv:2206.02608, arXiv:2303.12712, arXiv:2305.10601, arXiv:2305.06424, arXiv:1908.08530 by other authors

    Journal ref: Plutonics, Volume 17, 2024, pages 107 - 156

  3. arXiv:2305.12160  [pdf, other]

    cs.CY

    Park visitation and walkshed demographics in the United States

    Authors: Kelsey Linnell, Mikaela Fudolig, Laura Bloomfield, Thomas McAndrew, Taylor H. Ricketts, Jarlath P. M. O'Neil-Dunne, Peter Sheridan Dodds, Christopher M. Danforth

    Abstract: A large and growing body of research demonstrates the value of local parks to mental and physical well-being. Recently, researchers have begun using passive digital data sources to investigate equity in usage; exactly who is benefiting from parks? Early studies suggest that park visitation differs according to demographic features, and that the demographic composition of a park's surrounding neigh…

    Submitted 20 May, 2023; originally announced May 2023.

  4. arXiv:2305.08978  [pdf, other]

    cs.SI cs.CL cs.CY

    An assessment of measuring local levels of homelessness through proxy social media signals

    Authors: Yoshi Meke Bird, Sarah E. Grobe, Michael V. Arnold, Sean P. Rogers, Mikaela I. Fudolig, Julia Witte Zimmerman, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Recent studies suggest social media activity can function as a proxy for measures of state-level public health, detectable through natural language processing. We present results of our efforts to apply this approach to estimate homelessness at the state level throughout the US during the period 2010-2019 and 2022 using a dataset of roughly 1 million geotagged tweets containing the substring ``hom…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: 29 pages, 21 figures

  5. arXiv:2305.03092  [pdf, other]

    cs.CL cs.CY

    Curating corpora with classifiers: A case study of clean energy sentiment online

    Authors: Michael V. Arnold, Peter Sheridan Dodds, Christopher M. Danforth

    Abstract: Well curated, large-scale corpora of social media posts containing broad public opinion offer an alternative data source to complement traditional surveys. While surveys are effective at collecting representative samples and are capable of achieving high accuracy, they can be both expensive to run and lag public opinion by days or weeks. Both of these drawbacks could be overcome with a real-time,…

    Submitted 9 May, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: 12 pages, 6 figures

  6. arXiv:2208.09496  [pdf, other]

    cs.CL cs.CY physics.soc-ph

    A decomposition of book structure through ousiometric fluctuations in cumulative word-time

    Authors: Mikaela Irene Fudolig, Thayer Alshaabi, Kathryn Cramer, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: While quantitative methods have been used to examine changes in word usage in books, studies have focused on overall trends, such as the shapes of narratives, which are independent of book length. We instead look at how words change over the course of a book as a function of the number of words, rather than the fraction of the book, completed at any given point; we define this measure as "cumulati…

    Submitted 11 May, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

    Comments: published in Humanities and Social Sciences Communications

    Journal ref: Humanit Soc Sci Commun 10, 187 (2023)

  7. arXiv:2110.06847  [pdf, other]

    cs.CL cs.CY cs.SI physics.soc-ph

    Ousiometrics and Telegnomics: The essence of meaning conforms to a two-dimensional powerful-weak and dangerous-safe framework with diverse corpora presenting a safety bias

    Authors: P. S. Dodds, T. Alshaabi, M. I. Fudolig, J. W. Zimmerman, J. Lovato, S. Beaulieu, J. R. Minot, M. V. Arnold, A. J. Reagan, C. M. Danforth

    Abstract: We define `ousiometrics' to be the study of essential meaning in whatever context that meaningful signals are communicated, and `telegnomics' as the study of remotely sensed knowledge. From work emerging through the middle of the 20th century, the essence of meaning has become generally accepted as being well captured by the three orthogonal dimensions of evaluation, potency, and activation (EPA).…

    Submitted 29 March, 2023; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: 40 pages (34 page main manuscript, 6 page appendix), 15 figures (9 main, 6 appendix), 4 tables

  8. arXiv:2110.00587  [pdf, other]

    cs.CL cs.CY cs.SI physics.soc-ph

    Sentiment and structure in word co-occurrence networks on Twitter

    Authors: Mikaela Irene Fudolig, Thayer Alshaabi, Michael V. Arnold, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: We explore the relationship between context and happiness scores in political tweets using word co-occurrence networks, where nodes in the network are the words, and the weight of an edge is the number of tweets in the corpus for which the two connected words co-occur. In particular, we consider tweets with hashtags #imwithher and #crookedhillary, both relating to Hillary Clinton's presidential bi…

    Submitted 1 October, 2021; originally announced October 2021.

    Journal ref: Applied Network Science 7, 9 (2022)

  9. arXiv:2109.09010  [pdf, other]

    cs.CL cs.LG cs.SI physics.soc-ph

    Augmenting semantic lexicons using word embeddings and transfer learning

    Authors: Thayer Alshaabi, Colin M. Van Oort, Mikaela Irene Fudolig, Michael V. Arnold, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Sentiment-aware intelligent systems are essential to a wide array of applications. These systems are driven by language models which broadly fall into two paradigms: Lexicon-based and contextual. Although recent contextual models are increasingly dominant, we still see demand for lexicon-based models because of their interpretability and ease of use. For example, lexicon-based models allow researc…

    Submitted 2 November, 2021; v1 submitted 18 September, 2021; originally announced September 2021.

    Comments: 17 pages, 8 figures

    Journal ref: Front. Artif. Intell. 4:783778 (2022)

  10. arXiv:2107.06096  [pdf, other]

    cs.SI physics.soc-ph stat.AP stat.ME

    Blending search queries with social media data to improve forecasts of economic indicators

    Authors: Yi Li, Asieh Ahani, Haimao Zhan, Kevin Foley, Thayer Alshaabi, Kelsey Linnell, Peter Sheridan Dodds, Christopher M. Danforth, Adam Fox

    Abstract: The forecasting of political, economic, and public health indicators using internet activity has demonstrated mixed results. For example, while some measures of explicitly surveyed public opinion correlate well with social media proxies, the opportunity for profitable investment strategies to be driven solely by sentiment extracted from social media appears to have expired. Nevertheless, the inter…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: 12 pages, 7 figures

  11. arXiv:2107.04929  [pdf, other]

    cs.CL

    Computational Paremiology: Charting the temporal, ecological dynamics of proverb use in books, news articles, and tweets

    Authors: E. Davis, C. M. Danforth, W. Mieder, P. S. Dodds

    Abstract: Proverbs are an essential component of language and culture, and though much attention has been paid to their history and currency, there has been comparatively little quantitative work on changes in the frequency with which they are used over time. With wider availability of large corpora reflecting many diverse genres of documents, it is now possible to take a broad and dynamic view of the impor…

    Submitted 10 July, 2021; originally announced July 2021.

    Comments: Main paper: 16 pages, 9 figures, 1 table; Supplementary: 5 pages, 4 tables, 4 figures

  12. arXiv:2106.10281  [pdf, other]

    cs.SI cs.CY physics.soc-ph

    Say Their Names: Resurgence in the collective attention toward Black victims of fatal police violence following the death of George Floyd

    Authors: Henry H. Wu, Ryan J. Gallagher, Thayer Alshaabi, Jane L. Adams, Joshua R. Minot, Michael V. Arnold, Brooke Foucault Welles, Randall Harp, Peter Sheridan Dodds, Christopher M. Danforth

    Abstract: The murder of George Floyd by police in May 2020 sparked international protests and renewed attention in the Black Lives Matter movement. Here, we characterize ways in which the online activity following George Floyd's death was unparalleled in its volume and intensity, including setting records for activity on Twitter, prompting the saddest day in the platform's history, and causing George Floyd'…

    Submitted 18 June, 2021; originally announced June 2021.

  13. arXiv:2106.05260  [pdf, other]

    stat.AP cs.IR

    Sirius: Visualization of Mixed Features as a Mutual Information Network Graph

    Authors: Jane L. Adams, Todd F. Deluca, Christopher M. Danforth, Peter S. Dodds, Yuhang Zheng, Konstantinos Anastasakis, Boyoon Choi, Allison Min, Michael M. Bessey

    Abstract: Data scientists across disciplines are increasingly in need of exploratory analysis tools for data sets with a high volume of features of mixed data type (quantitative continuous and discrete categorical). We introduce Sirius, a novel visualization package for researchers to explore feature relationships among mixed data types using mutual information. The visualization of feature relationships ai…

    Submitted 13 August, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

    ACM Class: H.5.2; J.0

  14. arXiv:2106.01481  [pdf, other]

    physics.soc-ph cs.CL cs.SI

    Quantifying language changes surrounding mental health on Twitter

    Authors: Anne Marie Stupinski, Thayer Alshaabi, Michael V. Arnold, Jane Lydia Adams, Joshua R. Minot, Matthew Price, Peter Sheridan Dodds, Christopher M. Danforth

    Abstract: Mental health challenges are thought to afflict around 10% of the global population each year, with many going untreated due to stigma and limited access to services. Here, we explore trends in words and phrases related to mental health through a collection of 1-, 2-, and 3-grams parsed from a data stream of roughly 10% of all English tweets since 2012. We examine temporal dynamics of mental heal…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: 12 pages, 5 figures, 1 table

  15. arXiv:2105.12006  [pdf, other]

    cs.SI cs.CL

    The incel lexicon: Deciphering the emergent cryptolect of a global misogynistic community

    Authors: Kelly Gothard, David Rushing Dewhurst, Joshua R. Minot, Jane Lydia Adams, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Evolving out of a gender-neutral framing of an involuntary celibate identity, the concept of `incels' has come to refer to an online community of men who bear antipathy towards themselves, women, and society-at-large for their perceived inability to find and maintain sexual relationships. By exploring incel language use on Reddit, a global online message board, we contextualize the incel community…

    Submitted 25 May, 2021; originally announced May 2021.

    Comments: 18 pages, 11 figures

  16. arXiv:2103.05841  [pdf, other]

    cs.CL stat.ML

    Interpretable bias mitigation for textual data: Reducing gender bias in patient notes while maintaining classification performance

    Authors: Joshua R. Minot, Nicholas Cheney, Marc Maier, Danne C. Elbers, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Medical systems in general, and patient treatment decisions and outcomes in particular, are affected by bias based on gender and other demographic elements. As language models are increasingly applied to medicine, there is a growing interest in building algorithmic fairness into processes impacting patient care. Much of the work addressing this question has focused on biases encoded in language mo…

    Submitted 9 March, 2021; originally announced March 2021.

    Comments: 31 pages, 22 figures

  17. arXiv:2008.13078  [pdf, other]

    physics.soc-ph cs.IR physics.data-an

    Probability-turbulence divergence: A tunable allotaxonometric instrument for comparing heavy-tailed categorical distributions

    Authors: P. S. Dodds, J. R. Minot, M. V. Arnold, T. Alshaabi, J. L. Adams, D. R. Dewhurst, A. J. Reagan, C. M. Danforth

    Abstract: Real-world complex systems often comprise many distinct types of elements as well as many more types of networked interactions between elements. When the relative abundances of types can be measured well, we further observe heavy-tailed categorical distributions for type frequencies. For the comparison of type frequency distributions of two systems or a system with itself at different time points…

    Submitted 29 August, 2020; originally announced August 2020.

    Comments: 14 pages, 7 figures

  18. arXiv:2008.11305  [pdf, other]

    physics.soc-ph cs.SI

    Long-term word frequency dynamics derived from Twitter are corrupted: A bespoke approach to detecting and removing pathologies in ensembles of time series

    Authors: P. S. Dodds, J. R. Minot, M. V. Arnold, T. Alshaabi, J. L. Adams, D. R. Dewhurst, A. J. Reagan, C. M. Danforth

    Abstract: Maintaining the integrity of long-term data collection is an essential scientific practice. As a field evolves, so too will that field's measurement instruments and data storage systems, as they are invented, improved upon, and made obsolete. For data streams generated by opaque sociotechnical systems which may have episodic and unknown internal rule changes, detecting and accounting for shifts in…

    Submitted 27 August, 2020; v1 submitted 25 August, 2020; originally announced August 2020.

    Comments: 8 pages, 5 figures

  19. arXiv:2008.07301  [pdf, other]

    physics.soc-ph cs.SI

    Computational timeline reconstruction of the stories surrounding Trump: Story turbulence, narrative control, and collective chronopathy

    Authors: P. S. Dodds, J. R. Minot, M. V. Arnold, T. Alshaabi, J. L. Adams, A. J. Reagan, C. M. Danforth

    Abstract: Measuring the specific kind, temporal ordering, diversity, and turnover rate of stories surrounding any given subject is essential to developing a complete reckoning of that subject's historical impact. Here, we use Twitter as a distributed news and opinion aggregation source to identify and track the dynamics of the dominant day-scale stories around Donald Trump, the 45th President of the United…

    Submitted 30 September, 2022; v1 submitted 17 August, 2020; originally announced August 2020.

    Comments: 13 pages, 5 figures (4 main, 1 appendix), 1 table. Analysis complete for 6 calendar years, from 2015/01/01 through to 2021/12/31

    Journal ref: PLOS ONE, 2021, e0260592

  20. arXiv:2008.02250  [pdf, other]

    cs.CL cs.CY cs.SI physics.soc-ph

    Generalized Word Shift Graphs: A Method for Visualizing and Explaining Pairwise Comparisons Between Texts

    Authors: Ryan J. Gallagher, Morgan R. Frank, Lewis Mitchell, Aaron J. Schwartz, Andrew J. Reagan, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts' rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns without looming concerns about data artifacts or…

    Submitted 5 August, 2020; originally announced August 2020.

    Comments: 20 pages, 7 figures, 2 tables

    Journal ref: EPJ Data Science, 10(4), 2021

  21. arXiv:2007.12988  [pdf, other]

    cs.SI cs.CL physics.soc-ph

    Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter

    Authors: Thayer Alshaabi, Jane L. Adams, Michael V. Arnold, Joshua R. Minot, David R. Dewhurst, Andrew J. Reagan, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: In real-time, social media data strongly imprints world events, popular culture, and day-to-day conversations by millions of ordinary people at a scale that is scarcely conventionalized and recorded. Vitally, and absent from many standard corpora such as books and news archives, sharing and commenting mechanisms are native to social media platforms, enabling us to quantify social amplification (i.…

    Submitted 16 July, 2021; v1 submitted 25 July, 2020; originally announced July 2020.

    Comments: Main text: 15 pages, 6 figures; Supplementary text: 23 pages, 11 figures, 15 tables. Website: https://storywrangling.org/

    Journal ref: Sci.Adv. 7 eabe6534 (2021)

  22. arXiv:2007.09124  [pdf, other]

    cs.SI physics.soc-ph

    Local information sources received the most attention from Puerto Ricans during the aftermath of Hurricane María

    Authors: Benjamin Freixas Emery, Meredith T. Niles, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: In September 2017, Hurricane María made landfall across the Caribbean region as a category 4 storm. In the aftermath, many residents of Puerto Rico were without power or clean running water for nearly a year. Using both English and Spanish tweets from September 16 to October 15 2017, we investigate discussion of María both on and off the island, constructing a proxy for the temporal network of com…

    Submitted 17 July, 2020; originally announced July 2020.

  23. arXiv:2006.10658  [pdf, other]

    physics.soc-ph cs.SI

    Gauging the happiness benefit of US urban parks through Twitter

    Authors: A. J. Schwartz, P. S. Dodds, J. P. M. O'Neil-Dunne, T. H. Ricketts, C. M. Danforth

    Abstract: The relationship between nature contact and mental well-being has received increasing attention in recent years. While a body of evidence has accumulated demonstrating a positive relationship between time in nature and mental well-being, there have been few studies comparing this relationship in different locations over long periods of time. In this study, we estimate a happiness benefit, the diff…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: 13 pages including appendix, 9 figures, 2 tables

  24. arXiv:2006.08527  [pdf, other]

    physics.soc-ph cs.SI stat.AP

    The sociospatial factors of death: Analyzing effects of geospatially-distributed variables in a Bayesian mortality model for Hong Kong

    Authors: Thayer Alshaabi, David Rushing Dewhurst, James P. Bagrow, Peter Sheridan Dodds, Christopher M. Danforth

    Abstract: Human mortality is in part a function of multiple socioeconomic factors that differ both spatially and temporally. Adjusting for other covariates, the human lifespan is positively associated with household wealth. However, the extent to which mortality in a geographical region is a function of socioeconomic factors in both that region and its neighbors is unclear. There is also little information…

    Submitted 25 January, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: 26 pages (15 main, 11 appendix), 22 figures (6 main, 11 appendix), 2 tables

  25. arXiv:2006.03526  [pdf, other]

    physics.soc-ph cs.SI

    Ratioing the President: An exploration of public engagement with Obama and Trump on Twitter

    Authors: Joshua R. Minot, Michael V. Arnold, Thayer Alshaabi, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: The past decade has witnessed a marked increase in the use of social media by politicians, most notably exemplified by the 45th President of the United States (POTUS), Donald Trump. On Twitter, POTUS messages consistently attract high levels of engagement as measured by likes, retweets, and replies. Here, we quantify the balance of these activities, also known as "ratios", and study their dynamics…

    Submitted 5 June, 2020; originally announced June 2020.

    Comments: 17 pages, 10 figures

  26. arXiv:2004.03516  [pdf, other]

    physics.soc-ph cs.SI

    Divergent modes of online collective attention to the COVID-19 pandemic are associated with future caseload variance

    Authors: David Rushing Dewhurst, Thayer Alshaabi, Michael V. Arnold, Joshua R. Minot, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Using a random 10% sample of tweets authored from 2019-09-01 through 2020-04-30, we analyze the dynamic behavior of words (1-grams) used on Twitter to describe the ongoing COVID-19 pandemic. Across 24 languages, we find two distinct dynamic regimes: One characterizing the rise and subsequent collapse in collective attention to the initial Coronavirus outbreak in late January, and a second that rep…

    Submitted 19 May, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: 12 + 4 pages, 11 + 4 figures, code + data + figures will soon be available at http://compstorylab.org/covid19ngrams/

  27. arXiv:2003.14291  [pdf, other]

    cs.SI physics.soc-ph

    Hurricanes and hashtags: Characterizing online collective attention for natural disasters

    Authors: Michael V. Arnold, David Rushing Dewhurst, Thayer Alshaabi, Joshua R. Minot, Jane L. Adams, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: We study collective attention paid towards hurricanes through the lens of $n$-grams on Twitter, a social media platform with global reach. Using hurricane name mentions as a proxy for awareness, we find that the exogenous temporal dynamics are remarkably similar across storms, but that overall collective attention varies widely even among storms causing comparable deaths and damage. We construct `…

    Submitted 31 March, 2020; originally announced March 2020.

    Comments: 31 pages (14 main, 17 Supplemental), 19 figures (5 main, 14 appendix)

  28. arXiv:2003.12614  [pdf, other]

    physics.soc-ph cs.SI

    How the world's collective attention is being paid to a pandemic: COVID-19 related n-gram time series for 24 languages on Twitter

    Authors: T. Alshaabi, J. R. Minot, M. V. Arnold, J. L. Adams, D. R. Dewhurst, A. J. Reagan, R. Muhamad, C. M. Danforth, P. S. Dodds

    Abstract: In confronting the global spread of the coronavirus disease COVID-19 pandemic we must have coordinated medical, operational, and political responses. In all efforts, data is crucial. Fundamentally, and in the possible absence of a vaccine for 12 to 18 months, we need universal, well-documented testing for both the presence of the disease as well as confirmed recovery through serological tests for…

    Submitted 6 January, 2021; v1 submitted 27 March, 2020; originally announced March 2020.

    Comments: 13 pages, 6 figures, 3 tables, website: http://compstorylab.org/covid19ngrams/

  29. The growing amplification of social media: Measuring temporal and social contagion dynamics for over 150 languages on Twitter for 2009-2020

    Authors: Thayer Alshaabi, David R. Dewhurst, Joshua R. Minot, Michael V. Arnold, Jane L. Adams, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Working from a dataset of 118 billion messages running from the start of 2009 to the end of 2019, we identify and explore the relative daily use of over 150 languages on Twitter. We find that eight languages comprise 80% of all tweets, with English, Japanese, Spanish, and Portuguese being the most dominant. To quantify social spreading in each language over time, we compute the 'contagion ratio':…

    Submitted 8 March, 2021; v1 submitted 7 March, 2020; originally announced March 2020.

    Comments: 26 pages (15 main, 11 appendix), 13 figures (6 main, 7 appendix), and 4 online appendices available at http://compstorylab.org/storywrangler/papers/tlid/

  30. arXiv:1910.00149  [pdf, other]

    physics.soc-ph cs.SI

    Fame and Ultrafame: Measuring and comparing daily levels of `being talked about' for United States' presidents, their rivals, God, countries, and K-pop

    Authors: Peter Sheridan Dodds, Joshua R. Minot, Michael V. Arnold, Thayer Alshaabi, Jane Lydia Adams, David Rushing Dewhurst, Andrew J. Reagan, Christopher M. Danforth

    Abstract: When building a global brand of any kind -- a political actor, clothing style, or belief system -- developing widespread awareness is a primary goal. Short of knowing any of the stories or products of a brand, being talked about in whatever fashion -- raw fame -- is, as Oscar Wilde would have it, better than not being talked about at all. Here, we measure, examine, and contrast the day-to-day raw…

    Submitted 29 October, 2021; v1 submitted 30 September, 2019; originally announced October 2019.

    Comments: 31 pages (21 pages main text, 10 pages appendix), 8 figures (7 in main text, 1 in appendix), 10 tables (1 in main text, 9 in appendix)

  31. arXiv:1907.12567  [pdf]

    cs.CY physics.soc-ph

    Exploring Perceptions of Veganism

    Authors: Laura Jennings, Christopher M. Danforth, Peter Sheridan Dodds, Elizabeth Pinel, Lizzy Pope

    Abstract: This project examined perceptions of the vegan lifestyle using surveys and social media to explore barriers to choosing veganism. A survey of 510 individuals indicated that non-vegans did not believe veganism was as healthy or difficult as vegans. In a second analysis, Instagram posts using #vegan suggest content is aimed primarily at the female vegan community. Finally, sentiment analysis of roug…

    Submitted 29 July, 2019; originally announced July 2019.

  32. arXiv:1907.03920  [pdf, other]

    cs.CL physics.soc-ph

    Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings

    Authors: Tyler J. Gray, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Stretched words like `heellllp' or `heyyyyy' are a regular feature of spoken language, often used to emphasize or exaggerate the underlying meaning of the root word. While stretched words are rarely found in formal written language and dictionaries, they are prevalent within social media. In this paper, we examine the frequency distributions of `stretchable words' found in roughly 100 billion twee…

    Submitted 8 July, 2019; originally announced July 2019.

    Comments: 18 pages, 18 figures, and 9 tables. Online appendices at http://compstorylab.org/stretchablewords/

  33. arXiv:1906.11710  [pdf, other]

    physics.soc-ph cs.DS eess.SP physics.data-an

    The shocklet transform: A decomposition method for the identification of local, mechanism-driven dynamics in sociotechnical time series

    Authors: David Rushing Dewhurst, Thayer Alshaabi, Dilan Kiley, Michael V. Arnold, Joshua R. Minot, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: We introduce a qualitative, shape-based, timescale-independent time-domain transform used to extract local dynamics from sociotechnical time series---termed the Discrete Shocklet Transform (DST)---and an associated similarity search routine, the Shocklet Transform And Ranking (STAR) algorithm, that indicates time windows during which panels of time series display qualitatively-similar anomalous be…

    Submitted 18 December, 2019; v1 submitted 27 June, 2019; originally announced June 2019.

    Comments: 29 pages (20 body, 9 appendix), 20 figures (13 body, 7 appendix), three online appendices available at http://compstorylab.org/shocklets/ (two displaying interactive visualizations and one containing over 10,000 figures), open-source implementation of STAR algorithm and discrete shocklet transform available at https://gitlab.com/compstorylab/discrete-shocklet-transform

  34. arXiv:1807.07982  [pdf]

    cs.SI cs.CL cs.CY

    Visitors to urban greenspace have higher sentiment and lower negativity on Twitter

    Authors: Aaron J. Schwartz, Peter Sheridan Dodds, Jarlath P. M. O'Neil-Dunne, Christopher M. Danforth, Taylor H. Ricketts

    Abstract: With more people living in cities, we are witnessing a decline in exposure to nature. A growing body of research has demonstrated an association between nature contact and improved mood. Here, we used Twitter and the Hedonometer, a word analysis tool, to investigate how sentiment, or the estimated happiness of the words people write, varied before, during, and after visits to San Francisco's urba…

    Submitted 27 August, 2019; v1 submitted 20 July, 2018; originally announced July 2018.

    Comments: 18 pages, 5 figures

    Journal ref: People Nat. 2019; 00: 1- 10

  35. arXiv:1806.07451  [pdf, other]

    cs.SI physics.soc-ph

    Social media usage patterns during natural hazards

    Authors: Meredith T. Niles, Benjamin F. Emery, Andrew J. Reagan, Peter Sheridan Dodds, Christopher M. Danforth

    Abstract: Natural hazards are becoming increasingly expensive as climate change and development are exposing communities to greater risks. Preparation and recovery are critical for climate change resilience, and social media are being used more and more to communicate before, during, and after disasters. While there is a growing body of research aimed at understanding how people use social media surrounding…

    Submitted 24 October, 2018; v1 submitted 19 June, 2018; originally announced June 2018.

  36. arXiv:1805.09959  [pdf, other]

    cs.CL cs.SI

    A Sentiment Analysis of Breast Cancer Treatment Experiences and Healthcare Perceptions Across Twitter

    Authors: Eric M. Clark, Ted James, Chris A. Jones, Amulya Alapati, Promise Ukandu, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Background: Social media has the capacity to afford the healthcare industry with valuable feedback from patients who reveal and express their medical decision-making process, as well as self-reported quality of life indicators both during and post treatment. In prior work, [Crannell et al.], we have studied an active cancer patient population on Twitter and compiled a set of tweets describing the…

    Submitted 12 October, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

  37. arXiv:1803.09745  [pdf, other]

    cs.CL physics.soc-ph

    English verb regularization in books and tweets

    Authors: Tyler J. Gray, Andrew J. Reagan, Peter Sheridan Dodds, Christopher M. Danforth

    Abstract: The English language has evolved dramatically throughout its lifespan, to the extent that a modern speaker of Old English would be incomprehensible without translation. One concrete indicator of this process is the movement from irregular to regular (-ed) forms for the past tense of verbs. In this study we quantify the extent of verb regularization using two vastly disparate datasets: (1) Six year…

    Submitted 3 January, 2019; v1 submitted 26 March, 2018; originally announced March 2018.

    Comments: 16 pages, 10 figures, and 4 tables. Online appendices at https://www.uvm.edu/storylab/share/papers/gray2018a/ ; Updated to journal version with minor differences from first version

    Journal ref: PLOS ONE 13(12): e0209651, 2018

  38. arXiv:1703.09774  [pdf, other]

    cs.SI physics.soc-ph

    Measuring the happiness of large-scale written expression: Songs, Blogs, and Presidents

    Authors: Peter Sheridan Dodds, Christopher M. Danforth

    Abstract: The importance of quantifying the nature and intensity of emotional states at the level of populations is evident: we would like to know how, when, and why individuals feel as they do if we wish, for example, to better construct public policy, build more successful organizations, and, from a scientific perspective, more fully understand economic and social phenomena. Here, by incorporating direct…

    Submitted 6 March, 2017; originally announced March 2017.

    Comments: 13 pages, 11 figures, 3 tables

    Journal ref: Journal of Happiness Studies, 11(4), 441-456, 2010 (published online July 20, 2009)

  39. arXiv:1703.06361  [pdf, other]

    cs.SI cs.CY physics.soc-ph

    Which friends are more popular than you? Contact strength and the friendship paradox in social networks

    Authors: James P. Bagrow, Christopher M. Danforth, Lewis Mitchell

    Abstract: The friendship paradox states that in a social network, egos tend to have lower degree than their alters, or, "your friends have more friends than you do". Most research has focused on the friendship paradox and its implications for information transmission, but treating the network as static and unweighted. Yet, people can dedicate only a finite fraction of their attention budget to each social i…

    Submitted 18 March, 2017; originally announced March 2017.

  40. arXiv:1608.07740  [pdf]

    physics.soc-ph cs.SI

    Forecasting the onset and course of mental illness with Twitter data

    Authors: Andrew G. Reece, Andrew J. Reagan, Katharina L. M. Lix, Peter Sheridan Dodds, Christopher M. Danforth, Ellen J. Langer

    Abstract: We developed computational models to predict the emergence of depression and Post-Traumatic Stress Disorder in Twitter users. Twitter data and details of depression history were collected from 204 individuals (105 depressed, 99 healthy). We extracted predictive features measuring affect, linguistic style, and context from participant tweets (N=279,951) and built models using these features with su…

    Submitted 27 August, 2016; originally announced August 2016.

    Comments: 23 pages, 6 figures

  41. arXiv:1608.03282  [pdf]

    cs.SI physics.soc-ph

    Instagram photos reveal predictive markers of depression

    Authors: Andrew G. Reece, Christopher M. Danforth

    Abstract: Using Instagram data from 166 individuals, we applied machine learning tools to successfully identify markers of depression. Statistical features were computationally extracted from 43,950 participant Instagram photos, using color analysis, metadata components, and algorithmic face detection. Resulting models outperformed general practitioners' average diagnostic success rate for depression. These…

    Submitted 13 August, 2016; v1 submitted 10 August, 2016; originally announced August 2016.

    Comments: 34 pages, 12 figures

  42. arXiv:1608.02024  [pdf, other]

    physics.soc-ph cs.SI

    Public Opinion Polling with Twitter

    Authors: Emily M. Cody, Andrew J. Reagan, Peter Sheridan Dodds, Christopher M. Danforth

    Abstract: Solicited public opinion surveys reach a limited subpopulation of willing participants and are expensive to conduct, leading to poor time resolution and a restricted pool of expert-chosen survey topics. In this study, we demonstrate that unsolicited public opinion polling through sentiment analysis applied to Twitter correlates well with a range of traditional measures, and has predictive power fo…

    Submitted 5 August, 2016; originally announced August 2016.

  43. The emotional arcs of stories are dominated by six basic shapes

    Authors: Andrew J. Reagan, Lewis Mitchell, Dilan Kiley, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Advances in computing power, natural language processing, and digitization of text now make it possible to study a culture's evolution through its texts using a "big data" lens. Our ability to communicate relies in part upon a shared emotional experience, with stories often following distinct emotional trajectories and forming patterns that are meaningful to us. Here, by classifying the emotional…

    Submitted 25 September, 2016; v1 submitted 24 June, 2016; originally announced June 2016.

    Comments: Manuscript: 10 pages, 7 figures. Supplementary: 81 pages, 29 figures

  44. Divergent discourse between protests and counter-protests: #BlackLivesMatter and #AllLivesMatter

    Authors: Ryan J. Gallagher, Andrew J. Reagan, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Since the shooting of Black teenager Michael Brown by White police officer Darren Wilson in Ferguson, Missouri, the protest hashtag #BlackLivesMatter has amplified critiques of extrajudicial killings of Black Americans. In response to #BlackLivesMatter, other Twitter users have adopted #AllLivesMatter, a counter-protest hashtag whose content argues that equal attention should be given to all lives…

    Submitted 19 May, 2017; v1 submitted 22 June, 2016; originally announced June 2016.

    Comments: 26 pages, 27 figures

    Journal ref: PLoS ONE, 2018

  45. arXiv:1605.00309  [pdf, other]

    cs.SI cs.DL physics.soc-ph

    Connecting every bit of knowledge: The structure of Wikipedia's First Link Network

    Authors: Mark Ibrahim, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Apples, porcupines, and the most obscure Bob Dylan song---is every topic a few clicks from Philosophy? Within Wikipedia, the surprising answer is yes: nearly all paths lead to Philosophy. Wikipedia is the largest, most meticulously indexed collection of human knowledge ever amassed. More than information about a topic, Wikipedia is a web of naturally emerging relationships. By following the first…

    Submitted 6 December, 2016; v1 submitted 1 May, 2016; originally announced May 2016.

  46. What we write about when we write about causality: Features of causal statements across large-scale social discourse

    Authors: Thomas C. McAndrew, Joshua C. Bongard, Christopher M. Danforth, Peter S. Dodds, Paul D. H. Hines, James P. Bagrow

    Abstract: Identifying and communicating relationships between causes and effects is important for understanding our world, but is affected by language structure, cognitive and emotional biases, and the properties of the communication medium. Despite the increasing importance of social media, much remains unknown about causal statements made online. To study real-world causal attribution, we extract a large-…

    Submitted 21 April, 2016; v1 submitted 19 April, 2016; originally announced April 2016.

    Journal ref: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), San Francisco, CA, 2016, pp. 519-524

  47. arXiv:1601.07969  [pdf, other]

    cs.CL

    Zipf's law is a consequence of coherent language production

    Authors: Jake Ryland Williams, James P. Bagrow, Andrew J. Reagan, Sharon E. Alajajian, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: The task of text segmentation may be undertaken at many levels in text analysis---paragraphs, sentences, words, or even letters. Here, we focus on a relatively fine scale of segmentation, hypothesizing it to be in accord with a stochastic model of language generation, as the smallest scale where independent units of meaning are produced. Our goals in this letter include the development of methods…

    Submitted 5 August, 2016; v1 submitted 28 January, 2016; originally announced January 2016.

    Comments: 5 pages, 4 figures

  48. arXiv:1512.00531  [pdf, other]

    cs.CL

    Benchmarking sentiment analysis methods for large-scale texts: A case for using continuum-scored words and word shift graphs

    Authors: Andrew J. Reagan, Brian Tivnan, Jake Ryland Williams, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, bearing profound implications for our understanding of human behavior. Given the growing assortment of sentiment measuring instruments, comparisons between them are evidently required. Here, we perform detailed tests of 6 dictionary-based methods applied to 4 different co…

    Submitted 7 September, 2016; v1 submitted 1 December, 2015; originally announced December 2015.

    Comments: 45 pages, 34 figures. More dictionaries added

  49. arXiv:1510.03765  [pdf, other]

    q-bio.NC cs.NE

    Nonlinear functional mapping of the human brain

    Authors: Nicholas Allgaier, Tobias Banaschewski, Gareth Barker, Arun L. W. Bokde, Josh C. Bongard, Uli Bromberg, Christian Büchel, Anna Cattrell, Patricia J. Conrod, Christopher M. Danforth, Sylvane Desrivières, Peter S. Dodds, Herta Flor, Vincent Frouin, Jürgen Gallinat, Penny Gowland, Andreas Heinz, Bernd Ittermann, Scott Mackey, Jean-Luc Martinot, Kevin Murphy, Frauke Nees, Dimitri Papadopoulos-Orfanos, Luise Poustka, Michael N. Smolka , et al. (5 additional authors not shown)

    Abstract: The field of neuroimaging has truly become data rich, and novel analytical methods capable of gleaning meaningful information from large stores of imaging data are in high demand. Those methods that might also be applicable on the level of individual subjects, and thus potentially useful clinically, are of special interest. In the present study, we introduce just such a method, called nonlinear fu…

    Submitted 8 September, 2015; originally announced October 2015.

    Comments: 21 pages, 12 figures, and 1 table

  50. arXiv:1508.01843  [pdf, other]

    cs.SI

    Vaporous Marketing: Uncovering Pervasive Electronic Cigarette Advertisements on Twitter

    Authors: Eric M. Clark, Chris A. Jones, Jake Ryland Williams, Allison N. Kurti, Michell Craig Nortotsky, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Background: Twitter has become the "wild-west" of marketing and promotional strategies for advertisement agencies. Electronic cigarettes have been heavily marketed across Twitter feeds, offering discounts, "kid-friendly" flavors, algorithmically generated false testimonials, and free samples. Methods: All electronic cigarette keyword related tweets from a 10% sample of Twitter spanning January 2012…

    Submitted 5 March, 2016; v1 submitted 7 August, 2015; originally announced August 2015.