-
Global News Synchrony and Diversity During the Start of the COVID-19 Pandemic
Authors:
Xi Chen,
Scott A. Hale,
David Jurgens,
Mattia Samory,
Ethan Zuckerman,
Przemyslaw A. Grabowicz
Abstract:
News coverage profoundly affects how countries and individuals behave in international relations. Yet, we have little empirical evidence of how news coverage varies across countries. To enable studies of global news coverage, we develop an efficient computational methodology that comprises three components: (i) a transformer model to estimate multilingual news similarity; (ii) a global event identification system that clusters news based on a similarity network of news articles; and (iii) measures of news synchrony across countries and news diversity within a country, based on country-specific distributions of news coverage of the global events. Each component achieves state-of-the-art performance, scaling seamlessly to massive datasets of millions of news articles. We apply the methodology to 60 million news articles published globally between January 1 and June 30, 2020, across 124 countries and 10 languages, detecting 4357 news events. We identify the factors explaining diversity and synchrony of news coverage across countries. Our study reveals that news media tend to cover a more diverse set of events in countries with larger Internet penetration, more official languages, larger religious diversity, higher economic inequality, and larger populations. Coverage of news events is more synchronized between countries that not only actively participate in commercial and political relations -- such as pairs of countries with high bilateral trade volume, and countries that belong to the NATO military alliance or BRICS group of major emerging economies -- but also countries that share certain traits: an official language, high GDP, and high democracy indices.
Submitted 30 April, 2024;
originally announced May 2024.
-
Here Be Livestreams: Trade-offs in Creating Temporal Maps of Reddit
Authors:
Virginia Partridge,
Jasmine Mangat,
Rebecca Curran,
Ryan McGrady,
Ethan Zuckerman
Abstract:
We present a method for mapping Reddit communities that accounts for temporal shifts, using quantitative and qualitative analyses of clustering techniques to produce high-quality, stable, and meaningful maps for researchers, journalists and casual Reddit users. Building on previous work using community embeddings, we find that only a month of Reddit comments suffices to create snapshot embeddings that maintain quality while supporting insight into changes in Reddit communities over time. Comparing different clusterings of community embeddings with quantitative measures of quality and temporal stability, we describe properties of the models and what they tell us about the underlying Reddit data. Moreover, qualitative analysis of the resulting clusters illuminates which properties of clusterings are useful for analysis of Reddit communities. Although clusterings of subreddits have been used in many earlier works, we believe this is the first study to qualitatively analyze how these clusterings are perceived by social media researchers at a Reddit-wide scale.
Finally, we demonstrate how the temporal snapshots might be used in exploratory study. We are able to identify particularly stable communities during 2021-2022, such as the Reddit Public Access Network, as well as emerging communities, like one focused on NFT trading. This work informed the development of a web tool for exploring Reddit, now available to the public at RedditMap.social.
Submitted 22 December, 2023; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Virtual histological staining of unlabeled autopsy tissue
Authors:
Yuzhu Li,
Nir Pillar,
Jingxi Li,
Tairan Liu,
Di Wu,
Songyu Sun,
Guangdong Ma,
Kevin de Haan,
Luzhe Huang,
Sepehr Hamidi,
Anatoly Urisman,
Tal Keidar Haran,
William Dean Wallace,
Jonathan E. Zuckerman,
Aydogan Ozcan
Abstract:
Histological examination is a crucial step in an autopsy; however, the traditional histochemical staining of post-mortem samples faces multiple challenges, including the inferior staining quality due to autolysis caused by delayed fixation of cadaver tissue, as well as the resource-intensive nature of chemical staining procedures covering large tissue areas, which demand substantial labor, cost, and time. These challenges can become more pronounced during global health crises when the availability of histopathology services is limited, resulting in further delays in tissue fixation and more severe staining artifacts. Here, we report the first demonstration of virtual staining of autopsy tissue and show that a trained neural network can rapidly transform autofluorescence images of label-free autopsy tissue sections into brightfield equivalent images that match hematoxylin and eosin (H&E) stained versions of the same samples, eliminating autolysis-induced severe staining artifacts inherent in traditional histochemical staining of autopsied tissue. Our virtual H&E model was trained using >0.7 TB of image data and a data-efficient collaboration scheme that integrates the virtual staining network with an image registration network. The trained model effectively accentuated nuclear, cytoplasmic and extracellular features in new autopsy tissue samples that experienced severe autolysis, such as COVID-19 samples never seen before, where the traditional histochemical staining failed to provide consistent staining quality. This virtual autopsy staining technique can also be extended to necrotic tissue, and can rapidly and cost-effectively generate artifact-free H&E stains despite severe autolysis and cell death, also reducing labor, cost and infrastructure requirements associated with the standard histochemical staining.
Submitted 1 August, 2023;
originally announced August 2023.
-
Media Cloud: Massive Open Source Collection of Global News on the Open Web
Authors:
Hal Roberts,
Rahul Bhargava,
Linas Valiukas,
Dennis Jen,
Momin M. Malik,
Cindy Bishop,
Emily Ndulue,
Aashka Dave,
Justin Clark,
Bruce Etling,
Rob Faris,
Anushka Shah,
Jasmin Rubinovitz,
Alexis Hope,
Catherine D'Ignazio,
Fernando Bermejo,
Yochai Benkler,
Ethan Zuckerman
Abstract:
We present the first full description of Media Cloud, an open source platform based on crawling hyperlink structure in operation for over 10 years, that for many uses will be the best way to collect data for studying the media ecosystem on the open web. We document the key choices behind what data Media Cloud collects and stores, how it processes and organizes these data, and its open API access as well as user-facing tools. We also highlight the strengths and limitations of the Media Cloud collection strategy compared to relevant alternatives. We give an overview of two sample datasets generated using Media Cloud and discuss how researchers can use the platform to create their own datasets.
Submitted 1 May, 2021; v1 submitted 8 April, 2021;
originally announced April 2021.
-
Deep learning-based transformation of the H&E stain into special stains
Authors:
Kevin de Haan,
Yijie Zhang,
Jonathan E. Zuckerman,
Tairan Liu,
Anthony E. Sisk,
Miguel F. P. Diaz,
Kuang-Yu Jen,
Alexander Nobori,
Sofia Liou,
Sarah Zhang,
Rana Riahi,
Yair Rivenson,
W. Dean Wallace,
Aydogan Ozcan
Abstract:
Pathology is practiced by visual inspection of histochemically stained slides. Most commonly, the hematoxylin and eosin (H&E) stain is used in the diagnostic workflow and it is the gold standard for cancer diagnosis. However, in many cases, especially for non-neoplastic diseases, additional "special stains" are used to provide different levels of contrast and color to tissue components and allow pathologists to get a clearer diagnostic picture. In this study, we demonstrate the utility of supervised learning-based computational stain transformation from H&E to different special stains (Masson's Trichrome, periodic acid-Schiff and Jones silver stain) using tissue sections from kidney needle core biopsies. Based on evaluation by three renal pathologists, followed by adjudication by a fourth renal pathologist, we show that the generation of virtual special stains from existing H&E images improves the diagnosis in several non-neoplastic kidney diseases sampled from 58 unique subjects. A second study performed by three pathologists found that the quality of the special stains generated by the stain transformation network was statistically equivalent to those generated through standard histochemical staining. As the transformation of H&E images into special stains can be achieved in 1 min or less per patient core specimen slide, this stain-to-stain transformation framework can improve the quality of the preliminary diagnosis when additional special stains are needed, along with significant savings in time and cost, reducing the burden on the healthcare system and patients.
Submitted 12 August, 2021; v1 submitted 20 August, 2020;
originally announced August 2020.