Immunocto: a massive immune cell database auto-generated for histopathology

Authors: Mikaël Simard, Zhuoyan Shen, Maria A. Hawkins, Charles-Antoine Collins-Fekete

Abstract: With the advent of novel cancer treatment options such as immunotherapy, studying the tumour immune micro-environment is crucial to inform on prognosis and understand response to therapeutic agents. A key approach to characterising the tumour immune micro-environment may be through combining (1) digitised microscopic high-resolution optical images of hematoxylin and eosin (H&E) stained tissue sections obtained in routine histopathology examinations with (2) automated immune cell detection and classification methods. However, current individual immune cell classification models for digital pathology present relatively poor performance. This is mainly due to the limited size of currently available datasets of individual immune cells, a consequence of the time-consuming and difficult problem of manually annotating immune cells on digitised H&E whole slide images. In that context, we introduce Immunocto, a massive, multi-million automatically generated database of 6,848,454 human cells, including 2,282,818 immune cells distributed across 4 subtypes: CD4$^+$ T cell lymphocytes, CD8$^+$ T cell lymphocytes, B cell lymphocytes, and macrophages. For each cell, we provide a 64$\times$64 pixels H&E image at $\mathbf{40}\times$ magnification, along with a binary mask of the nucleus and a label. To create Immunocto, we combined open-source models and data to automatically generate the majority of contours and labels. The cells are obtained from a matched H&E and immunofluorescence colorectal dataset from the Orion platform, while contours are obtained using the Segment Anything Model. A classifier trained on H&E images from Immunocto produces an average F1 score of 0.74 to differentiate the 4 immune cell subtypes and other cells. Immunocto can be downloaded at: https://zenodo.org/uploads/11073373.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2207.08883 [pdf, ps, other]

Population dynamics under demographic and environmental stochasticity

Authors: Alexandru Hening, Weiwei Qi, Zhongwei Shen, Yingfei Yi

Abstract: The present paper is devoted to the study of the long term dynamics of diffusion processes modelling a single species that experiences both demographic and environmental stochasticity. In our setting, the long term dynamics of the diffusion process in the absence of demographic stochasticity is determined by the sign of $Λ_0$, the external Lyapunov exponent, as follows: $Λ_0<0$ implies (asymptotic) extinction and $Λ_0>0$ implies convergence to a unique positive stationary distribution $μ_0$. If the system is of size $\frac{1}{ε^{2}}$ for small $ε>0$ (the intensity of demographic stochasticity), demographic effects will make the extinction time finite almost surely. This suggests that to understand the dynamics one should analyze the quasi-stationary distribution (QSD) $μ_ε$ of the system. The existence and uniqueness of the QSD is well-known under mild assumptions. We look at what happens when the population size is sent to infinity, i.e., when $ε\to 0$. We show that the external Lyapunov exponent still plays a key role: 1) If $Λ_0<0$, then $μ_ε\to δ_0$, the mean extinction time is of order $|\ln ε|$ and the extinction rate associated with the QSD $μ_ε$ has a lower bound of order $\frac{1}{|\lnε|}$; 2) If $Λ_0>0$, then $μ_ε\to μ_0$, the mean extinction time is polynomial in $\frac{1}{ε^{2}}$ and the extinction rate is polynomial in $ε^{2}$. Furthermore, when $Λ_0>0$ we are able to show that the system exhibits multiscale dynamics: at first the process quickly approaches the QSD $μ_ε$ and then, after spending a polynomially long time there, it relaxes to the extinction state. We give sharp asymptotics in $ε$ for the time spent close to $μ_ε$.

Submitted 8 July, 2024; v1 submitted 18 July, 2022; originally announced July 2022.

Comments: 49 pages

MSC Class: 35Q84; 35J25; 37B25; 60J60

arXiv:2110.05231 [pdf, other]

Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types

Authors: Shentong Mo, Xi Fu, Chenyang Hong, Yizhen Chen, Yuxuan Zheng, Xiangru Tang, Zhiqiang Shen, Eric P Xing, Yanyan Lan

Abstract: In genome biology research, regulatory genome modeling is an important topic for many regulatory downstream tasks, such as promoter classification and transcription factor binding site prediction. The core problem is to model how regulatory elements interact with each other and how these interactions vary across different cell types. However, current deep learning methods often focus on modeling genome sequences of a fixed set of cell types and do not account for the interaction between multiple regulatory elements, so they perform well only on the cell types in the training set and lack the generalizability required in biological applications. In this work, we propose a simple yet effective approach for pre-training on genome data in a multi-modal and self-supervised manner, which we call GeneBERT. Specifically, we simultaneously take the 1d sequence of genome data and a 2d matrix of (transcription factors x regions) as the input, where three pre-training tasks are proposed to improve the robustness and generalizability of our model. We pre-train our model on the ATAC-seq dataset with 17 million genome sequences. We evaluate our GeneBERT on regulatory downstream tasks across different cell types, including promoter classification, transcription factor binding site prediction, disease risk estimation, and splicing site prediction. Extensive experiments demonstrate the effectiveness of multi-modal and self-supervised pre-training for large-scale regulatory genomics data.

Submitted 3 November, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

arXiv:2107.11732 [pdf, other]

Federated Causal Inference in Heterogeneous Observational Data

Authors: Ruoxuan Xiong, Allison Koenecke, Michael Powell, Zhu Shen, Joshua T. Vogelstein, Susan Athey

Abstract: We are interested in estimating the effect of a treatment applied to individuals at multiple sites, where data is stored locally for each site. Due to privacy constraints, individual-level data cannot be shared across sites; the sites may also have heterogeneous populations and treatment assignment mechanisms. Motivated by these considerations, we develop federated methods to draw inference on the average treatment effects of combined data across sites. Our methods first compute summary statistics locally using propensity scores and then aggregate these statistics across sites to obtain point and variance estimators of average treatment effects. We show that these estimators are consistent and asymptotically normal. To achieve these asymptotic properties, we find that the aggregation schemes need to account for the heterogeneity in treatment assignments and in outcomes across sites. We demonstrate the validity of our federated methods through a comparative study of two large medical claims databases.

Submitted 2 April, 2023; v1 submitted 25 July, 2021; originally announced July 2021.

arXiv:2102.05785 [pdf, ps, other]

Quasi-stationary distributions of multi-dimensional diffusion processes

Authors: Alexandru Hening, Weiwei Qi, Zhongwei Shen, Yingfei Yi

Abstract: The present paper is devoted to the investigation of the long term behavior of a class of singular multi-dimensional diffusion processes that get absorbed in finite time with probability one. Our focus is on the analysis of quasi-stationary distributions (QSDs), which describe the long term behavior of the system conditioned on not being absorbed. Under natural Lyapunov conditions, we construct a QSD and prove the sharp exponential convergence to this QSD for compactly supported initial distributions. Under stronger Lyapunov conditions ensuring that the diffusion process comes down from infinity, we show the uniqueness of a QSD and the exponential convergence to the QSD for all initial distributions. Our results can be seen as the multi-dimensional generalization of Cattiaux et al (Ann. Prob. 2009) as well as the complement to Hening and Nguyen (Ann. Appl. Prob. 2018) which looks at the long term behavior of multi-dimensional diffusions that can only become extinct asymptotically. The centerpiece of our approach concerns a uniformly elliptic operator that we relate to the generator, or the Fokker-Planck operator, associated to the diffusion process. This operator only has singular coefficients in its zeroth-order terms and can be handled more easily than the generator. For this operator, we establish the discreteness of its spectrum, its principal spectral theory, the stochastic representation of the semigroup generated by it, and the global regularity for the associated parabolic equation. We show how our results can be applied to most ecological models, among which cooperative, competitive, and predator-prey Lotka-Volterra systems.

Submitted 10 February, 2021; originally announced February 2021.

Comments: 71 pages, 1 figure

MSC Class: 60J60; 60J70; 34F05; 92D25; 60H10

arXiv:2004.10117 [pdf]

doi 10.7554/eLife.61700

Alpha-1 adrenergic receptor antagonists to prevent hyperinflammation and death from lower respiratory tract infection

Authors: Allison Koenecke, Michael Powell, Ruoxuan Xiong, Zhu Shen, Nicole Fischer, Sakibul Huq, Adham M. Khalafallah, Marco Trevisan, Pär Sparen, Juan J Carrero, Akihiko Nishimura, Brian Caffo, Elizabeth A. Stuart, Renyuan Bai, Verena Staedtke, David L. Thomas, Nickolas Papadopoulos, Kenneth W. Kinzler, Bert Vogelstein, Shibin Zhou, Chetan Bettegowda, Maximilian F. Konig, Brett Mensh, Joshua T. Vogelstein, Susan Athey

Abstract: In severe viral pneumonia, including Coronavirus disease 2019 (COVID-19), the viral replication phase is often followed by hyperinflammation, which can lead to acute respiratory distress syndrome, multi-organ failure, and death. We previously demonstrated that alpha-1 adrenergic receptor ($α_1$-AR) antagonists can prevent hyperinflammation and death in mice. Here, we conducted retrospective analyses in two cohorts of patients with acute respiratory distress (ARD, n=18,547) and three cohorts with pneumonia (n=400,907). Federated across two ARD cohorts, we find that patients exposed to $α_1$-AR antagonists, as compared to unexposed patients, had a 34% relative risk reduction for mechanical ventilation and death (OR=0.70, p=0.021). We replicated these methods on three pneumonia cohorts, all with similar effects on both outcomes. All results were robust to sensitivity analyses. These results highlight the urgent need for prospective trials testing whether prophylactic use of $α_1$-AR antagonists ameliorates lower respiratory tract infection-associated hyperinflammation and death, as observed in COVID-19.

Submitted 2 August, 2021; v1 submitted 21 April, 2020; originally announced April 2020.

Comments: 31 pages, 10 figures

Journal ref: Elife 10 (2021): e61700

arXiv:1810.08549 [pdf, other]

doi 10.1103/PhysRevFluids.3.103603

Optimal cell transport in straight channels and networks

Authors: Alexander Farutin, Zaiyi Shen, Gael Prado, Vassanti Audemar, Hamid Ez-Zahraouy, Abdelilah Benyoussef, Benoit Polack, Jens Harting, Petia M. Vlahovska, Thomas Podgorski, Gwennou Coupier, Chaouqi Misbah

Abstract: Flux of rigid or soft particles (such as drops, vesicles, red blood cells, etc.) in a channel is a complex function of particle concentration, which depends on the details of induced dissipation and suspension structure due to hydrodynamic interactions with walls or between neighboring particles. Through two-dimensional and three-dimensional simulations and a simple model that reveals the contribution of the main characteristics of the flowing suspension, we discuss the existence of an optimal volume fraction for cell transport and its dependence on the cell mechanical properties. The example of blood is explored in detail, by adopting the commonly used modeling of red blood cell dynamics. We highlight the complexity of optimization at the level of a network, due to the antagonistic evolution of local volume fraction and optimal volume fraction with the channel diameter. In the case of the blood network, the most recent results on the size evolution of vessels along the circulatory network of healthy organs suggest that the red blood cell volume fraction (hematocrit) of healthy subjects is close to optimality, as far as transport only is concerned. However, the hematocrit value of patients suffering from diverse red blood cell pathologies may strongly deviate from optimality.

Submitted 18 October, 2018; originally announced October 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1802.05353

Journal ref: Phys. Rev. Fluids 3, 103603 (2018)

arXiv:1807.02010 [pdf, other]

DNA Computing for Combinational Logic

Authors: Chuan Zhang, Lulu Ge, Yuchen Zhuang, Ziyuan Shen, Zhiwei Zhong, Zaichen Zhang, Xiaohu You

Abstract: With the progressive scale-down of semiconductor feature size, people are looking forward to More Moore and More than Moore. In order to offer a possible alternative implementation process, people are trying to figure out a feasible transfer from silicon to molecular computing. Such a transfer relies on programming bio-based modules with computer-like logic, aiming at realizing the Turing machine. To accomplish this, DNA-based combinational logic is inevitably the first step to address. This timely overview paper introduces combinational logic synthesized in DNA computing from both analog and digital perspectives separately. State-of-the-art research progress is summarized for interested readers to quickly understand DNA computing, initiate discussion on existing techniques and inspire innovative solutions. We hope this paper can pave the way for future DNA computing synthesis.

Submitted 5 July, 2018; originally announced July 2018.

arXiv:1802.05170 [pdf, other]

Molecular Computing for Markov Chains

Authors: Chuan Zhang, Ziyuan Shen, Wei Wei, Jing Zhao, Zaichen Zhang, Xiaohu You

Abstract: In this paper, a methodology is presented for implementing arbitrarily constructed time-homogeneous Markov chains with biochemical systems. Not only discrete but also continuous-time Markov chains can be computed. By employing chemical reaction networks (CRNs) as a programmable language, molecular concentrations serve to denote both input and output values. One reaction network is elaborately designed for each chain. The evolution of species' concentrations over time closely matches the transient solutions of the target continuous-time Markov chain, while equilibrium concentrations can indicate the steady state probabilities. Additionally, second-order Markov chains are considered for implementation, with bimolecular reactions rather than unary ones. An original scheme is put forward to compile unimolecular systems to DNA strand displacement reactions for the sake of future physical implementations. Deterministic, stochastic and DNA simulations are provided to support correctness, validity and feasibility.

Submitted 14 February, 2018; originally announced February 2018.

Showing 1–9 of 9 results for author: Shen, Z