In many analyses of high-throughput data in systems biology, calculating the activity of a set of genes rather than focusing on the differential expression of individual genes has proven to be efficient and informative. Here, we present the rROMA software package for fast and accurate computation of the activity of gene sets with coordinated expression. We applied rROMA to cystic fibrosis, highlighting biological mechanisms potentially involved in the establishment and progression of the disease and the associated genes. Source code and documentation are available at https://github.com/sysbiocurie/rROMA.
(A) In silico knockout of IL-10 in mononuclear phagocytes (Mono IL-10⁻), T cells (T IL10⁻), and NK cells (NK IL10⁻) compared with the baseline in silico model and in vivo ([58]; WT). (B) In silico knockout of IL-10 from Kupffer cells (KC IL10⁻) and non-resident macrophages/monocytes/DC (Mac IL10⁻), compared with the baseline in silico model and [58]. In all the panels, means and standard deviation are reported. Standard deviation is indicated by error bars or shaded areas.
Multidimensional datapoint clouds representing large datasets are frequently characterized by non-trivial low-dimensional geometry and topology which can be recovered by unsupervised machine learning approaches, in particular, by principal graphs. Principal graphs approximate the multivariate data by a graph injected into the data space with some constraints imposed on the node mapping. Here we present ElPiGraph, a scalable and robust method for constructing principal graphs. ElPiGraph exploits and further develops the concept of elastic energy, the topological graph grammar approach, and a gradient descent-like optimization of the graph topology. The method is able to withstand high levels of noise and is capable of approximating data point clouds via principal graph ensembles. This strategy can be used to estimate the statistical significance of complex data features and to summarize them into a single consensus principal graph. ElPiGraph deals efficiently with large dat...
Session 1: Disease maps resources. Integrating disease maps using a graph database approach. Irina Balaur, Alexander Mazein, Charles Auffray (European Institute for Systems Biology and Medicine (EISBM), Lyon, France). Background: Disease maps are being developed as comprehensive, highly curated and human-readable resources for describing disease mechanisms. The Disease Maps community is continuously growing and currently includes 14 projects (http://diseasemaps.org/projects). There is a need to integrate disease maps in a common platform in order to facilitate extension, interrogation and visualization of the integrated data. Graph databases are a natural way to represent and manage biological networks (Lysenko et al., 2016, PMID 27462371; Balaur et al., 2016, PMID 27627442; Toure et al., 2016, PMID 27919219; Balaur et al., 2017, PMID 27993779; Fabregat et al., 2018, PMID 29377902). Objectives: We aim to highlight the advantages and comment on the limitations of using the popular graph...
Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions, with recent single-cell transcriptomic studies of developing embryos being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most of the existing computational methods are based on exploring the local data point neighbourhood relations, a step that can perform poorly in the case of multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximation of datasets with complex structures which does not require computing the complete data distance matrix or the data point neighbourhood graph. This method is able to withstand high levels of noise and is capable of approximating complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently ...
We present ElPiGraph, a method for approximating data distributions having non-trivial topological features such as the existence of excluded regions or branching structures. Unlike many existing methods, ElPiGraph is not based on the construction of a k-nearest neighbour graph, a procedure that can perform poorly in the case of multidimensional and noisy data. Instead, ElPiGraph constructs elastic principal graphs in a more robust way by minimizing elastic energy, applying graph grammars and explicitly controlling topological complexity. Using a trimmed approximation error function makes ElPiGraph extremely robust to the presence of background noise without decreasing computational performance and allows it to deal with complex cases of manifold learning (for example, ElPiGraph can learn disconnected intersecting manifolds). Thanks to the quasi-quadratic nature of the elastic function, ElPiGraph performs almost as fast as a simple k-means clustering and, therefore, is much more scala...
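The abstract above names an elastic energy and a trimmed approximation error without spelling them out. As a rough illustration only, the sketch below assumes a commonly used form of such an energy: a mean squared point-to-nearest-node distance (optionally capped at a trimming radius), plus a stretching penalty over edges and a bending penalty over stars. The function name `elastic_energy`, the parameters `lam`, `mu` and `trim_radius`, and the exact weighting are assumptions for this example and are not taken from the ElPiGraph software.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def elastic_energy(points, nodes, edges, stars, lam=0.01, mu=0.1, trim_radius=None):
    """Hypothetical elastic energy of a graph embedded in data space.

    points: list of data points (tuples of floats)
    nodes:  list of node positions (tuples of floats)
    edges:  list of (i, j) node index pairs
    stars:  list of (center_index, [leaf_indices]) pairs
    """
    # Trimming (assumed form): a point farther than trim_radius from every
    # node contributes a constant, so outliers cannot dominate the fit.
    cap = float("inf") if trim_radius is None else trim_radius ** 2
    approx = sum(
        min(min(sq_dist(p, n) for n in nodes), cap) for p in points
    ) / len(points)

    # Stretching term: penalizes long edges.
    stretch = sum(sq_dist(nodes[i], nodes[j]) for i, j in edges)

    # Bending term: penalizes a star center that deviates from the mean
    # position of its leaves.
    bend = 0.0
    for center, leaves in stars:
        dim = len(nodes[center])
        mean_leaf = [sum(nodes[l][k] for l in leaves) / len(leaves) for k in range(dim)]
        bend += sq_dist(nodes[center], mean_leaf)

    return approx + lam * stretch + mu * bend


# Toy example: three collinear nodes that sit exactly on three data points.
toy_nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
toy_edges = [(0, 1), (1, 2)]
toy_stars = [(1, [0, 2])]
toy_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(elastic_energy(toy_points, toy_nodes, toy_edges, toy_stars, lam=0.5, mu=1.0))
```

In this toy case the approximation and bending terms vanish, so the energy comes entirely from the two unit-length edges. Adding a distant outlier with a trimming radius set changes the result only by a bounded amount, which is the robustness property the abstract refers to.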
TV dramas constitute an important part of the entertainment industry, with popular shows attracting millions of viewers and resulting in significant revenues. Finding a way to formally explore the social dynamics underpinning these shows therefore has important implications, as it would allow us not only to understand which features are most likely to be associated with the popularity of a show, but also to explore the extent to which such fictional worlds have social interactions comparable with the real world. To begin tackling this question, we employed network analysis to systematically and quantitatively explore how the interactions between noble houses of the fantasy drama TV series Game of Thrones change as the show progresses. Our analysis discloses the invisible threads that connect different houses and shows how tension across the houses, as measured via structural balance, changes over time. To boost the impact of our analysis, we further extended our analysis to explore h...
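The abstract above measures tension via structural balance but does not say how it was computed. One textbook way to quantify balance in a signed network, shown here purely as a hedged sketch and not as the authors' procedure, is the fraction of fully connected triangles whose edge signs multiply to a positive value. The function name, the edge encoding and the house labels below are hypothetical.

```python
from itertools import combinations


def balanced_triangle_fraction(signed_edges):
    """Fraction of closed triangles that are structurally balanced.

    signed_edges maps frozenset({u, v}) to +1 (alliance) or -1 (hostility).
    A triangle is balanced when the product of its three signs is positive.
    Returns None if the network contains no closed triangle.
    """
    nodes = sorted({n for edge in signed_edges for n in edge})
    balanced = 0
    total = 0
    for a, b, c in combinations(nodes, 3):
        pairs = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(p in signed_edges for p in pairs):
            total += 1
            product = 1
            for p in pairs:
                product *= signed_edges[p]
            if product > 0:
                balanced += 1
    return balanced / total if total else None


# Hypothetical houses A-D; the signs are made up for illustration.
toy_network = {
    frozenset(("A", "B")): +1,
    frozenset(("B", "C")): +1,
    frozenset(("A", "C")): -1,  # A-B-C: two alliances, one hostility (unbalanced)
    frozenset(("A", "D")): -1,
    frozenset(("B", "D")): -1,  # A-B-D: allies with a common enemy (balanced)
}
print(balanced_triangle_fraction(toy_network))
```

Tracking this fraction per episode or season would give a time series of the kind the abstract describes, with lower values indicating more tension.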
We discuss an overtly "simple approach" to complex biological systems borrowing selectively from theoretical physics. The approach is framed by three maxims, and we show examples of its success in two different applications: investigating cellular robustness at the level of gene regulatory networks and quantifying rare events of DNA replication errors.
Background: The novel coronavirus disease 2019 (COVID-19) outbreak presents a significant threat to global health. A better understanding of patient clinical profiles is essential to drive efficient and timely health service strategies. In this study, we aimed to identify risk factors for a higher susceptibility to symptomatic presentation with COVID-19 and a transition to severe disease. Methods: We analysed data on 2756 patients admitted to Chelsea & Westminster Hospital NHS Foundation Trust between 1st January and 23rd April 2020. We compared differences in characteristics between patients designated positive for COVID-19 and patients designated negative on hospitalisation and derived a multivariable logistic regression model to identify risk factors for predicting risk of symptomatic COVID-19. For patients with COVID-19, we used univariable and multivariable logistic regression to identify risk factors associated with progression to severe disease defined by: 1) admission to the...
In order to maintain functional robustness and species integrity, organisms must ensure high fidelity of the genome duplication process. This is particularly true during early development, where cell division is often occurring both rapidly and coherently. By studying the extreme limits of suppressing DNA replication failure due to double fork stall errors, we uncover a fundamental constant that describes a trade-off between genome size and architectural complexity of the developing organism. This constant has the approximate value N_U ≈ 3 × 10^12 base pairs, and depends only on two highly conserved molecular properties of DNA biology. We show that our theory is successful in interpreting a diverse range of data across the Eukaryota.
A subset of cancer-associated fibroblasts (FAP+/CAF-S1) mediates immunosuppression in breast cancers, but its heterogeneity and its impact on immunotherapy response remain unknown. Here, we identify 8 CAF-S1 clusters by analyzing more than 19,000 single CAF-S1 fibroblasts from breast cancer. We validate the five most abundant clusters by flow cytometry and in silico analyses in other cancer types, highlighting their relevance. Myofibroblasts from clusters 0 and 3, characterized by extracellular matrix proteins and TGFβ signaling, respectively, are indicative of primary resistance to immunotherapies. Cluster 0/ecm-myCAF upregulates PD-1 and CTLA4 protein levels in regulatory T lymphocytes (Tregs), which, in turn, increases CAF-S1 cluster 3/TGFβ-myCAF cellular content. Thus, our study highlights a positive feedback loop between specific CAF-S1 clusters and Tregs and uncovers their role in immunotherapy resistance. Significance: Our work provides a significant advance in characterizing...
Multidimensional datapoint clouds representing large datasets are frequently characterized by non-trivial low-dimensional geometry and topology which can be recovered by unsupervised machine learning approaches, in particular, by principal graphs. Principal graphs approximate the multivariate data by a graph injected into the data space with some constraints imposed on the node mapping. Here we present ElPiGraph, a scalable and robust method for constructing principal graphs. ElPiGraph exploits and further develops the concept of elastic energy, the topological graph grammar approach, and a gradient descent-like optimization of the graph topology. The method is able to withstand high levels of noise and is capable of approximating data point clouds via principal graph ensembles. This strategy can be used to estimate the statistical significance of complex data features and to summarize them into a single consensus principal graph. ElPiGraph deals efficiently with large datasets in v...
Single-cell transcriptomic assays have enabled the de novo reconstruction of lineage differentiation trajectories, along with the characterization of cellular heterogeneity and state transitions. Several methods have been developed for reconstructing developmental trajectories from single-cell transcriptomic data, but efforts on analyzing single-cell epigenomic data and on trajectory visualization remain limited. Here we present STREAM, an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data.
Papers by Luca Albergante