Cancer evolution lays the groundwork for predictive oncology. Testing evolutionary metrics requires quantitative measurements in controlled clinical trials. We mapped genomic intratumor heterogeneity in locally advanced prostate cancer using 642 samples from 114 individuals enrolled in clinical trials with a 12-year median follow-up. We concomitantly assessed morphological heterogeneity using deep learning in 1,923 histological sections from 250 individuals. Genetic and morphological (Gleason) diversity were independent predictors of recurrence (hazard ratio (HR) = 3.12 and 95% confidence interval (95% CI) = 1.34–7.3; HR = 2.24 and 95% CI = 1.28–3.92). Combined, they identified a group with half the median time to recurrence. Spatial segregation of clones was also an independent marker of recurrence (HR = 2.3 and 95% CI = 1.11–4.8). We identified copy number changes associated with Gleason grade and found that chromosome 6p loss correlated with reduced immune infiltration. Matched profiling of relapse, decades after diagnosis, confirmed that genomic instability is a driving force in prostate cancer progression. This study shows that combining genomics with artificial intelligence-aided histopathology leads to the identification of clinical biomarkers of evolution.
The patterns by which primary tumors spread to metastatic sites remain poorly understood. Here, we define patterns of metastatic seeding in prostate cancer (PCa) using a novel injection-based mouse model — EvoCaP (Evolution in Cancer of the Prostate), featuring aggressive metastatic cancer to bone, liver, lungs, and lymph nodes. To define migration histories between primary and metastatic sites, we used our EvoTraceR pipeline to track distinct tumor clones containing recordable barcodes. We detected widespread intratumoral heterogeneity from the primary tumor in metastatic seeding, with few clonal populations (CPs) instigating most migration. Metastasis-to-metastasis seeding was uncommon, as most cells remained confined within the tissue. Migration patterns in our model were congruent with human PCa seeding topologies. Our findings support the view of metastatic PCa as a systemic disease driven by waves of aggressive clones expanding their niche, infrequently overcoming constraints that otherwise keep them confined in the primary or metastatic site.
Background Copy number alterations (CNAs) are genetic variations that cause an abnormal increase or decrease in the number of copies of a genomic region, and they are commonly detected in cancer. CNAs can affect various regions of the genome, including broad regions that encompass multiple genes, individual genes, or even non-coding RNA molecules of small size. CNAs contribute to tumorigenesis and can have a significant impact
Polycythemia Vera (PV) is typically caused by V617F or exon 12 JAK2 mutations. Little is known about polycythemia cases in which no JAK2 variants can be detected and no other causes are identified. This condition is defined as idiopathic erythrocytosis (IE). We evaluated the clinical and laboratory parameters of a cohort of 56 IE patients and determined their molecular profile at diagnosis using paired blood/buccal-DNA exome sequencing coupled with a high-depth targeted OncoPanel to identify a possible underlying germline or somatic cause. We demonstrated that most of our cohort (40/56: 71.4%) showed no evidence of clonal hematopoiesis, suggesting that IE is, in large part, a germline disorder. We identified 20 low mutation burden somatic variants (variant allelic fraction, VAF, < 10%) in only 14 (25%) patients, principally involving DNMT3A and TET2. Only 2 patients presented high mutation burden somatic variants, involving DNMT3A, TET2, ASXL1 and WT1. We identified recurrent germline variants in 42 (75%) patients, occurring mainly in the JAK/STAT, hypoxia and iron metabolism pathways, among them JAK3-V722I and HIF1A-P582S; a high fraction of patients (48.2%) also carried the H63D or C282Y variant of the homeostatic iron regulatory gene HFE. By generating cellular models, we showed that JAK3-V722I causes activation of the JAK-STAT5 axis and upregulation of EPAS1/HIF2A, while HIF1A-P582S causes suppression of hepcidin mRNA synthesis, suggesting a major role for these variants in the onset of IE.
SETBP1 mutations are found in various clonal myeloid disorders. However, it is unclear whether they can initiate leukemia, as SETBP1 mutations typically appear as later events during oncogenesis. To answer this question, we generated a mouse model expressing mutated SETBP1 in hematopoietic tissue. This model showed profound alterations in the differentiation program of hematopoietic progenitors and developed a myeloid neoplasm with megakaryocytic dysplasia, splenomegaly, and bone marrow fibrosis, prompting us to investigate SETBP1 mutations in a cohort of 36 triple-negative primary myelofibrosis (TN-PMF) cases. We identified two distinct subgroups, one carrying SETBP1 mutations and the other completely devoid of somatic variants. Clinically, a striking difference in disease aggressiveness was noted, with SETBP1-mutated patients showing a much worse clinical course. In contrast to myelodysplastic/myeloproliferative neoplasms, where SETBP1 mutations are mostly found as a late clonal event, single-cell clonal hierarchy reconstruction in three TN-PMF patients from our cohort revealed SETBP1 to be a very early event, suggesting that the phenotype of the different SETBP1+ disorders may be shaped by the opposite hierarchy of the same clonal SETBP1 variants.
Cancer patients show heterogeneous phenotypes and very different outcomes and responses even to common treatments, such as standard chemotherapy. This state of affairs has motivated the comprehensive characterization of cancer phenotypes and fueled the generation of large omics datasets, comprising multiple omics data reported for the same patients, which might now allow us to start deciphering cancer heterogeneity and implement personalized therapeutic strategies. In this work, we analyzed four cancer types obtained from the latest efforts by The Cancer Genome Atlas, for which seven distinct omics data were available for each patient, in addition to curated clinical outcomes. We applied a uniform pipeline for raw data preprocessing and adopted the Cancer Integration via MultIkernel LeaRning (CIMLR) integrative clustering method to extract cancer subtypes. We then systematically review the discovered clusters for the considered cancer types, highlighting novel associations between the different omics and prognosis.
In recent years, many algorithmic strategies have been developed to exploit single-cell mutational profiles generated via sequencing experiments of cancer samples and return reliable models of cancer evolution. Here, we introduce the COB-tree algorithm, which summarizes the solutions explored by state-of-the-art methods for clonal tree inference, to return a unique consensus optimum branching tree. The method proves to be highly effective in detecting pairwise temporal relations between genomic events, as demonstrated by extensive tests on simulated datasets. We also provide a new method to visualize and quantitatively inspect the solution space of the inference methods, via Principal Coordinate Analysis. Finally, the application of our method to a single-cell dataset of patient-derived melanoma xenografts shows significant differences between the COB-tree solution and the maximum likelihood ones.
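The idea of summarizing pairwise temporal relations across several candidate clonal trees can be illustrated with a minimal sketch. This is not the COB-tree algorithm itself: it only counts how often each ancestor-descendant ordering appears across a set of hypothetical solutions. The tree encoding (a child-to-parent dictionary with `None` for the root), the function names, and the toy trees are all assumptions made for illustration.

```python
from collections import Counter
from itertools import chain


def ancestor_pairs(parent):
    """Return all (ancestor, descendant) pairs implied by a child -> parent map.

    Assumes the map encodes a valid tree (no cycles); the root maps to None.
    """
    pairs = set()
    for node in parent:
        current = parent[node]
        while current is not None:
            pairs.add((current, node))
            current = parent.get(current)
    return pairs


def pairwise_support(trees):
    """Fraction of candidate trees in which each temporal ordering holds."""
    counts = Counter(chain.from_iterable(ancestor_pairs(t) for t in trees))
    return {pair: n / len(trees) for pair, n in counts.items()}


# Three hypothetical candidate trees over mutations A, B and C.
trees = [
    {"A": None, "B": "A", "C": "B"},  # A -> B -> C
    {"A": None, "B": "A", "C": "A"},  # A -> B, A -> C
    {"A": None, "C": "A", "B": "C"},  # A -> C -> B
]
support = pairwise_support(trees)
```

In this toy input, "A precedes B" is supported by every candidate tree, while "B precedes C" and "C precedes B" each appear in only one of three. A consensus method would then need a principled way to turn such weights into a single tree, which is the step this sketch leaves out.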
Background
Longitudinal single-cell sequencing experiments of patient-derived models are increasingly employed to investigate cancer evolution. In this context, robust computational methods are needed to properly exploit the mutational profiles of single cells generated via variant calling, in order to reconstruct the evolutionary history of a tumor and characterize the impact of therapeutic strategies, such as the administration of drugs. To this end, we have recently developed the LACE framework for the Longitudinal Analysis of Cancer Evolution.
Results
The LACE 2.0 release aimed at inferring longitudinal clonal trees enhances the original framework with new key functionalities: an improved data management for preprocessing of standard variant calling data, a reworked inference engine, and direct connection to public databases.
Conclusions
All of this is accessible through a new and interactive Shiny R graphical interface offering the possibility to apply filters helpful in discriminating relevant or potential driver mutations, set up inferential parameters, and visualize the results. The software is available at: github.com/BIMIB-DISCo/LACE.
Recent investigations have improved our understanding of the molecular aberrations supporting Waldenström Macroglobulinemia (WM) biology; however, whether the immune microenvironment contributes to WM pathogenesis remains unanswered. We first showed that a transgenic murine model of human-like lymphoplasmacytic lymphoma/WM exhibits an increased number of regulatory T (Treg) cells with respect to control mice. These findings were translated into the WM clinical setting, where the transcriptomic profiling of Tregs derived from WM patients unveiled a distinct WM-specific mRNA signature, with significant enrichment for NF-κB-mediated TNF-α signaling, MAPK- and PI3K/AKT-related genes, paralleled by a different Treg functional phenotype. We demonstrated significantly higher Treg induction, expansion and proliferation triggered by WM cells as compared to their normal cellular counterpart, with a more profound effect in the context of CXCR4 C1013G-mutated WM cells. By investigating the B-to-T cell cross-talk at single-cell level, we identified CD40/CD40-ligand as a potentially relevant axis supporting WM cell-Treg cell interaction. Our findings demonstrate the existence of a Treg-mediated immunosuppressive phenotype in WM, which can be therapeutically reversed by blocking the CD40L/CD40 axis to inhibit WM cell growth.
We present a large-scale analysis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) substitutions, considering 1,585,456 high-quality raw sequencing samples, aimed at investigating the existence and quantifying the effect of mutational processes causing mutations in SARS-CoV-2 genomes when interacting with the human host. As a result, we confirmed the presence of three well-differentiated mutational processes likely ruled by reactive oxygen species (ROS), apolipoprotein B editing complex (APOBEC), and adenosine deaminase acting on RNA (ADAR). We then evaluated the activity of these mutational processes in different continental groups, showing that some samples from Africa present a significantly higher number of substitutions, most likely due to higher APOBEC activity. We finally analyzed the activity of mutational processes across different SARS-CoV-2 variants, and we found a significantly lower number of mutations attributable to APOBEC activity in samples assigned to the Omicron variant.
We outline the features of the R package SparseSignatures and its application to determine the signatures contributing to mutation profiles of tumor samples. We describe installation details and illustrate a step-by-step approach to (1) prepare the data for signature analysis, (2) determine the optimal parameters, and (3) employ them to determine the signatures and related exposure levels in the point mutation dataset.
For complete details on the use and execution of this protocol, please refer to Lal et al. (2021).
We describe the procedures to perform the following: (1) the de novo discovery of mutational signatures from raw sequencing data of viral samples and (2) the association of existing viral mutational signatures to the samples of a given dataset. The goal is to identify and characterize the nucleotide substitution patterns related to the mutational processes that underlie the origination of variants in viral genomes. The VirMutSig protocol is available at this link: https://github.com/BIMIB-DISCo/VirMutSig.
For complete information on the theoretical aspects of this protocol, please refer to Graudenzi et al. (2021).
Genetic and epigenetic variation, together with transcriptional plasticity, contribute to intratumour heterogeneity. The interplay of these biological processes and their respective contributions to tumour evolution remain unknown. Here we show that intratumour genetic ancestry only infrequently affects gene expression traits and subclonal evolution in colorectal cancer (CRC). Using spatially resolved paired whole-genome and transcriptome sequencing, we find that the majority of intratumour variation in gene expression is not strongly heritable but rather ‘plastic’. Somatic expression quantitative trait loci analysis identified a number of putative genetic controls of expression by cis-acting coding and non-coding mutations, the majority of which were clonal within a tumour, alongside frequent structural alterations. Consistently, computational inference on the spatial patterning of tumour phylogenies finds that a considerable proportion of CRCs did not show evidence of subclonal selection, with only a subset of putative genetic drivers associated with subclone expansions. Spatial intermixing of clones is common, with some tumours growing exponentially and others only at the periphery. Together, our data suggest that most genetic intratumour variation in CRC has no major phenotypic consequence and that transcriptional plasticity is, instead, widespread within a tumour.
Colorectal malignancies are a leading cause of cancer-related death and have undergone extensive genomic study. However, DNA mutations alone do not fully explain malignant transformation. Here we investigate the co-evolution of the genome and epigenome of colorectal tumours at single-clone resolution using spatial multi-omic profiling of individual glands. We collected 1,370 samples from 30 primary cancers and 8 concomitant adenomas and generated 1,207 chromatin accessibility profiles, 527 whole genomes and 297 whole transcriptomes. We found positive selection for DNA mutations in chromatin modifier genes and recurrent somatic chromatin accessibility alterations, including in regulatory regions of cancer driver genes that were otherwise devoid of genetic mutations. Genome-wide alterations in accessibility for transcription factor binding involved CTCF, downregulation of interferon and increased accessibility for SOX and HOX transcription factor families, suggesting the involvement of developmental genes during tumourigenesis. Somatic chromatin accessibility alterations were heritable and distinguished adenomas from cancers. Mutational signature analysis showed that the epigenome in turn influences the accumulation of DNA mutations. This study provides a map of genetic and epigenetic tumour heterogeneity, with fundamental implications for understanding colorectal cancer biology.
Activation-induced cytidine deaminase, AICDA or AID, is a driver of somatic hypermutation and class-switch recombination in immunoglobulins. In addition, this deaminase belonging to the APOBEC family may have off-target effects genome-wide, but its effects at pan-cancer level are not well elucidated. Here, we used different pan-cancer datasets, totaling more than 50,000 samples analyzed by whole-genome, whole-exome, or targeted sequencing. AID mutations are present at pan-cancer level with higher frequency in hematological cancers and higher presence at transcriptionally active TAD domains. AID synergizes initial hotspot mutations by a second composite mutation. AID mutational load was found to be independently associated with a favorable outcome in immune-checkpoint inhibitors (ICI) treated patients across cancers after analyzing 2000 samples. Finally, we found that AID-related neoepitopes, resulting from mutations at more frequent hotspots if compared to other mutational signatures, enhance CXCL13/CCR5 expression, immunogenicity, and T-cell exhaustion, which may increase ICI sensitivity.
Motivation: Driver (epi)genomic alterations underlie the positive selection of cancer subpopulations, which promotes drug resistance and relapse. Even though substantial heterogeneity is witnessed in most cancer types, mutation accumulation patterns can be regularly found and can be exploited to reconstruct predictive models of cancer evolution. Yet, available methods cannot infer logical formulas connecting events to represent alternative evolutionary routes or convergent evolution.
Results: We introduce PMCE, an expressive framework that leverages mutational profiles from cross-sectional sequencing data to infer probabilistic graphical models of cancer evolution including arbitrary logical formulas, and which outperforms the state-of-the-art in terms of accuracy and robustness to noise, on simulations.
The application of PMCE to 7866 samples from the TCGA database allows us to identify a highly significant correlation between the predicted evolutionary paths and the overall survival in 7 tumor types, proving that our approach can effectively stratify cancer patients in reliable risk groups.
Availability: PMCE is freely available at https://github.com/BIMIB-DISCo/PMCE, in addition to the code to replicate all the analyses presented in the manuscript.
Contacts: daniele.ramazzotti@unimib.it, alex.graudenzi@ibfm.cnr.it.
To dissect the mechanisms underlying the inflation of variants in the SARS-CoV-2 genome, we present one of the largest up-to-date analyses of intra-host genomic diversity, which reveals that most samples present heterogeneous genomic architectures, due to the interplay between host-related mutational processes and transmission dynamics. The deconvolution of the set of intra-host minor variants unveils the existence of non-overlapping mutational signatures related to specific nucleotide substitutions, which prove that distinct hosts respond differently to SARS-CoV-2 infections, and which are likely ruled by APOBEC, Reactive Oxygen Species (ROS) and ADAR. Thanks to a corrected-for-signatures dN/dS analysis, we demonstrate that the mutational processes underlying such signatures are affected by purifying selection, with important exceptions. In fact, several mutations linked to low-rate mutational processes appear to transit to clonality in the population, eventually leading to the definition of new clonal genotypes and to a statistically significant increase of overall genomic diversity. Importantly, the analysis of the phylogenetic model shows the presence of multiple homoplasies, due to mutational hotspots, phantom mutations or positive selection, and supports the hypothesis of transmission of minor variants during infections. Overall, the results of this study pave the way for the integrated characterization of intra-host genomic diversity and clinical outcome of SARS-CoV-2 hosts.
Background. Germline mutations in the BRCA1 and BRCA2 genes predispose carriers to breast and ovarian cancer, and there remains a need to identify the specific genomic mechanisms by which cancer evolves in these patients. Here we present a systematic genomic analysis of breast tumors with BRCA1 and BRCA2 mutations.
Methods. We analyzed genomic data from breast tumors, with a focus on comparing tumors with BRCA1/BRCA2 gene mutations with common classes of sporadic breast tumors.
Results. We identify differences between BRCA-mutated and sporadic breast tumors in patterns of point mutation, DNA methylation and structural variation. We show that structural variation disproportionately affects tumor suppressor genes and identify specific driver gene candidates that are enriched for structural variation.
Conclusions. Compared to sporadic tumors, BRCA-mutated breast tumors show signals of reduced DNA methylation, more ancestral cell divisions, and elevated rates of structural variation that tend to disrupt highly expressed protein-coding genes and known tumor suppressors. Our analysis suggests that BRCA-mutated tumors are more aggressive than sporadic breast cancers because loss of the BRCA pathway causes multiple processes of mutagenesis and gene dysregulation.
Over the past decades, both critical care and cancer care have improved substantially. Given the increase in cancer-specific survival, we hypothesized that both the number of cancer patients admitted to the ICU and their overall survival have increased since the turn of the millennium. MIMIC-III, a freely accessible critical care database of Beth Israel Deaconess Medical Center, Boston, USA, was used to retrospectively study trends and outcomes of cancer patients admitted to the ICU between 2002 and 2011. Multiple logistic regression analysis was performed to adjust for confounders of 28-day and 1-year mortality.
Out of 41,468 unique ICU admissions, 1,100 hemato-oncologic, 3,953 oncologic and 49 patients with both a hematological and solid malignancy were analyzed. Hematological patients had higher critical illness scores than non-cancer patients, while oncologic patients had similar APACHE-III and SOFA-scores compared to non-cancer patients. In the univariate analysis, cancer was strongly associated with mortality (OR= 2.74, 95%CI: 2.56, 2.94). Over the 10-year study period, 28-day mortality of cancer patients decreased by 30%. This trend persisted after adjustment for covariates, with cancer patients having significantly higher mortality (OR=2.63, 95%CI: 2.38, 2.88). Between 2002 and 2011, both the adjusted odds of 28-day mortality and the adjusted odds of 1-year mortality for cancer patients decreased by 6% (95%CI: 4%, 9%). Having cancer was the strongest single predictor of 1-year mortality in the multivariate model (OR=4.47, 95%CI: 4.11, 4.84).
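For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the following minimal sketch computes an unadjusted OR with a Wald interval from a 2x2 table. For a single binary exposure this matches what a univariate logistic regression would report, but it is not the study's adjusted model. The function name and the counts are hypothetical and are not taken from MIMIC-III.

```python
import math


def odds_ratio_ci(exposed_events, exposed_nonevents,
                  unexposed_events, unexposed_nonevents, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (z=1.96 for ~95%).

    Assumes all four cell counts are strictly positive.
    """
    a, b, c, d = (exposed_events, exposed_nonevents,
                  unexposed_events, unexposed_nonevents)
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 deaths / 80 survivors among exposed patients,
# 10 deaths / 90 survivors among unexposed patients.
odds_ratio, lower, upper = odds_ratio_ci(20, 80, 10, 90)
```

With these illustrative counts the point estimate is (20 × 90) / (80 × 10) = 2.25. Adjusted estimates such as those reported above additionally require fitting a multivariable model with the relevant covariates.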
Background. Critically ill patients may die despite invasive intervention. In this study, we examine trends in the application of two such treatments over a decade, namely, endotracheal ventilation and vasopressors and inotropes administration, as well as the impact of these trends on survival durations in patients who die within a month of ICU admission.
Methods. We considered observational data from the MIMIC-III open-access ICU database, collected over a study period from 2002 to 2011. If a patient had multiple admissions to the ICU during the 30 days before death, only the first stay was analyzed, leading to a final set of 6,436 unique ICU admissions during the study period. We tested two hypotheses: (i) administration of invasive intervention during the ICU stay immediately preceding end-of-life would decrease over the study time period and (ii) time-to-death from ICU admission would also decrease, due to the decrease in invasive intervention administration. To investigate the latter hypothesis, we performed a subgroup analysis of patients with the lowest and highest severity. To do so, we stratified the patients based on their SAPS I scores and considered patients within the first and the third tertiles of the score. We then assessed differences in trends within these groups between 2002–2005 and 2008–2011.
Results. Comparing the period 2002–2005 vs. 2008–2011, we found a reduction in endotracheal ventilation among patients who died within 30 days of ICU admission (120.8 vs. 68.5 hours for the lowest severity patients, p<0.001; 47.7 vs. 46.0 hours for the highest severity patients, p = 0.004). This is explained in part by an increase in the use of non-invasive ventilation. Comparing the period 2002–2005 vs. 2008–2011, we found a reduction in the use of vasopressors and inotropes among patients with the lowest severity who died within 30 days of ICU admission (41.8 vs. 36.2 hours, p<0.001) but not among those with the highest severity. Despite a reduction in the use of invasive interventions, we did not find a reduction in the time to death between 2002–2005 vs. 2008–2011 (7.8 days vs. 8.2 days for the lowest severity patients, p = 0.32; 2.1 days vs. 2.0 days for the highest severity patients, p = 0.74).
Conclusion. We found that the reduction in the use of invasive treatments over time in patients with very poor prognosis did not shorten the time-to-death. These findings may be useful for goals of care discussions.
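The tertile stratification described in the Methods can be sketched as follows. This is a minimal illustration: the record layout, the field name `saps`, the function name, and the example scores are hypothetical, and the choice of cut-point method (the default of `statistics.quantiles`) is an assumption, since the abstract does not state how ties or boundaries were handled.

```python
from statistics import quantiles


def extreme_tertiles(patients, score_key="saps"):
    """Return the patients in the first and third tertiles of a severity score."""
    scores = [p[score_key] for p in patients]
    # n=3 yields the two cut points separating the three tertiles.
    low_cut, high_cut = quantiles(scores, n=3)
    lowest = [p for p in patients if p[score_key] <= low_cut]
    highest = [p for p in patients if p[score_key] > high_cut]
    return lowest, highest


# Hypothetical scores for illustration only.
patients = [{"id": i, "saps": s}
            for i, s in enumerate([3, 5, 7, 9, 11, 13, 15, 17, 19])]
lowest, highest = extreme_tertiles(patients)
print([p["saps"] for p in lowest], [p["saps"] for p in highest])
```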
Research Interests: ICU and INTENSIVE CARE
Mastering the dynamics of social influence requires separating, in a database of information propagation traces, the genuine causal processes from temporal correlation, homophily and other spurious causes. However, most studies that characterize social influence and, in general, most data-science analyses focus on correlations, statistical independence, conditional independence, etc.; only recently has there been a resurgence of interest in "causal data science," e.g., grounded on causality theories. In this paper we adopt a principled causal approach to the analysis of social influence from information-propagation data, rooted in probabilistic causal theory. Our approach develops around two phases. In the first step, in order to avoid the pitfalls of misinterpreting causation when the data spans a mixture of several subtypes ("Simpson's paradox"), we partition the set of propagation traces into groups, in such a way that each group is as little contradictory as possible in terms of the hierarchical structure of information propagation. For this goal we borrow from the literature the notion of "agony" [26] and define the Agony-bounded Partitioning problem, which we prove to be hard, and for which we develop two efficient algorithms with approximation guarantees. In the second step, for each group from the first phase, we apply a constrained MLE approach to ultimately learn a minimal causal topology. Experiments on synthetic data show that our method is able to retrieve the genuine causal arcs w.r.t. a known ground-truth generative model. Experiments on real data show that, by focusing only on the extracted causal structures instead of the whole social network, we can improve the effectiveness of predicting influence spread.
Research Interests:
Bayesian Networks have been widely used in the last decades in many fields, to describe statistical dependencies among random variables. In general, learning the structure of such models is a problem with considerable theoretical interest that poses many challenges. On the one hand, it is a well-known NP-complete problem, made harder in practice by the huge search space of possible solutions. On the other hand, the phenomenon of I-equivalence, i.e., different graphical structures underpinning the same set of statistical dependencies, may lead to multimodal fitness landscapes, further hindering maximum likelihood approaches to solve the task. Despite all these difficulties, greedy search methods based on a likelihood score coupled with a regularization score to account for model complexity have been shown to be surprisingly effective in practice. In this paper, we consider the formulation of the task of learning the structure of Bayesian Networks as an optimization problem based on a likelihood score, without complexity terms to regularize it. In particular, we exploit the NSGA-II multi-objective optimization procedure in order to explicitly account for both the likelihood of a solution and the number of selected arcs, by setting these as the two objective functions of the method. The aim of this work is to investigate the behavior of NSGA-II and analyse the quality of its solutions. We thus thoroughly examined the optimization results obtained on a wide set of simulated data, by considering both the goodness of the inferred solutions in terms of the objective function values achieved, and by comparing the retrieved structures with the ground truth, i.e., the networks used to generate the target data. Our results show that NSGA-II can converge to solutions characterized by better likelihood and fewer arcs than classic approaches, although paradoxically characterized in many cases by a lower similarity with the target network.
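The two-objective formulation above (maximize likelihood, minimize the number of arcs) relies on Pareto dominance, the comparison at the core of NSGA-II's non-dominated sorting. The sketch below shows only that comparison and the extraction of the first non-dominated front; it is not a full NSGA-II implementation, and the candidate scores are hypothetical values chosen for illustration.

```python
def dominates(a, b):
    """True if candidate a Pareto-dominates candidate b.

    Each candidate is a (log_likelihood, n_arcs) pair: a higher
    log-likelihood and a lower arc count are both preferred.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical (log_likelihood, n_arcs) scores of candidate structures.
candidates = [(-120.0, 8), (-118.5, 12), (-125.0, 8), (-118.5, 15), (-130.0, 5)]
front = pareto_front(candidates)
print(front)
```

Each structure on the front represents a different trade-off between fit and sparsity, which is why a multi-objective method returns a set of solutions rather than a single network.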
Research Interests:
Cancer is the result of mutagenic processes that can be inferred from tumor genomes by analyzing rate spectra of point mutations, or “mutational signatures”. Here we present SparseSignatures, a novel framework to extract signatures from somatic point mutation data. Our approach incorporates a user-specified background signature, employs regularization to reduce noise in non-background signatures, uses cross-validation to identify the number of signatures, and is scalable to large datasets. We show that SparseSignatures outperforms current state-of-the-art methods on simulated data using a variety of standard metrics. We then apply SparseSignatures to whole genome sequences of pancreatic and breast tumors, discovering well-differentiated signatures that are linked to known mutagenic mechanisms and are strongly associated with patient clinical features.
Research Interests:
Outcomes for cancer patients vary greatly even within the same tumor type, and characterization of molecular subtypes of cancer holds important promise for improving prognosis and personalized treatment. This promise has motivated recent efforts to produce large amounts of multidimensional genomic (‘multi-omic’) data, but current algorithms still face challenges in the integrated analysis of such data. Here we present Cancer Integration via Multikernel Learning (CIMLR), a new cancer subtyping method that integrates multi-omic data to reveal molecular subtypes of cancer. We apply CIMLR to multi-omic data from 36 cancer types and show significant improvements in both computational efficiency and ability to extract biologically meaningful cancer subtypes. The discovered subtypes exhibit significant differences in patient survival for 27 of 36 cancer types. Our analysis reveals integrated patterns of gene expression, methylation, point mutations and copy number changes in multiple cancers and highlights patterns specifically associated with poor patient outcomes.
Research Interests:
Identification of modules in molecular networks is at the core of many current analysis methods in biomedical research. However, how well different approaches identify disease-relevant modules in different types of gene and protein networks remains poorly understood. We launched the “Disease Module Identification DREAM Challenge”, an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology, and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies (GWAS). Our critical assessment of 75 contributed module identification methods reveals novel top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets and correctly prioritize candidate disease genes. This community challenge establishes benchmarks, tools and guidelines for molecular network analysis to study human disease biology (https://synapse.org/modulechallenge).
Research Interests:
Learning the structure of dependencies among multiple random variables is a problem of considerable theoretical and practical interest. Within the context of Bayesian Networks, a practical and surprisingly successful solution to this learning problem is achieved by adopting score-function optimisation schemes, augmented with multiple restarts to avoid local optima. Yet, the conditions under which such strategies work well are poorly understood, and there are also some intrinsic limitations to learning the directionality of the interaction among the variables. Following an early intuition of Friedman and Koller, we propose to decouple the learning problem into two steps: first, we identify a partial ordering among input variables which constrains the structural learning problem, and then propose an effective bootstrap-based algorithm to simulate augmented data sets and select the most important dependencies among the variables. By using several synthetic data sets, we show that our algorithm yields better recovery performance than the state of the art, increasing the chances of identifying a globally-optimal solution to the learning problem, and also solving well-known identifiability issues that affect the standard approach. We use our new algorithm to infer statistical dependencies between cancer driver somatic mutations detected by high-throughput genome sequencing data of multiple colorectal cancer patients. In this way, we also show how the proposed methods can shed new light on cancer initiation and progression. Code: https://github.com/caravagn/Bootstrap-based-Learning
Research Interests:
The increasing availability of sequencing data of cancer samples is fueling the development of algorithmic strategies to investigate tumor heterogeneity and infer reliable models of cancer evolution. We here build upon previous work on cancer progression inference from genomic alteration data to deliver two distinct Cytoscape-based applications, which allow users to produce, visualize and manipulate cancer evolution models, also by interacting with public genomic and proteomics databases. In particular, we here introduce cyTRON, a stand-alone Cytoscape app, and cyTRON/JS, a web application which employs the functionalities of Cytoscape/JS.
cyTRON was developed in Java; the code is available at https://github.com/BIMIB-DISCo/cyTRON and on the Cytoscape App Store http://apps.cytoscape.org/apps/cytron. cyTRON/JS was developed in JavaScript and R; the source code of the tool is available at https://github.com/BIMIB-DISCo/cyTRON-js and the tool is accessible from https://bimib.disco.unimib.it/cytronjs/welcome.
Research Interests:
One of the most challenging tasks when adopting Bayesian Networks (BNs) is the one of learning their structure from data. This task is complicated by the huge search space of possible solutions, and by the fact that the problem is NP-hard. Hence, full enumeration of all the possible solutions is not always feasible and approximations are often required. However, to the best of our knowledge, a quantitative analysis of the performance and characteristics of the different heuristics to solve this problem has never been done before.
For this reason, in this work, we provide a detailed comparison of many different state-of-the-art methods for structural learning on simulated data, considering BNs with both discrete and continuous variables, and with different rates of noise in the data. In particular, we investigate the performance of different widespread scores and algorithmic approaches proposed for the inference and the statistical pitfalls within them.
Research Interests:
Motivation
We here present SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework to learn a sample-to-sample similarity measure from expression data observed for heterogeneous samples. SIMLR can be effectively used to perform tasks such as dimension reduction, clustering, and visualization of heterogeneous populations of samples. SIMLR was benchmarked against state-of-the-art methods for these three tasks on several public datasets, showing it to be scalable and capable of greatly improving clustering performance, as well as providing valuable insights by making the data more interpretable via a better visualization.
Availability and Implementation
SIMLR is available on GitHub in both R and MATLAB implementations. Furthermore, it is also available as an R package on http://bioconductor.org.
Research Interests:
Several statistical techniques have been recently developed for the inference of cancer progression models from the increasingly available NGS cross-sectional mutational profiles. A particular algorithm, CAPRI, was proven to be the most efficient with respect to sample size and level of noise in the data. The algorithm combines structural constraints based on Suppes' theory of probabilistic causation and maximum likelihood fit with regularization, and defines constrained Bayesian networks, named Suppes-Bayes Causal Networks (SBCNs), which account for the selective advantage relations among genomic events. In general, SBCNs are effective in modeling any phenomenon driven by cumulative dynamics, as long as the modeled events are persistent. We here discuss the effectiveness of the SBCN theoretical framework and we investigate the influence of: (i) the priors based on Suppes' theory and (ii) different maximum likelihood regularization parameters on the inference performance estimated on large synthetically generated datasets.
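The structural constraints from Suppes' theory can be illustrated with its two prima facie conditions: temporal priority, approximated in cross-sectional data by the cause being more frequent than the effect, and probability raising, where the effect is more likely when the cause is present than when it is absent. The sketch below checks both conditions on binary mutation profiles. The function names, event labels, and data are hypothetical, and the sketch covers only this pairwise test, not the likelihood fit or regularization step.

```python
def prob(samples, event):
    """Marginal frequency of an event across samples."""
    return sum(s[event] for s in samples) / len(samples)


def cond_prob(samples, effect, cause, cause_present):
    """Frequency of effect among samples where cause is present (or absent)."""
    subset = [s for s in samples if bool(s[cause]) == cause_present]
    if not subset:
        return None  # conditional probability is undefined
    return sum(s[effect] for s in subset) / len(subset)


def is_prima_facie_cause(samples, cause, effect):
    """Check temporal priority and probability raising for cause -> effect."""
    temporal_priority = prob(samples, cause) > prob(samples, effect)
    p_with = cond_prob(samples, effect, cause, True)
    p_without = cond_prob(samples, effect, cause, False)
    if p_with is None or p_without is None:
        return False
    return temporal_priority and p_with > p_without


# Hypothetical binary profiles: 1 means the event was observed in the sample.
samples = ([{"A": 1, "B": 1}] * 3
           + [{"A": 1, "B": 0}] * 3
           + [{"A": 0, "B": 0}] * 4)
print(is_prima_facie_cause(samples, "A", "B"))
print(is_prima_facie_cause(samples, "B", "A"))
```

Arcs that fail this test would be excluded from the search space before any likelihood-based selection takes place.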
Research Interests:
Single-cell RNA-seq technologies enable high throughput gene expression measurement of individual cells, and allow the discovery of heterogeneity within cell populations. Measurement of cell-to-cell gene expression similarity is critical to identification, visualization and analysis of cell populations. However, single-cell data introduce challenges to conventional measures of gene expression similarity because of the high level of noise, outliers and dropouts. Here, we propose a novel similarity-learning framework, SIMLR (single-cell interpretation via multi-kernel learning), which learns an appropriate distance metric from the data for dimension reduction, clustering and visualization applications. Benchmarking against state-of-the-art methods for these applications, we used SIMLR to re-analyse seven representative single-cell data sets, including high-throughput droplet-based data sets with tens of thousands of cells. We show that SIMLR greatly improves clustering sensitivity and accuracy, as well as the visualization and interpretability of the data.
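To make the multi-kernel idea concrete, the sketch below combines several Gaussian kernels of different bandwidths into a single cell-to-cell similarity matrix. It is a simplified illustration under stated assumptions: the kernel form, the bandwidth values, and the fixed uniform weights are all hypothetical choices, whereas the framework described above learns the similarity from the data. The expression values are invented for the example.

```python
import math


def squared_distance(x, y):
    """Squared Euclidean distance between two expression vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def multi_kernel_similarity(cells, bandwidths, weights=None):
    """Weighted sum of Gaussian kernels, one per bandwidth.

    weights defaults to uniform; this is an assumption made for the
    sketch, not a learned kernel combination.
    """
    if weights is None:
        weights = [1.0 / len(bandwidths)] * len(bandwidths)
    n = len(cells)
    similarity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = squared_distance(cells[i], cells[j])
            similarity[i][j] = sum(
                w * math.exp(-d2 / (2.0 * sigma ** 2))
                for w, sigma in zip(weights, bandwidths)
            )
    return similarity


# Hypothetical expression vectors: two similar cells and one distant cell.
cells = [[1.0, 0.0], [1.1, 0.1], [5.0, 4.0]]
similarity = multi_kernel_similarity(cells, bandwidths=[0.5, 1.0, 2.0])
print(similarity[0][1] > similarity[0][2])
```

Using several bandwidths lets the combined similarity respond to structure at more than one scale, which is the motivation for replacing a single fixed distance measure.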