    Wikum Dinalankara

    In this work we develop a framework which allows for a systematic analysis of joint DNA and putative downstream RNA effects in cancer data cohorts. Using the Reactome database, we extract gene pairs that are linked by known mechanistic connections. Such pairs, which we refer to as 'Source Target Pairs' or STPs, consist of a source gene for which we examine aberrant activity in the DNA profile, and a target gene that is affected by said source gene, for which we examine aberrant activity in the RNA profile. Using TCGA data for six different cancer types (breast, colon, kidney, liver, lung and prostate), we use mutation and copy number variation information to compile DNA aberrant activity data. For the same cancer cohorts, we use RNASeq gene expression data to quantify RNA aberrant activity via the previous 'divergence' method we have developed. In the divergence framework, normal samples from the same cancer are used to estimate a normal range of expression for targe...
    Additional file 2: Supplementary Figures. PDF file containing the raw results of all bioinformatics analysis. Figure S1. Cross-study of differential gene expression in PTEN-null vs PTEN-intact in ERG+ samples. Meta-analysis of HPFS/PHS and NH cohorts with Bayesian Hierarchical Model for DGE using XDE showing the top 25 most concordant differentially up- and down-regulated genes. PTEN status was based on IHC assays. Figure S2. PTEN expression levels stratified by CNV. Figure shows the distribution of PTEN expression levels by copy number variation (CNV), called by the GISTIC algorithm. Figure S3. Correspondence-at-the-top (CAT) plot between TCGA CNV-based calls and the Bayesian Hierarchical Model approach (BHM). Agreement of genes ranked by t-statistics (TCGA) and average Bayesian Effect Size (BHM). Lines represent agreement between tested cohorts for PTEN-intact vs PTEN-null. Black-to-light grey shades represent the decreasing probability of agreeing by chance based on the hypergeometric dist...
    Additional file 1: Supplementary Tables. Excel spreadsheet containing the raw results of all bioinformatics analysis.
    In recent years, immunotherapy has become one of the most exciting and promising avenues to cancer treatment. Treatment with immune checkpoint inhibitors has managed to produce long-term remission of solid tumors in many patients. However, patients who respond well to such treatment are often a minority; this is particularly the case with some cancers such as renal cell carcinoma, non-small cell lung cancer, and glioblastoma, where many patients either derive no benefit or only a short-term benefit. In this analysis, we examined gene expression data from RNA sequencing experiments that compared tumor-infiltrating lymphocytes (TIL) with paired circulating lymphocytes from patients with renal cell carcinoma (RCC), bladder cancer (BLCA), prostate cancer (PRAD), and glioblastoma (GBM). Our analysis helped to characterize global CD4 and CD8 TIL gene expression patterns among these four cohorts. Further, using the expression profiles for known immune checkpoint markers PD-1, TIM-3, and LA...
    Since the beginning of the coronavirus disease-2019 (COVID-19) pandemic in 2020, there has been a tremendous accumulation of data capturing different statistics including the number of tests, confirmed cases and deaths. This wealth of data offers a great opportunity for researchers to model the effect of certain variables on COVID-19 morbidity and mortality and to get a better understanding of the disease at the epidemiological level. However, to draw reliable and unbiased estimates, models also need to take into account other variables and metrics available from a plurality of official and unofficial heterogeneous resources. In this study, we introduce covid19census, an R package that extracts and combines, from many different repositories, COVID-19 metrics and other demographic, environment- and health-related variables for the USA and Italy at the county and regional levels, respectively. The package is equipped with a number of user-friendly functions that dynamica...
    We used the following publicly available microarray datasets and one methylation dataset: for colon cancer anti-profile analysis of carcinoma vs. adenoma comparisons, we used two microarray datasets with GEO accession numbers GSE4183 (1) (8 normals, 15 adenomas, and 15 carcinomas) and GSE20916 (2) (10 normals, 10 adenomas, and 10 carcinomas). Here we used each dataset to generate a colon cancer anti-profile and used it to analyze the hyper-variability of cancer and adenoma samples of the other dataset and to calculate anti-profile scores for those samples. For carcinoma vs. adenoma comparisons with the universal anti-profile probesets, we used an adrenocortical dataset: GSE10927 (3) (10 normals, 22 adenomas, and 33 carcinomas). For this dataset we used the universal anti-profile signature from (4) and the normals in the dataset to calculate anti-profile scores for the adenoma and carcinoma samples. We also used Thyroid data from GSE27155 (5) which is a GPL96 platform microarray dataset ...
    Anomaly detection is a classical problem in Statistical Learning with wide-reaching applications in security, networks, genomics and others. In this work, we formulate the anomaly classification problem as an extension to the detection problem: how to distinguish between samples from multiple heterogeneous classes that are anomalies relative to a well-defined, homogeneous, normal class. Our formulation of this learning setting arises from studies in cancer genomics, where this problem follows from prognosis and diagnosis applications. Standard binary and multi-class classification schemes are not well suited to the anomaly classification task since they attempt to directly model these highly unstable, heterogeneous classes. In this work, we show that robust classifiers can be obtained by modeling the degree of deviation from the normal class as a stable characteristic of each anomaly class. To do so, we formalize the anomaly classification problem, characterize it statistically and computationally via kernel methods, and propose a class of robust learning methods, anti-profiles, specifically designed for this task. We focus on an open area of research in cancer genomics which motivates this project: the classification of tumors for prognosis and diagnosis. We provide experimental results obtained by applying the anti-profile method to gene expression data. In addition, we extend the anti-profile approach to use kernel functions, and develop a support-vector machine (SVM) based method for classification of anomalies based on their deviation from a stable normal class. We provide experimental results obtained by applying this method to genetic data to classify different stages of tumor progression, and show that this method provides much more stable classifiers than the application of regular classifiers. In addition, we show that this approach can be applied to anomaly classification problems in other application domains. 
We conclude by developing an SVM for censored survival information and demonstrate tha [...]
    Cancer cells are adept at reprogramming energy metabolism, and the precise manifestation of this metabolic reprogramming exhibits heterogeneity across individuals (and from cell to cell). In this study, we analyzed the metabolic differences between interpersonal heterogeneous cancer phenotypes. We used divergence analysis on gene expression data of 1156 breast normal and tumor samples from The Cancer Genome Atlas (TCGA) and integrated this information with a genome-scale reconstruction of human metabolism to generate personalized, context-specific metabolic networks. Using this approach, we classified the samples into four distinct groups based on their metabolic profiles. Enrichment analysis of the subsystems indicated that amino acid metabolism, fatty acid oxidation, citric acid cycle, androgen and estrogen metabolism, and reactive oxygen species (ROS) detoxification distinguished these four groups. Additionally, we developed a workflow to identify potential drugs that can selecti...
    Summary: Many gene signatures have been developed by applying machine learning (ML) to omics profiles; however, their clinical utility is often hindered by limited interpretability and unstable performance in different datasets. Here, we show the importance of embedding prior biological knowledge in the decision rules yielded by ML approaches to build robust classifiers. We tested this by applying different ML algorithms to gene expression data to predict three difficult cancer phenotypes: bladder cancer progression to muscle-invasive disease; response to neoadjuvant chemotherapy in triple-negative breast cancer; and prostate cancer metastatic progression. We developed two sets of classifiers: mechanistic, by restricting the training process to features capturing a specific biological mechanism; and agnostic, in which the training did not use any a priori biological information. Mechanistic models had performance similar to or better than that of their agnostic counterparts in the testing data, w...
    The COVID-19 mortality rate is higher in the elderly and in those with pre-existing chronic medical conditions. The elderly also suffer from increased morbidity and mortality from seasonal influenza infections; thus, an annual influenza vaccination is recommended for them. In this study, we explore a possible county-level association between influenza vaccination coverage in people aged 65 years and older and the number of deaths from COVID-19. To this end, we used COVID-19 data up to 14 December 2020 and US population health data at the county level. We fit quasi-Poisson regression models using influenza vaccination coverage in the elderly population as the independent variable and the COVID-19 mortality rate as the outcome variable. We adjusted for an array of potential confounders using different propensity score regression methods. Results show that, on the county level, influenza vaccination coverage in the elderly population is negatively associated with mortality from COVID-1...
    ABSTRACT PTEN is the most frequently lost tumor suppressor in primary prostate cancer (PCa) and its loss is associated with aggressive disease. However, the transcriptional changes associated with PTEN loss in PCa have not been described in detail. Here, we applied a meta-analysis approach, leveraging two large PCa cohorts with experimentally validated PTEN and ERG status, to derive a transcriptomic signature of PTEN loss, while also accounting for potential confounders due to ERG rearrangements. Strikingly, the signature indicates a strong activation of both innate and adaptive immune systems upon PTEN loss, as well as an expected activation of cell-cycle genes. Moreover, we made use of our recently developed FC-R2 expression atlas to expand this signature to include many non-coding RNAs recently annotated by the FANTOM consortium. With this resource, we analyzed the TCGA-PRAD cohort, creating a comprehensive transcriptomic landscape of PTEN loss in PCa that comprises both the coding and an...
    ABSTRACT COVID-19 mortality rate is higher in the elderly and in those with preexisting chronic medical conditions. The elderly also suffer from increased morbidity and mortality from seasonal influenza infection, and thus annual influenza vaccination is recommended for them. In this study, we explore a possible area-level association between influenza vaccination coverage in people aged 65 years and older and the number of deaths from COVID-19. To this end, we used COVID-19 data until June 10, 2020 together with population health data for the United States at the county level. We fit quasi-Poisson regression models using influenza vaccination coverage in the elderly population as the independent variable and the number of deaths from COVID-19 as the outcome variable. We adjusted for a wide array of potential confounding variables using both county-level generalized propensity scores for influenza vaccination rates, as well as direct adjustment. Our results suggest that influenza vacci...
    Usually recognized as transcripts of length more than 200 bp and devoid of open reading frames, long non-coding RNAs (lncRNAs) span a vast range of the human genome. Once thought to be of relatively little biological importance, they have been shown to be involved in many biological processes and diseases, highlighting their important regulatory function, which is now gradually starting to be understood. For instance, the recent discovery of immune gene priming lncRNAs (IPLs) – a subset of lncRNAs that facilitate epigenetic priming of immune gene promoters – has demonstrated the hitherto unappreciated role of lncRNAs in orchestrating inflammatory and immune responses. In this study we generated and analyzed bulk RNA sequencing data from a collection of purified cell populations obtained from patients with prostate cancer, glioblastoma, bladder cancer, and renal cell carcinoma. Each cohort was comprised of tumor infiltrating lymphocytes (TILs) and paired circulating lymphocytes, deri...
    Phosphatase and tensin homologue (PTEN) is a tumor suppressor gene that is frequently inactivated by deletion in prostate cancer (PCa). Occurring in around 20% of primary PCa tumors, and up to 50% in castration resistant tumors, it is the most frequent genomic aberration in PCa. Loss of PTEN activates the phosphoinositide 3-kinase-RAC-alpha serine/threonine-protein kinase (PI3K-AKT) pathway, which is associated with poor clinical outcomes. Despite the consequences of PTEN loss being well studied, most of what is known is restricted to protein-coding genes, with relatively little information about the role of non-coding genes. Using our recently created resource - the FC-R2 expression atlas, which encompasses expression levels for thousands of lncRNAs recently unveiled by the FANTOM consortium - we analyzed differential gene expression of PTEN-null vs PTEN-intact tumors with the goal of characterizing the molecular landscape of PTEN loss. First, we generated a consensus signature usi...
    Cancer cells display massive dysregulation of key regulatory pathways due to now well-catalogued mutations and other DNA-related aberrations. Moreover, enormous heterogeneity has been commonly observed in the identity, frequency, and location of these aberrations across individuals with the same cancer type or subtype, and this variation naturally propagates to the transcriptome, resulting in myriad types of dysregulated gene expression programs. Many have argued that a more integrative and quantitative analysis of heterogeneity of DNA and RNA molecular profiles may be necessary for designing more systematic explorations of alternative therapies and improving predictive accuracy. We introduce a representation of multi-omics profiles which is sufficiently rich to account for observed heterogeneity and support the construction of quantitative, integrated metrics of variation. Starting from the network of interactions existing in Reactome, we build a library of "paired DNA-RNA a...
    Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with data from the Cancer Genome Atlas.
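
    The digitization described in this abstract can be illustrated with a small Python sketch. It codes each feature of a sample as -1, 0, or +1 according to whether the value falls below, within, or above a range estimated from baseline samples, and then collapses that to a binary code. Everything here is an assumption made for illustration: the function names are hypothetical, the baseline range is taken as the plain minimum and maximum of the baseline values, and none of it reflects the actual API or support-estimation procedure of the R divergence package.

```python
from typing import List, Sequence, Tuple


def baseline_ranges(baseline: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return a (low, high) range per feature from baseline samples.

    Rows are samples, columns are features. Assumption: the range is the
    observed minimum and maximum; a real analysis may use a different estimate.
    """
    n_features = len(baseline[0])
    return [
        (min(row[j] for row in baseline), max(row[j] for row in baseline))
        for j in range(n_features)
    ]


def ternary_code(sample: Sequence[float],
                 ranges: Sequence[Tuple[float, float]]) -> List[int]:
    """Code each feature as -1 (below range), 0 (within), or +1 (above)."""
    codes = []
    for value, (low, high) in zip(sample, ranges):
        if value < low:
            codes.append(-1)
        elif value > high:
            codes.append(1)
        else:
            codes.append(0)
    return codes


def binary_code(ternary: Sequence[int]) -> List[int]:
    """Collapse a ternary code to binary: 1 if outside the range, else 0."""
    return [1 if c != 0 else 0 for c in ternary]


# Toy data: three baseline ("normal") samples and one test sample.
normal = [[1.0, 5.0, 2.0], [1.5, 4.0, 2.5], [0.8, 6.0, 2.2]]
tumor = [0.2, 5.5, 9.0]

ranges = baseline_ranges(normal)
ternary = ternary_code(tumor, ranges)
binary = binary_code(ternary)
print(ternary, binary)
```

    Because each sample is coded independently against the baseline, the result can be inspected one sample at a time, which is the sample-level property the abstract emphasizes.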
    The three-prime untranslated region (3′-UTR) of an mRNA influences its biological behavior, from stability and post-transcriptional control through miRNAs to availability for translation. Alternative polyadenylation (APA) can modulate 3′ end site selection, and approximately 50% of coding genes are subject to it. Global transcript shortening has been reported in normal and cancer cells. APA can be seen as a regulatory step that controls differential expression of transcript isoforms, hence it can be analyzed similarly to gene expression, comparing relevant phenotypes (e.g., tumor vs. normal, survival) with appropriate statistical methods (e.g., generalized linear models, Cox proportional hazards models). We analyzed APA across 16 cancer types, taking advantage of the following public domain resources: 1) recount2, an annotation-agnostic RNA expression database for over 72,000 human samples (Collado-Torres et al, 2017); 2) Snaptron, a search engine and database that enables one to summarize expression for specific genomic regions and features (Wilks et al, 2017); and 3) APADB, the largest database collection of human APA sites for coding and non-coding genes (Muller et al, 2014). We leveraged Snaptron to extract expression levels for 100-base-pair windows upstream and downstream of APA sites defined in APADB. We annotated these genomic features, corresponding to short and long transcript isoforms, using metadata from recount2. As a proof of concept, we analyzed differential APA isoform expression in TCGA, comparing tumor vs. normal samples, and identifying APA events associated with recurrence and survival, as well as other well-defined clinical, morphologic and molecular classifications. Our preliminary results show hundreds of genes switching PA sites to shorten or extend 3′-UTR length in primary tumors when compared to normal tissues. 
Some of these genes are associated with cell cycle and proliferation, indicating that PA sites are dynamically used in primary tumors as another mechanism to evade and modulate post-transcriptional control. Even more interestingly, a substantial fraction of these APA isoforms were associated with tumor recurrence and survival independently of standard clinical and pathological variables. In conclusion, by leveraging public domain resources, such as APADB, recount2, and Snaptron, we created a comprehensive resource that enables detection of dynamic usage of PA sites across cancer phenotypes. Furthermore, the association of many APA isoforms with tumor progression suggests that these could serve as clinically useful biomarkers. Most importantly, the comprehensive resource we have built accounts for over 72,000 human samples, hence it is not limited to the cancer phenotypes we explored in this study. Once released in the public domain, our APA expression atlas will empower the scientific community at large to explore APA across many other cancer and human disease phenotypes. Citation Format: Eddie L. Imada, Diego F. Sanchez, Tejasvi Matam, Leonardo Collado-Torres, Christopher Wilks, Wikum Dinalankara, Alexey Stupnikov, Ben Langmead, Shawn E. Lupold, Luigi Marchionni. Comprehensive analysis of alternative polyadenylation across cancer phenotypes [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 908.
    ABSTRACT Long non-coding RNAs (lncRNAs) have emerged as key coordinators of biological and cellular processes. Characterizing lncRNA expression across cells and tissues is key to understanding their role in determining phenotypes including human diseases. We present here FC-R2, a comprehensive expression atlas across a broadly defined human transcriptome, inclusive of over 109,000 coding and non-coding genes, as described in the FANTOM CAGE-Associated Transcriptome (FANTOM-CAT) study. This atlas greatly extends the gene annotation used in the original recount2 resource. We demonstrate the utility of the FC-R2 atlas by reproducing key findings from published large studies and by generating new results across normal and diseased human samples. In particular, we (a) identify tissue specific transcription profiles for distinct classes of coding and non-coding genes, (b) perform differential expression analyses across thirteen cancer types, providing new insights linking promoter and enha...
    In recent years, in-depth exploration of genome structure and function has revealed a central role for non-coding RNAs (ncRNAs) in orchestrating key biological and cellular processes through the fine-tuning of gene expression regulation. Most importantly, a role for ncRNAs has also started to emerge in human disease pathogenesis. This further speaks to the importance of an in-depth characterization of ncRNA involvement in human diseases, including cancer. In this work, we have built a comprehensive atlas of gene expression across the human transcriptome containing over 100,000 genes by leveraging two publicly available resources: the FANTOM CAGE Associated Transcriptome (FANTOM-CAT), and recount2. The FANTOM-CAT is a comprehensive meta-assembly of the human transcriptome encompassing coding and non-coding genes, including promoters, enhancers, and lncRNAs. recount2 is the largest available collection of human RNA-seq data processed and quantified using a unified pipeline, containin...
    Androgen receptor (AR) transcriptional activity contributes to prostate cancer development and castration resistance. The growth and survival pathways driven by AR remain incompletely defined. Here, we found PDCD4 to be a new target of AR signaling and a potent regulator of prostate cancer cell growth, survival, and castration resistance. The 3′ untranslated region of PDCD4 is directly targeted by the androgen-induced miRNA, miR-21. Androgen treatment suppressed PDCD4 expression in a dose responsive and miR-21–dependent manner. Correspondingly, AR inhibition dose-responsively induced PDCD4 expression. Using data from prostate cancer tissue samples in The Cancer Genome Atlas (TCGA), we found a significant and inverse correlation between miR-21 and PDCD4 mRNA and protein levels. Higher Gleason grade tumors exhibited significantly higher levels of miR-21 and significantly lower levels of PDCD4 mRNA and protein. PDCD4 knockdown enhanced androgen-dependent cell proliferation and cell-cyc...
    Data collected from omics technologies have revealed pervasive heterogeneity and stochasticity of molecular states within and between phenotypes. A prominent example of such heterogeneity occurs between genome-wide mRNA, microRNA, and methylation profiles from one individual tumor to another, even within a cancer subtype. However, current methods in bioinformatics, such as detecting differentially expressed genes or CpG sites, are population-based and therefore do not effectively model intersample diversity. Here we introduce a unified theory to quantify sample-level heterogeneity that is applicable to a single omics profile. Specifically, we simplify an omics profile to a digital representation based on the omics profiles from a set of samples from a reference or baseline population (e.g., normal tissues). The state of any subprofile (e.g., expression vector for a subset of genes) is said to be "divergent" if it lies outside the estimated support of the baseline distribut...
    Overcoming acquired drug resistance remains a core challenge in the clinical management of human cancer, including in urothelial carcinoma of the bladder (UCB). Cancer stem-like cells (CSC) have been implicated in the emergence of drug resistance but mechanisms and intervention points are not completely understood. Here, we report that the proinflammatory COX2/PGE2 pathway and the YAP1 growth-regulatory pathway cooperate to recruit the stem cell factor SOX2 in expanding and sustaining the accumulation of urothelial CSCs. Mechanistically, COX2/PGE2 signaling induced promoter methylation of let-7, resulting in its downregulation and subsequent SOX2 upregulation. YAP1 induced SOX2 expression more directly by binding its enhancer region. In UCB clinical specimens, positive correlations in the expression of SOX2, COX2, and YAP1 were observed, with coexpression of COX2 and YAP1 particularly commonly observed. Additional investigations suggested that activation of the COX2/PGE2 and YAP1 pa...
    Renal cell carcinomas (RCCs) with Xp11 translocation (Xp11 RCC) constitute a distinctive molecular subtype characterized by chromosomal translocations involving the Xp11.2 locus, resulting in gene fusions between the TFE3 transcription factor and a second gene (usually ASPSCR1, PRCC, NONO, or SFPQ). RCCs with Xp11 translocations comprise 1-4% of adult cases, frequently displaying papillary architecture with epithelioid clear cells. To better understand the biology of this molecularly distinct tumor subtype, we analyze the microRNA (miRNA) expression profiles of Xp11 RCC compared to normal renal parenchyma using microarray and quantitative reverse transcription polymerase chain reaction (RT-PCR). We further compare Xp11 RCC with other RCC histologic subtypes using publicly available datasets, identifying common and distinctive miRNA signatures along with the associated signaling pathways and biological processes. Overall, Xp11 RCC more closely...
    Gene expression signatures are commonly used to create cancer prognosis and diagnosis methods, yet only a small number of them are successfully deployed in the clinic since many fail to replicate performance on subsequent validation. A primary reason for this lack of reproducibility is the fact that these signatures attempt to model the highly variable and unstable genomic behavior of cancer. Our group recently introduced gene expression anti-profiles as a robust methodology to derive gene expression signatures based on the observation that while gene expression measurements are highly heterogeneous across tumors of a specific cancer type relative to the normal tissue, their degree of deviation from normal tissue expression in specific genes involved in tissue differentiation is a stable tumor mark that is reproducible across experiments and cancer types. Here we show that constructing gene expression signatures based on variability and the anti-profile approach yields classifiers c...
    ABSTRACT Contemporary machine intelligence is far from realizing prominent hallmarks of human understanding and consciousness. The primary shortcoming of current methods can be attributed to the difficulty or implausibility of foreseeing and pre-programming each and every piece of information or knowledge. Emergent intelligence methods based on principles of self-learning and self-organization have been successful in infusing traits of understanding in machines. This understanding is in contrast to the constrained intelligence imposed on machines by classical approaches that follow supervised knowledge acquisition mechanisms. The primary objective of this paper is to review current work in emergent intelligence methods and discuss means of orchestrating these into a practical model that resembles the process of human understanding. The paper delineates intricacies of self-learning in humans from both biological and psychological perspectives. Following a discussion of several artificial models of the human mind that have been researched and documented at the conceptual level, we propose a comparatively pragmatic approach based on a novel unsupervised learning algorithm, the GSOM algorithm. This algorithm has been successfully applied to many real-world knowledge acquisition and pattern discovery problems. The paper concludes with a further discussion of research developments in emergent systems, which we perceive to be the stepping stones in the search for true machine understanding.
    Motivation: Complex cancer omics data can be difficult to interpret and analyze with standard statistical methods. We therefore propose an innovative data representation that drastically reduces complexity while improving usability and interpretability for complex cancer phenotype analysis. Method: Despite recent advances in omics technologies, the robustness of predictive biomarkers in cancer remains severely limited. We hypothesize that this is primarily due to an overemphasis on applying statistical learning methods without taking into consideration the underlying biological processes driving cancer. We therefore propose a new approach that represents data by comparison with a baseline group. This results in a data format that encodes biologically meaningful information and can be easily analyzed. We apply this transformation to publicly available datasets obtained across multiple tumor types using different omics technologies. For each cancer phenotype considered, we...