Increasing amounts of data from high-throughput experiments in molecular biology require new analysis methods. One such high-throughput technology uses microarrays to measure the expression levels of thousands of genes simultaneously. Measuring the expression of many genes, even of whole genomes, has proven useful for understanding the molecular basis of diseases and has enabled a more systems-level view of cellular processes. Besides the statistical analysis of such large amounts of data, a second challenge is the biological interpretation of the results. Biological experts trying to understand the biological meaning of expression results are often overwhelmed by the amount of functional information in the literature and in databases. We developed algorithms that address both challenges. The Clustering in SVD Subspace (CSS) algorithm identifies similarly regulated genes in time-series gene expression data by finding gene clusters in two-dimensional Singular Value Decomposition (SVD) subspaces. The MeSH Functional Theme Finder (MFTF) algorithm discovers biomedical functional themes that the literature associates with groups of genes or proteins, e.g., groups obtained from high-throughput experiments.

The CSS algorithm was applied to two expression data sets, one obtained during the yeast cell cycle and the other from human fibroblast cells after infection with cytomegalovirus, a herpesvirus. In both data sets, it identified clusters of genes whose expression is similarly regulated during the respective cellular process. The MFTF algorithm was then applied to the gene clusters that CSS identified in the cytomegalovirus data set. Working automatically, it recovered the same main functional themes that a biological expert had identified, and it also identified new, relevant functional themes associated with the gene clusters.
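The abstract says only that CSS finds clusters of similarly regulated genes in two-dimensional SVD subspaces. The sketch below illustrates the geometric idea under stated assumptions: each gene's time course is mean-centered, the genes are projected onto the plane spanned by two right singular vectors ("eigengenes"), and, as a hypothetical clustering rule, genes are grouped by their polar angle in that plane. The function names, the centering step, and the angular binning are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np


def svd_subspace_coordinates(expr, components=(0, 1)):
    """Project each gene onto a 2-D subspace spanned by two right singular vectors.

    expr: array of shape (n_genes, n_timepoints).
    """
    expr = np.asarray(expr, dtype=float)
    # Center each gene's time course so the projection reflects its temporal
    # pattern rather than its baseline expression level (assumed preprocessing).
    x = expr - expr.mean(axis=1, keepdims=True)
    # Rows of vt are the right singular vectors ("eigengenes") over time.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    i, j = components
    # Coordinates of every gene in the plane spanned by eigengenes i and j.
    return x @ vt[[i, j]].T


def angular_clusters(coords, n_bins=8, min_radius=1e-9):
    """Hypothetical clustering rule: bin genes by polar angle in the SVD plane.

    Genes whose projection lies (almost) at the origin are not represented
    by this plane and receive the label -1.
    """
    radius = np.hypot(coords[:, 0], coords[:, 1])
    angle = np.arctan2(coords[:, 1], coords[:, 0])  # in [-pi, pi]
    bins = np.floor((angle + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    return np.where(radius > min_radius, bins, -1)
```

In this picture, genes with the same temporal phase (for example, peaking at the same point of a periodic process such as the cell cycle) land at the same angle regardless of amplitude, while the radius grows with how strongly the two eigengenes capture a gene's profile. A fixed number of angular bins is the simplest possible grouping rule; a real method would more likely choose cluster boundaries from the data.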
Finally, a large-scale validation of the MFTF algorithm is presented. The vector space model underlying MFTF was used to correctly classify a large number of proteins into families of functionally related proteins, demonstrating that keywords from the literature can capture functional relationships between proteins or genes.
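The abstract states only that MFTF rests on a vector space model of literature keywords. The sketch below assumes one common instantiation of such a model: each protein is a sparse TF-IDF-weighted vector of MeSH terms drawn from its associated literature, and an unlabeled protein is assigned to the family whose centroid vector is most cosine-similar. The TF-IDF weighting, the nearest-centroid rule, and all function names are assumptions for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (assumed weighting scheme).

    docs maps a protein ID to the list of MeSH terms from its literature.
    """
    n_docs = len(docs)
    # Document frequency: in how many proteins' term lists each term occurs.
    df = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for pid, terms in docs.items():
        tf = Counter(terms)
        vectors[pid] = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average a list of sparse vectors term by term."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds the float weights per term
    return {t: w / len(vectors) for t, w in total.items()}


def classify(query, families):
    """Assign a query vector to the family with the most similar centroid."""
    scores = {name: cosine(query, centroid(members)) for name, members in families.items()}
    return max(scores, key=scores.get)
```

For example, with `families = {"kinase": [...], "transporter": [...]}` holding the vectors of proteins of known family, `classify(vectors["query_protein"], families)` returns the name of the closest family. One way to evaluate such a classifier at scale is leave-one-out: each protein of known family is held out in turn and classified against the remaining members.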
Multivariate analysis of gene expression data and functional information: automated methods for functional genomics