Multivariate analysis of gene expression data and functional information: automated methods for functional genomics
Publisher:
  • Portland State University
  • P. O. Box 751 Portland, OR
  • United States
ISBN:978-0-542-24926-6
Order Number:AAI3183764
Pages:162
Abstract

Increasing amounts of data obtained from high-throughput experiments in molecular biology require new analysis methods. One such high-throughput technology uses microarrays for the simultaneous measurement of expression levels of thousands of genes. Measuring the expression of many genes, even whole genomes, has proven useful for understanding the molecular basis of diseases and has allowed for a more systems-level view of cellular processes. Besides the statistical analysis of such large amounts of data, another challenge is the biological interpretation of the analysis results. Biological experts trying to understand the biological meaning of the expression results are often overwhelmed by the amount of functional information in the literature and databases. We developed algorithms that address both of these challenges.
The Clustering in SVD Subspace (CSS) algorithm identifies similarly regulated genes in time series gene expression data by identifying gene clusters in two-dimensional Singular Value Decomposition (SVD) subspaces. The MeSH Functional Theme Finder (MFTF) algorithm was developed for the discovery of biomedical functional themes associated in the literature with groups of genes or proteins, for example groups obtained from high-throughput experiments.
The CSS algorithm has been applied to two expression data sets, one obtained during the yeast cell-cycle and the second after infection of human fibroblast cells with cytomegalovirus, a herpesvirus. The algorithm successfully identified clusters of genes whose expression is similarly regulated during the respective cellular processes. The MFTF algorithm was applied to the gene clusters identified via the CSS algorithm in the herpes data set. The MFTF algorithm identified, in an automated way, the same main functional themes that were identified by a biological expert. In addition, the algorithm identified new relevant functional themes associated with the gene clusters.
Finally, a large-scale validation of the MFTF algorithm is presented. The vector space model underlying the MFTF algorithm was used to correctly classify a large number of proteins into families of functionally related proteins, proving that keywords from literature can be used to capture functional relationships of proteins or genes.
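As a rough illustration of the vector space idea mentioned in the abstract, the sketch below represents each protein by a bag of literature keywords and assigns a query protein to the family whose keyword profile is most similar under cosine similarity. This is only a minimal sketch under stated assumptions: the abstract does not specify the weighting scheme, the similarity measure, or the classifier used in the dissertation, so the raw term counts, the cosine measure, the function names, and the toy keyword lists are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def keyword_vector(keywords):
    """Build a sparse term-count vector from a list of keywords.

    Assumption: raw counts are used as weights; the dissertation's
    actual weighting scheme is not stated in the abstract.
    """
    return Counter(keywords)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse keyword vectors."""
    shared_terms = a.keys() & b.keys()
    dot = sum(a[term] * b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def classify(query, family_profiles):
    """Return the name of the family profile most similar to the query."""
    return max(
        family_profiles,
        key=lambda name: cosine_similarity(query, family_profiles[name]),
    )


# Hypothetical keyword profiles, invented purely for this example.
family_profiles = {
    "kinase": keyword_vector(
        ["Phosphorylation", "Signal Transduction", "Cell Cycle"]
    ),
    "protease": keyword_vector(
        ["Proteolysis", "Peptide Hydrolases", "Apoptosis"]
    ),
}

# A query protein described by keywords taken from its literature.
query = keyword_vector(["Cell Cycle", "Phosphorylation", "Mitosis"])
predicted = classify(query, family_profiles)
print(predicted)
```

In this toy setting the query shares two of its three keywords with the hypothetical kinase profile and none with the protease profile, so the nearest-profile rule selects the kinase family.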

Contributors
  • The University of Texas at Austin
  • Indiana University Bloomington
