Multivariate analysis of gene expression data and functional information: automated methods for functional genomics

January 2005

Author: Andreas Rechtsteiner, Portland State University
Adviser: Andrew Fraser, Portland State University

Publisher: Portland State University, P.O. Box 751, Portland, OR, United States

ISBN: 978-0-542-24926-6

Order Number: AAI3183764

Pages: 162


Abstract

Increasing amounts of data obtained from high-throughput experiments in molecular biology require new analysis methods. One such high-throughput technology uses microarrays to measure the expression levels of thousands of genes simultaneously. Measuring the expression of many genes, even whole genomes, has proven useful for understanding the molecular basis of diseases and has allowed a more systems-level view of cellular processes. Besides the statistical analysis of such large amounts of data, another challenge is the biological interpretation of the analysis results. Biological experts trying to understand the biological meaning of the expression results are often overwhelmed by the amount of functional information in the literature and databases.

We developed algorithms that address both of these challenges. The Clustering in SVD Subspace (CSS) algorithm identifies similarly regulated genes in time series gene expression data by identifying gene clusters in two-dimensional Singular Value Decomposition (SVD) subspaces. The MeSH Functional Theme Finder (MFTF) algorithm was developed to discover biomedical functional themes associated in the literature with groups of genes or proteins, such as those obtained from high-throughput experiments.

The CSS algorithm has been applied to two expression data sets, one obtained during the yeast cell cycle and the second after herpes cytomegalovirus infection of human fibroblast cells. The algorithm successfully identified clusters of genes whose expression is similarly regulated during the respective cellular processes. The MFTF algorithm was applied to the gene clusters identified via the CSS algorithm in the herpes data set. The MFTF algorithm identified, in an automated way, the same main functional themes that were identified by a biological expert. In addition, the algorithm identified new relevant functional themes associated with the gene clusters.
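
The abstract does not spell out the CSS procedure, so the following is only a minimal sketch of the general idea of grouping genes in a two-dimensional SVD subspace, not the thesis's algorithm. It assumes a small genes-by-time-points matrix, approximates the two leading right singular vectors by power iteration, projects each gene onto them, and groups genes whose polar angles in that plane lie close together. The function names, the toy expression values, and the 30-degree gap threshold are all hypothetical choices made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def leading_right_singular_vectors(rows, k=2, iters=200):
    """Approximate the k leading right singular vectors of a matrix
    (given as a list of rows) by power iteration on A^T A with deflation."""
    n = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    found = []
    for idx in range(k):
        v = [1.0 / (i + 1 + idx) for i in range(n)]  # fixed, arbitrary start vector
        for _ in range(iters):
            w = [dot(gram_row, v) for gram_row in gram]
            for u in found:  # remove directions that were already extracted
                c = dot(w, u)
                w = [a - c * b for a, b in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                raise ValueError("matrix has rank below k")
            v = [a / norm for a in w]
        found.append(v)
    return found

def subspace_angles(expr, v1, v2):
    """Polar angle of each gene after projection onto the plane spanned by v1, v2."""
    return {g: math.atan2(dot(row, v2), dot(row, v1)) for g, row in expr.items()}

def group_by_angle(angles, max_gap=math.radians(30)):
    """Group genes that are neighbours on the circle, splitting wherever the
    angular gap between consecutive genes exceeds max_gap (assumed threshold)."""
    names = sorted(angles, key=angles.get)
    m = len(names)
    if m == 0:
        return []
    # gaps[i] is the circular gap from names[i] to the next gene on the circle
    gaps = [(angles[names[(i + 1) % m]] - angles[names[i]]) % (2 * math.pi)
            for i in range(m)]
    start = (max(range(m), key=gaps.__getitem__) + 1) % m  # begin after the widest gap
    groups = [[names[start]]]
    for step in range(1, m):
        i = (start + step) % m
        if gaps[(i - 1) % m] > max_gap:
            groups.append([names[i]])
        else:
            groups[-1].append(names[i])
    return groups

# Toy time series (hypothetical values): four time points per gene.
expr = {
    "g1": [2.0, 0.2, -2.0, -0.2],
    "g2": [1.0, 0.0, -1.0, 0.0],
    "g3": [0.1, 1.5, -0.1, -1.5],
    "g4": [-0.1, 1.0, 0.1, -1.0],
    "g5": [-1.0, -0.1, 1.0, 0.1],
}
v1, v2 = leading_right_singular_vectors(list(expr.values()))
groups = group_by_angle(subspace_angles(expr, v1, v2))
print(groups)
```

Because this sketch groups by angle only, genes that share a temporal pattern but differ in amplitude (here g1 and g2) end up in the same group. A real analysis would more likely rely on a numerical library's SVD routine than on hand-written power iteration.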
Finally, a large-scale validation of the MFTF algorithm is presented. The vector space model underlying the MFTF algorithm was used to correctly classify a large number of proteins into families of functionally related proteins, demonstrating that keywords from the literature can capture functional relationships among proteins or genes.
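
As a rough illustration of how a keyword-based vector space model can classify proteins, the sketch below represents each protein by counts of literature keywords, sums those counts into one profile per family, and assigns a query protein to the family whose profile has the highest cosine similarity. This is an assumption-laden toy, not the MFTF implementation: the thesis's term weighting and classification rule are not given in the abstract, and the keyword lists, family labels, and function names here are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse keyword-count vectors."""
    shared = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return shared / (norm_u * norm_v) if norm_u and norm_v else 0.0

def family_profiles(labelled):
    """Sum the keyword counts of all proteins that share a family label."""
    profiles = {}
    for keywords, family in labelled:
        profiles.setdefault(family, Counter()).update(keywords)
    return profiles

def classify(keywords, profiles):
    """Return the family whose profile is most similar to the query keywords."""
    query = Counter(keywords)
    return max(profiles, key=lambda family: cosine(query, profiles[family]))

# Hypothetical training proteins: (keywords from the literature, family label).
labelled = [
    (["Cell Cycle", "Cyclins", "Phosphorylation"], "kinase-like"),
    (["Phosphorylation", "Signal Transduction", "Cell Cycle"], "kinase-like"),
    (["DNA Replication", "DNA Repair", "Cell Cycle"], "polymerase-like"),
    (["DNA Replication", "Nucleotides"], "polymerase-like"),
]
profiles = family_profiles(labelled)
print(classify(["Phosphorylation", "Cyclins"], profiles))
print(classify(["DNA Replication", "Cell Cycle"], profiles))
```

Raw counts are used here for brevity. A keyword shared by both families (such as "Cell Cycle" in this toy data) contributes to both similarities, which is why practical vector space models usually down-weight common terms.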

Contributors

Andrew M. Fraser, The University of Texas at Austin
Andreas Rechtsteiner, Indiana University Bloomington


