
A Biclustering Method to Discover Co-regulated Genes Using Diverse Gene Expression Datasets

  • Conference paper
Bioinformatics and Computational Biology (BICoB 2009)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5462)


Abstract

We propose a two-step biclustering approach that mines the co-regulation patterns of a given reference gene to discover other genes that function in a common biological process. Several successful methods currently rely on gene expression analysis based on the Pearson Correlation Coefficient (PCC) computed across all samples in a dataset. However, microarray datasets are fraught with spurious samples or samples of diverse origin, so many genes/proteins that function in the same biological pathway may be missed. The novel PCC-based biclustering algorithm introduced in this paper identifies subsets of genes with high correlation by stringently filtering the data, which reduces false negatives caused by spurious or unrelated samples in a dataset. The correlation information extracted from the resulting biclusters is then synthesized. We applied our method using the breast cancer associated tumor suppressors BRCA1 and BRCA2 as the reference proteins, in order to reveal genes and proteins important in the complex process of breast tumor formation. Experiments on 20 very large datasets showed that the top-ranked genes were remarkably enriched for genes that regulate the mitotic spindle and cytokinesis. The results imply that BRCA1 and BRCA2, which are considered to be DNA repair factors, also have a critical function regarding the mitotic spindle. Initial biological verification of one identified factor reveals that it functions to control both centrosome dynamics and, surprisingly, DNA repair. Thus, this biclustering approach is successful at identifying proteins with highly related function from extremely complex datasets, and it permits novel insights into gene function.
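The abstract does not spell out the algorithm's details, so the following is only a hypothetical Python sketch of the general two-step idea. In the first step, a correlation with the reference gene is computed after discarding samples that depress it. In the second step, per-dataset results are combined into a single ranking. The greedy sample-removal filter, the `min_frac`, `min_gain` and `threshold` parameters, and the "count of supporting datasets, then mean PCC" ranking are all assumptions made for illustration. They are not the authors' method.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # PCC is undefined for a constant profile; treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def filtered_pcc(ref, gene, min_frac=0.75, min_gain=1e-9):
    """Hypothetical sample filter (an assumption, not the paper's algorithm).

    Greedily drops the sample whose removal most increases the PCC between
    the reference and candidate profiles, keeping at least min_frac of the
    samples. Returns (pcc, indices_of_kept_samples).
    """
    keep = list(range(len(ref)))
    min_keep = max(3, math.ceil(min_frac * len(ref)))
    best = pearson(ref, gene)
    while len(keep) > min_keep:
        trials = []
        for j in keep:
            rest = [i for i in keep if i != j]
            r = pearson([ref[i] for i in rest], [gene[i] for i in rest])
            trials.append((r, j))
        r, j = max(trials)
        if r <= best + min_gain:
            break  # no single removal gives a meaningful improvement
        best = r
        keep.remove(j)
    return best, keep


def rank_genes(datasets, ref_name, threshold=0.8, min_frac=0.75):
    """Hypothetical synthesis step across datasets (an assumption).

    Each dataset maps gene name -> expression profile. Genes are ranked by
    the number of datasets in which their filtered PCC with the reference
    reaches `threshold`, using the mean filtered PCC as a tie-breaker.
    """
    scores = {}
    for data in datasets:
        ref = data[ref_name]
        for name, profile in data.items():
            if name != ref_name:
                r, _ = filtered_pcc(ref, profile, min_frac)
                scores.setdefault(name, []).append(r)
    return sorted(
        scores,
        key=lambda g: (sum(r >= threshold for r in scores[g]),
                       sum(scores[g]) / len(scores[g])),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy data: geneA tracks REF except for one spurious sample in ds1.
    ds1 = {"REF": [1, 2, 3, 4, 5, 6, 7, 8],
           "geneA": [1, 2, 3, 4, 5, 6, 7, -10],
           "geneB": [5, 1, 4, 2, 8, 3, 7, 6]}
    ds2 = {"REF": [2, 4, 6, 8, 10, 12],
           "geneA": [1, 2, 3, 4, 5, 6],
           "geneB": [3, 1, 2, 6, 5, 4]}
    print(filtered_pcc(ds1["REF"], ds1["geneA"]))
    print(rank_genes([ds1, ds2], "REF"))
```

In the toy data, a single outlying sample hides geneA's co-expression with the reference when PCC is computed over all samples. Dropping that sample recovers it. Greedy removal can also inflate the correlation of unrelated genes, which is why the sketch keeps a floor on the fraction of retained samples. A real analysis would additionally need some assessment of statistical significance.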

This work was supported in part by the U.S. DOE SciDAC Institute Grant #DE-FC02-06ER2775; U.S. National Science Foundation Grants #CNS-0643969, #CCF-0342615, #CNS-0426241, and #CNS-0403342; and Ohio Supercomputing Center Grant #PAS0052.




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bozdağ, D., Parvin, J.D., Catalyurek, U.V. (2009). A Biclustering Method to Discover Co-regulated Genes Using Diverse Gene Expression Datasets. In: Rajasekaran, S. (ed.) Bioinformatics and Computational Biology. BICoB 2009. Lecture Notes in Computer Science, vol 5462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00727-9_16


  • DOI: https://doi.org/10.1007/978-3-642-00727-9_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00726-2

  • Online ISBN: 978-3-642-00727-9

  • eBook Packages: Computer Science, Computer Science (R0)
