Abstract
The rapid development and increasing popularity of gene expression microarrays have led to many studies on the discovery of co-regulated genes. One important way of discovering such co-regulation is query-based search, since gene co-expression may indicate a shared role in a biological process. Although promising query-driven search methods that adapt clustering exist, they fail to capture many genes that function in the same biological pathway, either because microarray datasets are fraught with spurious samples or samples of diverse origin, or because the pathways may be regulated under only a subset of samples. Biclustering algorithms, a class of clustering algorithms that simultaneously cluster both the items and their features, are useful for analyzing gene expression data, or any data in which items are related in only a subset of their samples: genes need not be related in all samples to be clustered together. Because many genes interact only under specific circumstances, biclustering may recover relationships that traditional clustering algorithms can easily miss. In this chapter, we briefly summarize the literature that uses biclustering to query co-regulated genes. We then present a novel biclustering approach and evaluate its performance through a thorough experimental analysis.
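The point that genes may be related in only a subset of samples can be illustrated with a minimal sketch. The expression values, the gene names, and the choice of Pearson correlation below are assumptions made purely for illustration; they are not data or the similarity measure from this chapter. Two hypothetical genes move together on the first four samples and are unrelated on the rest, so a measure computed over all samples hides a relationship that is perfect on the subset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical expression levels over 8 samples (illustrative values only).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 4.0, 2.0]
gene_b = [2.0, 3.0, 4.0, 5.0, 1.0, 5.0, 2.0, 4.0]

# Assumed subset of samples under which the two genes are co-regulated.
coherent_samples = [0, 1, 2, 3]
sub_a = [gene_a[i] for i in coherent_samples]
sub_b = [gene_b[i] for i in coherent_samples]

# Correlation over every sample versus over the coherent subset only.
full_corr = pearson(gene_a, gene_b)
subset_corr = pearson(sub_a, sub_b)

print(f"all samples:     {full_corr:+.3f}")
print(f"coherent subset: {subset_corr:+.3f}")
```

A method that scores the two genes on all eight samples would see a weak negative correlation and keep them apart, whereas restricting attention to the four coherent samples exposes the shared pattern. Finding such gene and sample subsets jointly, rather than being handed the subset as here, is the task a biclustering algorithm addresses.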
“What we call chaos is just patterns we haven’t recognized. What we call random is just patterns we can’t decipher.”
— Chuck Palahniuk, Survivor
Copyright information
© 2015 Springer Science+Business Media New York
About this protocol
Cite this protocol
Deveci, M., Küçüktunç, O., Eren, K., Bozdağ, D., Kaya, K., Çatalyürek, Ü.V. (2015). Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering. In: Guzzi, P. (eds) Microarray Data Analysis. Methods in Molecular Biology, vol 1375. Humana Press, New York, NY. https://doi.org/10.1007/7651_2015_246
DOI: https://doi.org/10.1007/7651_2015_246
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3172-9
Online ISBN: 978-1-4939-3173-6
eBook Packages: Springer Protocols