Abstract
Analyzing gene expression patterns is becoming an increasingly important task in bioinformatics. This analysis makes it possible to determine how genes behave under various conditions, information that is fundamental for treating diseases, among other applications. A recent advance in this area is the Tricluster algorithm, the first algorithm capable of finding 3D clusters (genes × samples × timestamps), that is, groups of genes that behave similarly across samples and timestamps. However, as biological experiments collect ever-increasing amounts of data to be analyzed and correlated, triclustering remains a bottleneck because the problem is NP-complete, so parallelization appears to be an essential step towards obtaining feasible solutions. In this work, we propose and evaluate a parallel version of the Tricluster algorithm implemented with the filter-labeled-stream paradigm supported by the Anthill parallel programming environment. The results show that our parallelization scales well with data size and handles the severe load imbalances inherent to the problem. Furthermore, the parallelization strategy is applicable to any depth-first search.
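To make the notion of a 3D cluster more concrete, the following is a minimal Python sketch of a simplified coherence test on a single candidate sub-cube. The criterion (every pair of selected genes differs by a near-constant scaling factor, with a max/min ratio of at most 1 + eps), the function name `is_scaling_coherent`, and the nested-list layout `data[gene][sample][time]` are illustrative assumptions, not the exact Tricluster definition or the authors' code. Enumerating all maximal sub-cubes that pass such a test is the costly search step; this sketch only checks one candidate.

```python
from itertools import combinations
from typing import Sequence

# Hypothetical layout (assumption): data[gene][sample][time], all values positive.
Cube = Sequence[Sequence[Sequence[float]]]


def is_scaling_coherent(data: Cube, genes: Sequence[int], samples: Sequence[int],
                        times: Sequence[int], eps: float) -> bool:
    """Simplified check (assumption, not the exact Tricluster criterion):
    for every pair of selected genes, the ratio data[g1][s][t] / data[g2][s][t]
    over all selected (sample, time) cells must satisfy max/min <= 1 + eps."""
    for g1, g2 in combinations(genes, 2):
        ratios = [data[g1][s][t] / data[g2][s][t] for s in samples for t in times]
        if max(ratios) / min(ratios) > 1 + eps:
            return False  # this gene pair is not a scaled copy on the sub-cube
    return True


# Toy cube: 3 genes x 2 samples x 2 timestamps (made-up values for illustration).
data = [
    [[1.0, 2.0], [3.0, 4.0]],  # gene 0
    [[2.0, 4.0], [6.0, 8.0]],  # gene 1: exactly 2x gene 0 everywhere
    [[1.0, 5.0], [3.0, 1.0]],  # gene 2: matches gene 0 only at timestamp 0
]

print(is_scaling_coherent(data, [0, 1], [0, 1], [0, 1], eps=0.05))  # True
print(is_scaling_coherent(data, [0, 2], [0, 1], [0, 1], eps=0.05))  # False
print(is_scaling_coherent(data, [0, 2], [0, 1], [0], eps=0.05))     # True
```

Genes 0 and 1 are coherent over the whole cube, while gene 2 is coherent with gene 0 only once the time axis is restricted to the first timestamp, which illustrates why a triclustering search has to consider sub-cubes along all three dimensions.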
Cite this article
Braga Araújo, R., Trielli Ferreira, G.H., Orair, G.H. et al. The ParTriCluster Algorithm for Gene Expression Analysis. Int J Parallel Prog 36, 226–249 (2008). https://doi.org/10.1007/s10766-007-0067-9
DOI: https://doi.org/10.1007/s10766-007-0067-9