The ParTriCluster Algorithm for Gene Expression Analysis

International Journal of Parallel Programming

Abstract

Analyzing gene expression patterns is becoming a highly relevant task in bioinformatics. This analysis makes it possible to determine how genes behave under various conditions, which is fundamental information for treating diseases, among other applications. A recent advance in this area is the Tricluster algorithm, the first algorithm capable of determining 3D clusters (genes × samples × timestamps), that is, groups of genes that behave similarly across samples and timestamps. However, although biological experiments collect an increasing amount of data to be analyzed and correlated, the triclustering problem remains a bottleneck because it is NP-complete, so parallelization seems to be an essential step towards obtaining feasible solutions. In this work we propose and evaluate a parallel version of the Tricluster algorithm, implemented using the filter-labeled-stream paradigm supported by the Anthill parallel programming environment. The results show that our parallelization scales well with the data size and is able to handle the severe load imbalances that are inherent to the problem. Furthermore, the parallelization strategy is applicable to any depth-first search.

Author information

Corresponding author

Correspondence to Renata Braga Araújo.

About this article

Cite this article

Braga Araújo, R., Trielli Ferreira, G.H., Orair, G.H. et al. The ParTriCluster Algorithm for Gene Expression Analysis. Int J Parallel Prog 36, 226–249 (2008). https://doi.org/10.1007/s10766-007-0067-9

  • DOI: https://doi.org/10.1007/s10766-007-0067-9
