DOI: 10.1145/2488551.2488583

Scalability and accuracy improvements of consistency-based multiple sequence alignment tools

Published: 15 September 2013

Abstract

Multiple sequence alignment (MSA) is one of the most useful tools in bioinformatics. However, the growth of sequencing data makes alignment with traditional tools increasingly difficult. Large-scale alignments with thousands of sequences require taking advantage of high-performance computing (HPC). This paper, focused on the consistency-based MSA tool T-Coffee, presents several innovative solutions aimed at improving its efficiency, scalability and accuracy. The results show that our approach doubles the speed-up of the progressive alignment, allowing T-Coffee to align twice as many sequences while also improving alignment accuracy.
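For readers unfamiliar with the term, "consistency-based" can be illustrated with a small sketch of the triplet extension commonly described for T-Coffee-style libraries: the weight of a residue pair is reinforced whenever a residue from a third sequence is aligned to both members of the pair. This is not the method contributed by this paper. The function name `extend_library`, the dictionary layout and the weights are assumptions made for illustration, and the sketch only re-scores pairs already present in the primary library, whereas a full extension would also create new pairs.

```python
from collections import defaultdict


def extend_library(primary):
    """Re-score residue pairs by triplet consistency (illustrative only).

    `primary` maps ((seq_a, pos_a), (seq_b, pos_b)) -> weight, where each
    residue is identified by a hypothetical (sequence name, position) tuple.
    """
    # Symmetric lookup: residue -> {aligned partner residue: weight}
    neighbours = defaultdict(dict)
    for (x, y), w in primary.items():
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = {}
    for (x, y), w in primary.items():
        score = w
        # A residue z from a third sequence that is aligned to both x and y
        # supports the pair with the weaker of its two weights.
        for z, w_xz in neighbours[x].items():
            if z[0] in (x[0], y[0]):
                continue  # z must come from a different sequence
            w_zy = neighbours[z].get(y)
            if w_zy is not None:
                score += min(w_xz, w_zy)
        extended[(x, y)] = score
    return extended


# Toy primary library over three sequences A, B and C (made-up weights).
primary = {
    (("A", 1), ("B", 1)): 80,
    (("A", 1), ("C", 1)): 60,
    (("C", 1), ("B", 1)): 90,
}
extended = extend_library(primary)
```

In this toy input the pair A1-B1 starts at 80 and gains min(60, 90) = 60 from the path through C1. The cost of this step grows quickly with the number of sequences, which is one reason consistency-based tools are hard to scale.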



Published In

EuroMPI '13: Proceedings of the 20th European MPI Users' Group Meeting
September 2013
289 pages
ISBN:9781450319034
DOI:10.1145/2488551

Sponsors

  • ARCOS: Computer Architecture and Technology Area, Universidad Carlos III de Madrid

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. T-Coffee
  2. consistency
  3. multiple sequence alignments

Qualifiers

  • Research-article

Conference

EuroMPI '13
Sponsor:
  • ARCOS
EuroMPI '13: 20th European MPI Users' Group Meeting
September 15 - 18, 2013
Madrid, Spain

Acceptance Rates

EuroMPI '13 Paper Acceptance Rate 22 of 47 submissions, 47%;
Overall Acceptance Rate 66 of 139 submissions, 47%
