
Rapid computation of distance estimators from nucleotide and amino acid alignments

Published: 21 March 2011

Abstract

Distance estimators are needed as input for popular distance-based phylogenetic reconstruction methods such as UPGMA and neighbour-joining. Computing them takes O(n^2 l) time for n sequences of length l, which is usually fast compared to reconstructing a phylogenetic tree of n taxa. However, with the introduction of fast search heuristics for distance-based phylogenetic reconstruction methods, the computation of distance estimators has become a bottleneck, especially for long sequences. Elias et al. have shown how distance estimators can be computed efficiently from unaligned nucleotide sequences using vectorisation of code. In this paper we extend their method to allow efficient computation of distance estimators from aligned nucleotide and amino acid sequences using vectorisation of code and parallelisation on both CPUs and GPUs. Experiments are presented which show an increase in performance of up to 36x and 8x relative to the naive approach when computing distance estimators from nucleotide and amino acid alignments, respectively.
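The idea of handling many alignment sites per operation can be illustrated with a minimal Python sketch. It is not the implementation described in the paper: it assumes gap-free, equal-length, non-empty sequences over A, C, G and T, uses the Jukes-Cantor estimator as the example distance, and lets Python's arbitrary-precision integers stand in for SIMD registers. The names `pack`, `mismatches_packed` and `distance_matrix`, and the 2-bit encoding, are hypothetical choices made for illustration.

```python
import math
from itertools import combinations

# Hypothetical 2-bit encoding of nucleotides (an assumption of this sketch).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def jc_distance(p):
    """Jukes-Cantor distance from the proportion p of mismatching sites."""
    if p >= 0.75:
        return math.inf  # the estimator is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def mismatches_naive(a, b):
    """Site-by-site comparison: the O(l) inner loop of the naive approach."""
    return sum(x != y for x, y in zip(a, b))

def pack(seq):
    """Pack a sequence into a single integer, two bits per site."""
    word = 0
    for ch in seq:
        word = (word << 2) | CODE[ch]
    return word

def mismatches_packed(wa, wb, length):
    """Count mismatching sites with XOR, a mask and a population count."""
    low_bits = int("01" * length, 2)  # selects the low bit of every site
    x = wa ^ wb                       # non-zero bit pair where sites differ
    diff = (x | (x >> 1)) & low_bits  # fold each pair down to one flag bit
    return bin(diff).count("1")

def distance_matrix(seqs):
    """All-pairs distance matrix: O(n^2) pairs, each over l packed sites."""
    n, length = len(seqs), len(seqs[0])
    packed = [pack(s) for s in seqs]
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        k = mismatches_packed(packed[i], packed[j], length)
        d[i][j] = d[j][i] = jc_distance(k / length)
    return d
```

Calling `distance_matrix(["ACGT", "ACGA", "TCGA"])` returns a symmetric 3 x 3 matrix with zeros on the diagonal. On real hardware the same XOR, mask and popcount pattern would map to fixed-width SIMD or GPU instructions, and handling gaps or amino acids would need a wider encoding than the two bits assumed here.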

References

[1]
HIV databases. http://www.hiv.lanl.gov.
[2]
H. E. McClymont. Molecular phylogeny of microsporidian parasites with special attention to mollusc-infective species. 2006.
[3]
T. Carver. distmat - the European Molecular Biology Open Software Suite (EMBOSS). http://emboss.sourceforge.net/index.html.
[4]
I. Elias and J. Lagergren. Fast computation of distance estimators. BMC Bioinformatics, 8: 89, 2007.
[5]
D. Eppstein. Fast hierarchical clustering and other applications of dynamic closest pairs. Journal of Experimental Algorithmics, 5: 1, 2000.
[6]
J. Felsenstein. PHYLIP 3.69. http://evolution.genetics.washington.edu/phylip.html.
[7]
R. D. Finn et al. Pfam: clans, web tools and services. Nucleic Acids Research, Database Issue 34:D247--D251, 2006.
[8]
M. Harris, S. Sengupta, and J. D. Owens. Parallel prefix sum (scan) with CUDA. GPU Gems, April 2007.
[9]
K. Howe, A. Bateman, and R. Durbin. QuickTree: Building huge neighbour-joining trees of protein sequences. Bioinformatics, 18(11): 1546--1547, 2002.
[10]
T. H. Jukes and C. R. Cantor. Evolution of protein molecules. Mammalian Protein Metabolism, pages 21--123, 1969.
[11]
K. Tamura and M. Nei. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10(3): 512--526, 1993.
[12]
M. Kimura. A simple model for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 1980.
[13]
M. Kimura. The Neutral Theory of Molecular Evolution. Cambridge University Press, 1983.
[14]
V. Lombard et al. EMBL-Align: a new public nucleotide and amino acid multiple sequence alignment database. Bioinformatics, 18(5): 753--764, 2002.
[15]
C. D. Michener and R. R. Sokal. A quantitative approach to a problem in classification. Evolution, 11: 130--162, 1957.
[16]
N. Saitou and M. Nei. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4: 406--425, 1987.
[17]
M. Simonsen, T. Mailund, and C. N. S. Pedersen. Rapid neighbour-joining. Algorithms in Bioinformatics, Proceedings 8th International Workshop, volume 5251, pages 113--123, 2008.
[18]
V. W. Lee et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. Proceedings of the 37th Annual International Symposium on Computer Architecture, 2010.

          Published In

          SAC '11: Proceedings of the 2011 ACM Symposium on Applied Computing
          March 2011
          1868 pages
          ISBN:9781450301138
          DOI:10.1145/1982185

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. GPU
          2. parallelization
          3. phylogenetic distance estimator
          4. phylogenetic inference
          5. vectorization

          Qualifiers

          • Research-article

          Conference

          SAC'11
          SAC'11: The 2011 ACM Symposium on Applied Computing
          March 21 - 24, 2011
          TaiChung, Taiwan

          Acceptance Rates

          Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

          Cited By

          • (2025) Deciphering the Population Characteristics of Leiqiong Cattle Using Whole-Genome Sequencing Data. Animals, 15:3 (342). DOI: 10.3390/ani15030342. Online publication date: 24-Jan-2025
          • (2023) DNA methylation differences between stick insect ecotypes. Molecular Ecology, 32:24 (6809-6823). DOI: 10.1111/mec.17165. Online publication date: 21-Oct-2023
          • (2020) Symbiosis genes show a unique pattern of introgression and selection within a Rhizobium leguminosarum species complex. Microbial Genomics, 6:4. DOI: 10.1099/mgen.0.000351. Online publication date: 1-Apr-2020
          • (2020) Antibiotic Treatment Regimes as a Driver of the Global Population Dynamics of a Major Gonorrhea Lineage. Molecular Biology and Evolution, 38:4 (1249-1261). DOI: 10.1093/molbev/msaa282. Online publication date: 3-Nov-2020
          • (2016) Research on Jukes-Cantor Model Parallel Algorithm Based on OpenMP. Big Data Technology and Applications (74-82). DOI: 10.1007/978-981-10-0457-5_8. Online publication date: 2-Feb-2016
          • (2014) A computational analysis of the structural determinants of APOBEC3's catalytic activity and vulnerability to HIV-1 Vif. Virology, 471-473 (105-116). DOI: 10.1016/j.virol.2014.09.023. Online publication date: Dec-2014
