DOI: 10.1145/2382936.2382985
Short paper

Alignment seeding strategies using contiguous pyrimidine purine matches

Published: 07 October 2012

Abstract

Large-scale genomic pairwise aligners usually start with a seeding procedure, which scans two sequences for base matches (called hits) that follow a certain pattern (called a seed). The seed pattern and size determine the sensitivity and specificity of the seeding procedure and greatly affect alignment accuracy and computational efficiency. Much effort has focused on finding an optimal spaced seed, or set of spaced seeds, to improve sensitivity. However, specificity also becomes a major concern when aligning very long genomic sequences. We present a seeding strategy that identifies contiguous pyrimidine purine (py·pu) matches. This model may improve sensitivity and specificity simultaneously compared to a contiguous base match model. We further present a seeding strategy that identifies contiguous py·pu matches containing at least a certain number of contiguous base matches. This model significantly improves sensitivity and specificity simultaneously compared to the base match model. It can also achieve better sensitivity than an optimal spaced seed without loss of specificity when the ratio of transitions to transversions is high. Our examination of the 2M-base CFTR region in human and mouse shows that this new model can have very high specificity without much loss of sensitivity compared to an optimal spaced seed. Based on the characteristics of alignments between human and other mammals (e.g., the sequence similarity, the ratio of transitions to transversions, and the lengths of gapless alignments), the new seeding strategies are promising for improving alignment quality across a wide selection of species pairs. This paper also lays the groundwork for future work on applying spaced patterns to these seeding strategies.
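
The second strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a py·pu match means the two aligned bases fall in the same class (both purines, A/G, or both pyrimidines, C/T), so identical bases and transitions both count. The names `is_pypu_match` and `find_hits` and the parameters `k` (window length) and `m` (minimum run of exact matches) are hypothetical, and the scan covers only a single gapless diagonal of two uppercase sequences rather than indexing a whole genome.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def base_class(base):
    """Return 'R' for a purine, 'Y' for a pyrimidine, None for anything else."""
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    return None


def is_pypu_match(a, b):
    """Assumed definition: both bases are purines or both are pyrimidines."""
    class_a = base_class(a)
    return class_a is not None and class_a == base_class(b)


def find_hits(s, t, k, m):
    """Return offsets i where s[i:i+k] and t[i:i+k] are all py·pu matches
    and contain at least m contiguous exact base matches (hypothetical k, m)."""
    hits = []
    for i in range(min(len(s), len(t)) - k + 1):
        run = best = 0
        all_pypu = True
        for a, b in zip(s[i:i + k], t[i:i + k]):
            if not is_pypu_match(a, b):
                all_pypu = False
                break
            # Extend the exact-match run, or reset it on a transition.
            run = run + 1 if a == b else 0
            best = max(best, run)
        if all_pypu and best >= m:
            hits.append(i)
    return hits


# Position 2 is a transition (G vs A); position 7 is a transversion (T vs A).
print(find_hits("ACGTACGT", "ACATACGA", k=5, m=3))  # [1, 2]
```

Setting `m=0` reduces the sketch to the plain contiguous py·pu model, which tolerates transitions anywhere in the window. Raising `m` requires a core of exact matches, which is the mechanism the abstract credits for the gain in specificity.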

    Published In

    BCB '12: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
    October 2012
    725 pages
    ISBN:9781450316705
    DOI:10.1145/2382936

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. alignment
    2. genomic sequence
    3. matches
    4. model
    5. seeding

    Qualifiers

    • Short-paper

    Conference

    BCB '12

    Acceptance Rates

    BCB '12 Paper Acceptance Rate 33 of 159 submissions, 21%;
    Overall Acceptance Rate 254 of 885 submissions, 29%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 77
    • Downloads (last 12 months): 1
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 01 Sep 2024
