DOI: 10.1007/978-3-642-29627-7_29
Article

Alignment-Free sequence comparison based on next generation sequencing reads: extended abstract

Published: 21 April 2012

Abstract

Next generation sequencing (NGS) technologies have generated an enormous amount of shotgun read data, and assembling the reads can be challenging, especially for organisms without template sequences. We study the power of genome comparison based on shotgun read data without assembly using three alignment-free sequence comparison statistics, D2, D2*, and D2S, both theoretically and by simulation. We derive theoretical formulas for the power of detecting the relationship between two sequences related through a common motif model. We show that both D2* and D2S outperform D2 in detecting the relationship between two sequences based on NGS data. We then study the effects of tuple length, read length, coverage, and sequencing error on the power of D2* and D2S. Finally, variations of these statistics, d2, d2*, and d2S, respectively, are used first to cluster 5 mammalian species with known phylogenetic relationships and then to cluster 13 tree species whose complete genome sequences are not available, using NGS shotgun reads. The clustering results using d2S are consistent with biological knowledge for both the 5 mammalian and the 13 tree species. Thus, the statistic d2S provides a powerful alignment-free comparison tool for studying the relationships among different organisms based on NGS read data without assembly.
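
To make the word-count idea concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state the formulas, so the definitions used here are assumptions taken from common usage in the alignment-free literature: D2 is the sum over k-tuples w of X_w * Y_w, and d2S is a normalized dissimilarity built from counts centered under an i.i.d. background model. All function names are hypothetical, and strand handling (for example, also counting reverse complements of reads) is omitted.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"

def count_kmers(reads, k):
    """Pool k-tuple counts over all reads (no assembly); skip tuples with non-ACGT letters."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if all(ch in ALPHABET for ch in word):
                counts[word] += 1
    return counts

def letter_freqs(reads):
    """Estimate single-letter frequencies from the reads (assumed i.i.d. background)."""
    counts = Counter(ch for read in reads for ch in read if ch in ALPHABET)
    total = sum(counts.values())
    return {ch: counts[ch] / total for ch in ALPHABET}

def centered_counts(counts, freqs, k):
    """Return X_w - n * p_w for every k-tuple w, with n the total number of counted tuples."""
    n = sum(counts.values())
    out = {}
    for letters in product(ALPHABET, repeat=k):
        p = 1.0
        for ch in letters:
            p *= freqs[ch]
        word = "".join(letters)
        out[word] = counts.get(word, 0) - n * p
    return out

def d2(counts_x, counts_y):
    """Assumed definition: D2 = sum over shared k-tuples of X_w * Y_w (raw counts)."""
    return sum(counts_x[w] * counts_y[w] for w in counts_x.keys() & counts_y.keys())

def d2s_dissimilarity(reads_x, reads_y, k):
    """Assumed definition of d2S: 0.5 * (1 - normalized D2S), so values lie in [0, 1]."""
    x = centered_counts(count_kmers(reads_x, k), letter_freqs(reads_x), k)
    y = centered_counts(count_kmers(reads_y, k), letter_freqs(reads_y), k)
    num = norm_x = norm_y = 0.0
    for w in x:
        denom = sqrt(x[w] * x[w] + y[w] * y[w])
        if denom == 0.0:
            continue  # this tuple carries no signal in either sample
        num += x[w] * y[w] / denom
        norm_x += x[w] * x[w] / denom
        norm_y += y[w] * y[w] / denom
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("centered counts are all zero; d2S is undefined")
    return 0.5 * (1.0 - num / (sqrt(norm_x) * sqrt(norm_y)))
```

Under these assumed definitions, two identical read sets give a d2S of 0, and pairwise d2S values between samples could be fed to any standard clustering routine, in the spirit of the species clustering described above.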


Published In

RECOMB'12: Proceedings of the 16th Annual International Conference on Research in Computational Molecular Biology
April 2012
372 pages
ISBN: 9783642296260
Editor: Benny Chor

Sponsors

• iSCB: International Society for Computational Biology
• INB
• NSF
• BioMed Central
• MICINN

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

1. HMM
2. NGS
3. normal approximation
4. statistical power
5. word count statistics
