DOI: 10.1145/974614.974639
Article

Multiple organism gene finding by collapsed Gibbs sampling

Published: 27 March 2004

Abstract

The Gibbs sampling method has been widely used for sequence analysis since it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Numerous variants of the original idea have since emerged; in all cases, however, the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. We have applied our method to genomic sequence from 14 genomic regions, totaling roughly 1.8 Mb of sequence in each organism. We show that our approach compares favorably with existing ab initio approaches to gene finding, including pairwise comparison-based gene prediction methods that make explicit use of alignments. Furthermore, excellent performance can be obtained with as few as four organisms, and the method overcomes a number of difficulties of previous comparison-based gene finding approaches: it is robust with respect to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.
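
For readers unfamiliar with the motif-finding setting the abstract starts from, the following is a minimal sketch of a Gibbs site sampler for short motifs. It is not the gene-finding method of this paper. It assumes exactly one motif occurrence per sequence, a uniform background model, and a pseudocount of 1 per nucleotide; the function names `column_counts` and `gibbs_motif_search` are hypothetical. The motif parameters are never sampled directly: each start position is drawn from predictive probabilities computed from the counts of the other sequences, which is the sense in which the sampler is collapsed.

```python
import math
import random

ALPHABET = "ACGT"

def column_counts(seqs, starts, width, skip, pseudo=1.0):
    """Per-column nucleotide counts of the current motif sites, leaving out sequence `skip`."""
    counts = [{c: pseudo for c in ALPHABET} for _ in range(width)]
    for i, (seq, s) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][seq[s + j]] += 1.0
    return counts

def gibbs_motif_search(seqs, width, sweeps=200, seed=0):
    """Resample one motif start per sequence in turn; returns the final start positions."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    background = 1.0 / len(ALPHABET)  # assumption: uniform background model
    for _ in range(sweeps):
        for i, seq in enumerate(seqs):
            # Counts from all other sequences stand in for the integrated-out motif model.
            counts = column_counts(seqs, starts, width, skip=i)
            totals = [sum(col.values()) for col in counts]
            weights = []
            for s in range(len(seq) - width + 1):
                log_ratio = 0.0
                for j in range(width):
                    p = counts[j][seq[s + j]] / totals[j]
                    log_ratio += math.log(p / background)
                weights.append(math.exp(log_ratio))
            # Draw the new start in proportion to its motif-versus-background ratio.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Each sweep costs time proportional to the total sequence length times the motif width, which is practical for short sequences such as those the abstract describes for earlier motif-finding work.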

Cited By

  • (2016) Improved Automatic Keyword Extraction Given More Semantic Knowledge. Database Systems for Advanced Applications, 112-125. DOI: 10.1007/978-3-319-32055-7_10. Online publication date: 12-Apr-2016
  • (2010) Term weighting schemes for Latent Dirichlet Allocation. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 465-473. DOI: 10.5555/1857999.1858069. Online publication date: 2-Jun-2010
  • (2006) Reference based annotation with GeneMapper. Genome Biology 7:4. DOI: 10.1186/gb-2006-7-4-r29. Online publication date: 5-Apr-2006

        Published In

        RECOMB '04: Proceedings of the eighth annual international conference on Research in computational molecular biology
        March 2004
        370 pages
        ISBN: 1581137559
        DOI: 10.1145/974614

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. gene finding
        2. Gibbs sampling
        3. hidden Markov model

        Qualifiers

        • Article

        Conference

        RECOMB04

        Acceptance Rates

        Overall Acceptance Rate 148 of 538 submissions, 28%
