Applications of generalized pair hidden Markov models to alignment and gene finding problems

Published: 22 April 2001
DOI: 10.1145/369133.369227

Abstract

Hidden Markov models (HMMs) have been successfully applied to a variety of problems in molecular biology, ranging from alignment problems to gene finding and annotation. Alignment problems can be solved with pair HMMs, while gene finding programs rely on generalized HMMs in order to model exon lengths. In this paper we introduce the generalized pair HMM (GPHMM), which is an extension of both pair and generalized HMMs. We show how GPHMMs, in conjunction with approximate alignments, can be used for cross-species gene finding, and describe applications to DNA-cDNA and DNA-protein alignment. GPHMMs provide a unifying and probabilistically sound theory for modeling these problems.
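
As an illustration of the pair HMM component mentioned in the abstract, the following is a minimal sketch of Viterbi alignment with a standard three-state pair HMM (a match state M and two gap states X and Y). It is not the GPHMM of the paper: state durations are not modeled, there is no explicit end state, and the function name and all parameter values (delta, epsilon, p_match, alphabet_size) are assumptions chosen for illustration only.

```python
import math

NEG_INF = float("-inf")
STATES = ("M", "X", "Y")


def pair_hmm_viterbi(x, y, delta=0.2, epsilon=0.4, p_match=0.9, alphabet_size=4):
    """Most probable alignment of x and y under a three-state pair HMM.

    M emits an aligned pair, X emits a symbol of x against a gap,
    Y emits a symbol of y against a gap. All parameters are illustrative
    assumptions, not values taken from the paper.
    """
    log = math.log
    # Pair emissions: p_match is spread over the identical pairs,
    # the remainder over the mismatching pairs.
    log_match = log(p_match / alphabet_size)
    log_mismatch = log((1.0 - p_match) / (alphabet_size * (alphabet_size - 1)))
    log_gap = log(1.0 / alphabet_size)
    # Transitions: delta opens a gap, epsilon extends it; no direct X <-> Y moves.
    trans = {
        ("M", "M"): log(1.0 - 2.0 * delta),
        ("M", "X"): log(delta),
        ("M", "Y"): log(delta),
        ("X", "X"): log(epsilon),
        ("X", "M"): log(1.0 - epsilon),
        ("Y", "Y"): log(epsilon),
        ("Y", "M"): log(1.0 - epsilon),
    }
    n, m = len(x), len(y)
    score = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in STATES}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in STATES}
    score["M"][0][0] = 0.0  # the start is treated as the match state

    def best_prev(i, j, target):
        # Best (log score, state) among states allowed to move into target.
        return max(
            (score[s][i][j] + trans[(s, target)], s)
            for s in STATES
            if (s, target) in trans
        )

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                prev_score, prev_state = best_prev(i - 1, j - 1, "M")
                emit = log_match if x[i - 1] == y[j - 1] else log_mismatch
                score["M"][i][j] = prev_score + emit
                back["M"][i][j] = prev_state
            if i > 0:
                prev_score, prev_state = best_prev(i - 1, j, "X")
                score["X"][i][j] = prev_score + log_gap
                back["X"][i][j] = prev_state
            if j > 0:
                prev_score, prev_state = best_prev(i, j - 1, "Y")
                score["Y"][i][j] = prev_score + log_gap
                back["Y"][i][j] = prev_state

    # Trace back from the best-scoring state in the final cell.
    state = max(STATES, key=lambda s: score[s][n][m])
    best = score[state][n][m]
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        prev_state = back[state][i][j]
        if state == "M":
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif state == "X":
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
        state = prev_state
    return "".join(reversed(top)), "".join(reversed(bottom)), best
```

Calling pair_hmm_viterbi("ACGT", "AGT") returns the two gapped strings and the log probability of the best state path. A generalized pair HMM would additionally let a state emit a pair of subsequences whose lengths follow an explicit distribution, which this sketch does not attempt.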

Published In

RECOMB '01: Proceedings of the fifth annual international conference on Computational biology
April 2001
316 pages
ISBN: 1581133537
DOI: 10.1145/369133
Chairman: Thomas Lengauer

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

RECOMB01

Acceptance Rates

RECOMB '01 Paper Acceptance Rate 35 of 128 submissions, 27%;
Overall Acceptance Rate 148 of 538 submissions, 28%

Cited By

  • (2011) HMM-Based Abnormal Behaviour Detection Using Heterogeneous Sensor Network. Technological Innovation for Sustainability (277-285). DOI: 10.1007/978-3-642-19170-1_30. Online publication date: 2011
  • (2009) Speeding Up HMM Decoding and Training by Exploiting Sequence Repetitions. Algorithmica 54:3 (379-399). DOI: 10.1007/s00453-007-9128-0. Online publication date: 1-May-2009
  • (2006) Behavioral distance measurement using hidden Markov models. Proceedings of the 9th international conference on Recent Advances in Intrusion Detection (19-40). DOI: 10.1007/11856214_2. Online publication date: 20-Sep-2006
  • (2004) Comparison of discrimination methods for peptide classification in tandem mass spectrometry. IGARSS 2004. 2004 IEEE International Geoscience and Remote Sensing (IEEE Cat. No.04CH37612) (160-167). DOI: 10.1109/CIBCB.2004.1393949. Online publication date: 2004
  • (2003) SLAM: Cross-Species Gene Finding and Alignment with a Generalized Pair Hidden Markov Model. Genome Research 13:3 (496-502). DOI: 10.1101/gr.424203. Online publication date: 12-Feb-2003
  • (2002) Comparative Methods for Gene Structure Prediction in Homologous Sequences. Algorithms in Bioinformatics (220-234). DOI: 10.1007/3-540-45784-4_17. Online publication date: 10-Oct-2002
