Abstract
The increasing number of sequenced genomes motivates the use of evolutionary patterns to detect genes. We present a series of comparative methods for gene finding in homologous prokaryotic or eukaryotic sequences. Based on a model of legal genes and a similarity measure between genes, we find the pair of legal genes of maximum similarity. We develop methods based on gene models and alignment-based similarity measures of increasing complexity, which take into account many details of real gene structures, e.g. the similarity of the proteins encoded by the exons. When using a similarity measure based on an existing alignment, the methods run in linear time. When integrating the alignment and prediction processes, which allows for more fine-grained similarity measures, the methods run in quadratic time. We evaluate the methods in a series of experiments on synthetic and real sequence data, which show that all methods are competitive, but that taking the similarity of the encoded proteins into account substantially boosts performance.
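To make the linear-time case concrete, the following toy sketch shows one way to find a maximum-similarity pair of legal genes when an alignment is already given. It is not the method of the paper. It rests on strong simplifying assumptions: the alignment is gapless (column i of one sequence is aligned to column i of the other), a legal gene is a single open reading frame on the forward strand (ATG, in-frame codons, one stop codon), and similarity is a plain match/mismatch column score. The function name `best_gene_pair` and the scoring parameters are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
START = "ATG"


def best_gene_pair(a, b, match=1, mismatch=-1):
    """Toy linear-time scan (assumptions: gapless alignment, single-exon ORFs).

    Returns (score, start, end) for the best pair of ORFs occupying the same
    columns [start, end) in both sequences, or None if no such pair exists.
    """
    if len(a) != len(b):
        raise ValueError("gapless alignment assumed: equal lengths required")

    best = None
    for frame in range(3):
        prefix = 0          # column score accumulated before the current codon
        open_start = None   # (prefix score, index) of the best start since the last stop
        for i in range(frame, len(a) - 2, 3):
            ca, cb = a[i:i + 3], b[i:i + 3]
            codon_score = sum(match if x == y else mismatch for x, y in zip(ca, cb))
            a_stop, b_stop = ca in STOPS, cb in STOPS

            if a_stop or b_stop:
                # A stop in either sequence ends every ORF that is open there.
                if a_stop and b_stop and open_start is not None:
                    total = prefix + codon_score - open_start[0]
                    if best is None or total > best[0]:
                        best = (total, open_start[1], i + 3)
                open_start = None
            elif ca == START and cb == START:
                # Keep the start with the smallest prefix score: it maximizes
                # the score of any gene pair ending at a later common stop.
                if open_start is None or prefix < open_start[0]:
                    open_start = (prefix, i)

            prefix += codon_score
    return best


if __name__ == "__main__":
    print(best_gene_pair("CCATGAAATAGCC", "CCATGAAGTAGCC"))
```

Each codon is visited once per reading frame, so the running time is linear in the alignment length. Handling gaps, introns, or protein-level similarity would need a richer gene model and scoring scheme than this sketch provides.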
Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186 (ALCOM-FT).
Bioinformatics Research Center (BiRC), www.birc.dk, funded by Aarhus University Research Foundation.
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pedersen, C.N., Scharling, T. (2002). Comparative Methods for Gene Structure Prediction in Homologous Sequences. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_17
DOI: https://doi.org/10.1007/3-540-45784-4_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44211-0
Online ISBN: 978-3-540-45784-8