The first part of this thesis outlines the details of a computational program to identify genes and their coding regions in human DNA. Our main result is a new algorithm for identifying genes based on comparisons between orthologous human and mouse genes. Using our new technique, we are able to improve on the current best gene recognition results. Testing on a collection of 117 genes for which we have human and mouse orthologs, we find that we predict 84% of the coding exons in genes correctly on both ends. Our nucleotide sensitivity and specificity are 95% and 98%, respectively.
Most importantly, our algorithms are applicable to large-scale annotation problems. The methods are completely scalable. We are able to take into account multiple or incomplete genes in a genomic region, splice sites without the usual GT/AG consensus, and genes on either strand. In addition to our algorithmic results, we also detail a number of computational studies relevant to the biological phenomena associated with splicing. We discuss the implications of directionality in splice site detection, the statistical characteristics of splice sites and exons, and how to apply this information to the gene recognition problem.
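The accuracy figures reported for the first part (exons correct on both ends, nucleotide sensitivity and specificity) can be illustrated with a small sketch. The abstract does not define these measures, so the code below assumes the convention common in gene-finding evaluations: sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP) over coding nucleotides. The function names, the inclusive `(start, end)` exon representation, and the toy coordinates are all hypothetical and are not taken from the thesis.

```python
from typing import Iterable, Set, Tuple

Exon = Tuple[int, int]  # hypothetical representation: inclusive (start, end)


def coding_positions(exons: Iterable[Exon]) -> Set[int]:
    """Expand exon intervals into the set of nucleotide positions they cover."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_sn_sp(annotated: Iterable[Exon],
                     predicted: Iterable[Exon]) -> Tuple[float, float]:
    """Nucleotide-level sensitivity and specificity (assumed convention:
    Sn = TP / (TP + FN), Sp = TP / (TP + FP))."""
    truth = coding_positions(annotated)
    guess = coding_positions(predicted)
    if not truth or not guess:
        raise ValueError("both exon lists must cover at least one nucleotide")
    true_pos = len(truth & guess)
    return true_pos / len(truth), true_pos / len(guess)


def exact_exon_fraction(annotated: Iterable[Exon],
                        predicted: Iterable[Exon]) -> float:
    """Fraction of annotated exons predicted with both ends exactly right."""
    truth = set(annotated)
    if not truth:
        raise ValueError("annotated exon list is empty")
    return len(truth & set(predicted)) / len(truth)


if __name__ == "__main__":
    # Toy data only: one exon predicted exactly, one shifted by two bases.
    annotated = [(10, 19), (30, 39)]
    predicted = [(10, 19), (32, 41)]
    print(nucleotide_sn_sp(annotated, predicted))
    print(exact_exon_fraction(annotated, predicted))
```

Under this convention a shifted exon still contributes to the nucleotide-level scores but counts as a miss at the exon level, which is why exon-level accuracy is typically the lower of the two kinds of figure.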
The second part of the thesis is devoted to combinatorial problems that originate from domino tiling questions. Our main results are upper and lower bounds for forcing numbers of matchings on square grids, as well as the first combinatorial proof that the number of domino tilings of a $2n \times 2n$ square grid is of the form $2^n(2k+1)^2$. Our approach to both problems is concrete and combinatorial, relying on the same set of tools and techniques. We also discuss a number of new problems and conjectures. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
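The stated form of the tiling count, a power of two times an odd square, can be checked numerically for small grids. The sketch below is a brute-force check and is not the combinatorial proof given in the thesis: it counts tilings with a standard column-by-column bitmask dynamic program and then tests whether the count factors as 2^n times the square of an odd number. The function names are hypothetical.

```python
from functools import lru_cache
from math import isqrt
from typing import Optional


def count_domino_tilings(rows: int, cols: int) -> int:
    """Count domino tilings of a rows x cols grid, filling column by column."""

    @lru_cache(maxsize=None)
    def fill(col: int, mask: int) -> int:
        # mask marks cells of this column already covered by horizontal
        # dominoes that start in the previous column.
        if col == cols:
            return 1 if mask == 0 else 0
        return place(col, 0, mask, 0)

    def place(col: int, row: int, mask: int, next_mask: int) -> int:
        if row == rows:
            return fill(col + 1, next_mask)
        if mask & (1 << row):  # cell already covered, move down
            return place(col, row + 1, mask, next_mask)
        total = 0
        if col + 1 < cols:  # horizontal domino reaching into the next column
            total += place(col, row + 1, mask, next_mask | (1 << row))
        if row + 1 < rows and not mask & (1 << (row + 1)):  # vertical domino
            total += place(col, row + 2, mask, next_mask)
        return total

    return fill(0, 0)


def odd_square_parameter(count: int, n: int) -> Optional[int]:
    """Return k if count == 2**n * (2k + 1)**2, otherwise None."""
    quotient, remainder = divmod(count, 2 ** n)
    if remainder:
        return None
    root = isqrt(quotient)
    if root * root != quotient or root % 2 == 0:
        return None
    return (root - 1) // 2


if __name__ == "__main__":
    for n in range(1, 5):
        tilings = count_domino_tilings(2 * n, 2 * n)
        print(n, tilings, odd_square_parameter(tilings, n))
```

For n = 1 through 4 the counts are 2, 36, 6728, and 12988816, which factor as 2·1², 4·3², 8·29², and 16·901², consistent with the stated form.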