Abstract
We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence $A = \langle a_1, a_2, \ldots, a_n \rangle$ of real numbers, a segment $S$ is a consecutive subsequence $\langle a_i, a_{i+1}, \ldots, a_j \rangle$. The width of $S$ is $j - i + 1$, while the density is $\left(\sum_{i \le k \le j} a_k\right)/(j - i + 1)$. The maximum-density segment problem takes $A$ and two integers $L$ and $U$ as input and asks for a segment of $A$ with the largest possible density among those of width at least $L$ and at most $U$. If $U = n$ (or equivalently, $U = 2L - 1$), we can solve the problem in $O(n)$ time, improving upon the $O(n \log L)$-time algorithm by Lin, Jiang and Chao for a general sequence $A$. Furthermore, if $U$ and $L$ are arbitrary, we solve the problem in $O(n + n \log(U - L + 1))$ time. There has been no nontrivial result for this case previously. Both results also hold for a weighted variant of the maximum-density segment problem.
Supported in part by NSF grant EIA-0112934.
Supported in part by NSC grant NSC-90-2218-E-001-005.
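The definitions in the abstract can be made concrete with a small brute-force baseline. The sketch below is not the algorithm of the paper: it simply enumerates every segment of width between L and U using prefix sums, which takes O(n(U - L + 1)) time rather than the bounds stated above. The function name `max_density_segment` and the 1-based `(density, i, j)` return convention are assumptions chosen for illustration.

```python
from itertools import accumulate


def max_density_segment(a, L, U):
    """Brute-force baseline (an illustrative sketch, not the paper's method).

    Returns (density, i, j), where i and j are 1-based inclusive indices
    of a segment with maximum density among widths in [L, U].
    """
    n = len(a)
    if not (1 <= L <= U) or L > n:
        raise ValueError("need 1 <= L <= U and L <= len(a)")

    # prefix[k] holds a_1 + ... + a_k, so a segment sum is one subtraction.
    prefix = [0] + list(accumulate(a))

    best = None
    for start in range(n - L + 1):                     # 0-based start index
        for width in range(L, min(U, n - start) + 1):  # widths that fit
            density = (prefix[start + width] - prefix[start]) / width
            if best is None or density > best[0]:
                best = (density, start + 1, start + width)
    return best


if __name__ == "__main__":
    # Widths 2..3 over a short sequence; the best segment is <3, 4>.
    print(max_density_segment([1, -2, 3, 4, -1], 2, 3))
```

A baseline like this is mainly useful as a reference for checking a faster implementation on small inputs. Ties are broken in favour of the first segment found.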
References
N. N. Alexandrov and V. V. Solovyev. Statistical significance of ungapped sequence alignments. In Proceedings of Pacific Symposium on Biocomputing, volume 3, pages 461–470, 1998.
G. Bernardi. Isochores and the evolutionary genomics of vertebrates. Gene, 241:3–17, 2000.
J. L. Bentley. Programming Pearls. Addison-Wesley, Reading, MA, 1986.
G. Bernardi and G. Bernardi. Compositional constraints and genome evolution. Journal of Molecular Evolution, 24:1–11, 1986.
B. Charlesworth. Genetic recombination: patterns in the genome. Current Biology, 4:182–184, 1994.
L. Duret, D. Mouchiroud, and C. Gautier. Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. Journal of Molecular Evolution, 40:308–317, 1995.
A. Eyre-Walker. Evidence that both G+C rich and G+C poor isochores are replicated early and late in the cell cycle. Nucleic Acids Research, 20:1497–1501, 1992.
A. Eyre-Walker. Recombination and mammalian genome evolution. Proceedings of the Royal Society of London Series B, Biological Sciences, 252:237–243, 1993.
C. A. Fields and C. A. Soderlund. gm: a practical tool for automating DNA sequence analysis. Computer Applications in the Biosciences, 6:263–270, 1990.
J. Filipski. Correlation between molecular clock ticking, codon usage fidelity of DNA repair, chromosome banding and chromatin compactness in germline cells. FEBS Letters, 217:184–186, 1987.
M. P. Francino and H. Ochman. Isochores result from mutation not selection. Nature, 400:30–31, 1999.
S. M. Fullerton, A. B. Carvalho, and A. G. Clark. Local rates of recombination are positively correlated with GC content in the human genome. Molecular Biology and Evolution, 18(6):1139–1142, 2001.
P. Guldberg, K. Gronbak, A. Aggerholm, A. Platz, P. thor Straten, V. Ahrenkiel, P. Hokland, and J. Zeuthen. Detection of mutations in GC-rich DNA by bisulphite denaturing gradient gel electrophoresis. Nucleic Acids Research, 26(6):1548–1549, 1998.
R. C. Hardison, D. Drane, C. Vandenbergh, J.-F. F. Cheng, J. Mansverger, J. Taddie, S. Schwartz, X. Huang, and W. Miller. Sequence and comparative analysis of the rabbit alpha-like globin gene cluster reveals a rapid mode of evolution in a G+C rich region of mammalian genomes. Journal of Molecular Biology, 222:233–249, 1991.
W. Henke, K. Herdel, K. Jung, D. Schnorr, and S. A. Loening. Betaine improves the PCR amplification of GC-rich DNA sequences. Nucleic Acids Research, 25(19):3957–3958, 1997.
G. P. Holmquist. Chromosome bands, their chromatin flavors, and their functional features. American Journal of Human Genetics, 51:17–37, 1992.
X. Huang. An algorithm for identifying regions of a DNA sequence that satisfy a content requirement. Computer Applications in the Biosciences, 10(3):219–225, 1994.
K. Ikehara, F. Amada, S. Yoshida, Y. Mikata, and A. Tanaka. A possible origin of newly-born bacterial genes: significance of GC-rich nonstop frame on antisense strand. Nucleic Acids Research, 24(21):4249–4255, 1996.
R. B. Inman. A denaturation map of the λ phage DNA molecule determined by electron microscopy. Journal of Molecular Biology, 18:464–476, 1966.
R. Jin, M.-E. Fernandez-Beros, and R. P. Novick. Why is the initiation nick site of an AT-rich rolling circle plasmid at the tip of a GC-rich cruciform? The EMBO Journal, 16(14):4456–4466, 1997.
Y. L. Lin, T. Jiang, and K. M. Chao. Algorithms for locating the length-constrained heaviest segments, with applications to biomolecular sequence analysis. Journal of Computer and System Sciences, 2002. To appear.
G. Macaya, J.-P. Thiery, and G. Bernardi. An approach to the organization of eukaryotic genomes at a macromolecular level. Journal of Molecular Biology, 108:237–254, 1976.
C. S. Madsen, C. P. Regan, and G. K. Owens. Interaction of CArG elements and GC-rich repressor element in transcriptional regulation of the smooth muscle myosin heavy chain gene in vascular smooth muscle cells. Journal of Biological Chemistry, 272(47):29842–29851, 1997.
S.-i. Murata, P. Herman, and J. R. Lakowicz. Texture analysis of fluorescence lifetime images of AT- and GC-rich regions in nuclei. Journal of Histochemistry and Cytochemistry, 49:1443–1452, 2001.
A. Nekrutenko and W.-H. Li. Assessment of compositional heterogeneity within and between eukaryotic genomes. Genome Research, 10:1986–1995, 2000.
P. Rice, I. Longden, and A. Bleasby. EMBOSS: The European molecular biology open software suite. Trends in Genetics, 16(6):276–277, June 2000.
L. Scotto and R. K. Assoian. A GC-rich domain with bifunctional effects on mRNA and protein levels: implications for control of transforming growth factor beta 1 expression. Molecular and Cellular Biology, 13(6):3588–3597, 1993.
P. H. Sellers. Pattern recognition in genetic sequences by mismatch density. Bulletin of Mathematical Biology, 46(4):501–514, 1984.
P. M. Sharp, M. Averof, A. T. Lloyd, G. Matassi, and J. F. Peden. DNA sequence evolution: the sounds of silence. Philosophical Transactions of the Royal Society of London Series B, Biological Sciences, 349:241–247, 1995.
P. Soriano, M. Meunier-Rotival, and G. Bernardi. The distribution of interspersed repeats is nonuniform and conserved in the mouse and human genomes. Proceedings of the National Academy of Sciences of the United States of America, 80:1816–1820, 1983.
N. Stojanovic, L. Florea, C. Riemer, D. Gumucio, J. Slightom, M. Goodman, W. Miller, and R. Hardison. Comparison of five methods for finding conserved sequences in multiple alignments of gene regulatory regions. Nucleic Acids Research, 27:3899–3910, 1999.
N. Sueoka. Directional mutation pressure and neutral molecular evolution. Proceedings of the National Academy of Sciences of the United States of America, 85:2653–2657, 1988.
Z. Wang, E. Lazarov, M. O’Donnell, and M. F. Goodman. Resolving a fidelity paradox: Why Escherichia coli DNA polymerase II makes more base substitution errors in AT- compared to GC-rich DNA. Journal of Biological Chemistry, 2002. To appear.
K. H. Wolfe, P. M. Sharp, and W.-H. Li. Mutation rates differ among regions of the mammalian genome. Nature, 337:283–285, 1989.
Y. Wu, R. P. Stulp, P. Elfferich, J. Osinga, C. H. Buys, and R. M. Hofstra. Improved mutation detection in GC-rich DNA fragments by combined DGGE and CDGE. Nucleic Acids Research, 27(15):e9, 1999.
S. Zoubak, O. Clay, and G. Bernardi. The gene distribution of the human genome. Gene, 174:95–102, 1996.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Goldwasser, M.H., Kao, M.-Y., Lu, H.-I. (2002). Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_12
DOI: https://doi.org/10.1007/3-540-45784-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44211-0
Online ISBN: 978-3-540-45784-8
eBook Packages: Springer Book Archive