Efficient Algorithms for the Computational Design of Optimal Tiling Arrays

Published: 01 October 2008

    Abstract

    The representation of a genome by oligonucleotide probes is a prerequisite for the analysis of many of its basic properties, such as transcription factor binding sites, chromosomal breakpoints, expression of known genes, and detection of novel genes, in particular those coding for small RNAs. An ideal representation would consist of a high-density set of equidistantly spaced oligonucleotides with similar melting temperatures that do not cross-hybridize with other regions of the genome. The implementation of such a design is typically called a tiling array or genome array. We formulate the minimal cost tiling path problem for the selection of oligonucleotides from a set of candidates. Selecting the probes requires multi-criterion optimization, which we cast as a shortest path problem. For most problem variants, standard linear-time algorithms allow us to compute globally optimal tiling paths from millions of candidate oligonucleotides on a standard desktop computer. The solutions to this multi-criterion optimization are spatially adaptive to the problem instance. Our formulation easily incorporates experimental constraints on specific regions of interest, as well as trade-offs between hybridization parameters, probe quality, and tiling density.
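    The abstract casts probe selection as a shortest path problem. The Python sketch below shows one way such a formulation can look; it is not the authors' cost function or implementation. All of its specifics are hypothetical assumptions: each candidate `Probe` carries a single precomputed `penalty` standing in for probe quality (for example, melting-temperature deviation plus a cross-hybridization score), spacing is scored by the squared deviation from `target_gap`, gaps longer than `max_gap` are forbidden, and the weights `w_gap` and `w_probe` express the trade-off between tiling density and probe quality. Because candidates sorted by genomic position form a directed acyclic graph, a single forward sweep finds a globally optimal path in time linear in the number of edges.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Probe:
    start: int      # genomic start coordinate of the candidate oligonucleotide
    penalty: float  # hypothetical per-probe cost (e.g. |Tm - target| + cross-hybridization score)


def min_cost_tiling_path(probes: List[Probe], region_start: int, region_end: int,
                         target_gap: int, max_gap: int,
                         w_gap: float = 1.0, w_probe: float = 1.0) -> Tuple[float, List[Probe]]:
    """Shortest path through a DAG of candidate probes ordered by position.

    A virtual source sits at region_start and a virtual sink at region_end.
    An edge i -> j exists when 0 < pos[j] - pos[i] <= max_gap; entering node j
    costs w_gap * (gap - target_gap)**2 plus w_probe * penalty(j).
    The quadratic gap cost and the weights are illustrative assumptions.
    """
    cands = sorted((p for p in probes if region_start < p.start < region_end),
                   key=lambda p: p.start)
    pos = [region_start] + [p.start for p in cands] + [region_end]
    pen = [0.0] + [p.penalty for p in cands] + [0.0]
    n = len(pos)
    dist = [math.inf] * n
    pred = [-1] * n
    dist[0] = 0.0
    # Nodes are already in topological (positional) order, so one sweep suffices.
    for j in range(1, n):
        i = j - 1
        # Only predecessors within max_gap can reach node j.
        while i >= 0 and pos[j] - pos[i] <= max_gap:
            gap = pos[j] - pos[i]
            if gap > 0 and dist[i] < math.inf:
                cost = dist[i] + w_gap * (gap - target_gap) ** 2
                if cost < dist[j]:
                    dist[j], pred[j] = cost, i
            i -= 1
        dist[j] += w_probe * pen[j]
    if dist[-1] == math.inf:
        raise ValueError("no tiling path: some gap exceeds max_gap")
    # Walk predecessors back from the sink, skipping the virtual source and sink.
    path, k = [], pred[-1]
    while k > 0:
        path.append(cands[k - 1])
        k = pred[k]
    return dist[-1], path[::-1]


if __name__ == "__main__":
    cands = [Probe(25, 100.0), Probe(20, 0.0), Probe(50, 0.0), Probe(75, 0.0)]
    cost, path = min_cost_tiling_path(cands, 0, 100, target_gap=25, max_gap=40)
    print(cost, [p.start for p in path])  # 50.0 [20, 50, 75]
```

    In this example the poor candidate at position 25 is bypassed in favor of the slightly mis-spaced probe at 20, for a total cost of 50.0. Raising `w_gap` relative to `w_probe` shifts the balance back toward even spacing, which is one simple way to express the density-versus-quality trade-off mentioned in the abstract.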

    Cited By

    • Selecting Oligonucleotide Probes for Whole-Genome Tiling Arrays with a Cross-Hybridization Potential, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol. 8, no. 6, pp. 1642-1652, 2011, doi:10.1109/TCBB.2011.39. Online publication date: 1 Nov 2011.

    Published In

    IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 5, Issue 4
    October 2008
    158 pages

    Publisher

    IEEE Computer Society Press

    Washington, DC, United States

    Author Tags

    1. Biology and genetics
    2. Graph Theory
