
Improved exact enumerative algorithms for the planted (l, d)-motif search problem

Published: 01 March 2014
  • Abstract

    This paper proposes efficient exact algorithms for the planted (l, d)-motif search problem. The problem is to find all motifs of length l that are planted in each input string with at most d mismatches. The paper also treats the "quorum" version of the problem, which asks for motifs planted in at least q input strings rather than in all of them. The proposed algorithms are based on two previous algorithms, qPMSPruneI and qPMS7, which traverse a search tree starting from an l-length substring of an input string. Several techniques are introduced to improve these algorithms by reducing the computation time of the traversal. Computational experiments show that the proposed algorithms outperform the previous ones.
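The problem definition in the abstract can be made concrete with a small brute-force sketch. This is not the paper's method and does not reproduce qPMSPruneI or qPMS7; it only enumerates the d-neighborhood of l-length substrings and checks each candidate against the definition. The function names (`hamming`, `occurs_within`, `neighbors`, `planted_motifs`) and the default DNA alphabet are assumptions made for illustration.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, s, d):
    """True if some len(motif)-length substring of s has at most d mismatches."""
    l = len(motif)
    return any(hamming(motif, s[i:i + l]) <= d for i in range(len(s) - l + 1))


def neighbors(x, d, alphabet):
    """All strings over `alphabet` within Hamming distance d of x."""
    out = set()
    for k in range(d + 1):
        for positions in combinations(range(len(x)), k):
            for letters in product(alphabet, repeat=k):
                y = list(x)
                for p, c in zip(positions, letters):
                    y[p] = c
                out.add("".join(y))
    return out


def planted_motifs(strings, l, d, q=None, alphabet="ACGT"):
    """Illustrative brute force; q=None requires the motif in every string."""
    n = len(strings)
    q = n if q is None else q
    # A motif may be absent from at most n - q strings, so it must occur
    # (within d mismatches) in at least one of the first n - q + 1 strings.
    candidates = set()
    for s in strings[:n - q + 1]:
        for i in range(len(s) - l + 1):
            candidates |= neighbors(s[i:i + l], d, alphabet)
    # Keep candidates that satisfy the quorum over all input strings.
    return sorted(m for m in candidates
                  if sum(occurs_within(m, s, d) for s in strings) >= q)


# Quorum example: "ACG" occurs exactly in two of the three strings.
print(planted_motifs(["AACGT", "CCACG", "TTTTT"], l=3, d=0, q=2))  # ['ACG']
```

The candidate set grows exponentially with d, which is the cost that tree-search algorithms with pruning aim to reduce.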


    Cited By

    • (2024) A new efficient quorum planted (ℓ, d) motif search on ChIP-seq dataset using segmentation to filtration and freezing firefly algorithms. Soft Computing - A Fusion of Foundations, Methodologies and Applications 28:4 (3049-3070). DOI: 10.1007/s00500-023-09236-z. Online publication date: 1-Feb-2024
    • (2018) Randomised sequential and parallel algorithms for efficient quorum planted motif search. International Journal of Data Mining and Bioinformatics 18:2 (105-124). DOI: 10.1504/IJDMB.2017.086457. Online publication date: 23-Dec-2018
    • (2018) Parallel implementation of quorum planted (ℓ, d) motif search on multi-core/many-core platforms. Microprocessors & Microsystems 46:PB (255-263). DOI: 10.1016/j.micpro.2016.06.008. Online publication date: 28-Dec-2018
    • (2016) Mining Contiguous Sequential Generators in Biological Sequences. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 13:5 (855-867). DOI: 10.1109/TCBB.2015.2495132. Online publication date: 1-Sep-2016

    Information

    Published In

    IEEE/ACM Transactions on Computational Biology and Bioinformatics  Volume 11, Issue 2
    March/April 2014
    160 pages

    Publisher

    IEEE Computer Society Press

    Washington, DC, United States

    Publication History

    Published: 01 March 2014
    Accepted: 07 February 2014
    Revised: 06 January 2014
    Received: 27 May 2013
    Published in TCBB Volume 11, Issue 2

    Author Tags

    1. closest substring problem
    2. exact enumerative algorithm
    3. planted (l, d)-motif search problem
    4. tree search

    Qualifiers

    • Article
