An efficient polynomial space and polynomial delay algorithm for enumeration of maximal motifs in a sequence

Journal of Combinatorial Optimization

Abstract

In this paper, we consider the problem of enumerating all maximal motifs in an input string for the class of repeated motifs with wild cards. A maximal motif is a representative motif that is not properly contained in any larger motif with the same location list. Although the enumeration problem for maximal motifs with wild cards has been studied in Parida et al. (2001), Pisanti et al. (2003) and Pelfrêne et al. (2003), whether it can be solved in output-polynomial time has remained open. The main result of this paper is a polynomial space, polynomial delay algorithm for the maximal motif enumeration problem for repeated motifs with wild cards. This algorithm enumerates all maximal motifs in an input string of length n in O(n^3) time per motif with O(n) space, and in particular with O(n^3) delay. The key to the algorithm is a depth-first search on a tree-shaped search route over all maximal motifs, based on a technique called prefix-preserving closure extension. We also show an exponential lower bound and a succinctness result on the number of maximal motifs, which indicate the limits of a straightforward approach. The results of computational experiments show that our algorithm is applicable in practice to huge string data such as genome data, and that it does not incur a large additional computational cost compared to usual frequent motif mining algorithms.
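
To make the notions of location list and maximality concrete, the following is a minimal Python sketch and not the algorithm of the paper. It assumes that a wild card is written as "." and does not occur in the text, that a location list is the list of start positions at which a motif matches, and that the most specific motif for a set of positions can be obtained by naively merging the text aligned at those positions. The function names locations and closure are hypothetical and chosen only for illustration.

```python
WILDCARD = "."  # assumed wild card symbol; must not occur in the text


def locations(text, motif):
    """Return the start positions where motif matches text.

    A wild card in the motif matches any single letter of the text.
    """
    m = len(motif)
    return [
        p
        for p in range(len(text) - m + 1)
        if all(c == WILDCARD or c == text[p + i] for i, c in enumerate(motif))
    ]


def closure(text, positions):
    """Merge the text aligned at the given positions into one motif.

    For every offset that stays inside the text for all positions, keep the
    letter if all aligned copies agree, otherwise emit a wild card. Leading
    and trailing wild cards are trimmed, so the result may start to the left
    of the original positions (its location list is then a shifted copy).
    """
    if not positions:
        return ""
    lo, hi = min(positions), max(positions)
    cells = []
    for d in range(-lo, len(text) - hi):
        letters = {text[p + d] for p in positions}
        cells.append(letters.pop() if len(letters) == 1 else WILDCARD)
    return "".join(cells).strip(WILDCARD)


if __name__ == "__main__":
    text = "axbzayb"
    occ = locations(text, "a")
    print(occ)                      # start positions of the motif "a"
    print(closure(text, occ))       # larger motif with the same occurrences
```

In this toy input, the motif "a" is not maximal under the assumptions above, because the larger motif "a.b" matches at exactly the same positions. The sketch takes time proportional to the text length times the number of occurrences per call and makes no attempt to reproduce the prefix-preserving closure extension or the delay bound stated in the abstract.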

References

  • Apostolico A, Comin M, Parida L (2005) Conservative extraction of over-represented extensible motifs. ISMB (Supplement of Bioinformatics) 21:9–18

  • Apostolico A, Parida L (2003) Compression and the wheel of fortune. In: Proceedings of the 2003 data compression conference (DCC'03), IEEE

  • Arimura H, Uno T (2005) An output-polynomial time algorithm for mining frequent closed attribute trees. In: Proceedings of the ILP'05, LNAI 3625, pp 1–19

  • Arimura H, Shinohara T, Otsuki S (1994) Finding minimal generalizations for unions of pattern languages and its application to inductive inference from positive data. In: STACS'94, LNCS 775, Springer-Verlag, pp 649–660

  • Boros E, Gurvich V, Khachiyan L, Makino K (2002) The complexity of generating maximal frequent and minimal infrequent sets. In: Proceedings of the STACS '02, LNCS, pp 133–141

  • Crochemore M, Rytter W (2002) Jewels of stringology. World Scientific

  • Goldberg LA (1993) Polynomial space polynomial delay algorithms for listing families of graphs. In: Proceedings of the 25th STOC, ACM, pp 218–225

  • Gusfield D (1997) Algorithms on strings, trees, and sequences. Cambridge

  • Parida L, Rigoutsos I, Floratos A, Platt D, Gao Y (2000) Pattern discovery on character sets and real-valued data: linear bound on irredundant motifs and efficient polynomial time algorithm. In: Proceedings of the 11th SIAM symposium on discrete algorithms (SODA'00), pp 297–308

  • Parida L, Rigoutsos I, Platt DE (2001) An output-sensitive flexible pattern discovery algorithm. In: Proceedings of the CPM'01, LNCS 2089, pp 131–142

  • Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proceedings of the ICDT'99, pp 398–416

  • Pelfrêne J, Abdeddaim S, Alexandre J (2003) Extending approximate patterns. In: Proceedings of the CPM'03, LNCS 2676, pp 328–347

  • Pisanti N, Crochemore M, Grossi R, Sagot M-F (2003) A basis of tiling motifs for generating repeated patterns and its complexity for higher quorum. In: Proceedings of the MFCS'03, LNCS 2747, pp 622–631

  • Pisanti N, Crochemore M, Grossi R, Sagot M-F (2004) A comparative study of bases for motif inference. String algorithmics. KCL publications

  • Uno T (2003) Two general methods to reduce delay and change of enumeration algorithms, NII Technical Report, NII-2003-004E, April 2003

  • Uno T, Asai T, Uchida Y, Arimura H (2004) An efficient algorithm for enumerating closed patterns in transaction databases. In: Proceedings of the DS'04, LNAI 3245, pp 16–30

  • Valiant LG (1979) The complexity of computing the permanent. Theor Comput Sci 8:189–201

  • Yan X, Han J (2003) CloseGraph: mining closed frequent graph patterns. In: Proceedings of the SIGKDD'03

Author information

Corresponding author

Correspondence to Takeaki Uno.

Additional information

This work was done during Hiroki Arimura's visit to LIRIS, University Claude-Bernard Lyon 1, France.

About this article

Cite this article

Arimura, H., Uno, T. An efficient polynomial space and polynomial delay algorithm for enumeration of maximal motifs in a sequence. J Comb Optim 13, 243–262 (2007). https://doi.org/10.1007/s10878-006-9029-1

  • DOI: https://doi.org/10.1007/s10878-006-9029-1
