Abstract
Distinguishing subsequence pattern mining aims to discover the differences between sequence databases of different classes and to express the characteristics of each class. It plays an important role in biomedicine, feature selection, time-series classification, and other areas. Existing methods consider only whether a pattern appears in a sequence. They ignore both the number of times the pattern occurs in the sequence and the proportion of the pattern in the entire sequence database, which hampers the discovery of distinguishing patterns when there are many irrelevant occurrences. We therefore propose a nonoverlapping conditional distinguishing subsequence pattern mining algorithm. The algorithm counts nonoverlapping occurrences, which effectively reduces the number of irrelevant or redundant occurrences and gives a more reliable measure of how often a pattern occurs. It also uses a specially designed data structure, the Nettree, to avoid backtracking. In addition, we use the distinguishing patterns as classification features and carry out classification experiments on DNA sequences and time-series data with two classes. Extensive experimental results and comparisons demonstrate the efficiency of the proposed algorithm and the correctness of the feature extraction.
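To illustrate why counting occurrences can separate classes that a presence-only test cannot, the following is a minimal sketch under strong simplifying assumptions. It handles only contiguous patterns with no gap constraints, and it takes "nonoverlapping" to mean that two occurrences share no sequence position, which may differ from the definition used in the paper. The function names, the occurrence-rate measure (occurrences divided by total database length), and the toy sequences are all hypothetical. The Nettree-based algorithm is not reproduced here.

```python
def nonoverlapping_count(sequence: str, pattern: str) -> int:
    """Count left-to-right occurrences of a contiguous pattern that share no position."""
    if not pattern:
        return 0
    # str.count scans left to right and counts non-overlapping matches.
    return sequence.count(pattern)


def presence_rate(database: list[str], pattern: str) -> float:
    """Fraction of sequences containing the pattern at least once (presence-only view)."""
    if not database:
        return 0.0
    return sum(1 for s in database if pattern in s) / len(database)


def occurrence_rate(database: list[str], pattern: str) -> float:
    """Nonoverlapping occurrences divided by total database length (an assumed measure)."""
    total_length = sum(len(s) for s in database)
    if total_length == 0:
        return 0.0
    return sum(nonoverlapping_count(s, pattern) for s in database) / total_length


# Toy two-class data, invented for illustration only.
positive = ["ACACACGT", "TTACAC"]
negative = ["ACGTTTTT", "GGACGG"]
pattern = "AC"

presence_contrast = presence_rate(positive, pattern) - presence_rate(negative, pattern)
occurrence_contrast = occurrence_rate(positive, pattern) - occurrence_rate(negative, pattern)

print(f"presence contrast:   {presence_contrast:.3f}")
print(f"occurrence contrast: {occurrence_contrast:.3f}")
```

In this toy data the pattern appears in every sequence of both classes, so the presence-only contrast is zero, while the occurrence-based contrast is positive because the pattern repeats more often in the first class.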
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61673159, in part by the Natural Science Foundation of Hebei Province under Grant F2016202145, in part by the Science and Technology Project of Hebei Province under Grant 15210325, and in part by the Graduate Student Innovation Program of Hebei Province under Grant CXZZSS2017037.
Cite this article
Wu, Y., Wang, Y., Liu, J. et al. Mining distinguishing subsequence patterns with nonoverlapping condition. Cluster Comput 22 (Suppl 3), 5905–5917 (2019). https://doi.org/10.1007/s10586-017-1671-0
DOI: https://doi.org/10.1007/s10586-017-1671-0