research-article

NTP-Miner: Nonoverlapping Three-Way Sequential Pattern Mining

Published: 22 October 2021

Abstract

Nonoverlapping sequential pattern mining is an important type of sequential pattern mining (SPM) with gap constraints, which not only can reveal interesting patterns to users but also can effectively reduce the search space using the Apriori (anti-monotonicity) property. However, the existing algorithms do not focus on attributes of interest to users, meaning that existing methods may discover many frequent patterns that are redundant. To solve this problem, this article proposes a task called nonoverlapping three-way sequential pattern (NTP) mining, where attributes are categorized according to three levels of interest: strong, medium, and weak interest. NTP mining can effectively avoid mining redundant patterns since the NTPs are composed of strong and medium interest items. Moreover, NTPs can avoid serious deviations (the occurrence is significantly different from its pattern) since gap constraints cannot match with strong interest patterns. To mine NTPs, an effective algorithm is put forward, called NTP-Miner, which applies two main steps: support (frequency occurrence) calculation and candidate pattern generation. To calculate the support of an NTP, depth-first and backtracking strategies are adopted, which do not require creating a whole Nettree structure, meaning that many redundant nodes and parent–child relationships do not need to be created. Hence, time and space efficiency is improved. To generate candidate patterns while reducing their number, NTP-Miner employs a pattern join strategy and only mines patterns of strong and medium interest. Experimental results on stock market and protein datasets show that NTP-Miner not only is more efficient than other competitive approaches but can also help users find more valuable patterns. More importantly, NTP mining has achieved better performance than other competitive methods in clustering tasks. Algorithms and data are available at: https://github.com/wuc567/Pattern-Mining/tree/master/NTP-Miner.
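The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (that is available at the repository linked above). It assumes a single gap constraint [min_gap, max_gap] between consecutive pattern items, treats two occurrences as nonoverlapping when they never use the same sequence position at the same pattern index, and uses a greedy leftmost depth-first search with backtracking instead of building a Nettree. The function names `count_nonoverlapping` and `join_candidates`, their parameters, and the representation of patterns as strings are all hypothetical choices made for illustration.

```python
def count_nonoverlapping(seq, pattern, min_gap, max_gap, strong=frozenset()):
    """Count nonoverlapping occurrences of `pattern` in `seq` (greedy sketch).

    `strong` holds the strong-interest characters; by assumption, a gap may
    not skip over any of them.
    """
    if not pattern:
        return 0
    m = len(pattern)
    # used[j] holds sequence positions already consumed at pattern index j.
    used = [set() for _ in range(m)]

    def extend(level, pos, path):
        # Depth-first search for the leftmost valid continuation of `path`.
        if level == m:
            return path
        lo = pos + min_gap + 1
        hi = min(pos + max_gap + 1, len(seq) - 1)
        for nxt in range(lo, hi + 1):
            # A strong-interest character inside the gap invalidates this
            # candidate and every candidate further to the right.
            if any(seq[k] in strong for k in range(pos + 1, nxt)):
                break
            if seq[nxt] == pattern[level] and nxt not in used[level]:
                found = extend(level + 1, nxt, path + [nxt])
                if found is not None:
                    return found
            # Otherwise backtrack and try the next position.
        return None

    count = 0
    for start, ch in enumerate(seq):
        if ch != pattern[0]:
            continue
        occurrence = extend(1, start, [start])
        if occurrence is not None:
            for level, p in enumerate(occurrence):
                used[level].add(p)
            count += 1
    return count


def join_candidates(frequent):
    """Pattern join: p and q (equal length) join when p[1:] == q[:-1]."""
    return sorted({p + q[-1] for p in frequent for q in frequent
                   if p[1:] == q[:-1]})
```

For example, under these assumptions `count_nonoverlapping("acb", "ab", 0, 1)` finds one occurrence, while passing `strong={"c"}` finds none, because the gap would have to absorb the strong-interest character `c`. Joining the length-2 patterns `"ab"` and `"ba"` yields the length-3 candidates `"aba"` and `"bab"`.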

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 3
June 2022
494 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3485152

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 October 2021
Accepted: 01 August 2021
Revised: 01 June 2021
Published in TKDD Volume 16, Issue 3

Author Tags

  1. Sequential pattern mining
  2. frequent pattern
  3. three-way decisions
  4. gap constraint
  5. Apriori property

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • National Key Research and Development Program of China
  • National Science Foundation
  • Natural Science Foundation of Hebei Province, China
  • Graduate Student Innovation Program of Hebei Province
