Abstract
Sequential pattern mining is a key data mining task whose aim is to find subsequences that appear frequently in sequences of items (symbols). To provide more flexibility and reveal more valuable patterns, sequential pattern mining with a periodic gap has emerged as an important extension. Algorithms for this task identify repetitive gapped subsequences (patterns) in a sequence. Although this task has many applications, existing methods select patterns based only on their occurrence frequency and ignore the external utility (relative importance) of each symbol. Consequently, they can return many unimportant frequent patterns while missing patterns that are infrequent but extremely important. To address this problem, this chapter presents the novel task of High Average Utility Periodic Gapped Sequential Pattern (HAPP) mining and proposes an efficient algorithm called Nettree for HAPP (NetHAPP), which involves two key steps: support calculation and candidate pattern generation. To calculate the support of patterns, a backtracking strategy is adopted that effectively reduces the time complexity of the algorithm. To reduce the number of candidate patterns, an average utility upper bound method is combined with a pattern join strategy. Extensive experimental results show that NetHAPP is not only more efficient than competing algorithms but can also discover more valuable patterns.
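To make the two quantities in the abstract concrete, the following minimal sketch counts the occurrences of a gapped pattern and scores it by an average utility. It is an illustration under stated assumptions, not the chapter's method: it uses a plain dynamic program rather than the Nettree and backtracking strategy, the function names `count_occurrences` and `average_utility` are hypothetical, and the scoring formula (support times the summed symbol utilities, divided by pattern length) is one plausible reading that the abstract does not spell out.

```python
def count_occurrences(seq, pattern, min_gap, max_gap):
    """Count position tuples matching `pattern` in `seq`, where the number of
    symbols skipped between consecutive matched positions lies in
    [min_gap, max_gap]. Plain dynamic programming, for illustration only."""
    if not pattern:
        return 0
    # ways[i] = number of occurrences of the current pattern prefix ending at i
    ways = [1 if s == pattern[0] else 0 for s in seq]
    for sym in pattern[1:]:
        nxt = [0] * len(seq)
        for i, s in enumerate(seq):
            if s != sym:
                continue
            # A previous position k is valid when min_gap <= i - k - 1 <= max_gap
            lo = max(0, i - 1 - max_gap)
            hi = i - 1 - min_gap
            if hi >= lo:
                nxt[i] = sum(ways[lo:hi + 1])
        ways = nxt
    return sum(ways)


def average_utility(pattern, support, utility):
    """ASSUMED definition: support * (sum of symbol utilities) / pattern length."""
    return support * sum(utility[s] for s in pattern) / len(pattern)


if __name__ == "__main__":
    seq = "abab"
    utility = {"a": 2, "b": 4}  # hypothetical external utilities
    sup = count_occurrences(seq, "ab", min_gap=0, max_gap=2)
    print(sup)                                  # 3
    print(average_utility("ab", sup, utility))  # 9.0
```

Under this scoring, a pattern with few occurrences can still rank highly when its symbols carry large utilities, which is the kind of pattern that frequency-only methods would miss.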
Acknowledgements
This work was partly supported by the National Natural Science Foundation of China (61976240, 52077056) and the Graduate Student Innovation Program of Hebei Province (CXZZBS2020024).
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Wu, Y., Geng, M., Li, Y., Guo, L., Fournier-Viger, P. (2021). NetHAPP: High Average Utility Periodic Gapped Sequential Pattern Mining. In: Kiran, R.U., Fournier-Viger, P., Luna, J.M., Lin, J.CW., Mondal, A. (eds) Periodic Pattern Mining. Springer, Singapore. https://doi.org/10.1007/978-981-16-3964-7_11
DOI: https://doi.org/10.1007/978-981-16-3964-7_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3963-0
Online ISBN: 978-981-16-3964-7
eBook Packages: Computer Science (R0)