NetHAPP: High Average Utility Periodic Gapped Sequential Pattern Mining

Chapter in Periodic Pattern Mining

Abstract

Sequential pattern mining is a key data mining task that aims to find subsequences appearing frequently in sequences of items (symbols). To provide more flexibility and reveal more valuable patterns, sequential pattern mining with periodic gaps has emerged as an important extension. Algorithms for this task identify repetitive gapped subsequences (patterns) in a sequence. Although this task has many applications, patterns are selected based only on their occurrence frequency, and the external utility (relative importance) of each symbol is ignored. Consequently, these methods can return many unimportant frequent patterns and miss patterns that occur rarely but are extremely important. To address this problem, this chapter presents the novel task of High Average Utility Periodic Gapped Sequential Pattern (HAPP) mining and proposes an efficient algorithm called Nettree for HAPP (NetHAPP), which involves two key steps: support calculation and candidate pattern generation. To calculate the support of patterns, a backtracking strategy is adopted that effectively reduces the time complexity of the algorithm. To reduce the number of candidate patterns, an average utility upper bound method is combined with a pattern join strategy. Extensive experimental results show that NetHAPP is not only more efficient than competing algorithms but can also discover more valuable patterns.
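
To make the two quantities named in the abstract concrete, the following is a minimal sketch of how the support of a gapped pattern and an average utility score might be computed. It is not the Nettree-based backtracking strategy of NetHAPP; it uses a plain dynamic program instead. The function names, the gap convention (the number of skipped positions between consecutive pattern symbols must lie in `[min_gap, max_gap]`), and the average utility formula (support times the summed symbol utilities, divided by pattern length) are all assumptions made for illustration and may differ from the chapter's definitions.

```python
from typing import Mapping, Sequence


def gapped_support(seq: Sequence[str], pattern: Sequence[str],
                   min_gap: int, max_gap: int) -> int:
    """Count occurrences of `pattern` in `seq` where the number of skipped
    positions between consecutive pattern symbols lies in [min_gap, max_gap].
    (Assumed gap convention; a dynamic-programming sketch, not Nettree.)"""
    if not pattern:
        return 0
    n = len(seq)
    # ways[i] = number of occurrences of the current pattern prefix ending at i
    ways = [1 if seq[i] == pattern[0] else 0 for i in range(n)]
    for symbol in pattern[1:]:
        nxt = [0] * n
        for i in range(n):
            if seq[i] != symbol:
                continue
            # The previous symbol sits at k with min_gap <= i - k - 1 <= max_gap.
            lo = max(0, i - max_gap - 1)
            hi = i - min_gap - 1
            if hi >= lo:
                nxt[i] = sum(ways[lo:hi + 1])
        ways = nxt
    return sum(ways)


def average_utility(seq: Sequence[str], pattern: Sequence[str],
                    utility: Mapping[str, float],
                    min_gap: int, max_gap: int) -> float:
    """Assumed definition: support * (sum of symbol utilities) / pattern length."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    support = gapped_support(seq, pattern, min_gap, max_gap)
    return support * sum(utility[s] for s in pattern) / len(pattern)


if __name__ == "__main__":
    # Hypothetical toy data: sequence "abab", utilities a=2, b=4, gap in [0, 2].
    print(gapped_support("abab", "ab", 0, 2))                     # 3
    print(average_utility("abab", "ab", {"a": 2, "b": 4}, 0, 2))  # 9.0
```

Under this scoring, a pattern that occurs rarely can still outrank a frequent one if its symbols carry high utility, which is the motivation the abstract gives for moving beyond frequency alone.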

Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (61976240, 52077056) and the Graduate Student Innovation Program of Hebei Province (CXZZBS2020024).

Author information

Corresponding author

Correspondence to Philippe Fournier-Viger.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Wu, Y., Geng, M., Li, Y., Guo, L., Fournier-Viger, P. (2021). NetHAPP: High Average Utility Periodic Gapped Sequential Pattern Mining. In: Kiran, R.U., Fournier-Viger, P., Luna, J.M., Lin, J.CW., Mondal, A. (eds) Periodic Pattern Mining. Springer, Singapore. https://doi.org/10.1007/978-981-16-3964-7_11

  • DOI: https://doi.org/10.1007/978-981-16-3964-7_11

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-3963-0

  • Online ISBN: 978-981-16-3964-7

  • eBook Packages: Computer Science, Computer Science (R0)
