Incremental clickstream pattern mining with search boundaries

Published: 25 June 2024

Abstract

    Sequential pattern mining, and clickstream pattern mining in particular, has attracted growing interest in data mining because of its potential for discovering valuable patterns. However, traditional mining algorithms in these domains often assume that databases are static, which simplifies the mining process. In reality, databases are updated incrementally over time, which invalidates part of the previous results. Obtaining accurate frequent patterns then requires rerunning the algorithms on the updated databases, which becomes increasingly time-consuming as the database grows. To tackle this issue, we propose PSB-CUP, which mines frequent clickstream patterns in an incremental manner. PSB-CUP employs the concept of search borders to reduce the search space and the information retained in memory. Furthermore, we propose an IDList generation method called “partial imbalance join” to reconstruct information that may be missing during the incremental process. This join method, however, requires extra information to be cached in exchange for speed. We then improve this technique in the PSB-CUP+ algorithm by introducing “recursive imbalance join”, which removes the need for the extra cached data. The experimental results show that our proposed algorithms are efficient for incremental clickstream pattern mining.
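
The abstract does not describe how PSB-CUP stores or updates its data, so the following is only a toy sketch of the general idea of incremental updating with a border band, not the authors' algorithm or their join methods. It assumes single-click patterns, a vertical layout in which each item keeps a list of (sequence id, first position) entries, and two relative thresholds. All names (`IncrementalItemCounts`, `min_sup`, `lower_sup`, `add_batch`) are hypothetical.

```python
from collections import defaultdict


class IncrementalItemCounts:
    """Toy incremental support counter for single-click patterns.

    Illustrative only: this is NOT PSB-CUP. It shows how new sequences
    can be folded into stored per-item lists without rescanning old data.
    """

    def __init__(self, min_sup, lower_sup):
        # min_sup: relative support needed to be reported as frequent.
        # lower_sup: relative support needed to count as a border candidate.
        if not 0 < lower_sup <= min_sup <= 1:
            raise ValueError("require 0 < lower_sup <= min_sup <= 1")
        self.min_sup = min_sup
        self.lower_sup = lower_sup
        self.n_sequences = 0
        # item -> list of (sequence id, position of first occurrence)
        self.idlists = defaultdict(list)

    def add_batch(self, sequences):
        """Append new click sequences; earlier sequences are not rescanned."""
        for seq in sequences:
            sid = self.n_sequences
            self.n_sequences += 1
            seen = set()
            for pos, item in enumerate(seq):
                if item not in seen:  # count each item once per sequence
                    seen.add(item)
                    self.idlists[item].append((sid, pos))

    def support(self, item):
        """Number of sequences that contain the item."""
        return len(self.idlists.get(item, ()))

    def frequent(self):
        """Items whose support meets min_sup on the current database."""
        cut = self.min_sup * self.n_sequences
        return sorted(i for i, ids in self.idlists.items() if len(ids) >= cut)

    def border(self):
        """Items below min_sup but at or above lower_sup."""
        hi = self.min_sup * self.n_sequences
        lo = self.lower_sup * self.n_sequences
        return sorted(i for i, ids in self.idlists.items() if lo <= len(ids) < hi)
```

Calling `add_batch` a second time changes the results of `frequent()` and `border()` by touching only the new sequences. An item that was frequent before the update can fall into the border band, and a border item can become frequent.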

    Published In

    Information Sciences: an International Journal, Volume 662, Issue C
    Mar 2024
    1436 pages

    Publisher

    Elsevier Science Inc.

    United States

    Author Tags

    1. Clickstream pattern mining
    2. Pre-large concept
    3. Progressive search border
    4. Incremental pattern mining

    Qualifiers

    • Research-article
