Abstract
With the growing volume of data, mining frequent itemsets remains of paramount importance. Frequent itemsets have applications in various domains, such as market basket analysis, clustering, classification, software bug detection, and web mining. In recent years, several data structures have been employed to mine frequent itemsets. Unfortunately, many of them proved inefficient in run time or memory. This led to the design of hybrid frameworks that use a combination of two or more data structures to extract frequent itemsets, exploiting the benefits of the different data structures while minimizing their drawbacks. This paper employs a tree-based data structure named NegNodesets together with the list-based structure N-list to develop a novel hybrid framework for mining frequent itemsets. NegNodesets have the advantage of employing bitmaps to generate a concise representation of itemsets. The N-list structure, on the other hand, relies on a list-based intersection operation for generating frequent itemsets, which is much faster than other conventional approaches. Transaction merging is used in this work to reduce the run time by merging several transactions into a single itemset. A switching criterion based on the length of the nodelist is used to switch between the algorithms. The efficacy of this approach is further enhanced by a hash-based mechanism for generating the final set of frequent itemsets. The algorithms are implemented in Java. A simulation analysis is carried out to evaluate the run time and memory consumption of the proposed approach and to compare it with some existing approaches. The comparative analysis shows that the proposed NPLengthSwitch consumes less memory and run time than the other techniques.
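The abstract does not spell out how transaction merging works. As a rough illustration only, the sketch below assumes that merging means collapsing transactions that contain exactly the same items into one entry with a multiplicity count, so that support counting visits each distinct itemset once. The class name `TransactionMerging` and the methods `merge` and `support` are hypothetical and are not taken from the paper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TransactionMerging {

    /**
     * Collapses identical transactions into one entry each.
     * Assumption: two transactions are "identical" when they hold the same
     * set of items, regardless of item order or repeated items.
     */
    static Map<List<String>, Integer> merge(List<List<String>> transactions) {
        Map<List<String>, Integer> merged = new LinkedHashMap<>();
        for (List<String> t : transactions) {
            // Sort and de-duplicate so equal itemsets map to the same key.
            List<String> key = new ArrayList<>(new TreeSet<>(t));
            merged.merge(key, 1, Integer::sum);
        }
        return merged;
    }

    /** Support of an itemset, computed over the merged representation. */
    static int support(Map<List<String>, Integer> merged, Set<String> itemset) {
        int total = 0;
        for (Map.Entry<List<String>, Integer> e : merged.entrySet()) {
            if (e.getKey().containsAll(itemset)) {
                total += e.getValue(); // one check covers all merged copies
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> db = List.of(
            List.of("a", "b", "c"),
            List.of("c", "a", "b"),
            List.of("a", "b"),
            List.of("b", "a"),
            List.of("a", "c"));
        Map<List<String>, Integer> merged = merge(db);
        System.out.println(merged.size() + " distinct itemsets from " + db.size() + " transactions");
        System.out.println("support({a, b}) = " + support(merged, Set.of("a", "b")));
    }
}
```

In this toy database five transactions shrink to three distinct itemsets, and the support of {a, b} is still counted over all original transactions through the stored multiplicities. The actual merging rule used in the paper may differ, for example by merging during tree construction.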
Data availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Jashma Suresh, P., Dinesh Acharya, U. & Reddy, N.S. Mining frequent Itemsets from transaction databases using hybrid switching framework. Multimed Tools Appl 82, 27571–27591 (2023). https://doi.org/10.1007/s11042-023-14484-0
DOI: https://doi.org/10.1007/s11042-023-14484-0