research-article

Mining frequent Itemsets from transaction databases using hybrid switching framework

Published: 16 February 2023

Abstract

With the growing volume of data, mining frequent itemsets remains of paramount importance. Frequent itemsets have applications in various domains, such as market basket analysis, clustering, classification, software bug detection, and web mining, to name a few. In recent years, several data structures have been employed to mine frequent itemsets. Unfortunately, many of them are inefficient in runtime or memory. This has led to the design of hybrid frameworks that use a combination of two or more data structures to extract frequent itemsets, exploiting the benefits of the different data structures while minimizing their drawbacks. This paper employs a tree-based data structure named NegNodesets in collaboration with the list-based structure N-list to develop a novel hybrid framework for mining frequent itemsets. NegNodesets have the advantage of employing bitmaps to generate a concise representation of itemsets. The N-list structure, on the other hand, relies on list-based intersection operations to generate frequent itemsets, which is much faster than other conventional approaches. A transaction merging concept is used to reduce the run time by merging several transactions into a single itemset. A switching criterion based on the length of the nodelist is used to switch between the two algorithms. The efficacy of this approach is further enhanced by a hash-based mechanism for generating the final set of frequent itemsets. The algorithms are implemented in Java. Simulations were carried out to evaluate the run time and memory consumption of the proposed approach and to compare it with several existing approaches. The comparative analysis shows that the proposed NPLengthSwitch requires less memory and run time than the other techniques.
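To make the list-based side of the framework more concrete, the following Java sketch shows a PrePost-style N-list intersection over PP-codes (pre-order rank, post-order rank, count) of a PPC-tree, a simple form of transaction merging, and a hash map that collects the itemsets meeting a minimum support. It is an illustration under stated assumptions, not the authors' NPLengthSwitch code: the class and method names (`NListSketch`, `mergeTransactions`, `intersect`, `support`), the toy database, the minimum support of 3, and the reading of transaction merging as collapsing identical transactions into one weighted entry are all hypothetical. The NegNodeset/bitmap side and the nodelist-length switching rule are not modelled, because the abstract does not specify them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NListSketch {

    /** PP-code of a PPC-tree node: pre-order rank, post-order rank and node count. */
    static final class PPCode {
        final int pre, post, count;
        PPCode(int pre, int post, int count) { this.pre = pre; this.post = post; this.count = count; }
    }

    /** Assumed form of transaction merging: identical transactions become one weighted entry. */
    static Map<List<String>, Integer> mergeTransactions(List<List<String>> transactions) {
        Map<List<String>, Integer> merged = new LinkedHashMap<>();
        for (List<String> t : transactions) {
            List<String> key = new ArrayList<>(t);
            Collections.sort(key);              // canonical order, so [c, b, a] equals [a, b, c]
            merged.merge(key, 1, Integer::sum); // add 1 to the weight of this transaction
        }
        return merged;
    }

    /**
     * PrePost-style N-list intersection; both inputs are sorted by pre-order rank.
     * x is an ancestor of y iff x.pre < y.pre and x.post > y.post. Each match keeps
     * x's code with y's count, and consecutive matches on the same x are summed.
     */
    static List<PPCode> intersect(List<PPCode> ancestors, List<PPCode> descendants) {
        List<PPCode> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < ancestors.size() && j < descendants.size()) {
            PPCode x = ancestors.get(i), y = descendants.get(j);
            if (x.pre < y.pre) {
                if (x.post > y.post) {          // x is an ancestor of y
                    int last = result.size() - 1;
                    if (last >= 0 && result.get(last).pre == x.pre) {
                        result.set(last, new PPCode(x.pre, x.post, result.get(last).count + y.count));
                    } else {
                        result.add(new PPCode(x.pre, x.post, y.count));
                    }
                    j++;
                } else {
                    i++;                        // y lies beyond x's subtree, and so will every later y
                }
            } else {
                j++;                            // no remaining x can be an ancestor of y
            }
        }
        return result;
    }

    /** The support of an itemset is the sum of the counts in its N-list. */
    static int support(List<PPCode> nlist) {
        int sum = 0;
        for (PPCode code : nlist) sum += code.count;
        return sum;
    }

    public static void main(String[] args) {
        List<List<String>> db = Arrays.asList(
                Arrays.asList("a", "b", "c"), Arrays.asList("c", "b", "a"),
                Arrays.asList("a", "b"), Arrays.asList("a", "c"), Arrays.asList("b", "c"));
        System.out.println(mergeTransactions(db)); // {[a, b, c]=2, [a, b]=1, [a, c]=1, [b, c]=1}

        // N-lists of single items, read off by hand from the PPC-tree of db
        // (item order a, b, c; the root has pre-order rank 0).
        Map<String, List<PPCode>> nlists = new LinkedHashMap<>();
        nlists.put("a", Arrays.asList(new PPCode(1, 3, 4)));
        nlists.put("b", Arrays.asList(new PPCode(2, 1, 3), new PPCode(5, 5, 1)));
        nlists.put("c", Arrays.asList(new PPCode(3, 0, 2), new PPCode(4, 2, 1), new PPCode(6, 4, 1)));
        nlists.put("ab", intersect(nlists.get("a"), nlists.get("b")));
        nlists.put("ac", intersect(nlists.get("a"), nlists.get("c")));
        nlists.put("bc", intersect(nlists.get("b"), nlists.get("c")));
        // 3-itemset: join the N-lists sharing suffix c, with a's list on the ancestor side
        nlists.put("abc", intersect(nlists.get("ac"), nlists.get("bc")));
        System.out.println("support(abc) = " + support(nlists.get("abc"))); // support(abc) = 2

        // Hash-based collection of the final result (hypothetical minimum support of 3)
        int minSup = 3;
        Map<String, Integer> frequent = new HashMap<>();
        for (Map.Entry<String, List<PPCode>> e : nlists.entrySet()) {
            int s = support(e.getValue());
            if (s >= minSup) frequent.put(e.getKey(), s);
        }
        System.out.println(new TreeMap<>(frequent)); // {a=4, ab=3, ac=3, b=4, bc=3, c=4}
    }
}
```

The intersection is a single linear merge: each N-list is sorted by pre-order rank, and a node's descendants occupy a contiguous pre-order range, so neither pointer ever needs to move backwards. This is the property that makes list-based intersection cheap compared with repeated database scans. In the toy run, the 3-itemset abc has support 2 and is therefore left out of the final map.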

Information

Published In

Multimedia Tools and Applications, Volume 82, Issue 18
Jul 2023
1551 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 16 February 2023
Accepted: 31 January 2023
Revision received: 21 April 2022
Received: 13 September 2021

Author Tags

  1. Data mining
  2. Frequent itemsets
  3. Data structures
  4. Hashing
  5. Optimization
