Mining frequent Itemsets from transaction databases using hybrid switching framework

Multimedia Tools and Applications

Abstract

With the growing volume of data, mining frequent itemsets remains of paramount importance. Frequent itemsets have applications in domains such as market basket analysis, clustering, classification, software bug detection, and web mining, to name a few. In recent years, several data structures have been employed to mine frequent itemsets. Unfortunately, many of them proved inefficient in run time or memory. This led to the design of hybrid frameworks, which combine two or more data structures to extract frequent itemsets, exploiting the benefits of each structure while minimizing its drawbacks. This paper combines NegNodesets, a tree-based data structure, with the list-based N-list structure to develop a novel hybrid framework for mining frequent itemsets. NegNodesets have the advantage of employing bitmaps to generate a concise representation of itemsets. The N-list structure, on the other hand, relies on a list-based intersection operation to generate frequent itemsets, which is much faster than conventional approaches. Transaction merging is used in this work to reduce run time by merging several transactions into a single itemset. A switching criterion based on the length of the nodelist determines when to switch between the algorithms. The efficiency of the approach is further improved by a hash-based mechanism for generating the final set of frequent itemsets. The algorithms are implemented in Java. A simulation analysis evaluates the run time and memory consumption of the proposed approach and compares them with those of some existing approaches. The comparative analysis shows that the proposed NPLengthSwitch consumes less memory and run time than the other techniques.
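The transaction merging idea mentioned in the abstract can be illustrated with a minimal Java sketch. The preview does not give the paper's exact merging rule, so this sketch assumes the simplest variant: transactions containing the same items are merged into one itemset that carries a count. The class name `TransactionMergingSketch` and the methods `mergeTransactions` and `support` are hypothetical names chosen for illustration, not taken from the paper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class TransactionMergingSketch {

    /**
     * Merges transactions that contain the same items (assumed merging rule).
     * The map value records how many original transactions each merged itemset stands for.
     */
    public static Map<List<String>, Integer> mergeTransactions(List<List<String>> transactions) {
        Map<List<String>, Integer> merged = new LinkedHashMap<>();
        for (List<String> t : transactions) {
            // Sort and de-duplicate items so that {b, a} and {a, b} map to the same key.
            List<String> key = new ArrayList<>(new TreeSet<>(t));
            merged.merge(key, 1, Integer::sum);
        }
        return merged;
    }

    /** Computes the support of an itemset by scanning the merged transactions only. */
    public static int support(Map<List<String>, Integer> merged, List<String> itemset) {
        int total = 0;
        for (Map.Entry<List<String>, Integer> e : merged.entrySet()) {
            if (e.getKey().containsAll(itemset)) {
                total += e.getValue(); // each merged entry counts for all its originals
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> db = List.of(
            List.of("a", "b", "c"),
            List.of("c", "a", "b"),
            List.of("a", "b"),
            List.of("b", "a"),
            List.of("c"));
        Map<List<String>, Integer> merged = mergeTransactions(db);
        System.out.println(merged);                             // {[a, b, c]=2, [a, b]=2, [c]=1}
        System.out.println(support(merged, List.of("a", "b"))); // 4
    }
}
```

In this toy database, five transactions collapse to three merged entries, so a support count scans three itemsets instead of five while returning the same result. Any saving on real data depends on how many transactions actually coincide.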

Data availability

Data sharing is not applicable to this article, as no datasets were generated or analysed during the current study.

Author information

Corresponding author

Correspondence to U Dinesh Acharya.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jashma Suresh, P., Dinesh Acharya, U. & Reddy, N.S. Mining frequent Itemsets from transaction databases using hybrid switching framework. Multimed Tools Appl 82, 27571–27591 (2023). https://doi.org/10.1007/s11042-023-14484-0

  • DOI: https://doi.org/10.1007/s11042-023-14484-0

Keywords