Abstract
High utility itemset mining addresses the limitations of frequent itemset mining by introducing measures of interestingness that reflect the significance of an itemset beyond its frequency of occurrence. Among existing high utility itemset mining algorithms, level-wise candidate generation-and-test approaches produce an immense candidate pool and require several database scans, whereas pattern-growth methods tend to consume large amounts of memory to store conditional trees. We propose an efficient algorithm, called Index High Utility Itemsets Mine (IHUI-Mine), for mining high utility itemsets. IHUI-Mine extends the subsume index, previously employed to mine frequent itemsets, to the discovery of high utility itemsets. In addition to the enumeration and search strategies inherited from the subsume index, we introduce a new property that specifically accelerates the computation of transaction-weighted utilization for high utility itemsets. Furthermore, because the database is represented with bitmaps, the actual utility of each candidate can be verified from the transactions recorded for it rather than by scanning the entire database. We analyze the computational complexity of IHUI-Mine, and experiments on publicly available synthetic and real datasets demonstrate that the proposed algorithm outperforms existing state-of-the-art algorithms.
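The three ingredients named in the abstract (per-item bitmaps, transaction-weighted utilization computed from the transaction set g(X), and actual utility verified only over the recorded transactions) can be illustrated with a minimal Python sketch. It assumes a hypothetical toy database and unit-profit table, uses Python integers as bitsets, and its helper names (`build_bitmaps`, `g`, `twu`, `subsume`, `utility`) are illustrative, not taken from the paper; it is not the IHUI-Mine implementation itself.

```python
from functools import reduce

# Hypothetical toy database: each transaction maps item -> purchased quantity.
DB = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "d": 4},
    {"a": 1, "b": 1, "c": 2, "d": 1},
]
# Hypothetical external utilities (unit profits).
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}


def transaction_utility(t):
    """tu(T): sum of quantity * unit profit over all items in T."""
    return sum(q * PROFIT[x] for x, q in t.items())


TU = [transaction_utility(t) for t in DB]


def build_bitmaps(db):
    """Vertical bitmap per item: bit k is set iff transaction k contains the item."""
    bitmaps = {}
    for k, t in enumerate(db):
        for x in t:
            bitmaps[x] = bitmaps.get(x, 0) | (1 << k)
    return bitmaps


BM = build_bitmaps(DB)
ALL = (1 << len(DB)) - 1  # bitmap of every transaction (g of the empty set)


def g(itemset):
    """g(X) as a bitmap: bitwise AND of the bitmaps of the items in X."""
    return reduce(lambda acc, x: acc & BM[x], itemset, ALL)


def tids(bitmap):
    """Transaction ids whose bits are set in the bitmap."""
    return [k for k in range(len(DB)) if bitmap >> k & 1]


def twu(itemset):
    """TWU(X): sum of tu(T) over the transactions recorded in g(X)."""
    return sum(TU[k] for k in tids(g(itemset)))


def subsume(i):
    """Items j != i with g(i) a subset of g(j), tested as g(i) & g(j) == g(i)."""
    gi = BM[i]
    return {j for j, gj in BM.items() if j != i and gi & gj == gi}


def utility(itemset):
    """Actual utility of X, read only from the transactions recorded in g(X)."""
    return sum(sum(DB[k][x] * PROFIT[x] for x in itemset) for k in tids(g(itemset)))


print(twu({"a"}), subsume("a"), utility({"a", "c"}))  # 35 {'c'} 26
```

Because g(X) is a single AND over bitmaps, the utility check in `utility` touches only the transactions that actually contain the candidate, which is the saving the abstract describes.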
Acknowledgments
We thank the anonymous reviewers for their very useful comments and suggestions. This work was partly supported by the National Natural Science Foundation of China (Grant 61105045) and North China University of Technology (Grant CCXZ201303).
Appendices
Appendix 1: Proof of Property 4.1
For any \(X\subseteq subsume(i)\), there are two cases.
(1) The case \(X=\emptyset \). Then \(i\cup X= i\), so \(TWU(i)=TWU(i\cup X)\).
(2) The case \(X\ne \emptyset \). Suppose \(X= b_1b_2\ldots b_k\), where \(b_j\in {{\mathbf {I}}}\) \((1\le j\le k)\). Since \(X\subseteq subsume(i)\), every \(b_j\in X\) satisfies \(b_j\in subsume(i)\), so by Definition 4.1, \(g(i)\subseteq g(b_j)\). Thus, \(g(i\cup X)=g(i)\cap g(X)=g(i)\cap g(b_1)\cap g(b_2)\cap \ldots \cap g(b_k)=g(i)\). Because TWU is the sum of the transaction utilities of the transactions in \(g(\cdot )\), we have \(TWU(i)=TWU(i\cup X)\).
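Property 4.1 can be checked exhaustively on a small example. The Python sketch below assumes a hypothetical toy database in which each transaction is stored as its item set together with its transaction utility tu(T); `g`, `twu`, `subsume`, `subsets`, and `check_property_4_1` are illustrative names, and `subsume` encodes Definition 4.1 as the proof above uses it (j is in subsume(i) iff g(i) ⊆ g(j)).

```python
from itertools import chain, combinations

# Hypothetical toy database: (items in the transaction, transaction utility tu(T)).
DB = [({"a", "b", "c"}, 10), ({"a", "c"}, 13), ({"b", "d"}, 14), ({"a", "b", "c", "d"}, 12)]
ITEMS = sorted(set().union(*(t for t, _ in DB)))


def g(itemset):
    """g(X): ids of the transactions that contain every item of X."""
    return {k for k, (t, _) in enumerate(DB) if itemset <= t}


def twu(itemset):
    """TWU(X): sum of tu(T) over the transactions in g(X)."""
    return sum(DB[k][1] for k in g(itemset))


def subsume(i):
    """Items j != i such that g(i) is a subset of g(j)."""
    return {j for j in ITEMS if j != i and g({i}) <= g({j})}


def subsets(s):
    """All subsets of s, from the empty tuple up to s itself."""
    s = sorted(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))


def check_property_4_1():
    """TWU(i | X) == TWU(i) for every item i and every X within subsume(i), X empty included."""
    return all(
        twu({i} | set(x)) == twu({i})
        for i in ITEMS
        for x in subsets(subsume(i))
    )


print(check_property_4_1())  # True
```

Both cases of the proof are covered: the empty tuple produced by `combinations(s, 0)` is case (1), and every non-empty combination is case (2).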
Appendix 2: Proof of Property 4.2
We prove the property by contradiction. Suppose there exists \(j\notin subsume(i)\) such that \(i\cup j\) is an HTWUI. Because the set returned by the function g(X) (described in Sect. 4.1) can only shrink as items are added to X, and \(i\subseteq i\cup j\), we have \(g(i\cup j)\subseteq g(i)\), i.e., \(TWU(i\cup j)\le TWU(i)\). If \(TWU(i\cup j)=TWU(i)\), then, since transaction utilities are positive, \(g(i)=g(i\cup j)=g(i)\cap g(j)\), which leads to \(g(i)\subseteq g(j)\). By Definition 4.1, \(j\in subsume(i)\), which contradicts the choice of j. Thus, \(TWU(i\cup j)<TWU(i) = min\_util\), so \(i\cup j\) is not an HTWUI, contradicting the assumption.
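The statement of Property 4.2 is not reproduced in this excerpt, so the Python sketch below follows only the hypothesis used in its proof: TWU(i) equals min_util, transaction utilities are positive, and an HTWUI is an itemset X with TWU(X) ≥ min_util. The toy database and the names `g`, `twu`, `subsume`, `is_htwui`, and `check_property_4_2` are hypothetical and serve only to illustrate the pruning effect.

```python
# Hypothetical toy database: (items, tu(T)); all transaction utilities are positive.
DB = [({"a", "b", "c"}, 10), ({"a", "c"}, 13), ({"b", "d"}, 14), ({"a", "b", "c", "d"}, 12)]
ITEMS = sorted(set().union(*(t for t, _ in DB)))


def g(itemset):
    """g(X): ids of the transactions that contain every item of X."""
    return {k for k, (t, _) in enumerate(DB) if itemset <= t}


def twu(itemset):
    """TWU(X): sum of tu(T) over the transactions in g(X)."""
    return sum(DB[k][1] for k in g(itemset))


def subsume(i):
    """Items j != i such that g(i) is a subset of g(j) (reading of Definition 4.1)."""
    return {j for j in ITEMS if j != i and g({i}) <= g({j})}


def is_htwui(itemset, min_util):
    """High transaction-weighted utilization itemset: TWU(X) >= min_util."""
    return twu(itemset) >= min_util


def check_property_4_2():
    """With min_util = TWU(i), no j outside subsume(i) may make i | {j} an HTWUI."""
    for i in ITEMS:
        min_util = twu({i})  # the boundary case assumed in the proof
        outside = [j for j in ITEMS if j != i and j not in subsume(i)]
        if any(is_htwui({i, j}, min_util) for j in outside):
            return False
    return True


print(check_property_4_2())  # True
# With i = "d" and min_util = TWU(d) = 26, only the subsume item b survives:
print([j for j in ITEMS if j != "d" and is_htwui({"d", j}, 26)])  # ['b']
```

In this example only b, the single member of subsume(d), yields an HTWUI extension of d, so the other extensions need not be generated at all.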
Appendix 3: Proof of Property 4.3
For the left side of the equation:
For the right side of the equation:
So we have
About this article
Cite this article
Song, W., Zhang, Z. & Li, J. A high utility itemset mining algorithm based on subsume index. Knowl Inf Syst 49, 315–340 (2016). https://doi.org/10.1007/s10115-015-0900-1