A high utility itemset mining algorithm based on subsume index

Song, Wei; Zhang, Zihan; Li, Jinhong

doi:10.1007/s10115-015-0900-1


Regular Paper
Published: 09 December 2015

Volume 49, pages 315–340, (2016)

Knowledge and Information Systems



Abstract

High utility itemset mining addresses the limitations of frequent itemset mining by introducing measures of interestingness that reflect the significance of an itemset beyond its frequency of occurrence. Among existing high utility itemset mining algorithms, level-wise candidate generation-and-test approaches suffer from an immense candidate pool and require several database scans, while methods based on pattern growth tend to consume large amounts of memory to store conditional trees. We propose an efficient algorithm, called Index High Utility Itemsets Mine (IHUI-Mine), for mining high utility itemsets. IHUI-Mine extends the subsume index, previously employed to mine frequent itemsets, to the discovery of high utility itemsets. In addition to the enumeration and search strategies inherited from the subsume index, we introduce a new property that specifically accelerates the computation of transaction-weighted utilization for high utility itemsets. Furthermore, because the database is represented as bitmaps, the real utility of each candidate can be verified from the transactions recorded for it rather than by rescanning the entire database. We analyze the computational complexity of IHUI-Mine, and tests conducted on publicly available synthetic and real datasets demonstrate that the proposed algorithm outperforms existing state-of-the-art algorithms.
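The bitmap idea in the abstract can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' IHUI-Mine implementation: the toy database and all function names are hypothetical, and Definition 4.1 is read as $subsume(i)=\{j\ne i : g(i)\subseteq g(j)\}$, which is inferred from the appendix proofs rather than quoted. Each item's bitmap encodes its tidset $g(\cdot)$, so $g(X)$ is a bitwise AND, $TWU(X)$ sums $TU(t)$ over the set bits, and the real utility of a candidate is accumulated only over the transactions recorded in $g(X)$.

```python
from functools import reduce

# Hypothetical toy database: each transaction maps item -> utility u(item, t)
# (quantity already multiplied by unit profit).
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6, "d": 6},
    {"b": 4, "c": 3},
    {"a": 5, "b": 6, "c": 3, "e": 4},
]

def build_bitmaps(db):
    """Vertical bitmaps: bit t of bitmaps[item] is set iff item occurs in transaction t."""
    bitmaps = {}
    for tid, trans in enumerate(db):
        for item in trans:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps

def tids(mask):
    """Yield the transaction ids whose bits are set in mask."""
    tid = 0
    while mask:
        if mask & 1:
            yield tid
        mask >>= 1
        tid += 1

BITMAPS = build_bitmaps(DB)
TU = [sum(t.values()) for t in DB]  # transaction utilities TU(t)

def g(itemset):
    """g(X): bitmap of transactions containing every item of X (AND of item bitmaps)."""
    all_tids = (1 << len(DB)) - 1
    return reduce(lambda mask, item: mask & BITMAPS.get(item, 0), itemset, all_tids)

def twu(itemset):
    """TWU(X): sum of TU(t) over the transactions in g(X)."""
    return sum(TU[t] for t in tids(g(itemset)))

def subsume(i):
    """Assumed reading of Definition 4.1: items j != i whose tidset contains g(i)."""
    gi = BITMAPS[i]
    return {j for j, gj in BITMAPS.items() if j != i and (gi & gj) == gi}

def real_utility(itemset):
    """u(X): summed only over the transactions recorded in g(X), not the whole database."""
    return sum(sum(DB[t][item] for item in itemset) for t in tids(g(itemset)))
```

On this toy data, `subsume("a")` is `{"c"}`, so adding `c` to `a` leaves the TWU unchanged, whereas `real_utility(["a", "b"])` touches only transactions 0 and 3.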



Acknowledgments

We thank the anonymous reviewers for their very useful comments and suggestions. This work was partly supported by the National Natural Science Foundation of China (Grant 61105045) and North China University of Technology (Grant CCXZ201303).

Author information

Authors and Affiliations

School of Computer Science, North China University of Technology, Beijing, China
Wei Song, Zihan Zhang & Jinhong Li


Corresponding author

Correspondence to Wei Song.

Appendices

Appendix 1: Proof of Property 4.1

For any $X\subseteq subsume(i)$, there are two cases.

(1) $X=\emptyset$. Then $i\cup X=i$, so $TWU(i)=TWU(i\cup X)$.

(2) $X\ne \emptyset$. Suppose $X=b_1b_2\ldots b_k$, where $b_j\in \mathbf{I}$ $(1\le j\le k)$. Since $X\subseteq subsume(i)$, every $b_j\in X$ satisfies $b_j\in subsume(i)$, so by Definition 4.1, $g(i)\subseteq g(b_j)$. Thus, $g(i\cup X)=g(i)\cap g(X)=g(i)\cap g(b_1)\cap g(b_2)\cap \cdots \cap g(b_k)=g(i)$. Because $TWU$ is the sum of $TU(t)$ over the transactions $t\in g(\cdot)$, it follows that $TWU(i)=TWU(i\cup X)$.
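Property 4.1 can be spot-checked by brute force on a small example. The sketch below is a hypothetical illustration, not code from the paper: the toy database and function names are invented, tidsets are plain Python sets, and $subsume(i)$ is assumed to be $\{j\ne i : g(i)\subseteq g(j)\}$, as the proof above uses it. It enumerates every subset $X$ of $subsume(i)$ and confirms $TWU(i\cup X)=TWU(i)$.

```python
from itertools import combinations

# Hypothetical toy database: item -> utility of that item in the transaction.
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6, "d": 6},
    {"b": 4, "c": 3},
    {"a": 5, "b": 6, "c": 3, "e": 4},
]
TU = [sum(t.values()) for t in DB]  # transaction utilities TU(t)
ITEMS = sorted({item for t in DB for item in t})

def g(itemset):
    """Tidset of the transactions that contain every item of the itemset."""
    return {tid for tid, t in enumerate(DB) if all(x in t for x in itemset)}

def twu(itemset):
    """TWU(X): sum of TU(t) over t in g(X)."""
    return sum(TU[tid] for tid in g(itemset))

def subsume(i):
    """Assumed form of Definition 4.1: items j != i with g(i) a subset of g(j)."""
    return {j for j in ITEMS if j != i and g({i}) <= g({j})}

def check_property_4_1():
    """Return True if TWU(i | X) == TWU(i) for every item i and every X within subsume(i)."""
    for i in ITEMS:
        sub = sorted(subsume(i))
        for k in range(len(sub) + 1):
            for X in combinations(sub, k):
                if twu({i, *X}) != twu({i}):
                    return False
    return True
```

For instance, item `e` occurs only in transaction 3, so `subsume("e")` is `{"a", "b", "c"}` and every superset of `e` drawn from it keeps the TWU of `e`.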

Appendix 2: Proof of Property 4.2

We prove the property by contradiction. Suppose there exists $j\notin subsume(i)$ such that $i\cup j$ is an HTWUI. The function $g(X)$ (described in Sect. 4.1) is anti-monotone: adding items to $X$ can only remove transactions from $g(X)$. Since $i\subseteq i\cup j$, we have $g(i\cup j)\subseteq g(i)$, and hence $TWU(i\cup j)\le TWU(i)$. If $TWU(i\cup j)=TWU(i)$, then, because every transaction utility $TU(t)$ is positive, $g(i\cup j)=g(i)$, i.e., $g(i)\cap g(j)=g(i)$, which gives $g(i)\subseteq g(j)$. By Definition 4.1, this means $j\in subsume(i)$, contradicting $j\notin subsume(i)$. Therefore $TWU(i\cup j)<TWU(i)=min\_util$, so $i\cup j$ is not an HTWUI, which contradicts the initial supposition.
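The core step of this proof, that $TWU(i\cup j)<TWU(i)$ whenever $j\notin subsume(i)$, can also be checked on a toy database. This is a hypothetical sketch rather than the authors' code: the data and function names are invented, $subsume(i)$ is assumed to be $\{j\ne i : g(i)\subseteq g(j)\}$, and all transaction utilities are positive, as the proof requires. Setting $min\_util=TWU(i)$, as in the final line of the proof, the only single-item extensions of $i$ that remain HTWUIs should then be exactly the items in $subsume(i)$.

```python
# Hypothetical toy database: item -> utility of that item in the transaction.
# Every transaction utility is positive, which the strict inequality relies on.
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6, "d": 6},
    {"b": 4, "c": 3},
    {"a": 5, "b": 6, "c": 3, "e": 4},
]
TU = [sum(t.values()) for t in DB]  # transaction utilities TU(t)
ITEMS = sorted({item for t in DB for item in t})

def g(itemset):
    """Tidset of the transactions that contain every item of the itemset."""
    return {tid for tid, t in enumerate(DB) if all(x in t for x in itemset)}

def twu(itemset):
    """TWU(X): sum of TU(t) over t in g(X)."""
    return sum(TU[tid] for tid in g(itemset))

def subsume(i):
    """Assumed form of Definition 4.1: items j != i with g(i) a subset of g(j)."""
    return {j for j in ITEMS if j != i and g({i}) <= g({j})}

def check_strict_drop():
    """Return True if TWU(i | {j}) < TWU(i) for every j outside subsume(i)."""
    for i in ITEMS:
        for j in ITEMS:
            if j != i and j not in subsume(i) and not twu({i, j}) < twu({i}):
                return False
    return True

def htwu_extensions(i):
    """With min_util = TWU(i), return the items j for which i | {j} is still an HTWUI."""
    min_util = twu({i})
    return {j for j in ITEMS if j != i and twu({i, j}) >= min_util}
```

On this data, `htwu_extensions("a")` is `{"c"}`, matching `subsume("a")`, while extending `a` with `b`, `d`, or `e` drops its TWU below 48.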

Appendix 3: Proof of Property 4.3

For the left-hand side of the equation:

$$\begin{aligned} {\textit{TWU}}(X\cup i) =\sum _{t\in (g(X\cup i))}{} \textit{TU}(t)=\sum _{t\in (g(X)\cap g(i))}{} \textit{TU}(t) \end{aligned}$$

For the right-hand side of the equation:

$$\begin{aligned}&{\textit{TWU}}(X)-\sum _{t\in (g(X)-g(i))}{} \textit{TU}(t)\\&\quad =\sum _{t\in g(X)}{} \textit{TU}(t)-\sum _{t\in (g(X)-g(i))}{} \textit{TU}(t)\\&\quad =\sum _{t\in (g(X)-(g(X)-g(i)))}{} \textit{TU}(t) =\sum _{t\in (g(X)\cap g(i))}{} \textit{TU}(t) \end{aligned}$$

So we have

$$\begin{aligned} {\textit{TWU}}(X\cup i)= {\textit{TWU}}(X)-\sum _{t\in (g(X)-g(i))}{} \textit{TU}(t). \end{aligned}$$
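Property 4.3 lets $TWU(X\cup i)$ be obtained from an already known $TWU(X)$ by subtracting only the utilities of the transactions that drop out, $g(X)-g(i)$. The sketch below is a hypothetical illustration using integer bitmaps as tidsets; the toy database and the names `tu_sum` and `extend` are invented, and it is not taken from the IHUI-Mine implementation. The set difference $g(X)-g(i)$ becomes the bitwise operation `g_x & ~g_i`.

```python
# Hypothetical toy database: item -> utility of that item in the transaction.
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6, "d": 6},
    {"b": 4, "c": 3},
    {"a": 5, "b": 6, "c": 3, "e": 4},
]
TU = [sum(t.values()) for t in DB]  # transaction utilities TU(t)

# Vertical bitmaps: bit t of BITMAPS[item] is set iff item occurs in transaction t.
BITMAPS = {}
for tid, trans in enumerate(DB):
    for item in trans:
        BITMAPS[item] = BITMAPS.get(item, 0) | (1 << tid)

def tu_sum(mask):
    """Sum TU(t) over the transactions whose bits are set in mask."""
    return sum(TU[tid] for tid in range(len(DB)) if mask >> tid & 1)

def extend(g_x, twu_x, i):
    """Property 4.3: TWU(X | {i}) = TWU(X) - sum of TU(t) over g(X) - g(i).

    Returns the new tidset bitmap g(X | {i}) and its TWU. Only the TU values of
    the dropped transactions (g(X) AND NOT g(i)) are summed.
    """
    dropped = g_x & ~BITMAPS[i]
    return g_x & BITMAPS[i], twu_x - tu_sum(dropped)
```

Starting from $X=\{a\}$ with $TWU(a)=48$, extending by `b` drops only transaction 1 ($TU=22$), giving 26; extending that result by `c` drops nothing, so the TWU stays 26.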

About this article

Cite this article

Song, W., Zhang, Z. & Li, J. A high utility itemset mining algorithm based on subsume index. Knowl Inf Syst 49, 315–340 (2016). https://doi.org/10.1007/s10115-015-0900-1


Received: 16 September 2014
Revised: 12 September 2015
Accepted: 13 November 2015
Published: 09 December 2015
Issue Date: October 2016
DOI: https://doi.org/10.1007/s10115-015-0900-1
