A high utility itemset mining algorithm based on subsume index

  • Regular Paper
Knowledge and Information Systems

Abstract

High utility itemset mining addresses the limitations of frequent itemset mining by introducing measures of interestingness that reflect the significance of an itemset beyond its frequency of occurrence. Among such algorithms, level-wise candidate generation-and-test approaches suffer from an immense candidate pool and require several database scans. Meanwhile, methods based on pattern growth tend to consume large amounts of memory to store conditional trees. We propose an efficient algorithm, called Index High Utility Itemsets Mine (IHUI-Mine), for mining high utility itemsets. The subsume index, which has been employed to mine frequent itemsets, is extended in IHUI-Mine to the discovery of high utility itemsets. In addition to the enumeration and search strategies inherited from the subsume index, we introduce a new property that accelerates the computation of transaction-weighted utilization for high utility itemsets. Furthermore, because bitmaps are used for database representation, the real utility of candidates can be verified from the recorded transactions rather than by rescanning the entire database. The computational complexity of IHUI-Mine is analyzed, and tests conducted on publicly available synthetic and real datasets further demonstrate that the proposed algorithm outperforms existing state-of-the-art algorithms.

Acknowledgments

We thank the anonymous reviewers for their very useful comments and suggestions. This work was partly supported by the National Natural Science Foundation of China (Grant 61105045) and North China University of Technology (Grant CCXZ201303).

Author information

Corresponding author

Correspondence to Wei Song.

Appendices

Appendix 1: Proof of Property 4.1

For any \(X\subseteq subsume(i)\), there are two cases.

(1) \(X=\emptyset \). Then \(i\cup X= i\), so \(TWU(i)=TWU(i\cup X)\).

(2) \(X\ne \emptyset \). Suppose \(X= b_1b_2\ldots b_k\), where \(b_j\in {{\mathbf {I}}}\ (1\le j\le k)\). Since \(X\subseteq subsume(i)\), every \(b_j\in X\) satisfies \(b_j\in subsume(i)\), and hence \(g(i)\subseteq g(b_j)\) by Definition 4.1. Thus, \(g(i\cup X)=g(i)\cap g(X)=g(i)\cap g(b_1)\cap g(b_2)\cap \ldots \cap g(b_k)=g(i)\). Because TWU sums the transaction utilities over exactly the transactions in \(g(\cdot )\), we have \(TWU(i)=TWU(i\cup X)\).
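
Property 4.1 can be checked numerically on a small example. The sketch below is illustrative only: the toy database, the helper names (`g`, `twu`, `subsume`, `check_property_4_1`), and the use of Python integers as bitmaps are assumptions, not the paper's implementation. It also assumes that Definition 4.1 amounts to \(subsume(i)=\{j\ne i : g(i)\subseteq g(j)\}\) and ignores any item-ordering constraint the full definition may impose.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> utility in that transaction.
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"b": 4, "c": 3},
    {"a": 1, "b": 1, "c": 2, "d": 6},
    {"c": 2, "d": 3},
    {"b": 3, "c": 1, "e": 2},
]
TU = [sum(t.values()) for t in DB]                 # transaction utilities
ITEMS = sorted({item for t in DB for item in t})

def g(itemset):
    """Bitmap of the transactions containing every item of itemset (bit k = transaction k)."""
    bits = (1 << len(DB)) - 1
    for item in itemset:
        bits &= sum(1 << k for k, t in enumerate(DB) if item in t)
    return bits

def twu(itemset):
    """Transaction-weighted utilization: sum of TU over the transactions in g(itemset)."""
    bits = g(itemset)
    return sum(TU[k] for k in range(len(DB)) if (bits >> k) & 1)

def subsume(i):
    """Items j != i with g(i) a subset of g(j) (assumed reading of Definition 4.1)."""
    g_i = g([i])
    return [j for j in ITEMS if j != i and (g_i & g([j])) == g_i]

def check_property_4_1(i):
    """True if TWU(i) == TWU(i ∪ X) for every subset X of subsume(i)."""
    subs = subsume(i)
    return all(
        twu([i, *X]) == twu([i])
        for r in range(len(subs) + 1)
        for X in combinations(subs, r)
    )
```

On this toy data, `subsume("a")` is `["b", "c"]`, and both `twu(["a"])` and `twu(["a", "b", "c"])` evaluate to 18, as the property predicts.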

Appendix 2: Proof of Property 4.2

We prove the property by contradiction. Suppose there exists \(j\notin subsume(i)\) such that \(i\cup j\) is an HTWUI. Since the set returned by the function \(g(X)\) (described in Sect. 4.1) can only shrink as the itemset X grows, and \(i\subseteq i\cup j\), we have \(g(i\cup j)\subseteq g(i)\), and hence \(TWU(i\cup j)\le TWU(i)\). If \(TWU(i\cup j)=TWU(i)\), then, given that every transaction utility is positive, \(g(i)=g(i\cup j)=g(i)\cap g(j)\), which gives \(g(i)\subseteq g(j)\). By Definition 4.1, \(j\in subsume(i)\), which contradicts the choice of j. Thus, \(TWU(i\cup j)<TWU(i) = min\_util\), so \(i\cup j\) cannot be an HTWUI, contradicting the supposition.
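
The boundary case used in this proof can be illustrated with a short sketch. Everything here is hypothetical: the toy data, the helper names, and the reading of Definition 4.1 as \(subsume(i)=\{j\ne i : g(i)\subseteq g(j)\}\) are assumptions made for illustration. The threshold is set to \(TWU(i)\) exactly, and the code lists the TWU of \(i\cup j\) for every item j outside \(subsume(i)\).

```python
# Hypothetical toy data: transaction id -> (set of items, transaction utility TU).
DB = {
    0: ({"a", "b", "c"}, 8),
    1: ({"b", "c"}, 7),
    2: ({"a", "b", "c", "d"}, 10),
    3: ({"c", "d"}, 5),
    4: ({"b", "c", "e"}, 6),
}
ITEMS = sorted(set().union(*(items for items, _ in DB.values())))

def g(itemset):
    """Ids of the transactions that contain every item of itemset."""
    return frozenset(tid for tid, (items, _) in DB.items() if set(itemset) <= items)

def twu(itemset):
    """Sum of transaction utilities over g(itemset)."""
    return sum(DB[tid][1] for tid in g(itemset))

def subsume(i):
    """Items j != i whose transaction set covers g(i) (assumed reading of Definition 4.1)."""
    return {j for j in ITEMS if j != i and g({i}) <= g({j})}

i = "a"
min_util = twu({i})  # boundary case: TWU(i) equals the threshold exactly

# TWU of every 2-itemset i ∪ j with j outside subsume(i); each falls below min_util.
pruned = {j: twu({i, j}) for j in ITEMS if j != i and j not in subsume(i)}
assert all(value < min_util for value in pruned.values())
```

With `i = "a"` the threshold is 18, and the only items outside `subsume("a")` are `d` and `e`, whose extensions have TWU 10 and 0, so neither extension needs to be generated.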

Appendix 3: Proof of Property 4.3

For the left-hand side of the equation:

$$\begin{aligned} {\textit{TWU}}(X\cup i) =\sum _{t\in (g(X\cup i))}{} \textit{TU}(t)=\sum _{t\in (g(X)\cap g(i))}{} \textit{TU}(t) \end{aligned}$$

For the right-hand side of the equation, note that \(g(X)-g(i)\subseteq g(X)\), so the two sums can be combined:

$$\begin{aligned}&{\textit{TWU}}(X)-\sum _{t\in (g(X)-g(i))}{} \textit{TU}(t)\\&\quad =\sum _{t\in g(X)}{} \textit{TU}(t)-\sum _{t\in (g(X)-g(i))}{} \textit{TU}(t)\\&\quad =\sum _{t\in (g(X)-(g(X)-g(i)))}{} \textit{TU}(t) =\sum _{t\in (g(X)\cap g(i))}{} \textit{TU}(t) \end{aligned}$$

So we have

$$\begin{aligned} {\textit{TWU}}(X\cup i)= {\textit{TWU}}(X)-\sum _{t\in (g(X)-g(i))}{} \textit{TU}(t). \end{aligned}$$
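
The identity above suggests an incremental update: when an itemset X is extended by an item i, only the transactions in \(g(X)-g(i)\) need to be visited. The following is a minimal sketch under assumed data, with bitmaps stored as Python integers (bit k stands for transaction k). The names `TU`, `BITMAP`, `tu_sum`, and `extend_twu` are hypothetical and do not come from the paper.

```python
# Hypothetical transaction utilities and per-item bitmaps (bit k = transaction k).
TU = [8, 7, 10, 5, 6]
BITMAP = {
    "a": 0b00101,  # transactions 0, 2
    "b": 0b10111,  # transactions 0, 1, 2, 4
    "c": 0b11111,  # all transactions
    "d": 0b01100,  # transactions 2, 3
    "e": 0b10000,  # transaction 4
}

def tu_sum(bits):
    """Sum TU over the transactions whose bit is set in bits."""
    total, k = 0, 0
    while bits:
        if bits & 1:
            total += TU[k]
        bits >>= 1
        k += 1
    return total

def extend_twu(twu_x, g_x, item):
    """Return (TWU(X ∪ item), g(X ∪ item)) from TWU(X) and g(X) via Property 4.3."""
    removed = g_x & ~BITMAP[item]          # g(X) - g(item)
    return twu_x - tu_sum(removed), g_x & BITMAP[item]

# Extend X = {b} with d and compare against direct evaluation.
g_b = BITMAP["b"]
twu_b = tu_sum(g_b)
twu_bd, g_bd = extend_twu(twu_b, g_b, "d")
assert twu_bd == tu_sum(BITMAP["b"] & BITMAP["d"])
```

Here \(TWU(b)=31\), the removed transactions are 0, 1, and 4 with total utility 21, and the update yields \(TWU(bd)=10\), which matches the direct sum over \(g(b)\cap g(d)\). Whether the update is cheaper than direct evaluation depends on the relative sizes of \(g(X)-g(i)\) and \(g(X)\cap g(i)\).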

About this article

Cite this article

Song, W., Zhang, Z. & Li, J. A high utility itemset mining algorithm based on subsume index. Knowl Inf Syst 49, 315–340 (2016). https://doi.org/10.1007/s10115-015-0900-1

  • DOI: https://doi.org/10.1007/s10115-015-0900-1
