research-article

Mining Top-k High On-shelf Utility Itemsets Using Novel Threshold Raising Strategies

Published: 26 March 2024

Abstract

High utility itemset (HUI) mining is an emerging area of data mining that discovers sets of items generating a high profit in transactional datasets. In recent years, several algorithms have been proposed for this task. However, most of them consider neither the on-shelf time periods of items nor items with negative utility. High on-shelf utility itemset (HOUI) mining is more difficult than traditional HUI mining because it must handle both on-shelf time periods and negative utilities. Moreover, most algorithms require a minimum utility threshold (min_util) to find itemsets, and choosing an appropriate min_util is difficult for users. A low min_util threshold may generate too many itemsets and a high one too few, which can degrade performance. To address these issues, a novel top-k HOUI mining algorithm named TKOS (Top-K high On-Shelf utility itemsets miner) is proposed, which considers on-shelf time periods and negative utility. TKOS presents a novel branch-and-bound-based strategy to raise the internal min_util threshold efficiently. It also presents two pruning strategies to speed up the mining process. To reduce the dataset scanning cost, we utilize transaction merging and dataset projection techniques. Extensive experiments have been conducted on real and synthetic datasets with various characteristics. Experimental results show that the proposed algorithm outperforms the state-of-the-art algorithms: it is up to 42 times faster and uses up to 19 times less memory than the state-of-the-art KOSHU. Moreover, the proposed algorithm scales well with the number of time periods and the number of transactions.
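
The idea of raising an internal min_util threshold while collecting the k best itemsets can be illustrated with a minimal sketch. This is not the TKOS algorithm: it enumerates every itemset by brute force, uses plain utility (profit times quantity, summed over supporting transactions) and omits on-shelf periods, pruning, transaction merging and projection. The toy data and the names `PROFIT`, `TRANSACTIONS`, `utility` and `top_k_itemsets` are assumptions made for illustration only.

```python
import heapq
from itertools import combinations

# Hypothetical toy data: unit profit per item (item "b" has negative utility)
# and transactions mapping item -> purchased quantity.
PROFIT = {"a": 5, "b": -2, "c": 3}
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 4},
    {"a": 1, "b": 1, "c": 1},
]


def utility(itemset, transactions, profit):
    """Return the summed utility of itemset, or None if no transaction contains it."""
    total, supported = 0, False
    for t in transactions:
        if all(item in t for item in itemset):
            supported = True
            total += sum(profit[item] * t[item] for item in itemset)
    return total if supported else None


def top_k_itemsets(transactions, profit, k):
    """Brute-force top-k search that raises min_util as better itemsets are found."""
    items = sorted(profit)
    heap = []                    # min-heap of (utility, itemset); root = k-th best so far
    min_util = float("-inf")     # internal threshold, starts with no constraint
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, profit)
            if u is None:
                continue         # itemset never occurs, so it is not a candidate
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            elif u > min_util:
                heapq.heapreplace(heap, (u, itemset))
            if len(heap) == k:
                min_util = heap[0][0]   # raise the threshold to the k-th best utility
    return sorted(heap, reverse=True), min_util


result, min_util = top_k_itemsets(TRANSACTIONS, PROFIT, 3)
for u, itemset in result:
    print(itemset, u)
print("final min_util:", min_util)
```

In this sketch the threshold only decides whether a candidate enters the result heap. A practical miner would also use the rising threshold, together with utility upper bounds, to skip whole branches of the search space, which is where faster threshold raising pays off.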

Cited By

  • (2024) Correlated time-window constrained high-utility itemsets mining with certain and uncertain real-life datasets. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-19715-6. Online publication date: 4 July 2024.

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 5
June 2024
699 pages
EISSN:1556-472X
DOI:10.1145/3613659

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 March 2024
Online AM: 08 February 2024
Accepted: 19 January 2024
Revised: 12 November 2023
Received: 18 June 2023
Published in TKDD Volume 18, Issue 5

Author Tags

  1. High on-shelf utility itemsets
  2. top-k itemsets
  3. utility mining
  4. on-shelf time periods
  5. negative utility
  6. branch & bound

Qualifiers

  • Research-article
