research-article

Mining Top-k High On-shelf Utility Itemsets Using Novel Threshold Raising Strategies

Authors:

Bhaskar Biswas

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 5

Article No.: 131, Pages 1 - 23

https://doi.org/10.1145/3645115

Published: 26 March 2024

Abstract

High utility itemset (HUI) mining is an emerging area of data mining that discovers sets of items generating high profit in transactional datasets. Several algorithms have been proposed for this task in recent years, but most of them ignore both the on-shelf time periods of items and negative item utilities. High on-shelf utility itemset (HOUI) mining is more difficult than traditional HUI mining because it must account for the periods during which items are on shelf as well as for negative utilities. Moreover, most algorithms require a user-specified minimum utility threshold (min_util), and choosing an appropriate value is difficult: a low min_util may produce too many itemsets, degrading performance, while a high one may produce too few. To address these issues, a novel top-k HOUI mining algorithm named TKOS (Top-K high On-Shelf utility itemsets miner) is proposed, which considers both on-shelf time periods and negative utility. TKOS introduces a novel branch-and-bound strategy that raises the internal min_util threshold efficiently, together with two pruning strategies that speed up the mining process. To reduce dataset scanning cost, it uses transaction merging and dataset projection. Extensive experiments on real and synthetic datasets with various characteristics show that TKOS outperforms state-of-the-art algorithms: it is up to 42 times faster and uses up to 19 times less memory than the state-of-the-art KOSHU. Moreover, TKOS scales well with respect to both the number of time periods and the number of transactions.
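The abstract names the ingredients of top-k on-shelf utility mining without spelling them out. The Python sketch below illustrates only the general idea on a hypothetical toy database: it computes a relative on-shelf utility for each itemset and keeps the k best in a min-heap whose smallest value acts as the internal min_util, raised each time a better itemset displaces the weakest one. It rests on assumptions that are not taken from the abstract: an itemset's on-shelf periods are the periods of the transactions containing it, a period's total utility counts only positive item utilities, and the initial threshold is 0. It enumerates itemsets by brute force and does not implement TKOS's branch-and-bound strategy, its pruning strategies, transaction merging, or dataset projection. All names (`DB`, `period_totals`, `relative_on_shelf_utility`, `top_k_on_shelf`) are hypothetical.

```python
import heapq
from itertools import combinations

# Hypothetical toy database: (period, {item: utility}) per transaction,
# where utility = quantity * unit profit and may be negative.
DB = [
    (1, {"a": 5, "b": 3}),
    (1, {"a": 2, "c": -1}),
    (2, {"b": 4, "c": 6}),
]


def period_totals(db):
    """Total utility per period; assumption: only positive utilities are counted."""
    totals = {}
    for period, items in db:
        positive = sum(u for u in items.values() if u > 0)
        totals[period] = totals.get(period, 0) + positive
    return totals


def relative_on_shelf_utility(itemset, db, totals):
    """u(X) divided by the total utility of X's on-shelf periods (assumed definition)."""
    utility, periods = 0, set()
    for period, items in db:
        if all(i in items for i in itemset):
            utility += sum(items[i] for i in itemset)
            periods.add(period)
    denom = sum(totals[p] for p in periods)
    return utility / denom if denom > 0 else None  # None: X never occurs


def top_k_on_shelf(db, k):
    """Brute-force top-k with an internal min_util raised as the heap fills."""
    totals = period_totals(db)
    items = sorted({i for _, t in db for i in t})
    heap = []       # min-heap of (ru, itemset); heap[0] is the k-th best so far
    min_util = 0.0  # assumed initial internal threshold
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            ru = relative_on_shelf_utility(itemset, db, totals)
            if ru is None or ru <= min_util:
                continue  # cannot enter the current top-k
            heapq.heappush(heap, (ru, itemset))
            if len(heap) > k:
                heapq.heappop(heap)  # drop the weakest itemset
            if len(heap) == k:
                min_util = heap[0][0]  # raise the threshold to the k-th best value
    return sorted(heap, reverse=True), min_util


result, threshold = top_k_on_shelf(DB, k=3)
for ru, itemset in result:
    print(itemset, round(ru, 2))
print("final internal min_util:", threshold)
```

In a practical miner, the rising threshold is what makes pruning effective: once it is high, whole branches of the search space can be discarded when an upper bound on their utility falls below it, which is the general role a branch-and-bound strategy plays. This sketch evaluates every itemset instead, so it is only suitable for tiny inputs.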

References

[1] Rakesh Agrawal and Ramakrishnan Srikant. 1994. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB). 487–499.

[2] Chowdhury Farhan Ahmed, Syed Khairuzzaman Tanbeer, Byeong-Soo Jeong, and Young-Koo Lee. 2009. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Transactions on Knowledge and Data Engineering 21, 12 (2009), 1708–1721.

[3] Chowdhury Farhan Ahmed, Syed Khairuzzaman Tanbeer, Byeong-Soo Jeong, and Young-Koo Lee. 2011. HUC-Prune: An efficient candidate pruning technique to mine high utility patterns. Applied Intelligence 34, 2 (2011), 181–198.

[4] Raymond Chan, Qiang Yang, and Yi-Dong Shen. 2003. Mining high utility itemsets. In Proceedings of the 3rd IEEE International Conference on Data Mining. 19–26.

[5] Chun-Jung Chu, Vincent S. Tseng, and Tyne Liang. 2009. An efficient algorithm for mining high utility itemsets with negative item values in large databases. Applied Mathematics and Computation 215, 2 (2009), 767–778.

[6] Thu-Lan Dam, Kenli Li, Philippe Fournier-Viger, and Quang-Huy Duong. 2017. An efficient algorithm for mining top-k on-shelf high utility itemsets. Knowledge and Information Systems 52, 3 (2017), 621–655.

[7] Philippe Fournier-Viger, Jerry Chun-Wei Lin, Antonio Gomariz, Ted Gueniche, Azadeh Soltani, Zhihong Deng, and Hoang Thanh Lam. 2016. The SPMF Open-Source Data Mining Library Version 2. Springer International Publishing.

[8] Philippe Fournier-Viger, Cheng-Wei Wu, Souleymane Zida, and Vincent S. Tseng. 2014. FHM: Faster High-Utility Itemset Mining Using Estimated Utility Co-occurrence Pruning. Springer International Publishing.

[9] Philippe Fournier-Viger and Souleymane Zida. 2015. FOSHU: Faster On-shelf high utility itemset mining – with or without negative unit profit. In Proceedings of the 30th Annual ACM Symposium on Applied Computing (SAC’15). ACM, New York, NY, USA, 857–864.

[10] Jiawei Han, Jian Pei, and Yiwen Yin. 2000. Mining frequent patterns without candidate generation. In Proceedings of the ACM SIGMOD Record. ACM, 1–12.

[11] Srikumar Krishnamoorthy. 2015. Pruning strategies for mining high utility itemsets. Expert Systems with Applications 42, 5 (2015), 2371–2381.

[12] Srikumar Krishnamoorthy. 2017. Efficiently mining high utility itemsets with negative unit profits. Knowledge-Based Systems 145 (2017), 1–14.

[13] Srikumar Krishnamoorthy. 2017. HMiner: Efficiently mining high utility itemsets. Expert Systems with Applications 90, Supplement C (2017), 168–183.

[14] Guo-Cheng Lan, Tzung-Pei Hong, Jen-Peng Huang, and Vincent S. Tseng. 2014. On-shelf utility mining with negative item values. Expert Systems with Applications 41, 7 (2014), 3450–3459.

[15] Guo-Cheng Lan, Tzung-Pei Hong, and Vincent S. Tseng. 2011. Discovery of high utility itemsets from on-shelf time periods of products. Expert Systems with Applications 38, 5 (2011), 5851–5857.

[16] Guo-Cheng Lan, Tzung-Pei Hong, and Vincent S. Tseng. 2014. An efficient projection-based indexing approach for mining high utility itemsets. Knowledge and Information Systems 38, 1 (2014), 85–107.

[17] Jiuyong Li, Hong Shen, and Rodney Topor. 2002. Mining the optimal class association rule set. Knowledge-Based Systems 15, 7 (2002), 399–405.

[18] Jerry Chun-Wei Lin, Philippe Fournier-Viger, and Wensheng Gan. 2016. FHN: An efficient algorithm for mining high-utility itemsets with negative unit profits. Knowledge-Based Systems 111 (2016), 283–298.

[19] J. Liu, K. Wang, and B. C. M. Fung. 2016. Mining high utility patterns in one phase without generating candidates. IEEE Transactions on Knowledge and Data Engineering 28, 5 (2016), 1245–1257.

[20] Mengchi Liu and Junfeng Qu. 2012. Mining high utility itemsets without candidate generation. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM’12). ACM, New York, NY, USA, 55–64.

[21] Ying Liu, Wei-keng Liao, and Alok Choudhary. 2005. A two-phase algorithm for fast discovery of high utility itemsets. In Proceedings of the 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD’05). 689–695.

[22] Heungmo Ryang and Unil Yun. 2015. Top-k high utility pattern mining with effective threshold raising strategies. Knowledge-Based Systems 76 (2015), 109–126.

[23] Kuldeep Singh, Ajay Kumar, Shashank Sheshar Singh, Harish Kumar Shakya, and Bhaskar Biswas. 2019. EHNL: An efficient algorithm for mining high utility itemsets with negative utility value and length constraints. Information Sciences 484 (2019), 44–70.

[24] Kuldeep Singh, Harish Kumar Shakya, Abhimanyu Singh, and Bhaskar Biswas. 2018. Mining of high-utility itemsets with negative utility. Expert Systems 35, 6 (2018), e12296.

[25] Kuldeep Singh, Shashank Sheshar Singh, Ajay Kumar, and Bhaskar Biswas. 2018. High utility itemsets mining with negative utility value: A survey. Journal of Intelligent and Fuzzy Systems 35, 6 (2018), 6551–6562.

[26] Wei Song, Yu Liu, and Jinhong Li. 2014. BAHUI: Fast and memory efficient mining of high utility itemsets based on bitmap. International Journal of Data Warehousing and Mining 10, 1 (2014), 1–15.

[27] Rui Sun, Meng Han, Chunyan Zhang, Mingyao Shen, and Shiyu Du. 2021. Mining of top-k high utility itemsets with negative utility. Journal of Intelligent and Fuzzy Systems 40, 3 (2021), 5637–5652.

[28] Vincent S. Tseng, Bai-En Shie, Cheng-Wei Wu, and Philip S. Yu. 2013. Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Transactions on Knowledge and Data Engineering 25, 8 (2013), 1772–1786.

[29] V. S. Tseng, C. W. Wu, P. Fournier-Viger, and P. S. Yu. 2016. Efficient algorithms for mining top-k high utility itemsets. IEEE Transactions on Knowledge and Data Engineering 28, 1 (2016), 54–67.

[30] Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu. 2010. UP-growth: An efficient algorithm for high utility itemset mining. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’10). ACM, New York, NY, USA, 253–262.

[31] Xindong Wu, Chengqi Zhang, and Shichao Zhang. 2004. Efficient mining of both positive and negative association rules. ACM Transactions on Information Systems 22, 3 (2004), 381–405.

[32] Hong Yao and Howard J. Hamilton. 2006. Mining itemset utilities from transaction databases. Data and Knowledge Engineering 59, 3 (2006), 603–626.

[33] Unil Yun, Heungmo Ryang, and Keun Ho Ryu. 2014. High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Systems with Applications 41, 8 (2014), 3861–3878.

[34] Mohammed J. Zaki. 2000. Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering 12, 3 (2000), 372–390.

[35] Shichao Zhang, Zifang Huang, Jilian Zhang, and Xiaofeng Zhu. 2008. Mining follow-up correlation patterns from time-related databases. Knowledge and Information Systems 14, 1 (2008), 81–100.

[36] Shichao Zhang and Xindong Wu. 2001. Large scale data mining based on data partitioning. Applied Artificial Intelligence 15, 2 (2001), 129–139.

[37] Shichao Zhang, Xindong Wu, Chengqi Zhang, and Jingli Lu. 2008. Computing the minimum-support for mining frequent patterns. Knowledge and Information Systems 15, 2 (2008), 233–257.

[38] Souleymane Zida, Philippe Fournier-Viger, Jerry Chun-Wei Lin, Cheng-Wei Wu, and Vincent S. Tseng. 2017. EFIM: A fast and memory efficient algorithm for high-utility itemset mining. Knowledge and Information Systems 51, 2 (2017), 595–625.

Cited By

Sagare S, Kodavade D. 2024. Correlated time-window constrained high-utility itemsets mining with certain and uncertain real-life datasets. Multimedia Tools and Applications. Online publication date: 4 July 2024. https://doi.org/10.1007/s11042-024-19715-6

Index Terms

Mining Top-k High On-shelf Utility Itemsets Using Novel Threshold Raising Strategies
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Redundancy
  2. Embedded and cyber-physical systems
    1. Embedded systems
    2. Robotics
2. Networks
  1. Network properties
    1. Network reliability


Published In

ACM Transactions on Knowledge Discovery from Data Volume 18, Issue 5

June 2024

699 pages

EISSN:1556-472X

DOI:10.1145/3613659

Editor:
Jian Pei
Duke University, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 March 2024

Online AM: 08 February 2024

Accepted: 19 January 2024

Revised: 12 November 2023

Received: 18 June 2023

Published in TKDD Volume 18, Issue 5

Qualifiers

Research-article

Bibliometrics

Total citations: 1

Total downloads: 138

Downloads (last 12 months): 138

Downloads (last 6 weeks): 11

Reflects downloads up to 14 Oct 2024
