research-article

High-Utility Itemset Mining with Effective Pruning Strategies

Published: 11 November 2019

Abstract

High-utility itemset mining is a popular data mining problem that considers utility factors, such as the quantity and unit profit of items, in addition to the frequency measure used for transactional databases. It helps to find the most valuable and profitable products/items, which are difficult to identify using frequent itemsets alone: an item may be rare in the transactional database yet have a high profit value and therefore great importance. Many existing algorithms for finding high-utility itemsets (HUIs) generate comparatively large candidate sets. Our main focus is on significantly reducing the computation time by introducing new pruning strategies. The designed pruning strategies reduce visits to unnecessary nodes in the search space, which reduces the time required by the algorithm. In this article, two new, stricter upper bounds are designed to reduce the computation time by avoiding unnecessary nodes in the itemset search space. Thus, the search space of potential HUIs can be greatly reduced, and the execution time of the mining procedure can be improved. The proposed strategies can also significantly reduce the size of the transaction database generated at each node. Experimental results showed that the designed algorithm with the two pruning strategies outperforms the state-of-the-art algorithms for mining the required HUIs in terms of runtime and number of revised candidates. The designed algorithm also uses less memory than the state-of-the-art approach. Moreover, a multi-threaded approach is discussed to further handle the problem of big datasets.
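The abstract relies on the standard utility definitions and on upper-bound pruning without spelling them out. The Python sketch below illustrates both under stated assumptions. The toy data format and every name in it (`itemset_utility`, `transaction_utility`, `mine_huis`, `search`) are hypothetical. The two bounds it uses, transaction-weighted utilization (TWU) and the remaining-utility bound, are well-known stand-ins and not the two stricter upper bounds proposed in this article, which the abstract does not define. A bound is safe as long as it never underestimates the utility of any itemset explored below a node; a stricter (smaller) safe bound prunes more nodes.

```python
from typing import Dict, FrozenSet, List, Tuple

# Toy format (assumption): a transaction maps item -> purchased quantity,
# and `profit` maps item -> unit profit.
Transaction = Dict[str, int]


def itemset_utility(itemset: FrozenSet[str], tx: Transaction, profit: Dict[str, int]) -> int:
    """u(X, T): sum of quantity * unit profit over the items of X, or 0 if X is not in T."""
    if not itemset.issubset(tx):
        return 0
    return sum(tx[i] * profit[i] for i in itemset)


def transaction_utility(tx: Transaction, profit: Dict[str, int]) -> int:
    """tu(T): utility of all items in T."""
    return sum(q * profit[i] for i, q in tx.items())


def mine_huis(db: List[Transaction], profit: Dict[str, int],
              min_util: int) -> Dict[FrozenSet[str], int]:
    # TWU(i) sums tu(T) over the transactions containing i. It upper-bounds the
    # utility of every itemset containing i, so items with TWU < min_util are dropped.
    twu: Dict[str, int] = {}
    for tx in db:
        tu = transaction_utility(tx, profit)
        for i in tx:
            twu[i] = twu.get(i, 0) + tu
    order = sorted((i for i in twu if twu[i] >= min_util), key=lambda i: (twu[i], i))
    rank = {item: r for r, item in enumerate(order)}

    # Revised database: promising items only, sorted by ascending TWU,
    # stored as (item, utility of that item in the transaction).
    rdb = []
    for tx in db:
        kept = sorted((i for i in tx if i in rank), key=rank.__getitem__)
        rdb.append([(i, tx[i] * profit[i]) for i in kept])

    results: Dict[FrozenSet[str], int] = {}

    def search(prefix: Tuple[str, ...], proj: List[Tuple[int, List[Tuple[str, int]]]]) -> None:
        # Each entry of `proj` is (utility of prefix in T, items of T ordered after the prefix).
        candidates = sorted({i for _, rest in proj for i, _ in rest}, key=rank.__getitem__)
        for e in candidates:
            util, bound, new_proj = 0, 0, []
            for pu, rest in proj:
                for k, (i, iu) in enumerate(rest):
                    if i == e:
                        u = pu + iu
                        tail = rest[k + 1:]
                        util += u
                        # Remaining-utility bound: u(X+e, T) plus every item that could still extend it.
                        bound += u + sum(t for _, t in tail)
                        if tail:
                            new_proj.append((u, tail))
                        break
            new_prefix = prefix + (e,)
            if util >= min_util:
                results[frozenset(new_prefix)] = util
            # Prune the subtree: no extension of new_prefix can exceed `bound`.
            if bound >= min_util and new_proj:
                search(new_prefix, new_proj)

    search((), [(0, tx) for tx in rdb if tx])
    return results
```

For example, with `profit = {"a": 5, "b": 2, "c": 1}` and the three transactions `{"a": 1, "b": 2}`, `{"a": 2, "c": 3}`, `{"b": 1, "c": 2}`, calling `mine_huis(db, profit, 10)` returns `{a}` with utility 15 and `{a, c}` with utility 13. Both bounds only prune, so the output is the same complete set of HUIs whichever bound is used; a tighter bound changes only how many nodes `search` visits and how large the projected lists passed to it are. Each first-level branch of the search is independent of the others, which is one natural place to split work across threads; how the article organizes its multi-threaded variant is not described in the abstract.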

Cited By

  • (2024) Improved adaptive-phase fuzzy high utility pattern mining algorithm based on tree-list structure for intelligent decision systems. Scientific Reports 14:1. DOI: 10.1038/s41598-023-50375-y. Online publication date: 10-Jan-2024.
  • (2024) MRI-CE: Minimal rare itemset discovery using the cross-entropy method. Information Sciences (120392). DOI: 10.1016/j.ins.2024.120392. Online publication date: Mar-2024.
  • (2024) A Metaheuristic Perspective on Extracting Numeric Association Rules: Current Works, Applications, and Recommendations. Archives of Computational Methods in Engineering. DOI: 10.1007/s11831-024-10109-3. Online publication date: 29-Mar-2024.

Information

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 13, Issue 6
December 2019
282 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3366748
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 November 2019
Accepted: 01 August 2019
Revised: 01 February 2019
Received: 01 February 2017
Published in TKDD Volume 13, Issue 6

Author Tags

  1. HUIM
  2. high-utility itemset
  3. multiple threads
  4. pruning strategy

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 50
  • Downloads (last 6 weeks): 2

Reflects downloads up to 04 Oct 2024

Citations

  • (2023) Mining skyline frequent-utility patterns from big data environment based on MapReduce framework. Intelligent Data Analysis 27:5 (1359-1377). DOI: 10.3233/IDA-220756. Online publication date: 6-Oct-2023.
  • (2023) The effective skyline quantify-utility patterns mining algorithm with pruning strategies. Computer Science and Information Systems 20:3 (1085-1108). DOI: 10.2298/CSIS220615040W. Online publication date: 2023.
  • (2023) A Clique-Querying Mining Framework for Discovering High Utility Co-Location Patterns without Generating Candidates. ACM Transactions on Knowledge Discovery from Data 18:1 (1-42). DOI: 10.1145/3617378. Online publication date: 16-Oct-2023.
  • (2023) Mining High Utility Itemsets Using Prefix Trees and Utility Vectors. IEEE Transactions on Knowledge and Data Engineering 35:10 (10224-10236). DOI: 10.1109/TKDE.2023.3256126. Online publication date: 1-Oct-2023.
  • (2023) Efficient Parallel Mining of High-utility Itemsets on Multicore Processors. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (638-652). DOI: 10.1109/ICDE55515.2023.00388. Online publication date: Apr-2023.
  • (2023) Efficient Parallel Mining of High-utility Itemsets on Multicore Processors. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (3563-3577). DOI: 10.1109/ICDE55515.2023.00384. Online publication date: Apr-2023.
  • (2023) FTKHUIM: A Fast and Efficient Method for Mining Top-K High-Utility Itemsets. IEEE Access 11 (104789-104805). DOI: 10.1109/ACCESS.2023.3314984. Online publication date: 2023.
