Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article
Public Access

Beyond Frequency: Utility Mining with Varied Item-specific Minimum Utility

Published: 05 January 2021 Publication History
  • Get Citation Alerts
  • Abstract

    Consumer behavior plays a very important role in economics and targeted marketing. However, understanding economic consumer behavior is quite challenging, such as finding credible and reliable information on product profitability. Different from frequent pattern mining, utility-oriented mining integrates utility theory and data mining. Utility mining is a useful tool for understanding economic consumer behavior. Traditional algorithms for mining high-utility patterns (HUPs) applies a single/uniform minimum utility threshold (minutil) to obtain the set of HUPs, but in some real-life circumstances, some specific products may bring lower utilities compared with others, but their profit may offer some vital information. If minutil is set high, the patterns with low minutil are missed; if minutil is set low, the number of patterns becomes unmanageable. In this article, an efficient one-phase utility-oriented pattern mining algorithm, called HIMU, is proposed for mining HUPs with varied item-specific minimum utility. A novel tree structure called a multiple item utility set-enumeration tree (MIU-tree) and the global sorted and the conditional downward closure properties are introduced in HIMU. In addition, we extended the compact utility-list structure to keep the necessary information, and thus this one-phase HIMU model greatly reduces the computational costs and memory requirements. Moreover, two pruning strategies are then extended to enhance the performance. We conducted extensive experiments in several synthetic and real-world datasets; the results indicate that the designed one-phase HIMU algorithm can address the “rare item problem” and has better performance than the state-of-the-art algorithms in terms of runtime, memory usage, and scalability. Furthermore, the enhanced algorithms outperform the non-optimized HIMU approach.

    References

    [1]
    Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. 1993. Mining association rules between sets of items in large databases. In ACM SIGMOD Record, Vol. 22. ACM, 207--216.
    [2]
    Rakesh Agrawal and Ramakrishnan Srikant. 1995. Mining sequential patterns. In Proceedings of the 11th International Conference on Data Engineering. IEEE, 3--14.
    [3]
    Rakesh Agrawal, Ramakrishnan Srikant, et al. 1994. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases, Vol. 1215. 487--499.
    [4]
    Rakesh Agrawal and Ramakrishnan. Quest synthetic data generator. 1994. Retrieved from http://www.Almaden.ibm.com/cs/quest/syndata.html.
    [5]
    Chowdhury Farhan Ahmed, Syed Khairuzzaman Tanbeer, Byeong-Soo Jeong, and Young-Koo Lee. 2009. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans. Knowl. Data Eng. 21, 12 (2009), 1708--1721.
    [6]
    Oznur Kirmemis Alkan and Pinar Karagoz. 2015. CROM and HuspExt: Improving efficiency of high utility sequential pattern extraction. IEEE Trans. Knowl. Data Eng. 27, 10 (2015), 2645--2657.
    [7]
    Raymond Chan, Qiang Yang, and Yi-Dong Shen. 2003. Mining high utility itemsets. In Proceedings of the 3rd IEEE International Conference on Data Mining. IEEE, 19--26.
    [8]
    Ming-Syan Chen, Jiawei Han, and Philip S. Yu. 1996. Data mining: An overview from a database perspective. IEEE Trans. Knowl. Data Eng. 8, 6 (1996), 866--883.
    [9]
    Microsoft. Example database foodmart of Microsoft analysis services. 2016. Retrieved from http://msdn.microsoft.com/en-us/library/aa217032(SQL.80).aspx.
    [10]
    Philippe Fournier-Viger, Cheng-Wei Wu, Souleymane Zida, and Vincent S. Tseng. 2014. FHM: Faster high-utility itemset mining using estimated utility co-occurrence pruning. In Proceedings of the International Symposium on Methodologies for Intelligent Systems. Springer, 83--92.
    [11]
    Wensheng Gan, Jerry Chun-Wei Lin, Han-Chieh Chao, Shyue-Liang Wang, and Philip S. Yu. 2018. Privacy preserving utility mining: A survey. In Proceedings of the IEEE International Conference on Big Data. IEEE, 2617--2626.
    [12]
    Wensheng Gan, Jerry Chun-Wei Lin, Han-Chieh Chao, and Philip S. Yu. 2019. Utility-driven mining of high utility episodes. In Proceedings of the IEEE International Conference on Big Data. IEEE, 2644--2653.
    [13]
    Wensheng Gan, Jerry Chun-Wei Lin, Han-Chieh Chao, and Justin Zhan. 2017. Data mining in distributed environment: A survey. Wiley Interdisc. Rev.: Data Mining Knowl. Discov. 7, 6 (2017).
    [14]
    Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, and Han-Chieh Chao. 2016. More efficient algorithms for mining high-utility itemsets with multiple minimum utility thresholds. In Proceedings of the International Conference on Database and Expert Systems Applications. Springer, 71--87.
    [15]
    Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Tzung-Pei Hong, and Hamido Fujita. 2018. A survey of incremental high-utility itemset mining. Wiley Interdisc. Rev.: Data Mining Knowl. Discov. 8, 2 (2018).
    [16]
    Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Vincent S. Tseng, and Philip S. Yu. 2019. A survey of utility-oriented pattern mining. IEEE Trans. Knowl. Data Eng. (2019).
    [17]
    Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, and Philip S. Yu. 2020. HUOPM: High utility occupancy pattern mining. IEEE Trans. Cybern. 50, 3 (2020), 1195–1208.
    [18]
    Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, and Justin Zhan. 2017. Mining of frequent patterns with multiple minimum supports. Eng. Applic. Artif. Intell. 60 (2017), 83--96.
    [19]
    Wensheng Gan, Jerry Chun-Wei Lin, Jiexiong Zhang, Han-Chieh Chao, Hamido Fujita, and Philip S. Yu. 2020. ProUM: Projection-based utility mining on sequence data. Inf. Sci. 513 (2020), 222--240.
    [20]
    Liqiang Geng and Howard J. Hamilton. 2006. Interestingness measures for data mining: A survey. Comput. Surv. 38, 3 (2006), 9.
    [21]
    Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao. 2004. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining Knowl. Discov. 8, 1 (2004), 53--87.
    [22]
    Ya-Han Hu and Yen-Liang Chen. 2006. Mining association rules with multiple minimum supports: A new mining algorithm and a support tuning mechanism. Decis. Supp. Syst. 42, 1 (2006), 1--24.
    [23]
    Ya-Han Hu, Fan Wu, and Yi-Jiun Liao. 2013. An efficient tree-based algorithm for mining sequential patterns with multiple minimum supports. J. Syst. Softw. 86, 5 (2013), 1224--1238.
    [24]
    Tony Cheng-Kui Huang. 2013. Discovery of fuzzy quantitative sequential patterns with multiple minimum supports and adjustable membership functions. Inf. Sci. 222 (2013), 126--146.
    [25]
    2012. Frequent itemset mining dataset repository. Retrieved from http://fimi.ua.ac.be/data/.
    [26]
    R. Uday Kiran and P. Krishna Reddy. 2011. Novel techniques to reduce search space in multiple minimum supports-based frequent pattern mining algorithms. In Proceedings of the 14th International Conference on Extending Database Technology. ACM, 11--20.
    [27]
    Srikumar Krishnamoorthy. 2015. Pruning strategies for mining high utility itemsets. Exp. Syst. Applic. 42, 5 (2015), 2371--2381.
    [28]
    Srikumar Krishnamoorthy. 2018. Efficient mining of high utility itemsets with multiple minimum utility thresholds. Eng. Applic. Artif. Intell. 69 (2018), 112--126.
    [29]
    Guo-Cheng Lan, Tzung-Pei Hong, and Vincent S. Tseng. 2014. An efficient projection-based indexing approach for mining high utility itemsets. Knowl. Inf. Syst. 38, 1 (2014), 85--107.
    [30]
    Guo-Cheng Lan, Tzung-Pei Hong, Vincent S. Tseng, and Shyue-Liang Wang. 2014. Applying the maximum utility measure in high utility sequential pattern mining. Exp. Syst. Applic. 41, 11 (2014), 5071--5081.
    [31]
    Yeong-Chyi Lee, Tzung-Pei Hong, and Wen-Yang Lin. 2004. Mining fuzzy association rules with multiple minimum supports using maximum constraints. In Proceedings of the International Conference on Knowledge-based and Intelligent Information and Engineering Systems. Springer, 1283--1290.
    [32]
    Beibei Li, Anindya Ghose, and Panagiotis G. Ipeirotis. 2011. Towards a theory model for product search. In Proceedings of the 20th International Conference on World Wide Web. ACM, 327--336.
    [33]
    Chun-Wei Lin, Tzung-Pei Hong, and Wen-Hsiang Lu. 2011. An effective tree structure for mining high utility itemsets. Exp. Syst. Applic. 38, 6 (2011), 7419--7424.
    [34]
    Chun-Wei Lin, Guo-Cheng Lan, and Tzung-Pei Hong. 2015. Mining high utility itemsets for transaction deletion in a dynamic database. Intell. Data Anal. 19, 1 (2015), 43--55.
    [35]
    Jerry Chun-Wei Lin, Philippe Fournier-Viger, and Wensheng Gan. 2016. FHN: An efficient algorithm for mining high-utility itemsets with negative unit profits. Knowl.-based Syst. 111 (2016), 283--298.
    [36]
    Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Tzung-Pei Hong, and Vincent S. Tseng. 2016. Efficient algorithms for mining high-utility itemsets in uncertain databases. Knowl.-based Syst. 96 (2016), 171--187.
    [37]
    Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Tzung-Pei Hong, and Vincent S. Tseng. 2016. Fast algorithms for mining high-utility itemsets with various discount strategies. Adv. Eng. Inform. 30, 2 (2016), 109--126.
    [38]
    Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Tzung-Pei Hong, and Justin Zhan. 2016. Efficient mining of high-utility itemsets using multiple minimum utility thresholds. Knowl.-based Syst. 113 (2016), 100--115.
    [39]
    Jerry Chun-Wei Lin, Wensheng Gan, Tzung-Pei Hong, and Vincent S. Tseng. 2015. Efficient algorithms for mining up-to-date high-utility patterns. Adv. Eng. Inform. 29, 3 (2015), 648--661.
    [40]
    Jerry Chun-Wei Lin, Jiexiong Zhang, and Philippe Fournier-Viger. 2017. High-utility sequential pattern mining with multiple minimum utility thresholds. In Proceedings of the Asia-Pacific Web and Web-age Information Management Joint Conference on Web and Big Data. Springer, 215--229.
    [41]
    Bing Liu. 2007. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer Science 8 Business Media.
    [42]
    Bing Liu, Wynne Hsu, and Yiming Ma. 1999. Mining association rules with multiple minimum supports. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 337--341.
    [43]
    Junqiang Liu, Ke Wang, and Benjamin C. M. Fung. 2012. Direct discovery of high utility itemsets without candidate generation. In Proceedings of the IEEE 12th International Conference on Data Mining. IEEE, 984--989.
    [44]
    Mengchi Liu and Junfeng Qu. 2012. Mining high utility itemsets without candidate generation. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM, 55--64.
    [45]
    Ying Liu, Wei-keng Liao, and Alok Choudhary. 2005. A two-phase algorithm for fast discovery of high utility itemsets. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 689--695.
    [46]
    Yu-Cheng Liu, Chun-Pei Cheng, and Vincent S. Tseng. 2011. Discovering relational-based association rules with multiple minimum supports on microarray datasets. Bioinformatics 27, 22 (2011), 3142--3148.
    [47]
    Thang Mai, Bay Vo, and Loan T. T. Nguyen. 2017. A lattice-based approach for mining high utility association rules. Inf. Sci. 399 (2017), 81--97.
    [48]
    A. Marshall. 1926. In Principles of Economics (8th ed.). Macmillan and Co., London, UK.
    [49]
    Nicolas Pasquier, Yves Bastide, Rafik Taouil, and Lotfi Lakhal. 1998. Pruning closed itemset lattices for association rules. In Proceedings of the International Conference on Advanced Databases. 177--196.
    [50]
    Jian Pei and Jiawei Han. 2002. Constrained frequent pattern mining: A pattern-growth view. ACM SIGKDD Explor. Newslett. 4, 1 (2002), 31--39.
    [51]
    Jian Pei, Jiawei Han, and Laks V. S. Lakshmanan. 2001. Mining frequent itemsets with convertible constraints. In Proceedings of the 17th International Conference on Data Engineering. IEEE, 433--442.
    [52]
    Heungmo Ryang, Unil Yun, and Keun Ho Ryu. 2014. Discovering high utility itemsets with multiple minimum supports. Intell. Data Anal. 18, 6 (2014), 1027--1047.
    [53]
    Ron Rymon. 1992. Search through systematic set enumeration. In Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning. 539--550.
    [54]
    Wei Song, Zihan Zhang, and Jinhong Li. 2016. A high utility itemset mining algorithm based on subsume index. Knowl. Inf. Syst. 49, 1 (2016), 315--340.
    [55]
    Ramakrishnan Srikant and Rakesh Agrawal. 1996. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the International Conference on Extending Database Technology. Springer, 1--17.
    [56]
    Vincent S. Tseng, Bai-En Shie, Cheng-Wei Wu, and Philip S. Yu. 2013. Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans. Knowl. Data Eng. 25, 8 (2013), 1772--1786.
    [57]
    Vincent S. Tseng, Cheng-Wei Wu, Philippe Fournier-Viger, and Philip S. Yu. 2016. Efficient algorithms for mining top- high utility itemsets. IEEE Trans. Knowl. Data Eng. 28, 1 (2016), 54--67.
    [58]
    Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu. 2010. UP-Growth: An efficient algorithm for high utility itemset mining. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 253--262.
    [59]
    Bay Vo, Tzung-Pei Hong, and Bac Le. 2013. A lattice-based approach for mining most generalization association rules. Knowl.-based Syst. 45 (2013), 20--30.
    [60]
    Bay Vo, Tuong Le, Tzung-Pei Hong, and Bac Le. 2015. Fast updated frequent-itemset lattice for transaction deletion. Data Knowl. Eng. 96 (2015), 78--89.
    [61]
    Jun-Zhe Wang and Jiun-Long Huang. 2018. On incremental high utility sequential pattern mining. ACM Trans. Intell. Syst. Technol. 9, 5 (2018), 55.
    [62]
    Jun-Zhe Wang, Jiun-Long Huang, and Yi-Cheng Chen. 2016. On efficiently mining high utility sequential patterns. Knowl. Inf. Syst. 49, 2 (2016), 597--627.
    [63]
    Gary M. Weiss. 2004. Mining with rarity: A unifying framework. ACM SIGKDD Explor. Newslett. 6, 1 (2004), 7--19.
    [64]
    Kung-Jiuan Yang, Tzung-Pei Hong, Guo-Cheng Lan, and Yuh-Min Chen. 2014. Efficient mining of partial periodic patterns with individual event support thresholds using minimum constraints. Int. J. Uncert. Fuzz. Knowl.-based Syst. 22, 6 (2014), 793--814.
    [65]
    Hong Yao, Howard J. Hamilton, and Cory J. Butz. 2004. A foundational approach to mining itemset utilities from databases. In Proceedings of the SIAM International Conference on Data Mining. SIAM, 482--486.
    [66]
    Junfu Yin, Zhigang Zheng, and Longbing Cao. 2012. USpan: An efficient algorithm for mining high utility sequential patterns. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 660--668.
    [67]
    Unil Yun, Heungmo Ryang, Gangin Lee, and Hamido Fujita. 2017. An efficient algorithm for mining high utility patterns from incremental databases with one database scan. Knowl.-based Syst. 124 (2017), 188--206.
    [68]
    Souleymane Zida, Philippe Fournier-Viger, Jerry Chun-Wei Lin, Cheng-Wei Wu, and Vincent S. Tseng. 2015. EFIM: A highly efficient algorithm for high-utility itemset mining. In Proceedings of the Mexican International Conference on Artificial Intelligence. Springer, 530--546.

    Cited By

    View all

    Index Terms

    1. Beyond Frequency: Utility Mining with Varied Item-specific Minimum Utility

        Recommendations

        Comments

        Information & Contributors

        Information

        Published In

        cover image ACM Transactions on Internet Technology
        ACM Transactions on Internet Technology  Volume 21, Issue 1
        Visions Paper, Regular Papers, SI: Blockchain in E-Commerce, and SI: Human-Centered Security, Privacy, and Trust in the Internet of Things
        February 2021
        534 pages
        ISSN:1533-5399
        EISSN:1557-6051
        DOI:10.1145/3441681
        • Editor:
        • Ling Liu
        Issue’s Table of Contents
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 05 January 2021
        Accepted: 01 September 2020
        Revised: 01 August 2020
        Received: 01 October 2019
        Published in TOIT Volume 21, Issue 1

        Permissions

        Request permissions for this article.

        Check for updates

        Author Tags

        1. Economic behavior
        2. pruning strategies
        3. rare item problem
        4. set-enumeration tree
        5. utility mining

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Funding Sources

        • National Natural Science Foundation of China
        • NSF

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)134
        • Downloads (Last 6 weeks)16
        Reflects downloads up to 26 Jul 2024

        Other Metrics

        Citations

        Cited By

        View all
        • (2023)Effective algorithms to mine skyline frequent-utility itemsetsEngineering Applications of Artificial Intelligence10.1016/j.engappai.2022.105355116:COnline publication date: 20-Jan-2023
        • (2022)ONP-Miner: One-off Negative Sequential Pattern MiningACM Transactions on Knowledge Discovery from Data10.1145/3549940Online publication date: 4-Aug-2022
        • (2022)An Efficient Approach for Mining Reliable High Utility PatternsIEEE Access10.1109/ACCESS.2021.313902810(1419-1431)Online publication date: 2022
        • (2022)Natural Exponent Inertia Weight based Particle Swarm Optimization for Mining Serial Episode Rules from Event SequencesIETE Journal of Research10.1080/03772063.2021.202181569:8(5425-5439)Online publication date: 18-Jan-2022
        • (2022)OHUQI: Mining on-shelf high-utility quantitative itemsetsThe Journal of Supercomputing10.1007/s11227-021-04218-078:6(8321-8345)Online publication date: 9-Jan-2022
        • (2021)FRI-miner: fuzzy rare itemset miningApplied Intelligence10.1007/s10489-021-02574-152:3(3387-3402)Online publication date: 6-Jul-2021

        View Options

        View options

        PDF

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        HTML Format

        View this article in HTML Format.

        HTML Format

        Get Access

        Login options

        Full Access

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media