research-article

Mining Closed High Utility Itemsets based on Propositional Satisfiability

Published: 01 November 2021

Abstract

High utility itemset mining is the problem of finding the sets of items whose utility exceeds a user-specified threshold. This generalization of classical frequent itemset mining is a well-known task in data analysis and data mining, with a wide range of real-world applications. In this paper, we first propose to use symbolic Artificial Intelligence to compute the set of all closed high utility itemsets in transaction databases. Our approach reduces the task to an enumeration problem in propositional satisfiability (SAT). We then improve the efficiency of this SAT-based approach using the weighted clique cover problem. Next, to improve scalability, we apply a decomposition technique that derives smaller, independent sub-problems which together capture all the closed high utility itemsets. Our SAT-based encoding can also be continually improved by integrating the latest advances in SAT solvers and model enumeration algorithms. Finally, through empirical evaluations on several real-world datasets, we show that the proposed approach is very competitive with state-of-the-art specialized algorithms for high utility itemset mining, while being flexible enough to accommodate additional constraints when finding closed high utility itemsets.
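
To make the notion of a closed high utility itemset concrete, the following minimal Python sketch enumerates such itemsets on a toy database by brute force. It is not the SAT-based method of the paper; it only illustrates the definitions under stated assumptions. The data, the threshold, and all names (TRANSACTIONS, PROFIT, cover, utility, closed_high_utility_itemsets) are hypothetical. The sketch assumes the usual conventions: the utility of an itemset in a transaction is the sum of quantity times unit profit over its items, and an itemset is closed when no proper superset occurs in exactly the same transactions. The threshold test is written as "at least the threshold", which is also an assumption; the toy threshold is chosen so that a strict comparison gives the same result.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "b": 1, "c": 3},
    {"b": 4, "c": 2},
]
# Hypothetical unit profit (external utility) of each item.
PROFIT = {"a": 5, "b": 2, "c": 1}


def cover(itemset, transactions):
    """Indices of the transactions that contain every item of `itemset`."""
    return frozenset(
        i for i, t in enumerate(transactions) if all(x in t for x in itemset)
    )


def utility(itemset, transactions, profit):
    """Sum of quantity * unit profit over all transactions containing `itemset`."""
    return sum(
        transactions[i][x] * profit[x]
        for i in cover(itemset, transactions)
        for x in itemset
    )


def closed_high_utility_itemsets(transactions, profit, min_util):
    """Brute-force enumeration; exponential in the number of items."""
    items = sorted({x for t in transactions for x in t})
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            tids = cover(combo, transactions)
            if not tids:
                continue  # itemset occurs in no transaction
            # Closed: adding any further item strictly shrinks the cover.
            # Checking single-item extensions is enough because the cover
            # can only shrink as the itemset grows.
            is_closed = all(
                cover(combo + (y,), transactions) != tids
                for y in items
                if y not in combo
            )
            u = utility(combo, transactions, profit)
            # Assumption: "high utility" means utility >= min_util.
            if is_closed and u >= min_util:
                result[combo] = u
    return result


if __name__ == "__main__":
    for itemset, u in closed_high_utility_itemsets(TRANSACTIONS, PROFIT, 12).items():
        print(itemset, u)
```

With the threshold 12, the itemsets {a} (utility 15) and {a, c} (utility 13) are high utility but not closed, since {a, b} and {a, b, c} occur in the same transactions. The sketch therefore reports only {b}, {a, b}, {b, c} and {a, b, c}.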


Published In

Data & Knowledge Engineering, Volume 136, Issue C
Nov 2021
93 pages

Publisher

Elsevier Science Publishers B. V.
Netherlands

Author Tags

1. Data Mining
2. High Utility
3. Symbolic Artificial Intelligence
4. Propositional Satisfiability
