Abstract
Traditional association-rule mining (ARM) considers only the frequency of items in a binary database, which provides insufficient knowledge for making effective decisions and strategies. Mining useful information from quantitative databases is a harder task than the binary case handled by conventional ARM algorithms. Fuzzy-set theory was introduced to represent knowledge in a form closer to human reasoning, and it can also be applied to quantitative databases. Many approaches have adopted fuzzy-set theory to transform each quantitative value into linguistic terms with corresponding membership degrees, based on predefined membership functions, for the discovery of fuzzy frequent itemsets (FFIs). Traditional fuzzy frequent itemset mining considers only the linguistic terms with maximal scalar cardinality, and past approaches do not take the uncertainty factor into account. In this paper, an efficient fuzzy mining (EFM) algorithm is presented to quickly discover multiple FFIs from quantitative databases under type-2 fuzzy-set theory. A compressed fuzzy-list (CFL) structure is developed to maintain complete information for rule generation. Two pruning techniques are developed to reduce the search space and speed up the mining process. Several experiments are carried out to verify the efficiency and effectiveness of the designed approach in terms of runtime, number of examined nodes, memory usage, and scalability under different minimum support thresholds and different numbers of linguistic terms used in the membership functions.
Appendix
Lemma 1
For a termset X, if Sup(X) or rSup(X) is less than the minimum support threshold, then any superset (extension) of X is not a multiple fuzzy frequent pattern (MFFP) and should be pruned.
Proof
For every transaction \(T\supseteq X^{\prime }\): because \(X^{\prime }\) is an extension of X and \((X^{\prime } - X) = (X^{\prime }/X)\), we can obtain that \(X\subseteq X^{\prime }\subseteq T\Rightarrow (X^{\prime }/X)\subseteq (T/X)\). Therefore, \( fv(X^{\prime }, T) = fv(X, T)\cup fv((X^{\prime } - X), T) = min(fv(X, T), fv(X^{\prime }/X, T))\leq fv(X, T)\) and \(min(fv(X, T), fv(X^{\prime }/X, T))\leq fv(X^{\prime }/X, T) = rmrfv(X, T)\).
Suppose that X.tids denotes the set of tids of X. Because \( X\subseteq X^{\prime }\Rightarrow X^{\prime }.tids\subseteq X.tids\), it follows that \(Sup(X^{\prime }) = \frac {{\sum }_{id(T)\in X^{\prime }.tids}fv(X^{\prime }, T)}{N}\leq \frac {{\sum }_{id(T)\in X.tids}fv(X, T)}{N} = Sup(X) < minSup\).
Furthermore, we can obtain that \(\frac {{\sum }_{id(T)\in X^{\prime }.tids}rmrfv(X^{\prime }, T)}{N}\leq \frac {{\sum }_{id(T)\in X.tids}rmrfv(X, T)}{N}\Rightarrow rSup(X) < minSup\). □
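The downward-closure argument of Lemma 1 can be checked numerically on a small example. The following is a minimal sketch, not the paper's EFM implementation: the toy database `DB`, the term names, and the helper names `fv`, `sup`, and `check_downward_closure` are all assumptions introduced for illustration. It takes the fuzzy value of a termset in a transaction to be the minimum membership degree of its terms, as in the proof, and verifies that extending a termset never increases its support.

```python
from itertools import combinations

# Hypothetical toy database (illustrative values only): each transaction
# maps a linguistic term to its membership degree.
DB = [
    {"A.Low": 0.8, "B.High": 0.6, "C.Mid": 0.4},
    {"A.Low": 0.5, "C.Mid": 0.9},
    {"B.High": 0.7, "C.Mid": 0.2},
    {"A.Low": 0.3, "B.High": 1.0, "C.Mid": 0.5},
]


def fv(termset, transaction):
    """Fuzzy value of a termset in one transaction: the minimum degree of
    its terms, or 0.0 if any term is missing from the transaction."""
    if not all(term in transaction for term in termset):
        return 0.0
    return min(transaction[term] for term in termset)


def sup(termset, db):
    """Support of a termset: summed fuzzy values divided by N = len(db)."""
    return sum(fv(termset, tr) for tr in db) / len(db)


def check_downward_closure(db):
    """Return True if Sup(X') <= Sup(X) for every one-term extension X' of
    every non-empty termset X drawn from the terms in db."""
    terms = sorted({term for tr in db for term in tr})
    for k in range(1, len(terms)):
        for x in combinations(terms, k):
            for extra in terms:
                if extra in x:
                    continue
                if sup(x + (extra,), db) > sup(x, db) + 1e-12:
                    return False
    return True
```

Under these assumptions, `sup(("A.Low",), DB)` is 0.4 and `sup(("A.Low", "B.High"), DB)` is 0.225, so a threshold that rejects `("A.Low",)` also rejects every extension of it.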
Lemma 2
For a termset X, if Sup(X) or the relative remaining support rSup(X) is less than the minimum support threshold, then any superset (extension) of X is not an MFFP and should be discarded.
Proof
Because \( X\subseteq X^{\prime }\Rightarrow X^{\prime }.tids\subseteq X.tids\), we have
\( Sup(X^{\prime }) = \frac {{\sum }_{id(T)\in X.tids}fv(X^{\prime }, T)}{N} = \frac {{\sum }_{id(T)\in X^{\prime }.tids}min(fv(X, T), fv(X^{\prime }/X, T))}{N} \leq \frac {{\sum }_{id(T)\in X^{\prime }.tids}min(fv(X, T), rmrfv(X, T))}{N} = \frac {{\sum }_{id(T)\in Q^{\prime }}fv(X, T) +{\sum }_{id(T)\in Q^{\prime \prime }}rmrfv(X, T)}{N}= rSup(X) < minSup\).
Here \(Q^{\prime }\) and \(Q^{\prime \prime }\) partition \(X^{\prime }.tids\), that is, \(Q^{\prime }\cup Q^{\prime \prime } = X^{\prime }.tids\) and \(Q^{\prime }\cap Q^{\prime \prime } = \emptyset \), where \(fv(X, T) < rmrfv(X, T)\) for \(id(T)\in Q^{\prime }\) and \(fv(X, T)\geq rmrfv(X, T)\) for \(id(T)\in Q^{\prime \prime }\). □
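The bound used in Lemma 2 can be illustrated in the same way. This is a sketch under stated assumptions rather than the paper's method: the appendix does not define rmrfv, so the code assumes that rmrfv(X, T) is the largest membership degree among the terms of T that follow the last term of X in a fixed processing order (`ORDER`), and that rSup(X) sums min(fv(X, T), rmrfv(X, T)) over the transactions containing X. The database and the names `rmrfv`, `r_sup`, `can_prune`, and `bound_holds` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database and processing order (illustrative only).
DB = [
    {"A.Low": 0.8, "B.High": 0.6, "C.Mid": 0.4},
    {"A.Low": 0.5, "C.Mid": 0.9},
    {"B.High": 0.7, "C.Mid": 0.2},
    {"A.Low": 0.3, "B.High": 1.0, "C.Mid": 0.5},
]
ORDER = ["A.Low", "B.High", "C.Mid"]


def fv(termset, tr):
    """Minimum degree of the terms of termset in tr, or 0.0 if one is absent."""
    if not all(term in tr for term in termset):
        return 0.0
    return min(tr[term] for term in termset)


def sup(termset, db):
    return sum(fv(termset, tr) for tr in db) / len(db)


def rmrfv(termset, tr):
    """Assumed definition: largest degree among the terms of tr that come
    after the last term of termset in ORDER (0.0 if none remain)."""
    last = max(ORDER.index(term) for term in termset)
    return max((tr[t] for t in ORDER[last + 1:] if t in tr), default=0.0)


def r_sup(termset, db):
    """Assumed remaining support: sum of min(fv, rmrfv) over the
    transactions that contain termset, divided by N."""
    total = 0.0
    for tr in db:
        value = fv(termset, tr)
        if value > 0.0:
            total += min(value, rmrfv(termset, tr))
    return total / len(db)


def can_prune(termset, db, min_sup):
    """Pruning test of the lemma: no extension needs to be examined."""
    return sup(termset, db) < min_sup or r_sup(termset, db) < min_sup


def bound_holds(db):
    """Check Sup(X') <= rSup(X) for every extension X' of X along ORDER."""
    for k in range(1, len(ORDER)):
        for x in combinations(ORDER, k):
            tail = ORDER[ORDER.index(x[-1]) + 1:]
            for m in range(1, len(tail) + 1):
                for ext in combinations(tail, m):
                    if sup(x + ext, db) > r_sup(x, db) + 1e-12:
                        return False
    return True
```

With these toy values, `("A.Low",)` has support 0.4 but remaining support 0.35, so at a threshold of 0.38 the termset itself passes while all of its extensions can be skipped without being evaluated.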
Cite this article
Lin, J.CW., Ahmed, U., Srivastava, G. et al. Linguistic frequent pattern mining using a compressed structure. Appl Intell 51, 4806–4823 (2021). https://doi.org/10.1007/s10489-020-02080-w
DOI: https://doi.org/10.1007/s10489-020-02080-w