Interestingness measures for data mining: A survey

Authors:

Howard J. Hamilton

ACM Computing Surveys (CSUR), Volume 38, Issue 3

Pages 9 - es

https://doi.org/10.1145/1132960.1132963

Published: 30 September 2006

Abstract

Interestingness measures play an important role in data mining, regardless of the kind of patterns being mined. These measures are intended for selecting and ranking patterns according to their potential interest to the user. Good measures also allow the time and space costs of the mining process to be reduced. This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, gives strategies for selecting appropriate measures for applications, and identifies opportunities for future research in this area.

Index Terms

Interestingness measures for data mining: A survey
1. Information systems
  1. Information systems applications
    1. Data mining

Published In

ACM Computing Surveys Volume 38, Issue 3

2006

129 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/1132960

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States
