Interestingness measures for data mining: A survey

Published: 30 September 2006

Abstract

Interestingness measures play an important role in data mining, regardless of the kind of patterns being mined. These measures are intended for selecting and ranking patterns according to their potential interest to the user. Good measures also allow the time and space costs of the mining process to be reduced. This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, gives strategies for selecting appropriate measures for applications, and identifies opportunities for future research in this area.
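
To make the idea of "selecting and ranking patterns" concrete, the sketch below scores two association rules with three commonly used objective measures (support, confidence, and lift) and ranks them by lift. The abstract does not name these measures; they are chosen here only as an illustration. The function names `rule_measures` and `rank_rules` and the toy transactions are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return support, confidence, and lift for the rule antecedent -> consequent.

    `transactions` is a list of sets of items (a hypothetical toy format).
    """
    a = frozenset(antecedent)
    c = frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # transactions containing the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # transactions containing the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # transactions containing both
    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


def rank_rules(transactions, rules, measure="lift"):
    """Sort (antecedent, consequent) pairs by the chosen measure, highest first."""
    scored = [(rule, rule_measures(transactions, *rule)) for rule in rules]
    return sorted(scored, key=lambda pair: pair[1][measure], reverse=True)


# Toy data, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
rules = [(("bread",), ("milk",)), (("butter",), ("bread",))]

for (lhs, rhs), scores in rank_rules(transactions, rules):
    print(lhs, "->", rhs, scores)
```

Ranking by a different key (for example `measure="support"`) can reorder the same rules, which is why the choice of measure matters for a given application.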

    Published In

    ACM Computing Surveys  Volume 38, Issue 3
    2006
    129 pages
    ISSN:0360-0300
    EISSN:1557-7341
    DOI:10.1145/1132960
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Knowledge discovery
    2. association rules
    3. classification rules
    4. interest measures
    5. interestingness measures
    6. summaries

    Qualifiers

    • Article

    Cited By

    • (2024) ABAC Policy Mining through Affiliation Networks and Biclique Analysis. Information 15:1 (45). DOI: 10.3390/info15010045. Online publication date: 12-Jan-2024.
    • (2024) Mining high average utility itemsets using artificial fish swarm algorithm with computed multiple minimum average utility thresholds. Journal of Intelligent & Fuzzy Systems 46:1 (1597-1613). DOI: 10.3233/JIFS-231852. Online publication date: 10-Jan-2024.
    • (2024) Using Permutation Tests to Identify Statistically Sound and Nonredundant Sequential Patterns in Educational Event Sequences. Journal of Educational and Behavioral Statistics. DOI: 10.3102/10769986241248772. Online publication date: 9-May-2024.
    • (2024) Data-Driven Insight Synthesis for Multi-Dimensional Data. Proceedings of the VLDB Endowment 17:5 (1007-1019). DOI: 10.14778/3641204.3641211. Online publication date: 1-Jan-2024.
    • (2024) Surprising and novel multivariate sequential patterns using odds ratio for temporal evolution in healthcare. BMC Medical Informatics and Decision Making 24:1. DOI: 10.1186/s12911-024-02566-4. Online publication date: 13-Jun-2024.
    • (2024) MicroNet: Operation Aware Root Cause Identification of Microservice System Anomalies. IEEE Transactions on Network and Service Management 21:4 (4255-4267). DOI: 10.1109/TNSM.2024.3387552. Online publication date: Aug-2024.
    • (2024) AC.Rank A: Rule Ranking Method via Aggregation of Objective Measures for Associative Classifiers. IEEE Access 12 (88862-88882). DOI: 10.1109/ACCESS.2024.3419130. Online publication date: 2024.
    • (2024) Privacy-preserving association rule mining: a survey of techniques for sensitive rule identification and enhanced data protection. International Journal of Computers and Applications 46:4 (252-265). DOI: 10.1080/1206212X.2024.2307086. Online publication date: 29-Jan-2024.
    • (2024) A novel algorithm weighting different importance of classes in enhanced association rules. Knowledge-Based Systems 294:C. DOI: 10.1016/j.knosys.2024.111741. Online publication date: 21-Jun-2024.
    • (2024) A scalable, distributed framework for significant subgroup discovery. Knowledge-Based Systems 284:C. DOI: 10.1016/j.knosys.2023.111335. Online publication date: 25-Jan-2024.
