Selecting the right objective measure for association analysis

Published: 01 June 2004

Abstract

Objective measures such as support, confidence, interest factor, correlation, and entropy are often used to evaluate the interestingness of association patterns. However, in many situations, these measures may provide conflicting information about the interestingness of a pattern. Data mining practitioners also tend to apply an objective measure without realizing that there may be better alternatives available for their application. In this paper, we describe several key properties one should examine in order to select the right measure for a given application. A comparative study of these properties is made using twenty-one measures that were originally developed in diverse fields such as statistics, social science, machine learning, and data mining. We show that, depending on its properties, each measure is useful for some applications but not for others. We also demonstrate two scenarios in which many existing measures become consistent with each other, namely, when support-based pruning and a technique known as table standardization are applied. Finally, we present an algorithm for selecting a small set of patterns such that domain experts can find a measure that best fits their requirements by ranking this small set of patterns.
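
To make the point about conflicting measures concrete, the sketch below computes a few of the measures named above from the 2x2 contingency table of a pattern {A, B}, where f11 counts transactions containing both items, f10 those with A but not B, f01 those with B but not A, and f00 those with neither. The formulas are the standard definitions (the interest factor is P(A,B) / (P(A) P(B)), often called lift). The function names and the two example tables are hypothetical, chosen only for illustration, and the standardization step is plain iterative proportional fitting toward uniform margins, which may differ in detail from the procedure in the paper.

```python
import math


def measures(f11, f10, f01, f00):
    """A few objective measures for the pattern {A, B}, computed from its 2x2 table.
    Assumes every cell is non-zero (the odds ratio divides by f10 * f01)."""
    n = f11 + f10 + f01 + f00
    a, not_a = f11 + f10, f01 + f00  # row totals: A present / A absent
    b, not_b = f11 + f01, f10 + f00  # column totals: B present / B absent
    return {
        "support": f11 / n,
        "confidence": f11 / a,  # confidence of the rule A -> B
        "interest": n * f11 / (a * b),  # P(A,B) / (P(A) P(B)), a.k.a. lift
        "phi": (f11 * f00 - f10 * f01) / math.sqrt(a * b * not_a * not_b),
        "odds_ratio": (f11 * f00) / (f10 * f01),
    }


def standardize(f11, f10, f01, f00, iterations=100):
    """Rescale rows and columns in turn (iterative proportional fitting) until
    every row and column total is N/2. Each step multiplies a whole row or
    column by a constant, so the odds ratio f11*f00 / (f10*f01) is preserved.
    Assumes all four cells are positive."""
    t = [[float(f11), float(f10)], [float(f01), float(f00)]]
    half = (f11 + f10 + f01 + f00) / 2
    for _ in range(iterations):
        for row in t:  # make both row totals equal N/2
            s = row[0] + row[1]
            row[0], row[1] = row[0] * half / s, row[1] * half / s
        for j in range(2):  # make both column totals equal N/2
            s = t[0][j] + t[1][j]
            t[0][j], t[1][j] = t[0][j] * half / s, t[1][j] * half / s
    return t


# Hypothetical counts (f11, f10, f01, f00), 1000 transactions each.
t1 = (880, 50, 50, 20)
t2 = (20, 50, 50, 880)
m1, m2 = measures(*t1), measures(*t2)
print(round(m1["phi"], 3), round(m2["phi"], 3))  # 0.232 0.232
print(round(m1["interest"], 2), round(m2["interest"], 2))  # 1.02 4.08
print(standardize(*t1))
print(standardize(*t2))
```

With these two tables, phi assigns both patterns the same value while the interest factor rates the second one about four times higher. Both tables have the same odds ratio (7.04), so after standardization they become the same table and any measure computed from them agrees. Row and column rescaling was chosen here because it changes the margins without changing the odds ratio.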

Reviews

Jose Hernandez-Orallo

The quality of an association pattern can be evaluated using many different measures, such as confidence, support, and interest. This makes it difficult to select the appropriate measure for a particular application. In this paper, the authors analyze the properties and correlations of 21 evaluation measures. First, they exhaustively check whether each one satisfies eight mathematical properties. These properties highlight the main conceptual differences between the measures, and explain why the measures behave differently in general and why they become more consistent when applied to associations that satisfy a minimum support threshold. Additionally, the authors discuss the measures' behavior after standardization of the contingency table, and propose a standardization procedure. The experimental part of the paper has a more methodological goal: the authors develop a way for an expert to choose the measure that is most appropriate for a specific situation. To do this, the expert ranks a small number of representative contingency tables and then selects the measure whose ranking of those tables is most similar. The work is restricted to measures for evaluating non-oriented frequent itemsets with only two variables, so it does not extend directly to general association patterns or to oriented association rules. Nonetheless, the work is a very good reference, with a thorough conceptual and experimental analysis of 21 evaluation measures for association patterns. In addition, it proposes a practical methodology by which an expert can choose the measure that subjectively best fits a practical scenario.

Online Computing Reviews Service
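
The selection step described in the review can be sketched as follows: score a small set of contingency tables with every candidate measure, convert the scores to ranks, and keep the measure whose ranking correlates best with the expert's. This is a minimal sketch under stated assumptions: Spearman rank correlation is used as the similarity between rankings, only four measures are included, and the tables, the expert ranking, and the names `select_measure` and `MEASURES` are hypothetical. It does not reproduce the paper's procedure for choosing which tables to show the expert.

```python
import math


def ranks(values):
    """1-based ranks with the largest value ranked 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of two rank vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# Hypothetical candidate measures over a 2x2 table (f11, f10, f01, f00).
MEASURES = {
    "support": lambda f11, f10, f01, f00: f11 / (f11 + f10 + f01 + f00),
    "confidence": lambda f11, f10, f01, f00: f11 / (f11 + f10),
    "interest": lambda f11, f10, f01, f00: (f11 + f10 + f01 + f00) * f11
    / ((f11 + f10) * (f11 + f01)),
    "phi": lambda f11, f10, f01, f00: (f11 * f00 - f10 * f01)
    / math.sqrt((f11 + f10) * (f11 + f01) * (f01 + f00) * (f10 + f00)),
}


def select_measure(tables, expert_ranks):
    """Return the best-agreeing measure and each measure's rank correlation with the expert."""
    agreement = {}
    for name, fn in MEASURES.items():
        scores = [fn(*t) for t in tables]
        agreement[name] = spearman(ranks(scores), expert_ranks)
    best = max(agreement, key=agreement.get)
    return best, agreement


tables = [(880, 50, 50, 20), (20, 50, 50, 880), (400, 100, 100, 400), (100, 300, 300, 300)]
expert = [2, 3, 1, 4]  # hypothetical expert ranking, 1 = most interesting
best, agreement = select_measure(tables, expert)
print(best)  # phi
```

Spearman's coefficient is used because it compares orderings rather than raw scores, which matches the kind of input an expert ranking provides. Ties, such as phi on the first two tables, receive their average rank so that they do not distort the comparison.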

Information

Published In

Information Systems, Volume 29, Issue 4
Knowledge discovery and data mining (KDD 2002)
June 2004
92 pages

Publisher

Elsevier Science Ltd.

United Kingdom

Cited By

  • (2024) A Bayesian Framework for Measuring Association and Its Application to Emotional Dynamics in Web Discourse. Companion Proceedings of the ACM Web Conference 2024, pp. 1450-1458. DOI: 10.1145/3589335.3651911. Online publication date: 13-May-2024.
  • (2024) Measuring rule-based LTLf process specifications. Information Systems 120:C. DOI: 10.1016/j.is.2023.102312. Online publication date: 1-Feb-2024.
  • (2024) A novel software defect prediction approach via weighted classification based on association rule mining. Engineering Applications of Artificial Intelligence 129:C. DOI: 10.1016/j.engappai.2023.107622. Online publication date: 16-May-2024.
  • (2024) WaveLSea: helping experts interactively explore pattern mining search spaces. Data Mining and Knowledge Discovery 38:4, pp. 2403-2439. DOI: 10.1007/s10618-024-01037-8. Online publication date: 1-Jul-2024.
  • (2024) Learning to Rank Based on Choquet Integral: Application to Association Rules. Advances in Knowledge Discovery and Data Mining, pp. 313-326. DOI: 10.1007/978-981-97-2242-6_25. Online publication date: 7-May-2024.
  • (2023) Symmetry properties and asymmetry evaluation of Bayesian Confirmation Measures. Data Mining and Knowledge Discovery 37:6, pp. 2255-2280. DOI: 10.1007/s10618-023-00942-8. Online publication date: 7-Aug-2023.
  • (2023) Fast privacy-preserving utility mining algorithm based on utility-list dictionary. Applied Intelligence 53:23, pp. 29363-29377. DOI: 10.1007/s10489-023-04791-2. Online publication date: 1-Dec-2023.
  • (2023) Association rules and decision rules. Statistical Analysis and Data Mining 16:5, pp. 411-435. DOI: 10.1002/sam.11620. Online publication date: 1-Sep-2023.
  • (2022) Measuring the interestingness of temporal logic behavioral specifications in process mining. Information Systems 107:C. DOI: 10.1016/j.is.2021.101920. Online publication date: 1-Jul-2022.
  • (2022) Thoughts on women entrepreneurship: an application of market basket analysis with google trends data. Soft Computing - A Fusion of Foundations, Methodologies and Applications 26:19, pp. 10035-10047. DOI: 10.1007/s00500-022-07355-7. Online publication date: 1-Oct-2022.
