Abstract
With the rapid growth of available data, machine learning models are also growing in size. As a result, end users are often faced with classification results that are hard to understand. This problem also affects rule-based classifiers, which usually concentrate on predictive accuracy and produce too many rules for a human expert to interpret. In this paper, we tackle the problem of pruning rule classifiers while retaining their descriptive properties. For this purpose, we analyze the use of confirmation measures as representatives of interestingness measures designed to select rules with desirable descriptive properties. To perform the analysis, we put forward the CM-CAR algorithm, which uses interestingness measures during rule pruning. Experiments involving 20 datasets show that, out of the 12 analyzed confirmation measures, \(c_1\), F, and Z are best suited for general-purpose rule pruning and sorting. An additional analysis comparing results on balanced/imbalanced and binary/multi-class problems also highlights N, S, and \(c_3\) as measures for sorting rules on binary imbalanced datasets. The obtained results can be used to devise new classifiers that optimize confirmation measures during model training.
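The abstract refers to confirmation measures such as F, Z, N, and S by name only. As an illustration (not the paper's implementation), the following minimal Python sketch computes four standard Bayesian confirmation measures from a rule's 2×2 contingency table, using the definitions common in the confirmation-measure literature; the function name and the toy counts are hypothetical, and all marginal counts are assumed nonzero.

```python
# Sketch of four classic confirmation measures for a rule E -> H, computed
# from a 2x2 contingency table with counts:
#   a = |E and H|, b = |E and not H|, c = |not E and H|, d = |not E and not H|.
# Definitions follow the standard literature: Nozick's N, Christensen's S,
# Kemeny-Oppenheim's F, and the Z measure of Crupi et al.

def confirmation_measures(a, b, c, d):
    n = a + b + c + d
    p_h = (a + c) / n             # P(H)
    p_h_e = a / (a + b)           # P(H|E)
    p_h_not_e = c / (c + d)       # P(H|not E)
    p_e_h = a / (a + c)           # P(E|H)
    p_e_not_h = b / (b + d)       # P(E|not H)

    N = p_e_h - p_e_not_h                            # Nozick
    S = p_h_e - p_h_not_e                            # Christensen
    F = (p_e_h - p_e_not_h) / (p_e_h + p_e_not_h)    # Kemeny-Oppenheim
    # Z normalizes P(H|E) - P(H) differently for confirmation vs disconfirmation
    Z = ((p_h_e - p_h) / (1 - p_h) if p_h_e >= p_h
         else (p_h_e - p_h) / p_h)
    return {"N": N, "S": S, "F": F, "Z": Z}

# A rule whose premise strongly indicates the conclusion yields positive values:
print(confirmation_measures(a=40, b=5, c=10, d=45))
```

All four measures are positive when the premise raises the probability of the conclusion and negative when it lowers it, which is what makes them usable as rule-sorting criteria during pruning.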
Notes
1. Sources available at: http://www.cs.put.poznan.pl/dbrzezinski/software.php.
Acknowledgements
This work was supported by the National Science Centre grant DEC-2013/11/B/ST6/00963. D. Brzezinski acknowledges the support of an FNP START scholarship and the Institute of Computing Science Statutory Fund.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Brzezinski, D., Grudziński, Z., Szczęch, I. (2017). Bayesian Confirmation Measures in Rule-Based Classification. In: Appice, A., Ceci, M., Loglisci, C., Masciari, E., Raś, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2016. Lecture Notes in Computer Science, vol 10312. Springer, Cham. https://doi.org/10.1007/978-3-319-61461-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61460-1
Online ISBN: 978-3-319-61461-8