Abstract
Decision trees are well-established models for classification and regression. In this paper, we focus on estimating and minimizing the misclassification rate of decision tree classifiers. We apply Lidstone's Law of Succession to estimate the class probabilities and error rates. Unlike existing research, which has typically considered only the expected value of the error rate, we also take into account its reliability, measured by its standard deviation. Based on this estimation, we propose an efficient pruning algorithm, called k-norm pruning, that has a clear theoretical interpretation, is easily implemented, and does not require a validation set. Our experiments show that the proposed algorithm produces accurate trees quickly and compares very favorably with two other well-known pruning algorithms: CCP of CART and EBP of C4.5.
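The estimation idea in the abstract can be illustrated with a small sketch. Lidstone's Law of Succession is additive smoothing: with k classes and smoothing parameter λ, the estimated probability of class i is (n_i + λ)/(N + λk), which reduces to Laplace's rule for λ = 1. For a two-class leaf with e errors among n samples, the error rate under a uniform prior has a Beta(e + 1, n − e + 1) posterior, whose standard deviation quantifies the reliability the abstract refers to. This is an illustrative sketch, not the authors' implementation; the function names and the λ = 1 choice are assumptions.

```python
import math


def lidstone_estimate(counts, lam=1.0):
    """Lidstone-smoothed class probabilities: p_i = (n_i + lam) / (N + lam * k)."""
    n = sum(counts)
    k = len(counts)
    return [(c + lam) / (n + lam * k) for c in counts]


def beta_mean_std(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


# A leaf with 8 samples of class 0 and 2 of class 1; the leaf predicts class 0.
probs = lidstone_estimate([8, 2], lam=1.0)   # [0.75, 0.25]
error_rate = 1.0 - max(probs)                # smoothed error estimate: 0.25

# Reliability of that estimate: with e = 2 errors in n = 10 samples and a
# uniform prior, the error rate is Beta(3, 9); its mean matches the
# smoothed estimate above, and its std measures how much to trust it.
mean, std = beta_mean_std(3, 9)              # mean = 0.25, std ≈ 0.12
```

A pruning decision can then weigh not just the expected error of a subtree against that of its root, but also the spread of those estimates, which is the intuition behind using higher-order moments (k-norms) of the error rate.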
References
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Belmont: Wadsworth.
Cestnik, B., & Bratko, I. (1991). On estimating probabilities in tree pruning. In EWSL-91: Proceedings of the European working session on learning on machine learning (pp. 138–150). New York: Springer.
Chernoff, H. (1952). A measure of asymptotic efficiency of tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23, 493–507.
Dong, M., & Kothari, R. (2001). Classifiability based pruning of decision trees. In IJCNN (Vol. 3, pp. 1739–1743).
Duda, R. O., Hart, P. E., & Stork, D. G. (2000). Pattern classification (2nd ed.). New York: Wiley-Interscience.
Elomaa, T., & Kääriäinen, M. (2001). An analysis of reduced error pruning. Journal of Artificial Intelligence Research, 15, 163–187.
Esposito, F., Malerba, D., & Semeraro, G. (1997). A comparative analysis of methods for pruning decision trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5), 476–491.
Freund, Y. (1998). Self bounding learning algorithms. In COLT: Proceedings of the workshop on computational learning theory. San Mateo: Morgan Kaufmann.
Good, I. J. (1965). The estimation of probabilities: an essay on modern Bayesian methods (No. 30). Cambridge: MIT Press.
Good, I. J. (1967). A Bayesian significance test for multinomial distributions. Journal of the Royal Statistical Society, Series B (Methodological), 29(3), 399–431.
Hall, L. O., Bowyer, K. W., Banfield, R. E., Eschrich, S., & Collins, R. (2003). Is error-based pruning redeemable? International Journal on Artificial Intelligence Tools, 12(3), 249–264.
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58, 13–30.
Kääriäinen, M., & Elomaa, T. (2003). Rademacher penalization over decision tree prunings. In N. Lavrac, D. Gamberger, L. Todorovski, & H. Blockeel (Eds.), Lecture notes in computer science (Vol. 2837, pp. 193–204). Berlin: Springer.
Kääriäinen, M., Malinen, T., & Elomaa, T. (2004). Selective Rademacher penalization and reduced error pruning of decision trees. Journal of Machine Learning Research, 5, 1107–1126.
Kearns, M. J., & Mansour, Y. (1998). A fast, bottom-up decision tree pruning algorithm with near-optimal generalization. In ICML ’98: proceedings of the fifteenth international conference on machine learning (pp. 269–277). San Francisco: Morgan Kaufmann.
Kijsirikul, B., & Chongkasemwongse, K. (2001). Decision tree pruning using backpropagation neural networks. In IJCNN (Vol. 3, pp. 1876–1880).
Kohavi, R., Becker, B., & Sommerfield, D. (1997). Improving simple Bayes. In M. van Someren & G. Widmer (Eds.), Lecture notes in computer science (Vol. 1224, pp. 78–87). Berlin: Springer.
Krichevskiy, R. E. (1998). Laplace’s law of succession and universal encoding. IEEE Transactions on Information Theory, 44(1), 296–303.
Lidstone, G. J. (1920). Note on the general case of the Bayes-Laplace formula for inductive or a posteriori probabilities. Transactions of the Faculty of Actuaries, 8, 182–192.
Mansour, Y. (1997). Pessimistic decision tree pruning based on tree size. In Proceedings of the 14th international conference on machine learning (pp. 195–201). San Mateo: Morgan Kaufmann.
Mansour, Y., & McAllester, D. A. (2000). Generalization bounds for decision trees. In COLT ’00: proceedings of the thirteenth annual conference on computational learning theory (pp. 69–74). San Francisco: Morgan Kaufmann.
Mehta, M., Rissanen, J., & Agrawal, R. (1995). MDL-based decision tree pruning. In KDD (pp. 216–221).
Newman, D. J., Hettich, S., Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases. Available from http://www.ics.uci.edu/~mlearn/MLRepository.html.
Niblett, T., & Bratko, I. (1986). Learning decision rules in noisy domains. In Proceedings of expert systems ’86, the 6th annual technical conference on research and development in expert systems III (pp. 25–34).
Quinlan, J. R. (1993). C4.5: programs for machine learning. San Mateo: Morgan Kaufmann.
Quinlan, J. R. (1999). Simplifying decision trees. International Journal of Human-Computer Studies, 51(2), 497–510.
Ristad, E. S. (1995). A natural law of succession (Tech. Rep. No. TR-495-95). Princeton University.
Stone, M. (1978). Cross-validation: a review. Mathematische Operationsforschung und Statistik, Series Statistics, 9, 127–140.
Vardeman, S. B., & Jobe, J. M. (2001). Basic engineering data collection and analysis. Pacific Grove: Brooks/Cole, Thomson Learning.
Windeatt, T., & Ardeshir, G. (2001). An empirical comparison of pruning methods for ensemble classifiers. In IDA ’01: proceedings of the 4th international conference on advances in intelligent data analysis (pp. 208–217). London: Springer.
Zemanian, A. (1987). Distribution theory and transform analysis: an introduction to generalized functions, with applications. New York: Dover.
Editor: Hendrik Blockeel.
Zhong, M., Georgiopoulos, M. & Anagnostopoulos, G.C. A k-norm pruning algorithm for decision tree classifiers based on error rate estimation. Mach Learn 71, 55–88 (2008). https://doi.org/10.1007/s10994-007-5044-4