Abstract
Kernel-based methods achieve outstanding performance on many machine learning and pattern recognition tasks. However, they are sensitive to kernel selection, may tolerate noise poorly, and cannot handle mixed-type or missing data. We propose deriving a novel kernel from an ensemble of decision trees. The resulting kernel methods naturally handle noisy and heterogeneous data with potentially non-randomly missing values. We demonstrate the excellent performance of regularized least-squares learners based on such kernels.
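As an illustration of the idea, the following is a minimal sketch, assuming the tree-ensemble kernel is taken to be the fraction of trees in which two samples land in the same leaf (one common construction, related to random-forest proximities; the paper's exact kernel may differ), and assuming the regularized least-squares learner solves (K + lambda*n*I) alpha = y in the Poggio-Smale formulation. The helper tree_kernel and the benchmark data are illustrative, not from the paper.

import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor

def tree_kernel(leaves_a, leaves_b):
    # Assumed kernel: fraction of trees in which each pair of samples
    # falls into the same leaf. leaves_* are (n_samples, n_trees) arrays
    # of leaf indices as returned by forest.apply().
    return (leaves_a[:, None, :] == leaves_b[None, :, :]).mean(axis=2)

X, y = make_friedman1(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]

# Fit an ensemble of decision trees, then read off the leaf each
# training/test sample reaches in every tree.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
L_tr, L_te = forest.apply(X_tr), forest.apply(X_te)

# Regularized least squares on the tree-induced Gram matrix:
# alpha = (K + lambda * n * I)^{-1} y,  f(x) = sum_i alpha_i K(x, x_i)
lam = 1e-3
K = tree_kernel(L_tr, L_tr)
alpha = np.linalg.solve(K + lam * len(y_tr) * np.eye(len(y_tr)), y_tr)
y_pred = tree_kernel(L_te, L_tr) @ alpha
print("test RMSE: %.3f" % np.sqrt(np.mean((y_pred - y_te) ** 2)))

Because the kernel is built from tree splits, it inherits the trees' invariance to monotone feature transformations and their native handling of mixed-type inputs, which is the property the abstract appeals to.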
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Torkkola, K., Tuv, E. (2005). Ensemble Learning with Supervised Kernels. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) Machine Learning: ECML 2005. Lecture Notes in Computer Science, vol. 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29243-2
Online ISBN: 978-3-540-31692-3