Abstract
Capacity control in perceptron decision trees is typically performed by controlling their size. We prove that other quantities can be as relevant for reducing their flexibility and combating overfitting. In particular, we provide an upper bound on the generalization error which depends both on the size of the tree and on the margin of the decision nodes, so enlarging the margins in a perceptron decision tree reduces this upper bound. Based on this analysis, we introduce three new algorithms that induce large-margin perceptron decision trees. To assess the effect of the large margin bias, OC1 (Murthy, Kasif, & Salzberg, Journal of Artificial Intelligence Research, 2, 1–32, 1994), a well-known system for inducing perceptron decision trees, is used as the baseline algorithm. An extensive experimental study on real-world data showed that all three new algorithms performed better than, or at least not significantly worse than, OC1 on all but one dataset, and OC1 performed worse than the best margin-based method on every dataset.
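The margin criterion behind the abstract can be illustrated with a minimal sketch. This is not the paper's actual induction algorithms; it only shows the node-level idea under a simplifying assumption: given several candidate separating hyperplanes at a tree node, prefer the one with the largest minimum geometric margin on the node's data. The function names and the toy data are illustrative inventions.

```python
# Hedged sketch of the large-margin split criterion at a single node of a
# perceptron decision tree. This is NOT the paper's algorithms; it only
# illustrates the preference: among candidate hyperplanes, pick the one
# whose minimum geometric margin over the node's sample is largest.
import numpy as np

def geometric_margin(w, b, X, y):
    """Minimum signed distance of the labelled points to the hyperplane
    w.x + b = 0; positive iff every point lies on its correct side."""
    return np.min(y * (X @ w + b) / np.linalg.norm(w))

def best_split(candidates, X, y):
    """Return the candidate (w, b) with the largest minimum margin."""
    return max(candidates, key=lambda wb: geometric_margin(wb[0], wb[1], X, y))

# Toy 2-D sample with labels in {-1, +1}, separable by the line x0 = 0.
X = np.array([[-2.0, 0.0], [-1.0, 1.0], [1.0, -1.0], [2.0, 0.5]])
y = np.array([-1, -1, 1, 1])

# Two separating hyperplanes; the axis-aligned one has the larger margin
# (1.0 versus roughly 0.45), so the margin criterion selects it.
candidates = [(np.array([1.0, 0.0]), 0.0),
              (np.array([1.0, 0.5]), 0.0)]
w, b = best_split(candidates, X, y)
```

By the bound described in the abstract, selecting the larger-margin split at each node tightens the upper bound on generalization error for a tree of the same size.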
References
Alon, N., Ben-David, S., Cesa-Bianchi, N., & Haussler, D. (1997). Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM, 44(4), 615–631.
Anthony, M. & Bartlett, P. (1994). Function learning from interpolation. Technical Report (An extended abstract appeared in Computational Learning Theory, Proceedings 2nd European Conference, EuroCOLT'95, edited by Paul Vitanyi (Lecture Notes in Artificial Intelligence, vol. 904) Springer-Verlag, Berlin, 1995, pp. 211–221).
Bartlett, P. L. & Long, P. M. (1995). Prediction, learning, uniform convergence, and scale-sensitive dimensions. Preprint, Department of Systems Engineering, Australian National University.
Bartlett, P., Long, P., & Williamson, R. (1996). Fat-shattering and the learnability of real-valued functions. Journal of Computer and System Sciences, 52(3), 434–452.
Bartlett, P. & Shawe-Taylor, J. (1998). Generalization performance of support vector machines and other pattern classifiers. In B. Schölkopf, C. J. C. Burges, & A. J. Smola (Eds.), Advances in Kernel methods-support vector learning (pp. 43–54). Cambridge, USA: MIT Press.
Bennett, K. & Mangasarian, O. (1992). Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1, 23–34.
Bennett, K. & Mangasarian, O. (1994a). Multicategory discrimination via linear programming. Optimization Methods and Software, 3, 29–39.
Bennett, K. & Mangasarian, O. (1994b). Serial and parallel multicategory discrimination. SIAM Journal on Optimization, 4(4), 722–734.
Bennett, K., Wu, D., & Auslender, L. (1998). On support vector decision trees for database marketing. R.P.I. Math Report No. 98–100, Rensselaer Polytechnic Institute, Troy, NY.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Belmont, CA: Wadsworth International Group.
Brodley, C. E. & Utgoff, P. E. (1995). Multivariate decision trees. Machine Learning, 19, 45–77.
Cortes, C. & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.
Cristianini, N., Shawe-Taylor, J., & Sykacek, P. (1998). Bayesian classifiers are large margin hyperplanes in a Hilbert space. In J. Shavlik (Ed.), Machine Learning: Proceedings of the Fifteenth International Conference (pp. 109–117). San Francisco, CA: Morgan Kaufmann Publishers.
Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7), 1895–1924.
Kearns, M. & Schapire, R. (1990). Efficient distribution-free learning of probabilistic concepts. In Proceedings of the 31st Symposium on the Foundations of Computer Science (pp. 382–391). Los Alamitos, CA: IEEE Computer Society Press.
Kohavi, R. (1995). A study of cross-validation and bootstrapping for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence (pp. 1137–1143). San Mateo, CA: Morgan Kaufmann.
Mangasarian, O., Setiono, R., & Wolberg, W. (1990). Pattern recognition via linear programming: Theory and application to medical diagnosis. In T. F. Coleman & Y. Li (Eds.), Proceedings on Workshop on Large-Scale Numerical Optimization (pp. 22–31). Philadelphia, PA: SIAM.
Murthy, S. K., Kasif, S., & Salzberg, S. (1994). A system for induction of oblique decision trees. Journal of Artificial Intelligence Research, 2, 1–32.
Neal, R. M. (1998). Assessing relevance determination methods using DELVE. In C. M. Bishop (Ed.), Neural networks and machine learning (pp. 97–129). Springer-Verlag.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann.
Quinlan, J. R. & Rivest, R. (1989). Learning decision trees using the minimum description length principle. Information and Computation, 80, 227–248.
Salzberg, S. (1997). On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery, 1(3), 317–327.
Sankar, A. & Mammone, R. J. (1993). Growing and pruning neural tree networks. IEEE Transactions on Computers, 42, 291–299.
Schapire, R., Freund, Y., Bartlett, P. L., & Lee, W. S. (1997). Boosting the margin: A new explanation for the effectiveness of voting methods. In D. H. Fisher, Jr. (Ed.), Proceedings of the International Conference on Machine Learning, ICML'97 (pp. 322–330). Nashville, Tennessee: Morgan Kaufmann Publishers.
Shawe-Taylor, J., Bartlett, P. L., Williamson, R. C., & Anthony, M. (1998). Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5), 1926–1940.
Sirat, J. A. & Nadal, J.-P. (1990). Neural trees: A new tool for classification. Network, 1, 423–438.
University of California, Irvine, Machine Learning Repository. http://www.ics.uci.edu/~mlearn/MLRepository.html.
Utgoff, P. E. (1989). Perceptron trees: A case study in hybrid concept representations. Connection Science, 1, 377–391.
Vapnik, V. (1982). Estimation of dependences based on empirical data. New York: Springer-Verlag.
Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer-Verlag.
Vapnik, V. & Chervonenkis, A. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Applications, 16, 264–280.
Bennett, K.P., Cristianini, N., Shawe-Taylor, J. et al. Enlarging the Margins in Perceptron Decision Trees. Machine Learning 41, 295–313 (2000). https://doi.org/10.1023/A:1007600130808