The Robustness of the p-Norm Algorithms

Gentile, Claudio

doi:10.1023/A:1026319107706


Published: December 2003

Machine Learning, Volume 53, pages 265–299 (2003)


Claudio Gentile¹


Abstract

We consider two on-line learning frameworks: binary classification through linear threshold functions and linear regression. We study a family of on-line algorithms, called p-norm algorithms, introduced by Grove, Littlestone and Schuurmans in the context of deterministic binary classification. We show how to adapt these algorithms for use in the regression setting, and prove worst-case bounds on the square loss, using a technique from Kivinen and Warmuth. As pointed out by Grove et al., these algorithms can be made to approach a version of the classification algorithm Winnow as p goes to infinity; similarly, they can be made to approach the corresponding regression algorithm EG in the limit. Winnow and EG are notable for having loss bounds that grow only logarithmically in the dimension of the instance space. Here we describe another way to use the p-norm algorithms to achieve this logarithmic behavior. With the approach we propose, it is less critical than with Winnow and EG to retune the parameters of the algorithm as the learning task changes. Since the correct setting of the parameters depends on characteristics of the learning task that are not typically known a priori by the learner, this gives the p-norm algorithms a desirable robustness. Our elaborations yield various new loss bounds in these on-line settings. Some of these bounds improve or generalize known results. Others are incomparable.
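
To make the classification setting above concrete, here is a minimal sketch of a mistake-driven p-norm update in Python. It is not taken from the article text on this page: it assumes the commonly presented quasi-additive form, in which an additively updated vector `theta` is mapped to weights through the gradient of ½‖θ‖ₚ², and it assumes labels in {-1, +1}. The function names `pnorm_link` and `pnorm_perceptron` are illustrative only. Under these assumptions p = 2 gives the ordinary Perceptron update, and a choice such as p = 2 ln n (n being the dimension) is the tuning usually associated with logarithmic dependence on the dimension.

```python
import math


def pnorm_link(theta, p):
    """Map theta to weights via the gradient of 0.5 * ||theta||_p^2 (assumed form, p >= 2).

    Component i is sign(theta_i) * |theta_i|^(p-1) / ||theta||_p^(p-2),
    computed in a rescaled way to avoid overflow for large p.
    """
    m = max(abs(t) for t in theta)
    if m == 0.0:
        return [0.0] * len(theta)
    # ||theta||_p, factoring out the largest magnitude for numerical stability.
    norm = m * sum((abs(t) / m) ** p for t in theta) ** (1.0 / p)
    return [math.copysign((abs(t) / norm) ** (p - 1.0), t) * norm for t in theta]


def pnorm_perceptron(examples, p, theta=None, eta=1.0):
    """One on-line pass over (x, y) pairs with y in {-1, +1}.

    Returns the updated theta and the number of mistakes made in this pass.
    """
    n = len(examples[0][0])
    theta = [0.0] * n if theta is None else list(theta)
    mistakes = 0
    for x, y in examples:
        w = pnorm_link(theta, p)
        margin = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if margin >= 0 else -1
        if y_hat != y:
            # Mistake-driven additive update on theta; the weights follow via the link.
            mistakes += 1
            theta = [t + eta * y * xi for t, xi in zip(theta, x)]
    return theta, mistakes
```

Because the link is positively homogeneous in `theta`, the learning rate `eta` does not change the predicted signs in this classification sketch; in a regression variant with square loss, the step size would matter.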


References

Angluin, D. (1988). Queries and concept learning. Machine Learning, 2:4, 319–342.
Auer, P., & Warmuth, M. K. (1998). Tracking the best disjunction. Machine Learning, 32:2, 127–150.
Auer, P., & Gentile, C. (2000). Adaptive and self-confident on-line learning algorithms. In Proc. 13th Annu. Conf. on Comput. Learning Theory (pp. 107–117). San Mateo, CA: Morgan Kaufmann.
Azoury, K., & Warmuth, M. K. (2001). Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43, 211–246.
Barzdin, J. M., & Freivald, R. V. (1972). On the prediction of general recursive functions. Soviet Math. Doklady, 13, 1224–1228.
Block, H. D. (1962). The perceptron: A model for brain functioning. Reviews of Modern Physics, 34, 123–135. Reprinted in Neurocomputing by Anderson and Rosenfeld.
Bregman, L. M. (1967). The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Physics, 7, 200–217.
Bylander, T. (1997). The binary exponentiated gradient algorithm for learning linear functions. In Proc. 8th Annu. Conf. on Comput. Learning Theory (pp. 184–192). ACM.
Cesa-Bianchi, N., Freund, Y., Helmbold, D. P., & Warmuth, M. K. (1996). On-line prediction and conversion strategies. Machine Learning, 25, 71–110.
Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D. P., Schapire, R. E., & Warmuth, M. K. (1997). How to use expert advice. Journal of the ACM, 44:3, 427–485.
Cesa-Bianchi, N., Helmbold, D. P., & Panizza, S. (1998). On Bayes methods for on-line Boolean prediction. Algorithmica, 22:1, 112–137.
Cesa-Bianchi, N., Long, P., & Warmuth, M. K. (1996). Worst-case quadratic loss bounds for on-line prediction of linear functions by gradient descent. IEEE Transactions on Neural Networks, 7, 604–619.
Censor, Y., & Lent, A. (1981). An iterative row-action method for interval convex programming. Journal of Optimization Theory and Applications, 34:3, 321–353.
Forster, J., & Warmuth, M. K. (2000). Relative expected instantaneous loss bounds. In Proc. 13th Annu. Conf. on Comput. Learning Theory (pp. 90–99). San Mateo, CA: Morgan Kaufmann.
Freund, Y., & Schapire, R. E. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37:3, 277–296.
Gentile, C., & Littlestone, N. (1999). The robustness of the p-norm algorithms. In Proc. 12th Annu. Conf. on Comput. Learning Theory (pp. 1–11). ACM.
Gentile, C., & Warmuth, M. K. (1999). Linear hinge loss and average margin. In Proc. Advances in Neural Information Processing Systems 11 (pp. 225–231). Cambridge, MA: MIT Press.
Gordon, G. J. (1999). Regret bounds for prediction problems. In Proc. 12th Annu. Conf. on Comput. Learning Theory (pp. 29–40). ACM.
Grove, A. J., Littlestone, N., & Schuurmans, D. (2001). General convergence results for linear discriminant updates. Machine Learning, 43:3, 173–210.
Helmbold, D. P., Kivinen, J., & Warmuth, M. K. (1999). Worst-case loss bounds for sigmoided linear neurons. IEEE Transactions on Neural Networks, 10:6, 1291–1304.
Helmbold, D. P., & Schapire, R. E. (1997). Predicting nearly as well as the best pruning of a decision tree. Machine Learning, 27, 51–68.
Helmbold, D. P., & Warmuth, M. K. (1995). On weak learning. Journal of Computer and System Sciences, 50:3, 551–573.
Herbster, M., & Warmuth, M. K. (1998a). Tracking the best expert. Machine Learning, 32:2, 151–178.
Herbster, M., & Warmuth, M. K. (1998b). Tracking the best regressor. In Proc. 11th Annu. Conf. on Comput. Learning Theory (pp. 24–31). ACM.
Jagota, A., & Warmuth, M. K. (1998). Continuous and discrete time nonlinear gradient descent: Relative loss bounds and convergence. In Electronic Proceedings of Fifth International Symposium on Artificial Intelligence and Mathematics. Electronic, http://rutcor.rutgers.edu/~amai.
Kivinen, J., & Warmuth, M. K. (1997). Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 132:1, 1–64.
Kivinen, J., & Warmuth, M. K. (2001). Relative loss bounds for multidimensional regression problems. Machine Learning, 45:3, 301–329.
Kivinen, J., & Warmuth, M. K. (1999). Averaging expert predictions. In Proc. 4th European Conference on Comput. Learning Theory (pp. 153–167). Lecture Notes in Computer Science, Vol. 1572. Springer.
Littlestone, N. (1988). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285–318.
Littlestone, N. (1989a). From on-line to batch learning. In Proc. 2nd Annu. Workshop on Comput. Learning Theory (pp. 269–284). San Mateo, CA: Morgan Kaufmann.
Littlestone, N. (1989b). Mistake Bounds and Logarithmic Linear-Threshold Learning Algorithms. PhD thesis, Technical Report UCSC-CRL-89-11, University of California Santa Cruz.
Littlestone, N. (1991). Redundant noisy attributes, attribute errors, and linear threshold learning using Winnow. In Proc. 4th Annu. Workshop on Comput. Learning Theory (pp. 147–156). San Mateo, CA: Morgan Kaufmann.
Littlestone, N., & Mesterharm, C. (1997). An apobayesian relative of Winnow. In Proc. Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press.
Littlestone, N., & Mesterharm, C. (1999). A simulation study of Winnow and related learning algorithms. Unpublished manuscript.
Littlestone, N., & Warmuth, M. K. (1994). The weighted majority algorithm. Information and Computation, 108:2, 212–261.
Maruoka, A., Takimoto, E., & Vovk, V. (1999). Predicting nearly as well as the best pruning of a decision tree through dynamic programming scheme. Theoretical Computer Science, to appear.
Novikov, A. B. J. (1962). On convergence proofs on perceptrons. In Proc. of the Symposium on the Mathematical Theory of Automata, vol. XII (pp. 615–622).
Rockafellar, R. (1970). Convex Analysis. Princeton University Press.
Rosenblatt, F. (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Washington, D.C.: Spartan Books.
Vovk, V. (1997). Competitive on-line linear regression. Technical Report CSD-TR-97-13, Department of Computer Science, Royal Holloway, University of London. Preliminary version in Proc. Advances in Neural Information Processing Systems 10 (pp. 364–370). Cambridge, MA: MIT Press.
Vovk, V. (1990). Aggregating strategies. In Proc. 3rd Annu. Workshop on Comput. Learning Theory (pp. 371–383). San Mateo, CA: Morgan Kaufmann.
Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. 1960 IRE WESCON Conv. Record, Part 4 (pp. 96–104).
Yamanishi, K. (1998). A decision-theoretic extension of stochastic complexity and its applications to learning. IEEE Transactions on Information Theory, 44:4, 1424–1439.


Author information

Authors and Affiliations

CRII, Università dell'Insubria, Via Ravasi, 2, 21100, Varese, Italy
Claudio Gentile


About this article

Cite this article

Gentile, C. The Robustness of the p-Norm Algorithms. Machine Learning 53, 265–299 (2003). https://doi.org/10.1023/A:1026319107706


Issue Date: December 2003
DOI: https://doi.org/10.1023/A:1026319107706
