Mistake bounds and logarithmic linear-threshold learning algorithms
Publisher:
  • University of California at Santa Cruz
  • Computer and Information Sciences Dept. 265 Applied Sciences Building Santa Cruz, CA
  • United States
UMI Order Number: GAX89-26506
Abstract

We consider a problem of learning from examples. In each of a sequence of trials the learner observes an instance to be classified and must respond with a prediction of its correct classification. Following the prediction the learner is told the correct response. We consider the two-category case, and work for the most part with instances composed of a number of Boolean attributes or, more generally, chosen from $\{0,1\}^n$. Initially, we assume that the correct response in each trial is a function of the corresponding instance, this target function being chosen from some target class of $\{0,1\}$-valued functions. We relax this assumption somewhat later.

We focus on evaluation of on-line predictive performance, counting the number of mistakes made by the learner during the learning process. For certain target classes we have found algorithms for which we can prove excellent mistake bounds, using no probabilistic assumptions. In the first part of the dissertation we study the properties of such worst-case mistake bounds.
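The on-line evaluation protocol just described can be sketched as a harness that runs an arbitrary learner over a trial sequence and counts its mistakes. The `predict`/`update` interface and the toy memorizing learner below are illustrative assumptions, not the dissertation's notation:

```python
# A sketch of the on-line prediction protocol: in each trial the learner
# first commits to a prediction, then is told the correct response.
# Performance is the total number of mistakes, with no probabilistic
# assumptions about how the trial sequence is generated.

class Memorizer:
    """Toy learner: predicts 0 for unseen instances, then memorizes."""
    def __init__(self):
        self.seen = {}
    def predict(self, x):
        return self.seen.get(x, 0)
    def update(self, x, y):
        self.seen[x] = y

def count_mistakes(learner, trials):
    """trials: sequence of (instance, correct_label) pairs."""
    mistakes = 0
    for x, y in trials:
        if learner.predict(x) != y:  # prediction made before the label is revealed
            mistakes += 1
        learner.update(x, y)         # learner then sees the correct response
    return mistakes
```

For example, `count_mistakes(Memorizer(), [((1, 0), 1), ((1, 0), 1), ((0, 1), 0)])` returns 1: only the first trial, before the instance has been seen, is a mistake.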

In the central part of this dissertation, we present a group of linear-threshold algorithms that are particularly well suited to circumstances in which most of the attributes of the instances are irrelevant to determining the correct predictions. For target classes that can be handled by these algorithms, we show that the worst-case mistake bounds grow only logarithmically with the number of irrelevant attributes. We demonstrate that these algorithms have some measure of robustness in the face of anomalies in the training data caused, for example, by noise.
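This family includes the Winnow algorithm. A minimal sketch of a multiplicative-update linear-threshold learner in that style follows; the threshold n/2 and multiplier α = 2 are common illustrative choices, not necessarily the dissertation's exact parameter settings:

```python
# A Winnow-style multiplicative-update linear-threshold learner.
# Promoting and demoting weights multiplicatively (rather than additively,
# as the perceptron does) is what yields mistake bounds that grow only
# logarithmically with the number of irrelevant attributes.

def winnow(trials, n, alpha=2.0):
    """Run on (x, y) pairs with x in {0,1}^n, y in {0,1}.
    Returns the total number of mistakes made."""
    w = [1.0] * n            # one positive weight per attribute
    theta = n / 2.0          # fixed linear threshold
    mistakes = 0
    for x, y in trials:
        pred = 1 if sum(w[i] for i in range(n) if x[i]) >= theta else 0
        if pred != y:
            mistakes += 1
            for i in range(n):
                if x[i]:
                    # promote on a false negative, demote on a false positive
                    w[i] = w[i] * alpha if y == 1 else w[i] / alpha
    return mistakes
```

Only the weights of attributes active in a mistaken trial move, so weights of irrelevant attributes can never grow large, and the mistake count scales with the logarithm of the total number of attributes rather than linearly.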

We also consider the implications of on-line, worst-case mistake bounds for learning in a batch setting with probabilistic assumptions, making use of the probabilistic PAC-learning model introduced by Valiant (Val84). We present an analysis that shows that a straightforward transformation applied to mistake bounded algorithms, consisting of adding a hypothesis testing phase, produces algorithms that have asymptotically optimal PAC-learning bounds for certain target classes.
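That transformation can be sketched as follows: each successive hypothesis of the mistake-bounded learner is tested on fresh random examples, a failed test is itself a mistake (so at most mistake_bound + 1 hypotheses are ever tested), and a hypothesis that survives a long enough error-free streak is accepted. The `predict`/`update` interface and the exact streak length used here are illustrative assumptions, not the dissertation's notation:

```python
import math

def mb_to_pac(learner, sample, epsilon, delta, mistake_bound):
    """Convert a mistake-bounded on-line learner into a PAC learner.
    sample() draws one i.i.d. labelled example (x, y)."""
    # Streak long enough that a hypothesis with error > epsilon survives
    # it with probability < delta / (mistake_bound + 1).
    m = math.ceil((1.0 / epsilon) * math.log((mistake_bound + 1) / delta))
    for _ in range(mistake_bound + 1):
        streak = 0
        while streak < m:
            x, y = sample()
            if learner.predict(x) == y:
                streak += 1
            else:
                learner.update(x, y)   # a genuine mistake: hypothesis changes
                break
        else:
            return learner.predict     # survived m examples: accept it
    return learner.predict             # unreachable if the mistake bound holds
```

By a union bound over the at most mistake_bound + 1 hypotheses tested, the returned hypothesis has error at most epsilon with probability at least 1 - delta.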

Cited By

  1. Saul L An online passive-aggressive algorithm for difference-of-squares classification Proceedings of the 35th International Conference on Neural Information Processing Systems, (21426-21439)
  2. Sabato S, Shalev-Shwartz S, Srebro N, Hsu D and Zhang T (2015). Learning sparse low-threshold linear classifiers, The Journal of Machine Learning Research, 16:1, (1275-1304), Online publication date: 1-Jan-2015.
  3. Mesterharm C and Pazzani M Active learning using on-line algorithms Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, (850-858)
  4. Cavallanti G, Cesa-Bianchi N and Gentile C (2010). Linear Algorithms for Online Multitask Classification, The Journal of Machine Learning Research, 11, (2901-2934), Online publication date: 1-Mar-2010.
  5. Yang L, Jin R and Ye J Online learning by ellipsoid method Proceedings of the 26th Annual International Conference on Machine Learning, (1153-1160)
  6. Karampatziakis N and Kozen D Learning prediction suffix trees with Winnow Proceedings of the 26th Annual International Conference on Machine Learning, (489-496)
  7. Koriche F (2008). Learning to assign degrees of belief in relational domains, Machine Learning, 73:1, (25-53), Online publication date: 1-Oct-2008.
  8. Shalev-Shwartz S and Singer Y (2007). A primal-dual perspective of online learning algorithms, Machine Learning, 69:2-3, (115-142), Online publication date: 1-Dec-2007.
  9. Cesa-Bianchi N, Gentile C and Zaniboni L (2006). Worst-Case Analysis of Selective Sampling for Linear Classification, The Journal of Machine Learning Research, 7, (1205-1230), Online publication date: 1-Dec-2006.
  10. Klivans A and Servedio R (2006). Toward Attribute Efficient Learning of Decision Lists and Parities, The Journal of Machine Learning Research, 7, (587-602), Online publication date: 1-Dec-2006.
  11. Crammer K, Dekel O, Keshet J, Shalev-Shwartz S and Singer Y (2006). Online Passive-Aggressive Algorithms, The Journal of Machine Learning Research, 7, (551-585), Online publication date: 1-Dec-2006.
  12. Crammer K and Globerson A Discriminative learning via semidefinite probabilistic models Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, (98-105)
  13. Gentile C (2003). The Robustness of the p-Norm Algorithms, Machine Learning, 53:3, (265-299), Online publication date: 1-Dec-2003.
  14. Mesterharm C Using linear-threshold algorithms to combine multi-class sub-experts Proceedings of the Twentieth International Conference on International Conference on Machine Learning, (544-551)
  15. Cesa-Bianchi N and Lugosi G (2003). Potential-Based Algorithms in On-Line Prediction and Game Theory, Machine Learning, 51:3, (239-261), Online publication date: 1-Jun-2003.
  16. Kivinen J Online learning of linear classifiers Advanced lectures on machine learning, (235-257)
  17. Long P Minimum majority classification and boosting Eighteenth national conference on Artificial intelligence, (181-186)
  18. Helmbold D, Panizza S and Warmuth M (2002). Direct and indirect algorithms for on-line learning of disjunctions, Theoretical Computer Science, 284:1, (109-142), Online publication date: 6-Jul-2002.
  19. Servedio R (2002). PAC Analogues of Perceptron and Winnow Via Boosting the Margin, Machine Learning, 47:2-3, (133-151), Online publication date: 1-May-2002.
  20. Li Y and Long P (2002). The Relaxed Online Maximum Margin Algorithm, Machine Learning, 46:1-3, (361-387), Online publication date: 11-Mar-2002.
  21. Kivinen J and Warmuth M (2001). Relative Loss Bounds for Multidimensional Regression Problems, Machine Learning, 45:3, (301-329), Online publication date: 4-Dec-2001.
  22. Case J, Jain S, Kaufmann S, Sharma A and Stephan F (2001). Predictive learning models for concept drift, Theoretical Computer Science, 268:2, (323-349), Online publication date: 17-Oct-2001.
  23. Grove A, Littlestone N and Schuurmans D (2001). General Convergence Results for Linear Discriminant Updates, Machine Learning, 43:3, (173-210), Online publication date: 1-Jun-2001.
  24. Auer P and Long P (1999). Structural Results About On-line Learning Models With and Without Queries, Machine Learning, 36:3, (147-181), Online publication date: 1-Sep-1999.
  25. Servedio R On PAC learning using Winnow, Perceptron, and a Perceptron-like algorithm Proceedings of the twelfth annual conference on Computational learning theory, (296-307)
  26. Gentile C and Littlestone N The robustness of the p-norm algorithms Proceedings of the twelfth annual conference on Computational learning theory, (1-11)
  27. Servedio R Computational sample complexity and attribute-efficient learning Proceedings of the thirty-first annual ACM symposium on Theory of Computing, (701-710)
  28. Yu K and Lam W A new on-line learning algorithm for adaptive text filtering Proceedings of the seventh international conference on Information and knowledge management, (156-160)
  29. Ben-David S and Eiron N (1998). Self-Directed Learning and Its Relation to the VC-Dimension and to Teacher-Directed Learning, Machine Learning, 33:1, (87-104), Online publication date: 1-Oct-1998.
  30. Herbster M and Warmuth M (1998). Tracking the Best Expert, Machine Learning, 32:2, (151-178), Online publication date: 1-Aug-1998.
  31. Herbster M and Warmuth M Tracking the best regressor Proceedings of the eleventh annual conference on Computational learning theory, (24-31)
  32. Bshouty N A new composition theorem for learning algorithms Proceedings of the thirtieth annual ACM symposium on Theory of computing, (583-589)
  33. Ben-David S, Kushilevitz E and Mansour Y (1997). Online Learning versus Offline Learning, Machine Learning, 29:1, (45-63), Online publication date: 1-Oct-1997.
  34. Bylander T The binary exponentiated gradient algorithm for learning linear functions Proceedings of the tenth annual conference on Computational learning theory, (184-192)
  35. Grove A, Littlestone N and Schuurmans D General convergence results for linear discriminant updates Proceedings of the tenth annual conference on Computational learning theory, (171-183)
  36. Long P On-line evaluation and prediction using linear functions Proceedings of the tenth annual conference on Computational learning theory, (21-31)
  37. Ben-David S, Bshouty N and Kushilevitz E A composition theorem for learning algorithms with applications to geometric concept classes Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, (324-333)
  38. Blum A (1997). Empirical Support for Winnow and Weighted-Majority Algorithms, Machine Learning, 26:1, (5-23), Online publication date: 1-Jan-1997.
  39. Helmbold D, Long D and Sherrod B A dynamic disk spin-down technique for mobile computing Proceedings of the 2nd annual international conference on Mobile computing and networking, (130-142)
  40. Cesa-Bianchi N, Helmbold D and Panizza S On Bayes methods for on-line Boolean prediction Proceedings of the ninth annual conference on Computational learning theory, (314-324)
  41. Kivinen J and Warmuth M The perceptron algorithm vs. Winnow Proceedings of the eighth annual conference on Computational learning theory, (289-296)
  42. Ben-David S, Eiron N and Kushilevitz E On self-directed learning Proceedings of the eighth annual conference on Computational learning theory, (136-143)
  43. Khardon R and Roth D Learning to reason Proceedings of the Twelfth AAAI National Conference on Artificial Intelligence, (682-687)
  44. Bylander T Learning linear threshold functions in the presence of classification noise Proceedings of the seventh annual conference on Computational learning theory, (340-347)
  45. Auer P and Long P Simulating access to hidden information while learning Proceedings of the twenty-sixth annual ACM symposium on Theory of Computing, (263-272)
  46. Goldman S and Warmuth M Learning binary relations using weighted majority voting Proceedings of the sixth annual conference on Computational learning theory, (453-462)
  47. Littlestone N and Long P On-line learning with linear loss constraints Proceedings of the sixth annual conference on Computational learning theory, (412-421)
  48. Kushilevitz E and Roth D On learning visual concepts and DNF formulae Proceedings of the sixth annual conference on Computational learning theory, (317-326)
  49. Bylander T Polynomial learnability of linear threshold approximations Proceedings of the sixth annual conference on Computational learning theory, (297-302)
  50. Auer P On-line learning of rectangles in noisy environments Proceedings of the sixth annual conference on Computational learning theory, (253-261)
  51. Kimber D and Long P The learning complexity of smooth functions of a single variable Proceedings of the fifth annual workshop on Computational learning theory, (153-159)
  52. Littlestone N, Long P and Warmuth M On-line learning of linear functions Proceedings of the twenty-third annual ACM symposium on Theory of Computing, (465-475)
  53. Humphrey S and Krovetz B (1990). Selected AI-Related Dissertations, ACM SIGART Bulletin, 1:2, (39-48), Online publication date: 1-Jun-1990.
  54. Blum A Learning boolean functions in an infinite attribute space Proceedings of the twenty-second annual ACM symposium on Theory of Computing, (64-72)
Contributors
  • University of California, Santa Cruz
