On-line maximum likelihood prediction with respect to general loss functions

Conference paper in Computational Learning Theory (EuroCOLT 1995)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 904)

Abstract

This paper introduces a new family of deterministic and stochastic on-line prediction algorithms that perform well with respect to general loss functions, and analyzes their behavior in terms of expected loss bounds. The algorithms use parametric probabilistic models regardless of the kind of loss function. Their key idea is to iteratively estimate the probabilistic model by the maximum likelihood method, and then to construct the optimal prediction function that minimizes the average loss taken with respect to the estimated model; the next outcome is predicted using this optimal prediction function. We analyze the algorithms for the cases where the target distribution is 1) k-dimensional parametric with k known, 2) k-dimensional parametric with k unknown, and 3) non-parametric. For all three cases, we derive upper bounds on the expected instantaneous or cumulative losses of the algorithms with respect to a large family of loss functions satisfying the constraint introduced by Merhav and Feder. These loss bounds exhibit new universal relations among the expected prediction accuracy, the indices of the loss function, the complexity of the target rule, and the number of training examples.
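
To make the scheme concrete, here is a minimal sketch (not code from the paper) of the estimate-then-minimize loop the abstract describes, instantiated for the simplest case: a one-parameter Bernoulli target model, a user-supplied loss function, and a finite grid of candidate predictions. The function name online_ml_predictions, the grid search over candidates, and the default parameter value before any data arrive are all illustrative assumptions.

```python
import numpy as np

def online_ml_predictions(outcomes, loss, candidates):
    """Sketch of on-line maximum likelihood prediction for a
    one-parameter Bernoulli target model.

    At each round t, the model parameter is re-estimated by maximum
    likelihood on the outcomes seen so far, and the prediction is the
    candidate minimizing the expected loss under the estimated model.
    """
    preds = []
    for t in range(len(outcomes)):
        # Bernoulli MLE: empirical frequency of 1s among past outcomes
        # (0.5 is an arbitrary default before any outcome is seen).
        p_hat = float(np.mean(outcomes[:t])) if t > 0 else 0.5
        # Expected loss of candidate d under the estimated model:
        #   E[loss(d, Y)] = p_hat * loss(d, 1) + (1 - p_hat) * loss(d, 0)
        exp_loss = [p_hat * loss(d, 1) + (1.0 - p_hat) * loss(d, 0)
                    for d in candidates]
        preds.append(candidates[int(np.argmin(exp_loss))])
    return preds

# Example: 100 Bernoulli(0.7) outcomes under squared loss.
rng = np.random.default_rng(0)
ys = list(rng.binomial(1, 0.7, size=100))
grid = np.linspace(0.0, 1.0, 101)
print(online_ml_predictions(ys, lambda d, y: (d - y) ** 2, grid)[-5:])
```

Under squared loss the expected-loss minimizer is the mean of the estimated distribution, so this sketch ends up tracking the running frequency; swapping in a different loss (say, absolute loss) changes the minimization step but not the estimation step, which is exactly the decoupling of model fitting from loss that the abstract describes.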

References

  1. Algoet, P.H.: The strong law of large numbers for sequential decisions under uncertainty. IEEE Trans. Inform. Theory IT-40 (1994) 609–633.

  2. Amari, S., Murata, N.: Statistical theory of learning curves under entropic loss criterion. Neural Computation 5 (1993) 140–153.

  3. Barron, A.R., Cover, T.M.: Minimum complexity density estimation. IEEE Trans. Inform. Theory IT-37 (1991) 1034–1054.

  4. Berger, J.O.: Statistical Decision Theory and Bayesian Analysis. Springer-Verlag (1980).

  5. Cencov, N.N.: Evaluation of an unknown distribution density from observations. Soviet Math. 3 (1962) 1559–1562.

  6. Cesa-Bianchi, N., Freund, Y., Helmbold, D.P., Haussler, D., Schapire, R., Warmuth, M.K.: How to use expert advice. Proc. of The Twenty-fifth ACM Symposium on Theory of Computing, ACM Press (1993) 429–438.

  7. Clarke, B., Barron, A.: Information-theoretic asymptotics of Bayes methods. IEEE Trans. Inform. Theory IT-36 (1990) 453–471.

  8. Dawid, A.: Statistical theory: the prequential approach. J. R. Stat. Soc. A (1984) 278–292.

  9. DeSantis, A., Markowsky, G., Wegman, M.N.: Learning probabilistic prediction functions. Proc. of the First Annual Workshop on Computational Learning Theory, Morgan Kaufmann (1988) 312–328.

  10. Fisher, R.A.: Statistical Methods and Scientific Inference. Oliver and Boyd (1951).

  11. Haussler, D., Littlestone, N., Warmuth, M.K.: Predicting {0,1}-functions on randomly drawn points. Proc. of the First Annual Workshop on Computational Learning Theory, Morgan Kaufmann (1988) 312–328.

  12. Haussler, D., Barron, A.: How well does the Bayes method work in on-line predictions of {+1,-1}-values? Proc. of the Third NEC Symposium, SIAM (1992) 74–100.

  13. Haussler, D., Kearns, M., Schapire, R.: Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension. Proc. of the Fourth Annual Workshop on Computational Learning Theory, Morgan Kaufmann (1991) 61–74.

  14. Herrndorf, N.: Best Φ- and N_Φ-approximants in Orlicz spaces of vector valued functions. Z. Wahrscheinlichkeitstheorie verw. Gebiete 58 (1981) 309–329.

  15. Kearns, M., Schapire, R.: Efficient distribution-free learning of probabilistic concepts. J. of Computer and System Sciences 48 (1994) 464–497.

  16. Kivinen, J., Warmuth, M.K.: Using experts for predicting continuous outcomes. Computational Learning Theory: EuroCOLT'93, Oxford (1994) 109–120.

  17. Kullback, S.: A lower bound for discrimination in terms of variation. IEEE Trans. Inform. Theory IT-13 (1967) 126–127.

  18. LeCam, L.: On some asymptotic properties of maximum likelihood estimates and related Bayes estimates. Univ. California Publ. Stat. 1 (1953) 277–330.

  19. LeCam, L.: On the asymptotics used to prove asymptotic normality of maximum likelihood estimates. Ann. Math. Stat. 41 (1970) 802–828.

  20. Littlestone, N.: Learning quickly when irrelevant attributes abound: a new linear threshold algorithm. Machine Learning 2 (1988) 285–318.

  21. Merhav, N., Feder, M.: Universal sequential learning and decision from individual data sequences. Proc. of the Fifth Annual ACM Workshop on Computational Learning Theory, ACM Press (1992) 413–427.

  22. Rissanen, J.: Universal coding, information, prediction, and estimation. IEEE Trans. Inform. Theory IT-30 (1984) 629–636.

  23. Rissanen, J.: Stochastic complexity. J. R. Stat. Soc. B 49 (1987) 223–239.

  24. Rissanen, J.: Stochastic Complexity in Statistical Inquiry. World Scientific, Series in Computer Science 15 (1989).

  25. Takeuchi, K.: Asymptotic Theory of Statistical Estimation. (In Japanese) Kyooiku Publishers (1974).

  26. Vovk, V.G.: Aggregating Strategies. Proc. of the Third Annual Workshop on Computational Learning Theory, Morgan Kaufmann (1990) 371–386.

  27. Yamanishi, K.: A loss bound model for on-line stochastic prediction strategies. Proc. of the Fourth Annual Workshop on Computational Learning Theory, Morgan Kaufmann (1991) 290–302.

  28. Yamanishi, K.: A learning criterion for stochastic rules. Machine Learning 9 (1992) 165–203 (Special Issue for COLT '90).

  29. Yamanishi, K.: A loss bound model for on-line stochastic prediction algorithms. Inform. Comput. (1993, to appear).

Editor information

Paul Vitányi

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yamanishi, K. (1995). On-line maximum likelihood prediction with respect to general loss functions. In: Vitányi, P. (ed.) Computational Learning Theory. EuroCOLT 1995. Lecture Notes in Computer Science, vol 904. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-59119-2_170

  • DOI: https://doi.org/10.1007/3-540-59119-2_170

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-59119-1

  • Online ISBN: 978-3-540-49195-8
