Abstract
This paper introduces a new family of deterministic and stochastic on-line prediction algorithms that perform well with respect to general loss functions, and analyzes their behavior in terms of expected loss bounds. The algorithms use parametric probabilistic models regardless of the kind of loss function used. Their key idea is to iteratively estimate the probabilistic model by the maximum likelihood method and then to construct the prediction function that minimizes the expected loss under the estimated model; a future outcome is then predicted with this optimal prediction function. We analyze the algorithms for the cases where the target distribution is 1) k-dimensional parametric with k known, 2) k-dimensional parametric with k unknown, and 3) non-parametric. For all three cases we derive upper bounds on the expected instantaneous or cumulative losses of the algorithms with respect to a large family of loss functions satisfying the constraint introduced by Merhav and Feder. These loss bounds exhibit new universal relations among the expected prediction accuracy, the indices of the loss function, the complexity of the target rule, and the number of training examples.
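As a concrete illustration of this scheme, here is a minimal Python sketch for the simplest instance: binary outcomes under a one-parameter Bernoulli model. It is our illustration, not the paper's construction (which covers general parametric models and loss functions): the function name online_ml_predictions, the Laplace smoothing that keeps the first rounds well defined, and the grid search over candidate predictions are all assumptions made for readability.

```python
import numpy as np

def online_ml_predictions(outcomes, loss, candidates=None):
    """At each round, fit the model by maximum likelihood on the past
    outcomes, then predict the value minimizing the expected loss under
    the estimated distribution (illustrative sketch, not the paper's code)."""
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 101)
    preds = []
    for t in range(len(outcomes)):
        past = outcomes[:t]
        # Maximum likelihood estimate of the Bernoulli parameter,
        # Laplace-smoothed so that the first rounds are well defined.
        p_hat = (sum(past) + 1.0) / (len(past) + 2.0)
        # Expected loss of each candidate prediction under the estimate.
        exp_losses = [p_hat * loss(1, c) + (1.0 - p_hat) * loss(0, c)
                      for c in candidates]
        preds.append(float(candidates[int(np.argmin(exp_losses))]))
    return preds

# Example with the squared loss, whose expected-loss minimizer is the
# estimated mean itself; other loss functions change the minimizer.
squared = lambda y, yhat: (y - yhat) ** 2
print(online_ml_predictions([1, 0, 1, 1, 0, 1], squared))
```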
References
Algoet, P.H.: The strong law of large numbers for sequential decisions under uncertainty. IEEE Trans. Inform. Theory IT-40 (1994) 609–633.
Amari, S., Murata, N.: Statistical theory of learning curves under entropic loss criterion. Neural Computation 5 (1993) 140–153.
Barron, A.R., Cover, T.M.: Minimum complexity density estimation. IEEE Trans. Inform. Theory IT-37 (1991) 1034–1054.
Berger, J.O.: Statistical Decision Theory and Bayesian Analysis. Springer-Verlag (1980).
Cencov, N.N.: Evaluation of an unknown distribution density from observations. Soviet Math. 3 (1962) 1559–1562.
Cesa-Bianchi, N., Freund, Y., Helmbold, D.P., Haussler, D., Schapire, R., Warmuth, M.K.: How to use expert advice. Proc. of The Twenty-fifth ACM Symposium on Theory of Computing, ACM Press (1993) 429–438.
Clarke, B., Barron, A.: Information-theoretic asymptotics of Bayes methods. IEEE Trans. Inform. Theory IT-36 (1990) 453–471.
Dawid, A.: Statistical theory: the prequential approach. J. R. Stat. Soc. A 147 (1984) 278–292.
DeSantis, A., Markowsky, G., Wegman, M.N.: Learning probabilistic prediction functions. Proc. of the First Annual Workshop on Computational Learning Theory, Morgan Kaufmann (1988) 312–328.
Fisher, R.A.: Statistical Methods and Scientific Inference. Oliver and Boyd (1956).
Haussler, D., Littlestone, N., Warmuth, M.K.: Predicting {0,1}-functions on randomly drawn points. Proc. of the First Annual Workshop on Computational Learning Theory, Morgan Kaufmann (1988) 312–328.
Haussler, D., Barron, A.: How well does the Bayes method work in on-line predictions of {+1,-1}-values? Proc. of the Third NEC Symposium, SIAM (1992) 74–100.
Haussler, D., Kearns, M., Schapire, R.: Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension. Proc. of the Fourth Annual Workshop on Computational Learning Theory, Morgan Kaufmann (1991) 61–74.
Herrndorf, N.: Best Φ- and N_Φ-approximants in Orlicz spaces of vector valued functions. Z. Wahrscheinlichkeitstheorie verw. Gebiete 58 (1981) 309–329.
Kearns, M., Schapire, R.: Efficient distribution-free learning of probabilistic concepts. J. of Computer and System Sciences 48 (1994) 464–497.
Kivinen, J., Warmuth, M.K.: Using experts for predicting continuous outcomes. Computational Learning Theory: EuroCOLT '93, Oxford (1994) 109–120.
Kullback, S.: A lower bound for discrimination in terms of variation. IEEE Trans. Inform. Theory IT-13 (1967) 126–127.
LeCam, L.: On some asymptotic properties of maximum likelihood estimates and related Bayes estimates. Univ. California Publ. Stat. 1 (1953) 277–330.
LeCam, L.: On the asymptotics used to prove asymptotic normality of maximum likelihood estimates. Ann. Math. Stat. 41 (1970) 802–828.
Littlestone, N.: Learning quickly when irrelevant attributes abound: a new linear threshold algorithm. Machine Learning 2 (1988) 285–318.
Merhav, N., Feder, M.: Universal sequential learning and decision from individual data sequences. Proc. of the Fifth Annual ACM Workshop on Computational Learning Theory, ACM Press (1992) 413–427.
Rissanen, J.: Universal coding, information, prediction, and estimation. IEEE Trans. Inform. Theory IT-30 (1984) 629–636.
Rissanen, J.: Stochastic complexity. J. R. Stat. Soc. B 49 (1987) 223–239.
Rissanen, J.: Stochastic Complexity in Statistical Inquiry. World Scientific, Series in Computer Science 15 (1989).
Takeuchi, K.: Asymptotic Theory of Statistical Estimation. (In Japanese) Kyooiku Publishers (1974).
Vovk, V.G.: Aggregating Strategies. Proc. of the Third Annual Workshop on Computational Learning Theory, Morgan Kaufmann (1990) 371–386.
Yamanishi, K.: A loss bound model for on-line stochastic prediction strategies. Proc. of the Fourth Annual Workshop on Computational Learning Theory, Morgan Kaufmann (1991) 290–302.
Yamanishi, K.: A learning criterion for stochastic rules. Machine Learning 9 (1992) 165–203 (special issue for COLT '90).
Yamanishi, K.: A loss bound model for on-line stochastic prediction algorithms. Inform. Comput. (to appear).
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
Cite this paper
Yamanishi, K. (1995). On-line maximum likelihood prediction with respect to general loss functions. In: Vitányi, P. (eds) Computational Learning Theory. EuroCOLT 1995. Lecture Notes in Computer Science, vol 904. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-59119-2_170
Print ISBN: 978-3-540-59119-1
Online ISBN: 978-3-540-49195-8