Abstract
Bayes’ rule specifies how to obtain a posterior from a class of hypotheses endowed with a prior and the observed data. There are three principal ways to use this posterior for predicting the future: marginalization (integration over the hypotheses w.r.t. the posterior), MAP (taking the a posteriori most probable hypothesis), and stochastic model selection (selecting a hypothesis at random according to the posterior distribution). If the hypothesis class is countable and contains the data-generating distribution, strong consistency theorems are known for the former two methods, asserting almost sure convergence of the predictions to the truth as well as loss bounds. We prove the first corresponding results for stochastic model selection. As our main technical tool, we use the concept of a potential: this quantity, which is always positive, measures the total possible amount of future prediction errors. Precisely, in each time step the expected decrease of the potential upper bounds the expected error. We introduce the entropy potential of a hypothesis class as its worst-case entropy with respect to the true distribution. We formulate our results in the online classification framework, but they are equally applicable to the prediction of non-i.i.d. sequences.
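To make the three prediction modes concrete, the following minimal sketch (not taken from the paper; the Bernoulli hypothesis class, the parameter values, and all variable names are illustrative assumptions) compares marginalization, MAP, and stochastic model selection on a toy finite class, with the posterior updated by Bayes' rule after each observation.

import numpy as np

# Illustrative sketch only: hypothesis k asserts P(x_t = 1) = thetas[k].
rng = np.random.default_rng(0)

thetas = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # assumed finite hypothesis class
prior = np.full(len(thetas), 1.0 / len(thetas))
true_theta = 0.7                                # the data-generating distribution lies in the class

log_post = np.log(prior)                        # log-space posterior weights for numerical stability
for t in range(200):
    x = rng.random() < true_theta               # observe the next bit from the true distribution

    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # 1) Marginalization: integrate over the hypotheses w.r.t. the posterior.
    p_marg = post @ thetas
    # 2) MAP: predict with the a posteriori most probable hypothesis.
    p_map = thetas[np.argmax(post)]
    # 3) Stochastic model selection: sample one hypothesis from the posterior.
    p_sms = thetas[rng.choice(len(thetas), p=post)]

    # Bayesian update of the log-posterior with the observed bit.
    lik = thetas if x else (1.0 - thetas)
    log_post += np.log(lik)

print("final predictions:", p_marg, p_map, p_sms)

In this toy setting all three predictors converge to the true parameter; the paper's contribution is a consistency theorem and loss bounds for the third, stochastic, predictor in the general countable case.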
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Poland, J. (2006). The Missing Consistency Theorem for Bayesian Learning: Stochastic Model Selection. In: Balcázar, J.L., Long, P.M., Stephan, F. (eds) Algorithmic Learning Theory. ALT 2006. Lecture Notes in Computer Science(), vol 4264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11894841_22
DOI: https://doi.org/10.1007/11894841_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46649-9
Online ISBN: 978-3-540-46650-5