
Abstract

In this chapter, we introduce the notions of information, probability distributions, and coding. We show that probability distributions and coding are equivalent through the Kraft inequality. The most primitive quantification of information is Shannon's information, which is the optimal code-length when the probability distribution is known in advance. We introduce the notion of stochastic complexity (SC) as an extension of Shannon's information to the case where the probability distribution is unknown but a class of distributions is given. SC can be calculated as the normalized maximum likelihood (NML) code-length. We introduce various methods for efficiently computing the NML code-length. Finally, we introduce the minimum description length (MDL) principle as an SC minimization strategy and give a unifying view of machine learning problems in terms of the MDL principle.
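For orientation, the key quantities named in the abstract have standard textbook forms (see, e.g., [3, 6, 23]); the following summary uses common notation, which is not necessarily the notation of the chapter itself:

```latex
% Kraft inequality: the codeword lengths l(x) of any prefix code satisfy
\sum_{x} 2^{-l(x)} \le 1,
% and conversely any integer lengths satisfying it are achievable; this is
% what makes code-lengths and probability distributions interchangeable.

% Shannon information: the optimal code-length when p is known is
L(x) = -\log_2 p(x).

% Normalized maximum likelihood (NML): for a model class \{p(\cdot\,;\theta)\}
% with maximum likelihood estimate \hat{\theta}(x^n),
p_{\mathrm{NML}}(x^n) = \frac{p\bigl(x^n; \hat{\theta}(x^n)\bigr)}
                             {\sum_{y^n} p\bigl(y^n; \hat{\theta}(y^n)\bigr)},
% and the stochastic complexity is the NML code-length
-\log p_{\mathrm{NML}}(x^n)
  = -\log p\bigl(x^n; \hat{\theta}(x^n)\bigr)
    + \log \sum_{y^n} p\bigl(y^n; \hat{\theta}(y^n)\bigr).
```

As a concrete illustration, the normalizing sum is tractable for the Bernoulli class, where it reduces to a sum over the count of ones. The sketch below (function names are ours, not code from the chapter) computes the NML code-length of a binary sequence by direct enumeration; the efficient methods surveyed in the chapter, e.g. the linear-time multinomial algorithm of [18], avoid such enumeration for larger model classes.

```python
import math

def bernoulli_nml_normalizer(n):
    """Sum_{k=0}^{n} C(n,k) * (k/n)^k * ((n-k)/n)^(n-k):
    the NML normalizing term for binary sequences of length n
    (0^0 is taken as 1, which Python's ** operator already gives)."""
    return sum(
        math.comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
        for k in range(n + 1)
    )

def bernoulli_nml_code_length(x):
    """NML code-length (in bits) of a 0/1 sequence x under the Bernoulli class."""
    n, k = len(x), sum(x)
    p_hat = k / n                                    # maximum likelihood estimate
    max_lik = p_hat ** k * (1 - p_hat) ** (n - k)    # p(x; theta_hat(x))
    return -math.log2(max_lik) + math.log2(bernoulli_nml_normalizer(n))

# Example: the first term charges for the data under the best-fitting
# parameter; the second (log normalizer) is the parametric complexity,
# which for this one-parameter class grows like (1/2) log n.
print(bernoulli_nml_code_length([1, 0, 1, 1, 0, 1]))
```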


Notes

  1. The slides of the talk [34] are available at https://eda.mmci.uni-saarland.de/events/mdldm19/#program.

References

  1. A.R. Barron, J. Rissanen, B. Yu, The minimum description length principle in coding and modeling. IEEE Trans. Inform. Theor. 44(6), 2743–2760 (1998)

  2. G.J. Chaitin, Algorithmic information theory. IBM J. Res. Dev. 21(4), 350–359 (1977). https://doi.org/10.1147/rd.214.0350

  3. T.M. Cover, J.A. Thomas, Elements of Information Theory (Wiley, 2006)

  4. R. Dwivedi, C. Singh, B. Yu, M.J. Wainwright, Revisiting minimum description length complexity in overparameterized models. arXiv:2006.10189 (2020)

  5. A.P. Dawid, Present position and potential developments: some personal views. Statistical theory: the prequential approach. J. Royal Stat. Soc. Ser. A 147(2), 278–290 (1984)

  6. P.D. Grünwald, The Minimum Description Length Principle (MIT Press, 2007)

  7. P.D. Grünwald, T. Roos, Minimum description length revisited. Int. J. Math. Ind. 11(01), 1930001 (2019)

  8. T.S. Han, K. Kobayashi, Mathematics of Information and Coding (Baifukan, 1999) (in Japanese)

  9. T.S. Han, S. Verdú, Approximation theory of output statistics. IEEE Trans. Inform. Theor. 39(3), 752–772 (1993)

  10. M.H. Hansen, B. Yu, Model selection and the principle of minimum description length. J. Am. Stat. Assoc. 96(454), 746–774 (2001)

  11. S. Hirai, K. Yamanishi, Efficient computation of normalized maximum likelihood codes for Gaussian mixture models with its applications to clustering. IEEE Trans. Inform. Theor. 59(11), 7718–7727 (2013)

  12. S. Hirai, K. Yamanishi, Correction to "Efficient computation of normalized maximum likelihood codes for Gaussian mixture models with its applications to clustering". IEEE Trans. Inform. Theor. 65(10), 6827–6828 (2019)

  13. W. Hoeffding, Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)

  14. T. Kloek, H.K. van Dijk, Bayesian estimates of equation system parameters: an application of integration by Monte Carlo. Econometrica 46(1), 1–19 (1978)

  15. A.N. Kolmogorov, Three approaches to the quantitative definition of information. Prob. Inform. Transm. (USSR) 1, 4–7 (1965)

  16. A.N. Kolmogorov, Logical basis for information theory and probability theory. IEEE Trans. Inform. Theor. 14(5), 662–664 (1968)

  17. P. Kontkanen, P. Myllymäki, W. Buntine, J. Rissanen, H. Tirri, An MDL framework for data clustering, in Advances in Minimum Description Length: Theory and Applications (MIT Press, 2005), pp. 323–335

  18. P. Kontkanen, P. Myllymäki, A linear-time algorithm for computing the multinomial stochastic complexity. Inform. Process. Lett. 103(6), 227–233 (2007)

  19. L.G. Kraft, A device for quantizing, grouping and coding amplitude modulated pulses. Master's thesis, Department of Electrical Engineering, MIT, Cambridge, MA (1949)

  20. M. Li, P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd edn. (Springer, New York, 1997)

  21. J. Rissanen, Modeling by shortest data description. Automatica 14(5), 465–471 (1978)

  22. J. Rissanen, Universal coding, information, prediction, and estimation. IEEE Trans. Inform. Theor. 30(4), 629–636 (1984)

  23. J. Rissanen, Stochastic Complexity in Statistical Inquiry (World Scientific, 1989)

  24. J. Rissanen, Fisher information and stochastic complexity. IEEE Trans. Inform. Theor. 42(1), 40–47 (1996)

  25. J. Rissanen, MDL denoising. IEEE Trans. Inform. Theor. 46(7), 2537–2543 (2000)

  26. J. Rissanen, Information and Complexity in Statistical Modeling (Springer, 2007)

  27. J. Rissanen, Optimal Estimation of Parameters (Cambridge University Press, 2012)

  28. J. Rissanen, T. Roos, P. Myllymäki, Model selection by sequentially normalized least squares. J. Multivariate Anal. 101(4), 839–849 (2010)

  29. T. Roos, Monte Carlo estimation of minimax regret with an application to MDL model selection, in Proceedings of IEEE Information Theory Workshop (2008)

  30. C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)

  31. Y.M. Shtarkov, Universal sequential coding of single messages. Problemy Peredachi Informatsii 23(3), 3–17 (1987)

  32. A. Suzuki, K. Yamanishi, Exact calculation of normalized maximum likelihood code length using Fourier analysis, in Proceedings of IEEE International Symposium on Information Theory (ISIT 2018) (2018), pp. 1211–1215

  33. A. Suzuki, K. Yamanishi, Fourier-analysis-based form of normalized maximum likelihood: exact formula and relation to complex Bayesian prior. IEEE Trans. Inform. Theor. 67(9), 6164–6178 (2021)

  34. J. Vreeken, K. Yamanishi, Modern MDL meets data mining insights, theory, and practice, in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'19) (2019), pp. 3229–3230

  35. K. Yamanishi, A randomized approximation of the MDL for stochastic models with hidden variables, in Proceedings of the Ninth Annual Conference on Computational Learning Theory (COLT'96) (1996), pp. 99–109

  36. K. Yamanishi, Information-Theoretic Learning Theory (Kyoritsu Publisher, 2010) (in Japanese)

  37. K. Yamanishi, Information-Theoretic Learning and Data Mining (Asakura Publisher, 2014) (in Japanese)


Author information

Correspondence to Kenji Yamanishi.


Copyright information

© 2023 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Yamanishi, K. (2023). Information and Coding. In: Learning with the Minimum Description Length Principle. Springer, Singapore. https://doi.org/10.1007/978-981-99-1790-7_1


  • DOI: https://doi.org/10.1007/978-981-99-1790-7_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-1789-1

  • Online ISBN: 978-981-99-1790-7

  • eBook Packages: Computer Science, Computer Science (R0)
