Abstract
In this chapter, we introduce the notions of information, probability distributions, and coding. We show that sub-probability distributions and codes are equivalent through the Kraft inequality. The most primitive quantification of information is Shannon's information, which is the optimal code length when a probability distribution is known in advance. We introduce the notion of stochastic complexity (SC) as an extension of Shannon's information to the case where the probability distribution is unknown but a class of distributions is given. SC can be calculated as the normalized maximum likelihood (NML) code length. We introduce various methods for efficiently computing the NML code length. Finally, we introduce the minimum description length (MDL) principle as an SC minimization strategy, and give a unifying view of machine learning problems in terms of the MDL principle.
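As an illustrative sketch (not taken from the chapter itself), the following Python snippet contrasts the two code lengths the abstract mentions: Shannon's code length, which assumes a known distribution, and the NML code length for the Bernoulli model class, computed via the Shtarkov normalizing sum. The function names are hypothetical and chosen for clarity.

```python
import math

def shannon_code_length(p: float) -> float:
    """Ideal Shannon code length -log2(p), in bits, for an outcome of probability p."""
    return -math.log2(p)

def bernoulli_nml_code_length(k: int, n: int) -> float:
    """NML (stochastic complexity) code length, in bits, of a binary sequence
    with k ones out of n under the Bernoulli model class:
        L_NML = -log2 P(x^n; theta_hat) + log2 C_n,
    where C_n = sum_j C(n, j) (j/n)^j ((n-j)/n)^(n-j) is the Shtarkov normalizer."""
    def ml_prob(j: int) -> float:
        # Maximized likelihood (j/n)^j ((n-j)/n)^(n-j); Python gives 0**0 == 1.
        p = j / n
        return (p ** j) * ((1 - p) ** (n - j))
    normalizer = sum(math.comb(n, j) * ml_prob(j) for j in range(n + 1))
    return -math.log2(ml_prob(k)) + math.log2(normalizer)

# Example: 10 coin flips with 7 heads. The NML code length exceeds the plain
# maximized-likelihood code length by log2 of the normalizer -- the price paid
# for not knowing the parameter in advance.
print(bernoulli_nml_code_length(7, 10))
```

The normalizing sum here takes O(n) time for the Bernoulli case; the linear-time multinomial algorithm of Kontkanen and Myllymäki cited below generalizes this idea to larger alphabets.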
Notes
1. The slides of the talk [34] are available at https://eda.mmci.uni-saarland.de/events/mdldm19/#program.
References
A.R. Barron, J. Rissanen, B. Yu, The minimum description length principle in coding and modeling. IEEE Trans. Inform. Theor. 44(6), 2743–2760 (1998)
G.J. Chaitin, Algorithmic information theory. IBM J. Res. Dev. 21(4), 350–359 (1977). https://doi.org/10.1147/rd.214.0350
T.M. Cover, J.A. Thomas, Elements of Information Theory (Wiley, 2006)
R. Dwivedi, C. Singh, B. Yu, M.J. Wainwright, Revisiting minimum description length complexity in overparameterized models. arXiv:2006.10189 (2020)
A.P. Dawid, Present position and potential developments: some personal views: statistical theory: the prequential approach. J. Royal Stat. Soc. Ser. A 147(2), 278–290 (1984)
P.D. Grünwald, The Minimum Description Length Principle (MIT Press, 2007)
P.D. Grünwald, T. Roos, Minimum description length revisited. Int. J. Math. Ind. 11(01), 1930001 (2019)
T.S. Han, K. Kobayashi, Mathematics of Information and Coding (Baifukan, 1999). (in Japanese)
T.S. Han, S. Verdú, Approximation theory of output statistics. IEEE Trans. Inform. Theor. 39(3), 752–772 (1993)
M.H. Hansen, B. Yu, Model selection and the principle of minimum description length. J. Am. Stat. Assoc. 96(454), 746–774 (2001)
S. Hirai, K. Yamanishi, Efficient computation of normalized maximum likelihood codes for Gaussian mixture models with its applications to clustering. IEEE Trans. Inform. Theor. 59(11), 7718–7727 (2013)
S. Hirai, K. Yamanishi, Correction to efficient computation of normalized maximum likelihood codes for Gaussian mixture models with its applications to clustering. IEEE Trans. Inform. Theor. 65(10), 6827–6828 (2019)
W. Hoeffding, Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
T. Kloek, H.K. van Dijk, Bayesian estimates of equation system parameters: an application of integration by Monte Carlo. Econometrica 46(1), 1–19 (1978)
A.N. Kolmogorov, Three approaches to the quantitative definition of information. Prob. Inform. Transm. (USSR) 1, 4–7 (1965)
A.N. Kolmogorov, Logical basis for information theory and probability theory. IEEE Trans. Inform. Theor. 14(5), 662–664 (1968)
P. Kontkanen, P. Myllymäki, W. Buntine, J. Rissanen, H. Tirri, An MDL framework for data clustering, in Advances in Minimum Description Length: Theory and Applications (MIT Press, 2005), pp. 323–335
P. Kontkanen, P. Myllymäki, A linear-time algorithm for computing the multinomial stochastic complexity. Inform. Process. Lett. 103(6), 227–233 (2007)
L.G. Kraft, A device for quantizing, grouping and coding amplitude modulated pulses. Master's thesis, Department of Electrical Engineering, MIT, Cambridge, MA (1949)
M. Li, P. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd edn. (Springer, New York, 1997)
J. Rissanen, Modeling by shortest data description. Automatica 14(5), 465–471 (1978)
J. Rissanen, Universal coding, information, prediction, and estimation. IEEE Trans. Inform. Theor. 30(4), 629–636 (1984)
J. Rissanen, Stochastic Complexity in Statistical Inquiries (Wiley, 1989)
J. Rissanen, Fisher information and stochastic complexity. IEEE Trans. Inform. Theor. 42(1), 40–47 (1996)
J. Rissanen, MDL denoising. IEEE Trans. Inform. Theor. 46(7), 2537–2543 (2000)
J. Rissanen, Information and Complexity in Statistical Modeling (Springer, 2007)
J. Rissanen, Optimal Estimation of Parameters (Cambridge University Press, 2012)
J. Rissanen, T. Roos, P. Myllymaki, Model selection by sequentially normalized least squares. J. Multivariate Anal. 101(4), 839–849 (2010)
T. Roos, Monte Carlo estimation of minimax regret with an application to MDL model selection, in Proceedings of IEEE Information Theory Workshop (2008)
C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)
Y.M. Shtarkov, Universal sequential coding of single messages. Problemy Peredachi Informatsii 23(3), 3–17 (1987)
A. Suzuki, K. Yamanishi, Exact calculation of normalized maximum likelihood code length using Fourier analysis, in Proceedings of IEEE International Symposium on Information Theory (ISIT2018) (2018), pp. 1211–1215
A. Suzuki, K. Yamanishi, Fourier-analysis-based form of normalized maximum likelihood: exact formula and relation to complex Bayesian prior. IEEE Trans. Inform. Theor. 67(9), 6164–6178 (2021)
J. Vreeken, K. Yamanishi, Modern MDL meets data mining insights, theory, and practice, in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’19) (2019), pp. 3229–3230
K. Yamanishi, A randomized approximation of the MDL for stochastic models with hidden variables, in Proceedings of the Ninth Annual Conference on Computational Learning Theory (COLT’96) (1996), pp. 99–109
K. Yamanishi, Information-Theoretic Learning Theory (Kyoritsu-Publisher, 2010) (in Japanese)
K. Yamanishi, Information-Theoretic Learning and Data Mining (Asakura Publisher, 2014) (in Japanese)
Copyright information
© 2023 Springer Nature Singapore Pte Ltd.
Cite this chapter
Yamanishi, K. (2023). Information and Coding. In: Learning with the Minimum Description Length Principle. Springer, Singapore. https://doi.org/10.1007/978-981-99-1790-7_1
Print ISBN: 978-981-99-1789-1
Online ISBN: 978-981-99-1790-7