
Abstract

In this chapter, we introduce the notions of information, probability distributions, and coding. We show that probability distributions and coding are equivalent through the Kraft inequality. The most primitive quantification of information is Shannon's information, which is the optimal code-length when the probability distribution is known in advance. We introduce the notion of stochastic complexity (SC) as an extension of Shannon's information to the case where the probability distribution is unknown but a class of distributions is given. SC can be calculated as the normalized maximum likelihood (NML) code-length. We introduce various methods for efficiently computing the NML code-length. Finally, we introduce the minimum description length (MDL) principle as an SC minimization strategy and give a unifying view of machine learning problems in terms of the MDL principle.
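For orientation, the key quantities named in the abstract have standard textbook forms (see, e.g., [3, 6, 23]); the following summary uses common notation, which is not necessarily the notation of the chapter itself:

```latex
% Kraft inequality: the codeword lengths l(x) of any prefix code satisfy
\sum_{x} 2^{-l(x)} \le 1,
% and conversely any integer lengths satisfying it are achievable; this is
% what makes code-lengths and probability distributions interchangeable.

% Shannon information: the optimal code-length when p is known is
L(x) = -\log_2 p(x).

% Normalized maximum likelihood (NML): for a model class \{p(\cdot\,;\theta)\}
% with maximum likelihood estimate \hat{\theta}(x^n),
p_{\mathrm{NML}}(x^n) = \frac{p\bigl(x^n; \hat{\theta}(x^n)\bigr)}
                             {\sum_{y^n} p\bigl(y^n; \hat{\theta}(y^n)\bigr)},
% and the stochastic complexity is the NML code-length
-\log p_{\mathrm{NML}}(x^n)
  = -\log p\bigl(x^n; \hat{\theta}(x^n)\bigr)
    + \log \sum_{y^n} p\bigl(y^n; \hat{\theta}(y^n)\bigr).
```

As a concrete illustration, the normalizing sum is tractable for the Bernoulli class, where it reduces to a sum over the count of ones. The sketch below (function names are ours, not code from the chapter) computes the NML code-length of a binary sequence by direct enumeration; the efficient methods surveyed in the chapter, e.g. the linear-time multinomial algorithm of [18], avoid such enumeration for larger model classes.

```python
import math

def bernoulli_nml_normalizer(n):
    """Sum_{k=0}^{n} C(n,k) * (k/n)^k * ((n-k)/n)^(n-k):
    the NML normalizing term for binary sequences of length n
    (0^0 is taken as 1, which Python's ** operator already gives)."""
    return sum(
        math.comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
        for k in range(n + 1)
    )

def bernoulli_nml_code_length(x):
    """NML code-length (in bits) of a 0/1 sequence x under the Bernoulli class."""
    n, k = len(x), sum(x)
    p_hat = k / n                                    # maximum likelihood estimate
    max_lik = p_hat ** k * (1 - p_hat) ** (n - k)    # p(x; theta_hat(x))
    return -math.log2(max_lik) + math.log2(bernoulli_nml_normalizer(n))

# Example: the first term charges for the data under the best-fitting
# parameter; the second (log normalizer) is the parametric complexity,
# which for this one-parameter class grows like (1/2) log n.
print(bernoulli_nml_code_length([1, 0, 1, 1, 0, 1]))
```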


Notes

  1. The slides of the talk [34] are available at https://eda.mmci.uni-saarland.de/events/mdldm19/#program.

References

  1. A.R. Barron, J. Rissanen, B. Yu, The minimum description length principle in coding and modeling. IEEE Trans. Inform. Theor. 44(6), 2743–2760 (1998)

  2. G.J. Chaitin, Algorithmic information theory. IBM J. Res. Dev. 21(4), 350–359 (1977). https://doi.org/10.1147/rd.214.0350

  3. T.M. Cover, J.A. Thomas, Elements of Information Theory (Wiley, 2006)

  4. R. Dwivedi, C. Singh, B. Yu, M.J. Wainwright, Revisiting minimum description length complexity in overparameterized models. arXiv:2006.10189 (2020)

  5. A.P. Dawid, Present position and potential developments: some personal views. Statistical theory: the prequential approach. J. Royal Stat. Soc. Ser. A 147(2), 278–290 (1984)

  6. P.D. Grünwald, The Minimum Description Length Principle (MIT Press, 2007)

  7. P.D. Grünwald, T. Roos, Minimum description length revisited. Int. J. Math. Ind. 11(01), 1930001 (2019)

  8. T.S. Han, K. Kobayashi, Mathematics of Information and Coding (Baifukan, 1999) (in Japanese)

  9. T.S. Han, S. Verdú, Approximation theory of output statistics. IEEE Trans. Inform. Theor. 39(3), 752–772 (1993)

  10. M.H. Hansen, B. Yu, Model selection and the principle of minimum description length. J. Am. Stat. Assoc. 96(454), 746–774 (2001)

  11. S. Hirai, K. Yamanishi, Efficient computation of normalized maximum likelihood codes for Gaussian mixture models with its applications to clustering. IEEE Trans. Inform. Theor. 59(11), 7718–7727 (2013)

  12. S. Hirai, K. Yamanishi, Correction to "Efficient computation of normalized maximum likelihood codes for Gaussian mixture models with its applications to clustering". IEEE Trans. Inform. Theor. 65(10), 6827–6828 (2019)

  13. W. Hoeffding, Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)

  14. T. Kloek, H.K. van Dijk, Bayesian estimates of equation system parameters: an application of integration by Monte Carlo. Econometrica 46(1), 1–19 (1978)

  15. A.N. Kolmogorov, Three approaches to the quantitative definition of information. Prob. Inform. Transm. (USSR) 1, 4–7 (1965)

  16. A.N. Kolmogorov, Logical basis for information theory and probability theory. IEEE Trans. Inform. Theor. 14(5), 662–664 (1968)

  17. P. Kontkanen, P. Myllymäki, W. Buntine, J. Rissanen, H. Tirri, An MDL framework for data clustering, in Advances in Minimum Description Length: Theory and Applications (MIT Press, 2005), pp. 323–335

  18. P. Kontkanen, P. Myllymäki, A linear-time algorithm for computing the multinomial stochastic complexity. Inform. Process. Lett. 103(6), 227–233 (2007)

  19. L.G. Kraft, A device for quantizing, grouping and coding amplitude modulated pulses. Master's thesis, Department of Electrical Engineering, MIT, Cambridge, MA (1949)

  20. M. Li, P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd edn. (Springer, New York, 1997)

  21. J. Rissanen, Modeling by shortest data description. Automatica 14(5), 465–471 (1978)

  22. J. Rissanen, Universal coding, information, prediction, and estimation. IEEE Trans. Inform. Theor. 30(4), 629–636 (1984)

  23. J. Rissanen, Stochastic Complexity in Statistical Inquiry (World Scientific, 1989)

  24. J. Rissanen, Fisher information and stochastic complexity. IEEE Trans. Inform. Theor. 42(1), 40–47 (1996)

  25. J. Rissanen, MDL denoising. IEEE Trans. Inform. Theor. 46(7), 2537–2543 (2000)

  26. J. Rissanen, Information and Complexity in Statistical Modeling (Springer, 2007)

  27. J. Rissanen, Optimal Estimation of Parameters (Cambridge University Press, 2012)

  28. J. Rissanen, T. Roos, P. Myllymäki, Model selection by sequentially normalized least squares. J. Multivariate Anal. 101(4), 839–849 (2010)

  29. T. Roos, Monte Carlo estimation of minimax regret with an application to MDL model selection, in Proceedings of IEEE Information Theory Workshop (2008)

  30. C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)

  31. Y.M. Shtarkov, Universal sequential coding of single messages. Problemy Peredachi Informatsii 23(3), 3–17 (1987)

  32. A. Suzuki, K. Yamanishi, Exact calculation of normalized maximum likelihood code length using Fourier analysis, in Proceedings of IEEE International Symposium on Information Theory (ISIT 2018) (2018), pp. 1211–1215

  33. A. Suzuki, K. Yamanishi, Fourier-analysis-based form of normalized maximum likelihood: exact formula and relation to complex Bayesian prior. IEEE Trans. Inform. Theor. 67(9), 6164–6178 (2021)

  34. J. Vreeken, K. Yamanishi, Modern MDL meets data mining insights, theory, and practice, in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'19) (2019), pp. 3229–3230

  35. K. Yamanishi, A randomized approximation of the MDL for stochastic models with hidden variables, in Proceedings of the Ninth Annual Conference on Computational Learning Theory (COLT'96) (1996), pp. 99–109

  36. K. Yamanishi, Information-Theoretic Learning Theory (Kyoritsu Publisher, 2010) (in Japanese)

  37. K. Yamanishi, Information-Theoretic Learning and Data Mining (Asakura Publisher, 2014) (in Japanese)


Author information

Correspondence to Kenji Yamanishi.


Copyright information

© 2023 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Yamanishi, K. (2023). Information and Coding. In: Learning with the Minimum Description Length Principle. Springer, Singapore. https://doi.org/10.1007/978-981-99-1790-7_1


  • DOI: https://doi.org/10.1007/978-981-99-1790-7_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-1789-1

  • Online ISBN: 978-981-99-1790-7

  • eBook Packages: Computer Science, Computer Science (R0)
