Entropy, Information Theory, Information Geometry and Bayesian Inference in Data, Signal and Image Processing and Inverse Problems †
Abstract
1. Introduction
2. Bayes Rule
3. Quantity of Information and Entropy
3.1. Shannon Entropy
3.2. Thermodynamical Entropy
3.3. Statistical Mechanics Entropy
3.4. Boltzmann Entropy
3.5. Gibbs Entropy
4. Relative Entropy or Kullback–Leibler Divergence
- Relative Entropy of p1 with respect to p2:
- Kullback–Leibler divergence of p1 with respect to p2: KL [p1 : p2] = ∫ p1(x) ln(p1(x)/p2(x)) dx.
- KL [q : p] ≥ 0,
- KL [q : p] = 0, if q = p and
- KL [q : p0] ≥ KL [q : p1] + KL [p1 : p0].
- KL [q : p] is invariant with respect to a scale change, but is not symmetric.
- A symmetric quantity can be defined as: J [q : p] = KL [q : p] + KL [p : q] (see the numerical sketch below).
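As a minimal numerical illustration of these properties (not part of the original paper), the following Python/NumPy sketch computes KL [q : p] for two discrete probability vectors and checks its non-negativity and asymmetry, together with the symmetrized quantity; the function names kl_divergence and jeffreys_divergence are illustrative.

```python
import numpy as np

def kl_divergence(p1, p2):
    """KL[p1 : p2] = sum_i p1_i * ln(p1_i / p2_i) for discrete distributions."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    mask = p1 > 0                      # 0 * ln 0 is taken as 0
    return float(np.sum(p1[mask] * np.log(p1[mask] / p2[mask])))

def jeffreys_divergence(p1, p2):
    """Symmetrized KL divergence: KL[p1 : p2] + KL[p2 : p1]."""
    return kl_divergence(p1, p2) + kl_divergence(p2, p1)

if __name__ == "__main__":
    q = np.array([0.5, 0.3, 0.2])
    p = np.array([0.4, 0.4, 0.2])

    print("KL[q : p] =", kl_divergence(q, p))       # >= 0
    print("KL[p : q] =", kl_divergence(p, q))       # different value: KL is not symmetric
    print("KL[q : q] =", kl_divergence(q, q))       # = 0 when the two laws coincide
    print("J[q : p]  =", jeffreys_divergence(q, p)) # symmetric in its arguments
```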
5. Mutual Information
- I[Y, X] is a concave function of p(y) when p(x|y) is fixed and a convex function of p(x|y) when p(y) is fixed.
- I[Y, X] ≥ 0 with equality only if X and Y are independent.
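A short numerical check of these two properties for discrete variables, again as an illustrative Python/NumPy sketch (the joint tables and the function name mutual_information are made up for illustration): the mutual information computed from a joint probability table is non-negative and vanishes when the joint factorizes into its marginals.

```python
import numpy as np

def mutual_information(p_xy):
    """I[X, Y] = sum_{x,y} p(x,y) ln( p(x,y) / (p(x) p(y)) ) for a discrete joint table."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal p(y)
    mask = p_xy > 0
    ratio = p_xy[mask] / (p_x @ p_y)[mask]   # p(x,y) / (p(x) p(y)) on the support
    return float(np.sum(p_xy[mask] * np.log(ratio)))

if __name__ == "__main__":
    # A dependent pair: I > 0.
    p_dep = np.array([[0.30, 0.10],
                      [0.10, 0.50]])
    # An independent pair: the joint is the product of its marginals, so I = 0.
    p_ind = np.outer([0.4, 0.6], [0.7, 0.3])

    print("I (dependent)   =", mutual_information(p_dep))
    print("I (independent) =", mutual_information(p_ind))
```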
6. Maximum Entropy Principle
7. Link between Entropy and Likelihood
8. Fisher Information, Bregman and Other Divergences
- f-divergences: The f-divergences are a general class of divergences, indexed by convex functions f, that include the KL divergence as a special case. Let f: (0, ∞) → R be a convex function with f(1) = 0. The f-divergence between two probability measures P and Q, with densities p and q, is defined by: Df [P : Q] = ∫ q(x) f(p(x)/q(x)) dx. Every f-divergence can be viewed as a measure of distance between probability measures, with different properties depending on f. Some important special cases are:
- – f(x) = x ln x gives the KL divergence: Df [P : Q] = ∫ p(x) ln(p(x)/q(x)) dx = KL [P : Q].
- – f(x) = ½ |x − 1| gives the total variation distance: Df [P : Q] = ½ ∫ |p(x) − q(x)| dx.
- – f(x) = (√x − 1)² gives the square of the Hellinger distance: Df [P : Q] = ∫ (√p(x) − √q(x))² dx.
- – f(x) = (x − 1)² gives the chi-squared divergence: Df [P : Q] = ∫ (p(x) − q(x))²/q(x) dx.
- Rényi divergences: These are another generalization of the KL divergence. The Rényi divergence of order α between two probability distributions P and Q is: Dα [P : Q] = (1/(α − 1)) ln ∫ p(x)^α q(x)^(1−α) dx. The case α = 1/2, D1/2 [P : Q] = −2 ln ∫ √(p(x) q(x)) dx, is called the Bhattacharyya divergence (closely related to the Hellinger distance). Interestingly, this quantity is never larger than KL: D1/2 [P : Q] ≤ KL [P : Q]. As a result, it is sometimes easier to derive risk bounds with D1/2 as the loss function as opposed to KL.
- Bregman divergences: The Bregman divergences provide another class of divergences that are indexed by convex functions and include both the squared Euclidean distance and the KL divergence as special cases. Let ϕ be a differentiable, strictly convex function. The Bregman divergence Bϕ is defined by: Bϕ [x1 : x2] = ϕ(x1) − ϕ(x2) − ⟨∇ϕ(x2), x1 − x2⟩ (a small numerical sketch of these divergences is given after this list).
- KL [p1 : p2] is expressed as a Bregman divergence B[θ1 : θ2].
- A Bregman divergence B[x1 : x2] is induced when KL [p1 : p2] is used to compare the two posteriors.
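The sketch below (Python/NumPy, illustrative function names) evaluates the generic discrete f-divergence Df [P : Q] = Σi qi f(pi/qi) for the convex generators listed above, and a Bregman divergence for the negative-entropy generator ϕ(x) = Σi xi ln xi, which reproduces the KL divergence on the probability simplex.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f[P : Q] = sum_i q_i * f(p_i / q_i) for discrete distributions p, q (q_i > 0)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * f(p / q)))

def bregman(phi, grad_phi, x1, x2):
    """B_phi[x1 : x2] = phi(x1) - phi(x2) - <grad phi(x2), x1 - x2>."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    return float(phi(x1) - phi(x2) - np.dot(grad_phi(x2), x1 - x2))

if __name__ == "__main__":
    p = np.array([0.5, 0.3, 0.2])
    q = np.array([0.4, 0.4, 0.2])

    # Convex generators f with f(1) = 0 for the special cases listed above.
    kl        = f_divergence(p, q, lambda x: x * np.log(x))        # KL divergence
    tv        = f_divergence(p, q, lambda x: 0.5 * np.abs(x - 1))  # total variation
    hellinger = f_divergence(p, q, lambda x: (np.sqrt(x) - 1)**2)  # squared Hellinger
    chi2      = f_divergence(p, q, lambda x: (x - 1)**2)           # chi-squared

    # Bregman divergence with the negative-entropy generator phi(x) = sum_i x_i ln x_i,
    # which coincides with KL on the probability simplex.
    neg_entropy = lambda x: float(np.sum(x * np.log(x)))
    b_kl = bregman(neg_entropy, lambda x: np.log(x) + 1.0, p, q)

    print("KL   =", kl, " (Bregman check:", b_kl, ")")
    print("TV   =", tv)
    print("Hel2 =", hellinger)
    print("Chi2 =", chi2)
```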
9. Vectorial Variables and Time Indexed Process
10. Entropy in Independent Component Analysis and Source Separation
11. Entropy in Parametric Modeling and Model Selection
12. Entropy in Spectral Analysis
12.1. Burg’s Entropy-Based Method
12.2. Extensions to Burg’s Method
12.3. Shore and Johnson Approach
12.4. ME in the Mean Approach
13. Entropy-Based Methods for Linear Inverse Problems
13.1. Linear Inverse Problems
- Convolution operations g = h * f in 1D (signal): g(t) = ∫ h(t − t′) f(t′) dt′.
- Radon transform (RT) in computed tomography (CT) in the 2D case [86]: g(r, ϕ) = ∫∫ f(x, y) δ(r − x cos ϕ − y sin ϕ) dx dy.
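As a concrete instance of such a linear forward model, the following sketch (Python/NumPy; the Gaussian kernel, its width and the noise level are arbitrary illustrative choices, not values from the paper) builds the matrix H of a discrete 1D convolution so that g = Hf + ϵ, which is the starting point of the entropy-based and Bayesian methods discussed below.

```python
import numpy as np

def convolution_matrix(h, n):
    """Build the matrix H such that H @ f equals the 'same'-size convolution h * f."""
    m = len(h)
    H = np.zeros((n, n))
    for i in range(n):
        for k in range(m):
            j = i + k - m // 2          # centered, zero-padded convolution
            if 0 <= j < n:
                H[i, j] = h[m - 1 - k]
    return H

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 128
    t = np.arange(n)

    # Unknown signal f: two smooth bumps (purely illustrative).
    f = np.exp(-0.5 * ((t - 40) / 5.0) ** 2) + 0.6 * np.exp(-0.5 * ((t - 90) / 8.0) ** 2)

    # Point spread function h: normalized Gaussian blur kernel.
    k = np.arange(-10, 11)
    h = np.exp(-0.5 * (k / 3.0) ** 2)
    h /= h.sum()

    H = convolution_matrix(h, n)
    g = H @ f + 0.01 * rng.standard_normal(n)   # noisy data g = h * f + noise

    # Sanity check against NumPy's own convolution.
    print(np.allclose(H @ f, np.convolve(f, h, mode="same")))
```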
13.2. Entropy-Based Methods
- L2 or quadratic: Ω(f) = Σj |fj|².
- Lβ: Ω(f) = Σj |fj|^β.
- Shannon entropy:
- The Burg entropy:
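To make the role of these regularizers concrete, here is a minimal sketch (Python/NumPy) of the criterion J(f) = ‖g − Hf‖² + λ Ω(f) minimized by projected gradient descent under a positivity constraint. The specific forms used below for the Shannon and Burg entropy regularizers (Σj fj ln fj and −Σj ln fj, respectively), as well as λ, the step size and the toy operator H, are assumptions made for illustration, not expressions taken from the paper.

```python
import numpy as np

# Candidate regularizers Omega(f) and their gradients (f > 0 assumed).
REGULARIZERS = {
    "quadratic": (lambda f: np.sum(f**2),          lambda f: 2.0 * f),
    "shannon":   (lambda f: np.sum(f * np.log(f)), lambda f: np.log(f) + 1.0),
    "burg":      (lambda f: -np.sum(np.log(f)),    lambda f: -1.0 / f),
}

def regularized_estimate(H, g, lam=0.1, reg="shannon", step=1e-3, n_iter=5000, eps=1e-3):
    """Minimize J(f) = ||g - H f||^2 + lam * Omega(f) by projected gradient descent (f > 0)."""
    omega, grad_omega = REGULARIZERS[reg]
    f = np.full(H.shape[1], g.mean() if g.mean() > 0 else 1.0)  # positive initialization
    for _ in range(n_iter):
        grad = 2.0 * H.T @ (H @ f - g) + lam * grad_omega(f)
        f = np.maximum(f - step * grad, eps)                    # keep f strictly positive
    return f, np.sum((g - H @ f)**2) + lam * omega(f)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 32
    H = np.eye(n) + 0.05 * rng.standard_normal((n, n))          # a toy, well-posed operator
    f_true = np.abs(rng.standard_normal(n)) + 0.1               # positive "true" object
    g = H @ f_true + 0.01 * rng.standard_normal(n)

    for name in REGULARIZERS:
        f_hat, J = regularized_estimate(H, g, reg=name)
        print(f"{name:10s}  J = {J:.4f}   error = {np.linalg.norm(f_hat - f_true):.4f}")
```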
13.3. Maximum Entropy in the Mean Approach
14. Bayesian Approach for Inverse Problems
14.1. Simple Bayesian Approach
- Assign a prior probability law p(ϵ) to the modeling and observation errors, here ϵ. From this, find the expression of the likelihood p(g|f, θ1). As an example, consider the Gaussian case: p(g|f, θ1) = N(g|Hf, vϵ I) ∝ exp[−‖g − Hf‖²/(2vϵ)], with θ1 = vϵ.
- Assign a prior probability law p(f|θ2) to the unknown f to translate your prior knowledge about it. Again, as an example, consider the Gaussian case: p(f|θ2) = N(f|0, vf I) ∝ exp[−‖f‖²/(2vf)], with θ2 = vf.
- Apply the Bayes rule to obtain the expression of the posterior law: p(f|g, θ1, θ2) ∝ p(g|f, θ1) p(f|θ2).
- Use p(f|g, θ1, θ2) to infer any quantity dependent on f.
- MAP: f̂MAP = arg maxf p(f|g, θ1, θ2).
- EAP or posterior mean (PM): f̂EAP = ∫ f p(f|g, θ1, θ2) df.
- For MAP, we need optimization algorithms that can handle the high-dimensional criterion J(f) = − ln p(f|g, θ). Very often, we may be limited to using gradient-based algorithms (a minimal sketch for the Gaussian case is given after this list).
- For EAP, we need integration algorithms that can handle high-dimensional integrals. The most common tools here are MCMC methods [24]. However, for real applications, the computational cost is very often prohibitive. Recently, different methods, called approximate Bayesian computation (ABC) [96–100] or VBA, have been proposed [74,96,98,101–107].
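For the Gaussian example above, the posterior is itself Gaussian and the MAP estimate has a closed form, which makes it a convenient test case for the gradient-based option. The sketch below (Python/NumPy) assumes the likelihood N(g|Hf, vϵ I) and prior N(f|0, vf I) given earlier; the operator H, the variances and the step size are illustrative choices, not values from the paper.

```python
import numpy as np

def map_gradient_descent(H, g, v_eps, v_f, step=1e-3, n_iter=20000):
    """Minimize J(f) = ||g - H f||^2 / (2 v_eps) + ||f||^2 / (2 v_f) by gradient descent."""
    f = np.zeros(H.shape[1])
    for _ in range(n_iter):
        grad = H.T @ (H @ f - g) / v_eps + f / v_f
        f -= step * grad
    return f

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    m, n = 40, 30
    H = rng.standard_normal((m, n)) / np.sqrt(m)
    f_true = rng.standard_normal(n)
    v_eps, v_f = 0.01, 1.0
    g = H @ f_true + np.sqrt(v_eps) * rng.standard_normal(m)

    # Gradient-based MAP (the option that scales to high-dimensional problems).
    f_map = map_gradient_descent(H, g, v_eps, v_f)

    # Closed-form MAP for this Gaussian case: (H'H / v_eps + I / v_f) f = H'g / v_eps.
    A = H.T @ H / v_eps + np.eye(n) / v_f
    f_closed = np.linalg.solve(A, H.T @ g / v_eps)

    print("max |gradient MAP - closed form| =", np.max(np.abs(f_map - f_closed)))
```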
14.2. Full Bayesian: Hyperparameter Estimation
- Block separable, such as q(f, θ) = q1(f) q2(θ), or
- Completely separable, such as q(f, θ) = ∏j q1j(fj) ∏k q2k(θk).
15. Basic Algorithms of the Variational Bayesian Approximation
15.1. Case of Two Gaussian Variables
15.2. Case of Exponential Families
16. VBA for the Unsupervised Bayesian Approach to Inverse Problems
- q1(f) = δ(f − f̃) and q2(θ) = δ(θ − θ̃). In this case, we have an alternating joint MAP (JMAP) algorithm: f̃ = arg maxf p(f, θ̃|g) and θ̃ = arg maxθ p(f̃, θ|g).
- q1(f) is free form and q2(θ) = δ(θ − θ̃). In the same way, this time we obtain an EM-like algorithm: q1(f) ∝ p(f|θ̃, g) and θ̃ = arg maxθ ⟨ln p(f, θ|g)⟩q1(f).
- q1(f) = δ(f − f̃) and q2(θ) is free form. In the same way, this time we obtain: q2(θ) ∝ p(θ|f̃, g) and f̃ = arg maxf ⟨ln p(f, θ|g)⟩q2(θ).
- Both q1(f) and q2(θ) have free form. The main difficulty here is that, at each iteration, the expressions of q1 and q2 may change. However, if p(f, θ) is in the generalized exponential family, then q1(f) and q2(θ) stay in the same family, and we only have to update their parameters at each iteration (a minimal sketch of such alternating updates is given below).
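To make the alternating structure concrete, here is a hedged sketch (Python/NumPy), not the paper's algorithm, for the simple model g = Hf + ϵ with ϵ ∼ N(0, τ⁻¹ I), a fixed Gaussian prior f ∼ N(0, γ⁻¹ I) and a Gamma prior on the unknown noise precision τ. Both factors stay in the exponential family, so each iteration only updates the parameters of q1(f) = N(μ, Σ) and q2(τ) = Gamma(a, b); all numerical settings are illustrative.

```python
import numpy as np

def vba_linear_gaussian(H, g, gamma=1.0, a0=1e-3, b0=1e-3, n_iter=50):
    """Mean-field VBA for g = H f + eps, eps ~ N(0, I/tau), f ~ N(0, I/gamma), tau ~ Gamma(a0, b0).

    Alternates between q1(f) = N(mu, Sigma) and q2(tau) = Gamma(a, b),
    updating only the parameters of each factor at every iteration.
    """
    m, n = H.shape
    HtH, Htg = H.T @ H, H.T @ g
    tau_mean = a0 / b0                                   # initial <tau> under the prior
    for _ in range(n_iter):
        # Update q1(f) = N(mu, Sigma) given <tau>.
        Sigma = np.linalg.inv(tau_mean * HtH + gamma * np.eye(n))
        mu = tau_mean * Sigma @ Htg
        # Update q2(tau) = Gamma(a, b) given q1(f); uses <||g - H f||^2>_{q1}.
        expected_residual = np.sum((g - H @ mu) ** 2) + np.trace(H @ Sigma @ H.T)
        a = a0 + 0.5 * m
        b = b0 + 0.5 * expected_residual
        tau_mean = a / b
    return mu, Sigma, tau_mean

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    m, n = 100, 50
    H = rng.standard_normal((m, n)) / np.sqrt(m)
    f_true = rng.standard_normal(n)
    tau_true = 100.0                                     # i.e., noise variance 0.01
    g = H @ f_true + rng.standard_normal(m) / np.sqrt(tau_true)

    mu, Sigma, tau_hat = vba_linear_gaussian(H, g)
    print("estimated noise precision <tau> =", tau_hat, " (true:", tau_true, ")")
    print("reconstruction error ||mu - f_true|| =", np.linalg.norm(mu - f_true))
```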
17. VBA for a Linear Inverse Problem with Simple Gaussian Priors
18. Bayesian Variational Approximation with Hierarchical Prior Models
19. Bayesian Variational Approximation with Student t Priors
20. Conclusions
- A probability law is a tool for representing our state of knowledge about a quantity.
- The Bayes or Laplace rule is an inference tool for updating our state of knowledge about an inaccessible quantity when another accessible, related quantity is observed.
- Entropy is a measure of information content in a variable with a given probability law.
- The maximum entropy principle can be used to assign a probability law to a quantity when the available information about it is in the form of a limited number of constraints on that probability law.
- Relative entropy and Kullback–Leibler divergence are tools for updating probability laws in the same context.
- When a parametric probability law is assigned to a quantity and we want to measure the amount of information gained about the parameters when some direct observations of that quantity are available, we can use the Fisher information. The structure of the Fisher information geometry in the space of parameters is derived from the relative entropy by a second-order Taylor series approximation.
- All of these rules and tools are currently used in different ways in data and signal processing. In this paper, a few examples of the ways these tools are used in data and signal processing problems were presented. One main conclusion is that each of these tools has to be used in the appropriate context. The example of spectral estimation shows that it is important to define the problem clearly from the outset, to use the appropriate tools and to interpret the results accordingly.
- The Laplacian or Bayesian inference is the appropriate tool for proposing satisfactory solutions to inverse problems. Indeed, the posterior probability law combines the state of knowledge carried by the forward model and the data with the state of knowledge available before using the data.
- The Bayesian approach can also easily be extended to unsupervised methods, which is essential for the practical application of these methods.
- One of the main limitations of these sophisticated methods is the computational cost. For this reason, we proposed to use VBA as an alternative to MCMC methods, to obtain realistic algorithms for high-dimensional inverse problems where we want to estimate an unknown signal (1D), image (2D), volume (3D) or even more (3D + time or 3D + wavelength), etc.
Acknowledgments
- †This paper is an extended version of the paper published in Proceedings of the 34th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2014), Amboise, France, 21–26 September 2014.
Conflicts of Interest
References
- Mohammad-Djafari, A. Bayesian or Laplacian inference, entropy and information theory and information geometry in data and signal processing. AIP Conf. Proc 2014, 1641, 43–58. [Google Scholar]
- Bayes, T. An Essay toward Solving a Problem in the Doctrine of Chances. Philos. Trans 1763, 53, 370–418, By the late Rev. Mr. Bayes communicated by Mr. Price, in a Letter to John Canton. [Google Scholar]
- De Laplace, P. S. Mémoire sur la probabilité des causes par les évènements. Mémoires de l’Academie Royale des Sciences Presentés par Divers Savan 1774, 6, 621–656. [Google Scholar]
- Shannon, C. A Mathematical Theory of Communication. Bell Syst. Tech. J 1948, 27, 379–423. [Google Scholar]
- Hadamard, J. Mémoire sur le problème d’analyse relatif à l’équilibre des plaques élastiques encastrées; Mémoires présentés par divers savants à l’Académie des sciences de l’Institut de France; Imprimerie nationale, 1908. [Google Scholar]
- Jaynes, E.T. Information Theory Statistical Mechanics. Phys. Rev 1957, 106, 620–630. [Google Scholar]
- Jaynes, E.T. Information Theory and Statistical Mechanics II. Phys. Rev 1957, 108, 171–190. [Google Scholar]
- Jaynes, E.T. Prior Probabilities. IEEE Trans. Syst. Sci. Cybern 1968, 4, 227–241. [Google Scholar]
- Kullback, S. Information Theory and Statistics; Wiley: New York, NY, USA, 1959. [Google Scholar]
- Fisher, R. On the mathematical foundations of theoretical statistics. Philos. Trans. R. Stat. Soc. A 1922, 222, 309–368. [Google Scholar]
- Rao, C. Information and accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc 1945, 37, 81–91. [Google Scholar]
- Sindhwani, V.; Belkin, M.; Niyogi, P. The Geometric basis for Semi-supervised Learning. In Semi-supervised Learning; Chapelle, O., Schölkopf, B., Zien, A., Eds.; MIT press: Cambridge, MA, USA, 2006; pp. 209–226. [Google Scholar]
- Lin, J. Divergence Measures Based on the Shannon Entropy. IEEE Trans. Inf. Theory 1991, 37, 145–151. [Google Scholar]
- Johnson, O.; Barron, A.R. Fisher Information Inequalities and the Central Limit Theorem. Probab. Theory Relat. Fields 2004, 129, 391–409. [Google Scholar]
- Berger, J. Statistical Decision Theory and Bayesian Analysis, 2nd ed; Springer-Verlag: New York, NY, USA, 1985. [Google Scholar]
- Gelman, A.; Carlin, J.B.; Stern, H.S.; Rubin, D.B. Bayesian Data Analysis, 2nd ed; Chapman & Hall/CRC Texts in Statistical Science; Chapman and Hall/CRC: Boca Raton, FL, USA, 2003. [Google Scholar]
- Skilling, J. Nested Sampling. In Bayesian Inference and Maximum Entropy Methods in Science and Engineering; Proceedings of 24th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Garching, Germany, 25–30 July 2004, Fischer, R., Preuss, R., Toussaint, U.V., Eds.; pp. 395–405.
- Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equation of State Calculations by Fast Computing Machines. J. Chem. Phys 1953, 21, 1087–1092. [Google Scholar]
- Hastings, W.K. Monte Carlo Sampling Methods using Markov Chains and their Applications. Biometrika 1970, 57, 97–109. [Google Scholar]
- Gelfand, A.E.; Smith, A.F.M. Sampling-Based Approaches to Calculating Marginal Densities. J. Am. Stat. Assoc 1990, 85, 398–409. [Google Scholar]
- Gilks, W.R.; Richardson, S.; Spiegelhalter, D.J. Introducing Markov Chain Monte Carlo. In Markov Chain Monte Carlo in Practice; Gilks, W.R., Richardson, S., Spiegelhalter, D.J., Eds.; Chapman and Hall: London, UK, 1996; pp. 1–19. [Google Scholar]
- Gilks, W.R. Strategies for Improving MCMC. In Markov Chain Monte Carlo in Practice; Gilks, W.R., Richardson, S., Spiegelhalter, D.J., Eds.; Chapman and Hall: London, UK, 1996; pp. 89–114. [Google Scholar]
- Roberts, G.O. Markov Chain Concepts Related to Sampling Algorithms. In Markov Chain Monte Carlo in Practice; Gilks, W.R., Richardson, S., Spiegelhalter, D.J., Eds.; Chapman and Hall: London, UK, 1996; pp. 45–57. [Google Scholar]
- Tanner, M.A. Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions; Springer series in Statistics; Springer: New York, NY, USA, 1996. [Google Scholar]
- Djurić, P.M.; Godsill, S.J. (Eds.) Special Issue on Monte Carlo Methods for Statistical Signal Processing; IEEE: New York, NY, USA, 2002.
- Andrieu, C.; de Freitas, N.; Doucet, A.; Jordan, M.I. An Introduction to MCMC for Machine Learning. Mach. Learn 2003, 50, 5–43. [Google Scholar]
- Clausius, R. On the Motive Power of Heat, and on the Laws Which Can be Deduced From it for the Theory of Heat; Poggendorff’s Annalen der Physik, LXXIX, Dover Reprint: New York, NY, USA, 1850; ISBN 0-486-59065-8. [Google Scholar]
- Caticha, A. Maximum Entropy, fluctuations and priors.
- Giffin, A.; Caticha, A. Updating Probabilities with Data and Moments.
- Caticha, A.; Preuss, R. Maximum Entropy and Bayesian Data Analysis: Entropic Priors Distributions. Phys. Rev. E 2004, 70, 046127. [Google Scholar]
- Akaike, H. On Entropy Maximization Principle. In Applications of Statistics; Krishnaiah, P.R., Ed.; North-Holland: Amsterdam, The Netherlands, 1977; pp. 27–41. [Google Scholar]
- Agmon, N.; Alhassid, Y.; Levine, D. An Algorithm for Finding the Distribution of Maximal Entropy. J. Comput. Phys 1979, 30, 250–258. [Google Scholar]
- Jaynes, E.T. Where do we go from here? In Maximum-Entropy and Bayesian Methods in Inverse Problems; Smith, C.R., Grandy, W.T., Jr, Eds.; Springer: Dordrecht, The Netherlands, 1985; pp. 21–58. [Google Scholar]
- Borwein, J.M.; Lewis, A.S. Duality relationships for entropy-like minimization problems. SIAM J. Control Optim 1991, 29, 325–338. [Google Scholar]
- Elfwing, T. On some Methods for Entropy Maximization and Matrix Scaling. Linear Algebra Appl 1980, 34, 321–339. [Google Scholar]
- Eriksson, J. A note on Solution of Large Sparse Maximum Entropy Problems with Linear Equality Constraints. Math. Program 1980, 18, 146–154. [Google Scholar]
- Erlander, S. Entropy in linear programs. Math. Program 1981, 21, 137–151. [Google Scholar]
- Jaynes, E.T. On the Rationale of Maximum-Entropy Methods. Proc. IEEE 1982, 70, 939–952. [Google Scholar]
- Shore, J.E.; Johnson, R.W. Properties of Cross-Entropy Minimization. IEEE Trans. Inf. Theory 1981, 27, 472–482. [Google Scholar]
- Mohammad-Djafari, A. Maximum d’entropie et problèmes inverses en imagerie. Traitement Signal 1994, 11, 87–116. [Google Scholar]
- Bercher, J. Développement de critères de nature entropique pour la résolution des problèmes inverses linéaires. Ph.D. Thesis, Université de Paris–Sud, Orsay, France, 1995. [Google Scholar]
- Le Besnerais, G. Méthode du maximum d’entropie sur la moyenne, critère de reconstruction d’image et synthèse d’ouverture en radio astronomie. Ph.D. Thesis, Université de Paris-Sud, Orsay, France, 1993. [Google Scholar]
- Caticha, A.; Giffin, A. Updating Probabilities. [CrossRef]
- Caticha, A. Entropic Inference.
- Costa, S.I.R.; Santos, S.A.; Strapasson, J.E. Fisher information distance: A geometrical reading. 2012, arXiv:1210.2354.
- Rissanen, J. Fisher Information and Stochastic Complexity. IEEE Trans. Inf. Theory 1996, 42, 40–47. [Google Scholar]
- Shimizu, R. On Fisher’s amount of information for location family. In A Modern Course on Statistical Distributions in Scientific Work; D. Reidel: Dordrecht, The Netherlands, 1975; Volume 3, pp. 305–312. [Google Scholar]
- Nielsen, F.; Nock, R. Sided and Symmetrized Bregman Centroids. IEEE Trans. Inf. Theory 2009, 55, 2048–2059. [Google Scholar]
- Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006. [Google Scholar]
- Schroeder, M.R. Linear prediction, entropy and signal analysis. IEEE ASSP Mag 1984, 1, 3–11. [Google Scholar]
- Itakura, F.; Saito, S. A Statistical Method for Estimation of Speech Spectral Density and Formant Frequencies. Electron. Commun. Jpn 1970, 53-A, 36–43. [Google Scholar]
- Kitagawa, G.; Gersch, W. Smoothness Priors Analysis of Time Series; Lecture Notes in Statistics; Volume 116, Springer: New York, NY, USA, 1996. [Google Scholar]
- Rue, H.; Held, L. Gaussian Markov Random Fields: Theory and Applications; CRC Press: New York, NY, USA, 2005. [Google Scholar]
- Amari, S.; Cichocki, A.; Yang, H.H. A new learning algorithm for blind source separation. 757–763.
- Amari, S. Neural learning in structured parameter spaces—Natural Riemannian gradient. 127–133.
- Amari, S. Natural gradient works efficiently in learning. Neural Comput 1998, 10, 251–276. [Google Scholar]
- Knuth, K.H. Bayesian source separation and localization. SPIE Proc 1998, 3459. [Google Scholar] [CrossRef]
- Knuth, K.H. A Bayesian approach to source separation. 283–288.
- Attias, H. Independent Factor Analysis. Neural Comput 1999, 11, 803–851. [Google Scholar]
- Mohammad-Djafari, A. A Bayesian approach to source separation. 221–244.
- Choudrey, R.A.; Roberts, S. Variational Bayesian Mixture of Independent Component Analysers for Finding Self-Similar Areas in Images. 107–112.
- Lopes, H.F.; West, M. Bayesian Model Assessment in Factor Analysis. Stat. Sin. 2004, 14, 41–67. [Google Scholar]
- Ichir, M.; Mohammad-Djafari, A. Bayesian Blind Source Separation of Positive Non Stationary Sources. 493–500.
- Mohammad-Djafari, A. Bayesian Source Separation: Beyond PCA and ICA.
- Comon, P.; Jutten, C. (Eds.) Handbook of Blind Source Separation: Independent Component Analysis and Applications; Academic Press: Burlington, MA, USA, 2010.
- Yuan, M.; Lin, Y. Model selection and estimation in the Gaussian graphical model. Biometrika 2007, 94, 19–35. [Google Scholar]
- Fitzgerald, W. Markov Chain Monte Carlo methods with Applications to Signal Processing. Signal Process 2001, 81, 3–18. [Google Scholar]
- Matsuoka, T.; Ulrych, T. Information theory measures with application to model identification. IEEE Trans. Acoust. Speech Signal Process 1986, 34, 511–517. [Google Scholar]
- Bretthorst, G.L. Bayesian Model Selection: Examples Relevant to NMR. In Maximum Entropy and Bayesian Methods; Springer: Dordrecht, The Netherlands, 1989; pp. 377–388. [Google Scholar]
- Gelfand, A.E.; Dey, D.K. Bayesian model choice: Asymptotics and exact calculations. J. R. Stat. Soc. Ser. B 1994, 56, 501–514. [Google Scholar]
- Mohammad-Djafari, A. Model selection for inverse problems: Best choice of basis function and model order selection.
- Clyde, M.A.; Berger, J.O.; Bullard, F.; Ford, E.B.; Jefferys, W.H.; Luo, R.; Paulo, R.; Loredo, T. Current Challenges in Bayesian Model Choice. 71, 224–240.
- Wyse, J.; Friel, N. Block clustering with collapsed latent block models. Stat. Comput 2012, 22, 415–428. [Google Scholar]
- Giovannelli, J.F.; Giremus, A. Bayesian noise model selection and system identification based on approximation of the evidence. 125–128.
- Akaike, H. A new look at the statistical model identification. IEEE Trans. Automat. Control 1974, AC-19, 716–723. [Google Scholar]
- Akaike, H. Power spectrum estimation through autoregressive model fitting. Ann. Inst. Stat. Math 1969, 21, 407–419. [Google Scholar]
- Farrier, D. Jaynes’ principle and maximum entropy spectral estimation. IEEE Trans. Acoust. Speech Signal Process 1984, 32, 1176–1183. [Google Scholar]
- Wax, M. Detection and Estimation of Superimposed Signals. Ph.D. Thesis, Stanford University, Stanford, CA, USA, March 1985. [Google Scholar]
- Burg, J.P. Maximum Entropy Spectral Analysis.
- McClellan, J.H. Multidimensional spectral estimation. Proc. IEEE 1982, 70, 1029–1039. [Google Scholar]
- Lang, S.; McClellan, J.H. Multidimensional MEM spectral estimation. IEEE Trans. Acoust. Speech Signal Process 1982, 30, 880–887. [Google Scholar]
- Johnson, R.; Shore, J. Which is the Better Entropy Expression for Speech Processing: −S log S or log S? IEEE Trans. Acoust. Speech Signal Process 1984, ASSP-32, 129–137. [Google Scholar]
- Wester, R.; Tummala, M.; Therrien, C. Multidimensional Autoregressive Spectral Estimation Using Iterative Methods. 1. [CrossRef]
- Picinbono, B.; Barret, M. Nouvelle présentation de la méthode du maximum d’entropie. Traitement Signal 1990, 7, 153–158. [Google Scholar]
- Borwein, J.M.; Lewis, A.S. Convergence of best entropy estimates. SIAM J. Optim 1991, 1, 191–205. [Google Scholar]
- Mohammad-Djafari, A. (Ed.) Inverse Problems in Vision and 3D Tomography; digital signal and image processing series; ISTE: London, UK; Wiley: Hoboken, NJ, USA, 2010.
- Mohammad-Djafari, A.; Demoment, G. Tomographie de diffraction and synthèse de Fourier à maximum d’entropie. Rev. Phys. Appl. (Paris) 1987, 22, 153–167. [Google Scholar]
- Féron, O.; Chama, Z.; Mohammad-Djafari, A. Reconstruction of piecewise homogeneous images from partial knowledge of their Fourier transform. 68–75.
- Ayasso, H.; Mohammad-Djafari, A. Joint NDT Image Restoration and Segmentation Using Gauss–Markov–Potts Prior Models and Variational Bayesian Computation. IEEE Trans. Image Process 2010, 19, 2265–2277. [Google Scholar]
- Ayasso, H.; Duchêne, B.; Mohammad-Djafari, A. Bayesian inversion for optical diffraction tomography. J. Mod. Opt 2010, 57, 765–776. [Google Scholar]
- Burch, S.; Gull, S.F.; Skilling, J. Image Restoration by a Powerful Maximum Entropy Method. Comput. Vis. Graph. Image Process 1983, 23, 113–128. [Google Scholar]
- Gull, S.F.; Skilling, J. Maximum entropy method in image processing. IEE Proc. F 1984, 131, 646–659. [Google Scholar]
- Gull, S.F. Developments in maximum entropy data analysis. In Maximum Entropy and Bayesian Methods; Skilling, J., Ed.; Springer: Dordrecht, The Netherlands, 1989; pp. 53–71. [Google Scholar]
- Jones, L.K.; Byrne, C.L. General entropy criteria for inverse problems with application to data compression, pattern classification and cluster analysis. IEEE Trans. Inf. Theory 1990, 36, 23–30. [Google Scholar]
- Macaulay, V.A.; Buck, B. Linear inversion by the method of maximum entropy. Inverse Probl 1989, 5. [Google Scholar] [CrossRef]
- Rue, H.; Martino, S. Approximate Bayesian inference for hierarchical Gaussian Markov random field models. J. Stat. Plan. Inference 2007, 137, 3177–3192. [Google Scholar]
- Wilkinson, R. Approximate Bayesian computation (ABC) gives exact results under the assumption of model error 2009. arXiv:0811.3355.
- Rue, H.; Martino, S.; Chopin, N. Approximate Bayesian Inference for Latent Gaussian Models Using Integrated Nested Laplace Approximations. J. R. Stat. Soc. Ser. B 2009, 71, 319–392. [Google Scholar]
- Fearnhead, P.; Prangle, D. Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC 2011. arxiv:1004.1112v2.
- Turner, B.M.; van Zandt, T. A tutorial on approximate Bayesian computation. J. Math. Psych 2012, 56, 69–85. [Google Scholar]
- MacKay, D.J.C. A Practical Bayesian Framework for Backpropagation Networks. Neural Comput 1992, 4, 448–472. [Google Scholar]
- Mohammad-Djafari, A. Variational Bayesian Approximation for Linear Inverse Problems with a hierarchical prior models. 8085, 669–676.
- Likas, C.L.; Galatsanos, N.P. A Variational Approach For Bayesian Blind Image Deconvolution. IEEE Trans. Signal Process 2004, 52, 2222–2233. [Google Scholar]
- Beal, M.; Ghahramani, Z. Variational Bayesian learning of directed graphical models with hidden variables. Bayesian Stat 2006, 1, 793–832. [Google Scholar]
- Kim, H.; Ghahramani, Z. Bayesian Gaussian Process Classification with the EM-EP Algorithm. IEEE Trans. Pattern Anal. Mach. Intell 2006, 28, 1948–1959. [Google Scholar]
- Jordan, M.I.; Ghahramani, Z.; Jaakkola, T.S.; Saul, L.K. An introduction to variational methods for graphical models. Mach. Learn 2006, 37, 183–233. [Google Scholar]
- Forbes, F.; Fort, G. Combining Monte Carlo and Mean-Field-Like Methods for Inference in Hidden Markov Random Fields. IEEE Trans. Image Process 2007, 16, 824–837. [Google Scholar]
- Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum Likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. (B) 1977, 39, 1–38. [Google Scholar]
- Miller, M.I.; Snyder, D.L. The Role of Likelihood and Entropy in Incomplete-Data Problems: Applications to Estimating Point-Process Intensities and Toeplitz Constrained Covariances. Proc. IEEE 1987, 75, 892–907. [Google Scholar]
- Snoussi, H.; Mohammad-Djafari, A. Information geometry of Prior Selection.
- Mohammad-Djafari, A. Approche variationnelle pour le calcul bayésien dans les problémes inverses en imagerie 2009. arXiv:0904.4148.
- Beal, M. Variational Algorithms for Approximate Bayesian Inference. Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London, London, UK, 2003. [Google Scholar]
- Winn, J.; Bishop, C.M.; Jaakkola, T. Variational message passing. J. Mach. Learn. Res 2005, 6, 661–694. [Google Scholar]
- Chatzis, S.; Varvarigou, T. Factor Analysis Latent Subspace Modeling and Robust Fuzzy Clustering Using t-Distributions. IEEE Trans. Fuzzy Syst 2009, 17, 505–517. [Google Scholar]
- Park, T.; Casella, G. The Bayesian Lasso. J. Am. Stat. Assoc 2008, 103, 681–686. [Google Scholar]
- Mohammad-Djafari, A. A variational Bayesian algorithm for inverse problem of computed tomography. In Mathematical Methods in Biomedical Imaging and Intensity-Modulated Radiation Therapy (IMRT); Censor, Y., Jiang, M., Louis, A.K., Eds.; Publications of the Scuola Normale Superiore/CRM Series; Edizioni della Normale: Rome, Italy, 2008; pp. 231–252. [Google Scholar]
- Mohammad-Djafari, A.; Ayasso, H. Variational Bayes and mean field approximations for Markov field unsupervised estimation. 1–6.
- Tipping, M.E. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res 2001, 1, 211–244. [Google Scholar]
- He, L.; Chen, H.; Carin, L. Tree-Structured Compressive Sensing With Variational Bayesian Analysis. IEEE Signal Process. Lett 2010, 17, 233–236. [Google Scholar]
- Fraysse, A.; Rodet, T. A gradient-like variational Bayesian algorithm. In Proceedings of the 2011 IEEE Statistical Signal Processing Workshop (SSP), Nice, France, 28–30 June 2011; pp. 605–608.
- Johnson, V.E. On Numerical Aspects of Bayesian Model Selection in High and Ultrahigh-dimensional Settings. Bayesian Anal 2013, 8, 741–758. [Google Scholar]
- Dumitru, M.; Mohammad-Djafari, A. Estimating the periodic components of a biomedical signal through inverse problem modeling and Bayesian inference with sparsity enforcing prior. AIP Conf. Proc 2015, 1641, 548–555. [Google Scholar]
- Wang, L.; Gac, N.; Mohammad-Djafari, A. Bayesian 3D X-ray computed tomography image reconstruction with a scaled Gaussian mixture prior model. AIP Conf. Proc 2015, 1641, 556–563. [Google Scholar]
- Mohammad-Djafari, A. Bayesian Blind Deconvolution of Images Comparing JMAP, EM and VBA with a Student-t a priori Model. 98–103.
- Su, F.; Mohammad-Djafari, A. An Hierarchical Markov Random Field Model for Bayesian Blind Image Separation.
- Su, F.; Cai, S.; Mohammad-Djafari, A. Bayesian blind separation of mixed text patterns. 1373–1378.
© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).