Maximum Entropy Distribution
INTRODUCTION
Shannon (1948) indicated how maximum entropy (ME) distributions can be derived by a straightforward application of the calculus of variations technique. He defined the entropy of a probability density function $p(x)$ as
$$H = -\int p(x) \ln p(x)\, dx \qquad (1)$$
Maximizing $H$ subject to various side conditions is well known in the literature as a method for deriving the forms of minimal information prior distributions; e.g. Jaynes (1968) and Zellner (1977). Jaynes (1982) has extensively analyzed examples in the discrete case, while in Lisman and Van Znylen (1972), Rao (1973), Gokhale (1975), and Kagan, Linjik continuous cases are considered. In the last case, the problem, in its general form, is the following:
$$\text{maximize} \quad H = -\int p(x) \ln p(x)\, dx$$
$$\text{subject to} \quad E\{\phi_n(x)\} = \int \phi_n(x)\, p(x)\, dx = \mu_n, \quad n = 0, \ldots, N \qquad (2)$$
where $\mu_0 = 1$, $\phi_0(x) = 1$, the $\phi_n(x)$, $n = 0, \ldots, N$, are known functions, and the $\mu_n$, $n = 0, \ldots, N$, are the given expectation data. The classical solution of this problem is given by
$$p(x) = \exp\left[-\sum_{n=0}^{N} \lambda_n \phi_n(x)\right] \qquad (3)$$
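As a reminder of where this exponential form comes from, here is a short sketch of the variational argument. It assumes that the stationary point of the Lagrangian is the maximizer, and it writes the raw multipliers as $\tilde\lambda_n$ before absorbing the constant $-1$ into $\lambda_0$; both the notation $\tilde\lambda_n$ and that relabeling are conventions chosen for this sketch.

```latex
\begin{align*}
\mathcal{L}[p] &= -\int p(x)\ln p(x)\,dx
  - \sum_{n=0}^{N} \tilde\lambda_n\left(\int \phi_n(x)\,p(x)\,dx - \mu_n\right),\\
\frac{\delta \mathcal{L}}{\delta p(x)} &= -\ln p(x) - 1
  - \sum_{n=0}^{N}\tilde\lambda_n\phi_n(x) = 0,\\
p(x) &= \exp\left[-1 - \sum_{n=0}^{N}\tilde\lambda_n\phi_n(x)\right]
      = \exp\left[-\sum_{n=0}^{N}\lambda_n\phi_n(x)\right],
\qquad \lambda_0 = \tilde\lambda_0 + 1,\quad \lambda_n = \tilde\lambda_n\ (n \ge 1).
\end{align*}
```

The last step uses $\phi_0(x) = 1$, so the constant $-1$ only shifts the normalization multiplier.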
The $(N + 1)$ Lagrangian parameters $\lambda = [\lambda_0, \ldots, \lambda_N]$ are obtained by solving the following set of $(N + 1)$ nonlinear equations
$$G_n(\lambda) = \int \phi_n(x) \exp\left[-\sum_{m=0}^{N} \lambda_m \phi_m(x)\right] dx = \mu_n, \quad n = 0, \ldots, N \qquad (4)$$
The distributions defined by (3) include a great number of known distributions, which are obtained by choosing the appropriate $N$ and $\phi_n(x)$, $n = 0, \ldots, N$. In general, the $\phi_n(x)$ are either powers of $x$ or the logarithm of $x$. See Mukherjee and Hurst (1984), Zellner (1988), Mohammad-Djafari (1990) for many other examples and discussions. Special cases have been extensively analyzed and used by many authors. When $\phi_n(x) = x^n$, $n = 0, \ldots, N$, the $\mu_n$, $n = 0, \ldots, N$, are the given $N$ moments of the distribution. See, for example, Zellner (1988) for a numerical implementation in the case $N = 4$. In this communication we propose three programs written in MATLAB to solve the system of equations (4). The first is a general program where the $\phi_n(x)$ can be any functions. The second is a special case where $\phi_n(x) = x^n$, $n = 0, \ldots, N$. In this case the $\mu_n$ are the geometrical moments of $p(x)$. The third is a special case where $\phi_n(x) = \exp(-jn\omega_0 x)$, $n = 0, \ldots, N$. In this case the $\mu_n$ are the trigonometrical moments (Fourier components) of $p(x)$. We also give some examples to illustrate the usefulness of these programs.
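These equations are solved by a Newton-type iteration. Expanding $G_n(\lambda)$ to first order in a Taylor series around a trial value $\lambda^0$ gives the linear equations
$$G_n(\lambda) \simeq G_n(\lambda^0) + (\lambda - \lambda^0)^t \left[\nabla G_n(\lambda)\right]_{(\lambda = \lambda^0)} = \mu_n, \quad n = 0, \ldots, N \qquad (5)$$
Defining the vectors $\delta = \lambda - \lambda^0$ and $v = [\mu_0 - G_0(\lambda^0), \ldots, \mu_N - G_N(\lambda^0)]^t$, and the matrix
$$G = (g_{nk}) = \left(\frac{\partial G_n(\lambda)}{\partial \lambda_k}\right)_{(\lambda = \lambda^0)}, \quad n, k = 0, \ldots, N \qquad (6)$$
equations (5) become
$$G\,\delta = v \qquad (7)$$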
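This system is solved for $\delta$, from which we derive $\lambda = \lambda^0 + \delta$, which becomes our new initial vector $\lambda^0$, and the iterations continue until $\delta$ becomes appropriately small. Note that the matrix $G$ is symmetric and we have
$$g_{nk} = g_{kn} = -\int \phi_n(x)\, \phi_k(x) \exp\left[-\sum_{m=0}^{N} \lambda_m \phi_m(x)\right] dx, \quad n, k = 0, \ldots, N \qquad (8)$$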
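The expression for $g_{nk}$ follows from differentiating the integrals $G_n(\lambda)$ with respect to $\lambda_k$. A brief sketch, assuming that differentiation under the integral sign is permitted on the chosen range of $x$:

```latex
\begin{align*}
g_{nk} = \frac{\partial G_n(\lambda)}{\partial \lambda_k}
&= \int \phi_n(x)\,\frac{\partial}{\partial \lambda_k}
   \exp\left[-\sum_{m=0}^{N}\lambda_m\phi_m(x)\right] dx\\
&= -\int \phi_n(x)\,\phi_k(x)\,
   \exp\left[-\sum_{m=0}^{N}\lambda_m\phi_m(x)\right] dx = g_{kn},\\
g_{n0} &= -\int \phi_n(x)\,p(x)\,dx = -G_n(\lambda)
   \quad\text{since } \phi_0(x) = 1 .
\end{align*}
```

The last line shows why the first row and column of the matrix need no extra integration once the $G_n(\lambda)$ are known.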
So in each iteration we have to calculate the integrals in equation (8). Because $\phi_0(x) = 1$, the first row and column of $G$ are given by $-G_n(\lambda)$, which leaves $N(N + 1)/2$ distinct integrals for $n, k \ge 1$. The algorithm of the general Maximum Entropy problem is then as follows:
1. Define the range and the discretization step of $x$ (xmin, xmax, dx).
2. Write a function to calculate $\phi_n(x)$, $n = 0, \ldots, N$ (fin_x).
3. Start the iterative procedure with an initial estimate $\lambda^0$ (lambda0).
4. Calculate the $(N + 1)$ integrals in equations (4) and the $N(N + 1)/2$ remaining distinct elements $g_{nk}$ of the matrix $G$ by calculating the integrals in equations (8) (Gn, gnk).
5. Solve equation (7) to find $\delta$ (delta).
6. Calculate $\lambda = \lambda^0 + \delta$ and go back to step 3 until $\delta$ becomes negligible.
The integrals in equations (4) and (8) can be calculated by a univariate Simpson's method. We have used a very simplified version of this method.
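The iteration described above can be sketched compactly as follows. This is a minimal illustration, not the program listed in the Annex: it assumes only two constraint functions, $\phi_0(x) = 1$ and $\phi_1(x) = x$, on $[0, 1]$ with a target mean of 0.3, and it uses composite Simpson weights `w` for the integrals. The grid, the target value, the iteration cap and the tolerance are choices made for this sketch; the names `fin`, `gnk` and `delta` mirror the Annex naming.

```matlab
% Sketch: Newton iteration for a maximum entropy density on [0,1]
% with phi_0(x)=1, phi_1(x)=x and an illustrative target mean of 0.3.
x  = linspace(0, 1, 201).';          % odd number of points for Simpson's rule
dx = x(2) - x(1);
w  = ones(size(x));                  % composite Simpson weights 1,4,2,...,4,1
w(2:2:end-1) = 4;
w(3:2:end-2) = 2;
w  = w * dx / 3;

fin = [ones(size(x)), x];            % columns are phi_n(x)
mu  = [1; 0.3];                      % mu_0 = 1 (normalization), mu_1 = mean
lambda = [log(x(end) - x(1)); 0];    % uniform starting density

for iter = 1:50                      % iteration cap is an assumption
    p   = exp(-fin * lambda);                      % equation (3)
    wp  = w .* p;                                  % weighted density samples
    G   = fin.' * wp;                              % integrals of equation (4)
    gnk = -fin.' * bsxfun(@times, fin, wp);        % matrix of equation (8)
    delta  = gnk \ (mu - G);                       % solve equation (7)
    lambda = lambda + delta;                       % update the multipliers
    if norm(delta) < 1e-10, break, end             % stop when delta is negligible
end
p = exp(-fin * lambda);              % final density
```

With other constraint functions only the construction of `fin` and `mu` changes; the loop body stays the same.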
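In the special case of geometrical moments, $\phi_n(x) = x^n$, $n = 0, \ldots, N$, equations (3), (4) and (8) become
$$p(x) = \exp\left[-\sum_{m=0}^{N} \lambda_m x^m\right] \qquad (9)$$
$$G_n(\lambda) = \int x^n \exp\left[-\sum_{m=0}^{N} \lambda_m x^m\right] dx = \mu_n, \quad n = 0, \ldots, N \qquad (10)$$
$$g_{nk} = g_{kn} = -\int x^n x^k \exp\left[-\sum_{m=0}^{N} \lambda_m x^m\right] dx = -G_{n+k}(\lambda), \quad n, k = 0, \ldots, N \qquad (11)$$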
This means that the $[(N + 1) \times (N + 1)]$ matrix $G$ in equation (7) becomes a symmetric Hankel matrix which is entirely defined by the $2N + 1$ values $G_n(\lambda)$, $n = 0, \ldots, 2N$. So the algorithm in this case is the same as in the preceding one, with two simplifications:
1. In step 2 we do not need to write a separate function to calculate the functions $\phi_n(x) = x^n$, $n = 0, \ldots, N$.
2. In step 4 the number of integral evaluations is reduced, because the elements $g_{nk}$ of the matrix $G$ are related to the integrals $G_n(\lambda)$ in equations (10). This matrix is defined entirely by only $2N + 1$ components.
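To make the Hankel structure concrete, the following sketch computes the $2N + 1$ integrals once and assembles the matrix with MATLAB's built-in `hankel` function instead of a loop. The trial multipliers, the value of `N` and the grid are arbitrary values assumed for illustration, and the simple sum used for the integrals is only a stand-in for whatever quadrature is preferred.

```matlab
% Sketch: build the Newton matrix of the moment case from 2N+1 integrals.
N  = 4;                                  % highest moment order (illustrative)
x  = (-1:0.01:1).';                      % grid, same range as the ME2 example
dx = x(2) - x(1);
lambda = [0.5; 0; 1; 0; 0.2];            % arbitrary trial multipliers

% Density of equation (9): columns of the bsxfun result are x.^0 ... x.^N
p = exp(-(bsxfun(@power, x, 0:N) * lambda));

G = zeros(2*N + 1, 1);                   % G_0, ..., G_2N of equation (10)
for n = 0:2*N
    G(n + 1) = dx * sum(x.^n .* p);
end

% g_nk = -G_{n+k}: first column holds G_0..G_N, last row holds G_N..G_2N
gnk = -hankel(G(1:N+1), G(N+1:2*N+1));
```

Only the vector `G` has to be recomputed at each iteration; the matrix is a rearrangement of its entries.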
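Another special case is the one where the data are the Fourier components of $p(x)$:
$$E\{\exp(-jn\omega_0 x)\} = \int \exp(-jn\omega_0 x)\, p(x)\, dx = \mu_n, \quad n = 0, \ldots, N, \qquad (12)$$
where $\mu_n$ may be complex-valued and has the property $\mu_{-n} = \mu_n^*$. This means that we have the following relations:
$$\phi_n(x) = \exp(-jn\omega_0 x), \quad n = -N, \ldots, 0, \ldots, N,$$
$$p(x) = \exp\left[-\mathrm{Re} \sum_{n=0}^{N} \lambda_n \exp(-jn\omega_0 x)\right],$$
$$G_n(\lambda) = \int \exp(-jn\omega_0 x)\, p(x)\, dx, \quad n = 0, \ldots, N,$$
$$g_{nk} = \begin{cases} -G_{n-k}(\lambda), & n \ge k \\ -G_{k-n}^*(\lambda), & n < k \end{cases} \qquad n, k = 0, \ldots, N,$$
so that all the elements of the matrix $G$ are related to the discrete Fourier transforms of $p(x)$. Note that $G$ is a Hermitian Toeplitz matrix.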
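The Hermitian Toeplitz structure can be illustrated with MATLAB's built-in `toeplitz` function. In this sketch the density `p` is simply a sampled Gaussian and `N`, the grid and the simple sum used for the integrals are assumptions made for illustration; the fundamental frequency `w0` is defined from the range of $x$ in the same way as in the Annex function fin3_x.

```matlab
% Sketch: Hermitian Toeplitz Newton matrix for trigonometric moments.
N  = 3;                                   % number of harmonics (illustrative)
x  = (-5:0.05:5).';
dx = x(2) - x(1);
w0 = 2*pi / (x(end) - x(1));              % fundamental frequency, as in fin3_x
p  = exp(-0.5 * x.^2) / sqrt(2*pi);       % any positive density will do here

G = zeros(N + 1, 1);                      % G_0, ..., G_N
for n = 0:N
    G(n + 1) = dx * sum(exp(-1j * n * w0 * x) .* p);
end

% First column: g_n0 = -G_n (lower triangle, g_nk = -G_{n-k})
% First row:    g_0k = -conj(G_k) (upper triangle, g_nk = -conj(G_{k-n}))
gnk = toeplitz(-G, -conj(G));
```

Since $G_0$ is real, the first column and first row agree on the diagonal element, so the two-argument form of `toeplitz` reproduces the loop-based construction used in the Annex.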
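This distribution can be considered as a ME distribution when the constraints are
$$\int p(x; \alpha, \beta)\, dx = 1 \quad \text{(normalization)}, \qquad \phi_0(x) = 1,$$
$$\int x\, p(x; \alpha, \beta)\, dx = \mu_1, \qquad \phi_1(x) = x,$$
$$\int \ln(x)\, p(x; \alpha, \beta)\, dx = \mu_2, \qquad \phi_2(x) = \ln(x).$$
This is easy to verify because the Gamma density can be written as
$$p(x; \alpha, \beta) = \exp\left[-\lambda_0 - \lambda_1 x - \lambda_2 \ln(x)\right] \qquad (18)$$
with
$$\lambda_0 = -\ln \frac{\beta^{(1-\alpha)}}{\Gamma(1-\alpha)}, \quad \lambda_1 = \beta \quad \text{and} \quad \lambda_2 = \alpha.$$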
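The correspondence between $(\alpha, \beta)$ and the multipliers can be checked numerically. The sketch below assumes the Gamma density has the form $\beta^{(1-\alpha)} x^{-\alpha} \exp(-\beta x) / \Gamma(1-\alpha)$, which is the form implied by the expression for $\lambda_0$; the parameter values, the grid and the use of `trapz` are choices made for this illustration.

```matlab
% Sketch: check numerically that equation (18) is a normalized Gamma density.
alpha = -1;  beta = 3;                   % illustrative values (alpha < 1, beta > 0)
lambda0 = -((1 - alpha) * log(beta) - gammaln(1 - alpha));
lambda1 = beta;
lambda2 = alpha;

x = (1e-6:1e-3:20).';                    % grid wide enough to hold the mass
p = exp(-lambda0 - lambda1 * x - lambda2 * log(x));   % equation (18)

total = trapz(x, p);                     % should be close to 1
m     = trapz(x, x .* p);                % should be close to (1 - alpha)/beta
```

For these values the density is $9x\exp(-3x)$, a Gamma density with shape $1 - \alpha = 2$ and rate $\beta = 3$, so `m` should be close to $2/3$.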
Now consider the following problem: given $\mu_1$ and $\mu_2$, determine $\lambda_0$, $\lambda_1$ and $\lambda_2$. This can be done by the standard ME method. To do this, first we must define the range of $x$ (xmin, xmax, dx) and write a function fin_x to calculate the functions $\phi_0(x) = 1$, $\phi_1(x) = x$ and $\phi_2(x) = \ln x$ (see the function fin1_x in the Annex). Then we must define an initial estimate $\lambda^0$ for $\lambda$ and, finally, let the program run. The case of the Gamma distribution is interesting because there is an analytic relation between $(\alpha, \beta)$ and the mean $m = E\{x\}$ and variance $\sigma^2 = E\{(x - m)^2\}$, which is
$$m = (1 - \alpha)/\beta, \quad \sigma^2 = (1 - \alpha)/\beta^2 \qquad (19)$$
or inversely
$$\alpha = (\sigma^2 - m^2)/\sigma^2, \quad \beta = m/\sigma^2, \qquad (20)$$
so that we can use these relations to determine $m$ and $\sigma^2$. Note also that the corresponding entropy of the final result is a byproduct of the function. Table (1) gives some numerical results obtained by the ME_DENS1 program (see the Annex).

Table 1.
$\mu_1$   $\mu_2$    $\alpha$   $\beta$    $m$      $\sigma^2$
0.2000   -3.0000    0.2156   -3.0962   0.2533   0.0818
0.2000   -2.0000   -0.4124   -6.9968   0.2019   0.0289
0.3000   -1.5000   -0.6969   -5.3493   0.3172   0.0593

The next example is the case of a quartic distribution
$$p(x) = \exp\left[-\sum_{n=0}^{4} \lambda_n x^n\right]. \qquad (21)$$
This distribution can be considered as a ME distribution when the constraints are
$$E\{x^n\} = \int x^n p(x)\, dx = \mu_n, \quad n = 0, \ldots, 4, \quad \text{with } \mu_0 = 1. \qquad (22)$$
Now consider the following problem: given $\mu_n$, $n = 1, \ldots, 4$, calculate $\lambda_n$, $n = 0, \ldots, 4$. This can be done by the ME_DENS2 program. Table (2) gives some numerical results obtained by this program:
Table 2.
$\mu_1$  $\mu_2$  $\mu_3$  $\mu_4$  $\lambda_0$  $\lambda_1$  $\lambda_2$   $\lambda_3$   $\lambda_4$
0      ...    0.05   0.10   0.1992   1.7599    2.2229   -3.9375   0.4201
0      ...    0.00   0.15   0.9392   0.000    -3.3414    0.0000   4.6875
0      ...    0.00   0.15   0.9392   0.000    -3.3414    0.0000   4.6875
These examples show how to use the proposed programs. A third example, given in the Annex, shows how to use the ME_DENS3 program, which handles the case of trigonometric moments.
CONCLUSIONS
In this paper we first addressed the class of ME distributions when the available data are a finite set of expectations $\mu_n = E\{\phi_n(x)\}$ of some known functions $\phi_n(x)$, $n = 0, \ldots, N$. We then proposed three MATLAB programs to solve this problem by a Newton-Raphson method: in the general case, in the case of geometrical moments data where $\phi_n(x) = x^n$, and in the case of trigonometrical moments where $\phi_n(x) = \exp(-jn\omega_0 x)$. Finally, we gave some numerical results for some special examples that show how to use the proposed programs.
REFERENCES
1. A. Zellner and R. Highfield, "Calculation of Maximum Entropy Distributions and Approximation of Marginal Posterior Distributions," Journal of Econometrics 37, 1988, 195-209, North Holland.
2. D. Mukherjee and D.C. Hurst, "Maximum Entropy Revisited," Statistica Neerlandica 38, 1984, no. 1, 1-12.
3. Verdugo Lazo and P.N. Rathie, "On the Entropy of Continuous Probability Distributions," IEEE Trans., vol. IT-24, no. 1, 1978.
4. Gokhale, "Maximum Entropy Characterizations of some distributions," Statistical distributions in Scientific work, vol. 3, 299-304 (G.P. Patil et al., Eds., Reidel, Dordrecht, Holland, 1975).
5. Jaynes, "Papers on probability, statistics and statistical physics," Reidel Publishing Company, Dordrecht, Holland, 1983.
6. Matz, "Maximum Likelihood parameter estimation for the quartic exponential distributions," Technometrics, 20, 475-484, 1978.
7. Mohammad-Djafari A. and Demoment G., "Estimating Priors in Maximum Entropy Image Processing," Proc. of ICASSP 1990, pp. 2069-2072.
8. Mohammad-Djafari A. and Idier J., "Maximum entropy prior laws of images and estimation of their parameters," Proc. of The 10th Int. MaxEnt Workshop, Laramie, Wyoming, published in Maximum-entropy and Bayesian methods, T.W. Grandy ed., 1990.
ANNEX A
function [lambda,p,entr]=me_dens1(mu,x,lambda0)
%ME_DENS1
% [LAMBDA,P,ENTR]=ME_DENS1(MU,X,LAMBDA0)
% This program calculates the Lagrange Multipliers of the ME
% probability density functions p(x) from the knowledge of the
% N constraints in the form:
% E{fin(x)}=MU(n) n=0:N with fi0(x)=1, MU(0)=1.
%
% MU is a table containing the constraints MU(n),n=1:N.
% X is a table defining the range of the variation of x.
% LAMBDA0 is a table containing the first estimate of the LAMBDAs.
% (This argument is optional.)
% LAMBDA is a table containing the resulting Lagrange parameters.
% P is a table containing the resulting pdf p(x).
% ENTR is a table containing the entropy values at each
% iteration.
%
% Author: A. Mohammad-Djafari
% Date : 10-01-1991
%
mu=mu(:); mu=[1;mu];                 % add mu(0)=1
x=x(:); lx=length(x);                % x axis
xmin=x(1); xmax=x(lx); dx=x(2)-x(1);
%
if(nargin == 2)                      % initialize LAMBDA
  lambda=zeros(size(mu));            % This produces a uniform
  lambda(1)=log(xmax-xmin);          % distribution.
else
  lambda=lambda0(:);
end
N=length(lambda);
%
fin=fin1_x(x);                       % fin1_x(x) is an external
%                                    % function which provides fin(x).
iter=0;
while 1                              % start iterations
  iter=iter+1;
  disp('---------------'); disp(['iter=',num2str(iter)]);
  %
  p=exp(-(fin*lambda));              % Calculate p(x)
  plot(x,p);                         % plot it
  %
  G=zeros(N,1);                      % Calculate Gn
  for n=1:N
    G(n)=dx*sum(fin(:,n).*p);
  end
  %
  entr(iter)=lambda'*G(1:N);         % Calculate the entropy value
  disp(['Entropy=',num2str(entr(iter))])
  %
  gnk=zeros(N,N);                    % Calculate gnk
  gnk(1,:)=-G'; gnk(:,1)=-G;         % first line and first column
  for i=2:N                          % lower triangle part of the
    for j=2:i                        % matrix G
      gnk(i,j)=-dx*sum(fin(:,j).*fin(:,i).*p);
    end
  end
  for i=2:N                          % upper triangle part of the
    for j=i+1:N                      % matrix G
      gnk(i,j)=gnk(j,i);
    end
  end
  %
  v=mu-G;                            % Calculate v
  delta=gnk\v;                       % Calculate delta
  lambda=lambda+delta;               % Calculate lambda
  tol=1e-6;                          % Stopping rules
  if(abs(delta./lambda)<tol), break, end
  if(iter>2)
    if(abs((entr(iter)-entr(iter-1))/entr(iter))<tol),break, end
  end
end
%
p=exp(-(fin*lambda));                % Calculate the final p(x)
plot(x,p);                           % plot it
entr=entr(:);
disp('----- END -------')
%----------------------------------
%ME1
% This script shows how to use the function ME_DENS1
% in the case of the Gamma distribution. (see Example 1.)
xmin=0.0001; xmax=1; dx=0.01;        % define the x axis
x=[xmin:dx:xmax];
mu=[0.3,-1.5];                       % define the mu values
[lambda,p,entr]=me_dens1(mu,x);
alpha=lambda(3); beta=lambda(2);     % lambda_2 = alpha, lambda_1 = beta
m=(1-alpha)/beta; sigma2=m/beta;     % mean and variance, equations (19)
disp([mu alpha beta m sigma2 entr(length(entr))])
%----------------------------------
function fin=fin1_x(x)
% This is the external function which calculates
% the fin(x) in the special case of the Gamma distribution.
% This is to be used with ME_dens1.
M=3;
fin=zeros(length(x),M);
fin(:,1)=ones(size(x));
fin(:,2)=x;
fin(:,3)=log(x);
return
function [lambda,p,entr]=me_dens2(mu,x,lambda0)
%ME_DENS2
% [LAMBDA,P,ENTR]=ME_DENS2(MU,X,LAMBDA0)
% This program calculates the Lagrange Multipliers of the ME
% probability density functions p(x) from the knowledge of the
% N moment constraints in the form:
% E{x^n}=mu(n) n=0:N with mu(0)=1.
%
% MU is a table containing the constraints MU(n),n=1:N.
% X is a table defining the range of the variation of x.
% LAMBDA0 is a table containing the first estimate of the LAMBDAs.
% (This argument is optional.)
% LAMBDA is a table containing the resulting Lagrange parameters.
% P is a table containing the resulting pdf p(x).
% ENTR is a table containing the entropy values at each
% iteration.
%
% Author: A. Mohammad-Djafari
% Date : 10-01-1991
%
mu=mu(:); mu=[1;mu];                 % add mu(0)=1
x=x(:); lx=length(x);                % x axis
xmin=x(1); xmax=x(lx); dx=x(2)-x(1);
%
if(nargin == 2)                      % initialize LAMBDA
  lambda=zeros(size(mu));            % This produces a uniform
  lambda(1)=log(xmax-xmin);          % distribution.
else
  lambda=lambda0(:);
end
N=length(lambda);
%
M=2*N-1;                             % Calculate fin(x)=x.^n
fin=zeros(length(x),M);              %
fin(:,1)=ones(size(x));              % fi0(x)=1
for n=2:M
  fin(:,n)=x.*fin(:,n-1);
end
%
iter=0;
while 1                              % start iterations
  iter=iter+1;
  disp('---------------'); disp(['iter=',num2str(iter)]);
  %
  p=exp(-(fin(:,1:N)*lambda));       % Calculate p(x)
  plot(x,p);                         % plot it
  %
  G=zeros(M,1);                      % Calculate Gn
  for n=1:M
    G(n)=dx*sum(fin(:,n).*p);
  end
  %
  entr(iter)=lambda'*G(1:N);         % Calculate the entropy value
  disp(['Entropy=',num2str(entr(iter))])
  %
  gnk=zeros(N,N);                    % Calculate gnk
  for i=1:N                          % Matrix G is a Hankel matrix
    gnk(:,i)=-G(i:N+i-1);
  end
  %
  v=mu-G(1:N);                       % Calculate v
  delta=gnk\v;                       % Calculate delta
  lambda=lambda+delta;               % Calculate lambda
  tol=1e-6;                          % Stopping rules
  if(abs(delta./lambda)<tol), break, end
  if(iter>2)
    if(abs((entr(iter)-entr(iter-1))/entr(iter))<tol),break, end
  end
end
%
p=exp(-(fin(:,1:N)*lambda));         % Calculate the final p(x)
plot(x,p);                           % plot it
entr=entr(:);
disp('----- END -------')
end
%ME2
% This script shows how to use the function ME_DENS2
% in the case of the quartic distribution. (see Example 2.)
xmin=-1; xmax=1; dx=0.01;            % define the x axis
x=[xmin:dx:xmax];
mu=[0.1,.3,0.1,.15];                 % define the mu values
[lambda,p,entr]=me_dens2(mu,x);
disp([mu(:);lambda;entr(length(entr))]) % mu as a column so the sizes agree
function [lambda,p,entr]=me_dens3(mu,x,lambda0)
%ME_DENS3
% [LAMBDA,P,ENTR]=ME_DENS3(MU,X,LAMBDA0)
% This program calculates the Lagrange Multipliers of the ME
% probability density functions p(x) from the knowledge of the
% Fourier moments values :
% E{exp[-j n w0 x]}=mu(n) n=0:N with mu(0)=1.
%
% MU is a table containing the constraints MU(n),n=1:N.
% X is a table defining the range of the variation of x.
% LAMBDA0 is a table containing the first estimate of the LAMBDAs.
% (This argument is optional.)
% LAMBDA is a table containing the resulting Lagrange parameters.
% P is a table containing the resulting pdf p(x).
% ENTR is a table containing the entropy values at each
% iteration.
%
% Author: A. Mohammad-Djafari
% Date : 10-01-1991
%
mu=mu(:);mu=[1;mu];                  % add mu(0)=1
x=x(:); lx=length(x);                % x axis
xmin=x(1); xmax=x(lx); dx=x(2)-x(1);
if(nargin == 2)                      % initialize LAMBDA
  lambda=zeros(size(mu));            % This produces a uniform
  lambda(1)=log(xmax-xmin);          % distribution.
else
  lambda=lambda0(:);
end
N=length(lambda);
%
M=2*N-1;                             % Calculate fin(x)=exp[-jnw0x]
fin=fin3_x(x,M);                     % fin3_x(x) is an external
%                                    % function which provides fin(x).
iter=0;
while 1                              % start iterations
  iter=iter+1;
  disp('---------------'); disp(['iter=',num2str(iter)]);
  %
  % Calculate p(x)
  p=exp(-real(fin(:,1:N))*real(lambda)+imag(fin(:,1:N))*imag(lambda));
  plot(x,p);                         % plot it
  %
  G=zeros(M,1);                      % Calculate Gn
  for n=1:M
    G(n)=dx*sum(fin(:,n).*p);
  end
  %plot([real(G(1:N)),real(mu),imag(G(1:N)),imag(mu)])
  %
  % Calculate the entropy; the real part matches the definition of p(x)
  entr(iter)=real(lambda.'*G(1:N));
  disp(['Entropy=',num2str(entr(iter))])
  %
  gnk=zeros(N,N);                    % Calculate gnk
  for n=1:N                          % Matrix gnk is a Hermitian
    for k=1:n                        % Toeplitz matrix.
      gnk(n,k)=-G(n-k+1);            % Lower triangle part
    end
  end
  for n=1:N
    for k=n+1:N
      gnk(n,k)=-conj(G(k-n+1));      % Upper triangle part
    end
  end
  %
  v=mu-G(1:N);                       % Calculate v
  delta=gnk\v;                       % Calculate delta
  lambda=lambda+delta;               % Calculate lambda
  tol=1e-3;                          % Stopping rules
  if(abs(delta)./abs(lambda)<tol), break, end
  if(iter>2)
    if(abs((entr(iter)-entr(iter-1))/entr(iter))<tol),break, end
  end
end
% Calculate p(x)
p=exp(-real(fin(:,1:N))*real(lambda)+imag(fin(:,1:N))*imag(lambda));
plot(x,p);                           % plot it
entr=entr(:);
disp('----- END -------')
end
%ME3
% This script shows how to use the function ME_DENS3
% in the case of the trigonometric moments.
clear;clf
xmin=-5; xmax=5; dx=0.5;             % define the x axis
x=[xmin:dx:xmax];lx=length(x);
p=(1/sqrt(2*pi))*exp(-.5*(x.*x));    % Gaussian distribution
plot(x,p);title('p(x)')
%
M=3;fin=fin3_x(x,M);                 % Calculate fin(x)
%
mu=zeros(M,1);                       % Calculate mun
for n=1:M
  mu(n)=dx*sum(fin(:,n).*p(:));      % p as a column to match fin(:,n)
end
%
w0=2*pi/(xmax-xmin);w=w0*[0:M-1];    % Define the w axis
%
mu=mu(2:M);                          % Attention : mu(0) is added
                                     % in ME_DENS3
[lambda,p,entr]=me_dens3(mu,x);
disp([mu;lambda;entr(length(entr))])
function fin=fin3_x(x,M)
% This is the external function which calculates
% the fin(x) in the special case of the Fourier moments.
% This is to be used with ME_DENS3.
%
x=x(:); lx=length(x);                % x axis
xmin=x(1); xmax=x(lx); dx=x(2)-x(1);
%
fin=zeros(lx,M);                     %
fin(:,1)=ones(size(x));              % fi0(x)=1
w0=2*pi/(xmax-xmin);jw0x=(sqrt(-1)*w0)*x;
for n=2:M
  fin(:,n)=exp(-(n-1)*jw0x);
end
return