Applied Mathematics and Computation 127 (2002) 249–260
www.elsevier.com/locate/amc

A generalized maxentropic inversion procedure for noisy data

Amos Golan a,*, Henryk Gzyl b

a Department of Economics, American University, 200 Roper Hall, 4400 Massachusetts Avenue NW, Washington, DC 20016-8029, USA
b USB, UCV. AP 52120, Caracas 1050-A, Venezuela
Abstract

In this note we present a way of solving linear inverse problems with convex constraints and noisy data. Combining the methods of maximum entropy in the mean (MEM) and generalized maximum entropy (GME), we are able to treat both the noiseless and the noisy cases as conceptually the same problem. This new generalized inversion procedure is easy to apply and compute and is useful for a large class of models in the natural and social sciences. Three detailed examples are developed and discussed. © 2002 Elsevier Science Inc. All rights reserved.

Keywords: Constrained linear inverse problems; Maximum entropy; Noisy data
1. Introduction
During the 1950s, Jaynes [7,8], Kullback [11] and Kullback and Leibler [15] introduced a variational method for characterizing probability distributions when the only available data consisted of the mean values (with respect to the unknown distribution) of a few random variables. This method has since been applied in a wide variety of fields; see for example [9,10,12,16] or any of the Kluwer volumes in the series ‘‘Maximum Entropy and Bayesian Methods’’.
* Corresponding author.
E-mail addresses: agolan@american.edu (A. Golan), hgzyl@reacciun.ve (H. Gzyl).

0096-3003/02/$ - see front matter © 2002 Elsevier Science Inc. All rights reserved.
PII: S0096-3003(00)00172-7
More recently, during the last decade, the method of maximum entropy was extended considerably to accommodate data given in terms of arbitrary convex constraints. Among the more recent studies are the ‘‘second level maximum entropy method’’ of Gzyl [3], better known as maximum entropy in the mean (MEM), introduced by Bercher et al. [17], Gamboa and Gassiat [18], and Gzyl [5], and the work of Golan et al. [1], known as generalized maximum entropy (GME). Similar entropic methods have also been developed and applied by Zellner [13,14] in his Bayesian method of moments (BMOM), and by others (e.g., [6]).
The common idea behind these approaches consists in transforming, with minimum a priori assumptions, an ill-conditioned linear problem, subject to non-linear but convex constraints, into a much simpler and much smaller problem in convex optimization. Even though these approaches may seem similar at a very abstract level, they differ in their treatment of the data (the constraints): the data one observes may come in terms of moments (pure or noisy), which leads to the ME or MEM, or in terms of noisy observations, which leads to the GME.
More specifically, a very interesting aspect of the maxentropic approach is the way one can deal with observational noise. For example, the standard combination of a geometric and a maxentropic formulation, within the ‘‘regularization’’ approach, is described in Gzyl [4]. But it is in [1] where a fully maxentropic approach is presented. To briefly describe the first approach, consider the problem of finding an unknown vector b when the only data y available come in the form

y = Xb + e,    (1.1)

where the vector e denotes observational noise. The obvious geometric approach consists in looking for b in a ball B_M(y, T) = {ξ : ||y − Xξ|| ≤ T}, where the metric M is related to our model of the noise, and the regularization parameter T is large enough so that B_M(y, T) intersects the range of the linear operator X. In the second approach, the GME, instead of using a regularization parameter, both the state and noise vectors are estimated simultaneously via a generalized entropy function that is optimized subject to each noisy observation.
Within the above framework, our objective is to formulate a maxentropic alternative to these ways of dealing with noise. We reformulate and generalize the GME and MEM problems so that the two are unified into a single generalized inversion procedure. Having done this, we extend the formulation and investigate some of the basic properties of this approach. The basic idea is that the noise ‘‘pushes’’ the data vector y out of the range of X. By means of the technique we propose, the data vector is ‘‘pushed’’ back into the range of X, and simultaneously a solution to the inverse problem with the corrected data is obtained. A main outcome of this new approach is that even though our original a priori model (assumption) of the noise assigns zero mean to the noise vector, our posterior (not in the Bayesian sense) may assign a non-zero mean value to the noise.
In Section 2 we formulate the basic maxentropic model for inverting noisy
data. To be consistent with the literature, this is done in two steps: the pure (or
noiseless) model and the general noisy linear model. In this way, we are able to
generalize the GME and show that it is possible to reduce the general noisy
linear model to the noiseless case. In Section 3 we provide three examples. Here
it will be clear how this GME method allows us to estimate both the noise and
the signal from the data based on different assumptions or priors. Finally, we
conclude in Section 4.
2. Problem statement and mathematical framework
Let

y = Xb + e,    (2.1)

where y ∈ R^N, X is a known N × K linear operator, b ∈ R^K and e ∈ R^N is some noise vector. We also assume that b ∈ C, where C is a convex set (call it Assumption A1). The objective is to estimate the vector b from the indirect noisy observations y, where C is a given closed set that embodies some constraints that the unknown b is to satisfy. A typical example is given by C = {b ∈ R^K | b_k ∈ [z_k, z̄_k], k = 1, 2, ..., K}. If the observation matrix X is irregular or ill-conditioned, or if K ≫ N, the problem is ill-posed. This implies that one has to (i) incorporate some prior knowledge, or constraints, on the solution, or (ii) specify a certain criterion to choose among the infinitely many solutions, or (iii) use both.
2.1. Case 1 – The noiseless case

Let y = Xb be as defined by (2.1) but with e = 0, and apply Assumption A1 (i.e., b ∈ C). This means that the data y are viewed as a set of N ‘‘pure’’ moments. To solve this problem by means of the MEM (e.g., [5,17,18]), or the GME method of Golan et al. [1], we view the possible solutions as values of a random variable B, defined on some probability space (W, 𝓕, Q), and replace the quest for the b solving (2.1) by a quest for a probability P on (W, 𝓕), P ≪ Q, such that

y = X E_P[B],    (2.2)

where P ≪ Q means that there exists a positive measurable q(ξ) such that dP(ξ) = q(ξ) dQ(ξ).
Even though it may be convenient to have W ≠ C, in finite-dimensional situations (like those discussed here) we can take W = C, the constraint space, and 𝓕 = 𝓑(C), the Borel subsets of C. In this case B(ξ) = ξ for ξ ∈ C is just the identity mapping. Further, the only assumption about the prior probability distribution Q is that the closure of the convex set generated by the support of Q coincides with C.
Next, introduce the class of solutions

P(Q) = {P ≪ Q | X E_P[B] = y}    (2.3)

and assume that it is not empty. This implies that there is a solution to our problem. Note also that P(Q) is convex. The concave entropy function on P(Q) is defined by

S_Q(P) := −∫_C (dP/dQ) log(dP/dQ) dQ = −∫_C q(ξ) log q(ξ) dQ    (2.4)

or −∞ when |log q(ξ)| is not P-integrable. From the concavity of −x log x for x > 0, one obtains the concavity of S_Q(P) on P(Q).
For future reference, recall the following familiar and trivial-to-prove, but extremely important, lemma.

Lemma 2.1. Let (E, 𝓔, m) be any measure space. Let f and g be positive, 𝓔-measurable functions with ∫_E f dm = ∫_E g dm < ∞. Then

K_m(f/g) = ∫_E f log(f/g) dm ≥ 0

and K_m(f/g) = 0 if and only if f = g m-almost everywhere.
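Lemma 2.1 is the familiar Kullback inequality. As a quick sanity check (ours, not from the paper), here is its discrete analogue, taking m to be counting measure on a finite set and f, g two strictly positive probability vectors (so that ∫f dm = ∫g dm = 1):

```python
import numpy as np

def K(f, g):
    # Discrete analogue of K_m(f/g) = ∫ f log(f/g) dm, with m counting measure.
    return float(np.sum(f * np.log(f / g)))

rng = np.random.default_rng(0)
f = rng.random(5); f /= f.sum()   # two strictly positive probability vectors
g = rng.random(5); g /= g.sum()

print(K(f, g) >= 0.0)             # True: K_m(f/g) >= 0
print(abs(K(f, f)) < 1e-12)       # True: K_m(f/g) = 0 iff f = g
```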
Next, consider the parametric family of measures on (C, 𝓑(C)) given by

dP(λ) = (1/Ω(λ)) exp(−⟨λ, XB⟩) dQ,    (2.5)

where ⟨λ, μ⟩ denotes the scalar product of vectors in R^N and Ω(λ) is defined by

Ω(λ) = ∫_C exp(−⟨λ, XB⟩) dQ.    (2.6)

Now, let the role of f in Lemma 2.1 be played by any P ∈ P(Q), and that of g by any P(λ) defined in (2.5). Then it is trivial to verify that, although P(λ) may not be in P(Q), for

ℓ(λ) = ln Ω(λ) + ⟨λ, y⟩    (2.7)

Lemma 2.1 yields, for every P ∈ P(Q),

S_Q(P) ≤ ℓ(λ).    (2.8)
That is, since for any λ with Ω(λ) finite the entropy of every P ∈ P(Q) is dominated by ℓ(λ), to maximize the entropy it is sufficient to produce one λ, call it λ*, such that P(λ*) is in P(Q). Note that the expected value of the coordinate vector with respect to such a P(λ*) satisfies Eq. (2.2). To be able to formally go on, it is sufficient to assume that the set

D(X, Q) = {λ ∈ R^N | Ω(λ) < ∞}    (2.9)

is not empty. The set D(X, Q) is convex (see [5] for examples) and

ℓ(λ) = log Ω(λ) + ⟨λ, y⟩

is convex on D(X, Q). Note that P(Q) being non-empty implies that ℓ(λ) is bounded below, and D(X, Q) being non-empty means that S_Q(P) is bounded above. A simple version of this result is stated below.
Theorem 2.1. Assume that P(Q) and D(X, Q) are not empty, and that the infimum of ℓ(λ) is reached in the interior of D(X, Q). Then, if λ* = arg inf{ℓ(λ) : λ ∈ R^N}, we have P(λ*) ∈ P(Q) and

sup{S_Q(P) : P ∈ P(Q)} = inf{ℓ(λ) : λ ∈ R^N}.

Proof. Since ℓ(λ) is differentiable in λ and λ* minimizes ℓ(λ), we have ∇_λ ℓ(λ*) = 0, or

X ∫_C B e^{−⟨λ*, XB⟩}/Ω(λ*) dQ = y,

which means that P(λ*) ∈ P(Q). The rest of the proof is immediate. □
Remark 2.1. Note that the distribution dP(λ*) is called the posterior (or post-data) distribution of b, just as dQ is called the a priori (or prior) distribution. However, there is no direct Bayesian connection between them.

Remark 2.2. Note that for the traditional moment-constraint problems, the beauty of this set-up consists in the enormous reduction in the size of (2.1), as we only need to minimize the unconstrained function ℓ(λ).
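To make the size reduction of Remark 2.2 concrete, here is a minimal numerical sketch (ours, not from the paper) of the noiseless case with a standard normal prior Q = N(0, I_K) on C = R^K. For the family dP(λ) ∝ e^{−⟨λ, XB⟩} dQ one gets Ω(λ) = exp(||X′λ||²/2), so ℓ(λ) = ½||X′λ||² + ⟨λ, y⟩ is an unconstrained convex function of only N variables, and its minimizer satisfies XX′λ* = −y with b* = E_{P(λ*)}[B] = −X′λ*:

```python
import numpy as np

# Noiseless maxentropic inversion with a standard normal prior Q = N(0, I_K).
# With dP(lam) proportional to exp(-<lam, X B>) dQ, the normalization is
# Omega(lam) = exp(||X' lam||^2 / 2), so the dual is
#   l(lam) = 0.5 * ||X' lam||^2 + <lam, y>,
# an unconstrained convex problem in only N (not K) variables.
rng = np.random.default_rng(1)
N, K = 3, 8                       # K >> N: ill-posed without a prior
X = rng.standard_normal((N, K))
y = X @ rng.standard_normal(K)    # consistent data: y lies in the range of X

# grad l(lam) = X X' lam + y = 0   =>   lam* = -(X X')^{-1} y
lam_star = np.linalg.solve(X @ X.T, -y)
b_star = -X.T @ lam_star          # b* = E_{P(lam*)}[B]

print(np.allclose(X @ b_star, y))   # True: the moment constraints (2.2) hold
```

For this particular prior, b* = X′(XX′)⁻¹y is the minimum-norm solution; a different choice of Q yields a different regularized inverse, which is the point of the method.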
2.2. Case 2 – The general noisy problem

Consider the problem of solving

y = Xb + e,   b ∈ C_S,    (2.10)

in which e is an R^N-valued random variable describing the noise in the measurement of y, and y, X and b are defined as before. Assume that e ranges over a convex set C_n. To simplify, we further assume that e : C_n → C_n is the identity mapping, that the only prior statistical information (assumptions) about e is carried by a probability measure Q_n(dη), and that C_n is the closure of the convex set generated by the support of Q_n. Note that this does not imply that Q_n(dη) is the true distribution of e.
Remark 2.3. Note that this type of problem can be viewed in two different ways, depending on the available data y. In the more traditional way (e.g., MEM, etc.), the y are noisy moments, possibly with K ≫ N. In the tradition of the linear statistical model, however, the vector y collects the N individual observations, in which case N ≫ K. Even though the solution procedure discussed here is general enough to encompass both scenarios, the emphasis here (unlike the MEM literature) is on solving the traditional linear statistical model, where the data (constraints) are in terms of noisy observations.
To solve for b and for the posterior distribution of the noise e, we extend the approach discussed in Case 2.1. Let C = C_S × C_n, 𝓛 = 𝓑(C_S) ⊗ 𝓑(C_n) and Q = Q_S ⊗ Q_n, which describe our a priori information about both sets of variables (i.e., state and noise variables). Next, set d = (b, e) = (ξ, η) ∈ C and X̃ = [X, I], where X̃ is an N × (N + K) matrix. Since our only data in (2.10) are y, it can be viewed as a problem of solving for d in

y = X̃ d,   d ∈ C.    (2.11)

Following the solution procedure of Case 2.1, we end up with some λ* ∈ R^N which minimizes (2.7), where now

Ω(λ) = ∫_C e^{−⟨λ, X̃d⟩} dQ(d),

and

d* = E_{P(λ*)}[d] = (E_{P(λ*)}[ξ], E_{P(λ*)}[η])

satisfies (2.11). Under this set-up, we can show the following.
Theorem 2.2. The components b and e, or equivalently ξ and η, of d are independent with respect to P(λ*).

Proof. Note that

Ω(λ) = ∫_C e^{−⟨λ, X̃d⟩} dQ(d) = ∫_{C_S × C_n} e^{−⟨λ, Xξ⟩} e^{−⟨λ, η⟩} dQ_S(ξ) dQ_n(η) = Ω_S(λ) Ω_n(λ).

Therefore,

dP(λ) = (1/Ω(λ)) e^{−⟨λ, X̃d⟩} dQ(d) = [e^{−⟨λ, Xξ⟩}/Ω_S(λ)] dQ_S(ξ) [e^{−⟨λ, η⟩}/Ω_n(λ)] dQ_n(η).    (2.12)

The function to be minimized in this case is

ℓ(λ) = ln Ω_S(λ) + ln Ω_n(λ) + ⟨λ, y⟩.    (2.13)

Once λ* is found, it is inserted in (2.12) and, according to Theorem 2.1, d* = E_{P(λ*)}[d] solves (2.10). Also, since P(λ*) = P_S(λ*) ⊗ P_n(λ*) is a product measure, the components ξ and η of d are independent with respect to P(λ*). □
Remark 2.4. An interesting consequence of this theorem is that even though e has zero mean with respect to the a priori Q_n (i.e., ∫_{C_n} η Q_n(dη) = 0), it may have a non-zero mean with respect to the post-data (posterior) P_n(λ*) (i.e., ∫_{C_n} η P_n(dη) ≠ 0).
3. Examples
3.1. The uniform case
Consider the linear noisy model (2.1), where e is a realization of a random variable n : (W_n, 𝓕_n, Q_n) → C_n ⊂ R^N. Recall also that C_n is a closed, convex set, and we assumed that the prior distribution of n, conveyed by Q_n, is such that C_n equals the closure of the convex hull generated by the support of Q_n.

We now construct the posterior P_n and the mean of n for the case where W_n = C_n = [−v, v]^N and Q_n(dη) = (1/2v)^N dη_1 ... dη_N. This means that each one of the noise components e_i (i = 1, 2, ..., N) is assumed to range in an interval symmetric around zero, with a uniform prior distribution on this interval. This case is the continuous limit of the GME model with a finite number of support points for both state and noise components (e.g., [1]). By Theorem 2.2, there is a λ*, depending on the data y and the matrix X, such that the posterior P_n(dη) is given by
dP_n(η) = [e^{−⟨λ*, η⟩}/Ω_n(λ*)] dQ_n(η),

with Ω_n(λ*) being explicitly given by

Ω_n(λ) = ∫_{C_n} e^{−⟨λ, η⟩} dQ_n(η) = ∏_{i=1}^{N} (1/2v) ∫_{−v}^{v} e^{−λ_i η_i} dη_i = ∏_{i=1}^{N} sinh(λ_i v)/(λ_i v)    (3.1)
evaluated at the λ* that minimizes

ℓ(λ) = log Ω_S(λ) + log Ω_n(λ) + ⟨λ, y⟩.

Given the posterior P_n(dη), the mean value of the jth component is

e_j* = E_{P_n}[η_j] = ∫_{C_n} η_j dP_n(η) = [(1/2v) ∫_{−v}^{v} η_j e^{−λ_j* η_j} dη_j] / [sinh(λ_j* v)/(λ_j* v)] = 1/λ_j* − v coth(λ_j* v),   j = 1, 2, ..., N.    (3.2)
Finally, we note that the corresponding solution to (2.11) satisfies y = Xb* + e*; that is, e* ‘‘pushes’’ y back into X(R^K), and b* solves y − e* = Xb*. Had we considered the constraint set C = ∏_{k=1}^{K} [z_k, z̄_k] with a uniform a priori measure on each factor, and carried out the steps described above, we would have obtained the following point estimate for each b_k:

b_k* = −(∂/∂a_k) ln ∫_{z_k}^{z̄_k} e^{−a_k ξ_k} dξ_k = 1/a_k + (z_k e^{−a_k z_k} − z̄_k e^{−a_k z̄_k})/(e^{−a_k z_k} − e^{−a_k z̄_k}),    (3.2a)

and in this case Ω_S(λ) is a product of factors

Ω_k^S(λ) = (e^{−a_k z_k} − e^{−a_k z̄_k})/(a_k (z̄_k − z_k))

for a = X′λ, with X′ denoting the transpose of X.
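The uniform-prior posterior noise mean has a closed form of the type given in Eq. (3.2), written here (our reading) as e_j* = 1/λ_j − v·coth(λ_j v). A quick numerical check (ours, with illustrative multiplier values) confirms two properties one should expect: the estimate always stays inside the prior support [−v, v], and it vanishes as λ_j → 0, recovering the zero prior mean of Q_n:

```python
import numpy as np

# Posterior mean of one noise component under the uniform prior on [-v, v]:
#   e_j* = 1/lambda_j - v * coth(lambda_j * v)
def noise_mean(lam, v):
    return 1.0 / lam - v / np.tanh(lam * v)   # coth(x) = 1/tanh(x)

v = 0.5
lams = np.array([-3.0, -0.5, 0.25, 2.0])      # illustrative multipliers

print(np.all(np.abs(noise_mean(lams, v)) < v))  # True: estimates stay in (-v, v)
print(noise_mean(1e-4, v))                      # close to 0 (about -1e-4 * v**2 / 3)
```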
3.2. The normal case

Consider the opposite situation, where the noise vector n is allowed to take any value. Thus, C_n = R^N and 𝓕_n = 𝓑(R^N). A common and reasonable prior distribution for the noise variable is the normal distribution Q_n(dη) = (2πσ)^{−N/2} exp(−||η||²/2σ) dη, where σ denotes the common noise variance. To make this formulation more general and more realistic, we allow the noise components to be correlated among the N degrees of freedom; that is, we do not exclude the possibility that the N observations are correlated via the vector n. Then

Q_n(dη) = (2π)^{−N/2} (det Σ)^{−1/2} exp(−⟨η, Σ^{−1}η⟩/2) dη,    (3.3)

where Σ is the covariance matrix. For the time being, assume that σ, or the covariance matrix Σ, is known.
Starting with the simpler case (no correlation), the corresponding normalization factor Ω_n(λ) is

Ω_n(λ) = ∫ e^{−⟨λ, η⟩} [e^{−||η||²/2σ}/(2πσ)^{N/2}] dη = ∏_{i=1}^{N} (2πσ)^{−1/2} ∫ e^{−λ_i η_i − η_i²/2σ} dη_i = ∏_{i=1}^{N} e^{λ_i² σ/2} = e^{||λ||² σ/2}.    (3.4)

Minimizing the dual, or unconstrained, problem

ℓ(λ) = ⟨λ, y⟩ + log Ω_S(λ) + log Ω_n(λ) = ⟨λ, y⟩ + log Ω_S(λ) + ∑_i σλ_i²/2    (3.5)

yields the desired λ*, which, in turn, determines P_n(dη) by

dP_n(η) = [e^{−⟨λ*, η⟩}/Ω_n(λ*)] dQ_n(η) = [e^{−||η + σλ*||²/2σ}/(2πσ)^{N/2}] dη.    (3.6)

Finally, the maxentropic mean of the post-data (or estimated) noise is

e* = ∫ η dP_n(η) = ∫ η [e^{−||η + σλ*||²/2σ}/(2πσ)^{N/2}] dη = −σλ*.    (3.7)

This means that, for finite samples, the posterior distribution of the noise may have a non-zero mean. Given e*, the solution to the linear problem satisfies y + σλ* = Xb*.
Next, we generalize this result to allow (but not force) autocorrelation among the N observed data points. Building on Eq. (3.3), the normalization factor Ω_n(λ) is

Ω_n(λ) = ∫ e^{−⟨λ, η⟩} (2π)^{−N/2} (det Σ)^{−1/2} e^{−⟨η, Σ^{−1}η⟩/2} dη = e^{⟨λ, Σλ⟩/2}.    (3.4a)

Minimizing the dual analogue of Eq. (3.5) yields

dP_n(η) = [e^{−⟨λ*, η⟩}/Ω_n(λ*)] dQ_n(η) = (2π)^{−N/2} (det Σ)^{−1/2} exp(−⟨(Σλ* + η), Σ^{−1}(Σλ* + η)⟩/2) dη,    (3.6a)

where we note that ⟨λ, η⟩ + ½⟨η, Σ^{−1}η⟩ + ½⟨λ, Σλ⟩ = ½⟨(Σλ + η), Σ^{−1}(Σλ + η)⟩. Finally, e* = −Σλ*, and we can view y + Σλ* as the corrected, or filtered, data. Thus, the actual reconstructed b* satisfies y + Σλ* = Xb*.
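Under Gaussian priors the whole noisy procedure collapses to a single linear solve. The sketch below is our own; it additionally assumes a standard normal state prior Q_S = N(0, I_K), which this example leaves unspecified. Then ℓ(λ) = ½||X′λ||² + ½⟨λ, Σλ⟩ + ⟨λ, y⟩, so (XX′ + Σ)λ* = −y, b* = −X′λ*, and the posterior noise mean is e* = −Σλ*:

```python
import numpy as np

# Noisy case with Gaussian priors: Q_S = N(0, I_K) on the state b (an extra
# assumption of this sketch) and Q_n = N(0, Sigma) on the noise e. Then
#   l(lam) = 0.5*||X' lam||^2 + 0.5*<lam, Sigma lam> + <lam, y>,
# whose minimizer solves (X X' + Sigma) lam* = -y.
rng = np.random.default_rng(2)
N, K = 5, 12
X = rng.standard_normal((N, K))
y = X @ rng.standard_normal(K) + 0.1 * rng.standard_normal(N)

A = rng.standard_normal((N, N))
Sigma = A @ A.T / N + 0.1 * np.eye(N)   # some positive definite covariance

lam_star = np.linalg.solve(X @ X.T + Sigma, -y)
b_star = -X.T @ lam_star                # state estimate
e_star = -Sigma @ lam_star              # posterior noise mean, generally nonzero

print(np.allclose(X @ b_star + e_star, y))   # True: y = X b* + e* holds exactly
```

The corrected data y − e* lie in the range of X by construction, and the fact that e* ≠ 0 here illustrates Remark 2.4.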
3.3. Multinomial discrete choice problem
As an example of how an unordered multinomial discrete choice problem can be formulated within this framework, when the unknown parameters are multinomial probabilities, we follow Golan et al. [2]. Suppose that, in an experiment consisting of N trials, binary random variables y_{1j}, y_{2j}, ..., y_{Nj} are observed for each of J unordered categories j = 1, 2, ..., J. Thus, on trial i, one of the J alternatives is observed in the form of a binary variable y_{ij} that equals unity iff alternative j is observed in trial i, and zero otherwise. Let the probability of alternative j on trial i be p_{ij} = Pr(y_{ij} = 1), and assume the p_{ij} are related to a set of explanatory variables through the model

p_{ij}(b) ≡ Pr(y_{ij} = 1 | x_i; b_j) = G(x_i′ b_j) > 0   for i = 1, 2, ..., N and j = 1, 2, ..., J,    (3.8)

where b_j is a (K × 1) vector of unknown parameters, x_i′ = (x_{i1}, x_{i2}, ..., x_{iK}) is a (1 × K) vector of covariates, and G(·) is a function, linking the probabilities p_{ij} with the linear structure x_i′ b_j, that maps the real line into [0, 1] with ∑_{j=1}^{J} G(x_i′ b_j) = 1. Traditionally, a log-likelihood function is specified on the basis of some prior assumptions regarding the data generation process and the CDF G(·).

Alternatively, suppose we are given noisy data y_{ij} which we model as

y_{ij} = G(x_i′ b_j) + e_{ij} = p_{ij} + e_{ij},   or   y = p + e,    (3.9)

where the p_{ij} are the unknown multinomial probabilities and the e_{ij} are noise components contained in the natural interval C_n = [−1, 1].

Given the known covariates x_i, we rewrite (3.9) as

(I_J ⊗ X′) y = (I_J ⊗ X′) p + (I_J ⊗ X′) e,    (3.10)

where X is (N × K). There are NJ data points y_{ij}, KJ moment relations, and NJ (> KJ) unknown multinomial parameters.
To cast the problem in a GME (uniform priors) form, we use the procedure of Example 3.1, but with a discrete support space. That is, we reparameterize e_{ij} = ∑_h v_h w_{ijh} = E_w[V] to transform each e_{ij} into probabilities w_{ijh} that take values in [0, 1]. Under this reparameterization, we may rewrite (3.10) as

(I_J ⊗ X′) y = (I_J ⊗ X′) p + (I_J ⊗ X′) V w,    (3.11)

where both the unknown p and w are in the form of proper probabilities. This yields the reparameterized inverse relation (3.11) involving the unknown and unobservable p and w. The (discrete) GME solution is

p_{ij} = exp(x_i′ b_j) / ∑_j exp(x_i′ b_j) = exp(x_i′ b_j) / Ω_i(b_j)    (3.12)
and

w_{ijh} = exp(x_i′ b_j v_h) / ∑_h exp(x_i′ b_j v_h) = exp(x_i′ b_j v_h) / Ψ_{ij}(b_j),    (3.13)
where b_j = λ_j and the λ_j are the Lagrange multipliers associated with the moment restrictions (3.11). If the moment condition (3.10), or (3.11), is written in a pure form (i.e., v = 0 and/or e = 0), then the corresponding ME formulation yields a solution similar to the ML multinomial logit solution. Similarly, just as for the general problem (2.10), the unconstrained problem has the same basic structure as that of Eq. (2.13).
To show the relationship with the traditional ML approach, we note that the information matrix I(b_j) is

I(b_j) = ∑_i [exp(x_i′ b_j)/Ω_i(b_j)²] x_i x_i′ = ∑_i p_{ij} Ω_i(b_j)^{−1} x_i x_i′,    (3.14)
where (3.14) is the jth diagonal block of the J² blocks of dimension (K × K). In the limit of a large sample, v → 0 and b → b_ML, and therefore this information matrix converges to the ML information matrix. Finally, we note that the formulations developed in Examples 3.1 and 3.2 can easily be applied and used in this specific model.
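Both (3.12) and (3.13) are softmax (multinomial logit) expressions, so evaluating a candidate solution is a few lines of array code. The sketch below (dimensions, names and values are illustrative, not from the paper) builds p_{ij} and w_{ijh} for given b_j and noise support points v_h, and checks that both are proper probabilities:

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stabilized softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(3)
N, K, J, H = 6, 2, 3, 5                  # trials, covariates, categories, supports
x = rng.standard_normal((N, K))
b = rng.standard_normal((K, J))
v = np.linspace(-1.0, 1.0, H)            # noise support points in C_n = [-1, 1]

xb = x @ b                               # xb[i, j] = x_i' b_j
p = softmax(xb, axis=1)                  # (3.12): p_ij over the J categories
w = softmax(np.einsum('ij,h->ijh', xb, v), axis=2)   # (3.13): w_ijh over h

print(np.allclose(p.sum(axis=1), 1.0))   # True: each trial's p_ij sum to one
print(np.allclose(w.sum(axis=2), 1.0))   # True: each (i, j)'s w_ijh sum to one
```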
4. Concluding remarks
In this paper we present a way of solving linear inverse problems with convex constraints and noisy data. Within the maximum entropy approach, we follow the idea of the MEM and GME and are able to treat both the noiseless and noisy problems as conceptually the same problem, while avoiding the use of an external, pre-determined regularization parameter. We also show that (i) the posterior noise and state estimates are independent of one another, and (ii) even though our original a priori model assigns zero mean to the noise vector, our estimates (posteriors) may assign a non-zero mean value to the noise.

To show the generality and applicability of this approach we present three examples. The examples differ in the form in which the noisy data are available to the observer and in the prior information the observer has.
References
[1] A. Golan, G. Judge, D. Miller, Maximum Entropy Econometrics: Robust Estimation With
Limited Data, Wiley, New York, 1996.
[2] A. Golan, G. Judge, J. Perloff, A maximum entropy approach to recovering information from
multinomial response data, J. Am. Statist. Assoc. 91 (1996) 841.
[3] H. Gzyl, The Maximum Entropy Method, World Scientific, Singapore, 1993.
[4] H. Gzyl, in: G. Erickson (Ed.), Maximum Entropy and Bayesian Studies, Kluwer Academic,
The Netherlands, 1998.
[5] H. Gzyl, Y. Velasquez, Linear Inverse Problems: The Maxentropic Connection, Kluwer
Academic, The Netherlands, forthcoming.
[6] G.W. Imbens, P. Johnson, R.H. Spady, Information theoretic approaches to inference in
moment condition models, Econometrica 66 (1998) 333.
[7] E.T. Jaynes, Information theory and statistical mechanics, Phys. Rev. 106 (1957) 620.
[8] E.T. Jaynes, Information theory and statistical mechanics, Phys. Rev. 108 (1957) 171.
[9] E.T. Jaynes, Prior information and ambiguity in inverse problems, in: D.W. McLaughlin (Ed.),
Inverse Problems, SIAM Proceedings, American Mathematical Society, Providence, RI, 1984,
p. 151.
[10] E.T. Jaynes, Where do we go from here? in: C.R. Smith, W.T. Grandy (Eds.), Maximum
Entropy and Inverse Problems, D. Reidel, Boston, MA, 1985.
[11] S. Kullback, Information Theory and Statistics, Wiley, New York, 1959.
[12] R.D. Levine, An information theoretical approach to inversion problems, J. Phys. A 13 (1980)
91.
[13] A. Zellner, Bayesian method of moments/instrumental variable (BMOM/IV) analysis of mean and regression models, in: J.C. Lee, W.C. Johnson, A. Zellner (Eds.), Modeling and Prediction: Honoring Seymour Geisser, Springer, Berlin, 1996, p. 61.
[14] A. Zellner, The Bayesian method of moments: theory and applications, in: T. Fomby, R.C. Hill (Eds.), Advances in Econometrics, 1997.
[15] S. Kullback, R.A. Leibler, On information and sufficiency, Ann. Math. Statist. 22 (1951) 79.
[16] J.N. Kapur, Maximum Entropy Models in Science and Engineering, Wiley, New York, 1989.
[17] J.-F. Bercher, G. Le Besnerais, G. Demoment, in: J. Skilling, S. Sibisi (Eds.), Maximum Entropy and Bayesian Studies, Kluwer, Dordrecht, 1996.
[18] F. Gamboa, E. Gassiat, Bayesian methods and maximum entropy for ill-posed inverse
problems, Ann. Statist. 25 (1997) 328.