"Reflections on the probability space induced by moment conditions with implications for Bayesian inference": a discussion
Christian P. Robert
Université Paris-Dauphine, Paris & University of Warwick, Coventry
bayesianstatistics@gmail.com
Outline 
what is the question? 
what could the question be? 
what is the answer? 
what could the answer be?
what is the question? 
If one specifies a set of moment functions collected
together into a vector $m(x, \theta)$ of dimension $M$, regards $\theta$
as random and asserts that some transformation $Z(x, \theta)$
has distribution $\psi$, then what is required to use this
information and then possibly a prior to make valid
inference? R. Gallant, p.4
Priors without efforts
• quest for model-induced priors dating back to the early 1900's
[Lhoste, 1923]
• reference priors such as Jeffreys' prior, induced by the sampling
distribution
[Jeffreys, 1939]
• fiducial distributions as Fisher's attempted answer
[Fisher, 1956]
Fisher's fiducial distribution
When considering
$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$
the ratio has a frequentist t distribution with $n - 1$ degrees of
freedom
Fisher's fiducial distribution
However, no equivalent justification in asserting that
$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$
has a t posterior distribution with $n - 1$ degrees of freedom on $\mu$,
given $(\bar{x}, s)$, except when using the non-informative and improper
prior $\pi(\mu, \sigma^2) \propto 1/\sigma^2$ since, then
$$ \mu \mid \bar{x}, s \sim T_{n-1}(\bar{x}, s/\sqrt{n}) $$
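Why the improper prior works (a standard derivation sketch, not on the slide): under $\pi(\mu, \sigma^2) \propto 1/\sigma^2$, integrating $\sigma^2$ out of the normal likelihood leaves

```latex
\pi(\mu \mid \bar{x}, s)
 \propto \int_0^\infty (\sigma^2)^{-n/2 - 1}
   \exp\!\left( - \frac{(n-1)s^2 + n(\bar{x} - \mu)^2}{2\sigma^2} \right) d\sigma^2
 \propto \bigl[ (n-1)s^2 + n(\bar{x} - \mu)^2 \bigr]^{-n/2},
```

which is exactly the kernel of a $T_{n-1}(\bar{x}, s/\sqrt{n})$ density in $\mu$.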
Fisher's fiducial distribution
Furthermore, neither the Bayesian nor the frequentist interpretation implies
that
$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$
has a t posterior distribution with $n - 1$ degrees of freedom jointly
what could the question be? 
Given a set of moment equations
$$ \mathbb{E}[m(X_1, \ldots, X_n, \theta)] = 0 $$
(where both the $X_i$'s and $\theta$ are random), can one derive a
likelihood function and a prior distribution compatible with those
constraints?
coherence across sample sizes n 
Highly complex question, since it implies that the integral equation
$$ \int_{\Theta \times \mathcal{X}^n} m(x_1, \ldots, x_n, \theta)\, \pi(\theta)\, f(x_1 \mid \theta) \cdots f(x_n \mid \theta)\, d\theta\, dx_1 \cdots dx_n = 0 $$
must or should have a solution in $(\pi, f)$ for all $n$'s.
possible outside of a likelihood × prior modelling?
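As a minimal illustration (not on the slide) of how weakly constraining this is, take the single moment $m(x_1, \ldots, x_n, \theta) = \bar{x}_n - \theta$:

```latex
\int_{\Theta \times \mathcal{X}^n} (\bar{x}_n - \theta)\, \pi(\theta)
   \prod_{i=1}^n f(x_i \mid \theta)\, d\theta\, dx_1 \cdots dx_n
 = \int_\Theta \bigl( \mathbb{E}_f[X \mid \theta] - \theta \bigr)\, \pi(\theta)\, d\theta,
```

which vanishes for every $n$ as soon as $\mathbb{E}_f[X \mid \theta] = \theta$: any prior $\pi$ combined with any family $f$ centred at $\theta$ solves the equation, so the moment condition comes nowhere near identifying a single pair $(\pi, f)$.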
Zellner's Bayesian method of moments
Given moment conditions on the parameters $\theta$ and $\sigma^2$
$$ \mathbb{E}[\theta \mid x_1, \ldots, x_n] = \bar{x}_n \qquad \mathbb{E}[\sigma^2 \mid x_1, \ldots, x_n] = s_n^2 \qquad \operatorname{var}(\theta \mid \sigma^2, x_1, \ldots, x_n) = \sigma^2/n $$
derivation of a maximum entropy posterior
$$ \theta \mid \sigma^2, x_1, \ldots, x_n \sim \mathcal{N}(\bar{x}_n, \sigma^2/n) \qquad \sigma^{-2} \mid x_1, \ldots, x_n \sim \mathcal{E}xp(s_n^2) $$
[Zellner, 1996]
but incompatible with the corresponding predictive distribution
[Geisser & Seidenfeld, 1999]
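For illustration, a minimal sampler from this maximum-entropy posterior (a sketch with hypothetical data; the slide's $\mathcal{E}xp(s_n^2)$ is read here as an exponential with rate $s_n^2$ on $\sigma^{-2}$, which is an assumption about the parameterization):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical data standing in for x_1, ..., x_n
x = rng.normal(loc=2.0, scale=1.5, size=50)
n, xbar, s2 = len(x), x.mean(), x.var(ddof=1)

# draw from the stated max-entropy posterior:
#   sigma^{-2} | x ~ Exp(s_n^2)   (rate s_n^2 -- a parameterization assumption)
#   theta | sigma^2, x ~ N(xbar_n, sigma^2 / n)
N = 10_000
inv_sigma2 = rng.exponential(scale=1.0 / s2, size=N)  # numpy's scale = 1/rate
theta = rng.normal(loc=xbar, scale=np.sqrt(1.0 / (n * inv_sigma2)))

# the first moment condition E[theta | x] = xbar_n can be checked by simulation
print(f"mean(theta) = {theta.mean():.3f}  vs  xbar = {xbar:.3f}")
```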
what is the answer? 
Under the condition that $Z(\cdot, \theta)$ is surjective,
$$ p^\star(x \mid \theta) = \psi(Z(x, \theta)) $$
and an arbitrary choice of prior $\pi(\theta)$
• lhs and rhs operate on different spaces
• no reason why the density $\psi$ should integrate against Lebesgue
measure in the $n$-dimensional Euclidean space
• no direct connection with a genuine likelihood function, i.e., a
product of the densities of the $X_i$'s (conditional on $\theta$)
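One way to see the first point (a standard change-of-variables remark, not on the slide): if $Z(\cdot, \theta)$ were a smooth bijection in $x$, the genuine density of $x$ would be

```latex
p(x \mid \theta) = \psi\bigl(Z(x, \theta)\bigr)\,
  \left| \det \frac{\partial Z(x, \theta)}{\partial x} \right|,
```

so $p^\star$ misses the Jacobian factor, and for a general surjection even this correction is unavailable.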
what could the answer be? 
A common situation that requires consideration of the 
notions that follow is that deriving the likelihood from a 
structural model is analytically intractable and one 
cannot verify that the numerical approximations one 
would have to make to circumvent the intractability are 
sufficiently accurate. R. Gallant, p.7
Approximative Bayesian answers
Defining the joint distribution on $(\theta, x_1, \ldots, x_n)$ through moment
equations prevents regular Bayesian inference, as the likelihood is
unavailable
There may be alternatives available:
• approximate Bayesian computation (ABC) and empirical-likelihood-based
Bayesian inference
[Tavaré et al., 1999; Owen, 2001; Mengersen et al., 2013]
• INLA (Laplace approximations), EP (expectation propagation)
[Martino et al., 2008; Barthelmé & Chopin, 2014]
• variational Bayes
[Jaakkola & Jordan, 2000]
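As a concrete illustration of the first alternative, a minimal rejection-ABC sketch (illustrative only; data, model, prior and tolerance are hypothetical choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical observed data and summary statistic (the sample mean)
x_obs = rng.normal(2.0, 1.0, size=100)
s_obs = x_obs.mean()

def simulate(theta, n=100):
    """Simulate a dataset from the assumed model N(theta, 1)."""
    return rng.normal(theta, 1.0, size=n)

# rejection ABC: keep prior draws whose simulated summary lands within eps of s_obs
theta = rng.normal(0.0, 10.0, size=50_000)    # hypothetical N(0, 10^2) prior
s_sim = np.array([simulate(t).mean() for t in theta])
eps = 0.05
accepted = theta[np.abs(s_sim - s_obs) < eps]
print(f"kept {accepted.size} draws; ABC posterior mean ~ {accepted.mean():.3f}")
```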

Bayesian approximative answers
• Using a fake likelihood does not prohibit a Bayesian analysis, as
shown in the paper with the model in eqn. (45)
• However, this requires a case-by-case consistency analysis, since
pseudo-likelihoods do not offer the same guarantees
• Example of ABC model choice based on insufficient statistics
[Marin et al., 2014]
Empirical likelihood (EL)
Dataset $x$ made of $n$ independent replicates $x = (x_1, \ldots, x_n)$ of a
rv $X \sim F$
Generalized moment condition pseudo-model
$$ \mathbb{E}_F[h(X, \theta)] = 0, $$
where $h$ is a known function and $\theta$ an unknown parameter
Induced empirical likelihood
$$ L_{el}(\theta \mid x) = \max_{p} \prod_{i=1}^n p_i $$
for all $p$ such that $0 \leq p_i \leq 1$, $\sum_i p_i = 1$, $\sum_i p_i h(x_i, \theta) = 0$
[Owen, 1988, B'ka; Owen, Empirical Likelihood, 2001]
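A minimal numerical sketch of this optimization for the mean condition $h(x, \theta) = x - \theta$, using Owen's dual representation $p_i = 1/(n(1 + \lambda(x_i - \theta)))$ (the function name log_el_mean and the test data are mine, not from the slides):

```python
import numpy as np
from scipy.optimize import brentq

def log_el_mean(x, theta):
    """Log empirical likelihood for the mean condition h(x, theta) = x - theta,
    via Owen's dual: p_i = 1 / (n * (1 + lam * (x_i - theta)))."""
    z = x - theta
    n = len(z)
    if z.min() >= 0.0 or z.max() <= 0.0:   # theta outside the convex hull: EL is zero
        return -np.inf
    # lambda interval on which every p_i stays in (0, 1)
    lo = (1.0 / n - 1.0) / z.max() + 1e-10
    hi = (1.0 / n - 1.0) / z.min() - 1e-10
    # the dual equation sum_i z_i / (1 + lam * z_i) = 0 is monotone in lam
    lam = brentq(lambda lam: np.sum(z / (1.0 + lam * z)), lo, hi)
    return -np.sum(np.log(n * (1.0 + lam * z)))

# profile of the EL over a few candidate means (hypothetical data)
rng = np.random.default_rng(3)
x = rng.normal(2.0, 1.0, size=100)
for t in (1.5, 2.0, 2.5):
    print(t, log_el_mean(x, t))    # maximized near the sample mean
```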
Raw ABCel sampler
Naïve implementation: act as if EL was an exact likelihood
[Lazar, 2003, B'ka]
for i = 1 → N do
  generate $\theta_i$ from the prior distribution $\pi(\cdot)$
  set the weight $\omega_i = L_{el}(\theta_i \mid x_{\mathrm{obs}})$
end for
return $(\theta_i, \omega_i)$, $i = 1, \ldots, N$
• Output: a weighted sample of size $N$
• Performance evaluated through the effective sample size
$$ \mathrm{ESS} = 1 \Big/ \sum_{i=1}^{N} \Bigl\{ \omega_i \Big/ \sum_{j=1}^{N} \omega_j \Bigr\}^2 $$
[Mengersen et al., 2013, PNAS]
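The loop above translates directly into a few lines; this sketch reuses log_el_mean from the previous block together with hypothetical data and prior choices:

```python
import numpy as np

rng = np.random.default_rng(1)
x_obs = rng.normal(2.0, 1.0, size=100)     # hypothetical observed sample
N = 5_000
theta = rng.normal(0.0, 10.0, size=N)      # draws from a hypothetical N(0, 10^2) prior

# weight each prior draw by its empirical likelihood (log scale for stability);
# log_el_mean is the dual EL solver sketched after the EL slide
logw = np.array([log_el_mean(x_obs, t) for t in theta])
w = np.exp(logw - logw.max())

# effective sample size, as on the slide
ess = 1.0 / np.sum((w / w.sum()) ** 2)
print(f"weighted posterior mean ~ {np.sum(w * theta) / w.sum():.3f}, ESS = {ess:.1f}")
```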