A Generalized Stein Identity and Matrix Differential Operators

Shrinkage Estimation for Mean and Covariance Matrices

Part of the book series: SpringerBriefs in Statistics (JSSRES)

Abstract

In shrinkage estimation, the Stein (1973, 1981) identity is known as an integration by parts formula for deriving unbiased risk estimates. It is a simple but very powerful mathematical tool and has contributed significantly to the development of shrinkage estimation. This chapter provides a generalized Stein identity in the matrix-variate normal distribution model, together with some useful results on matrix differential operators, for a unified application of the identity to high- and low-dimensional normal models.

References

  • M. Bilodeau, T. Kariya, Minimax estimators in the normal MANOVA model. J. Multivar. Anal. 28, 260–270 (1989)

  • B. Efron, C. Morris, Families of minimax estimators of the mean of a multivariate normal distribution. Ann. Stat. 4, 11–21 (1976)

  • W. Fleming, Functions of Several Variables, 2nd edn. (Springer, New York, 1977)

  • L.R. Haff, Minimax estimators for a multinormal precision matrix. J. Multivar. Anal. 7, 374–385 (1977)

  • L.R. Haff, An identity for the Wishart distribution with applications. J. Multivar. Anal. 9, 531–544 (1979)

  • Y. Konno, Improved estimation of matrix of normal mean and eigenvalues in the multivariate \(F\)-distribution. Doctoral dissertation, Institute of Mathematics, University of Tsukuba, 1992. http://mcm-www.jwu.ac.jp/~konno/

  • Y. Konno, Shrinkage estimators for large covariance matrices in multivariate real and complex normal distributions under an invariant quadratic loss. J. Multivar. Anal. 100, 2237–2253 (2009)

  • T. Kubokawa, M.S. Srivastava, Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data. J. Multivar. Anal. 99, 1906–1928 (2008)

  • Y. Sheena, Unbiased estimator of risk for an orthogonally invariant estimator of a covariance matrix. J. Jpn. Stat. Soc. 25, 35–48 (1995)

  • C. Stein, Estimation of the mean of a multivariate normal distribution. Technical Reports No.48 (Department of Statistics, Stanford University, Stanford, 1973)

  • C. Stein, Lectures on the theory of estimation of many parameters, in Proceedings of Scientific Seminars of the Steklov Institute Studies in the Statistical Theory of Estimation, Part I, vol. 74, ed. by I.A. Ibragimov, M.S. Nikulin (Leningrad Division, 1977), pp. 4–65

  • C. Stein, Estimation of the mean of a multivariate normal distribution. Ann. Stat. 9, 1135–1151 (1981)

Author information

Correspondence to Hisayuki Tsukuma.

Appendix

In this appendix, we first give another derivation of the Stein identity (5.3). Denote by \(f({\varvec{X}})\) the p.d.f. of \(\mathcal{N}_{m\times p}({\varvec{{\Theta }}},{\varvec{I}}_m\otimes {\varvec{{\Sigma }}})\). Let

$$ I_{ST}({\varvec{G}})=\int _{\mathbb {R}^{m\times p}} \mathrm{\,tr\,}\nabla _X \{{\varvec{G}}^\top f({\varvec{X}})\} (\mathrm{d}{\varvec{X}}). $$

It follows that \(\nabla _X f({\varvec{X}})=-({\varvec{X}}-{\varvec{{\Theta }}}){\varvec{{\Sigma }}}^{-1} f({\varvec{X}})\), so that

$$ I_{ST}({\varvec{G}})=E[\mathrm{\,tr\,}\nabla _X{\varvec{G}}^\top ]-E[\mathrm{\,tr\,}({\varvec{X}}-{\varvec{{\Theta }}}){\varvec{{\Sigma }}}^{-1}{\varvec{G}}^\top ] $$

provided the expectations exist. Hence the Stein identity (5.3) can be verified if \(I_{ST}({\varvec{G}})=0\).
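The closed form for \(\nabla _X f({\varvec{X}})\) quoted above can be checked numerically. The following Python sketch is an illustration only and not part of the original derivation: it assumes \(m=p=2\), uses arbitrary hypothetical values for \({\varvec{X}}\), \({\varvec{{\Theta }}}\) and \({\varvec{{\Sigma }}}\), and compares \(-({\varvec{X}}-{\varvec{{\Theta }}}){\varvec{{\Sigma }}}^{-1}f({\varvec{X}})\) with central finite differences of the density. The helper names (`inv2`, `density`, `grad_formula`, `grad_numeric`) are chosen for this example.

```python
import math

def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def density(x, theta, sigma):
    """p.d.f. of N_{m x 2}(Theta, I_m (x) Sigma) at the m x 2 matrix x."""
    m = len(x)
    si = inv2(sigma)
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    quad = 0.0
    for xr, tr in zip(x, theta):
        d = [xr[0] - tr[0], xr[1] - tr[1]]
        quad += sum(d[a] * si[a][b] * d[b] for a in range(2) for b in range(2))
    # (2*pi)^(-mp/2) |Sigma|^(-m/2) with p = 2
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** m * det ** (m / 2))

def grad_formula(x, theta, sigma):
    """Closed form -(X - Theta) Sigma^{-1} f(X)."""
    si = inv2(sigma)
    f = density(x, theta, sigma)
    return [[-sum((xr[a] - tr[a]) * si[a][j] for a in range(2)) * f
             for j in range(2)] for xr, tr in zip(x, theta)]

def grad_numeric(x, theta, sigma, h=1e-5):
    """Central finite differences of f with respect to each entry x_ij."""
    out = []
    for i in range(len(x)):
        row = []
        for j in range(2):
            xp = [r[:] for r in x]
            xm = [r[:] for r in x]
            xp[i][j] += h
            xm[i][j] -= h
            row.append((density(xp, theta, sigma) - density(xm, theta, sigma)) / (2 * h))
        out.append(row)
    return out

# Hypothetical test point and parameters (not taken from the source).
X = [[0.3, -0.2], [1.1, 0.4]]
THETA = [[0.0, 0.5], [1.0, 0.0]]
SIGMA = [[2.0, 0.6], [0.6, 1.0]]

max_err = max(abs(a - b)
              for ra, rb in zip(grad_formula(X, THETA, SIGMA), grad_numeric(X, THETA, SIGMA))
              for a, b in zip(ra, rb))
print("largest entrywise discrepancy:", max_err)
```

The discrepancy should be at the level of finite-difference error, which supports the gradient formula for this particular parameter choice; it is a sanity check rather than a proof.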

For \(r>0\), let \(\mathbb {B}_r=\{{\varvec{X}}\in \mathbb {R}^{m\times p}:\Vert \mathrm {vec}({\varvec{X}})\Vert \le r\}\), where \(\Vert \cdot \Vert \) is the usual Euclidean norm and \(\mathrm {vec}(\cdot )\) is defined in Definition 2.3. Then \(\mathbb {B}_r\rightarrow \mathbb {R}^{m\times p}\) as \(r\rightarrow {\infty }\) and

$$ I_{ST}({\varvec{G}})=\lim _{r\rightarrow {\infty }} \int _{\mathbb {B}_r} \mathrm {vec}(\nabla _X)^\top \mathrm {vec}({\varvec{G}}f({\varvec{X}})) (\mathrm{d}{\varvec{X}}). $$

The boundary of \(\mathbb {B}_r\) is expressed by \(\partial \mathbb {B}_r=\{\mathrm {vec}({\varvec{X}})\in \mathbb {R}^{mp}:\Vert \mathrm {vec}({\varvec{X}})\Vert =r\}\). Denote by \({\varvec{u}}\) an outward unit normal vector at a point \(\mathrm {vec}({\varvec{X}})\in \partial \mathbb {B}_r\). Let \({\lambda }_{\partial \mathbb {B}_r}\) be Lebesgue measure on \(\partial \mathbb {B}_r\). By the Gauss divergence theorem,

$$ I_{ST}({\varvec{G}}) =\lim _{r\rightarrow {\infty }} \int _{\partial \mathbb {B}_r} {\varvec{u}}^\top \mathrm {vec}({\varvec{G}}) f({\varvec{X}})(\mathrm{d}{\lambda }_{\partial \mathbb {B}_r}). $$

For details of the Gauss divergence theorem, see Fleming (1977).

Let \(o(\cdot )\) be the Landau symbol, namely, for real-valued functions \(f(x)\) and \(g(x)\) with \(g(x)\ne 0\), we write \(f(x)=o(g(x))\) when \(\lim _{x\rightarrow c}|f(x)/g(x)|=0\) for an extended real number \(c\). If

$$ \sup _{\mathrm {vec}({\varvec{X}})\in \partial \mathbb {B}_r} \Vert \mathrm {vec}({\varvec{G}})\Vert f({\varvec{X}})=o(r^{1-mp}) \quad {\text {as}}\,\,\,r\rightarrow {\infty }, $$

then \(I_{ST}({\varvec{G}})=0\). In fact,

$$\begin{aligned} \int _{\partial \mathbb {B}_r} |{\varvec{u}}^\top \mathrm {vec}({\varvec{G}})| f({\varvec{X}})(\mathrm{d}{\lambda }_{\partial \mathbb {B}_r})&\le \int _{\partial \mathbb {B}_r} \Vert \mathrm {vec}({\varvec{G}})\Vert f({\varvec{X}})(\mathrm{d}{\lambda }_{\partial \mathbb {B}_r}) \\&\le o(r^{1-mp})\int _{\partial \mathbb {B}_r} (\mathrm{d}{\lambda }_{\partial \mathbb {B}_r}) =o(1), \end{aligned}$$

because \(\int _{\partial \mathbb {B}_r} (\mathrm{d}{\lambda }_{\partial \mathbb {B}_r})\) is the surface area of the \((mp-1)\)-sphere of radius \(r\) in \(\mathbb {R}^{mp}\), which is proportional to \(r^{mp-1}\).
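To make the boundary bound concrete, the sketch below evaluates the product of the sphere's surface area and the supremum of \(\Vert \mathrm {vec}({\varvec{G}})\Vert f({\varvec{X}})\) on \(\partial \mathbb {B}_r\) in one simple case. It is an illustration under stated assumptions, not part of the source: \({\varvec{{\Theta }}}={\varvec{0}}\), \({\varvec{{\Sigma }}}={\varvec{I}}_p\), and a bounded \({\varvec{G}}\) with \(\Vert \mathrm {vec}({\varvec{G}})\Vert \le \) `g_sup` (a hypothetical constant). In that case \(f\) is constant on the sphere, and the surface area is \(2\pi ^{d/2}r^{d-1}/\Gamma (d/2)\) with \(d=mp\).

```python
import math

def sphere_area(d, r):
    """Surface area of the (d-1)-sphere of radius r in R^d."""
    return 2.0 * math.pi ** (d / 2) * r ** (d - 1) / math.gamma(d / 2)

def boundary_bound(m, p, r, g_sup=1.0):
    """Upper bound area(r) * g_sup * f on the sphere of radius r.

    Assumes Theta = 0 and Sigma = I, so f(X) = (2*pi)^(-d/2) * exp(-r^2 / 2)
    everywhere on the sphere, and ||vec(G)|| <= g_sup (hypothetical bound).
    """
    d = m * p
    f_on_sphere = math.exp(-0.5 * r * r) / (2.0 * math.pi) ** (d / 2)
    return sphere_area(d, r) * g_sup * f_on_sphere

# The bound shrinks rapidly as r grows (here m = 2, p = 3, so d = 6).
bounds = [boundary_bound(2, 3, r) for r in (5.0, 10.0, 20.0)]
for r, b in zip((5.0, 10.0, 20.0), bounds):
    print(r, b)
```

In this special case the Gaussian factor \(\exp (-r^2/2)\) dominates the polynomial growth \(r^{mp-1}\) of the surface area, so the \(o(r^{1-mp})\) condition holds for any bounded \({\varvec{G}}\).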

Next, a simple derivation of the Haff identity (5.5) is provided using the Gauss divergence theorem. The derivation is essentially the same as that of Haff (1977, 1979). Let \(f({\varvec{S}})\) be the p.d.f. of \(\mathcal{W}_p(n,{\varvec{{\Sigma }}})\). For a differentiable matrix-valued function \({\varvec{G}}\in \mathbb {S}_p\), let

$$ I_{HF}({\varvec{G}})=\int _{\mathbb {S}_p^{(+)}} \mathrm{\,tr\,}\mathrm {D}_S\{{\varvec{G}}f({\varvec{S}})\}(\mathrm{d}{\varvec{S}}). $$

Since \(\mathrm {D}_S|{\varvec{S}}|={\varvec{S}}^{-1}|{\varvec{S}}|\) and \(\mathrm {D}_S\mathrm{\,tr\,}{\varvec{{\Sigma }}}^{-1}{\varvec{S}}={\varvec{{\Sigma }}}^{-1}\), we get \(\mathrm {D}_S f({\varvec{S}})=\{(n-p-1){\varvec{S}}^{-1}-{\varvec{{\Sigma }}}^{-1}\}f({\varvec{S}})/2\), implying that

$$ I_{HF}({\varvec{G}}) =E[\mathrm{\,tr\,}\mathrm {D}_S{\varvec{G}}]+\frac{n-p-1}{2}E[\mathrm{\,tr\,}{\varvec{S}}^{-1}{\varvec{G}}]-\frac{1}{2}E[\mathrm{\,tr\,}{\varvec{{\Sigma }}}^{-1}{\varvec{G}}] $$

provided the expectations exist. Hence the Haff identity (5.5) follows if \(I_{HF}({\varvec{G}})=0\).
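The expression for \(\mathrm {D}_S f({\varvec{S}})\) can also be checked by finite differences. The sketch below is illustrative only and rests on several assumptions: \(p=2\); hypothetical values of \({\varvec{S}}\), \({\varvec{{\Sigma }}}\) and \(n\); the operator taken as \(\mathrm {D}_S=(\{(1+{\delta }_{ij})/2\}\,\partial /\partial s_{ij})\), as suggested by the trace expansion used later in this appendix; and the density replaced by its unnormalized kernel \(|{\varvec{S}}|^{(n-p-1)/2}\exp (-\mathrm{tr\,}{\varvec{{\Sigma }}}^{-1}{\varvec{S}}/2)\), which does not affect the identity because both sides scale by the same constant.

```python
import math

def det2(a):
    """Determinant of a 2x2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d],
            [-a[1][0] / d, a[0][0] / d]]

def wishart_kernel(s, sigma, n):
    """Unnormalized W_2(n, Sigma) density: |S|^((n-3)/2) * exp(-tr(Sigma^{-1} S) / 2)."""
    si = inv2(sigma)
    tr = sum(si[a][b] * s[b][a] for a in range(2) for b in range(2))
    return det2(s) ** ((n - 3) / 2) * math.exp(-0.5 * tr)

def ds_formula(s, sigma, n):
    """Closed form {(n-p-1) S^{-1} - Sigma^{-1}} f(S) / 2 with p = 2."""
    s_inv, sig_inv = inv2(s), inv2(sigma)
    f = wishart_kernel(s, sigma, n)
    return [[0.5 * ((n - 3) * s_inv[i][j] - sig_inv[i][j]) * f
             for j in range(2)] for i in range(2)]

def ds_numeric(s, sigma, n, h=1e-5):
    """Apply ((1 + delta_ij)/2) d/ds_ij numerically, moving s_ij and s_ji together."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            sp = [r[:] for r in s]
            sm = [r[:] for r in s]
            sp[i][j] += h
            sm[i][j] -= h
            if i != j:  # keep the perturbed matrix symmetric
                sp[j][i] += h
                sm[j][i] -= h
            diff = (wishart_kernel(sp, sigma, n) - wishart_kernel(sm, sigma, n)) / (2 * h)
            out[i][j] = (1.0 if i == j else 0.5) * diff
    return out

# Hypothetical positive definite S and Sigma, and degrees of freedom n.
S = [[2.0, 0.5], [0.5, 1.5]]
SIGMA = [[1.0, 0.3], [0.3, 2.0]]
N_DF = 7

ds_err = max(abs(a - b)
             for ra, rb in zip(ds_formula(S, SIGMA, N_DF), ds_numeric(S, SIGMA, N_DF))
             for a, b in zip(ra, rb))
print("largest entrywise discrepancy:", ds_err)
```

The factor \(1/2\) on the off-diagonal entries compensates for the fact that perturbing \(s_{ij}\) with \(i\ne j\) changes two entries of the symmetric matrix at once.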

Denote by \(\partial /\partial {\varvec{S}}=(\partial /\partial s_{ij})\) the \(p\times p\) matrix differential operator with respect to \({\varvec{S}}\in \mathbb {S}_p\). For \({\varvec{A}}=(a_{ij})\in \mathbb {S}_p\), define

$$ \mathrm {Vec}({\varvec{A}})=(a_{11},a_{21},\ldots ,a_{p1},a_{22},a_{32},\ldots ,a_{p2},\ldots ,a_{p-1,p-1},a_{p,p-1},a_{pp})^\top \in \mathbb {R}^q, $$

where \(q=p(p+1)/2\). From symmetry of \({\varvec{G}}\), it holds that

$$\begin{aligned} \mathrm{\,tr\,}\mathrm {D}_S\{{\varvec{G}}f({\varvec{S}})\}&=\sum _{i=1}^p\sum _{j=1}^p\frac{1+{\delta }_{ij}}{2}\frac{\partial }{\partial s_{ij}}\{g_{ji}f({\varvec{S}})\} =\sum _{i=1}^p\sum _{j=1}^i\frac{\partial }{\partial s_{ij}}\{g_{ij}f({\varvec{S}})\} \\&=\mathrm {Vec}(\partial /\partial {\varvec{S}})^\top \mathrm {Vec}({\varvec{G}}f({\varvec{S}})), \end{aligned}$$

so that

$$ I_{HF}({\varvec{G}}) =\int _{\mathbb {S}_p^{(+)}} \mathrm {Vec}(\partial /\partial {\varvec{S}})^\top \mathrm {Vec}({\varvec{G}}f({\varvec{S}}))(\mathrm{d}{\varvec{S}}). $$
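The ordering used by \(\mathrm {Vec}(\cdot )\) and the index bookkeeping behind the last equality can be illustrated with a short Python sketch. The function names `half_vec` and `weighted_trace` are hypothetical, and the matrix `D` merely holds numbers standing in for the values of the partial derivatives; the example only checks the algebraic pairing of entries for symmetric matrices, not any analytic statement.

```python
def half_vec(a):
    """Stack the lower triangle of a symmetric p x p matrix column by column."""
    p = len(a)
    return [a[i][j] for j in range(p) for i in range(j, p)]

def weighted_trace(d, g):
    """sum over i, j of (1 + delta_ij)/2 * d_ij * g_ji."""
    p = len(d)
    return sum((1.0 if i == j else 0.5) * d[i][j] * g[j][i]
               for i in range(p) for j in range(p))

# Entries are labelled by their indices: 21 stands for a_21, and so on.
A = [[11, 21, 31],
     [21, 22, 32],
     [31, 32, 33]]
v = half_vec(A)
print(v)  # [11, 21, 31, 22, 32, 33], which has length q = p(p+1)/2 = 6

# Symmetric placeholder values standing in for the derivative entries.
D = [[1, 2, 3],
     [2, 4, 5],
     [3, 5, 6]]
lhs = weighted_trace(D, A)
rhs = sum(x * y for x, y in zip(half_vec(D), half_vec(A)))
print(lhs, rhs)
```

Each off-diagonal pair \((i,j)\), \((j,i)\) contributes twice with weight \(1/2\) in the double sum, which is why it appears exactly once in the inner product of the two \(\mathrm {Vec}\) vectors.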

For \(r>0\), let \(\partial \mathbb {B}_r^q=\{\mathrm {Vec}({\varvec{S}})\in \mathbb {R}^q:\Vert \mathrm {Vec}({\varvec{S}})\Vert =r\}\) and, for \(0< r_1\le r_2<{\infty }\), let \(\mathbb {C}_{r_1,r_2}=\{\mathrm {Vec}({\varvec{S}})\in \mathbb {R}^q: r_1\le \Vert \mathrm {Vec}({\varvec{S}})\Vert \le r_2\}\). Then \(\mathbb {C}_{r_1,r_2}\cap \mathbb {S}_p^{(+)}\rightarrow \mathbb {S}_p^{(+)}\) as \(r_1\rightarrow 0\) and \(r_2\rightarrow {\infty }\). The boundary of \(\mathbb {C}_{r_1,r_2}\cap \mathbb {S}_p^{(+)}\) can be expressed as \(\bigcup _{i=1}^3 \partial \mathbb {B}_i\), where \(\partial \mathbb {B}_1\), \(\partial \mathbb {B}_2\) and \(\partial \mathbb {B}_3\) are certain sets satisfying \(\partial \mathbb {B}_1\subset \partial \mathbb {B}_{r_1}^q\), \(\partial \mathbb {B}_2\subset \partial \mathbb {B}_{r_2}^q\) and \(\partial \mathbb {B}_3\subset \partial \mathbb {S}_p^{(+)}\). Note that \(|{\varvec{S}}|=0\) for any point \({\varvec{S}}\in \partial \mathbb {S}_p^{(+)}\), so that \(f({\varvec{S}})=0\) there when \(n-p-1>0\); hence the integral over \(\partial \mathbb {B}_3\) vanishes. Let \({\varvec{u}}_1=-\mathrm {Vec}({\varvec{S}})/\Vert \mathrm {Vec}({\varvec{S}})\Vert \) for \(\mathrm {Vec}({\varvec{S}})\in \partial \mathbb {B}_{r_1}^q\) and \({\varvec{u}}_2=\mathrm {Vec}({\varvec{S}})/\Vert \mathrm {Vec}({\varvec{S}})\Vert \) for \(\mathrm {Vec}({\varvec{S}})\in \partial \mathbb {B}_{r_2}^q\). Denote by \({\lambda }_{\partial \mathbb {B}_r^q}\) Lebesgue measure on \(\partial \mathbb {B}_r^q\). Using the Gauss divergence theorem gives

$$ I_{HF}({\varvec{G}}) = \lim _{r_1\rightarrow 0} \int _{\partial \mathbb {B}_1} {\varvec{u}}_1^\top \mathrm {Vec}({\varvec{G}}) f({\varvec{S}}) (\mathrm{d}{\lambda }_{\partial \mathbb {B}_{r_1}^q}) + \lim _{r_2\rightarrow {\infty }} \int _{\partial \mathbb {B}_2} {\varvec{u}}_2^\top \mathrm {Vec}({\varvec{G}}) f({\varvec{S}}) (\mathrm{d}{\lambda }_{\partial \mathbb {B}_{r_2}^q}). $$

Using the Landau symbol \(o(\cdot )\), we assume that

$$ \sup _{\mathrm {Vec}({\varvec{S}})\in \partial \mathbb {B}_1}\Vert \mathrm {Vec}({\varvec{G}})\Vert f({\varvec{S}})=o(r_1^{1-q}) \quad {\text {as}}\, r_1\rightarrow 0 $$

and

$$ \sup _{\mathrm {Vec}({\varvec{S}})\in \partial \mathbb {B}_2}\Vert \mathrm {Vec}({\varvec{G}})\Vert f({\varvec{S}})=o(r_2^{1-q}) \quad {\text {as}}\, r_2\rightarrow {\infty }. $$

Under these assumptions, we can see that

$$\begin{aligned} \int _{\partial \mathbb {B}_1} |{\varvec{u}}_1^\top \mathrm {Vec}({\varvec{G}})| f({\varvec{S}}) (\mathrm{d}{\lambda }_{\partial \mathbb {B}_{r_1}^q})&\le \int _{\partial \mathbb {B}_1} \Vert \mathrm {Vec}({\varvec{G}})\Vert f({\varvec{S}}) (\mathrm{d}{\lambda }_{\partial \mathbb {B}_{r_1}^q}) \\&\le o(r_1^{1-q}) \int _{\partial \mathbb {B}_{r_1}^q}(\mathrm{d}{\lambda }_{\partial \mathbb {B}_{r_1}^q}) =o(1) \qquad {\text {as}}\, r_1\rightarrow 0 \end{aligned}$$

and also

$$ \int _{\partial \mathbb {B}_2} |{\varvec{u}}_2^\top \mathrm {Vec}({\varvec{G}})| f({\varvec{S}}) (\mathrm{d}{\lambda }_{\partial \mathbb {B}_{r_2}^q}) \le o(r_2^{1-q}) \int _{\partial \mathbb {B}_{r_2}^q}(\mathrm{d}{\lambda }_{\partial \mathbb {B}_{r_2}^q}) =o(1) \qquad {\text {as}}\, r_2\rightarrow {\infty }, $$

so that \(I_{HF}({\varvec{G}})=0\).
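As a numerical sanity check of the conclusion \(I_{HF}({\varvec{G}})=0\), the sketch below treats the scalar case \(p=1\), where \(\mathcal{W}_1(n,\sigma ^2)\) is \(\sigma ^2\) times a chi-square variable with \(n\) degrees of freedom and \(\mathrm {D}_S\) reduces to \(\mathrm{d}/\mathrm{d}s\). Everything specific here is an assumption made for illustration: the values \(n=6\) and \(\sigma ^2=1.5\), the test function \(g(s)=s/(1+s)\), the truncation of the integral at \(s=200\), and the use of a composite Simpson rule. With \(q=1\), the two boundary conditions reduce to \(g(s)f(s)\rightarrow 0\) as \(s\rightarrow 0\) and as \(s\rightarrow {\infty }\), which this \(g\) satisfies.

```python
import math

N_DF = 6       # hypothetical degrees of freedom n (p = 1)
SIGMA2 = 1.5   # hypothetical scale parameter sigma^2

def f(s):
    """p.d.f. of W_1(n, sigma^2): s^(n/2 - 1) exp(-s / (2 sigma^2)) / (Gamma(n/2) (2 sigma^2)^(n/2))."""
    k = N_DF / 2
    return s ** (k - 1) * math.exp(-s / (2 * SIGMA2)) / (math.gamma(k) * (2 * SIGMA2) ** k)

def g(s):
    """Illustrative test function g(s) = s / (1 + s)."""
    return s / (1 + s)

def dg(s):
    """Derivative of g."""
    return 1 / (1 + s) ** 2

def g_over_s(s):
    """g(s) / s written without a division by s, so that s = 0 is allowed."""
    return 1 / (1 + s)

def integrand(s):
    """{g'(s) + (n-p-1)/2 * g(s)/s - g(s) / (2 sigma^2)} f(s) with p = 1."""
    return (dg(s) + 0.5 * (N_DF - 2) * g_over_s(s) - 0.5 * g(s) / SIGMA2) * f(s)

def simpson(func, a, b, steps):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

total_mass = simpson(f, 0.0, 200.0, 200000)        # should be close to 1
haff_lhs = simpson(integrand, 0.0, 200.0, 200000)  # should be close to 0
print(total_mass, haff_lhs)
```

The integrand equals \(\mathrm{d}\{g(s)f(s)\}/\mathrm{d}s\), so the integral is the difference of the boundary values of \(g f\), which is the one-dimensional counterpart of the divergence-theorem argument above.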

Copyright information

© 2020 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this chapter

Tsukuma, H., Kubokawa, T. (2020). A Generalized Stein Identity and Matrix Differential Operators. In: Shrinkage Estimation for Mean and Covariance Matrices. SpringerBriefs in Statistics. Springer, Singapore. https://doi.org/10.1007/978-981-15-1596-5_5
