
Weak Versus Strong Dominance of Shrinkage Estimators


Abstract

We consider the estimation of the mean of a multivariate normal distribution with known variance. Most studies consider the risk of competing estimators, that is, the trace of the mean squared error matrix. In contrast, we consider the whole mean squared error matrix, in particular its eigenvalues. We prove that there are only two distinct eigenvalues and apply our findings to the James–Stein and the Thompson class of estimators. It turns out that the famous Stein paradox is no longer a paradox when we consider the whole mean squared error matrix rather than only its trace.


Notes

  1. For the case of unknown common variance, see Example 6.4 in Lehmann and Casella (1998), p. 368, Hoffmann (2000), and Saleh (2006, Sect. 4.5).

  2. The maximum regret 0.1815 is much smaller than the maximum regret 0.4251 reported for the same estimator in Magnus (2002, p. 230); the earlier figure apparently reflects a computational or typographical error.

References

  • Abadir, K.M., and J.R. Magnus. 2005. Matrix algebra. Cambridge: Cambridge University Press.

  • Baranchik, A.J. 1964. Multiple regression and estimation of the mean of a multivariate normal distribution. Technical report No. 51, Department of Statistics, Stanford University.

  • Baranchik, A.J. 1970. A family of minimax estimators of the mean of a multivariate normal distribution. Annals of Mathematical Statistics 41: 642–645.

  • Bock, M.E. 1975. Minimax estimators of the mean of a multivariate normal distribution. The Annals of Statistics 3: 209–218.

  • Candès, E.J., C.A. Sing-Long, and J.D. Trzasko. 2013. Unbiased risk estimates for singular value thresholding and spectral estimators. IEEE Transactions on Signal Processing 61 (19): 4643–4657.

  • Casella, G. 1990. Estimators with nondecreasing risk: application of a chi-squared identity. Statistics and Probability Letters 10: 107–109.

  • Efron, B.E., and C. Morris. 1972. Limiting the risk of Bayes and empirical Bayes estimators, Part II: the empirical Bayes case. Journal of the American Statistical Association 67: 130–139.

  • Efron, B.E., and C. Morris. 1976. Families of minimax estimators of the mean of a multivariate normal distribution. Annals of Statistics 4: 11–21.

  • Farebrother, R.W. 1975. The minimum mean square error linear estimator and ridge regression. Technometrics 17: 127–128.

  • Hansen, B.E. 2015. Shrinkage efficiency bounds. Econometric Theory 31: 860–879.

  • Hansen, B.E. 2016. The risk of James–Stein and Lasso shrinkage. Econometric Reviews 35: 1456–1470.

  • Hoffmann, K. 2000. Stein estimation—a review. Statistical Papers 41: 127–158.

  • James, W., and C.M. Stein. 1961. Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, Vol. 1, pp. 361–379. University of California Press.

  • Johnstone, I.M. 2019. Gaussian estimation: sequence and wavelet models. Draft version, available from statweb.stanford.edu.

  • Lehmann, E.L., and G. Casella. 1998. Theory of point estimation, 2nd ed. Berlin: Springer.

  • Magnus, J.R. 1982. Multivariate error components analysis of linear and nonlinear regression models by maximum likelihood. Journal of Econometrics 19: 239–285.

  • Magnus, J.R. 2002. Estimation of the mean of a univariate normal distribution with known variance. Econometrics Journal 5: 225–236.

  • Magnus, J.R., and G. De Luca. 2016. Weighted-average least squares: a review. Journal of Economic Surveys 30: 117–148.

  • Mikkelsen, F.R., and N.R. Hansen. 2018. Degrees of freedom for piecewise Lipschitz estimators. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 54: 819–841.

  • Saleh, A.K.M.E. 2006. Theory of preliminary test and Stein-type estimation with applications. Hoboken: Wiley.

  • Shao, P.Y.-S., and W.E. Strawderman. 1994. Improving on the James–Stein positive-part estimator. Annals of Statistics 22: 1517–1538.

  • Stein, C.M. 1956. Inadmissibility of the usual estimator for the mean of a multivariate distribution. In Proceedings of the third Berkeley symposium on mathematical statistics and probability, Vol. 1, pp. 197–206. University of California Press.

  • Stein, C.M. 1981. Estimation of the mean of a multivariate normal distribution. The Annals of Statistics 9: 1135–1151.

  • Strawderman, W.E., and A. Cohen. 1971. Admissibility of estimators of the mean vector of a multivariate normal distribution with quadratic loss. Annals of Mathematical Statistics 42: 270–296.

  • Theil, H. 1971. Principles of econometrics. Hoboken: Wiley.

  • Thompson, J.R. 1968. Some shrinkage techniques for estimating the mean. Journal of the American Statistical Association 63: 113–122.

  • Thompson, J.R. 1989. Empirical model building. Hoboken: Wiley.

  • Tibshirani, R.J. 2015. Degrees of freedom and model search. Statistica Sinica 25: 1265–1296.

  • Van der Vaart, A.W. 1998. Asymptotic statistics. Cambridge: Cambridge University Press.

  • Xu, H., and A. Namba. 2018. MSE performance of the weighted average estimators consisting of shrinkage estimators when each individual regression coefficient is estimated. Communications in Statistics - Theory and Methods. https://doi.org/10.1080/03610926.2018.1475569.


Acknowledgements

We are grateful to the Guest Editors of this special issue and to two referees for comments and suggestions. We are especially grateful to Akio Namba for providing the key to proving Proposition 1. Without his help this result would have remained a conjecture. Giuseppe De Luca acknowledges financial support from the MIUR PRIN PRJ-0324.

Author information

Corresponding author

Correspondence to Jan R. Magnus.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


Appendices

A Some Results Involving Idempotent Matrices

Our first result is not new.

Lemma 1

Let A be a symmetric idempotent \(p\times p\) matrix of rank \(r(A)=r\). Then, \(r(I_p-A)=p-r\) and \(A(I_p-A)=0\). Next, let

$$\begin{aligned} V=\nu _1 A + \nu _2(I_p-A). \end{aligned}$$

Then the eigenvalues of V are \(\nu _1\) (multiplicity r) and \(\nu _2\) (multiplicity \(p-r\)), its determinant is \(|V|=\nu _1^{r}\nu _2^{p-r}\), and its inverse is \(V^{-1}=(1/\nu _1) A + (1/\nu _2)(I_p-A)\) when \(\nu _1\ne 0\) and \(\nu _2\ne 0\).

Proof

This is a simple version of a much more general result, see Abadir and Magnus (2005, Exercises 8.72 and 8.73). \(\square \)
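As an illustrative numerical check (not part of the formal argument), the following Python sketch constructs a symmetric idempotent A as an orthogonal projector and verifies the eigenvalues, determinant, and inverse given in Lemma 1; the dimensions p and r and the values of \(\nu _1\) and \(\nu _2\) are arbitrary choices.

```python
import numpy as np

# Illustrative check of Lemma 1; p, r, nu1, nu2 are arbitrary choices.
rng = np.random.default_rng(0)
p, r = 6, 2
X = rng.standard_normal((p, r))
A = X @ np.linalg.inv(X.T @ X) @ X.T        # symmetric idempotent of rank r

nu1, nu2 = 3.0, 0.5
V = nu1 * A + nu2 * (np.eye(p) - A)

print(np.sort(np.linalg.eigvalsh(V)))       # nu2 (p - r times), nu1 (r times)
assert np.isclose(np.linalg.det(V), nu1**r * nu2**(p - r))
V_inv = (1 / nu1) * A + (1 / nu2) * (np.eye(p) - A)
assert np.allclose(V @ V_inv, np.eye(p))    # matches the stated inverse
```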

We now consider an extension in which the \(p\times p\) matrix V is partitioned into blocks of dimensions \(p_1\) and \(p_2\) as follows:

$$\begin{aligned} V =\begin{pmatrix} \nu _1J_1 + \nu _2(I_{p_1}-J_1) &{} \gamma \imath _1^{}\imath _2' \\ \gamma \imath _2^{}\imath _1' &{} \xi _1J_2 + \xi _2(I_{p_2}-J_2) \end{pmatrix}, \end{aligned}$$
(32)

where \(\imath _1^{}=(1,1,\ldots ,1)'\) and \(\imath _2^{}=(1,1,\ldots ,1)'\) have dimensions \(p_1\) and \(p_2\) respectively, and \(J_1=\imath _1^{}\imath _1'/p_1\) and \(J_2=\imath _2^{}\imath _2'/p_2\).

In the special case \(p_1=p_2=p\) we can write \(V= V_1 \otimes J + V_2\otimes (I_p-J)\), where

$$\begin{aligned} V_1=\begin{pmatrix} \nu _1 &{} \gamma p \\ \gamma p &{} \xi _1 \end{pmatrix}, \qquad V_2= \begin{pmatrix} \nu _2 &{} 0 \\ 0 &{} \xi _2 \end{pmatrix}, \qquad J=(1/p)\,\imath \imath ', \end{aligned}$$

which implies that the eigenvalues of V are given by \(\nu _2\) (\(p-1\) times), \(\xi _2\) (\(p-1\) times), and the two eigenvalues of \(V_1\) (Magnus 1982, Lemma 2.1).
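As a quick check of this special case (our illustration; the parameter values are arbitrary), the sketch below builds V from the Kronecker representation and confirms that its spectrum consists of \(\nu _2\) and \(\xi _2\), each \(p-1\) times, together with the two eigenvalues of \(V_1\).

```python
import numpy as np

# Check of the Kronecker special case p1 = p2 = p (arbitrary values).
p = 4
nu1, nu2, xi1, xi2, gamma = 2.0, 0.7, 1.5, 0.3, 0.2
iota = np.ones((p, 1))
J = iota @ iota.T / p

V1 = np.array([[nu1, gamma * p], [gamma * p, xi1]])
V2 = np.diag([nu2, xi2])
V = np.kron(V1, J) + np.kron(V2, np.eye(p) - J)

predicted = np.sort(np.concatenate([
    np.full(p - 1, nu2), np.full(p - 1, xi2), np.linalg.eigvalsh(V1)]))
assert np.allclose(np.sort(np.linalg.eigvalsh(V)), predicted)
```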

When \(p_1\ne p_2\) we cannot write V in terms of Kronecker matrices, but the result is still essentially the same.

Lemma 2

The eigenvalues of V are \(\nu _2\) (\(p_1-1\) times), \(\xi _2\) (\(p_2-1\) times) and two additional eigenvalues \(\nu _1^*\) and \(\xi _1^*\) given by

$$\begin{aligned} \frac{\nu _1+\xi _1}{2} \pm \frac{1}{2}\sqrt{(\nu _1-\xi _1)^2 + 4\gamma ^2 p_1p_2}. \end{aligned}$$

The sum of the eigenvalues is

$$\begin{aligned} {\text {tr}}V= \nu _1 + (p_1-1)\nu _2 + \xi _1 + (p_2-1)\xi _2, \end{aligned}$$

and V is positive semidefinite if and only if \(\nu _1\), \(\nu _2\), \(\xi _1\), and \(\xi _2\) are all \(\ge 0\) and in addition \(\nu _1\xi _1\ge \gamma ^2p_1p_2\).

Proof

We write

$$\begin{aligned} V-\lambda I_p = \begin{pmatrix} \bar{\nu }_1 J_1 + \bar{\nu }_2 (I_{p_1}-J_1) &{} \gamma \imath _1^{}\imath _2' \\ \gamma \imath _2^{}\imath _1' &{} \bar{\xi }_1 J_2 + \bar{\xi }_2 (I_{p_2}-J_2) \end{pmatrix}, \end{aligned}$$

where

$$\begin{aligned} \bar{\nu }_1=\nu _1-\lambda , \qquad \bar{\nu }_2=\nu _2-\lambda , \qquad \bar{\xi }_1=\xi _1-\lambda , \qquad \bar{\xi }_2=\xi _2-\lambda . \end{aligned}$$

The determinant is

$$\begin{aligned} |V-\lambda I_p|&= |\bar{\nu }_1J_1+\bar{\nu }_2(I_{p_1}-J_1)| \\&\quad \times |\bar{\xi }_1J_2+\bar{\xi }_2(I_{p_2}-J_2) - \gamma ^2\imath _2^{}\imath _1'[\bar{\nu }_1J_1+\bar{\nu }_2(I_{p_1}-J_1)]^{-1}\imath _1^{}\imath _2'|\\&= \bar{\nu }_1 {\bar{\nu }_2}^{p_1-1} \left| \bar{\xi }_1J_2+\bar{\xi }_2(I_{p_2}-J_2) - \gamma ^2\imath _2^{}\imath _1' \left( \frac{1}{\bar{\nu }_1}J_1+\frac{1}{\bar{\nu }_2}(I_{p_1}-J_1)\right) \imath _1^{}\imath _2'\right| \\&=\bar{\nu }_1{\bar{\nu }_2}^{p_1-1} |\bar{\xi }_1 J_2+\bar{\xi }_2 (I_{p_2}-J_2) - (\gamma ^2/\bar{\nu }_1)p_1p_2J_2| \\&=\bar{\nu }_1{\bar{\nu }_2}^{p_1-1} \left| \frac{\bar{\nu }_1\bar{\xi }_1 - \gamma ^2p_1p_2}{\bar{\nu }_1}J_2 + \bar{\xi }_2 (I_{p_2}-J_2)\right| \\&={\bar{\nu }_2}^{p_1-1}{\bar{\xi }_2}^{p_2-1}\left( \bar{\nu }_1\bar{\xi }_1 - \gamma ^2p_1p_2\right) \\&=(\nu _2-\lambda )^{p_1-1}(\xi _2-\lambda )^{p_2-1}\left( (\nu _1-\lambda )(\xi _1-\lambda ) - \gamma ^2p_1p_2\right) . \end{aligned}$$

Hence the eigenvalues of V are \(\nu _2\) (\(p_1-1\) times), \(\xi _2\) (\(p_2-1\) times), and the two solutions \(\nu _1^*\) and \(\xi _1^*\) of the quadratic equation \((\nu _1-\lambda )(\xi _1-\lambda ) - \gamma ^2p_1p_2=0\). Note that \(\nu _1^*+\xi _1^*=\nu _1+\xi _1\). In the special case \(\gamma =0\) we have \(\nu _1^*=\nu _1\) and \(\xi _1^*=\xi _1\).

The sum of the eigenvalues is

$$\begin{aligned} (p_1-1)\nu _2 + (p_2-1)\xi _2 + \nu _1 + \xi _1, \end{aligned}$$

and it is easy to verify that this equals the trace of V, as of course it should. \(\square \)
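Lemma 2 can likewise be verified numerically for unequal block sizes. The following sketch (again our illustration, with arbitrary parameter values) assembles V as in (32) and compares its computed spectrum with the one the lemma predicts.

```python
import numpy as np

# Numerical check of Lemma 2 (illustrative; parameter values are arbitrary).
p1, p2 = 3, 5
nu1, nu2, xi1, xi2, gamma = 2.0, 0.7, 1.5, 0.3, 0.2

i1, i2 = np.ones((p1, 1)), np.ones((p2, 1))
J1, J2 = i1 @ i1.T / p1, i2 @ i2.T / p2

V = np.block([
    [nu1 * J1 + nu2 * (np.eye(p1) - J1), gamma * i1 @ i2.T],
    [gamma * i2 @ i1.T, xi1 * J2 + xi2 * (np.eye(p2) - J2)],
])

# Predicted: nu2 (p1 - 1 times), xi2 (p2 - 1 times), and the two roots of
# (nu1 - lam)(xi1 - lam) = gamma^2 * p1 * p2.
root = 0.5 * np.sqrt((nu1 - xi1)**2 + 4 * gamma**2 * p1 * p2)
predicted = np.sort(np.concatenate([
    np.full(p1 - 1, nu2), np.full(p2 - 1, xi2),
    [(nu1 + xi1) / 2 - root, (nu1 + xi1) / 2 + root],
]))
assert np.allclose(np.sort(np.linalg.eigvalsh(V)), predicted)
```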

B Stein’s Lemma

Stein’s lemma (Stein 1981) is a rather surprising and strong result. We first consider the univariate, then the multivariate case. The generalization is not straightforward.

Lemma 3

Let \(x\sim {\text {N}}(\theta ,1)\) and let \(h:\mathfrak {R}\rightarrow \mathfrak {R}\) be an absolutely continuous function with derivative \(h'\). Assume that \({\text {E}}|h'(x)|<\infty \). Then,

$$\begin{aligned} {\text {cov}}(h(x),x)={\text {E}}[h(x)(x-\theta )]={\text {E}}[h'(x)]. \end{aligned}$$

Proof

We write

$$\begin{aligned}{}[h(x)\phi (x-\theta )]'&= h'(x)\phi (x-\theta ) + h(x)\phi '(x-\theta )\\&= h'(x)\phi (x-\theta ) - h(x)(x-\theta )\phi (x-\theta ), \end{aligned}$$

where \(\phi \) denotes the standard normal density. Integrating over \(\mathfrak {R}\), and noting that \({\text {E}}|h'(x)|<\infty \) ensures that the boundary term \(h(x)\phi (x-\theta )\) vanishes at \(\pm \infty \), gives

$$\begin{aligned} 0=\Bigl .h(x)\phi (x-\theta )\Bigr |_{-\infty }^{\infty } = {\text {E}}[h'(x)]-{\text {E}}[h(x)(x-\theta )]. \end{aligned}$$

\(\square \)

Note the requirement of absolute continuity, which imposes a smoothness property on h that is stronger than (uniform) continuity but weaker than continuous differentiability; it guarantees that h is differentiable almost everywhere. In applications it is important to place minimal restrictions on the function h, allowing it, for example, to be kinked.

The univariate version of Stein’s lemma is a powerful result with many nontrivial applications. As a simple example, let \(h(x)=x^m\). Then we immediately obtain all moments of the normal distribution through the recursion \({\text {E}}[x^{m+1}]=\theta {\text {E}}[x^{m}] + m{\text {E}}[x^{m-1}]\).
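Both the identity of Lemma 3 and this moment recursion are easy to confirm by simulation. The sketch below is our illustration, with \(\theta \) and the sample size chosen arbitrarily; it uses the kinked but absolutely continuous function \(h(x)=\max (x,0)\), whose derivative is the indicator of \(x>0\) almost everywhere, and then the case \(h(x)=x^2\) of the recursion.

```python
import numpy as np

# Monte Carlo check of Lemma 3 (illustrative; theta, n are arbitrary).
rng = np.random.default_rng(1)
theta, n = 1.3, 10**7
x = theta + rng.standard_normal(n)

# Kinked but absolutely continuous h(x) = max(x, 0), h'(x) = 1{x > 0} a.e.
lhs = np.mean(np.maximum(x, 0) * (x - theta))   # E[h(x)(x - theta)]
rhs = np.mean(x > 0)                            # E[h'(x)]
print(lhs, rhs)                                 # agree up to simulation noise

# Moment recursion from h(x) = x^m with m = 2:
# E[x^3] = theta E[x^2] + 2 E[x] (exact value: theta^3 + 3 theta).
print(np.mean(x**3), theta * np.mean(x**2) + 2 * np.mean(x))
```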

In the multivariate case we have \(x\sim {\text {N}}(\theta ,I_p)\) with \(p\ge 2\) and we need the concept of ‘almost differentiability’ (in Stein’s terminology). We write \(x = (x_i,x_{-i})\) to decompose a point \(x\in \mathfrak {R}^p\) in terms of its ith component \(x_i\) and all other components \(x_{-i}\). Thus, \(h(\cdot ,x_{-i})\) refers to h as a function of its ith argument with all other arguments fixed at \(x_{-i}\). Then h is ‘almost differentiable’ if for each \(i=1,\ldots ,p\) and almost every \(x_{-i}\in \mathfrak {R}^{p-1}\) the function \(h(\cdot ,x_{-i}):\mathfrak {R}\rightarrow \mathfrak {R}\) is absolutely continuous. An almost differentiable function h has partial derivatives almost everywhere.

Given this multivariate extension of the concept of absolute continuity, Stein’s lemma reads as follows.

Lemma 4

(Stein) Let \(x\sim {\text {N}}(\theta ,I_p)\) with \(p\ge 2\) and let \(h:\mathfrak {R}^p\rightarrow \mathfrak {R}\) be almost differentiable with \({\text {E}}\Vert \nabla h(x)\Vert <\infty \), where \(\nabla h(x)\) denotes the gradient of h(x). Then,

$$\begin{aligned} {\text {E}}\left[ h(x)(x-\theta )\right] ={\text {E}}[\nabla h(x)]. \end{aligned}$$

Proof

See Stein (1981, Lemma 2). \(\square \)

Stein’s result can be generalized straightforwardly to the case where h is a vector function.

Lemma 5

Let \(x\sim {\text {N}}(\theta ,I_p)\) with \(p\ge 2\) and let \(h:\mathfrak {R}^p\rightarrow \mathfrak {R}^q\). If \(h_j:\mathfrak {R}^p\rightarrow \mathfrak {R}\) is almost differentiable with \({\text {E}}\Vert \nabla h_j(x)\Vert <\infty \) for all \(j=1,\ldots ,q\), then

$$\begin{aligned} {\text {E}}\left[ h(x)(x-\theta )'\right] ={\text {E}}\left[ \frac{\partial h(x)}{\partial x'}\right] . \end{aligned}$$

Proof

This follows from Lemma 4 by considering each row of \(h(x)(x-\theta )'\) separately. Then,

$$\begin{aligned} {\text {E}}\left[ h_j(x)(x-\theta )'\right] ={\text {E}}\left[ \frac{\partial h_j(x)}{\partial x'}\right] \end{aligned}$$

for each j and the result follows. \(\square \)
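To tie Lemma 5 back to the shrinkage theme of the paper, one can check the matrix identity by simulation for the James–Stein-type function \(h(x)=(1-a/\Vert x\Vert ^2)x\), whose Jacobian is \(\partial h(x)/\partial x'=(1-a/\Vert x\Vert ^2)I_p+(2a/\Vert x\Vert ^4)xx'\). The sketch below is our illustration; p, a, and \(\theta \) are arbitrary choices.

```python
import numpy as np

# Monte Carlo check of Lemma 5 for the James-Stein-type function
# h(x) = (1 - a/||x||^2) x (illustrative; p, a, theta are arbitrary).
rng = np.random.default_rng(2)
p, a, n = 4, 2.0, 10**6
theta = np.array([1.0, -0.5, 2.0, 0.0])
x = theta + rng.standard_normal((n, p))

s = np.sum(x**2, axis=1)                         # ||x||^2 for each draw
h = (1 - a / s)[:, None] * x

lhs = (h.T @ (x - theta)) / n                    # E[h(x)(x - theta)']
jac = ((1 - a / s).mean() * np.eye(p)            # E[dh(x)/dx']
       + 2 * a * (x.T @ (x / (s**2)[:, None])) / n)
print(np.round(lhs, 3))
print(np.round(jac, 3))                          # agree up to simulation noise
```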


Cite this article

De Luca, G., Magnus, J.R. Weak Versus Strong Dominance of Shrinkage Estimators. J. Quant. Econ. 19 (Suppl 1), 239–266 (2021). https://doi.org/10.1007/s40953-021-00270-y
