Abstract
We consider estimation of the mean of a multivariate normal distribution with known variance. Most studies consider the risk of competing estimators, that is, the trace of the mean squared error matrix. In contrast, we consider the whole mean squared error matrix, in particular its eigenvalues. We prove that there are only two distinct eigenvalues and apply our findings to the James–Stein and Thompson classes of estimators. It turns out that the famous Stein paradox is no longer a paradox when we consider the whole mean squared error matrix rather than only its trace.
Notes
The maximum regret 0.1815 is much smaller than the maximum regret 0.4251 reported for the same estimator in Magnus (2002, p. 230); this appears to be a computational or typographical error.
References
Abadir, K.M., and J.R. Magnus. 2005. Matrix algebra. Cambridge University Press.
Baranchik, A.J. 1964. Multiple regression and estimation of the mean of a multivariate normal distribution. Technical report, No. 51, Department of Statistics, Stanford University.
Baranchik, A.J. 1970. A family of minimax estimators of the mean of a multivariate normal distribution. Annals of Mathematical Statistics 41: 642–645.
Bock, M.E. 1975. Minimax estimators of the mean of a multivariate normal distribution. The Annals of Statistics 3: 209–218.
Candès, E.J., C.A. Sing-Long, and J.D. Trzasko. 2013. Unbiased risk estimates for singular value thresholding and spectral estimators. IEEE Transactions on Signal Processing 61 (19): 4643–4657.
Casella, G. 1990. Estimators with nondecreasing risk: application of a chi-squared identity. Statistics and Probability Letters 10: 107–109.
Efron, B.E., and C. Morris. 1972. Limiting the risk of Bayes and empirical Bayes estimators, Part II: the empirical Bayes case. Journal of the American Statistical Association 67: 130–139.
Efron, B.E., and C. Morris. 1976. Families of minimax estimators of the mean of a multivariate normal distribution. Annals of Statistics 4: 11–21.
Farebrother, R.W. 1975. The minimum mean square error linear estimator and ridge regression. Technometrics 17: 127–128.
Hansen, B.E. 2015. Shrinkage efficiency bounds. Econometric Theory 31: 860–879.
Hansen, B.E. 2016. The risk of James-Stein and Lasso shrinkage. Econometric Reviews 35: 1456–1470.
Hoffmann, K. 2000. Stein estimation—a review. Statistical Papers 41: 127–158.
James, W., and C.M. Stein. 1961. Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, Vol. 1, pp. 361–379. University of California Press.
Johnstone, I.M. 2019. Gaussian estimation: sequence and wavelet models. Draft version, available from statweb.stanford.edu.
Lehmann, E.L., and G. Casella. 1998. Theory of point estimation, 2nd ed. Berlin: Springer.
Magnus, J.R. 1982. Multivariate error components analysis of linear and nonlinear regression models by maximum likelihood. Journal of Econometrics 19: 239–285.
Magnus, J.R. 2002. Estimation of the mean of a univariate normal distribution with known variance. Econometrics Journal 5: 225–236.
Magnus, J.R., and G. De Luca. 2016. Weighted-average least squares: a review. Journal of Economic Surveys 30: 117–148.
Mikkelsen, F.R., and N.R. Hansen. 2018. Degrees of freedom for piecewise Lipschitz estimators. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 54: 819–841.
Saleh, A.K.M.E. 2006. Theory of preliminary test and Stein-type estimation with applications. Hoboken: Wiley.
Shao, P.Y.-S., and W.E. Strawderman. 1994. Improving on the James-Stein positive-part estimator. Annals of Statistics 22: 1517–1538.
Stein, C.M. 1956. Inadmissibility of the usual estimator for the mean of a multivariate distribution. In Proceedings of the third Berkeley symposium on mathematical statistics and probability, Vol. 1, pp. 197–206. University of California Press.
Stein, C.M. 1981. Estimation of the mean of a multivariate normal distribution. The Annals of Statistics 9: 1135–1151.
Strawderman, W.E., and A. Cohen. 1971. Admissibility of estimators of the mean vector of a multivariate normal distribution with quadratic loss. Annals of Mathematical Statistics 42: 270–296.
Theil, H. 1971. Principles of econometrics. Hoboken: Wiley.
Thompson, J.R. 1968. Some shrinkage techniques for estimating the mean. Journal of the American Statistical Association 63: 113–122.
Thompson, J.R. 1989. Empirical model building. Hoboken: Wiley.
Tibshirani, R.J. 2015. Degrees of freedom and model search. Statistica Sinica 25: 1265–1296.
Van der Vaart, A.W. 1998. Asymptotic statistics. Cambridge: Cambridge University Press.
Xu, H., and A. Namba. 2018. MSE performance of the weighted average estimators consisting of shrinkage estimators when each individual regression coefficient is estimated. Communications in Statistics - Theory and Methods. https://doi.org/10.1080/03610926.2018.1475569.
Acknowledgements
We are grateful to the Guest Editors of this special issue and to two referees for comments and suggestions. We are especially grateful to Akio Namba for providing the key to proving Proposition 1. Without his help this result would have remained a conjecture. Giuseppe De Luca acknowledges financial support from the MIUR PRIN PRJ-0324.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
A Some Results Involving Idempotent Matrices
Our first result is not new.
Lemma 1
Let A be a symmetric idempotent \(p\times p\) matrix of rank \(r(A)=r\). Then \(r(I_p-A)=p-r\) and \(A(I_p-A)=0\). Next, let

$$V = \nu _1 A + \nu _2(I_p-A).$$

Then the eigenvalues of V are \(\nu _1\) (multiplicity r) and \(\nu _2\) (multiplicity \(p-r\)), its determinant is \(|V|=\nu _1^{r}\nu _2^{p-r}\), and its inverse is \(V^{-1}=(1/\nu _1) A + (1/\nu _2)(I_p-A)\) when \(\nu _1\ne 0\) and \(\nu _2\ne 0\).
Proof
This is a simple version of a much more general result, see Abadir and Magnus (2005, Exercises 8.72 and 8.73). \(\square \)
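The spectral structure in Lemma 1 is easy to verify numerically. Below is a minimal sketch (hypothetical example values, not taken from the paper) using the rank-one idempotent \(A=\imath \imath '/p\): vectors in the column space of A are eigenvectors for \(\nu _1\), vectors orthogonal to it are eigenvectors for \(\nu _2\), and the stated inverse formula holds.

```python
# Sketch of Lemma 1 with the rank-1 idempotent A = ii'/p and V = nu1*A + nu2*(I - A).
# The values p, nu1, nu2 are illustrative, not from the paper.
p, nu1, nu2 = 4, 3.0, 0.5
A = [[1.0 / p for _ in range(p)] for _ in range(p)]                 # A = ii'/p, idempotent, rank 1
I = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

V = [[nu1 * A[i][j] + nu2 * (I[i][j] - A[i][j]) for j in range(p)] for i in range(p)]
Vinv = [[A[i][j] / nu1 + (I[i][j] - A[i][j]) / nu2 for j in range(p)] for i in range(p)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# i = (1,...,1)' spans the column space of A, so V i = nu1 * i.
ones = [1.0] * p
assert all(abs(y - nu1) < 1e-12 for y in matvec(V, ones))

# w orthogonal to i lies in the null space of A, so V w = nu2 * w.
w = [1.0, -1.0, 0.0, 0.0]
assert all(abs(y - nu2 * wi) < 1e-12 for y, wi in zip(matvec(V, w), w))

# Inverse formula: V^{-1} = (1/nu1) A + (1/nu2)(I - A), so V @ Vinv = I.
VVinv = [[sum(V[i][k] * Vinv[k][j] for k in range(p)) for j in range(p)] for i in range(p)]
assert all(abs(VVinv[i][j] - I[i][j]) < 1e-12 for i in range(p) for j in range(p))
print("Lemma 1 checks passed")
```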
We now consider an extension in which the \(p\times p\) matrix V is partitioned into blocks of dimensions \(p_1\) and \(p_2\) as follows:

$$V = \begin{pmatrix} \nu _1 J_1 + \nu _2(I_{p_1}-J_1) & \gamma \imath _1^{}\imath _2' \\ \gamma \imath _2^{}\imath _1' & \xi _1 J_2 + \xi _2(I_{p_2}-J_2) \end{pmatrix},$$

where \(\imath _1^{}=(1,1,\ldots ,1)'\) and \(\imath _2^{}=(1,1,\ldots ,1)'\) have dimensions \(p_1\) and \(p_2\) respectively, and \(J_1=\imath _1^{}\imath _1'/p_1\) and \(J_2=\imath _2^{}\imath _2'/p_2\).
In the special case \(p_1=p_2=p\) we can write \(V= V_1 \otimes J + V_2\otimes (I_p-J)\), where

$$V_1 = \begin{pmatrix} \nu _1 & \gamma p \\ \gamma p & \xi _1 \end{pmatrix}, \qquad V_2 = \begin{pmatrix} \nu _2 & 0 \\ 0 & \xi _2 \end{pmatrix}, \qquad J = \imath \imath '/p,$$

which implies that the eigenvalues of V are given by \(\nu _2\) (\(p-1\) times), \(\xi _2\) (\(p-1\) times), and the two eigenvalues of \(V_1\) (Magnus 1982, Lemma 2.1).
When \(p_1\ne p_2\) we cannot write V in terms of Kronecker matrices, but the result is still essentially the same.
Lemma 2
The eigenvalues of V are \(\nu _2\) (\(p_1-1\) times), \(\xi _2\) (\(p_2-1\) times) and two additional eigenvalues \(\nu _1^*\) and \(\xi _1^*\) given by

$$\nu _1^*,\ \xi _1^* = \frac{\nu _1+\xi _1}{2} \pm \frac{1}{2}\sqrt{(\nu _1-\xi _1)^2 + 4\gamma ^2p_1p_2}.$$
The sum of the eigenvalues is

$$\nu _1 + \xi _1 + (p_1-1)\nu _2 + (p_2-1)\xi _2,$$

and V is positive semidefinite if and only if \(\nu _1\), \(\nu _2\), \(\xi _1\), and \(\xi _2\) are all \(\ge 0\) and in addition \(\nu _1\xi _1\ge \gamma ^2p_1p_2\).
Proof
We write

$$V - \lambda I_p = \begin{pmatrix} A_1 & \gamma \imath _1^{}\imath _2' \\ \gamma \imath _2^{}\imath _1' & A_2 \end{pmatrix},$$

where

$$A_1 = (\nu _1-\lambda )J_1 + (\nu _2-\lambda )(I_{p_1}-J_1), \qquad A_2 = (\xi _1-\lambda )J_2 + (\xi _2-\lambda )(I_{p_2}-J_2).$$

The determinant is

$$|V - \lambda I_p| = (\nu _2-\lambda )^{p_1-1}(\xi _2-\lambda )^{p_2-1}\left[ (\nu _1-\lambda )(\xi _1-\lambda ) - \gamma ^2p_1p_2\right] .$$
Hence the eigenvalues of V are \(\nu _2\) (\(p_1-1\) times), \(\xi _2\) (\(p_2-1\) times), and the two solutions \(\nu _1^*\) and \(\xi _1^*\) of the quadratic equation \((\nu _1-\lambda )(\xi _1-\lambda ) - \gamma ^2p_1p_2=0\). Note that \(\nu _1^*+\xi _1^*=\nu _1+\xi _1\). In the special case \(\gamma =0\) we have \(\nu _1^*=\nu _1\) and \(\xi _1^*=\xi _1\).
The sum of the eigenvalues is

$$(p_1-1)\nu _2 + (p_2-1)\xi _2 + \nu _1^* + \xi _1^* = (p_1-1)\nu _2 + (p_2-1)\xi _2 + \nu _1 + \xi _1,$$

and it is easy to verify that this equals the trace of V, as of course it should. \(\square \)
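Lemma 2 can be checked numerically. The sketch below uses hypothetical values (\(p_1=2\), \(p_2=3\), and arbitrary \(\nu _1,\nu _2,\xi _1,\xi _2,\gamma \), none taken from the paper), builds the partitioned V, and verifies that \(\nu _1^*\) solves the quadratic \((\nu _1-\lambda )(\xi _1-\lambda )=\gamma ^2p_1p_2\) and is an eigenvalue of V, while vectors orthogonal to \(\imath _1\) in the first block give the eigenvalue \(\nu _2\).

```python
# Numeric sketch of Lemma 2 with illustrative values (not from the paper).
import math

p1, p2 = 2, 3
nu1, nu2, xi1, xi2, gamma = 4.0, 1.0, 3.0, 2.0, 0.5
p = p1 + p2

# Build V = [[nu1*J1 + nu2*(I - J1), gamma*i1 i2'], [gamma*i2 i1', xi1*J2 + xi2*(I - J2)]].
V = [[0.0] * p for _ in range(p)]
for i in range(p1):
    for j in range(p1):
        V[i][j] = nu1 / p1 + (nu2 if i == j else 0.0) - nu2 / p1
for i in range(p2):
    for j in range(p2):
        V[p1 + i][p1 + j] = xi1 / p2 + (xi2 if i == j else 0.0) - xi2 / p2
for i in range(p1):
    for j in range(p2):
        V[i][p1 + j] = V[p1 + j][i] = gamma

# nu1*, xi1* as given in the lemma; they solve (nu1 - l)(xi1 - l) = gamma^2 p1 p2.
disc = math.sqrt((nu1 - xi1) ** 2 + 4 * gamma ** 2 * p1 * p2)
nu1s = (nu1 + xi1 + disc) / 2
xi1s = (nu1 + xi1 - disc) / 2
assert abs((nu1 - nu1s) * (xi1 - nu1s) - gamma ** 2 * p1 * p2) < 1e-10

# Eigenvector for nu1* has the form u = (a*i1, b*i2); the eigen-equation
# reduces to the 2x2 system [[nu1, gamma*p2], [gamma*p1, xi1]] (a,b)' = nu1* (a,b)'.
a, b = gamma * p2, nu1s - nu1
u = [a] * p1 + [b] * p2
Vu = [sum(V[i][j] * u[j] for j in range(p)) for i in range(p)]
assert all(abs(Vu[i] - nu1s * u[i]) < 1e-10 for i in range(p))

# Vectors orthogonal to i1 in the first block are eigenvectors for nu2.
w = [1.0, -1.0] + [0.0] * p2
Vw = [sum(V[i][j] * w[j] for j in range(p)) for i in range(p)]
assert all(abs(Vw[i] - nu2 * w[i]) < 1e-10 for i in range(p))
print("Lemma 2 checks passed")
```

Note also that \(\nu _1^*+\xi _1^*=\nu _1+\xi _1\), so the trace identity in the proof holds automatically.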
B Stein’s Lemma
Stein’s lemma (Stein 1981) is a rather surprising and strong result. We first consider the univariate case and then the multivariate case; the generalization is not straightforward.
Lemma 3
Let \(x\sim {\text {N}}(\theta ,1)\) and let \(h:\mathfrak {R}\rightarrow \mathfrak {R}\) be an absolutely continuous function with derivative \(h'\). Assume that \({\text {E}}|h'(x)|<\infty \). Then,

$${\text {E}}[h(x)(x-\theta )] = {\text {E}}[h'(x)].$$
Proof
We write

$$(x-\theta )\phi (x-\theta ) = -\frac{d\,\phi (x-\theta )}{dx},$$

where \(\phi \) denotes the standard normal density. Integrating by parts gives

$${\text {E}}[h(x)(x-\theta )] = -\int h(x)\,d\phi (x-\theta ) = \int h'(x)\phi (x-\theta )\,dx = {\text {E}}[h'(x)].$$
\(\square \)
Note the requirement of absolute continuity, which imposes a smoothness property on h that is stronger than (uniform) continuity but weaker than continuous differentiability. It guarantees that h is differentiable almost everywhere. In applications it is important to place minimal restrictions on the function h, for example allowing it to be kinked.
The univariate version of Stein’s lemma is a powerful result with many nontrivial applications. As a simple example, let \(h(x)=x^m\). Then we immediately obtain all moments of the normal distribution through the recursion \({\text {E}}[x^{m+1}]=\theta {\text {E}}[x^{m}] + m{\text {E}}[x^{m-1}]\).
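The recursion above can be checked numerically. The sketch below (with an arbitrary illustrative value \(\theta =0.7\), not from the paper) computes \({\text {E}}[x^m]\) for \(x\sim {\text {N}}(\theta ,1)\) by direct quadrature and verifies \({\text {E}}[x^{m+1}]=\theta {\text {E}}[x^{m}] + m{\text {E}}[x^{m-1}]\) for several m.

```python
# Check the moment recursion from Stein's lemma against Simpson's-rule quadrature.
# theta = 0.7 is an arbitrary illustrative value, not taken from the paper.
import math

theta = 0.7

def moment(m, n=20001, half_width=12.0):
    """E[x^m] for x ~ N(theta, 1), computed by Simpson's rule on [theta - hw, theta + hw]."""
    a, b = theta - half_width, theta + half_width
    h = (b - a) / (n - 1)
    total = 0.0
    for i in range(n):
        x = a + i * h
        f = x ** m * math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)
        w = 1 if i in (0, n - 1) else (4 if i % 2 == 1 else 2)
        total += w * f
    return total * h / 3

# Stein's lemma with h(x) = x^m gives E[x^{m+1}] = theta*E[x^m] + m*E[x^{m-1}].
for m in range(1, 6):
    lhs = moment(m + 1)
    rhs = theta * moment(m) + m * moment(m - 1)
    assert abs(lhs - rhs) < 1e-6, (m, lhs, rhs)
print("moment recursion verified")
```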
In the multivariate case we have \(x\sim {\text {N}}(\theta ,I_p)\) with \(p\ge 2\) and we need the concept of ‘almost differentiability’ (in Stein’s terminology). We write \(x = (x_i,x_{-i})\) to decompose a point \(x\in \mathfrak {R}^p\) in terms of its ith component \(x_i\) and all other components \(x_{-i}\). Thus, \(h(\cdot ,x_{-i})\) refers to h as a function of its ith argument with all other arguments fixed at \(x_{-i}\). Then h is ‘almost differentiable’ if for each \(i=1,\ldots ,p\) and almost every \(x_{-i}\in \mathfrak {R}^{p-1}\) the function \(h(\cdot ,x_{-i}):\mathfrak {R}\rightarrow \mathfrak {R}\) is absolutely continuous. An almost differentiable function h has partial derivatives almost everywhere.
Given this multivariate extension of the concept of absolute continuity, Stein’s lemma reads as follows.
Lemma 4
(Stein) Let \(x\sim {\text {N}}(\theta ,I_p)\) with \(p\ge 2\) and let \(h:\mathfrak {R}^p\rightarrow \mathfrak {R}\) be almost differentiable with \({\text {E}}\Vert \nabla h(x)\Vert <\infty \), where \(\nabla h(x)\) denotes the gradient of h(x). Then,

$${\text {E}}[h(x)(x-\theta )] = {\text {E}}[\nabla h(x)].$$
Proof
See Stein (1981, Lemma 2). \(\square \)
Stein’s result can be generalized straightforwardly to the case where h is a vector function.
Lemma 5
Let \(x\sim {\text {N}}(\theta ,I_p)\) with \(p\ge 2\) and let \(h:\mathfrak {R}^p\rightarrow \mathfrak {R}^q\). If \(h_j:\mathfrak {R}^p\rightarrow \mathfrak {R}\) is almost differentiable with \({\text {E}}\Vert \nabla h_j(x)\Vert <\infty \) for all \(j=1,\ldots ,q\), then

$${\text {E}}[h(x)(x-\theta )'] = {\text {E}}\left[ \frac{\partial h(x)}{\partial x'}\right] .$$
Proof
This follows from Lemma 4 by considering each row of \(h(x)(x-\theta )'\) separately. Then,

$${\text {E}}[h_j(x)(x-\theta )'] = {\text {E}}[\nabla h_j(x)]'$$

for each j and the result follows. \(\square \)
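A quick Monte Carlo sanity check of Lemma 5 is possible for a linear \(h(x)=Bx\) (our illustrative choice, not from the paper): the Jacobian is then the constant matrix B, so \({\text {E}}[h(x)(x-\theta )']\) should equal B. The values of \(\theta \) and B below are arbitrary.

```python
# Monte Carlo sketch of Lemma 5 for linear h(x) = B x, where the Jacobian is B.
# theta and B are arbitrary illustrative values, not taken from the paper.
import random

random.seed(0)
p, q = 3, 2
theta = [0.2, -0.1, 0.3]
B = [[1.0, 0.0, 0.5],
     [0.5, -1.0, 0.0]]          # q x p Jacobian of h

n = 100_000
S = [[0.0] * p for _ in range(q)]
for _ in range(n):
    x = [t + random.gauss(0.0, 1.0) for t in theta]      # x ~ N(theta, I_p)
    h = [sum(B[j][k] * x[k] for k in range(p)) for j in range(q)]
    for j in range(q):
        for i in range(p):
            S[j][i] += h[j] * (x[i] - theta[i])

# est approximates E[h(x)(x - theta)'], which the lemma says equals B here.
est = [[S[j][i] / n for i in range(p)] for j in range(q)]
assert all(abs(est[j][i] - B[j][i]) < 0.05 for j in range(q) for i in range(p))
print("Lemma 5 Monte Carlo check passed")
```

The tolerance 0.05 is loose relative to the Monte Carlo standard error (of order \(1/\sqrt{n}\)), so the check is robust to the choice of seed.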
Cite this article
De Luca, G., Magnus, J.R. Weak Versus Strong Dominance of Shrinkage Estimators. J. Quant. Econ. 19 (Suppl 1), 239–266 (2021). https://doi.org/10.1007/s40953-021-00270-y