Abstract
We consider estimation of the mean of a multivariate normal distribution with known variance. Most studies consider the risk of competing estimators, that is, the trace of the mean squared error matrix. In contrast, we consider the whole mean squared error matrix, in particular its eigenvalues. We prove that there are only two distinct eigenvalues and apply our findings to the James–Stein and Thompson classes of estimators. It turns out that the famous Stein paradox is no longer a paradox when we consider the whole mean squared error matrix rather than only its trace.
Notes
The maximum regret 0.1815 is much smaller than the maximum regret 0.4251 reported for the same estimator in Magnus (2002, p. 230); this appears to be a computational or typographical error.
References
Abadir, K.M., and J.R. Magnus. 2005. Matrix algebra. Cambridge University Press.
Baranchik, A.J. 1964. Multiple regression and estimation of the mean of a multivariate normal distribution. Technical report, No. 51, Department of Statistics, Stanford University.
Baranchik, A.J. 1970. A family of minimax estimators of the mean of a multivariate normal distribution. Annals of Mathematical Statistics 41: 642–645.
Bock, M.E. 1975. Minimax estimators of the mean of a multivariate normal distribution. The Annals of Statistics 3: 209–218.
Candès, E.J., C.A. Sing-Long, and J.D. Trzasko. 2013. Unbiased risk estimates for singular value thresholding and spectral estimators. IEEE Transactions on Signal Processing 61 (19): 4643–4657.
Casella, G. 1990. Estimators with nondecreasing risk: application of a chi-squared identity. Statistics and Probability Letters 10: 107–109.
Efron, B.E., and C. Morris. 1972. Limiting the risk of Bayes and empirical Bayes estimators, Part II: the empirical Bayes case. Journal of the American Statistical Association 67: 130–139.
Efron, B.E., and C. Morris. 1976. Families of minimax estimators of the mean of a multivariate normal distribution. Annals of Statistics 4: 11–21.
Farebrother, R.W. 1975. The minimum mean square error linear estimator and ridge regression. Technometrics 17: 127–128.
Hansen, B.E. 2015. Shrinkage efficiency bounds. Econometric Theory 31: 860–879.
Hansen, B.E. 2016. The risk of James-Stein and Lasso shrinkage. Econometric Reviews 35: 1456–1470.
Hoffmann, K. 2000. Stein estimation—a review. Statistical Papers 41: 127–158.
James, W., and C.M. Stein. 1961. Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, Vol. 1, pp. 361–379. University of California Press.
Johnstone, I.M. 2019. Gaussian estimation: sequence and wavelet models. Draft version, available from statweb.stanford.edu.
Lehmann, E.L., and G. Casella. 1998. Theory of point estimation, 2nd ed. Berlin: Springer.
Magnus, J.R. 1982. Multivariate error components analysis of linear and nonlinear regression models by maximum likelihood. Journal of Econometrics 19: 239–285.
Magnus, J.R. 2002. Estimation of the mean of a univariate normal distribution with known variance. Econometrics Journal 5: 225–236.
Magnus, J.R., and G. De Luca. 2016. Weighted-average least squares: a review. Journal of Economic Surveys 30: 117–148.
Mikkelsen, F.R., and N.R. Hansen. 2018. Degrees of freedom for piecewise Lipschitz estimators. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 54: 819–841.
Saleh, A.K.M.E. 2006. Theory of preliminary test and Stein-type estimation with applications. Hoboken: Wiley.
Shao, P.Y.-S., and W.E. Strawderman. 1994. Improving on the James-Stein positive-part estimator. Annals of Statistics 22: 1517–1538.
Stein, C.M. 1956. Inadmissibility of the usual estimator for the mean of a multivariate distribution. In Proceedings of the third Berkeley symposium on mathematical statistics and probability, Vol. 1, pp. 197–206. University of California Press.
Stein, C.M. 1981. Estimation of the mean of a multivariate normal distribution. The Annals of Statistics 9: 1135–1151.
Strawderman, W.E., and A. Cohen. 1971. Admissibility of estimators of the mean vector of a multivariate normal distribution with quadratic loss. Annals of Mathematical Statistics 42: 270–296.
Theil, H. 1971. Principles of econometrics. Hoboken: Wiley.
Thompson, J.R. 1968. Some shrinkage techniques for estimating the mean. Journal of the American Statistical Association 63: 113–122.
Thompson, J.R. 1989. Empirical model building. Hoboken: Wiley.
Tibshirani, R.J. 2015. Degrees of freedom and model search. Statistica Sinica 25: 1265–1296.
Van der Vaart, A.W. 1998. Asymptotic statistics. Cambridge: Cambridge University Press.
Xu, H., and A. Namba. 2018. MSE performance of the weighted average estimators consisting of shrinkage estimators when each individual regression coefficient is estimated. Communications in Statistics - Theory and Methods. https://doi.org/10.1080/03610926.2018.1475569.
Acknowledgements
We are grateful to the Guest Editors of this special issue and to two referees for comments and suggestions. We are especially grateful to Akio Namba for providing the key to proving Proposition 1. Without his help this result would have remained a conjecture. Giuseppe De Luca acknowledges financial support from the MIUR PRIN PRJ-0324.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
A Some Results Involving Idempotent Matrices
Our first result is not new.
Lemma 1
Let A be a symmetric idempotent \(p\times p\) matrix of rank \(r(A)=r\). Then \(r(I_p-A)=p-r\) and \(A(I_p-A)=0\). Next, let

$$V = \nu _1 A + \nu _2(I_p-A).$$

Then the eigenvalues of V are \(\nu _1\) (multiplicity r) and \(\nu _2\) (multiplicity \(p-r\)), its determinant is \(|V|=\nu _1^{r}\nu _2^{p-r}\), and its inverse is \(V^{-1}=(1/\nu _1) A + (1/\nu _2)(I_p-A)\) when \(\nu _1\ne 0\) and \(\nu _2\ne 0\).
Proof
This is a simple version of a much more general result, see Abadir and Magnus (2005, Exercises 8.72 and 8.73). \(\square \)
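The spectral structure in Lemma 1 is easy to verify numerically. Below is a minimal sketch (hypothetical example values, not taken from the paper) using the rank-one idempotent \(A=\imath \imath '/p\): vectors in the column space of A are eigenvectors for \(\nu _1\), vectors orthogonal to it are eigenvectors for \(\nu _2\), and the stated inverse formula holds.

```python
# Sketch of Lemma 1 with the rank-1 idempotent A = ii'/p and V = nu1*A + nu2*(I - A).
# The values p, nu1, nu2 are illustrative, not from the paper.
p, nu1, nu2 = 4, 3.0, 0.5
A = [[1.0 / p for _ in range(p)] for _ in range(p)]                 # A = ii'/p, idempotent, rank 1
I = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

V = [[nu1 * A[i][j] + nu2 * (I[i][j] - A[i][j]) for j in range(p)] for i in range(p)]
Vinv = [[A[i][j] / nu1 + (I[i][j] - A[i][j]) / nu2 for j in range(p)] for i in range(p)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# i = (1,...,1)' spans the column space of A, so V i = nu1 * i.
ones = [1.0] * p
assert all(abs(y - nu1) < 1e-12 for y in matvec(V, ones))

# w orthogonal to i lies in the null space of A, so V w = nu2 * w.
w = [1.0, -1.0, 0.0, 0.0]
assert all(abs(y - nu2 * wi) < 1e-12 for y, wi in zip(matvec(V, w), w))

# Inverse formula: V^{-1} = (1/nu1) A + (1/nu2)(I - A), so V @ Vinv = I.
VVinv = [[sum(V[i][k] * Vinv[k][j] for k in range(p)) for j in range(p)] for i in range(p)]
assert all(abs(VVinv[i][j] - I[i][j]) < 1e-12 for i in range(p) for j in range(p))
print("Lemma 1 checks passed")
```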
We now consider an extension in which the \(p\times p\) matrix V is partitioned into blocks of dimensions \(p_1\) and \(p_2\) as follows:

$$V = \begin{pmatrix} \nu _1 J_1 + \nu _2(I_{p_1}-J_1) & \gamma \imath _1^{}\imath _2' \\ \gamma \imath _2^{}\imath _1' & \xi _1 J_2 + \xi _2(I_{p_2}-J_2) \end{pmatrix},$$

where \(\imath _1^{}=(1,1,\ldots ,1)'\) and \(\imath _2^{}=(1,1,\ldots ,1)'\) have dimensions \(p_1\) and \(p_2\) respectively, and \(J_1=\imath _1^{}\imath _1'/p_1\) and \(J_2=\imath _2^{}\imath _2'/p_2\).
In the special case \(p_1=p_2=p\) we can write \(V= V_1 \otimes J + V_2\otimes (I_p-J)\), where

$$V_1 = \begin{pmatrix} \nu _1 & \gamma p \\ \gamma p & \xi _1 \end{pmatrix}, \qquad V_2 = \begin{pmatrix} \nu _2 & 0 \\ 0 & \xi _2 \end{pmatrix}, \qquad J = \imath \imath '/p,$$

which implies that the eigenvalues of V are given by \(\nu _2\) (\(p-1\) times), \(\xi _2\) (\(p-1\) times), and the two eigenvalues of \(V_1\) (Magnus 1982, Lemma 2.1).
When \(p_1\ne p_2\) we cannot write V in terms of Kronecker matrices, but the result is still essentially the same.
Lemma 2
The eigenvalues of V are \(\nu _2\) (\(p_1-1\) times), \(\xi _2\) (\(p_2-1\) times) and two additional eigenvalues \(\nu _1^*\) and \(\xi _1^*\) given by

$$\nu _1^*,\ \xi _1^* = \frac{\nu _1+\xi _1}{2} \pm \frac{1}{2}\sqrt{(\nu _1-\xi _1)^2 + 4\gamma ^2p_1p_2}.$$
The sum of the eigenvalues is

$$\nu _1 + \xi _1 + (p_1-1)\nu _2 + (p_2-1)\xi _2,$$

and V is positive semidefinite if and only if \(\nu _1\), \(\nu _2\), \(\xi _1\), and \(\xi _2\) are all \(\ge 0\) and in addition \(\nu _1\xi _1\ge \gamma ^2p_1p_2\).
Proof
We write

$$V - \lambda I_p = \begin{pmatrix} A_1 & \gamma \imath _1^{}\imath _2' \\ \gamma \imath _2^{}\imath _1' & A_2 \end{pmatrix},$$

where

$$A_1 = (\nu _1-\lambda )J_1 + (\nu _2-\lambda )(I_{p_1}-J_1), \qquad A_2 = (\xi _1-\lambda )J_2 + (\xi _2-\lambda )(I_{p_2}-J_2).$$

The determinant is

$$|V - \lambda I_p| = (\nu _2-\lambda )^{p_1-1}(\xi _2-\lambda )^{p_2-1}\left[ (\nu _1-\lambda )(\xi _1-\lambda ) - \gamma ^2p_1p_2\right] .$$
Hence the eigenvalues of V are \(\nu _2\) (\(p_1-1\) times), \(\xi _2\) (\(p_2-1\) times), and the two solutions \(\nu _1^*\) and \(\xi _1^*\) of the quadratic equation \((\nu _1-\lambda )(\xi _1-\lambda ) - \gamma ^2p_1p_2=0\). Note that \(\nu _1^*+\xi _1^*=\nu _1+\xi _1\). In the special case \(\gamma =0\) we have \(\nu _1^*=\nu _1\) and \(\xi _1^*=\xi _1\).
The sum of the eigenvalues is

$$(p_1-1)\nu _2 + (p_2-1)\xi _2 + \nu _1^* + \xi _1^* = (p_1-1)\nu _2 + (p_2-1)\xi _2 + \nu _1 + \xi _1,$$

and it is easy to verify that this equals the trace of V, as of course it should. \(\square \)
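Lemma 2 can be checked numerically. The sketch below uses hypothetical values (\(p_1=2\), \(p_2=3\), and arbitrary \(\nu _1,\nu _2,\xi _1,\xi _2,\gamma \), none taken from the paper), builds the partitioned V, and verifies that \(\nu _1^*\) solves the quadratic \((\nu _1-\lambda )(\xi _1-\lambda )=\gamma ^2p_1p_2\) and is an eigenvalue of V, while vectors orthogonal to \(\imath _1\) in the first block give the eigenvalue \(\nu _2\).

```python
# Numeric sketch of Lemma 2 with illustrative values (not from the paper).
import math

p1, p2 = 2, 3
nu1, nu2, xi1, xi2, gamma = 4.0, 1.0, 3.0, 2.0, 0.5
p = p1 + p2

# Build V = [[nu1*J1 + nu2*(I - J1), gamma*i1 i2'], [gamma*i2 i1', xi1*J2 + xi2*(I - J2)]].
V = [[0.0] * p for _ in range(p)]
for i in range(p1):
    for j in range(p1):
        V[i][j] = nu1 / p1 + (nu2 if i == j else 0.0) - nu2 / p1
for i in range(p2):
    for j in range(p2):
        V[p1 + i][p1 + j] = xi1 / p2 + (xi2 if i == j else 0.0) - xi2 / p2
for i in range(p1):
    for j in range(p2):
        V[i][p1 + j] = V[p1 + j][i] = gamma

# nu1*, xi1* as given in the lemma; they solve (nu1 - l)(xi1 - l) = gamma^2 p1 p2.
disc = math.sqrt((nu1 - xi1) ** 2 + 4 * gamma ** 2 * p1 * p2)
nu1s = (nu1 + xi1 + disc) / 2
xi1s = (nu1 + xi1 - disc) / 2
assert abs((nu1 - nu1s) * (xi1 - nu1s) - gamma ** 2 * p1 * p2) < 1e-10

# Eigenvector for nu1* has the form u = (a*i1, b*i2); the eigen-equation
# reduces to the 2x2 system [[nu1, gamma*p2], [gamma*p1, xi1]] (a,b)' = nu1* (a,b)'.
a, b = gamma * p2, nu1s - nu1
u = [a] * p1 + [b] * p2
Vu = [sum(V[i][j] * u[j] for j in range(p)) for i in range(p)]
assert all(abs(Vu[i] - nu1s * u[i]) < 1e-10 for i in range(p))

# Vectors orthogonal to i1 in the first block are eigenvectors for nu2.
w = [1.0, -1.0] + [0.0] * p2
Vw = [sum(V[i][j] * w[j] for j in range(p)) for i in range(p)]
assert all(abs(Vw[i] - nu2 * w[i]) < 1e-10 for i in range(p))
print("Lemma 2 checks passed")
```

Note also that \(\nu _1^*+\xi _1^*=\nu _1+\xi _1\), so the trace identity in the proof holds automatically.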
B Stein’s Lemma
Stein’s lemma (Stein 1981) is a rather surprising and strong result. We first consider the univariate case and then the multivariate case; the generalization is not straightforward.
Lemma 3
Let \(x\sim {\text {N}}(\theta ,1)\) and let \(h:\mathfrak {R}\rightarrow \mathfrak {R}\) be an absolutely continuous function with derivative \(h'\). Assume that \({\text {E}}|h'(x)|<\infty \). Then,

$${\text {E}}[h(x)(x-\theta )] = {\text {E}}[h'(x)].$$
Proof
We write

$$(x-\theta )\phi (x-\theta ) = -\frac{d\,\phi (x-\theta )}{dx},$$

where \(\phi \) denotes the standard normal density. Integrating by parts gives

$${\text {E}}[h(x)(x-\theta )] = -\int h(x)\,d\phi (x-\theta ) = \int h'(x)\phi (x-\theta )\,dx = {\text {E}}[h'(x)].$$
\(\square \)
Note the requirement of absolute continuity, which imposes a smoothness property on h that is stronger than (uniform) continuity but weaker than continuous differentiability. It guarantees that h is differentiable almost everywhere. In applications it is important to place minimal restrictions on the function h, for example allowing it to be kinked.
The univariate version of Stein’s lemma is a powerful result with many nontrivial applications. As a simple example, let \(h(x)=x^m\). Then we immediately obtain all moments of the normal distribution through the recursion \({\text {E}}[x^{m+1}]=\theta {\text {E}}[x^{m}] + m{\text {E}}[x^{m-1}]\).
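The recursion above can be checked numerically. The sketch below (with an arbitrary illustrative value \(\theta =0.7\), not from the paper) computes \({\text {E}}[x^m]\) for \(x\sim {\text {N}}(\theta ,1)\) by direct quadrature and verifies \({\text {E}}[x^{m+1}]=\theta {\text {E}}[x^{m}] + m{\text {E}}[x^{m-1}]\) for several m.

```python
# Check the moment recursion from Stein's lemma against Simpson's-rule quadrature.
# theta = 0.7 is an arbitrary illustrative value, not taken from the paper.
import math

theta = 0.7

def moment(m, n=20001, half_width=12.0):
    """E[x^m] for x ~ N(theta, 1), computed by Simpson's rule on [theta - hw, theta + hw]."""
    a, b = theta - half_width, theta + half_width
    h = (b - a) / (n - 1)
    total = 0.0
    for i in range(n):
        x = a + i * h
        f = x ** m * math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)
        w = 1 if i in (0, n - 1) else (4 if i % 2 == 1 else 2)
        total += w * f
    return total * h / 3

# Stein's lemma with h(x) = x^m gives E[x^{m+1}] = theta*E[x^m] + m*E[x^{m-1}].
for m in range(1, 6):
    lhs = moment(m + 1)
    rhs = theta * moment(m) + m * moment(m - 1)
    assert abs(lhs - rhs) < 1e-6, (m, lhs, rhs)
print("moment recursion verified")
```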
In the multivariate case we have \(x\sim {\text {N}}(\theta ,I_p)\) with \(p\ge 2\) and we need the concept of ‘almost differentiability’ (in Stein’s terminology). We write \(x = (x_i,x_{-i})\) to decompose a point \(x\in \mathfrak {R}^p\) in terms of its ith component \(x_i\) and all other components \(x_{-i}\). Thus, \(h(\cdot ,x_{-i})\) refers to h as a function of its ith argument with all other arguments fixed at \(x_{-i}\). Then h is ‘almost differentiable’ if for each \(i=1,\ldots ,p\) and almost every \(x_{-i}\in \mathfrak {R}^{p-1}\) the function \(h(\cdot ,x_{-i}):\mathfrak {R}\rightarrow \mathfrak {R}\) is absolutely continuous. An almost differentiable function h has partial derivatives almost everywhere.
Given this multivariate extension of the concept of absolute continuity, Stein’s lemma reads as follows.
Lemma 4
(Stein) Let \(x\sim {\text {N}}(\theta ,I_p)\) with \(p\ge 2\) and let \(h:\mathfrak {R}^p\rightarrow \mathfrak {R}\) be almost differentiable with \({\text {E}}\Vert \nabla h(x)\Vert <\infty \), where \(\nabla h(x)\) denotes the gradient of h(x). Then,

$${\text {E}}[h(x)(x-\theta )] = {\text {E}}[\nabla h(x)].$$
Proof
See Stein (1981, Lemma 2). \(\square \)
Stein’s result can be generalized straightforwardly to the case where h is a vector function.
Lemma 5
Let \(x\sim {\text {N}}(\theta ,I_p)\) with \(p\ge 2\) and let \(h:\mathfrak {R}^p\rightarrow \mathfrak {R}^q\). If \(h_j:\mathfrak {R}^p\rightarrow \mathfrak {R}\) is almost differentiable with \({\text {E}}\Vert \nabla h_j(x)\Vert <\infty \) for all \(j=1,\ldots ,q\), then

$${\text {E}}[h(x)(x-\theta )'] = {\text {E}}\left[ \frac{\partial h(x)}{\partial x'}\right] .$$
Proof
This follows from Lemma 4 by considering each row of \(h(x)(x-\theta )'\) separately. Then,

$${\text {E}}[h_j(x)(x-\theta )'] = {\text {E}}[\nabla h_j(x)]'$$

for each j and the result follows. \(\square \)
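A quick Monte Carlo sanity check of Lemma 5 is possible for a linear \(h(x)=Bx\) (our illustrative choice, not from the paper): the Jacobian is then the constant matrix B, so \({\text {E}}[h(x)(x-\theta )']\) should equal B. The values of \(\theta \) and B below are arbitrary.

```python
# Monte Carlo sketch of Lemma 5 for linear h(x) = B x, where the Jacobian is B.
# theta and B are arbitrary illustrative values, not taken from the paper.
import random

random.seed(0)
p, q = 3, 2
theta = [0.2, -0.1, 0.3]
B = [[1.0, 0.0, 0.5],
     [0.5, -1.0, 0.0]]          # q x p Jacobian of h

n = 100_000
S = [[0.0] * p for _ in range(q)]
for _ in range(n):
    x = [t + random.gauss(0.0, 1.0) for t in theta]      # x ~ N(theta, I_p)
    h = [sum(B[j][k] * x[k] for k in range(p)) for j in range(q)]
    for j in range(q):
        for i in range(p):
            S[j][i] += h[j] * (x[i] - theta[i])

# est approximates E[h(x)(x - theta)'], which the lemma says equals B here.
est = [[S[j][i] / n for i in range(p)] for j in range(q)]
assert all(abs(est[j][i] - B[j][i]) < 0.05 for j in range(q) for i in range(p))
print("Lemma 5 Monte Carlo check passed")
```

The tolerance 0.05 is loose relative to the Monte Carlo standard error (of order \(1/\sqrt{n}\)), so the check is robust to the choice of seed.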
Cite this article
De Luca, G., Magnus, J.R. Weak Versus Strong Dominance of Shrinkage Estimators. J. Quant. Econ. 19 (Suppl 1), 239–266 (2021). https://doi.org/10.1007/s40953-021-00270-y