Abstract
We propose methods to construct a biased linear estimator for linear regression that optimizes the relative mean squared error (MSE). Although biased estimators have previously been proposed and shown to have smaller MSE than the ordinary least squares estimator, our construction is based on minimizing the relative MSE directly. The performance of the proposed methods is illustrated by a simulation study and a real data example. The results show that our methods can improve on MSE, particularly when the predictors are correlated.
Appendices
Appendix A Proof of Theorem 1
Let \(\varvec{x}_i\) and \(\varvec{m}_i\) denote the \(i\)th row of \(\varvec{X}\) and \(\varvec{M}\), respectively. When \(\tilde{\varvec{\beta}}^T\tilde{\varvec{\beta}}\le c'\), the choice \(\varvec{M}=\varvec{0}\) is feasible, since the constraint function evaluated at \(\varvec{M}=\varvec{0}\) equals \(\sum_{i=1}^{p}\tilde{\beta}_i^2=\tilde{\varvec{\beta}}^T\tilde{\varvec{\beta}}\le c'\), and \(\mathrm{Tr}(\varvec{M}\varvec{M}^T)\) attains its unrestricted minimum of 0 there. Hence \(\hat{\varvec{M}}=\varvec{0}\) whenever \(\tilde{\varvec{\beta}}^T\tilde{\varvec{\beta}}\le c'\).
If \(c'=0\), the problem becomes minimizing \(\varvec{m}_i^T\varvec{m}_i\) subject to \(\varvec{b}^T\varvec{m}_i=\tilde{\beta}_i\) for each \(i=1,2,\ldots,p\), whose solution is \(\varvec{m}_i=\frac{\tilde{\beta}_i}{\varvec{b}^T\varvec{b}}\varvec{b}\) by the minimum-norm property of the Moore–Penrose pseudoinverse. Therefore, \(\hat{\varvec{M}}=(\varvec{b}^T\varvec{b})^{-1}\varvec{b}^T\otimes\tilde{\varvec{\beta}}\).
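As a quick check of the minimum-norm property (a verification sketch, not part of the original argument), decompose each row as \(\varvec{m}_i=\alpha\varvec{b}+\varvec{v}\) with \(\varvec{b}^T\varvec{v}=0\). The constraint \(\varvec{b}^T\varvec{m}_i=\tilde{\beta}_i\) forces \(\alpha=\tilde{\beta}_i/(\varvec{b}^T\varvec{b})\), and
\[
\varvec{m}_i^T\varvec{m}_i=\alpha^2\,\varvec{b}^T\varvec{b}+\varvec{v}^T\varvec{v}
\]
is minimized by taking \(\varvec{v}=\varvec{0}\), recovering \(\varvec{m}_i=\tilde{\beta}_i\varvec{b}/(\varvec{b}^T\varvec{b})\).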
Now consider the situation in which \(\tilde{\varvec{\beta}}^T\tilde{\varvec{\beta}}>c'>0\). Note that if we increase the upper bound of the constraint, the minimum value of the objective function is non-increasing, since the feasible region expands. Let \(c''\) denote the value of the constraint function at the minimizer for a given bound \(c'\). It follows that \(c''\le c'\) and that the minimum of the objective function is the same for any choice of bound between \(c''\) and \(c'\). Therefore, without loss of generality, assume that the solution is attained on the boundary. Then the optimization problem described in (6) is equivalent to minimizing \(L(\varvec{M},\lambda)\), which is defined as:
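A Lagrangian consistent with the derivative calculation and the identities used below (written here as a sketch; the multiplier convention is our assumption) is
\[
L(\varvec{M},\lambda)=\sum_{i=1}^{p}\varvec{m}_i^T\varvec{m}_i+\lambda\left\{\sum_{i=1}^{p}\big(\varvec{b}^T\varvec{m}_i-\tilde{\beta}_i\big)^2-c'\right\},
\]
where \(\lambda\) denotes the Lagrange multiplier associated with the boundary constraint.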
Taking the derivative of \(L({{\varvec{M}}},\lambda )\) with respect to each \({{{\varvec{m}}}}_i\) (\(i=1, 2, \ldots , p\)) and setting them to 0, we have
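Under the Lagrangian sketched above, this stationarity condition reads, for each \(i\) (a form consistent with how (9) is used below),
\[
\varvec{m}_i+\lambda\big(\varvec{b}^T\varvec{m}_i-\tilde{\beta}_i\big)\varvec{b}=\varvec{0},
\qquad\text{i.e.}\qquad
\varvec{m}_i=-\lambda\big(\varvec{b}^T\varvec{m}_i-\tilde{\beta}_i\big)\varvec{b}.
\]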
From (9) and the constraint \(\sum_{i=1}^{p}(\varvec{b}^T\varvec{m}_i-\tilde{\beta}_i)^2=c'\), we obtain \(\sum_{i=1}^{p}\varvec{m}_i^T\varvec{m}_i=\lambda^2\sum_{i=1}^{p}(\varvec{b}^T\varvec{m}_i-\tilde{\beta}_i)^2\varvec{b}^T\varvec{b}=\lambda^2c'\varvec{b}^T\varvec{b}\). This implies that \(\lambda\) cannot be a constant independent of \(c'\); otherwise the minimum value of the objective function, \(\sum_{i=1}^{p}\varvec{m}_i^T\varvec{m}_i\), would be a strictly increasing function of \(c'\), contradicting the fact that it is non-increasing in \(c'\). In particular, since \(\lambda\) is not a constant, it cannot equal \(-1/(\varvec{b}^T\varvec{b})\), and hence \(\lambda\varvec{b}^T\varvec{b}+1\ne 0\). Multiplying both sides of (9) by \(\varvec{b}^T\) and rearranging terms, we obtain
Plugging (10) into the constraint \(\sum_{i=1}^{p}(\varvec{b}^T\varvec{m}_i-\tilde{\beta}_i)^2=c'\), we obtain
For either choice of \(\lambda\), the denominator of (12) is \(\sum_{i=1}^{p}\tilde{\beta}_i^2/c'\), and we get \(\sum_{i=1}^{p}\varvec{m}_i^T\varvec{m}_i=\lambda^2c'\varvec{b}^T\varvec{b}\). We know that \(\sum_{i=1}^{p}\varvec{m}_i^T\varvec{m}_i\) must be non-increasing in \(c'\); hence \(\lambda^2\) must be non-increasing in \(c'\). Combining (11) and (12), we get \(\mathrm{Tr}(\varvec{M}\varvec{M}^T)=\sum_{i=1}^{p}\varvec{m}_i^T\varvec{m}_i=\frac{c'\big(-1\pm\sqrt{\sum_{i=1}^{p}\tilde{\beta}_i^2/c'}\big)^2}{\varvec{b}^T\varvec{b}}\). It can then be shown directly that \(\lambda=\frac{-1+\sqrt{\sum_{i=1}^{p}\tilde{\beta}_i^2/c'}}{\varvec{b}^T\varvec{b}}\) is the correct choice, as it is the only one that makes \(\mathrm{Tr}(\varvec{M}\varvec{M}^T)\) a strictly decreasing function of \(c'\) for \(0<c'<\tilde{\varvec{\beta}}^T\tilde{\varvec{\beta}}\). This strict monotonicity also shows that the optimal value is indeed attained on the boundary of the constraint for any \(0<c'<\tilde{\varvec{\beta}}^T\tilde{\varvec{\beta}}\). Based on (9), \(\hat{\varvec{m}}_i=\big((1/\lambda)\varvec{I}_n+\varvec{b}\varvec{b}^T\big)^{-1}\varvec{b}\tilde{\beta}_i\) for all \(i=1,2,\ldots,p\), which is equivalent to solving a ridge regression with single response \(\tilde{\beta}_i\), covariate matrix \(\varvec{b}^T\), and tuning parameter \(1/\lambda\). This further shows that the solution for \(\varvec{M}\) is unique.
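As a sanity check (not part of the original argument), the Sherman–Morrison formula makes this ridge-form solution explicit:
\[
\hat{\varvec{m}}_i=\Big(\tfrac{1}{\lambda}\varvec{I}_n+\varvec{b}\varvec{b}^T\Big)^{-1}\varvec{b}\,\tilde{\beta}_i
=\frac{\lambda\tilde{\beta}_i}{1+\lambda\varvec{b}^T\varvec{b}}\,\varvec{b},
\qquad
\varvec{b}^T\hat{\varvec{m}}_i-\tilde{\beta}_i=\frac{-\tilde{\beta}_i}{1+\lambda\varvec{b}^T\varvec{b}},
\]
so each row of \(\hat{\varvec{M}}\) is a scalar multiple of \(\varvec{b}\), interpolating between \(\tilde{\beta}_i\varvec{b}/(\varvec{b}^T\varvec{b})\) as \(\lambda\to\infty\) (i.e. \(c'\to 0\)) and \(\varvec{0}\) as \(\lambda\to 0\) (i.e. \(c'\to\tilde{\varvec{\beta}}^T\tilde{\varvec{\beta}}\)), matching the two boundary cases treated above.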
Appendix B Proof of Proposition 1
In order to prove Proposition 1, we require the following two lemmas, which we state here for completeness.
Lemma 1
For any symmetric \(p\times p\) matrix \(\varvec{A}\) and any nonzero \(\varvec{x}\in\mathbb{R}^p\),
\[
\lambda_p\le\frac{\varvec{x}^T\varvec{A}\varvec{x}}{\varvec{x}^T\varvec{x}}\le\lambda_1,
\]
where \(\lambda_1\ge\cdots\ge\lambda_p\) are the ordered eigenvalues of \(\varvec{A}\); the upper bound is attained when \(\varvec{x}\) is an eigenvector associated with \(\lambda_1\).
Lemma 2
(Schur complement lemma) Let \(\varvec{D}\) be positive definite. Then the symmetric block matrix \(\begin{bmatrix} \varvec{A}&\varvec{B}\\ \varvec{B}^T&\varvec{D} \end{bmatrix}\) is nonnegative definite (n.n.d.) if and only if the Schur complement of \(\varvec{D}\), \(\varvec{A}-\varvec{B}\varvec{D}^{-1}\varvec{B}^T\), is n.n.d.
We now prove Proposition 1. By Lemma 1, \(\underset{\varvec{\beta}\in\mathbb{R}^p}{\max}\ \frac{\varvec{\beta}^T(\varvec{MX}-\varvec{I}_p)^T(\varvec{MX}-\varvec{I}_p)\varvec{\beta}}{\varvec{\beta}^T\varvec{\beta}}\) is the largest eigenvalue of the matrix \((\varvec{MX}-\varvec{I}_p)^T(\varvec{MX}-\varvec{I}_p)\), denoted by \(\lambda_1\). Hence, problem (7) is equivalent to
The last equivalence is based on Lemma 2.
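For concreteness, the way Lemma 2 yields such an equivalence can be sketched as follows (our reading of the reformulation; the remaining constraints of problem (7) are not restated here). For any \(t\ge 0\),
\[
\lambda_1\big((\varvec{MX}-\varvec{I}_p)^T(\varvec{MX}-\varvec{I}_p)\big)\le t
\;\Longleftrightarrow\;
t\varvec{I}_p-(\varvec{MX}-\varvec{I}_p)^T(\varvec{MX}-\varvec{I}_p)\succeq 0
\;\Longleftrightarrow\;
\begin{bmatrix} t\varvec{I}_p & (\varvec{MX}-\varvec{I}_p)^T\\ \varvec{MX}-\varvec{I}_p & \varvec{I}_p \end{bmatrix}\succeq 0,
\]
since \(\varvec{I}_p\) is positive definite. Minimizing \(\lambda_1\) over \(\varvec{M}\) is then equivalent to minimizing \(t\) over \((t,\varvec{M})\) subject to this linear matrix inequality, which is jointly convex in \((t,\varvec{M})\).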
Appendix C Proof of Theorem 2
Let \(\varvec{x}_i^*\) denote the \(i\)th column of \(\varvec{X}\). From Theorem 1, when \(c'\ge\tilde{\varvec{\beta}}^T\tilde{\varvec{\beta}}\), it follows that \(\mathrm{Tr}(\varvec{X}\hat{\varvec{M}})=0\), since \(\hat{\varvec{M}}=\varvec{0}\). When \(c'=0\), \(\mathrm{Tr}(\varvec{X}\hat{\varvec{M}})=\sum_{i=1}^{p}(\varvec{x}_i^*)^T\hat{\varvec{m}}_i=\big[\sum_{i=1}^{p}\tilde{\beta}_i(\varvec{x}_i^*)^T\big]\big(\frac{\varvec{b}}{\varvec{b}^T\varvec{b}}\big)=1\). When \(0<c'<\tilde{\varvec{\beta}}^T\tilde{\varvec{\beta}}\), each row of \(\hat{\varvec{M}}\) satisfies (9) and (10). Multiplying both sides of (9) by \((\varvec{x}_i^*)^T\) and using (10), we have
Therefore,
Because \(\varvec{b}=\varvec{X}\tilde{\varvec{\beta}}\) is nonzero, \(\varvec{b}^T\varvec{b}\) is positive. We have already shown in the proof of Theorem 1 that \(\lambda=\frac{-1+\sqrt{\sum_{i=1}^{p}\tilde{\beta}_i^2/c'}}{\varvec{b}^T\varvec{b}}\), which is a strictly decreasing function of \(c'\). Since \(\mathrm{Tr}(\varvec{X}\hat{\varvec{M}})\) is a strictly increasing function of \(\lambda\), it follows that \(\mathrm{Tr}(\varvec{X}\hat{\varvec{M}})\) is a strictly decreasing function of \(c'\) when \(0<c'<\tilde{\varvec{\beta}}^T\tilde{\varvec{\beta}}\). As \(c'\to 0\), \(\mathrm{Tr}(\varvec{X}\hat{\varvec{M}})\to 1\), and as \(c'\to\tilde{\varvec{\beta}}^T\tilde{\varvec{\beta}}\), \(\mathrm{Tr}(\varvec{X}\hat{\varvec{M}})\to 0\). Therefore, the statement of the theorem holds.
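For completeness, the monotonicity and the limits can be checked directly from the Sherman–Morrison form of the rows sketched in Appendix A (again a verification under that reconstruction):
\[
\mathrm{Tr}(\varvec{X}\hat{\varvec{M}})
=\sum_{i=1}^{p}(\varvec{x}_i^*)^T\hat{\varvec{m}}_i
=\frac{\lambda}{1+\lambda\varvec{b}^T\varvec{b}}\sum_{i=1}^{p}\tilde{\beta}_i(\varvec{x}_i^*)^T\varvec{b}
=\frac{\lambda\,\varvec{b}^T\varvec{b}}{1+\lambda\,\varvec{b}^T\varvec{b}},
\]
using \(\sum_{i=1}^{p}\tilde{\beta}_i\varvec{x}_i^*=\varvec{X}\tilde{\varvec{\beta}}=\varvec{b}\). This expression increases strictly from 0 to 1 as \(\lambda\) increases from 0 to \(\infty\), and \(\lambda\) is strictly decreasing in \(c'\), which gives both the strict monotonicity in \(c'\) and the limits at \(c'\to\tilde{\varvec{\beta}}^T\tilde{\varvec{\beta}}\) and \(c'\to 0\).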