Fast and positive definite estimation of large covariance matrix for high-dimensional data analysis

F Wen, L Chu, R Ying, P Liu - IEEE Transactions on Big Data, 2019 - ieeexplore.ieee.org
Large covariance matrix estimation is a fundamental problem in many high-dimensional statistical analysis applications arising in economics and finance, bioinformatics, social networks, and climate studies. To achieve reliable estimation in the high-dimensional setting, an effective technique is to exploit the intrinsic structure of the covariance matrix, e.g., by sparsity regularization. For sparsity regularization, the lasso penalty is popular and convenient due to its convexity, but it has a bias problem. A nonconvex penalty can alleviate the bias problem, but the resulting nonconvex problem under a positive-definiteness constraint is generally difficult to solve. In this work, we propose an efficient algorithm for positive-definiteness constrained covariance estimation by combining the iteratively reweighted method and the alternating direction method of multipliers (ADMM). The iterative reweighting scheme can achieve better sparsity regularization than the lasso method. Meanwhile, the proposed algorithm solves only convex subproblems in each iteration and hence converges readily. The efficiency and effectiveness of the proposed algorithm have been demonstrated by both a simulation study and a gene clustering example for tumor tissues. Code for reproducing the results is available at https://github.com/FWen/pdlc.git.
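To make the combination of iterative reweighting and ADMM concrete, the following is a minimal sketch, not the authors' implementation (their code is at the repository linked above). It assumes a least-squares fit to the sample covariance S with a weighted l1 penalty on the off-diagonal entries and an eigenvalue floor eps as the positive-definiteness constraint. The function names, the parameters lam, eps, rho, and delta, and the weight rule 1/(|sigma_ij| + delta) are all illustrative assumptions; the nonconvex penalty used in the paper may lead to a different weight update.

```python
import numpy as np


def project_pd(M, eps):
    """Project a symmetric matrix onto {Theta : eigenvalues >= eps}."""
    M = (M + M.T) / 2.0
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.maximum(vals, eps)) @ vecs.T


def soft_threshold(M, T):
    """Elementwise soft-thresholding with a matrix of thresholds T."""
    return np.sign(M) * np.maximum(np.abs(M) - T, 0.0)


def admm_weighted_l1(S, W, lam, eps=1e-3, rho=1.0, n_iter=200):
    """ADMM for the convex subproblem (assumed form):
        min 0.5*||Sigma - S||_F^2 + lam * sum_{i != j} W_ij |Sigma_ij|
        s.t. Sigma = Theta, Theta >= eps * I.
    Returns Theta, which satisfies the eigenvalue floor by construction.
    """
    p = S.shape[0]
    Theta = S.copy()
    U = np.zeros((p, p))              # scaled dual variable
    T = lam * W / (1.0 + rho)         # per-entry thresholds
    np.fill_diagonal(T, 0.0)          # do not penalize the diagonal
    for _ in range(n_iter):
        # Sigma-update: closed-form weighted soft-thresholding
        Sigma = soft_threshold((S + rho * (Theta - U)) / (1.0 + rho), T)
        # Theta-update: projection onto the positive-definite set
        Theta = project_pd(Sigma + U, eps)
        # Dual update
        U = U + Sigma - Theta
    return Theta


def reweighted_pd_covariance(S, lam, eps=1e-3, delta=1e-2, n_outer=5):
    """Outer reweighting loop: each pass solves a convex weighted-l1 problem.
    The first pass (all weights 1) is the plain lasso-penalized estimate.
    The weight rule below is an illustrative assumption.
    """
    W = np.ones_like(S)
    est = S
    for _ in range(n_outer):
        est = admm_weighted_l1(S, W, lam, eps=eps)
        W = 1.0 / (np.abs(est) + delta)   # small entries get larger penalties
    return est


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 10))      # n = 5 samples, p = 10 variables
    S = np.cov(X, rowvar=False)           # singular because n < p
    est = reweighted_pd_covariance(S, lam=0.1)
    print("smallest eigenvalue:", np.linalg.eigvalsh(est).min())
```

Returning Theta rather than Sigma is a design choice in this sketch: Theta always satisfies the eigenvalue floor, while Sigma carries the exact zeros; the two coincide only once ADMM has converged.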