High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity

Loh, Po-Ling; Wainwright, Martin J.

doi:10.1214/12-AOS1018

Mathematics > Statistics Theory

arXiv:1109.3714 (math)

[Submitted on 16 Sep 2011 (v1), last revised 25 Sep 2012 (this version, v4)]


Authors: Po-Ling Loh, Martin J. Wainwright


Abstract: Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly with dependence as well. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently nonconvex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing nonconvex programs, we are able both to analyze the statistical error associated with any global optimum and, more surprisingly, to prove that a simple algorithm based on projected gradient descent will converge in polynomial time to a small neighborhood of the set of all global minimizers. On the statistical side, we provide nonasymptotic bounds that hold with high probability for the cases of noisy, missing and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm is guaranteed to converge at a geometric rate to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing close agreement with the predicted scalings.
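The abstract does not spell out the estimators, so the following is only a minimal sketch under stated assumptions, not the authors' code. It assumes a "corrected" quadratic program of the form minimize 0.5 * b^T G b - g^T b subject to ||b||_1 <= R, where the plug-in pair (G, g) is built from corrupted covariates in two hypothetical settings: additive noise Z = X + W with a known noise covariance sigma_w^2 * I, and entries missing independently with a known probability rho (stored as 0). It then runs projected gradient descent onto the l1 ball. All function names, the step size 0.5, the radius R = 3, and the toy Gaussian data are illustrative assumptions, not the paper's experimental setup.

```python
import random


def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1 ball {b : ||b||_1 <= radius}."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    # Standard sort-based search for the soft-threshold level theta such that
    # the soft-thresholded vector has l1 norm exactly equal to radius.
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - radius) / k
        if uk > t:
            theta = t
    return [max(abs(x) - theta, 0.0) * (1.0 if x >= 0 else -1.0) for x in v]


def gram(A):
    """Return A^T A / n for an n x p matrix stored as a list of rows."""
    n, p = len(A), len(A[0])
    return [[sum(A[i][j] * A[i][k] for i in range(n)) / n for k in range(p)]
            for j in range(p)]


def cross(A, y):
    """Return A^T y / n."""
    n, p = len(A), len(A[0])
    return [sum(A[i][j] * y[i] for i in range(n)) / n for j in range(p)]


def noisy_surrogates(Z, y, sigma_w2):
    """Hypothetical helper: Z = X + W with Cov(W) = sigma_w2 * I, assumed known."""
    G = gram(Z)
    for j in range(len(G)):
        G[j][j] -= sigma_w2  # subtract the noise variance from the diagonal
    return G, cross(Z, y)


def missing_surrogates(Z, y, rho):
    """Hypothetical helper: entries missing w.p. rho (assumed known), stored as 0."""
    Zt = [[z / (1.0 - rho) for z in row] for row in Z]  # rescale observed entries
    G = gram(Zt)
    for j in range(len(G)):
        G[j][j] *= 1.0 - rho  # same as subtracting rho * diag(Zt^T Zt / n)
    return G, cross(Zt, y)


def projected_gradient_descent(G, g, radius, step=0.5, iters=500):
    """Iterate b <- Proj(b - step * (G b - g)); G need not be PSD."""
    p = len(g)
    beta = [0.0] * p
    for _ in range(iters):
        grad = [sum(G[j][k] * beta[k] for k in range(p)) - g[j] for j in range(p)]
        beta = project_l1_ball([b - step * d for b, d in zip(beta, grad)], radius)
    return beta


def simulate(mechanism, n=500, p=10, seed=0):
    """Toy experiment with X ~ N(0, I); returns ||beta_hat - beta_star||_2."""
    rng = random.Random(seed)
    beta_star = [1.0, -1.0, 0.5] + [0.0] * (p - 3)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, beta_star)) + rng.gauss(0, 0.1) for row in X]
    if mechanism == "noisy":
        sigma_w = 0.3
        Z = [[x + rng.gauss(0, sigma_w) for x in row] for row in X]
        G, g = noisy_surrogates(Z, y, sigma_w ** 2)
    else:
        rho = 0.2
        Z = [[0.0 if rng.random() < rho else x for x in row] for row in X]
        G, g = missing_surrogates(Z, y, rho)
    # The radius must be at least ||beta_star||_1 = 2.5 so beta_star is feasible.
    beta_hat = projected_gradient_descent(G, g, radius=3.0)
    return sum((a - b) ** 2 for a, b in zip(beta_hat, beta_star)) ** 0.5


print("noisy error:", round(simulate("noisy"), 3))
print("missing error:", round(simulate("missing"), 3))
```

The toy run uses n > p so that the surrogate matrix G is close to the identity. In the high-dimensional regime p > n, the matrix Z^T Z / n is rank-deficient, so subtracting sigma_w^2 * I (or rho times the positive diagonal) gives G negative eigenvalues; that is one source of the nonconvexity the abstract refers to, and the l1 constraint keeps the objective bounded below on the feasible set.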

Comments:	Published in the Annals of Statistics by the Institute of Mathematical Statistics
Subjects:	Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (stat.ML)
Report number:	IMS-AOS-AOS1018
Cite as:	arXiv:1109.3714 [math.ST]
	(or arXiv:1109.3714v4 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.1109.3714
Journal reference:	Annals of Statistics 2012, Vol. 40, No. 3, 1637-1664
Related DOI:	https://doi.org/10.1214/12-AOS1018

Submission history

From: Po-Ling Loh [via VTEX proxy]
[v1] Fri, 16 Sep 2011 20:02:47 UTC (218 KB)
[v2] Thu, 10 May 2012 23:22:28 UTC (225 KB)
[v3] Thu, 23 Aug 2012 17:04:24 UTC (224 KB)
[v4] Tue, 25 Sep 2012 06:58:01 UTC (580 KB)