DOI: 10.1145/3097983.3098165
Research Article · Public Access

Fast Newton Hard Thresholding Pursuit for Sparsity Constrained Nonconvex Optimization

Published: 13 August 2017

Abstract

We propose a fast Newton hard thresholding pursuit algorithm for sparsity constrained nonconvex optimization. Our proposed algorithm reduces the per-iteration time complexity from the cubic cost of Newton's method to linear in the data dimension d, while preserving fast computational and statistical convergence rates. In particular, we prove that the algorithm converges to the unknown sparse model parameter at a composite rate: quadratic at first, and linear once the iterates are close to the true parameter, up to the minimax optimal statistical precision of the underlying model. Thorough experiments on both synthetic and real datasets demonstrate that our algorithm outperforms state-of-the-art optimization algorithms for sparsity constrained optimization.
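To make the hard thresholding pursuit template concrete, the sketch below illustrates the general idea on a sparse least-squares problem: take a gradient step, keep only the s largest-magnitude coordinates, then refit exactly on that support (which, for a quadratic loss, is a Newton step restricted to the support). This is a minimal illustrative approximation written in Python, not the authors' fast Newton variant; their method avoids explicitly forming or inverting the restricted Hessian, and the function names here are hypothetical.

import numpy as np

def hard_threshold(v, s):
    # Keep the s largest-magnitude entries of v and zero out the rest.
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -s)[-s:]
    out[idx] = v[idx]
    return out

def htp_least_squares(X, y, s, eta=1.0, max_iter=50, tol=1e-8):
    # Hard thresholding pursuit for 0.5/n * ||X theta - y||^2 s.t. ||theta||_0 <= s.
    # The support comes from a thresholded gradient step; the coefficients on the
    # support are then obtained by an exact restricted least-squares (Newton) solve.
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(max_iter):
        grad = X.T @ (X @ theta - y) / n
        support = np.flatnonzero(hard_threshold(theta - eta * grad, s))
        theta_new = np.zeros(d)
        theta_new[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# Toy usage: recover a 5-sparse parameter from noisy linear measurements.
rng = np.random.default_rng(0)
n, d, s = 200, 1000, 5
X = rng.standard_normal((n, d))
theta_star = np.zeros(d)
theta_star[:5] = rng.standard_normal(5)
y = X @ theta_star + 0.01 * rng.standard_normal(n)
print(np.linalg.norm(htp_least_squares(X, y, s) - theta_star))

On a toy instance with n = 200 samples, d = 1000 dimensions, and a 5-sparse signal, this sketch recovers the true parameter up to the noise level in a handful of iterations; the composite (quadratic-then-linear) rate proved in the paper concerns the authors' fast Newton variant, not this simplified version.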

    Information

    Published In

    KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2017
    2240 pages
    ISBN: 9781450348874
    DOI: 10.1145/3097983

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. newton's method
    2. nonconvex optimization
    3. sparse learning
    4. sparsity constrained optimization

    Qualifiers

    • Research-article

    Conference

    KDD '17

    Acceptance Rates

    KDD '17 paper acceptance rate: 64 of 748 submissions (9%)
    Overall acceptance rate: 1,133 of 8,635 submissions (13%)

    Article Metrics

    • Downloads (last 12 months): 87
    • Downloads (last 6 weeks): 11
    Reflects downloads up to 09 Jan 2025

    Cited By

    • (2024) Igniting Advancements in Iron-Making Process Monitoring: Exploring a Novel and Cutting-Edge Joint Sparse-Constrained Data-Dependent Kernel CVA Method. IEEE Transactions on Instrumentation and Measurement, Vol. 73, 1-11. DOI: 10.1109/TIM.2024.3353270. Online publication date: 2024.
    • (2023) Relaxation Quadratic Approximation Greedy Pursuit Method Based on Sparse Learning. Computational Methods in Applied Mathematics, Vol. 24, 4, 909-920. DOI: 10.1515/cmam-2023-0050. Online publication date: 4-Oct-2023.
    • (2023) A Lagrange–Newton algorithm for tensor sparse principal component analysis. Optimization, 1-19. DOI: 10.1080/02331934.2023.2231482. Online publication date: 12-Jul-2023.
    • (2022) Seismic Data Reconstruction Based On Online Dictionary Learning and NHTP. Proceedings of the 2022 5th International Conference on Telecommunications and Communication Engineering, 266-271. DOI: 10.1145/3577065.3577113. Online publication date: 28-Nov-2022.
    • (2022) Efficient Gradient Support Pursuit With Less Hard Thresholding for Cardinality-Constrained Learning. IEEE Transactions on Neural Networks and Learning Systems, Vol. 33, 12, 7806-7817. DOI: 10.1109/TNNLS.2021.3087805. Online publication date: Dec-2022.
    • (2021) A Lagrange–Newton algorithm for sparse nonlinear programming. Mathematical Programming. DOI: 10.1007/s10107-021-01719-x. Online publication date: 21-Oct-2021.
    • (2020) Stochastic Recursive Gradient Support Pursuit and Its Sparse Representation Applications. Sensors, Vol. 20, 17, 4902. DOI: 10.3390/s20174902. Online publication date: 30-Aug-2020.
    • (2020) A Framework for Subgraph Detection in Interdependent Networks via Graph Block-Structured Optimization. IEEE Access, Vol. 8, 157800-157818. DOI: 10.1109/ACCESS.2020.3018497. Online publication date: 2020.
    • (2020) A new conjugate gradient hard thresholding pursuit algorithm for sparse signal recovery. Computational and Applied Mathematics, Vol. 39, 4. DOI: 10.1007/s40314-020-01313-5. Online publication date: 11-Sep-2020.
    • (2019) Loopless Semi-Stochastic Gradient Descent with Less Hard Thresholding for Sparse Learning. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 881-890. DOI: 10.1145/3357384.3358021. Online publication date: 3-Nov-2019.
