DOI: 10.1145/3097983.3098165
Research Article · Public Access

Fast Newton Hard Thresholding Pursuit for Sparsity Constrained Nonconvex Optimization

Published: 13 August 2017

Abstract

We propose a fast Newton hard thresholding pursuit algorithm for sparsity constrained nonconvex optimization. Our proposed algorithm reduces the per-iteration time complexity from the cubic cost of Newton's method to linear in the data dimension d, while preserving fast computational and statistical convergence rates. In particular, we prove that the algorithm converges to the unknown sparse model parameter at a composite rate: quadratic at first, and linear once the iterates are close to the true parameter, up to the minimax optimal statistical precision of the underlying model. Thorough experiments on both synthetic and real datasets demonstrate that our algorithm outperforms state-of-the-art optimization algorithms for sparsity constrained optimization.
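To make the hard thresholding pursuit template concrete, the sketch below illustrates the general idea on a sparse least-squares problem: take a gradient step, keep only the s largest-magnitude coordinates, then refit exactly on that support (which, for a quadratic loss, is a Newton step restricted to the support). This is a minimal illustrative approximation written in Python, not the authors' fast Newton variant; their method avoids explicitly forming or inverting the restricted Hessian, and the function names here are hypothetical.

import numpy as np

def hard_threshold(v, s):
    # Keep the s largest-magnitude entries of v and zero out the rest.
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -s)[-s:]
    out[idx] = v[idx]
    return out

def htp_least_squares(X, y, s, eta=1.0, max_iter=50, tol=1e-8):
    # Hard thresholding pursuit for 0.5/n * ||X theta - y||^2 s.t. ||theta||_0 <= s.
    # The support comes from a thresholded gradient step; the coefficients on the
    # support are then obtained by an exact restricted least-squares (Newton) solve.
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(max_iter):
        grad = X.T @ (X @ theta - y) / n
        support = np.flatnonzero(hard_threshold(theta - eta * grad, s))
        theta_new = np.zeros(d)
        theta_new[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# Toy usage: recover a 5-sparse parameter from noisy linear measurements.
rng = np.random.default_rng(0)
n, d, s = 200, 1000, 5
X = rng.standard_normal((n, d))
theta_star = np.zeros(d)
theta_star[:5] = rng.standard_normal(5)
y = X @ theta_star + 0.01 * rng.standard_normal(n)
print(np.linalg.norm(htp_least_squares(X, y, s) - theta_star))

On a toy instance with n = 200 samples, d = 1000 dimensions, and a 5-sparse signal, this sketch recovers the true parameter up to the noise level in a handful of iterations; the composite (quadratic-then-linear) rate proved in the paper concerns the authors' fast Newton variant, not this simplified version.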

    Information

    Published In

    KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2017
    2240 pages
    ISBN: 9781450348874
    DOI: 10.1145/3097983

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. newton's method
    2. nonconvex optimization
    3. sparse learning
    4. sparsity constrained optimization

    Qualifiers

    • Research-article

    Conference

    KDD '17

    Acceptance Rates

    KDD '17 paper acceptance rate: 64 of 748 submissions (9%)
    Overall acceptance rate: 1,133 of 8,635 submissions (13%)

    Article Metrics

    • Downloads (last 12 months): 87
    • Downloads (last 6 weeks): 11
    Reflects downloads up to 09 Jan 2025

    Cited By

    • (2024) Igniting Advancements in Iron-Making Process Monitoring: Exploring a Novel and Cutting-Edge Joint Sparse-Constrained Data-Dependent Kernel CVA Method. IEEE Transactions on Instrumentation and Measurement, Vol. 73, 1-11. DOI: 10.1109/TIM.2024.3353270. Online publication date: 2024.
    • (2023) Relaxation Quadratic Approximation Greedy Pursuit Method Based on Sparse Learning. Computational Methods in Applied Mathematics, Vol. 24, 4, 909-920. DOI: 10.1515/cmam-2023-0050. Online publication date: 4-Oct-2023.
    • (2023) A Lagrange–Newton algorithm for tensor sparse principal component analysis. Optimization, 1-19. DOI: 10.1080/02331934.2023.2231482. Online publication date: 12-Jul-2023.
    • (2022) Seismic Data Reconstruction Based On Online Dictionary Learning and NHTP. Proceedings of the 2022 5th International Conference on Telecommunications and Communication Engineering, 266-271. DOI: 10.1145/3577065.3577113. Online publication date: 28-Nov-2022.
    • (2022) Efficient Gradient Support Pursuit With Less Hard Thresholding for Cardinality-Constrained Learning. IEEE Transactions on Neural Networks and Learning Systems, Vol. 33, 12, 7806-7817. DOI: 10.1109/TNNLS.2021.3087805. Online publication date: Dec-2022.
    • (2021) A Lagrange–Newton algorithm for sparse nonlinear programming. Mathematical Programming. DOI: 10.1007/s10107-021-01719-x. Online publication date: 21-Oct-2021.
    • (2020) Stochastic Recursive Gradient Support Pursuit and Its Sparse Representation Applications. Sensors, Vol. 20, 17, 4902. DOI: 10.3390/s20174902. Online publication date: 30-Aug-2020.
    • (2020) A Framework for Subgraph Detection in Interdependent Networks via Graph Block-Structured Optimization. IEEE Access, Vol. 8, 157800-157818. DOI: 10.1109/ACCESS.2020.3018497. Online publication date: 2020.
    • (2020) A new conjugate gradient hard thresholding pursuit algorithm for sparse signal recovery. Computational and Applied Mathematics, Vol. 39, 4. DOI: 10.1007/s40314-020-01313-5. Online publication date: 11-Sep-2020.
    • (2019) Loopless Semi-Stochastic Gradient Descent with Less Hard Thresholding for Sparse Learning. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 881-890. DOI: 10.1145/3357384.3358021. Online publication date: 3-Nov-2019.
