
Coordinate linear variance reduction for generalized linear programming

Published: 03 April 2024

Abstract

We study a class of generalized linear programs (GLP) in a large-scale setting, which includes a simple, possibly nonsmooth convex regularizer and simple convex set constraints. By reformulating (GLP) as an equivalent convex-concave min-max problem, we show that the linear structure in the problem can be used to design an efficient, scalable first-order algorithm, to which we give the name Coordinate Linear Variance Reduction (CLVR; pronounced "clever"). CLVR yields improved complexity results for (GLP) that depend on the max row norm of the linear constraint matrix in (GLP) rather than the spectral norm. When the regularization terms and constraints are separable, CLVR admits an efficient lazy update strategy that makes its complexity bounds scale with the number of nonzero elements of the linear constraint matrix in (GLP) rather than the matrix dimensions. Further, for the special case of linear programs and by exploiting sharpness, we propose a restart scheme for CLVR to obtain empirical linear convergence. Finally, we show that Distributionally Robust Optimization (DRO) problems with ambiguity sets based on both f-divergence and Wasserstein metrics can be reformulated as (GLPs) by introducing sparsely connected auxiliary variables. We complement our theoretical guarantees with numerical experiments that verify our algorithm's practical effectiveness in terms of wall-clock time and number of data passes.
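
To make the reformulation step concrete, here is a minimal sketch; the exact template of (GLP), including how the regularizer and set constraints enter, is specified in the paper, and the instance below is a simplified assumption used only for illustration. For a linear objective c^T x, a simple convex regularizer r, a simple convex set X, and linear constraints Ax ≤ b, dualizing the linear constraints with multipliers y ≥ 0 leads to the convex-concave saddle-point problem

\[ \min_{x \in X} \; \max_{y \ge 0} \; c^{\top} x + r(x) + \langle y, \, A x - b \rangle , \]

in which the coupling between x and y is bilinear. Exploiting this bilinear coupling with coordinate-wise, variance-reduced updates is what yields complexity bounds that depend on the maximum row norm of A rather than on its spectral norm.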

Supplementary Material

Additional material: 3600270.3601872_supp.pdf (supplemental material).


Published In

NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems, November 2022, 39114 pages.
Publisher: Curran Associates Inc., Red Hook, NY, United States.
