
Coordinate linear variance reduction for generalized linear programming

Published: 03 April 2024

Abstract

We study a class of generalized linear programs (GLP) in a large-scale setting, which includes a simple, possibly nonsmooth convex regularizer and simple convex set constraints. By reformulating (GLP) as an equivalent convex-concave min-max problem, we show that the linear structure in the problem can be used to design an efficient, scalable first-order algorithm, to which we give the name Coordinate Linear Variance Reduction (CLVR; pronounced "clever"). CLVR yields improved complexity results for (GLP) that depend on the max row norm of the linear constraint matrix in (GLP) rather than the spectral norm. When the regularization terms and constraints are separable, CLVR admits an efficient lazy update strategy that makes its complexity bounds scale with the number of nonzero elements of the linear constraint matrix in (GLP) rather than the matrix dimensions. Further, for the special case of linear programs and by exploiting sharpness, we propose a restart scheme for CLVR to obtain empirical linear convergence. Finally, we show that Distributionally Robust Optimization (DRO) problems with ambiguity sets based on both f-divergence and Wasserstein metrics can be reformulated as (GLPs) by introducing sparsely connected auxiliary variables. We complement our theoretical guarantees with numerical experiments that verify our algorithm's practical effectiveness in terms of wall-clock time and number of data passes.
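
To make the reformulation step concrete, here is a minimal sketch; the exact template of (GLP), including how the regularizer and set constraints enter, is specified in the paper, and the instance below is a simplified assumption used only for illustration. For a linear objective c^T x, a simple convex regularizer r, a simple convex set X, and linear constraints Ax ≤ b, dualizing the linear constraints with multipliers y ≥ 0 leads to the convex-concave saddle-point problem

\[ \min_{x \in X} \; \max_{y \ge 0} \; c^{\top} x + r(x) + \langle y, \, A x - b \rangle , \]

in which the coupling between x and y is bilinear. Exploiting this bilinear coupling with coordinate-wise, variance-reduced updates is what yields complexity bounds that depend on the maximum row norm of A rather than on its spectral norm.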

Supplementary Material

Additional material: 3600270.3601872_supp.pdf (supplemental material).


Published In

NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems, November 2022, 39114 pages.
Publisher: Curran Associates Inc., Red Hook, NY, United States.
