Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers

Authors:

Jonathan Eckstein

Foundations and Trends® in Machine Learning, Volume 3, Issue 1

Pages 1 - 122

https://doi.org/10.1561/2200000016

Published: 01 January 2011

Abstract

Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for ℓ₁ problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.

References

[1]

M. V. Afonso, J. M. Bioucas-Dias, and M. A. T. Figueiredo, "Fast image recovery using variable splitting and constrained optimization," IEEE Transactions on Image Processing, vol. 19, no. 9, pp. 2345-2356, 2010.

Digital Library

[2]

M. V. Afonso, J. M. Bioucas-Dias, and M. A. T. Figueiredo, "An Augmented Lagrangian Approach to the Constrained Optimization Formulation of Imaging Inverse Problems," IEEE Transactions on Image Processing, vol. 20, pp. 681-695, 2011.

Digital Library

[3]

E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. D. Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorenson, LAPACK: A portable linear algebra library for high-performance computers. IEEE Computing Society Press, 1990.

[4]

K. J. Arrow and G. Debreu, "Existence of an equilibrium for a competitive economy," Econometrica: Journal of the Econometric Society, vol. 22, no. 3, pp. 265-290, 1954.

[5]

K. J. Arrow, L. Hurwicz, and H. Uzawa, Studies in Linear and Nonlinear Programming. Stanford University Press: Stanford, 1958.

[6]

K. J. Arrow and R. M. Solow, "Gradient methods for constrained maxima, with weakened assumptions," in Studies in Linear and Nonlinear Programming, (K. J. Arrow, L. Hurwicz, and H. Uzawa, eds.), Stanford University Press: Stanford, 1958.

[7]

O. Banerjee, L. E. Ghaoui, and A. d'Aspremont, "Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data," Journal of Machine Learning Research, vol. 9, pp. 485-516, 2008.

Digital Library

[8]

P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe, "Convexity, classification, and risk bounds," Journal of the American Statistical Association, vol. 101, no. 473, pp. 138-156, 2006.

[9]

H. H. Bauschke and J. M. Borwein, "Dykstra's alternating projection algorithm for two sets," Journal of Approximation Theory, vol. 79, no. 3, pp. 418-443, 1994.

Digital Library

[10]

H. H. Bauschke and J. M. Borwein, "On projection algorithms for solving convex feasibility problems," SIAM Review, vol. 38, no. 3, pp. 367-426, 1996.

Digital Library

[11]

A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183-202, 2009.

Digital Library

[12]

S. Becker, J. Bobin, and E. J. Candès, "NESTA: A fast and accurate first-order method for sparse recovery," Available at http://www.acm.caltech. edu/~emmanuel/papers/NESTA.pdf, 2009.

[13]

J. F. Benders, "Partitioning procedures for solving mixed-variables programming problems," Numerische Mathematik, vol. 4, pp. 238-252, 1962.

Digital Library

[14]

A. Bensoussan, J.-L. Lions, and R. Temam, "Sur les méthodes de décomposition, de décentralisation et de coordination et applications," Methodes Mathematiques de l'Informatique, pp. 133-257, 1976.

[15]

D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods. Academic Press, 1982.

[16]

D. P. Bertsekas, Nonlinear Programming. Athena Scientific, second ed., 1999.

[17]

D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Prentice Hall, 1989.

Digital Library

[18]

J. M. Bioucas-Dias and M. A. T. Figueiredo, "Alternating Direction Algorithms for Constrained Sparse Regression: Application to Hyperspectral Unmixing," arXiv:1002.4527, 2010.

[19]

J. Borwein and A. Lewis, Convex Analysis and Nonlinear Optimization: Theory and Examples. Canadian Mathematical Society, 2000.

[20]

S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.

Digital Library

[21]

L. M. Bregman, "Finding the common point of convex sets by the method of successive projections," Proceedings of the USSR Academy of Sciences, vol. 162, no. 3, pp. 487-490, 1965.

[22]

L. M. Bregman, "The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming," USSR Computational Mathematics and Mathematical Physics, vol. 7, no. 3, pp. 200-217, 1967.

[23]

H. Brézis, Opérateurs Maximaux Monotones et Semi-Groupes de Contractions dans les Espaces de Hilbert. North-Holland: Amsterdam, 1973.

[24]

A. M. Bruckstein, D. L. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images," SIAM Review, vol. 51, no. 1, pp. 34-81, 2009.

Digital Library

[25]

Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst, "HaLoop: Efficient Iterative Data Processing on Large Clusters," Proceedings of the 36th International Conference on Very Large Databases, 2010.

[26]

R. H. Byrd, P. Lu, and J. Nocedal, "A Limited Memory Algorithm for Bound Constrained Optimization," SIAM Journal on Scientific and Statistical Computing, vol. 16, no. 5, pp. 1190-1208, 1995.

Digital Library

[27]

E. J. Candès and Y. Plan, "Near-ideal model selection by l ₁ minimization," Annals of Statistics, vol. 37, no. 5A, pp. 2145-2177, 2009.

[28]

E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, p. 489, 2006.

Digital Library

[29]

E. J. Candès and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?," IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5406-5425, 2006.

Digital Library

[30]

Y. Censor and S. A. Zenios, "Proximal minimization algorithm with D- functions," Journal of Optimization Theory and Applications, vol. 73, no. 3, pp. 451-464, 1992.

Digital Library

[31]

Y. Censor and S. A. Zenios, Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, 1997.

Digital Library

[32]

F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, "BigTable: A distributed storage system for structured data," ACM Transactions on Computer Systems, vol. 26, no. 2, pp. 1-26, 2008.

Digital Library

[33]

G. Chen and M. Teboulle, "A proximal-based decomposition method for convex minimization problems," Mathematical Programming, vol. 64, pp. 81-101, 1994.

Digital Library

[34]

S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Review, vol. 43, pp. 129-159, 2001.

Digital Library

[35]

Y. Chen, T. A. Davis, W. W. Hager, and S. Rajamanickam, "Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate," ACM Transactions on Mathematical Software, vol. 35, no. 3, p. 22, 2008.

Digital Library

[36]

W. Cheney and A. A. Goldstein, "Proximity maps for convex sets," Proceedings of the American Mathematical Society, vol. 10, no. 3, pp. 448-450, 1959.

[37]

C. T. Chu, S. K. Kim, Y. A. Lin, Y. Y. Yu, G. Bradski, A. Y. Ng, and K. Olukotun, "MapReduce for machine learning on multicore," in Advances in Neural Information Processing Systems, 2007.

[38]

J. F. Claerbout and F. Muir, "Robust modeling with erratic data," Geophysics, vol. 38, p. 826, 1973.

[39]

P. L. Combettes, "The convex feasibility problem in image recovery," Advances in Imaging and Electron Physics, vol. 95, pp. 155-270, 1996.

[40]

P. L. Combettes and J. C. Pesquet, "A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery," IEEE Journal on Selected Topics in Signal Processing, vol. 1, no. 4, pp. 564-574, 2007.

[41]

P. L. Combettes and J. C. Pesquet, "Proximal Splitting Methods in Signal Processing," arXiv:0912.3522, 2009.

[42]

P. L. Combettes and V. R. Wajs, "Signal recovery by proximal forward-backward splitting," Multiscale Modeling and Simulation, vol. 4, no. 4, pp. 1168-1200, 2006.

[43]

G. B. Dantzig, Linear Programming and Extensions. RAND Corporation, 1963.

[44]

G. B. Dantzig and P. Wolfe, "Decomposition principle for linear programs," Operations Research, vol. 8, pp. 101-111, 1960.

Digital Library

[45]

I. Daubechies, M. Defrise, and C. D. Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Communications on Pure and Applied Mathematics, vol. 57, pp. 1413-1457, 2004.

[46]

J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.

Digital Library

[47]

J. W. Demmel, Applied Numerical Linear Algebra. SIAM: Philadelphia, PA, 1997.

[48]

A. P. Dempster, "Covariance selection," Biometrics, vol. 28, no. 1, pp. 157-175, 1972.

[49]

D. L. Donoho, "De-noising by soft-thresholding," IEEE Transactions on Information Theory, vol. 41, pp. 613-627, 1995.

Digital Library

[50]

D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, 2006.

Digital Library

[51]

D. L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proceedings of the National Academy of Sciences, vol. 106, no. 45, p. 18914, 2009.

[52]

D. L. Donoho and Y. Tsaig, "Fast solution of l ₁-norm minimization problems when the solution may be sparse," Tech. Rep., Stanford University, 2006.

[53]

J. Douglas and H. H. Rachford, "On the numerical solution of heat conduction problems in two and three space variables," Transactions of the American Mathematical Society, vol. 82, pp. 421-439, 1956.

[54]

J. C. Duchi, A. Agarwal, and M. J. Wainwright, "Distributed Dual Averaging in Networks," in Advances in Neural Information Processing Systems, 2010.

[55]

J. C. Duchi, S. Gould, and D. Koller, "Projected subgradient methods for learning sparse Gaussians," in Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2008.

[56]

R. L. Dykstra, "An algorithm for restricted least squares regression," Journal of the American Statistical Association, vol. 78, pp. 837-842, 1983.

[57]

J. Eckstein, Splitting methods for monotone operators with applications to parallel optimization. PhD thesis, MIT, 1989.

[58]

J. Eckstein, "Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming," Mathematics of Operations Research, pp. 202-226, 1993.

[59]

J. Eckstein, "Parallel alternating direction multiplier decomposition of convex programs," Journal of Optimization Theory and Applications, vol. 80, no. 1, pp. 39-62, 1994.

Digital Library

[60]

J. Eckstein, "Some saddle-function splitting methods for convex programming," Optimization Methods and Software, vol. 4, no. 1, pp. 75-83, 1994.

[61]

J. Eckstein, "A practical general approximation criterion for methods of multipliers based on Bregman distances," Mathematical Programming, vol. 96, no. 1, pp. 61-86, 2003.

[62]

J. Eckstein and D. P. Bertsekas, "An alternating direction method for linear programming," Tech. Rep., MIT, 1990.

[63]

J. Eckstein and D. P. Bertsekas, "On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators," Mathematical Programming, vol. 55, pp. 293-318, 1992.

Digital Library

[64]

J. Eckstein and M. C. Ferris, "Operator-splitting methods for monotone affine variational inequalities, with a parallel application to optimal control," INFORMS Journal on Computing, vol. 10, pp. 218-235, 1998.

Digital Library

[65]

J. Eckstein and M. Fukushima, "Some reformulations and applications of the alternating direction method of multipliers," Large Scale Optimization: State of the Art, pp. 119-138, 1993.

[66]

J. Eckstein and B. F. Svaiter, "A family of projective splitting methods for the sum of two maximal monotone operators," Mathematical Programming, vol. 111, no. 1-2, p. 173, 2008.

Digital Library

[67]

J. Eckstein and B. F. Svaiter, "General projective splitting methods for sums of maximal monotone operators," SIAM Journal on Control and Optimization, vol. 48, pp. 787-811, 2009.

Digital Library

[68]

E. Esser, "Applications of Lagrangian-based alternating direction methods and connections to split Bregman," CAM report, vol. 9, p. 31, 2009.

[69]

H. Everett, "Generalized Lagrange multiplier method for solving problems of optimum allocation of resources," Operations Research, vol. 11, no. 3, pp. 399-417, 1963.

Digital Library

[70]

M. J. Fadili and J. L. Starck, "Monotone operator splitting for optimization problems in sparse recovery," IEEE ICIP, 2009.

[71]

A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques. Society for Industrial and Applied Mathematics, 1990. First published in 1968 by Research Analysis Corporation.

[72]

M. A. T. Figueiredo and J. M. Bioucas-Dias, "Restoration of Poissonian Images Using Alternating Direction Optimization," IEEE Transactions on Image Processing, vol. 19, pp. 3133-3145, 2010.

Digital Library

[73]

M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, "Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems," IEEE Journal on Selected Topics in Signal Processing, vol. 1, no. 4, pp. 586-597, 2007.

[74]

P. A. Forero, A. Cano, and G. B. Giannakis, "Consensus-based distributed support vector machines," Journal of Machine Learning Research, vol. 11, pp. 1663-1707, 2010.

Digital Library

[75]

M. Fortin and R. Glowinski, Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems. North-Holland: Amsterdam, 1983.

[76]

M. Fortin and R. Glowinski, "On decomposition-coordination methods using an augmented Lagrangian," in Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems, (M. Fortin and R. Glowinski, eds.), North-Holland: Amsterdam, 1983.

[77]

M. Forum, MPI: A Message-Passing Interface Standard, version 2.2. High-Performance Computing Center: Stuttgart, 2009.

[78]

Y. Freund and R. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," in Computational Learning Theory, pp. 23-37, Springer, 1995.

[79]

J. Friedman, T. Hastie, and R. Tibshirani, "Sparse inverse covariance estimation with the graphical lasso," Biostatistics, vol. 9, no. 3, p. 432, 2008.

[80]

M. Fukushima, "Application of the alternating direction method of multipliers to separable convex programming problems," Computational Optimization and Applications, vol. 1, pp. 93-111, 1992.

[81]

D. Gabay, "Applications of the method of multipliers to variational inequalities," in Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems, (M. Fortin and R. Glowinski, eds.), North-Holland: Amsterdam, 1983.

[82]

D. Gabay and B. Mercier, "A dual algorithm for the solution of nonlinear variational problems via finite element approximations," Computers and Mathematics with Applications, vol. 2, pp. 17-40, 1976.

[83]

M. Galassi, J. Davies, J. Theiler, B. Gough, G. Jungman, M. Booth, and F. Rossi, GNU Scientific Library Reference Manual. Network Theory Ltd., third ed., 2002.

[84]

A. M. Geoffrion, "Generalized Benders decomposition," Journal of Optimization Theory and Applications, vol. 10, no. 4, pp. 237-260, 1972.

[85]

S. Ghemawat, H. Gobioff, and S. T. Leung, "The Google file system," ACM SIGOPS Operating Systems Review, vol. 37, no. 5, pp. 29-43, 2003.

Digital Library

[86]

R. Glowinski and A. Marrocco, "Sur l'approximation, par elements finis d'ordre un, et la resolution, par penalisation-dualité, d'une classe de problems de Dirichlet non lineares," Revue Française d'Automatique, Informatique, et Recherche Opérationelle, vol. 9, pp. 41-76, 1975.

[87]

R. Glowinski and P. L. Tallec, "Augmented Lagrangian methods for the solution of variational problems," Tech. Rep. 2965, University of Wisconsin-Madison, 1987.

[88]

T. Goldstein and S. Osher, "The split Bregman method for l ₁ regularized problems," SIAM Journal on Imaging Sciences, vol. 2, no. 2, pp. 323-343, 2009.

Digital Library

[89]

E. G. Gol'shtein and N. V. Tret'yakov, "Modified Lagrangians in convex programming and their generalizations," Point-to-Set Maps and Mathematical Programming, pp. 86-97, 1979.

[90]

G. H. Golub and C. F. van Loan, Matrix Computations. Johns Hopkins University Press, third ed., 1996.

[91]

D. Gregor and A. Lumsdaine, "The Parallel BGL: A generic library for distributed graph computations," Parallel Object-Oriented Scientific Computing, 2005.

[92]

A. Halevy, P. Norvig, and F. Pereira, "The Unreasonable Effectiveness of Data," IEEE Intelligent Systems, vol. 24, no. 2, 2009.

Digital Library

[93]

K. B. Hall, S. Gilpin, and G. Mann, "MapReduce/BigTable for distributed optimization," in Neural Information Processing Systems: Workshop on Learning on Cores, Clusters, and Clouds, 2010.

[94]

T. Hastie and R. Tibshirani, Generalized Additive Models. Chapman & Hall, 1990.

[95]

T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, second ed., 2009.

[96]

B. S. He, H. Yang, and S. L. Wang, "Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities," Journal of Optimization Theory and Applications, vol. 106, no. 2, pp. 337-356, 2000.

Digital Library

[97]

M. R. Hestenes, "Multiplier and gradient methods," Journal of Optimization Theory and Applications, vol. 4, pp. 302-320, 1969.

[98]

M. R. Hestenes, "Multiplier and gradient methods," in Computing Methods in Optimization Problems, (L. A. Zadeh, L. W. Neustadt, and A. V. Balakrishnan, eds.), Academic Press, 1969.

[99]

J.-B. Hiriart-Urruty and C. Lemaréchal, Fundamentals of Convex Analysis. Springer, 2001.

[100]

P. J. Huber, "Robust estimation of a location parameter," Annals of Mathematical Statistics, vol. 35, pp. 73-101, 1964.

[101]

S.-J. Kim, K. Koh, S. Boyd, and D. Gorinevsky, "l ₁ Trend filtering," SIAM Review, vol. 51, no. 2, pp. 339-360, 2009.

[102]

S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, "An interior-point method for large-scale l ₁-regularized least squares," IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 606-617, 2007.

[103]

K. Koh, S.-J. Kim, and S. Boyd, "An interior-point method for large-scale l ₁- regularized logistic regression," Journal of Machine Learning Research, vol. 1, no. 8, pp. 1519-1555, 2007.

[104]

D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

Digital Library

[105]

S. A. Kontogiorgis, Alternating directions methods for the parallel solution of large-scale block-structured optimization problems. PhD thesis, University of Wisconsin-Madison, 1994.

[106]

S. A. Kontogiorgis and R. R. Meyer, "A variable-penalty alternating directions method for convex optimization," Mathematical Programming, vol. 83, pp. 29-53, 1998.

Digital Library

[107]

L. S. Lasdon, Optimization Theory for Large Systems. MacMillan, 1970.

[108]

J. Lawrence and J. E. Spingarn, "On fixed points of non-expansive piecewise isometric mappings," Proceedings of the London Mathematical Society, vol. 3, no. 3, p. 605, 1987.

[109]

C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, "Basic linear algebra subprograms for Fortran usage," ACM Transactions on Mathematical Software, vol. 5, no. 3, pp. 308-323, 1979.

Digital Library

[110]

D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," Advances in Neural Information Processing Systems, vol. 13, 2001.

[111]

J. Lin and M. Schatz, "Design Patterns for Efficient Graph Algorithms in MapReduce," in Proceedings of the Eighth Workshop on Mining and Learning with Graphs, pp. 78-85, 2010.

[112]

P. L. Lions and B. Mercier, "Splitting algorithms for the sum of two nonlinear operators," SIAM Journal on Numerical Analysis, vol. 16, pp. 964-979, 1979.

Digital Library

[113]

D. C. Liu and J. Nocedal, "On the Limited Memory Method for Large Scale Optimization," Mathematical Programming B, vol. 45, no. 3, pp. 503-528, 1989.

Digital Library

[114]

Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein, "GraphLab: A New Parallel Framework for Machine Learning," in Conference on Uncertainty in Artificial Intelligence, 2010.

[115]

Z. Lu, "Smooth optimization approach for sparse covariance selection," SIAM Journal on Optimization, vol. 19, no. 4, pp. 1807-1827, 2009.

Digital Library

[116]

Z. Lu, T. K. Pong, and Y. Zhang, "An Alternating Direction Method for Finding Dantzig Selectors," arXiv:1011.4604, 2010.

[117]

D. G. Luenberger, Introduction to Linear and Nonlinear Programming. Addison-Wesley: Reading, MA, 1973.

[118]

J. Mairal, R. Jenatton, G. Obozinski, and F. Bach, "Network flow algorithms for structured sparsity," Advances in Neural Information Processing Systems, vol. 24, 2010.

[119]

G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, "Pregel: A system for large-scale graph processing," in Proceedings of the 2010 International Conference on Management of Data, pp. 135-146, 2010.

[120]

A. F. T. Martins, M. A. T. Figueiredo, P. M. Q. Aguiar, N. A. Smith, and E. P. Xing, "An Augmented Lagrangian Approach to Constrained MAP Inference," in International Conference on Machine Learning, 2011.

[121]

G. Mateos, J.-A. Bazerque, and G. B. Giannakis, "Distributed sparse linear regression," IEEE Transactions on Signal Processing, vol. 58, pp. 5262-5276, Oct. 2010.

Digital Library

[122]

P. J. McCullagh and J. A. Nelder, Generalized Linear Models. Chapman & Hall, 1991.

[123]

N. Meinshausen and P. Bühlmann, "High-dimensional graphs and variable selection with the lasso," Annals of Statistics, vol. 34, no. 3, pp. 1436-1462, 2006.

[124]

A. Miele, E. E. Cragg, R. R. Iver, and A. V. Levy, "Use of the augmented penalty function in mathematical programming problems, part 1," Journal of Optimization Theory and Applications, vol. 8, pp. 115-130, 1971.

[125]

A. Miele, E. E. Cragg, and A. V. Levy, "Use of the augmented penalty function in mathematical programming problems, part 2," Journal of Optimization Theory and Applications, vol. 8, pp. 131-153, 1971.

[126]

A. Miele, P. E. Mosely, A. V. Levy, and G. M. Coggins, "On the method of multipliers for mathematical programming problems," Journal of Optimization Theory and Applications, vol. 10, pp. 1-33, 1972.

[127]

J.-J. Moreau, "Fonctions convexes duales et points proximaux dans un espace Hilbertien," Reports of the Paris Academy of Sciences, Series A, vol. 255, pp. 2897-2899, 1962.

[128]

D. Mosk-Aoyama, T. Roughgarden, and D. Shah, "Fully distributed algorithms for convex optimization problems," Available at http://theory.stanford.edu/~tim/papers/distrcvxopt.pdf, 2007.

[129]

I. Necoara and J. A. K. Suykens, "Application of a smoothing technique to decomposition in convex optimization," IEEE Transactions on Automatic Control, vol. 53, no. 11, pp. 2674-2679, 2008.

[130]

A. Nedic and A. Ozdaglar, "Distributed subgradient methods for multiagent optimization," IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48-61, 2009.

[131]

A. Nedic and A. Ozdaglar, "Cooperative distributed multi-agent optimization," in Convex Optimization in Signal Processing and Communications, (D. P. Palomar and Y. C. Eldar, eds.), Cambridge University Press, 2010.

[132]

Y. Nesterov, "A method of solving a convex programming problem with convergence rate O(1/k ²," Soviet Mathematics Doklady, vol. 27, no. 2, pp. 372-376, 1983.

[133]

Y. Nesterov, "Gradient methods for minimizing composite objective function," CORE Discussion Paper, Catholic University of Louvain, vol. 76, p. 2007, 2007.

[134]

M. Ng, P. Weiss, and X. Yuang, "Solving Constrained Total-Variation Image Restoration and Reconstruction Problems via Alternating Direction Methods," ICM Research Report, Available at http://www.optimization-online. org/DB_FILE/2009/10/2434.pdf, 2009.

[135]

J. Nocedal and S. J. Wright, Numerical Optimization. Springer-Verlag, 1999.

[136]

H. Ohlsson, L. Ljung, and S. Boyd, "Segmentation of ARX-models using sum-of-norms regularization," Automatica, vol. 46, pp. 1107-1111, 2010.

Digital Library

[137]

D. W. Peaceman and H. H. Rachford, "The numerical solution of parabolic and elliptic differential equations," Journal of the Society for Industrial and Applied Mathematics, vol. 3, pp. 28-41, 1955.

[138]

M. J. D. Powell, "A method for nonlinear constraints in minimization problems," in Optimization, (R. Fletcher, ed.), Academic Press, 1969.

[139]

A. Ribeiro, I. Schizas, S. Roumeliotis, and G. Giannakis, "Kalman filtering in wireless sensor networks -- Incorporating communication cost in state estimation problems," IEEE Control Systems Magazine, vol. 30, pp. 66-86, Apr. 2010.

[140]

R. T. Rockafellar, Convex Analysis. Princeton University Press, 1970.

[141]

R. T. Rockafellar, "Augmented Lagrangians and applications of the proximal point algorithm in convex programming," Mathematics of Operations Research, vol. 1, pp. 97-116, 1976.

Digital Library

[142]

R. T. Rockafellar, "Monotone operators and the proximal point algorithm," SIAM Journal on Control and Optimization, vol. 14, p. 877, 1976.

Digital Library

[143]

R. T. Rockafellar and R. J.-B. Wets, "Scenarios and policy aggregation in optimization under uncertainty," Mathematics of Operations Research, vol. 16, no. 1, pp. 119-147, 1991.

Digital Library

[144]

R. T. Rockafellar and R. J.-B. Wets, Variational Analysis. Springer-Verlag, 1998.

[145]

L. Rudin, S. J. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D, vol. 60, pp. 259-268, 1992.

Digital Library

[146]

A. Ruszczynski, "An augmented Lagrangian decomposition method for block diagonal linear programming problems," Operations Research Letters, vol. 8, no. 5, pp. 287-294, 1989.

Digital Library

[147]

A. Ruszczynski, "On convergence of an augmented Lagrangian decomposition method for sparse convex optimization," Mathematics of Operations Research, vol. 20, no. 3, pp. 634-656, 1995.

Digital Library

[148]

K. Scheinberg, S. Ma, and D. Goldfarb, "Sparse inverse covariance selection via alternating linearization methods," in Advances in Neural Information Processing Systems, 2010.

[149]

I. D. Schizas, G. Giannakis, S. Roumeliotis, and A. Ribeiro, "Consensus in ad hoc WSNs with noisy links -- part II: Distributed estimation and smoothing of random signals," IEEE Transactions on Signal Processing, vol. 56, pp. 1650-1666, Apr. 2008.

Digital Library

[150]

I. D. Schizas, A. Ribeiro, and G. B. Giannakis, "Consensus in ad hoc WSNs with noisy links -- part I: Distributed estimation of deterministic signals," IEEE Transactions on Signal Processing, vol. 56, pp. 350-364, Jan. 2008.

Digital Library

[151]

B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.

Digital Library

[152]

N. Z. Shor, Minimization Methods for Non-Differentiable Functions. Springer-Verlag, 1985.

[153]

J. E. Spingarn, "Applications of the method of partial inverses to convex programming: decomposition," Mathematical Programming, vol. 32, pp. 199-223, 1985.

[154]

G. Steidl and T. Teuber, "Removing multiplicative noise by Douglas-Rachford splitting methods," Journal of Mathematical Imaging and Vision, vol. 36, no. 2, pp. 168-184, 2010.

Digital Library

[155]

C. H. Teo, S. V. N. Vishwanathan, A. J. Smola, and Q. V. Le, "Bundle methods for regularized risk minimization," Journal of Machine Learning Research, vol. 11, pp. 311-365, 2010.

Digital Library

[156]

R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B, vol. 58, pp. 267-288, 1996.

[157]

P. Tseng, "Applications of a splitting algorithm to decomposition in convex programming and variational inequalities.," SIAM Journal on Control and Optimization, vol. 29, pp. 119-138, 1991.

Digital Library

[158]

P. Tseng, "Alternating projection-proximal methods for convex programming and variational inequalities," SIAM Journal on Optimization, vol. 7, pp. 951-965, 1997.

Digital Library

[159]

P. Tseng, "A modified forward-backward splitting method for maximal monotone mappings," SIAM Journal on Control and Optimization, vol. 38, p. 431, 2000.

Digital Library

[160]

J. N. Tsitsiklis, Problems in decentralized decision making and computation. PhD thesis, Massachusetts Institute of Technology, 1984.

[161]

J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans, "Distributed asynchronous deterministic and stochastic gradient optimization algorithms," IEEE Transactions on Automatic Control, vol. 31, no. 9, pp. 803-812, 1986.

[162]

H. Uzawa, "Market mechanisms and mathematical programming," Econometrica: Journal of the Econometric Society, vol. 28, no. 4, pp. 872-881, 1960.

[163]

H. Uzawa, "Walras' tâtonnement in the theory of exchange," The Review of Economic Studies, vol. 27, no. 3, pp. 182-194, 1960.

[164]

L. G. Valiant, "A bridging model for parallel computation," Communications of the ACM, vol. 33, no. 8, p. 111, 1990.

Digital Library

[165]

V. N. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 2000.

[166]

J. von Neumann, Functional Operators, Volume 2: The Geometry of Orthogonal Spaces. Princeton University Press: Annals of Mathematics Studies, 1950. Reprint of 1933 lecture notes.

[167]

M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1-305, 2008.

Digital Library

[168]

L. Walras, Éléments d'économie politique pure, ou, Théorie de la richesse sociale. F. Rouge, 1896.

[169]

S. L. Wang and L. Z. Liao, "Decomposition method with a variable parameter for a class of monotone variational inequality problems," Journal of Optimization Theory and Applications, vol. 109, no. 2, pp. 415-429, 2001.

Digital Library

[170]

T. White, Hadoop: The Definitive Guide. O'Reilly Press, second ed., 2010.

[171]

J. M. Wooldridge, Introductory Econometrics: A Modern Approach. South Western College Publications, fourth ed., 2009.

[172]

L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging," Systems & Control Letters, vol. 53, no. 1, pp. 65-78, 2004.

[173]

A. Y. Yang, A. Ganesh, Z. Zhou, S. S. Sastry, and Y. Ma, "A review of fast ℓ₁-minimization algorithms for robust face recognition," arXiv:1007.3753, 2010.

[174]

J. Yang and X. Yuan, "An inexact alternating direction method for trace norm regularized least squares problem," Available at http://www.optimization-online.org, 2010.

[175]

J. Yang and Y. Zhang, "Alternating direction algorithms for ℓ₁-problems in compressive sensing," Preprint, 2009.

[176]

W. Yin, S. Osher, D. Goldfarb, and J. Darbon, "Bregman iterative algorithms for ℓ₁-minimization with applications to compressed sensing," SIAM Journal on Imaging Sciences, vol. 1, no. 1, pp. 143-168, 2008.

[177]

M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49-67, 2006.

[178]

X. M. Yuan, "Alternating direction methods for sparse covariance selection," Preprint, Available at http://www.optimization-online.org/DB_FILE/2009/09/2390.pdf, 2009.

[179]

M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, "Spark: Cluster computing with working sets," in Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, 2010.

[180]

T. Zhang, "Statistical behavior and consistency of classification methods based on convex risk minimization," Annals of Statistics, vol. 32, no. 1, pp. 56-85, 2004.

[181]

P. Zhao, G. Rocha, and B. Yu, "The composite absolute penalties family for grouped and hierarchical variable selection," Annals of Statistics, vol. 37, no. 6A, pp. 3468-3497, 2009.

[182]

H. Zhu, A. Cano, and G. B. Giannakis, "Distributed consensus-based demodulation: algorithms and error analysis," IEEE Transactions on Wireless Communications, vol. 9, no. 6, pp. 2044-2054, 2010.

[183]

H. Zhu, G. B. Giannakis, and A. Cano, "Distributed in-network channel decoding," IEEE Transactions on Signal Processing, vol. 57, no. 10, pp. 3970-3983, 2009.

Cited By

London P, Vardi S, Eghbali R, Wierman A (2024). Black-Box Acceleration of Monotone Convex Program Solvers. Operations Research 72:2 (796-815). Online publication date: 1-Mar-2024.
https://dl.acm.org/doi/10.1287/opre.2022.2352
Zhu D, Zhao L, Zhang S (2024). A First-Order Primal-Dual Method for Nonconvex Constrained Optimization Based on the Augmented Lagrangian. Mathematics of Operations Research 49:1 (125-150). Online publication date: 1-Feb-2024.
https://dl.acm.org/doi/10.1287/moor.2022.1350
Krug R, Leugering G, Martin A, Schmidt M, Weninger D (2024). A Consensus-Based Alternating Direction Method for Mixed-Integer and PDE-Constrained Gas Transport Problems. INFORMS Journal on Computing 36:2 (397-416). Online publication date: 1-Mar-2024.
https://dl.acm.org/doi/10.1287/ijoc.2022.0319

Index Terms

Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
1. Computing methodologies
  1. Machine learning
2. Theory of computation
  1. Design and analysis of algorithms
    1. Mathematical optimization

Index terms have been assigned to the content through auto-classification.

Published In

Foundations and Trends® in Machine Learning Volume 3, Issue 1

January 2011

125 pages

ISSN:1935-8237

EISSN:1935-8245

Publisher

Now Publishers Inc.

Hanover, MA, United States

Publication History

Published: 01 January 2011
