Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers

Published: 01 January 2011

Abstract

Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for ℓ1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
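
As a rough illustration of the kind of method the abstract refers to, the following is a minimal sketch of scaled-form ADMM applied to the lasso, one of the applications named above. It is not taken from the article: the function names (`lasso_admm`, `soft_threshold`), the parameter names (`lam`, `rho`, `n_iter`), and the fixed iteration count are assumptions chosen for this example. The problem is split as minimize (1/2)‖Ax − b‖² + λ‖z‖₁ subject to x = z, so the x-update is a ridge-type least-squares solve and the z-update is elementwise soft-thresholding.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding: the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def lasso_admm(A, b, lam, rho=1.0, n_iter=500):
    """Sketch of scaled-form ADMM for
        minimize 0.5 * ||A x - b||_2^2 + lam * ||z||_1  subject to  x = z.

    All names and defaults here are illustrative assumptions, not the
    article's code. A fixed iteration count stands in for a proper
    stopping criterion.
    """
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable

    # The x-update solves (A^T A + rho I) x = A^T b + rho (z - u).
    # The matrix does not change between iterations, so factor it once.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b

    for _ in range(n_iter):
        rhs = Atb + rho * (z - u)
        # Two solves with the Cholesky factor: L y = rhs, then L^T x = y.
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # z-update: proximal step on the l1 term.
        z = soft_threshold(x + u, lam / rho)
        # Dual update: accumulate the running constraint residual x - z.
        u = u + x - z

    return z


if __name__ == "__main__":
    # With A = I the lasso solution is soft-thresholding of b at level lam.
    A = np.eye(3)
    b = np.array([3.0, -0.5, 1.5])
    print(lasso_admm(A, b, lam=1.0))
```

In a distributed setting the same three-step pattern would be repeated per data block, with a consensus step in place of the single z-update; that variant is not shown here.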


Published In

Foundations and Trends® in Machine Learning, Volume 3, Issue 1
January 2011
125 pages
ISSN: 1935-8237
EISSN: 1935-8245

Publisher

Now Publishers Inc.
Hanover, MA, United States

Publication History

Published: 01 January 2011

