Distributed coordinate descent for generalized linear models with regularization

Published: 01 April 2017

Abstract

Generalized linear models with L1 and L2 regularization are a widely used technique for solving classification, class probability estimation, and regression problems. With the number of both features and examples growing rapidly in fields such as text mining and clickstream data analysis, parallelization and the use of cluster architectures become important. We present a novel algorithm for fitting regularized generalized linear models in a distributed environment. The algorithm splits the data between nodes by features, runs coordinate descent on each node, and uses a line search to merge the results globally. A convergence proof is provided. A modification of the algorithm addresses the slow-node problem. For the important particular case of logistic regression, we empirically compare our implementation with several state-of-the-art approaches that rely on different algorithmic and data-splitting methods. Experiments demonstrate that our approach is scalable and superior when training on large and sparse datasets.
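The abstract describes the method only at a high level, so the following is a minimal, single-process sketch of the idea under stated assumptions: the feature set is partitioned into blocks standing in for nodes, each block performs a few coordinate-descent updates on an L1-regularized logistic loss while the other coordinates are held fixed, and the proposed block directions are merged with a backtracking line search. All names, the toy data, and the specific update and merge rules below are illustrative assumptions, not the authors' actual algorithm or code.

```python
# Sketch: feature-split coordinate descent with a global line-search merge.
# Single-process stand-in for the distributed scheme outlined in the abstract;
# every implementation detail below is an assumption for illustration only.
import numpy as np

def logistic_loss(X, y, w, lam):
    """L1-regularized logistic loss; labels y are in {-1, +1}."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins)) + lam * np.abs(w).sum()

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def block_coordinate_step(X, y, w, block, lam, n_inner=3):
    """Coordinate descent on one feature block, all other coordinates fixed."""
    w = w.copy()
    n = X.shape[0]
    y01 = (y > 0).astype(float)
    for _ in range(n_inner):
        for j in block:
            # Recompute predictions (a real implementation would maintain X @ w incrementally).
            p = 1.0 / (1.0 + np.exp(-(X @ w)))
            grad_j = X[:, j] @ (p - y01) / n                     # d loss / d w_j
            hess_j = (X[:, j] ** 2) @ (p * (1.0 - p)) / n + 1e-8  # diagonal Hessian term
            # Newton-like coordinate update with soft-thresholding for the L1 penalty.
            w[j] = soft_threshold(w[j] - grad_j / hess_j, lam / hess_j)
    return w

def feature_split_cd(X, y, lam=0.01, n_blocks=4, n_outer=20):
    """Each 'node' owns a feature block; block directions are merged by line search."""
    d = X.shape[1]
    w = np.zeros(d)
    blocks = np.array_split(np.arange(d), n_blocks)
    for _ in range(n_outer):
        # Each block proposes an update direction for its own coordinates.
        delta = np.zeros(d)
        for block in blocks:
            w_block = block_coordinate_step(X, y, w, block, lam)
            delta[block] = w_block[block] - w[block]
        # Merge step: backtracking line search on the summed direction.
        alpha = 1.0
        base = logistic_loss(X, y, w, lam)
        while alpha > 1e-4 and logistic_loss(X, y, w + alpha * delta, lam) > base:
            alpha *= 0.5
        w = w + alpha * delta
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 40))
    w_true = np.zeros(40)
    w_true[:5] = rng.standard_normal(5)
    y = np.sign(X @ w_true + 0.1 * rng.standard_normal(500))
    w_hat = feature_split_cd(X, y)
    print("nonzero coefficients:", int((np.abs(w_hat) > 1e-6).sum()))
    print("final loss:", float(logistic_loss(X, y, w_hat, 0.01)))
```

The line-search merge is the step that makes feature-wise splitting safe: block directions that each decrease the loss in isolation need not decrease it when summed, and shrinking the common step size restores a monotone decrease without moving the data itself between nodes.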

Published In

Pattern Recognition and Image Analysis, Volume 27, Issue 2 (April 2017), 212 pages

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. generalized linear model
2. large-scale learning
3. regularization
4. sparsity