Distributed coordinate descent for generalized linear models with regularization

Published: 01 April 2017

Abstract

Generalized linear models with L1 and L2 regularization are a widely used technique for solving classification, class probability estimation, and regression problems. With the number of both features and examples growing rapidly in fields such as text mining and clickstream data analysis, parallelization and the use of cluster architectures become important. We present a novel algorithm for fitting regularized generalized linear models in a distributed environment. The algorithm splits the data between nodes by features, runs coordinate descent on each node, and uses a line search to merge the results globally. A convergence proof is provided. A modification of the algorithm addresses the slow-node problem. For the important particular case of logistic regression, we empirically compare our implementation with several state-of-the-art approaches that rely on different algorithmic and data-splitting methods. Experiments demonstrate that our approach is scalable and superior when training on large and sparse datasets.
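The abstract describes the method only at a high level, so the following is a minimal, single-process sketch of the idea under stated assumptions: the feature set is partitioned into blocks standing in for nodes, each block performs a few coordinate-descent updates on an L1-regularized logistic loss while the other coordinates are held fixed, and the proposed block directions are merged with a backtracking line search. All names, the toy data, and the specific update and merge rules below are illustrative assumptions, not the authors' actual algorithm or code.

```python
# Sketch: feature-split coordinate descent with a global line-search merge.
# Single-process stand-in for the distributed scheme outlined in the abstract;
# every implementation detail below is an assumption for illustration only.
import numpy as np

def logistic_loss(X, y, w, lam):
    """L1-regularized logistic loss; labels y are in {-1, +1}."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins)) + lam * np.abs(w).sum()

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def block_coordinate_step(X, y, w, block, lam, n_inner=3):
    """Coordinate descent on one feature block, all other coordinates fixed."""
    w = w.copy()
    n = X.shape[0]
    y01 = (y > 0).astype(float)
    for _ in range(n_inner):
        for j in block:
            # Recompute predictions (a real implementation would maintain X @ w incrementally).
            p = 1.0 / (1.0 + np.exp(-(X @ w)))
            grad_j = X[:, j] @ (p - y01) / n                     # d loss / d w_j
            hess_j = (X[:, j] ** 2) @ (p * (1.0 - p)) / n + 1e-8  # diagonal Hessian term
            # Newton-like coordinate update with soft-thresholding for the L1 penalty.
            w[j] = soft_threshold(w[j] - grad_j / hess_j, lam / hess_j)
    return w

def feature_split_cd(X, y, lam=0.01, n_blocks=4, n_outer=20):
    """Each 'node' owns a feature block; block directions are merged by line search."""
    d = X.shape[1]
    w = np.zeros(d)
    blocks = np.array_split(np.arange(d), n_blocks)
    for _ in range(n_outer):
        # Each block proposes an update direction for its own coordinates.
        delta = np.zeros(d)
        for block in blocks:
            w_block = block_coordinate_step(X, y, w, block, lam)
            delta[block] = w_block[block] - w[block]
        # Merge step: backtracking line search on the summed direction.
        alpha = 1.0
        base = logistic_loss(X, y, w, lam)
        while alpha > 1e-4 and logistic_loss(X, y, w + alpha * delta, lam) > base:
            alpha *= 0.5
        w = w + alpha * delta
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 40))
    w_true = np.zeros(40)
    w_true[:5] = rng.standard_normal(5)
    y = np.sign(X @ w_true + 0.1 * rng.standard_normal(500))
    w_hat = feature_split_cd(X, y)
    print("nonzero coefficients:", int((np.abs(w_hat) > 1e-6).sum()))
    print("final loss:", float(logistic_loss(X, y, w_hat, 0.01)))
```

The line-search merge is the step that makes feature-wise splitting safe: block directions that each decrease the loss in isolation need not decrease it when summed, and shrinking the common step size restores a monotone decrease without moving the data itself between nodes.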

Published In

Pattern Recognition and Image Analysis, Volume 27, Issue 2 (April 2017), 212 pages

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. generalized linear model
2. large-scale learning
3. regularization
4. sparsity