
Algorithms for Sparse Linear Classifiers in the Massive Data Setting

Published: 01 June 2008

Abstract

Classifiers favoring sparse solutions, such as support vector machines, relevance vector machines, LASSO-regression based classifiers, etc., provide competitive methods for classification problems in high dimensions. However, current algorithms for training sparse classifiers typically scale quite unfavorably with respect to the number of training examples. This paper proposes online and multi-pass algorithms for training sparse linear classifiers for high dimensional data. These algorithms have computational complexity and memory requirements that make learning on massive data sets feasible. The central idea that makes this possible is a straightforward quadratic approximation to the likelihood function.
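The central idea can be sketched in code. The following is an illustrative reconstruction, not the paper's exact algorithm: an L1-penalized logistic classifier trained in a single online pass, where each example's log-likelihood term is replaced by a diagonal quadratic (second-order Taylor) approximation so that every update is a closed-form step followed by soft-thresholding. All names (`online_sparse_logistic`, `soft_threshold`), the diagonal-curvature choice, and the specific update rule are assumptions made for this sketch.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise L1 proximal step: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def online_sparse_logistic(X, y, lam=0.1):
    """One-pass training of an L1-penalized logistic classifier.

    Each example's log-likelihood term is replaced by a diagonal
    second-order (quadratic) Taylor expansion at the current weights,
    so every update is a closed-form Newton-style step followed by
    soft-thresholding.  Labels y must be in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    h = np.full(d, 1e-3)  # accumulated per-coordinate curvature
    for i in range(n):
        xi, yi = X[i], y[i]
        p = 1.0 / (1.0 + np.exp(yi * (xi @ w)))  # P(mistake) under current w
        g = -yi * p * xi                         # gradient of the logistic loss
        h += p * (1.0 - p) * xi ** 2             # diagonal Hessian contribution
        w = soft_threshold(w - g / h, lam / h)   # quadratic step + L1 shrinkage
    return w

# Toy check: 3 informative features out of 20.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -3.0, 1.5]
y = np.sign(X @ w_true)
w_hat = online_sparse_logistic(X, y, lam=0.1)
acc = float(np.mean(np.sign(X @ w_hat) == y))
```

Because each update touches only the current example and a per-coordinate curvature vector, memory is O(d) and each pass is O(nd), which is what makes the massive-data setting tractable; a multi-pass variant would simply repeat the loop over the data.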




Published In

The Journal of Machine Learning Research, Volume 9 (June 2008), 1964 pages
ISSN: 1532-4435 | EISSN: 1533-7928

Publisher

JMLR.org



Cited By

  • (2024) How Does a Firm Adapt in a Changing World? The Case of Prosper Marketplace. Marketing Science 43(3):673-693. doi:10.1287/mksc.2022.0198
  • (2023) Bootstrap in high dimension with low computation. Proceedings of the 40th International Conference on Machine Learning, 18419-18453. doi:10.5555/3618408.3619168
  • (2020) Neural Connectivity Evolution during Adaptive Learning with and without Proprioception. Proceedings of the 7th International Conference on Movement and Computing, 1-4. doi:10.1145/3401956.3404232
  • (2020) A Unified Framework for Sparse Online Learning. ACM Transactions on Knowledge Discovery from Data 14(5):1-20. doi:10.1145/3361559
  • (2019) Hardware Acceleration Implementation of Sparse Coding Algorithm With Spintronic Devices. IEEE Transactions on Nanotechnology 18:518-531. doi:10.1109/TNANO.2019.2916149
  • (2018) Knowledge Fusion in Feedforward Artificial Neural Networks. Neural Processing Letters 48(1):257-272. doi:10.1007/s11063-017-9712-5
  • (2017) Distributed coordinate descent for generalized linear models with regularization. Pattern Recognition and Image Analysis 27(2):349-364. doi:10.1134/S1054661817020122
  • (2015) On-Chip Sparse Learning Acceleration With CMOS and Resistive Synaptic Devices. IEEE Transactions on Nanotechnology 14(6):969-979. doi:10.1109/TNANO.2015.2478861
  • (2013) Efficient online learning for multitask feature selection. ACM Transactions on Knowledge Discovery from Data 7(2):1-27. doi:10.1145/2499907.2499909
  • (2012) Fully Bayesian inference for neural models with negative-binomial spiking. Proceedings of the 26th International Conference on Neural Information Processing Systems, Volume 2, 1898-1906. doi:10.5555/2999325.2999347
