
Algorithms for Sparse Linear Classifiers in the Massive Data Setting

Published: 01 June 2008

Abstract

Classifiers favoring sparse solutions, such as support vector machines, relevance vector machines, LASSO-regression based classifiers, etc., provide competitive methods for classification problems in high dimensions. However, current algorithms for training sparse classifiers typically scale quite unfavorably with respect to the number of training examples. This paper proposes online and multi-pass algorithms for training sparse linear classifiers for high dimensional data. These algorithms have computational complexity and memory requirements that make learning on massive data sets feasible. The central idea that makes this possible is a straightforward quadratic approximation to the likelihood function.
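The central idea can be sketched in code. The following is an illustrative reconstruction, not the paper's exact algorithm: an L1-penalized logistic classifier trained in a single online pass, where each example's log-likelihood term is replaced by a diagonal quadratic (second-order Taylor) approximation so that every update is a closed-form step followed by soft-thresholding. All names (`online_sparse_logistic`, `soft_threshold`), the diagonal-curvature choice, and the specific update rule are assumptions made for this sketch.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise L1 proximal step: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def online_sparse_logistic(X, y, lam=0.1):
    """One-pass training of an L1-penalized logistic classifier.

    Each example's log-likelihood term is replaced by a diagonal
    second-order (quadratic) Taylor expansion at the current weights,
    so every update is a closed-form Newton-style step followed by
    soft-thresholding.  Labels y must be in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    h = np.full(d, 1e-3)  # accumulated per-coordinate curvature
    for i in range(n):
        xi, yi = X[i], y[i]
        p = 1.0 / (1.0 + np.exp(yi * (xi @ w)))  # P(mistake) under current w
        g = -yi * p * xi                         # gradient of the logistic loss
        h += p * (1.0 - p) * xi ** 2             # diagonal Hessian contribution
        w = soft_threshold(w - g / h, lam / h)   # quadratic step + L1 shrinkage
    return w

# Toy check: 3 informative features out of 20.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -3.0, 1.5]
y = np.sign(X @ w_true)
w_hat = online_sparse_logistic(X, y, lam=0.1)
acc = float(np.mean(np.sign(X @ w_hat) == y))
```

Because each update touches only the current example and a per-coordinate curvature vector, memory is O(d) and each pass is O(nd), which is what makes the massive-data setting tractable; a multi-pass variant would simply repeat the loop over the data.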




Published In

The Journal of Machine Learning Research, Volume 9 (June 2008), 1964 pages
ISSN: 1532-4435 | EISSN: 1533-7928

Publisher

JMLR.org



Cited By

  • (2024) How Does a Firm Adapt in a Changing World? The Case of Prosper Marketplace. Marketing Science 43(3):673-693. doi:10.1287/mksc.2022.0198
  • (2023) Bootstrap in high dimension with low computation. Proceedings of the 40th International Conference on Machine Learning, 18419-18453. doi:10.5555/3618408.3619168
  • (2020) Neural Connectivity Evolution during Adaptive Learning with and without Proprioception. Proceedings of the 7th International Conference on Movement and Computing, 1-4. doi:10.1145/3401956.3404232
  • (2020) A Unified Framework for Sparse Online Learning. ACM Transactions on Knowledge Discovery from Data 14(5):1-20. doi:10.1145/3361559
  • (2019) Hardware Acceleration Implementation of Sparse Coding Algorithm With Spintronic Devices. IEEE Transactions on Nanotechnology 18:518-531. doi:10.1109/TNANO.2019.2916149
  • (2018) Knowledge Fusion in Feedforward Artificial Neural Networks. Neural Processing Letters 48(1):257-272. doi:10.1007/s11063-017-9712-5
  • (2017) Distributed coordinate descent for generalized linear models with regularization. Pattern Recognition and Image Analysis 27(2):349-364. doi:10.1134/S1054661817020122
  • (2015) On-Chip Sparse Learning Acceleration With CMOS and Resistive Synaptic Devices. IEEE Transactions on Nanotechnology 14(6):969-979. doi:10.1109/TNANO.2015.2478861
  • (2013) Efficient online learning for multitask feature selection. ACM Transactions on Knowledge Discovery from Data 7(2):1-27. doi:10.1145/2499907.2499909
  • (2012) Fully Bayesian inference for neural models with negative-binomial spiking. Proceedings of the 26th International Conference on Neural Information Processing Systems, Volume 2, 1898-1906. doi:10.5555/2999325.2999347
