Efficient projections onto the l₁-ball for learning in high dimensions

Authors:

Shai Shalev-Shwartz,

Tushar Chandra

ICML '08: Proceedings of the 25th international conference on Machine learning

Pages 272 - 279

https://doi.org/10.1145/1390156.1390191

Published: 05 July 2008

Abstract

We describe efficient algorithms for projecting a vector onto the l₁-ball. We present two methods for projection. The first performs exact projection in O(n) expected time, where n is the dimension of the space. The second works on vectors in which k elements are perturbed outside the l₁-ball, and projects them in O(k log(n)) time. This setting is especially useful for online learning in sparse feature spaces such as text categorization applications. We demonstrate the merits and effectiveness of our algorithms in numerous batch and online learning tasks. We show that variants of stochastic gradient projection methods augmented with our efficient projection procedures outperform interior point methods, which are considered state-of-the-art optimization techniques. We also show that in online settings gradient updates with l₁ projections outperform the exponentiated gradient algorithm while obtaining models with high degrees of sparsity.
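
To make the projection problem in the abstract concrete, the following is a minimal Python sketch of the widely used sort-based Euclidean projection onto the l₁-ball, which runs in O(n log n) time. It is not the O(n) expected-time method or the O(k log(n)) sparse-update method summarized above; it only illustrates the soft-thresholding structure of the solution. The function name `project_onto_l1_ball` and the radius parameter `z` are assumptions introduced for this example.

```python
def project_onto_l1_ball(v, z=1.0):
    """Euclidean projection of v onto {w : ||w||_1 <= z}, sort-based sketch.

    Hypothetical helper for illustration; O(n log n) because of the sort.
    """
    if z <= 0:
        raise ValueError("radius z must be positive")

    magnitudes = [abs(x) for x in v]
    # A vector already inside the ball is its own projection.
    if sum(magnitudes) <= z:
        return list(v)

    # Find the threshold theta: scan magnitudes in decreasing order and keep
    # the last index j for which u_j - (sum_{i<=j} u_i - z) / j is positive.
    theta = 0.0
    running_sum = 0.0
    for j, u in enumerate(sorted(magnitudes, reverse=True), start=1):
        running_sum += u
        candidate = (running_sum - z) / j
        if u - candidate > 0:
            theta = candidate

    # Soft-threshold each coordinate: shrink its magnitude by theta,
    # clip at zero, and restore the original sign.
    return [(1.0 if x > 0 else -1.0) * max(abs(x) - theta, 0.0) for x in v]


if __name__ == "__main__":
    w = project_onto_l1_ball([3.0, -1.0, 0.5], z=2.0)
    print(w, sum(abs(x) for x in w))
```

Small coordinates are set exactly to zero by the thresholding step, which is one way to see why l₁ projections yield sparse models.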

Published In

ICML '08: Proceedings of the 25th international conference on Machine learning

July 2008

1310 pages

ISBN:9781605582054

DOI:10.1145/1390156

General Chair: William Cohen (Carnegie Mellon University)

Program Chairs: Andrew McCallum (University of Massachusetts Amherst), Sam Roweis (University of Toronto and Google)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Pascal
University of Helsinki
Xerox
Federation of Finnish Learned Societies
Google Inc.
NSF
Machine Learning Journal/Springer
Microsoft Research
Intel
Yahoo!
Helsinki Institute for Information Technology
IBM

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICML '08

ICML '08: The 25th Annual International Conference on Machine Learning held in conjunction with the 2007 International Conference on Inductive Logic Programming

July 5 - 9, 2008

Helsinki, Finland

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
