DOI: 10.1145/2487575.2487626

Indexed block coordinate descent for large-scale linear classification with limited memory

Published: 11 August 2013

Abstract

Linear classification can now be trained in time linear in the size of the data. In many applications, however, the data contain a large number of samples that do not improve the quality of the model but still cost substantial I/O and memory to process. In this paper, we show how a Block Coordinate Descent method based on a Nearest-Neighbor Index can significantly reduce this cost when learning a dual-sparse model. In particular, we employ a truncated loss function to induce a series of convex programs with superior dual sparsity, and solve each dual problem using Indexed Block Coordinate Descent, which uses Approximate Nearest Neighbor (ANN) search to select active dual variables without incurring I/O cost on irrelevant samples. We prove that, despite the bias and the weak guarantees of ANN queries, the proposed algorithm converges globally to the solution defined on the entire dataset, with sublinear complexity per iteration. Experiments under both sufficient-memory and limited-memory conditions show that the proposed approach learns many times faster than other state-of-the-art solvers without sacrificing accuracy.
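To make the idea concrete, the sketch below is a minimal, hypothetical Python illustration of index-guided block selection for dual coordinate descent; it is not the authors' implementation. It trains an L2-regularized hinge-loss linear SVM and, at each outer step, updates only the block of dual variables whose samples currently have the smallest margins. A brute-force margin scan stands in for the paper's ANN index, and the plain hinge loss stands in for the truncated (ramp) loss; the function name and all parameters (block_size, outer_iters, inner_sweeps) are illustrative.

```python
import numpy as np

def indexed_block_cd_svm(X, y, C=1.0, block_size=64, outer_iters=50, inner_sweeps=5, seed=0):
    """Hypothetical sketch: dual coordinate descent for an L2-regularized,
    L1-loss (hinge) linear SVM where each outer step updates only a small
    block of dual variables chosen by a margin query. A brute-force search
    replaces the paper's ANN index, and the ordinary hinge loss replaces
    its truncated (ramp) loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)                      # dual variables, one per sample
    w = np.zeros(d)                          # primal weights, w = sum_i alpha_i * y_i * x_i
    Qii = np.einsum("ij,ij->i", X, X)        # diagonal of the Gram matrix

    for _ in range(outer_iters):
        # "Index query": select the samples with the smallest margins y_i * w.x_i,
        # i.e. those most likely to be active (support vectors) in the dual.
        margins = y * (X @ w)
        block = np.argsort(margins)[:block_size]

        # LIBLINEAR-style dual coordinate descent, restricted to the selected block.
        for _ in range(inner_sweeps):
            for i in rng.permutation(block):
                if Qii[i] == 0.0:
                    continue
                G = y[i] * (X[i] @ w) - 1.0  # gradient of the dual objective in coordinate i
                if alpha[i] == 0.0:
                    PG = min(G, 0.0)
                elif alpha[i] == C:
                    PG = max(G, 0.0)
                else:
                    PG = G
                if abs(PG) > 1e-12:
                    old = alpha[i]
                    alpha[i] = min(max(old - G / Qii[i], 0.0), C)
                    w += (alpha[i] - old) * y[i] * X[i]
    return w, alpha

# Toy usage on synthetic data (labels in {-1, +1}).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 50))
    y = np.sign(X @ rng.normal(size=50) + 0.1 * rng.normal(size=2000))
    w, alpha = indexed_block_cd_svm(X, y, C=1.0)
    print("train accuracy:", np.mean(np.sign(X @ w) == y))
    print("nonzero dual variables:", int(np.count_nonzero(alpha)))
```

In the paper, the block is retrieved through an ANN index so that samples outside the block never have to be loaded from disk, which is where the I/O and memory savings come from; the full margin scan above touches every sample and is only meant to show where the index query plugs into the coordinate descent loop.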


Cited By

  • (2019) On Linear Learning with Manycore Processors. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC), pp. 184-194. DOI: 10.1109/HiPC.2019.00032
  • (2015) A dual-augmented block minimization framework for learning with limited memory. Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 2, pp. 3582-3590. DOI: 10.5555/2969442.2969639
  • (2015) Sparse Linear Programming via primal and dual augmented coordinate descent. Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 2, pp. 2368-2376. DOI: 10.5555/2969442.2969504


    Published In

    KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2013
    1534 pages
ISBN: 9781450321747
DOI: 10.1145/2487575


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. cccp
    2. classification
    3. dual coordinate descent
    4. indexing
    5. large-scale
    6. limited-memory
    7. nearest-neighbor
    8. ramp-loss

    Qualifiers

    • Research-article

    Conference

KDD '13

    Acceptance Rates

KDD '13 paper acceptance rate: 125 of 726 submissions, 17%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%


