research-article

On updates that constrain the features' connections during learning

Authors:

Jian Huang

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 515 - 523

https://doi.org/10.1145/1401890.1401954

Published: 24 August 2008

Abstract

In many multiclass learning scenarios, the number of classes is relatively large (thousands or more), or the space and time efficiency of the learning system can be crucial. We investigate two online update techniques especially suited to such problems. These updates share a sparsity-preservation capacity: they allow for constraining the number of prediction connections that each feature can make. We show that one method, exponential moving average, solves a "discrete" regression problem for each feature, changing the weights in the direction that minimizes the quadratic loss. We design the other method to improve a hinge loss subject to constraints, for better accuracy. We empirically explore the methods and compare their performance to previous indexing techniques developed with the same goals, as well as to other online algorithms based on prototype learning. We observe that the classification accuracies are very promising, improving over previous indexing techniques, while the scalability benefits are preserved.
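
As a rough illustration of the first update described in the abstract, the sketch below applies an exponential moving average step to the connections of each active feature and then caps how many connections the feature may keep. It is not the paper's algorithm: the names `ema_update` and `predict`, the parameters `beta` and `max_connections`, and the "keep the largest weights" pruning rule are all assumptions made for illustration. The link to quadratic loss is the standard one: for a 0/1 target y, a gradient step of size beta on (1/2)(w - y)^2 gives w - beta(w - y) = (1 - beta)w + beta*y, which is the moving-average form used here.

```python
from collections import defaultdict


def ema_update(weights, active_features, true_class, beta=0.1, max_connections=3):
    """Apply one EMA step per active feature, then cap its connections.

    weights: dict mapping feature -> dict mapping class -> weight.
    beta and max_connections are hypothetical, illustrative parameters.
    """
    for f in active_features:
        conns = weights.setdefault(f, {})
        # Decay every existing connection of this feature (target 0).
        for c in conns:
            conns[c] *= (1.0 - beta)
        # Move the true class's weight toward 1 (target 1).
        conns[true_class] = conns.get(true_class, 0.0) + beta
        # Sparsity constraint (assumed rule): keep only the largest weights.
        if len(conns) > max_connections:
            keep = sorted(conns, key=conns.get, reverse=True)[:max_connections]
            weights[f] = {c: conns[c] for c in keep}
    return weights


def predict(weights, active_features):
    """Score classes by summing the weights of the active features' connections."""
    scores = defaultdict(float)
    for f in active_features:
        for c, w in weights.get(f, {}).items():
            scores[c] += w
    return max(scores, key=scores.get) if scores else None
```

Because each feature stores at most `max_connections` weights, memory and prediction cost in this sketch scale with the number of active features rather than with the total number of classes, which is the kind of scalability benefit the abstract refers to.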

Index Terms

On updates that constrain the features' connections during learning
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published In

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2008

1116 pages

ISBN: 9781605581934

DOI: 10.1145/1401890

General Chair: Ying Li (Microsoft adCenter Labs)

Program Chairs: Bing Liu (University of Illinois at Chicago), Sunita Sarawagi (Indian Institute of Technology, Bombay)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD08: The 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 24 - 27, 2008

Las Vegas, Nevada, USA

Acceptance Rates

KDD '08 Paper Acceptance Rate 118 of 593 submissions, 20%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
