
Live and learn from mistakes: A lightweight system for document classification

Published: 01 January 2013

Abstract

We present a Life-Long Learning from Mistakes (3LM) algorithm for document classification, applicable in scenarios such as spam filtering, blog classification, and web resource categorization. We extend the ideas of online clustering and batch-mode centroid-based classification to online learning with negative feedback. 3LM is a competitive learning algorithm that avoids the over-smoothing characteristic of centroid-based classifiers by using a different class representative, which we call a clusterhead. The clusterheads, competing for vector-space dominance, are drawn toward misclassified documents, eventually bringing the model to a "balanced state" for a fixed distribution of documents. Subsequently, the clusterheads oscillate between the misclassified documents, heuristically minimizing the rate of misclassifications, an NP-complete problem. Further, the 3LM algorithm prevents over-fitting by "leashing" the clusterheads to their respective centroids. A clusterhead provably converges if its class can be separated by a hyperplane from all other classes. Lifelong learning with a fixed learning rate allows 3LM to adapt to a possibly changing data distribution and to continually learn and unlearn document classes. We report on experiments demonstrating high classification accuracy on the Reuters-21578, OHSUMED, and TREC07p spam datasets. 3LM showed no over-fitting, while consistently outperforming centroid-based, Naive Bayes, C4.5, AdaBoost, kNN, and SVM classifiers whose accuracy had been reported on the same three corpora.
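The dynamics sketched in the abstract — one fixed centroid and one movable clusterhead per class, with the clusterhead pulled toward misclassified documents and "leashed" back toward its centroid — can be illustrated with a toy implementation. This is a minimal sketch reconstructed from the abstract's description only; the class name, the `eta` and `leash` parameters, and the exact update rule are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

class ThreeLM:
    """Toy sketch of a 3LM-style online classifier.

    The update rule below is an assumption inferred from the abstract,
    not the authors' published algorithm.
    """

    def __init__(self, centroids, eta=0.1, leash=0.5):
        # One fixed centroid and one movable clusterhead per class,
        # both kept on the unit sphere (cosine-similarity geometry).
        self.centroids = {c: v / np.linalg.norm(v) for c, v in centroids.items()}
        self.heads = {c: v.copy() for c, v in self.centroids.items()}
        self.eta = eta      # fixed learning rate: enables lifelong adaptation
        self.leash = leash  # pull back toward the centroid to limit over-fitting

    def predict(self, x):
        x = x / np.linalg.norm(x)
        # The clusterhead nearest in cosine similarity wins the competition.
        return max(self.heads, key=lambda c: float(self.heads[c] @ x))

    def learn(self, x, label):
        """One online step with negative feedback: update only on a mistake."""
        x = x / np.linalg.norm(x)
        pred = self.predict(x)
        if pred != label:
            h = self.heads[label]
            # Draw the correct class's clusterhead toward the misclassified
            # document...
            h += self.eta * (x - h)
            # ...then "leash" it back toward its fixed class centroid.
            h += self.leash * self.eta * (self.centroids[label] - h)
            self.heads[label] = h / np.linalg.norm(h)
        return pred

# Illustrative usage with two hand-made 2-D "document" classes.
clf = ThreeLM({"spam": np.array([1.0, 0.0]), "ham": np.array([0.0, 1.0])})
clf.learn(np.array([0.6, 0.8]), "spam")  # a mistake moves the spam clusterhead
```

With a fixed learning rate the clusterheads never freeze, which is what lets the model track a drifting document distribution; the leash term is what keeps a head from chasing outliers arbitrarily far from its class centroid.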

Published In

Information Processing and Management: an International Journal, Volume 49, Issue 1 (January 2013), 405 pages.
Publisher: Pergamon Press, Inc., United States.

Author Tags: 3LM, Centroid, Classifier, Clusterhead, Lifelong, Online
