Abstract
In this paper, a novel text categorization method based on multi-class Support Vector Machines (SVMs) with a Rocchio ensemble is proposed for Internet information classification and filtering. The classifier has a cascaded architecture: a Rocchio linear classifier processes all of the data, and only a selected part of the data is re-processed by the multi-class SVM classifier. The selection is based on the validation results of the Rocchio classifier, so that only data from classes with lower Rocchio precision are processed by the SVM classifier. The cascaded ensemble therefore takes advantage of both classifiers. On the one hand, the low computational cost and fast processing speed of Rocchio suit large-scale web information classification and filtering applications, such as spam mail filtering at network gateways. On the other hand, the good generalization ability of multi-class SVMs can be used to further improve Rocchio's precision. The ensemble can thus be viewed as an efficient way to trade off the processing speed and precision of different classifiers. Experimental results on real web text data illustrate the effectiveness of the proposed method.
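The cascade described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are assumed to be dense TF-IDF vectors (lists of floats); the Rocchio weights `beta` and `gamma`, the `precision_threshold` used to flag low-precision classes, and the choice to train the SVM stage on the full training set are all hypothetical; and the "multi-class SVM" here is a simple one-vs-all linear SVM trained with Pegasos-style subgradient descent, swapped in for whatever SVM solver the paper used.

```python
import math
import random
from collections import defaultdict

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def mean(vectors, dim):
    # Component-wise mean of a list of vectors; zero vector if the list is empty.
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]

class Rocchio:
    """Per-class prototype: beta * mean(in-class docs) - gamma * mean(other docs).
    The beta/gamma defaults are illustrative, not values from the paper."""
    def __init__(self, beta=1.0, gamma=0.25):
        self.beta, self.gamma = beta, gamma

    def fit(self, X, y):
        dim = len(X[0])
        self.classes = sorted(set(y))
        self.protos = {}
        for c in self.classes:
            pos = mean([x for x, t in zip(X, y) if t == c], dim)
            neg = mean([x for x, t in zip(X, y) if t != c], dim)
            self.protos[c] = [self.beta * p - self.gamma * n for p, n in zip(pos, neg)]
        return self

    def predict(self, x):
        # Cheap stage: one cosine similarity per class.
        return max(self.classes, key=lambda c: cosine(x, self.protos[c]))

class OneVsAllLinearSVM:
    """Stand-in multi-class SVM: one linear hinge-loss model per class, trained
    with Pegasos-style stochastic subgradient descent (not the authors' solver)."""
    def __init__(self, lam=0.01, epochs=50, seed=0):
        self.lam, self.epochs, self.seed = lam, epochs, seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        Xb = [list(x) + [1.0] for x in X]  # constant feature acts as a bias term
        self.classes = sorted(set(y))
        self.w = {}
        for c in self.classes:
            w, t = [0.0] * len(Xb[0]), 0
            for _ in range(self.epochs):
                order = list(range(len(Xb)))
                rng.shuffle(order)
                for i in order:
                    t += 1
                    eta = 1.0 / (self.lam * t)
                    label = 1.0 if y[i] == c else -1.0
                    violated = label * dot(w, Xb[i]) < 1.0  # margin check with current w
                    w = [(1.0 - eta * self.lam) * wj for wj in w]  # L2 shrink step
                    if violated:  # hinge-loss subgradient step
                        w = [wj + eta * label * xj for wj, xj in zip(w, Xb[i])]
            self.w[c] = w
        return self

    def predict(self, x):
        xb = list(x) + [1.0]
        return max(self.classes, key=lambda c: dot(self.w[c], xb))

def per_class_precision(model, X, y):
    # Precision of each class the model predicts; never-predicted classes are omitted.
    tp, predicted = defaultdict(int), defaultdict(int)
    for x, t in zip(X, y):
        p = model.predict(x)
        predicted[p] += 1
        tp[p] += (p == t)
    return {c: tp[c] / predicted[c] for c in predicted}

class CascadedRocchioSVM:
    def __init__(self, precision_threshold=0.9):
        self.threshold = precision_threshold  # hypothetical routing parameter

    def fit(self, X_train, y_train, X_val, y_val):
        self.rocchio = Rocchio().fit(X_train, y_train)
        precision = per_class_precision(self.rocchio, X_val, y_val)
        self.weak = {c for c, p in precision.items() if p < self.threshold}
        # Assumption: the SVM is trained on the full training set and is only
        # used on documents that Rocchio places in a low-precision class.
        self.svm = OneVsAllLinearSVM().fit(X_train, y_train) if self.weak else None
        return self

    def predict(self, x):
        label = self.rocchio.predict(x)
        return self.svm.predict(x) if label in self.weak else label
```

With held-out data, `CascadedRocchioSVM(precision_threshold=0.9).fit(X_train, y_train, X_val, y_val)` flags every class whose Rocchio validation precision falls below 0.9. At prediction time most documents exit after the cheap Rocchio step, and only those assigned to a flagged class pay for the SVM evaluation, which mirrors the speed/precision trade-off the abstract describes. If no class is flagged, the SVM is never trained at all.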
Supported by the National Natural Science Foundation of China under Grants 60303012 and 90104001, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant 20049998027, the Chinese Post-Doctor Science Foundation under Grant 200403500202, and a project supported by the Scientific Research Fund of Hunan Provincial Education Department.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xu, X., Zhang, B., Zhong, Q. (2005). Text Categorization Using SVMs with Rocchio Ensemble for Internet Information Classification. In: Lu, X., Zhao, W. (eds) Networking and Mobile Computing. ICCNMC 2005. Lecture Notes in Computer Science, vol 3619. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11534310_107
DOI: https://doi.org/10.1007/11534310_107
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28102-3
Online ISBN: 978-3-540-31868-2
eBook Packages: Computer Science, Computer Science (R0)