
Text Categorization Using SVMs with Rocchio Ensemble for Internet Information Classification

  • Conference paper
Networking and Mobile Computing (ICCNMC 2005)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 3619)


Abstract

In this paper, a novel text categorization method based on multi-class Support Vector Machines (SVMs) with a Rocchio ensemble is proposed for Internet information classification and filtering. The multi-class SVM classifier with Rocchio ensemble has a novel cascaded architecture in which a Rocchio linear classifier processes all the data and only a selected part of the data is re-processed by the multi-class SVM classifier. The data selection for the SVM is based on the validation results of the Rocchio classifier: only the data classes on which Rocchio shows lower precision are processed by the SVM classifier. The cascaded ensemble classifier thus takes advantage of both the multi-class SVM and the Rocchio classifier. On the one hand, the low computational cost and fast processing speed of Rocchio suit large-scale web information classification and filtering applications such as spam mail filtering at network gateways. On the other hand, the good generalization ability of multi-class SVMs can be employed to further improve Rocchio's precision. The whole ensemble classifier can therefore be viewed as an efficient way to trade off the processing speed and precision of different classifiers. Experimental results on real web text data illustrate the effectiveness of the proposed method.
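The abstract describes the cascade only at the architectural level. The following is a minimal pure-Python sketch of that routing idea, not the authors' implementation, under several assumptions that do not come from the paper: documents arrive as pre-tokenised word lists; Rocchio is reduced to a positive-only centroid (nearest-centroid) classifier on TF-IDF vectors; the multi-class SVM is swapped for one-vs-rest linear SVMs trained with a Pegasos-style subgradient solver on all training classes; and `threshold`, the validation-precision cut-off that decides which Rocchio output classes are re-processed, is a hypothetical parameter. All function and class names are illustrative.

```python
import math
import random
from collections import Counter, defaultdict

BIAS = "__bias__"  # hypothetical constant feature that acts as the SVM bias term


def dot(a, b):
    """Sparse dot product of two {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def normalize(v):
    """Scale a sparse vector to unit L2 norm (empty vectors stay empty)."""
    n = math.sqrt(sum(x * x for x in v.values())) or 1.0
    return {t: x / n for t, x in v.items()}


def make_vectorizer(train_docs):
    """Return a function mapping a token list to a unit-length TF-IDF dict."""
    df = Counter(t for doc in train_docs for t in set(doc))
    n = len(train_docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return lambda doc: normalize(
        {t: tf * idf[t] for t, tf in Counter(doc).items() if t in idf})


def rocchio_centroids(X, y):
    """Assumed simplification: positive-only Rocchio, one unit-length centroid per class."""
    sums = defaultdict(Counter)
    for x, label in zip(X, y):
        sums[label].update(x)
    return {c: normalize(dict(s)) for c, s in sums.items()}


def rocchio_predict(centroids, x):
    """Assign the class whose centroid has the highest cosine similarity."""
    return max(sorted(centroids), key=lambda c: dot(x, centroids[c]))


def train_ovr_svm(X, y, classes, lam=0.1, epochs=50, seed=0):
    """Stand-in multi-class SVM: one-vs-rest linear SVMs via Pegasos-style updates."""
    rng = random.Random(seed)
    data = [{**x, BIAS: 1.0} for x in X]
    weights = {}
    for c in classes:
        w, t = defaultdict(float), 0
        order = list(range(len(data)))
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                t += 1
                eta = 1.0 / (lam * t)
                yi = 1.0 if y[i] == c else -1.0
                violated = yi * dot(data[i], w) < 1.0  # hinge-loss margin check
                for k in w:  # shrink step from the L2 regulariser
                    w[k] *= 1.0 - eta * lam
                if violated:
                    for k, v in data[i].items():
                        w[k] += eta * yi * v
        weights[c] = dict(w)
    return weights


def svm_predict(weights, x):
    """Pick the class whose one-vs-rest SVM gives the largest score."""
    xb = {**x, BIAS: 1.0}
    return max(sorted(weights), key=lambda c: dot(xb, weights[c]))


def per_class_precision(y_true, y_pred, classes):
    """Precision of each class among the documents predicted as that class."""
    prec = {}
    for c in classes:
        hits = [t == c for t, p in zip(y_true, y_pred) if p == c]
        # Assumption: a class never predicted on validation gets precision 0.0,
        # so any positive threshold routes it to the SVM.
        prec[c] = sum(hits) / len(hits) if hits else 0.0
    return prec


class RocchioSVMCascade:
    def __init__(self, threshold=0.9):
        self.threshold = threshold  # hypothetical precision cut-off

    def fit(self, train_docs, train_labels, val_docs, val_labels):
        self.vec = make_vectorizer(train_docs)
        X = [self.vec(d) for d in train_docs]
        self.centroids = rocchio_centroids(X, train_labels)
        classes = sorted(self.centroids)
        val_pred = [rocchio_predict(self.centroids, self.vec(d)) for d in val_docs]
        prec = per_class_precision(val_labels, val_pred, classes)
        # Classes on which Rocchio is less precise than the cut-off go to the SVM.
        self.low = {c for c in classes if prec[c] < self.threshold}
        self.svm = train_ovr_svm(X, train_labels, classes) if self.low else None
        return self

    def predict(self, doc):
        x = self.vec(doc)
        label = rocchio_predict(self.centroids, x)  # stage 1: every document
        if label in self.low:  # stage 2: only Rocchio's low-precision classes
            label = svm_predict(self.svm, x)
        return label
```

In this sketch, raising `threshold` sends more of Rocchio's output classes to the slower SVM stage, trading speed for precision; `threshold=0.0` reduces the cascade to plain Rocchio and skips SVM training altogether.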

Supported by the National Natural Science Foundation of China under Grants 60303012 and 90104001, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant 20049998027, the Chinese Post-Doctor Science Foundation under Grant 200403500202, and a project supported by the Scientific Research Fund of Hunan Provincial Education Department.




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xu, X., Zhang, B., Zhong, Q. (2005). Text Categorization Using SVMs with Rocchio Ensemble for Internet Information Classification. In: Lu, X., Zhao, W. (eds) Networking and Mobile Computing. ICCNMC 2005. Lecture Notes in Computer Science, vol 3619. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11534310_107


  • DOI: https://doi.org/10.1007/11534310_107

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28102-3

  • Online ISBN: 978-3-540-31868-2

  • eBook Packages: Computer Science, Computer Science (R0)
