2-Way Text Classification for Harmful Web Documents

  • Conference paper
Computational Science and Its Applications - ICCSA 2006 (ICCSA 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3981)

Abstract

The openness of the Web allows any user to access almost any type of information. However, some information, such as adult content, is not appropriate for all users, notably children. Even for adults, some content found on abnormal pornographic sites can harm mental health. In this paper, we propose an efficient 2-way text filter for blocking harmful web documents and present a new criterion for clear classification. The filter first screens out 0-grade web texts, which contain no harmful words, using pattern matching against harmful-word dictionaries, and then classifies the remaining texts as 1-grade, 2-grade, or 3-grade using a machine learning algorithm.
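The two-stage flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the placeholder dictionary `HARMFUL_WORDS`, the helper names, and in particular the nearest-centroid classifier, which merely stands in for the unspecified "machine learning algorithm" and is not claimed to be the authors' method.

```python
import re
from collections import Counter

# Hypothetical placeholder dictionary; the paper's harmful-word
# dictionaries are not reproduced on this page.
HARMFUL_WORDS = {"xxx", "explicit", "nude"}

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def harmful_counts(text):
    """Stage 1: count dictionary hits by exact token matching."""
    return Counter(t for t in tokenize(text) if t in HARMFUL_WORDS)


def train_centroids(labeled_texts):
    """Average the dictionary-hit counts per grade (grades 1 to 3).

    labeled_texts is an iterable of (text, grade) pairs. This is a
    stand-in learner, assumed for illustration only.
    """
    sums, sizes = {}, Counter()
    for text, grade in labeled_texts:
        sums.setdefault(grade, Counter()).update(harmful_counts(text))
        sizes[grade] += 1
    return {g: {w: c / sizes[g] for w, c in sums[g].items()} for g in sums}


def classify(text, centroids):
    """Return 0 if no dictionary word occurs, else the nearest grade."""
    counts = harmful_counts(text)
    if not counts:
        return 0  # filtered off in stage 1, never reaches the classifier

    def sq_dist(centroid):
        return sum(
            (counts.get(w, 0) - centroid.get(w, 0.0)) ** 2
            for w in HARMFUL_WORDS
        )

    # Stage 2: pick the grade whose centroid is closest; ties go to
    # the lower grade because grades are visited in sorted order.
    return min(sorted(centroids), key=lambda g: sq_dist(centroids[g]))
```

The point of the split is that texts with no dictionary hit are rejected by a cheap lookup, so only the remaining texts pay the cost of the learned classifier. Calling `classify(text, train_centroids(training_pairs))` with a handful of hand-labeled `(text, grade)` pairs is enough to try the sketch.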

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, Y., Nam, T., Won, D. (2006). 2-Way Text Classification for Harmful Web Documents. In: Gavrilova, M.L., et al. Computational Science and Its Applications - ICCSA 2006. ICCSA 2006. Lecture Notes in Computer Science, vol 3981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751588_57

  • DOI: https://doi.org/10.1007/11751588_57

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34072-0

  • Online ISBN: 978-3-540-34074-4

  • eBook Packages: Computer Science, Computer Science (R0)
