Abstract
The openness of the Web allows any user to access almost any type of information. However, some information, such as adult content, is not appropriate for all users, notably children. For adults as well, some content on abnormal pornographic sites can harm mental health. In this paper, we propose an efficient 2-way text filter for blocking harmful web documents and present a new criterion for clear classification. The filter first screens out 0-grade web texts, which contain no harmful words, by pattern matching against harmful-word dictionaries, and then classifies the remaining texts as 1-grade, 2-grade, or 3-grade using a machine learning algorithm.
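The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary contents, the tokenizer, the function names (`two_way_filter`, `toy_classifier`), and the hit-counting stand-in for the second stage are all hypothetical. The abstract says only that a machine learning algorithm assigns grades 1 to 3, so the second stage is left as a pluggable callable.

```python
import re
from typing import Callable, List, Set

# Hypothetical placeholder dictionary; the paper's real dictionaries are not given here.
HARMFUL_WORDS: Set[str] = {"badword", "worseword"}

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def two_way_filter(
    text: str,
    classify: Callable[[List[str]], int],
    dictionary: Set[str] = HARMFUL_WORDS,
) -> int:
    """Return a grade in {0, 1, 2, 3} for a web text.

    Stage 1: dictionary matching. Texts with no harmful word are grade 0
    and never reach the classifier.
    Stage 2: a caller-supplied classifier assigns grade 1, 2, or 3.
    """
    tokens = tokenize(text)
    if not any(token in dictionary for token in tokens):
        return 0
    grade = classify(tokens)
    if grade not in (1, 2, 3):
        raise ValueError(f"classifier returned an invalid grade: {grade!r}")
    return grade


def toy_classifier(tokens: List[str]) -> int:
    """Stand-in for the learned model (assumption): grade by hit count, capped at 3."""
    hits = sum(token in HARMFUL_WORDS for token in tokens)
    return min(3, max(1, hits))


if __name__ == "__main__":
    print(two_way_filter("A page about gardening.", toy_classifier))
    print(two_way_filter("Some badword appears here.", toy_classifier))
```

Passing the classifier in as a callable keeps the cheap dictionary check separate from the more expensive learned model, which appears to be the efficiency argument behind the 2-way design: most harmless pages would exit at stage 1.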
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, Y., Nam, T., Won, D. (2006). 2-Way Text Classification for Harmful Web Documents. In: Gavrilova, M.L., et al. Computational Science and Its Applications - ICCSA 2006. ICCSA 2006. Lecture Notes in Computer Science, vol 3981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751588_57
DOI: https://doi.org/10.1007/11751588_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34072-0
Online ISBN: 978-3-540-34074-4
eBook Packages: Computer Science, Computer Science (R0)