2-Way Text Classification for Harmful Web Documents

Kim, Youngsoo; Nam, Taekyong; Won, Dongho

doi:10.1007/11751588_57

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3981)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

The openness of the Web allows any user to access almost any type of information. However, some information, such as adult content, is not appropriate for all users, notably children. Even for adults, some content on abnormal pornographic sites can harm mental health. In this paper, we propose an efficient 2-way text filter for blocking harmful web documents and also present a new criterion for clear classification. The filter screens out 0-grade web texts, which contain no harmful words, by pattern matching against harmful-word dictionaries, and classifies 1-grade, 2-grade, and 3-grade web texts using a machine learning algorithm.
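
The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the learning algorithm, so a small multinomial Naive Bayes classifier is swapped in here as a stand-in, and the dictionary entries (`harmword_a`, `harmword_b`, `harmword_c`), the training texts, and all function and class names are hypothetical placeholders chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical placeholder dictionary; the paper's real dictionaries are not given here.
HARMFUL_WORDS = {"harmword_a", "harmword_b", "harmword_c"}

TOKEN_RE = re.compile(r"[a-z0-9_]+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def contains_harmful_word(tokens):
    """Stage 1: pattern matching against the harmful-word dictionary."""
    return any(tok in HARMFUL_WORDS for tok in tokens)


class NaiveBayesGrader:
    """Stage 2 stand-in: multinomial Naive Bayes with add-one smoothing.

    Assumption: any supervised text classifier could fill this role.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # grade -> token counts
        self.doc_counts = Counter()              # grade -> number of documents
        self.vocab = set()

    def fit(self, texts, grades):
        for text, grade in zip(texts, grades):
            tokens = tokenize(text)
            self.word_counts[grade].update(tokens)
            self.doc_counts[grade] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_grade, best_score = None, -math.inf
        for grade in sorted(self.doc_counts):
            counts = self.word_counts[grade]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[grade] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_grade, best_score = grade, score
        return best_grade


def classify(text, grader):
    """Return 0 if no dictionary word occurs, else the learned grade (1 to 3)."""
    if not contains_harmful_word(tokenize(text)):
        return 0
    return grader.predict(text)


# Toy training data with hypothetical grade labels.
train_texts = [
    "health article mentions harmword_a in a medical context",
    "news report about harmword_a policy",
    "story with harmword_a and harmword_b repeated harmword_b",
    "forum post harmword_b harmword_b explicit",
    "harmword_c harmword_c harmword_b extreme extreme",
    "extreme harmword_c gallery harmword_c",
]
train_grades = [1, 1, 2, 2, 3, 3]
grader = NaiveBayesGrader().fit(train_texts, train_grades)

clean_grade = classify("weather forecast for tomorrow", grader)
flagged_grade = classify("extreme harmword_c harmword_c", grader)
```

In this sketch the dictionary lookup acts as a cheap pre-filter, so the learned classifier only runs on texts that contain at least one dictionary word; whether the original system orders the two stages exactly this way is an assumption drawn from the abstract's wording.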

Author information

Authors and Affiliations

Network Security Group, Electronics and Telecommunications Research Institute (ETRI), 161 Gajeong-dong, Yuseong-gu, Daejeon, 305-350, Korea
Youngsoo Kim & Taekyong Nam
Information Security Group, School of Information and Communication Engineering, Sungkyunkwan University, 300 Cheoncheon-dong, Jangan-gu, Suwon, Gyeonggi-do, 440-746, Korea
Youngsoo Kim & Dongho Won

Editor information

Editors and Affiliations

Department of Computer Science, University of Calgary, 2500 University Drive N.W., T2N 1N4, Calgary, AB, Canada
Marina L. Gavrilova
Department of Mathematics and Computer Science, University of Perugia, via Vanvitelli, 1, I-06123, Perugia, Italy
Osvaldo Gervasi
William Norris Professor, Head of the Computer Science and Engineering Department, University of Minnesota, USA
Vipin Kumar
OptimaNumerics Ltd., Cathedral House, 23-31 Waring Street, BT1 2DX, Belfast, UK
C. J. Kenneth Tan
Clayton School of IT, Monash University, 3800, Clayton, Australia
David Taniar
Department of Chemistry, University of Perugia, Via Elce di Sotto, 8, I-06123, Perugia, Italy
Antonio Laganá
School of Computing, Soongsil University, Seoul, Korea
Youngsong Mun
School of Information and Communication Engineering, Sungkyunkwan University, Korea
Hyunseung Choo

About this paper

Cite this paper

Kim, Y., Nam, T., Won, D. (2006). 2-Way Text Classification for Harmful Web Documents. In: Gavrilova, M.L., et al. Computational Science and Its Applications - ICCSA 2006. ICCSA 2006. Lecture Notes in Computer Science, vol 3981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751588_57

DOI: https://doi.org/10.1007/11751588_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34072-0
Online ISBN: 978-3-540-34074-4
eBook Packages: Computer Science, Computer Science (R0)
