
Automatic web pages categorization with ReliefF and Hidden Naive Bayes

Author: Rongfang Bie

SAC '07: Proceedings of the 2007 ACM symposium on Applied computing

Pages 617 - 621

https://doi.org/10.1145/1244002.1244143

Published: 11 March 2007

Abstract

A great challenge of web mining arises from the ever-growing volume of web pages and the high dimensionality associated with natural language. Since identifying the web pages that belong to a class of interest is often the first step in mining the web, web page categorization (classification) is one of the essential techniques for web mining. One of the main challenges of web page classification is the high-dimensional vocabulary space of the text. In this research, we propose a Hidden Naive Bayes based method for web page classification. We also propose using the ReliefF feature selection method to select relevant words and thereby improve classification performance. Comparisons with traditional techniques are provided. Results on a benchmark dataset show that the proposed methods are promising for accurate web page classification.
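To illustrate how ReliefF can rank words, here is a minimal pure-Python sketch of Kononenko's ReliefF weight update. It is not the paper's implementation. For each sampled page R, a feature's weight is decreased by its normalised difference to R's k nearest same-class pages (hits). It is increased by its difference to the k nearest pages of every other class C (misses), scaled by P(C) / (1 - P(class(R))). The function names `relieff` and `select_top`, the Manhattan distance, and the default k=3 are assumptions made for illustration. The loop is quadratic in the number of pages, so it only demonstrates the update rule and is not suitable for a full web corpus.

```python
import random
from collections import Counter


def relieff(X, y, k=3, m=None, seed=0):
    """Sketch of ReliefF: return one relevance weight per feature.

    X    -- list of numeric feature vectors (e.g. word frequencies per page)
    y    -- class label of each row of X
    k    -- number of nearest hits / misses per class
    m    -- number of randomly sampled instances (None = use every row)
    seed -- seed for the sampling step
    """
    n, d = len(X), len(X[0])
    # Per-feature value range, so that diff() falls in [0, 1].
    lows = [min(row[a] for row in X) for a in range(d)]
    highs = [max(row[a] for row in X) for a in range(d)]
    span = [(highs[a] - lows[a]) or 1.0 for a in range(d)]

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1, r2):
        # Manhattan distance over normalised feature differences (assumption).
        return sum(diff(a, r1, r2) for a in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    rows = list(range(n))
    if m is not None:
        rows = random.Random(seed).sample(rows, m)
    m = len(rows)

    w = [0.0] * d
    for i in rows:
        r, ci = X[i], y[i]
        # All other instances, nearest first.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(r, X[j]))
        # Nearest hits (same class) pull the weight down ...
        hits = [j for j in others if y[j] == ci][:k]
        for a in range(d):
            w[a] -= sum(diff(a, r, X[j]) for j in hits) / (m * k)
        # ... nearest misses of every other class push it up,
        # weighted by P(C) / (1 - P(class(R))).
        for c in prior:
            if c == ci:
                continue
            misses = [j for j in others if y[j] == c][:k]
            scale = prior[c] / (1.0 - prior[ci])
            for a in range(d):
                w[a] += scale * sum(diff(a, r, X[j]) for j in misses) / (m * k)
    return w


def select_top(weights, t):
    """Indices of the t highest-weighted features (e.g. the words to keep)."""
    return sorted(range(len(weights)), key=lambda a: weights[a], reverse=True)[:t]
```

In a web page setting, each row of `X` would hold one page's word counts. `select_top(relieff(X, y), t)` would then keep the t most relevant words before a classifier is trained on the reduced vocabulary.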

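Hidden Naive Bayes (Zhang, Jiang and Su, AAAI-05) gives every attribute A_i a hidden parent that mixes one-dependence estimators: P(a_i | a_hp_i, c) = Σ_{j≠i} W_ij P(a_i | a_j, c). Here W_ij = I(A_i; A_j | C) / Σ_{j'≠i} I(A_i; A_j' | C), and a page is assigned to argmax_c P(c) Π_i P(a_i | a_hp_i, c). The pure-Python sketch below follows that formulation under several assumptions. The features are discrete (for example, a selected word being present or absent), and there are at least two of them. Probabilities use plain Laplace smoothing. Uniform weights are used as a fallback when all conditional mutual information values are zero. The class name `HiddenNaiveBayes` and the smoothing choice are ours, and the paper's exact estimator may differ.

```python
import math
from collections import Counter


class HiddenNaiveBayes:
    """Sketch of Hidden Naive Bayes for discrete features (needs >= 2 features)."""

    def fit(self, X, y):
        self.n, self.d = len(X), len(X[0])
        self.classes = sorted(set(y))
        # Observed values of each attribute (used by Laplace smoothing).
        self.vals = [sorted({row[i] for row in X}) for i in range(self.d)]
        self.fc = Counter(y)   # F(c)
        self.f1 = Counter()    # F(a_i, c), keyed (i, a_i, c)
        self.f2 = Counter()    # F(a_i, a_j, c), keyed (i, a_i, j, a_j, c)
        for row, c in zip(X, y):
            for i in range(self.d):
                self.f1[i, row[i], c] += 1
                for j in range(self.d):
                    if j != i:
                        self.f2[i, row[i], j, row[j], c] += 1
        # W_ij = I(A_i; A_j | C) / sum over j' != i of I(A_i; A_j' | C)
        self.w = []
        for i in range(self.d):
            cmi = [self._cmi(i, j) if j != i else 0.0 for j in range(self.d)]
            total = sum(cmi)
            self.w.append([0.0 if j == i else
                           (cmi[j] / total if total > 0 else 1.0 / (self.d - 1))
                           for j in range(self.d)])
        return self

    def _cmi(self, i, j):
        """Empirical conditional mutual information I(A_i; A_j | C)."""
        s = 0.0
        for c in self.classes:
            for ai in self.vals[i]:
                for aj in self.vals[j]:
                    f = self.f2[i, ai, j, aj, c]
                    if f:
                        # P(ai,aj,c) * log[P(ai,aj|c) / (P(ai|c) P(aj|c))]
                        s += f / self.n * math.log(
                            f * self.fc[c] / (self.f1[i, ai, c] * self.f1[j, aj, c]))
        return max(s, 0.0)  # clip tiny negative rounding error

    def _p_cond(self, i, ai, j, aj, c):
        """Laplace-smoothed P(a_i | a_j, c) (smoothing is an assumption)."""
        return (self.f2[i, ai, j, aj, c] + 1) / (self.f1[j, aj, c] + len(self.vals[i]))

    def predict(self, row):
        """argmax_c of log P(c) + sum_i log sum_{j != i} W_ij P(a_i | a_j, c)."""
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log((self.fc[c] + 1) / (self.n + len(self.classes)))
            for i in range(self.d):
                p = sum(self.w[i][j] * self._p_cond(i, row[i], j, row[j], c)
                        for j in range(self.d) if j != i)
                score += math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best
```

For example, `HiddenNaiveBayes().fit(X, y).predict(page)` classifies one page whose binary word-presence vector `page` uses the same word order as the training rows. In the pipeline the abstract describes, those words would be the ones ReliefF kept. The mixture over all other attributes is what separates HNB from tree-augmented models: each attribute draws on every other attribute rather than on a single learned parent, and no structure search is needed.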


Index Terms

Automatic web pages categorization with ReliefF and Hidden Naive Bayes
1. Information systems
  1. Information systems applications

Published In


SAC '07: Proceedings of the 2007 ACM symposium on Applied computing

March 2007

1688 pages

ISBN:1595934804

DOI:10.1145/1244002

Conference Chairs:
Yookun Cho, Seoul National University, Seoul, Korea
Roger L. Wainwright, University of Tulsa, Tulsa, Oklahoma
Hisham M. Haddad, Kennesaw State University, Kennesaw, Georgia
Sung Y. Shin, South Dakota State University, Brookings, South Dakota

Program Chair:
Yong Wan Koo, The University of Suwon, Gyeonggi-do, Korea

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SAC07

Sponsor:

SIGAPP

SAC07: The 2007 ACM Symposium on Applied Computing

March 11 - 15, 2007

Seoul, Korea

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Bibliometrics

Total Citations: 5
Total Downloads: 390
Downloads (last 12 months): 2
Downloads (last 6 weeks): 0

Reflects downloads up to 11 Aug 2024

Cited By

Al-Hameli B, Alsewari A, Basurra S, Bhogal J, Ali M (2023). Diabetes disease prediction system using HNB classifier based on discretization method. Journal of Integrative Bioinformatics 20(1). Online publication date: 23-Feb-2023. https://doi.org/10.1515/jib-2021-0037

Buathong W, Jarupunphol P (2018). Further Experiments on A Combination of Linear SVM Weight and ReliefF for Dimensionality Reduction. Proceedings of the 2018 International Conference on Artificial Intelligence and Virtual Reality, pp. 6-9. Online publication date: 23-Nov-2018. https://dl.acm.org/doi/10.1145/3293663.3293682

Roul R, Nanda A, Patel V, Sahay S (2015). Extreme learning machines in the field of text classification. 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 1-7. Online publication date: Jun-2015. https://doi.org/10.1109/SNPD.2015.7176204

Buathong W, Meesad P (2013). Enhancing the Efficiency of Dimensionality Reduction Using a Combined Linear SVM Weight with ReliefF Feature Selection Method. The 9th International Conference on Computing and Information Technology (IC2IT2013), pp. 125-134. Online publication date: 2013. https://doi.org/10.1007/978-3-642-37371-8_16

Wen H, Fang L, Guan L (2008). Automatic Web Page Classification Using Various Features. Proceedings of the 9th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing, pp. 368-376. Online publication date: 9-Dec-2008. https://dl.acm.org/doi/10.1007/978-3-540-89796-5_38
