Article
DOI: 10.1145/1244002.1244143

Automatic web pages categorization with ReliefF and Hidden Naive Bayes

Published: 11 March 2007

    Abstract

    A major challenge in web mining arises from the increasingly large number of web pages and the high dimensionality of natural language. Because identifying the pages that belong to a class of interest is often the first step in mining the web, web page categorization (classification) is one of the essential techniques for web mining. One of the main difficulties in web page classification is the high-dimensional vocabulary space of text. In this research, we propose a method for web page classification based on Hidden Naive Bayes. We also propose using the ReliefF feature selection method to select relevant words and thereby improve classification performance. Comparisons with traditional techniques are provided. Results on a benchmark dataset show that the proposed methods are promising for accurate web page classification.
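To make the word-selection step concrete, the following is a minimal sketch of a ReliefF-style scoring pass over binary term-presence vectors. It is not the authors' implementation: the function name `relieff`, the toy vocabulary, the class labels, and the choice of Hamming distance are all assumptions made for illustration, and the sketch visits every page once instead of drawing a random sample. For each page, a word's weight is lowered when the word differs from the nearest pages of the same class and raised when it differs from the nearest pages of every other class, with each other class weighted by its prior.

```python
from collections import Counter

def relieff(X, y, k=1):
    """Return one ReliefF-style relevance weight per feature.

    X: list of equal-length lists holding 0/1 term-presence values.
    y: list of class labels, one per row of X.
    k: number of nearest hits/misses used per class (illustrative default).
    """
    n, d = len(X), len(X[0])
    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    weights = [0.0] * d

    def distance(a, b):
        # Hamming distance: the summed per-feature difference for 0/1 data.
        return sum(1 for u, v in zip(a, b) if u != v)

    def nearest(i, label):
        # Up to k instances of class `label` closest to X[i], excluding i itself.
        pool = [j for j in range(n) if j != i and y[j] == label]
        pool.sort(key=lambda j: distance(X[i], X[j]))
        return pool[:k]

    for i in range(n):
        # Nearest hits: disagreeing with same-class neighbours lowers the weight.
        hits = nearest(i, y[i])
        for j in hits:
            for a in range(d):
                weights[a] -= (X[i][a] != X[j][a]) / (n * len(hits))
        # Nearest misses: disagreeing with other-class neighbours raises it,
        # scaled by that class's prior relative to all other classes.
        for c in prior:
            if c == y[i]:
                continue
            misses = nearest(i, c)
            scale = prior[c] / (1.0 - prior[y[i]])
            for j in misses:
                for a in range(d):
                    weights[a] += scale * (X[i][a] != X[j][a]) / (n * len(misses))
    return weights

# Hypothetical toy corpus: columns are the words "stock", "goal", "the".
vocabulary = ["stock", "goal", "the"]
X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
y = ["finance", "finance", "sports", "sports"]

weights = relieff(X, y, k=1)
# Rank words from most to least relevant; a classifier would keep the top ones.
ranked = sorted(range(len(vocabulary)), key=lambda a: weights[a], reverse=True)
selected = [vocabulary[a] for a in ranked[:2]]
```

On this toy corpus the two class-specific words receive higher weights than the word that appears in both classes, so only they survive the cut. A classifier such as Hidden Naive Bayes would then be trained on the reduced vocabulary.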

      Published In

      SAC '07: Proceedings of the 2007 ACM symposium on Applied computing
      March 2007
      1688 pages
      ISBN:1595934804
      DOI:10.1145/1244002
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Hidden Naive Bayes
      2. ReliefF feature selection
      3. web mining

      Qualifiers

      • Article

      Conference

      SAC07
      Acceptance Rates

      Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

      Cited By

      • (2023) Diabetes disease prediction system using HNB classifier based on discretization method. Journal of Integrative Bioinformatics 20:1. DOI: 10.1515/jib-2021-0037. Online publication date: 23-Feb-2023
      • (2018) Further Experiments on A Combination of Linear SVM Weight and ReliefF for Dimensionality Reduction. Proceedings of the 2018 International Conference on Artificial Intelligence and Virtual Reality, pp. 6-9. DOI: 10.1145/3293663.3293682. Online publication date: 23-Nov-2018
      • (2015) Extreme learning machines in the field of text classification. 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 1-7. DOI: 10.1109/SNPD.2015.7176204. Online publication date: Jun-2015
      • (2013) Enhancing the Efficiency of Dimensionality Reduction Using a Combined Linear SVM Weight with ReliefF Feature Selection Method. The 9th International Conference on Computing and Information Technology (IC2IT2013), pp. 125-134. DOI: 10.1007/978-3-642-37371-8_16. Online publication date: 2013
      • (2008) Automatic Web Page Classification Using Various Features. Proceedings of the 9th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing, pp. 368-376. DOI: 10.1007/978-3-540-89796-5_38. Online publication date: 9-Dec-2008
