DOI: 10.1007/11925231_97

A high performance prototype system for Chinese text categorization

Published: 13 November 2006

    Abstract

    Improving accuracy remains a central challenge in text categorization. This paper proposes a high-performance prototype system for Chinese text categorization that consists mainly of a feature extraction subsystem, a feature selection subsystem, and a reliability evaluation subsystem for classification results. The system employs a two-step classification strategy. In the first step, features that are effective for all test texts are used to classify the texts. The reliability evaluation subsystem then evaluates the classification results directly from the classifier's outputs and divides the texts into two groups: those classified reliably and those classified unreliably. Only texts classified unreliably in the first step proceed to the second step, in which a classifier uses subtler and more powerful features to classify them again. The prototype system is successfully implemented in a case that uses a Naive Bayesian classifier in both steps. Experiments show that the proposed system achieves high performance.
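
    The two-step strategy described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not say which features each step uses or how reliability is measured, so the choices here are assumptions: word unigrams for step 1, word bigrams for step 2, and a log-score margin threshold (the hypothetical `margin` parameter) as the reliability test. All class and function names are hypothetical, and the toy English tokens stand in for segmented Chinese text.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def __init__(self, featurize):
        self.featurize = featurize            # maps a token list to features
        self.class_counts = Counter()         # documents seen per class
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            for f in self.featurize(tokens):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def log_scores(self, tokens):
        """Return an unnormalized log-posterior for every class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total_docs)
            for f in self.featurize(tokens):
                if f in self.vocab:           # skip features unseen in training
                    score += math.log((counts[f] + 1) / denom)
            scores[label] = score
        return scores


def unigrams(tokens):
    return list(tokens)


def bigrams(tokens):
    return [a + "_" + b for a, b in zip(tokens, tokens[1:])]


def classify_two_step(tokens, step1, step2, margin=1.0):
    """Return (label, step_used). Assumes at least two classes.

    Assumption: a step-1 result counts as reliable when the gap between
    the two best log-scores is at least `margin`.
    """
    scores = step1.log_scores(tokens)
    ranked = sorted(scores.values(), reverse=True)
    if ranked[0] - ranked[1] >= margin:
        return max(scores, key=scores.get), 1
    scores2 = step2.log_scores(tokens)        # unreliable: use step-2 features
    return max(scores2, key=scores2.get), 2


# Toy training data (illustrative only).
docs = [
    ["interest", "rate", "rise"],
    ["stock", "market", "fall"],
    ["team", "win", "rate"],
    ["player", "score", "rise"],
]
labels = ["finance", "finance", "sports", "sports"]

step1 = NaiveBayes(unigrams).fit(docs, labels)
step2 = NaiveBayes(bigrams).fit(docs, labels)

print(classify_two_step(["stock", "market", "fall"], step1, step2))
print(classify_two_step(["rate", "rise"], step1, step2))
```

    In this toy data, "rate" and "rise" each occur once in both classes, so the unigram scores tie and the text is passed to the second step, where the bigram "rate_rise" has been seen only in the finance class. Only texts that fail the margin test incur the cost of the second classifier.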


    Published In

    MICAI'06: Proceedings of the 5th Mexican International Conference on Artificial Intelligence
    November 2006
    1232 pages
    ISBN: 3540490264
    Editors: Alexander Gelbukh, Carlos Alberto Reyes-Garcia

    Publisher

    Springer-Verlag, Berlin, Heidelberg
