
Fast text categorization based on a novel class space model

Authors:

Yushu Liu

MICAI'06: Proceedings of the 5th Mexican international conference on Artificial Intelligence

Pages 1007 - 1016

https://doi.org/10.1007/11925231_96

Published: 13 November 2006

Abstract

Automatic categorization, in which documents are processed and automatically assigned to pre-defined categories, has been shown to be an accurate alternative to manual categorization. The accuracy of different categorization methods has been studied extensively, but their efficiency has seldom been addressed. Aiming to maintain effectiveness while improving efficiency, we propose a fast algorithm for text categorization and a compressed document vector representation method based on a novel class space model. Experiments show that our methods achieve better efficiency with tolerable effectiveness.


Published In


MICAI'06: Proceedings of the 5th Mexican international conference on Artificial Intelligence

November 2006

1232 pages

ISBN: 3540490264

Editors:
Alexander Gelbukh, Center for Computing Research, National Polytechnic Institute, Mexico City, México
Carlos Alberto Reyes-Garcia, Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro No. 1, Sta. Ma. Tonanzintla, Puebla, México

Publisher

Springer-Verlag

Berlin, Heidelberg
