Abstract
In text categorization, the distribution and quality of the sample set strongly influence the categorization result. Association rule categorization (ARC-BC) is effective under common circumstances, but its accuracy falls markedly when the feature words of the training samples are unevenly distributed. This paper proposes a Chinese text classification approach based on sample-weighted association rules (SW-ARC). The approach substantially improves classification efficiency by adaptively adjusting sample weights. Experimental results show that SW-ARC can counter the quality drop caused by the uneven distribution of feature words: in the open test, macro-average recall increases from 50% with ARC-BC to 70% with SW-ARC, and macro-average precision increases from 28% with ARC-BC to 70% with SW-ARC.
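For readers unfamiliar with the reported metrics, the following is a minimal sketch of how macro-average precision and recall are commonly computed. It assumes single-label classification and the standard definition (per-class scores averaged with equal weight per class). The function name `macro_precision_recall` and the toy labels are hypothetical and are not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
from collections import defaultdict


def macro_precision_recall(y_true, y_pred):
    """Return (macro_precision, macro_recall) for single-label predictions.

    Assumption: each document has exactly one true and one predicted class.
    Classes with no predictions (or no true samples) contribute 0.0.
    """
    tp = defaultdict(int)  # correctly predicted documents per class
    fp = defaultdict(int)  # documents wrongly assigned to a class
    fn = defaultdict(int)  # documents of a class that were missed

    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in classes:
        predicted_c = tp[c] + fp[c]
        actual_c = tp[c] + fn[c]
        precisions.append(tp[c] / predicted_c if predicted_c else 0.0)
        recalls.append(tp[c] / actual_c if actual_c else 0.0)

    # Macro-averaging gives every class equal weight, regardless of its size.
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n


# Toy labels, invented for illustration only.
y_true = ["sports", "sports", "finance", "finance", "finance", "health"]
y_pred = ["sports", "finance", "finance", "finance", "sports", "health"]
macro_p, macro_r = macro_precision_recall(y_true, y_pred)
```

Because every class counts equally, a classifier that performs poorly on small or hard classes is penalized more under macro-averaging than under micro-averaging, which makes the metric sensitive to the kind of uneven distribution the abstract describes.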
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, J., Chen, X., Chen, Y., Hu, Y. (2005). Association Classification Based on Sample Weighting. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_77
DOI: https://doi.org/10.1007/11540007_77
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28331-7
Online ISBN: 978-3-540-31828-6
eBook Packages: Computer Science, Computer Science (R0)