Abstract
This paper discusses and compares schemes for integrating Chinese word segmentation and part-of-speech tagging within the framework of true-integration and pseudo-integration. A true-integration approach, named ‘the divide-and-conquer integration’, is presented. Experiments on a manually word-segmented and part-of-speech tagged corpus of about 5.8 million words show that this true-integration approach achieves an F-measure of 98.61% in word segmentation, 95.18% in part-of-speech tagging, and 93.86% in combined word segmentation and part-of-speech tagging, outperforming all the other combinations to some extent. The experimental results demonstrate the potential for further improving the performance of Chinese word segmentation and part-of-speech tagging.
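The abstract reports separate F-measures for segmentation alone and for segmentation together with tagging. As a rough illustration of how such scores are commonly computed, the sketch below compares character-offset spans between a gold and a predicted analysis. This is a widely used convention and only an assumption about the evaluation here; the paper's exact scoring procedure is not given in the abstract. The function names (`spans`, `f_measure`), the tag labels, and the example sentence are hypothetical.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); tag labels are illustrative


def spans(tokens: List[Token], with_tags: bool) -> Set[tuple]:
    """Convert a token sequence into a set of character-offset spans.

    With with_tags=False each item is (start, end); with with_tags=True
    each item is (start, end, tag), so a word counts as correct only if
    both its boundaries and its tag match.
    """
    result = set()
    start = 0
    for word, tag in tokens:
        end = start + len(word)
        result.add((start, end, tag) if with_tags else (start, end))
        start = end
    return result


def f_measure(gold: List[Token], pred: List[Token], with_tags: bool = False) -> float:
    """Balanced F-measure (harmonic mean of precision and recall) over spans."""
    gold_spans = spans(gold, with_tags)
    pred_spans = spans(pred, with_tags)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the segmentation is right, but one tag is wrong.
gold = [("研究", "v"), ("生命", "n"), ("起源", "n")]
pred = [("研究", "n"), ("生命", "n"), ("起源", "n")]

seg_f = f_measure(gold, pred, with_tags=False)    # boundaries only
joint_f = f_measure(gold, pred, with_tags=True)   # boundaries and tags
```

Under this convention the combined score can never exceed the segmentation-only score, since a correctly tagged word must first be correctly segmented. That is consistent with the ordering of the figures reported above.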
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sun, M., Xu, D., Tsou, B.K., Lu, H. (2006). An Integrated Approach to Chinese Word Segmentation and Part-of-Speech Tagging. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_31
DOI: https://doi.org/10.1007/11940098_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science (R0)