Abstract
Most Chinese pre-trained models adopt characters as the basic units for downstream tasks. However, these models ignore the information carried by words and thus lose important semantics. In this paper, we propose a new method to exploit word structure and integrate lexical semantics into the character representations of pre-trained models. Specifically, we project a word's embedding onto its internal characters' embeddings according to similarity weights. To strengthen word boundary information, we mix the representations of the internal characters within a word. We then apply a word-to-character alignment attention mechanism that emphasizes important characters by masking unimportant ones. Moreover, to reduce the error propagation caused by word segmentation, we present an ensemble approach that combines the segmentation results of different tokenizers. Experimental results show that our approach outperforms the basic pre-trained models BERT, BERT-wwm and ERNIE on a range of Chinese NLP tasks: sentiment classification, sentence pair matching, natural language inference and machine reading comprehension. We further analyze each component of our model to verify its effectiveness.
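To make the abstract's description concrete, the following is a minimal sketch, in PyTorch, of the core idea: inject a word's embedding into its internal characters' embeddings with similarity-based weights, then mix the characters within the word to carry word-boundary information. This is not the authors' released implementation; the function name, the softmax similarity weighting, and the mean-pooling mixer are illustrative assumptions.

```python
# Hedged sketch of word-to-character semantic injection, NOT the paper's official code.
import torch
import torch.nn.functional as F

def inject_word_semantics(char_embs: torch.Tensor, word_emb: torch.Tensor) -> torch.Tensor:
    """char_embs: (num_chars_in_word, dim); word_emb: (dim,)."""
    # Similarity weight between the word and each of its internal characters.
    sim = F.softmax(char_embs @ word_emb, dim=0)          # (num_chars,)
    # Project the word embedding onto each character, scaled by its weight.
    enriched = char_embs + sim.unsqueeze(-1) * word_emb   # (num_chars, dim)
    # Mix the internal characters of the word (simple mean pooling here, an assumption)
    # so every character representation reflects the word boundary.
    mixed = enriched + enriched.mean(dim=0, keepdim=True)
    return mixed

if __name__ == "__main__":
    dim = 768
    chars = torch.randn(2, dim)   # e.g. the two characters of a two-character word
    word = torch.randn(dim)       # embedding of the whole word
    print(inject_word_semantics(chars, word).shape)  # torch.Size([2, 768])
```

In the paper this enrichment is followed by a word-to-character alignment attention that masks unimportant characters, and the whole pipeline is run over segmentations from multiple tokenizers whose outputs are ensembled.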
Acknowledgement
This work is supported by the National Hi-Tech R&D Program of China (2020AAA0106600), the National Natural Science Foundation of China (62076008) and the Key Project of the Natural Science Foundation of China (61936012).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, W., Sun, R., Wu, Y. (2022). Exploiting Word Semantics to Enrich Character Representations of Chinese Pre-trained Models. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds) Natural Language Processing and Chinese Computing. NLPCC 2022. Lecture Notes in Computer Science, vol 13551. Springer, Cham. https://doi.org/10.1007/978-3-031-17120-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17119-2
Online ISBN: 978-3-031-17120-8