
A Chinese named entity recognition model: integrating label knowledge and lexicon information

  • Original Article
  • Published:
International Journal of Machine Learning and Cybernetics

Abstract

Chinese named entity recognition (CNER) is an important task in information extraction. Depending on the text-processing unit, CNER approaches can generally be divided into character-granularity and word-granularity methods. Both are limited to particular application scenarios and are susceptible to ambiguity, segmentation errors, and out-of-vocabulary words. In addition, directly formalizing entity recognition as question answering does not take full advantage of the knowledge carried by the labels. Therefore, this paper proposes a CNER model that incorporates label knowledge and lexicon information (LkLi-CNER). The model first matches sentences against a lexicon at the character level and integrates the resulting lexical enhancement information directly into the BERT layer for full interaction. It then introduces prior knowledge by fusing the representation of the label description text into the enhanced text representation, so that the model can further learn semantic information from the entity labels themselves. Finally, for each token the model computes the probability of being the start and the end of each entity category, and the start-end pair with the highest probability is selected as the output. Experimental results show that LkLi-CNER significantly outperforms the baselines and achieves good results on four CNER datasets from different domains, demonstrating the effectiveness of the proposed model.
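To make the last step concrete, below is a minimal sketch of the per-category start/end span selection the abstract describes: for every token, score how likely it is to start or end an entity of each category, then pick the highest-scoring start-end pair per category. This is an illustration under assumptions, not the authors' implementation; the function name decode_spans, the [seq_len, num_labels] logit layout, the sigmoid activation, and the max_span_len limit are all hypothetical.

```python
import torch

def decode_spans(start_logits, end_logits, max_span_len=16):
    # Illustrative sketch of the span-selection step described in the abstract,
    # not the authors' code. start_logits and end_logits are assumed to be
    # [seq_len, num_labels] scores giving, for each token, how likely it is to
    # be the start / end of an entity of each category.
    start_prob = torch.sigmoid(start_logits)   # [seq_len, num_labels]
    end_prob = torch.sigmoid(end_logits)       # [seq_len, num_labels]
    seq_len, num_labels = start_prob.shape

    best_spans = {}
    for label in range(num_labels):
        best_score, best_span = float("-inf"), None
        for i in range(seq_len):                                  # candidate start
            for j in range(i, min(i + max_span_len, seq_len)):    # candidate end
                score = (start_prob[i, label] * end_prob[j, label]).item()
                if score > best_score:
                    best_score, best_span = score, (i, j)
        # keep the highest-probability start-end pair for this category
        best_spans[label] = (best_span, best_score)
    return best_spans

# Example with random scores for a 6-token sentence and 3 entity categories.
print(decode_spans(torch.randn(6, 3), torch.randn(6, 3)))
```

In a full system this decoding would typically be combined with a score threshold so that categories with no entity in the sentence return nothing; the abstract does not specify that detail.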

Data availability

The datasets used in the current study are available in the GitHub repository at https://github.com/jiesutd/LatticeLSTM and in the LDC catalog at https://catalog.ldc.upenn.edu/LDC2011T13. Additional data related to this study can be made available upon reasonable request.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 62276038), the Joint Fund of Chongqing Natural Science Foundation for Innovation and Development (No. CSTB2023NSCQ-LZX0164), the Chongqing Talent Program (No. CQYC20210202215), the Foundation for Innovative Research Groups of Natural Science Foundation of Chongqing (No. cstc2019jcyjcxttX0002), and the Key Cooperation Project of Chongqing Municipal Education Commission (HZ2021008).

Author information

Corresponding author

Correspondence to Qinghua Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yuan, Y., Zhang, Q., Zhou, X. et al. A Chinese named entity recognition model: integrating label knowledge and lexicon information. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02207-2

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s13042-024-02207-2

Keywords