
Named Entity Recognition in Persian Language based on Self-attention Mechanism with Weighted Relational Position Encoding

Published: 19 December 2023

Abstract

Named Entity Recognition (NER) is challenging for languages with few digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequently problematic training of an effective NER model. We propose a model customized to linguistic properties to compensate for this lack of resources in low-resource languages such as Persian. Motivated by the pronoun-dropping and subject-object-verb word order of Persian, we propose a new weighted relative positional encoding for the self-attention mechanism. Using the pointwise mutual information (PMI) factor, we inject co-occurrence information into the context representation. We trained and tested our model on three datasets, Arman, Peyma, and ParsTwiNER, and our method achieved word-level F1 scores of 94.16%, 93.36%, and 84.49%, respectively. The experiments showed that our proposed model outperforms other Persian NER models. An ablation study and a case study also showed that our method converges faster and is less prone to overfitting.
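The abstract names two ingredients: a weighted relative position term added to the self-attention scores, and PMI-based co-occurrence statistics. The sketch below is a minimal NumPy illustration of both ideas, not the paper's implementation: the per-offset weight vector `rel_weight`, the clipping distance, and all shapes are illustrative assumptions, since the abstract does not specify the weighting function.

```python
import numpy as np

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information from raw counts:
    PMI(x, y) = log( p(x, y) / (p(x) * p(y)) )."""
    return np.log((count_xy * total) / (count_x * count_y))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_relative_self_attention(X, Wq, Wk, Wv, rel_emb, rel_weight, clip=4):
    """Single-head self-attention where the score between positions i and j
    adds a learned embedding of the clipped offset j - i (in the style of
    Shaw et al., 2018), scaled by a per-offset weight.  The weight is a
    hypothetical stand-in for the paper's weighting scheme."""
    n = X.shape[0]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    # Relative offsets j - i, clipped to [-clip, clip] and shifted to
    # [0, 2*clip] so they can index the embedding table.
    idx = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None], -clip, clip) + clip
    A = rel_emb[idx] * rel_weight[idx][..., None]          # (n, n, d)
    scores = (Q @ K.T + np.einsum('id,ijd->ij', Q, A)) / np.sqrt(d)
    return softmax(scores) @ V                             # (n, d)

# Toy usage on a 6-token "sentence" with model dimension 8.
rng = np.random.default_rng(0)
n, d, clip = 6, 8, 4
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
rel_emb = rng.normal(size=(2 * clip + 1, d))   # one embedding per clipped offset
rel_weight = np.ones(2 * clip + 1)             # uniform weights for this sketch
out = weighted_relative_self_attention(X, Wq, Wk, Wv, rel_emb, rel_weight, clip)
```

With uniform weights this reduces to plain relative position encoding; the paper's contribution is choosing non-uniform weights suited to Persian word order, which this sketch does not attempt to reproduce.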


    Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 12
    December 2023
    194 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3638035

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 December 2023
    Online AM: 22 November 2023
    Accepted: 17 November 2023
    Revised: 13 October 2023
    Received: 20 January 2023
    Published in TALLIP Volume 22, Issue 12

Author Tags

Named entity recognition, Persian natural language processing, transformer, pointwise mutual information

    Qualifiers

    • Research-article
