
Named Entity Recognition in Persian Language based on Self-attention Mechanism with Weighted Relational Position Encoding

Published: 19 December 2023

Abstract

Named Entity Recognition (NER) is challenging for languages with few digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequently problematic training of an effective NER model. We propose a model customized to linguistic properties to compensate for this lack of resources in low-resource languages such as Persian. Motivated by the pronoun-dropping and subject-object-verb word order of Persian, we propose a new weighted relative positional encoding for the self-attention mechanism. Using the pointwise mutual information (PMI) factor, we inject co-occurrence information into the context representation. We trained and tested our model on three datasets, Arman, Peyma, and ParsTwiNER, and our method achieved word-level F1 scores of 94.16%, 93.36%, and 84.49%, respectively. The experiments showed that our proposed model outperforms other Persian NER models. An ablation study and a case study also showed that our method converges faster and is less prone to overfitting.
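The abstract names two ingredients: a weighted relative position term added to the self-attention scores, and PMI-based co-occurrence statistics. The sketch below is a minimal NumPy illustration of both ideas, not the paper's implementation: the per-offset weight vector `rel_weight`, the clipping distance, and all shapes are illustrative assumptions, since the abstract does not specify the weighting function.

```python
import numpy as np

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information from raw counts:
    PMI(x, y) = log( p(x, y) / (p(x) * p(y)) )."""
    return np.log((count_xy * total) / (count_x * count_y))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_relative_self_attention(X, Wq, Wk, Wv, rel_emb, rel_weight, clip=4):
    """Single-head self-attention where the score between positions i and j
    adds a learned embedding of the clipped offset j - i (in the style of
    Shaw et al., 2018), scaled by a per-offset weight.  The weight is a
    hypothetical stand-in for the paper's weighting scheme."""
    n = X.shape[0]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    # Relative offsets j - i, clipped to [-clip, clip] and shifted to
    # [0, 2*clip] so they can index the embedding table.
    idx = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None], -clip, clip) + clip
    A = rel_emb[idx] * rel_weight[idx][..., None]          # (n, n, d)
    scores = (Q @ K.T + np.einsum('id,ijd->ij', Q, A)) / np.sqrt(d)
    return softmax(scores) @ V                             # (n, d)

# Toy usage on a 6-token "sentence" with model dimension 8.
rng = np.random.default_rng(0)
n, d, clip = 6, 8, 4
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
rel_emb = rng.normal(size=(2 * clip + 1, d))   # one embedding per clipped offset
rel_weight = np.ones(2 * clip + 1)             # uniform weights for this sketch
out = weighted_relative_self_attention(X, Wq, Wk, Wv, rel_emb, rel_weight, clip)
```

With uniform weights this reduces to plain relative position encoding; the paper's contribution is choosing non-uniform weights suited to Persian word order, which this sketch does not attempt to reproduce.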


    Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 12
    December 2023
    194 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3638035

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 December 2023
    Online AM: 22 November 2023
    Accepted: 17 November 2023
    Revised: 13 October 2023
    Received: 20 January 2023
    Published in TALLIP Volume 22, Issue 12

Author Tags

Named entity recognition, Persian natural language processing, transformer, pointwise mutual information

    Qualifiers

    • Research-article
