Character-based Joint Word Segmentation and Part-of-Speech Tagging for Tibetan Based on Deep Learning

Published: 23 November 2022


Tibetan word segmentation and POS tagging are the primary tasks of Tibetan natural language processing. Most existing methods for Tibetan word segmentation and POS tagging are based on rules and statistics, which require manually constructed features. In addition, the joint mode has shown stronger capabilities for word segmentation and POS tagging and has received great interest. In this paper, we propose the Bi-LSTM+IDCNN+CRF structure, a simple yet effective end-to-end neural network model, for joint Tibetan word segmentation and POS tagging. We conduct step-by-step and joint experiments on Tibetan datasets. The results demonstrate that the Bi-LSTM+IDCNN+CRF model performs best in both the step-by-step and the joint mode. We obtain state-of-the-art performance in the joint tagging mode: the F1 score reaches 92.31% on the word segmentation task and 81.26% on the POS tagging task.
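To make the idea of character-based joint tagging concrete, the sketch below shows one common way to cast segmentation and POS tagging as a single sequence-labeling problem: each character receives a label that combines a word-boundary tag with the POS tag of its word. The B/M/E/S boundary scheme, the function names `encode_joint` and `decode_joint`, the placeholder characters `c1`..`c6`, and the POS tags `n` and `v` are all assumptions made for illustration; the abstract does not state which label scheme the paper uses.

```python
from typing import List, Tuple

# A word is a list of characters plus one POS tag (hypothetical representation).
Word = Tuple[List[str], str]


def encode_joint(words: List[Word]) -> Tuple[List[str], List[str]]:
    """Flatten words into characters and per-character joint labels.

    Each label is "<boundary>-<pos>", where boundary is one of
    B (begin), M (middle), E (end), S (single-character word).
    """
    chars: List[str] = []
    labels: List[str] = []
    for word_chars, pos in words:
        n = len(word_chars)
        for i, ch in enumerate(word_chars):
            if n == 1:
                boundary = "S"
            elif i == 0:
                boundary = "B"
            elif i == n - 1:
                boundary = "E"
            else:
                boundary = "M"
            chars.append(ch)
            labels.append(f"{boundary}-{pos}")
    return chars, labels


def decode_joint(chars: List[str], labels: List[str]) -> List[Word]:
    """Recover (characters, pos) words from per-character joint labels."""
    words: List[Word] = []
    current: List[str] = []
    for ch, label in zip(chars, labels):
        boundary, pos = label.split("-", 1)
        current.append(ch)
        # A word closes on E (end) or S (single-character word).
        if boundary in ("E", "S"):
            words.append((current, pos))
            current = []
    if current:
        # Unterminated trailing word: keep it, using the last label's POS.
        words.append((current, labels[-1].split("-", 1)[1]))
    return words


# Placeholder sentence: three words with hypothetical POS tags.
sentence: List[Word] = [(["c1", "c2"], "n"), (["c3"], "v"), (["c4", "c5", "c6"], "n")]
chars, labels = encode_joint(sentence)
restored = decode_joint(chars, labels)
```

Under this encoding, a sequence labeler predicts one joint label per character, and a single decoding pass yields both the word boundaries and the POS tags. In a step-by-step setup, the boundary tags would be predicted first and the POS tags assigned to the resulting words in a second pass.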


  • (2024) Tibetan-BERT-wwm: A Tibetan Pretrained Model With Whole Word Masking for Text Classification. IEEE Transactions on Computational Social Systems 11:5 (6268-6277). DOI: 10.1109/TCSS.2024.3374633. Online publication date: Oct-2024
  • (2022) A Novel Sentiment Analysis Model of Museum User Experience Evaluation Data Based on Unbalanced Data Analysis Technology. Computational Intelligence and Neuroscience 2022. DOI: 10.1155/2022/2096634. Online publication date: 1-Jan-2022
  • Improved Tibetan Word Vectors Models Based on Position Information Fusion. ACM Transactions on Asian and Low-Resource Language Information Processing. DOI: 10.1145/3681787

    Published: 23 November 2022
    Online AM: 31 August 2022
    Accepted: 12 January 2022
    Revised: 10 January 2022
    Received: 29 December 2020
    Published in TALLIP Volume 21, Issue 5


    Author Tags

    1. Tibetan
    2. word segmentation
    3. POS tagging
    4. deep learning


    Funding Sources

    • National Key R&D Program of China
    • Ministry of Education - China Mobile Research Foundation
    • Fundamental Research Funds for the Central Universities
    • National Natural Science Foundation of China
    • Major National Project of High Resolution Earth Observation System
    • State Grid Corporation of China Science and Technology Project
    • Program for New Century Excellent Talents in University
    • Strategic Priority Research Program of the Chinese Academy of Sciences
    • Google Research Awards and Google Faculty Award
    • Science and Technology Plan of Qinghai Province



