DOI: 10.1007/978-3-030-66187-8_2
Article

A Study on the Importance of Linguistic Suffixes in Maithili POS Tagger Development

Published: 19 December 2019

    Abstract

    This paper studies how morphological inflections affect the performance of a Maithili part-of-speech (POS) tagger. In the last few years, substantial effort has been devoted to developing morphological analyzers and POS taggers for several Indian languages, including Hindi, Bengali, Tamil, Telugu, Kannada, Punjabi and Marathi. We did not, however, find any open POS tagger or morphological analyzer for Maithili, even though it is one of the official languages of India, with around 50 million native speakers. We therefore worked on developing a Maithili POS tagger. For the development, we used a manually annotated in-house Maithili corpus containing 52,190 tokens and a tagset of 27 tags. We first trained a conditional random field (CRF) classifier with various combinations of word unigram, bigram, fixed-length suffix, and prefix features. The fixed-length suffixes did not yield the expected accuracy improvement, yet during manual corpus annotation we had observed that suffixes serve as a helpful clue. So, instead of fixed-length suffixes, we identified the morphological inflections of Maithili and used these morphological suffixes as features, which gave a noticeable performance improvement.
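
    The contrast between fixed-length and linguistic suffix features can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the feature names, the reading of "bigram" as the previous word joined to the current word, and the romanized suffix list are all hypothetical placeholders, and the abstract names neither the CRF toolkit nor the suffix inventory that was used.

```python
from typing import Dict, List, Sequence

# Hypothetical romanized placeholder suffixes, for illustration only.
# They are NOT the inflection inventory identified in the paper.
PLACEHOLDER_SUFFIXES = ("sabh", "ak", "me")


def fixed_length_suffixes(word: str, max_len: int = 3) -> Dict[str, str]:
    """Last 1..max_len characters, skipping lengths that would cover the whole word."""
    return {f"suf{n}": word[-n:] for n in range(1, max_len + 1) if len(word) > n}


def linguistic_suffix(word: str, suffixes: Sequence[str]) -> str:
    """Longest listed suffix that ends the word and leaves a non-empty stem, else 'NONE'."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) > len(suf):
            return suf
    return "NONE"


def token_features(
    tokens: List[str],
    i: int,
    suffixes: Sequence[str] = PLACEHOLDER_SUFFIXES,
    use_fixed: bool = False,
) -> Dict[str, str]:
    """Feature dictionary for token i: unigram, bigram, and suffix features."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else "<BOS>"
    feats = {
        "word": word,                       # word unigram
        "bigram": f"{prev_word}|{word}",    # assumed form of the bigram feature
        "ling_suffix": linguistic_suffix(word, suffixes),
    }
    if use_fixed:
        # Optionally add the fixed-length suffixes for comparison.
        feats.update(fixed_length_suffixes(word))
    return feats


if __name__ == "__main__":
    sentence = ["gharak", "lok", "sabh"]  # toy romanized tokens, not corpus data
    for idx in range(len(sentence)):
        print(token_features(sentence, idx, use_fixed=True))
```

    Per-token dictionaries of this shape are the input format that CRF toolkits such as sklearn-crfsuite accept, so switching `use_fixed` on and off, or swapping the suffix list, is one way to run the kind of feature comparison the abstract describes.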


              Published In

              Mining Intelligence and Knowledge Exploration: 7th International Conference, MIKE 2019, Goa, India, December 19–22, 2019, Proceedings
              Dec 2019
              356 pages
              ISBN: 978-3-030-66186-1
              DOI: 10.1007/978-3-030-66187-8

              Publisher

              Springer-Verlag, Berlin, Heidelberg

              Author Tags

              1. Maithili NLP
              2. Parts-of-speech
              3. POS tagger
              4. Morphological analyzer
