About

Shahid Mushtaq

Northern Borders University, Languages and Translation, Faculty Member

A morphological analyzer is a fundamental tool in Natural Language Processing (NLP) that generates the morphological analyses of a given word-form. It can be used to enhance the accuracy of POS-tagging, chunking, syntactic parsing, Word Sense Disambiguation (WSD), Information Retrieval (IR) and Machine Translation (MT) systems. This paper describes an ongoing effort to develop a Nepali morphological analyzer using an open-source platform, Apertium (LT-Toolbox). Since this is the initial stage of the project, we have confined our work to inflectional morphology. So far, we have covered all the possible categories as per the LDC-IL POS tag-set for Nepali. Currently, the coverage of the Nepali Morph-Analyzer is 20,000 words, classified into 219 paradigms.
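The idea of classifying words into paradigms can be illustrated with a minimal sketch. It is not the paper's implementation and does not use Apertium's dictionary format; the paradigm names, tag strings, and English stand-in stems below are all hypothetical placeholders, chosen only to show how a stem plus a paradigm yields the analyses of a word-form.

```python
from typing import Dict, List, Tuple

# Hypothetical toy paradigms: each maps an inflectional suffix to the tags it contributes.
PARADIGMS: Dict[str, Dict[str, str]] = {
    "noun__regular": {"": "n.sg", "s": "n.pl"},
    "verb__regular": {"": "vblex.inf", "s": "vblex.pres.p3.sg", "ed": "vblex.past"},
}

# Hypothetical toy lexicon: each stem lists the paradigms it belongs to.
LEXICON: Dict[str, List[str]] = {
    "book": ["noun__regular", "verb__regular"],
    "walk": ["verb__regular"],
}


def analyze(word_form: str) -> List[Tuple[str, str]]:
    """Return every (stem, tags) analysis licensed by the lexicon and paradigms."""
    analyses: List[Tuple[str, str]] = []
    for stem, paradigm_names in LEXICON.items():
        if not word_form.startswith(stem):
            continue
        suffix = word_form[len(stem):]
        for name in paradigm_names:
            tags = PARADIGMS[name].get(suffix)
            if tags is not None:
                analyses.append((stem, tags))
    return analyses


if __name__ == "__main__":
    print(analyze("books"))   # ambiguous between a noun and a verb reading
    print(analyze("walked"))
```

Because many stems share one paradigm, adding a new word under this scheme only requires a lexicon entry, which is presumably why coverage can be reported as a word count alongside a much smaller paradigm count.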

Publication Date: 2012

Publication Name: Journal of Modern Languages

Research Interests:
Modern Languages, Computer Science, Artificial Intelligence, Languages and Linguistics, Natural Language Processing, Machine Translation, Nepali, Parsing, Google, Google Search Engine, and Morphological Theory and Morphological Parsing

Research Interests:
Computer Science, Artificial Intelligence, Natural Language Processing, Parsing, Annotation, Dependency Parsing, Kashmiri Language, Syntax Semantics Interface, Treebanking, Dependency Grammar, and Kashmiri

The present paper explores passives in Kashmiri, a Northwestern Dardic language of the Indo-Aryan family. Though Kashmiri has some special features, such as the V-2 phenomenon and pronominal clitics, it has an analytic passive construction like its Indo-Aryan counterparts. The internal argument surfaces as the subject of the passive, where the participial/infinitival verbal form -nI is added to the verb root, followed by the periphrastic auxiliary yun ‘to come’ in perfective form. The agent of the action appears with athi or zaryi (‘by’/‘through’) and is preferably omitted. This optionality casts doubt on its status: is it an adjunct or an argument? The promotion of the internal argument to subject position is another key issue. The present paper investigates these issues and claims that the Kashmiri passive construction is also a kind of ACTIVE-Passive and not really a passive as in English. It is argued that in Kashmiri passives the underlying subject remains an active subject and the underlying object does not become the surface subject. To support this claim, tests based on anaphora binding, pronominal co-reference, control, etc. are applied.

Publisher: John Benjamins Publishing Company

Publication Date: 2014

Publication Name: Linguistik Aktuell/Linguistics Today

Research Interests:
Engineering, Linguistics, Verb, Passive Constructions, and Kashmiri

A treebank is a basic language resource for training and testing a syntactic parser, which forms a key module in various NLP systems such as machine translation systems. This paper reports ongoing research on building a dependency treebank for Kashmiri (KashTreeBank) and discusses some of the main annotation issues. The paper is based on the pilot annotation of 500 sentences.

Publication Date: 2012

Research Interests:
Computer Science

Page Numbers: 280

Publication Date: 2018

Publication Name: LAMBERT Academic Publishing

Publication Name: Proceedings of Workshop on Machine Translation and Parsing of Indian Languages (MTPIL), COLING 2012

Research Interests:
Syntax & Dependency Parsing

POS-tagging is the process of labeling words in a running corpus with their grammatical categories and, optionally, with their associated grammatical features. It is essentially a classification problem, but for languages with split-orthography it is also a mapping problem, which involves mapping arrays of tokens (words, chunks or sentences) onto arrays of tags in proper agreement with the syntactic structure of the language. While POS-tagging is an established technology for European languages and even for some Asian languages like Arabic and Chinese, it is an emerging field in Indian languages, where little work has been done so far, particularly in those languages which use the Perso-Arabic script (e.g. Urdu, Kashmiri, Shina, Balti, and Purki). It has been argued that, due to their split-orthography, such languages pose a real challenge to already complex NLP tasks like tokenization, POS-tagging and chunking. The problem of script needs to be addressed tactfully so that such languages do not lag behind in the progress of Indian language technology. Since Kashmiri is one such language with severe split-orthography, this paper is an initiative to put the problem in the right perspective and to develop a versatile, fine-grained, hierarchical tag-set for Kashmiri that can handle script-related issues as well as other linguistic issues. It also aims to ensure that POS-tagging facilitates parsing as much as possible. The tag-set will be strictly morpho-syntactic in nature, as per the guidelines of the Expert Advisory Group for Language Engineering Standards (henceforth EAGLES) for morpho-syntactic annotation (Leech and Wilson, 1999). Therefore, morpho-syntactic availability of the grammatical features will be the governing principle for the present tag-set. Capturing semantically or lexically available grammatical features is outside the scope of the present tag-set and will be handled in future work.
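The mapping problem described above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the tag-set or method proposed in the paper: the multi-token list, the tag dictionary, and the English stand-in tokens are hypothetical placeholders. It shows why, under split-orthography, the number of whitespace tokens need not equal the number of tags.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical list of words that the orthography writes as two separate pieces.
MULTI_TOKEN_WORDS: Set[Tuple[str, str]] = {("note", "book"), ("can", "not")}

# Hypothetical tag dictionary standing in for a trained tagger.
TAGS: Dict[str, str] = {
    "the": "DET", "student": "N", "notebook": "N", "cannot": "AUX", "read": "V",
}


def merge_split_tokens(tokens: List[str]) -> List[str]:
    """Join adjacent orthographic pieces that together form one word."""
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in MULTI_TOKEN_WORDS:
            merged.append("".join(pair))
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Map orthographic tokens onto (word, tag) pairs; unknown words get 'UNK'."""
    return [(word, TAGS.get(word, "UNK")) for word in merge_split_tokens(tokens)]


if __name__ == "__main__":
    raw = "the student can not read the note book".split()
    tagged = tag(raw)
    print(len(raw), "orthographic tokens ->", len(tagged), "tags")
    print(tagged)
```

A lookup list is the simplest possible choice here; a real system for a script with severe split-orthography would need a far more robust way to decide which pieces belong together.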

Research Interests:
POS tagging

More Info: Co-author

Publication Name: Edited Book, The Lexicon-Syntax Interface: Views from South Asian Languages, John Benjamins, Amsterdam (to appear)

Research Interests:
Minimalist Syntax and Lexicon-Syntax Interface

Publication Name: Proceedings of Workshop on Indian Language & Data: Resources & Evaluation (WILDRE), LREC 2012 (Istanbul, Turkey)

Research Interests:
Training POS Tagger

Publication Date: Oct 10, 2012

Publication Name: Journal of Modern Languages (JML)

Research Interests:
Morphological Theory and Morphological Parsing

Publication Name: Interdisciplinary Journal of Linguistics Vol. 4. 2011. Department of Linguistics, University of Kashmir: Srinagar

Research Interests:
Form Function Duality and the Effect on Automatic Annotation

Books

While working on a corpus linguistics project, the author realized the importance of language resources, mainly syntactically annotated text corpora, but found a severe lack of such resources and of related research in his native language, Kashmiri. He set out to address this problem in his PhD research: starting from scratch, he developed a small-scale dependency treebank for Kashmiri (KashTreebank). This book is based on his PhD dissertation. It provides the necessary information about the theoretical and practical issues of developing a treebank, particularly in a resource-poor scenario. Since treebanks are currently in high demand, not only for training and testing syntactic parsers but also for promoting empirical research in linguistics, this book can serve as a basic source of information for developing large-scale treebanks. It will be especially helpful for resource creation projects, aspiring computational linguists and language engineers, particularly those interested in syntactic parsing, corpus linguistics and experimental syntax.

Publication Date: 2018

Publication Name: LAMBERT Academic Publishing

Research Interests:
Natural Language Processing, Computational Linguistics, Data Driven Learning (Applied Linguistics), Applied Linguistics, Corpus Linguistics, Syntactic Parsing, Learner Corpus, and Treebanking