The paper presents a software integration, testing and visualization tool, called Dashboard, which is based on a pipelined blackboard architecture for a family of natural language processing (NLP) applications. The Dashboard helps in testing a module in isolation, facilitating the training and tuning of a module, integrating and testing a set of heterogeneous modules, and building and testing a complete
A transfer-based Machine Translation (MT) system is a large, complex functional application. When such MT systems are deployed under increasing translation load, the Quality of Service (QoS) degrades: job completion time increases, system throughput decreases, and system performance does not scale with an increase in provisioned resources. To improve the QoS of the MT system, the MapReduce framework for distributed processing was explored. Because the MT application has a very large code size (on the order of 100 MB), transferring it across the data nodes of the cluster would be antithetical to the basic goal of throughput enhancement. To utilize the parallelism provided by Hadoop, a distinct approach was adopted for this very large, complex MT application, one that overcomes this difficulty with no time penalty. This paper presents an engineering approach to delude the MapReduce framework for parallelization of machine translation tasks on a large cluster of machines to assure QoS of MT ...
In this paper, we present our approach to dependency parsing of Hindi as part of the Hindi Shared Task on Parsing, COLING 2012. We study the effect of the different settings available in MaltParser under a two-step parsing strategy, i.e. splitting the data into interChunks and intraChunks, to obtain the best possible labeled attachment score (LAS), unlabeled attachment score (UAS) and label accuracy (LA). Our system achieved the best LAS of 90.99% on the Gold Standard track and the second-best LAS of 83.91% on the Automated data track.
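The three scores named above can be illustrated with a short sketch. It uses the standard definitions: UAS counts tokens whose predicted head is correct, LA counts tokens whose predicted label is correct, and LAS counts tokens where both are correct. The function name, the token representation and the example labels are assumptions made for illustration; this is not the shared task's official evaluation script.

```python
from typing import Dict, List, Tuple

# A token is (head_index, dependency_label); head index 0 stands for the root.
# This representation is an assumption made for the example.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Dict[str, float]:
    """Return LAS, UAS and LA as percentages over aligned token lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have equal length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    n = len(gold)
    # UAS: head matches, label ignored.
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LA: label matches, head ignored.
    la = sum(g[1] == p[1] for g, p in zip(gold, pred))
    # LAS: both head and label match.
    las = sum(g == p for g, p in zip(gold, pred))
    return {"LAS": 100 * las / n, "UAS": 100 * uas / n, "LA": 100 * la / n}


# Illustrative four-token sentence; the labels are placeholders.
gold = [(2, "k1"), (0, "main"), (2, "k2"), (3, "psp")]
pred = [(2, "k1"), (0, "main"), (2, "k7"), (2, "psp")]
scores = attachment_scores(gold, pred)
print(scores)
```

In this toy sentence the third token has the right head but the wrong label, and the fourth has the right label but the wrong head, so LAS is lower than both UAS and LA.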
Typing has traditionally been the only input method used by human translators working with computer-assisted translation (CAT) tools. However, speech is a natural communication channel for humans and, in principle, it should be faster and easier than typing from a keyboard. This contribution investigates the integration of automatic speech recognition (ASR) in a CAT workbench, testing its real use by human translators while post-editing machine translation (MT) outputs. This paper also explores the use of MT combined with ASR in order to improve recognition accuracy in a workbench integrating eye-tracking functionalities to collect process-oriented information about translators' performance.
This paper describes our submission for the FIRE 2014 Shared Task on Transliterated Search. The shared task features two subtasks: Query Word Labeling and Mixed-script Ad hoc Retrieval for Hindi Song Lyrics. Query Word Labeling covers token-level language identification of query words in code-mixed queries and the transliteration of the identified Indian-language words into their native scripts. We have developed an SVM classifier for the token-level language identification of query words and a decision tree classifier for transliteration. The second subtask, Mixed-script Ad hoc Retrieval for Hindi Song Lyrics, is to retrieve a ranked list of songs from a corpus of Hindi song lyrics given an input query in Devanagari or transliterated Roman script. We have used edit-distance-based query expansion and language-modeling-based pruning, followed by relevance-based re-ranking, to retrieve relevant Hindi song lyrics for a given query. We see that even though our approaches are not very sophisticated, they perform reasonably well. Our results show that these approaches may perform much better if more sophisticated features or ranking is used. Both of our systems are available for download and can be used for research purposes.
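As a rough illustration of edit-distance-based query expansion, the sketch below expands a Roman-script query word to every vocabulary entry within a fixed Levenshtein distance. The function names, the threshold of 1 and the toy vocabulary are assumptions made for this example; the abstract does not specify the distance variant or threshold used in the submitted system, and the pruning and re-ranking stages are not shown.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def expand_query_word(word: str, vocabulary: Iterable[str],
                      max_distance: int = 1) -> List[str]:
    """Return vocabulary entries within max_distance edits of word, sorted."""
    return sorted(v for v in vocabulary
                  if edit_distance(word, v) <= max_distance)


# Toy vocabulary of Roman-script spellings (hypothetical data).
vocabulary = ["zindagi", "zindgi", "jindagi", "dil", "pyar"]
variants = expand_query_word("zindagi", vocabulary)
print(variants)
```

A small threshold keeps the expansion focused on spelling variants of the same word; a larger one would pull in unrelated words, which is presumably why the abstract pairs the expansion with a pruning step.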
Word embeddings have been shown to be useful in a wide range of NLP tasks. We explore methods of using embeddings in dependency parsing of Hindi, a MoR-FWO (morphologically rich, relatively freer word order) language, and show that they not only help improve the quality of parsing, but can even act as a cheap alternative to the traditional features, which are costly to acquire. We demonstrate that if we use distributed representations of lexical items instead of features produced by costly tools such as a morphological analyzer, we get competitive results. This implies that a monolingual corpus alone will suffice to produce good accuracy for resource-poor languages for which these tools are unavailable. We also explored the importance of these representations for domain adaptation.
Automated prediction of valence, one key feature of a person's emotional state, from individuals' personal narratives may provide crucial information for mental healthcare (e.g. early diagnosis of mental diseases, supervision of disease course, etc.). In the Interspeech 2018 ComParE Self-Assessed Affect challenge, the task of valence prediction was framed as a three-class classification problem using 8-second fragments from individuals' narratives. As such, the task did not allow for exploring contextual information of the narratives. In this work, we investigate the intrinsic information from multiple narratives recounted by the same individual in order to predict their current state of mind. Furthermore, with generalizability in mind, we decided to focus our experiments exclusively on textual information, as the public availability of audio narratives is limited compared to text. Our hypothesis is that context modeling might provide insights about emotion-triggering concepts (e.g. events, people, places) mentioned in the narratives that are linked to an individual's state of mind. We explore multiple machine learning techniques to model narratives. We find that the models are able to capture inter-individual differences, leading to more accurate predictions of an individual's emotional state, as compared to single narratives.
Personal Narratives (PN) - recollections of facts, events, and thoughts from one's own experience - are often used in everyday conversations. So far, PNs have mainly been explored for tasks such as valence prediction or emotion classification (e.g. happy, sad). However, these tasks might overlook more fine-grained information that could nevertheless prove relevant for understanding PNs. In this work, we propose a novel task for Narrative Understanding: Emotion Carrier Recognition (ECR). We argue that automatic recognition of emotion carriers, the text fragments that carry the emotions of the narrator (e.g. 'loss of a grandpa', 'high school reunion'), from PNs provides the deeper level of emotion analysis needed, for instance, in the mental healthcare domain. We explore the task of ECR using a corpus of PNs manually annotated with emotion carriers and investigate different baseline models for the task. Furthermore, we propose several evaluation strate...
In a CAT (Computer Assisted Translation) system, a human translator translates a source language string into a target language string using different input methods such as speech and typing. In this paper, we improve the performance of speech recognition for a translator speaking in the target language by taking advantage of the source language string and information from WordNet. We use machine translation (MT) to translate the source language string into the target language, and use this information, together with the semantic information obtained from WordNet for the words in the translated string, to bias the speech recogniser towards the gained knowledge. We perform different experiments, including varying the number of MT hypotheses and using different techniques for incorporating the semantic information. Overall, we outperformed the baseline system, which has no semantic information, with an increase in word accuracy of 1.6% for the Hindi ASR in the English-Hindi system.