In this paper, we present our approach to dependency parsing of Hindi as part of the Hindi Shared Task on Parsing, COLING 2012. We study the effect of the different settings available in MaltParser under a two-step parsing strategy, i.e. splitting the data into interChunks and intraChunks, to obtain the best possible LAS (labeled attachment score), UAS (unlabeled attachment score) and LA (label accuracy). Our system achieved the best LAS of 90.99% on the Gold Standard track and the second-best LAS of 83.91% on automated data.
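The three scores named in the abstract can be illustrated with a minimal sketch. It assumes each token is represented as a (head index, dependency label) pair and that the gold and predicted sequences are aligned token by token; the function name `attachment_scores` and the toy labels are hypothetical and are not taken from the paper or from the shared task's evaluation script.

```python
from typing import Dict, List, Tuple

# Assumed representation: one (head index, dependency label) pair per token.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Dict[str, float]:
    """Return LAS, UAS and LA as fractions of tokens (illustrative sketch)."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    # UAS: the predicted head matches the gold head.
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    # LA: the predicted label matches the gold label.
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    # LAS: both head and label match.
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return {"LAS": las, "UAS": uas, "LA": la}


# Toy four-token sentence; the labels are placeholders for illustration only.
gold = [(2, "k1"), (0, "main"), (2, "k2"), (2, "rsym")]
pred = [(2, "k1"), (0, "main"), (2, "k1"), (3, "rsym")]
print(attachment_scores(gold, pred))
```

In this toy case one token has a wrong label and another a wrong head, so LAS is lower than both UAS and LA, which is why the three figures are usually reported together.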
In a conventional CAT (Computer Assisted Translation) system, a human translator post-edits automatically generated target-language text using the keyboard. In this paper we extend a CAT system with speech input, through which the translator speaks the translation, a process referred to as sight translation. We report several experiments to improve the performance of an automatic speech recognition system, taking advantage of machine translation output and information from WordNet. Overall, we outperform a baseline system that has no semantic information by 1.6% in word accuracy for English-to-Hindi translation.
Typing has traditionally been the only input method used by human translators working with computer-assisted translation (CAT) tools. However, speech is a natural communication channel for humans and, in principle, it should be faster and easier than typing on a keyboard. This contribution investigates the integration of automatic speech recognition (ASR) in a CAT workbench, testing its real use by human translators while they post-edit machine translation (MT) output. This paper also explores the use of MT combined with ASR to improve recognition accuracy in a workbench that integrates eye-tracking functionality to collect process-oriented information about translators' performance.
In a CAT (Computer Assisted Translation) system, a human translator translates a source-language string into a target-language string using different input methods such as speech and typing. In this paper, we improve the performance of speech recognition for a translator speaking in the target language by taking advantage of the source-language string and information from WordNet. We use machine translation (MT) to translate the source-language string into the target language, and we use this translation, together with the semantic information that WordNet provides for the words in the translated string, to bias the speech recogniser towards the gained knowledge. We perform different experiments, including varying the number of MT hypotheses and trying different techniques for incorporating the semantic information. Overall, we outperform the baseline system, which has no semantic information, by 1.6% in word accuracy for the Hindi ASR (automatic speech recognition) in the English-Hindi system.
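One simple way to picture "biasing the recogniser" is n-best rescoring with a lexical bonus. The sketch below is an illustrative assumption and not the method reported in the paper: the function `rescore_nbest`, the `bonus` weight, and the plain dictionary standing in for WordNet lookups are all hypothetical, and the toy strings are English placeholders.

```python
from typing import Dict, List, Set, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    mt_hypotheses: List[str],
    synonyms: Dict[str, Set[str]],
    bonus: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank ASR hypotheses, rewarding words supported by MT output.

    nbest: (hypothesis text, recogniser score) pairs; higher scores are better.
    synonyms: hypothetical stand-in for a WordNet synonym lookup.
    """
    # Collect every word from the MT hypotheses, plus its listed synonyms.
    favoured: Set[str] = set()
    for hyp in mt_hypotheses:
        for word in hyp.split():
            favoured.add(word)
            favoured.update(synonyms.get(word, set()))
    # Add a fixed bonus for each hypothesis word found in the favoured set.
    rescored = [
        (text, score + bonus * sum(w in favoured for w in text.split()))
        for text, score in nbest
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy example: the synonym entry lets "house" earn the same bonus as "home".
nbest = [("a red house", -1.0), ("a red home", -1.2)]
ranked = rescore_nbest(nbest, ["a red home"], {"home": {"house"}})
print(ranked[0][0])
```

Increasing the number of MT hypotheses enlarges the favoured set, which is one way the experimental variable mentioned in the abstract could enter such a scheme.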
Word embeddings have been shown to be useful in a wide range of NLP tasks. We explore methods of using embeddings in dependency parsing of Hindi, a MoR-FWO (morphologically rich, relatively freer word order) language, and show that they not only help improve the quality of parsing but can even act as a cheap alternative to traditional features that are costly to acquire. We demonstrate that if we use distributed representations of lexical items instead of features produced by costly tools such as a morphological analyzer, we get competitive results. This implies that a monolingual corpus alone will suffice to produce good accuracy for resource-poor languages for which these tools are unavailable. We also explore the importance of these representations for domain adaptation.