Word embeddings have been shown to be useful in a wide range of NLP tasks. We explore methods of using embeddings in dependency parsing of Hindi, a MoR-FWO (morphologically rich, relatively freer word order) language, and show that they not only help improve the quality of parsing but can even act as a cheap alternative to traditional features that are costly to acquire. We demonstrate that using distributed representations of lexical items instead of features produced by costly tools such as a morphological analyzer gives competitive results. This implies that a monolingual corpus alone will suffice to produce good accuracy for resource-poor languages for which these tools are unavailable. We also explore the importance of these representations for domain adaptation.
In a CAT (Computer Assisted Translation) system, a human translator translates a source language string into a target language string using different input methods such as speech and typing. In this paper, we improve the performance of speech recognition for a translator speaking in the target language, taking advantage of the source language string and information from WordNet. We use machine translation (MT) to translate the source language string into the target language, and use this translation, together with the semantic information WordNet provides for the words in the translated string, to bias the speech recogniser towards the gained knowledge. We perform different experiments, including varying the number of MT hypotheses and trying different techniques for incorporating the semantic information. Overall, we outperform the baseline system, which has no semantic information, by an increase in word accuracy of 1.6% for the Hindi ASR (automatic speech recognition) in the English-Hindi system.
Typing has traditionally been the only input method used by human translators working with computer-assisted translation (CAT) tools. However, speech is a natural communication channel for humans and, in principle, it should be faster and easier than typing on a keyboard. This contribution investigates the integration of automatic speech recognition (ASR) in a CAT workbench, testing its real use by human translators while post-editing machine translation (MT) outputs. This paper also explores the use of MT combined with ASR to improve recognition accuracy in a workbench that integrates eye-tracking functionalities to collect process-oriented information about translators' performance.
Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages (MTPIL-2012), pages 163-170, COLING 2012, Mumbai, December 2012.
In this paper, we present our approach to dependency parsing of Hindi as part of the Hindi Shared Task on Parsing, COLING 2012. Our approach examines the effect of the different settings available in MaltParser, following a two-step parsing strategy, i.e. splitting the data into interChunks and intraChunks, to obtain the best possible LAS (labeled attachment score), UAS (unlabeled attachment score) and LA (label accuracy). Our system achieved the best LAS of 90.99% on the Gold Standard track and the second-best LAS of 83.91% on the automated data.
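The three scores reported above follow their standard definitions, which can be sketched as below. This is a minimal illustration, not the shared task's evaluation script: the function name `attachment_scores`, the `(head, label)` token representation, and the toy labels are all assumptions made for the example.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# This representation is an assumption made for the sketch.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float, float]:
    """Return (LAS, UAS, LA) for one parsed sequence of tokens.

    UAS: fraction of tokens whose predicted head is correct.
    LA:  fraction of tokens whose predicted label is correct.
    LAS: fraction of tokens whose head and label are both correct.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return las, uas, la


# Toy four-token sentence with hypothetical labels.
gold = [(2, "subj"), (0, "root"), (2, "obj"), (3, "case")]
pred = [(2, "subj"), (0, "root"), (2, "mod"), (2, "case")]

las, uas, la = attachment_scores(gold, pred)
print(f"LAS={las:.2f} UAS={uas:.2f} LA={la:.2f}")
```

In the toy sentence the third token has the right head but the wrong label, and the fourth has the right label but the wrong head, so LAS is lower than both UAS and LA.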
In a conventional CAT (Computer Assisted Translation) system, a human translator post-edits an automatically generated target language text using the keyboard. In this paper, we extend a CAT system with speech input, by which the translator speaks the translation, a process referred to as sight translation. We report several experiments to improve the performance of an automatic speech recognition system, taking advantage of machine translation output and information from WordNet. Overall, we outperform a baseline system that has no semantic information, with a 1.6% increase in word accuracy for English to Hindi translation.
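One simple way to picture how machine translation output and WordNet information could bias a recogniser is to rescore its n-best hypotheses. The sketch below is only an illustration under stated assumptions: the abstract does not describe the actual biasing mechanism, and the function `rescore_nbest`, the bonus weights, and the plain synonym dictionary (standing in for WordNet lookups) are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    mt_words: Set[str],
    synonyms: Dict[str, Set[str]],
    mt_bonus: float = 1.0,   # hypothetical weight for words found in the MT output
    syn_bonus: float = 0.5,  # hypothetical weight for synonyms of MT words
) -> List[Tuple[str, float]]:
    """Re-rank ASR hypotheses, rewarding words supported by the MT output.

    nbest holds (hypothesis text, recogniser score) pairs, higher is better.
    synonyms maps an MT word to related words; here it stands in for WordNet.
    """
    # Collect words related to the MT output that are not MT words themselves.
    related: Set[str] = set()
    for word in mt_words:
        related |= synonyms.get(word, set())
    related -= mt_words

    rescored = []
    for text, score in nbest:
        bonus = 0.0
        for word in text.split():
            if word in mt_words:
                bonus += mt_bonus
            elif word in related:
                bonus += syn_bonus
        rescored.append((text, score + bonus))
    # Best hypothesis first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy English example: the recogniser slightly prefers the wrong hypothesis.
nbest = [("the cat sat", -4.0), ("the mat sat", -3.5)]
mt_words = {"cat", "sat"}
synonyms = {"cat": {"feline"}}

ranked = rescore_nbest(nbest, mt_words, synonyms)
print(ranked[0][0])
```

Here the hypothesis that agrees with more MT words overtakes the one the recogniser originally ranked first. A real system would more likely adapt the language model or decoding lattice, which this sketch does not attempt.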
Papers by Karan Singla