SAHAAYAK 2023 -- the Multi Domain Bilingual Parallel Corpus of Sanskrit to Hindi for Machine Translation
Authors:
Vishvajitsinh Bakrola,
Jitendra Nasariwala
Abstract:
This data article presents SAHAAYAK 2023, a large bilingual parallel corpus for the low-resource language pair Sanskrit-Hindi. The corpus contains a total of 1.5M sentence pairs between Sanskrit and Hindi. To make the corpus broadly usable and balanced, it incorporates data from multiple domains, including News, Daily conversations, Politics, History, Sport, and Ancient Indian Literature. A multifaceted approach was adopted to build a sizable multi-domain corpus for a low-resource language like Sanskrit. Our development approach spans from creating a small hand-crafted dataset to applying a wide range of mining, cleaning, and verification steps. We used a three-fold mining process: mining from machine-readable sources, mining from non-machine-readable sources, and collation from existing corpora. After mining, a dedicated pipeline for normalization, alignment, and corpus cleaning was developed and applied to make the corpus ready for use with machine translation algorithms.
Submitted 27 June, 2023;
originally announced July 2023.
Neural Machine Translation System of Indic Languages -- An Attention based Approach
Authors:
Parth Shah,
Vishvajit Bakrola
Abstract:
Neural machine translation (NMT) is a recent and effective technique that has led to remarkable improvements over conventional machine translation techniques. The proposed neural machine translation model, developed for the Gujarati language, uses an encoder-decoder architecture with an attention mechanism. In India, almost all languages originate from their ancestral language, Sanskrit. They share inevitable similarities, including lexical and named-entity similarity. Translating into Indic languages is always a challenging task. In this paper, we present an NMT system that can efficiently translate Indic languages like Hindi and Gujarati, which together cover more than 58.49 percent of the total speakers in the country. We evaluate the performance of our NMT model using automatic evaluation metrics such as BLEU, perplexity, and TER. A comparison of our network with Google Translate is also presented, where our model outperformed it by a margin of 6 BLEU points on English-Gujarati translation.
Submitted 2 February, 2020;
originally announced February 2020.
Optimal Approach for Image Recognition using Deep Convolutional Architecture
Authors:
Parth Shah,
Vishvajit Bakrola,
Supriya Pati
Abstract:
In recent times, deep learning has achieved huge popularity due to its performance across various machine learning algorithms. Deep learning, as hierarchical or structured learning, attempts to model high-level abstractions in data by using a group of processing layers. The foundation of deep learning architectures is inspired by the understanding of information processing and neural responses in the human brain. The architectures are created by stacking multiple linear or non-linear operations. The article mainly focuses on state-of-the-art deep learning models and training methods specific to various real-world applications. Selecting an optimal architecture for a specific problem is a challenging task; at the closing stage of the article, we propose an optimal approach to deep convolutional architecture for the application of image recognition.
Submitted 25 April, 2019;
originally announced April 2019.