
Showing 1–4 of 4 results for author: Lodagala, V S

  1. arXiv:2308.01018  [pdf, other]

    cs.CL cs.SD eess.AS

    SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis

    Authors: Ramanan Sivaguru, Vasista Sai Lodagala, S Umesh

    Abstract: While FastSpeech2 aims to integrate aspects of speech such as pitch, energy, and duration as conditional inputs, it still leaves scope for richer representations. As a part of this work, we leverage representations from various Self-Supervised Learning (SSL) models to enhance the quality of the synthesized speech. In particular, we pass the FastSpeech2 encoder's length-regulated outputs through a…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: Accepted for publication at Interspeech 2023

  2. arXiv:2211.01338  [pdf, other]

    eess.AS cs.CL cs.MM cs.SD eess.IV

    Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

    Authors: Anusha Prakash, Arun Kumar, Ashish Seth, Bhagyashree Mukherjee, Ishika Gupta, Jom Kuriakose, Jordan Fernandes, K V Vikram, Mano Ranjith Kumar M, Metilda Sagaya Mary, Mohammad Wajahat, Mohana N, Mudit Batra, Navina K, Nihal John George, Nithya Ravi, Pruthwik Mishra, Sudhanshu Srivastava, Vasista Sai Lodagala, Vandan Mujadia, Kada Sai Venkata Vineeth, Vrunda Sukhadia, Dipti Sharma, Hema Murthy, Pushpak Bhattacharya, et al. (2 additional authors not shown)

    Abstract: Cross-lingual dubbing of lecture videos requires the transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of text using target language rhythm, text-to-speech synthesis followed by isochronous lipsyncing to the original video. This task becomes challenging when the source and target languages…

    Submitted 1 November, 2022; originally announced November 2022.

  3. arXiv:2211.01246  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup

    Authors: Vasista Sai Lodagala, Sreyan Ghosh, S. Umesh

    Abstract: In this paper, we propose a new Self-Supervised Learning (SSL) algorithm called data2vec-aqc, for speech representation learning from unlabeled speech data. Our goal is to improve SSL for speech in domains where both unlabeled and labeled data are limited. Building on the recently introduced data2vec, we introduce additional modules to the data2vec framework that leverage the benefit of data augme…

    Submitted 13 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP 2023

  4. arXiv:2210.02592  [pdf, other]

    cs.CL

    CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations

    Authors: Vasista Sai Lodagala, Sreyan Ghosh, S. Umesh

    Abstract: While Self-Supervised Learning has helped reap the benefit of the scale from the available unlabeled data, the learning paradigms are continuously being bettered. We present a new pre-training strategy named ccc-wav2vec 2.0, which uses clustering and an augmentation-based cross-contrastive loss as its self-supervised objective. Through the clustering module, we scale down the influence of those ne…

    Submitted 13 May, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE SLT 2022