Juan Pino

Publisher: ISCA

Publication Name: Interspeech 2022

Research Interests:
Computer Science, Artificial Intelligence, Natural Language Processing, Speech Synthesis, Machine Translation, Speech Recognition, Encoder, Speech Translation, and Training Set

Publisher: ISCA

Publication Name: Interspeech 2022

Research Interests:
Engineering, Computer Science, Speech Synthesis, Speech Recognition, and Speech Translation

Publisher: Association for Computational Linguistics

Publication Name: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Research Interests:
Engineering, Computer Science, Artificial Intelligence, Natural Language Processing, Machine Translation, Speech Recognition, Encoder, Speech Translation, and arXiv

Publisher: Association for Computational Linguistics

Publication Name: Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

Research Interests:
Computer Science

Publisher: Association for Computational Linguistics

Publication Name: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Research Interests:
Engineering, Computer Science, Artificial Intelligence, Natural Language Processing, Speech Recognition, Encoder, Chen, Speech Translation, and arXiv

Publisher: Association for Computational Linguistics

Publication Name: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Research Interests:
Engineering, Computer Science, Artificial Intelligence, Natural Language Processing, Speech Synthesis, Speech Recognition, Inference, Encoder, Spectrogram, Language Model, Speech Translation, and arXiv

Publisher: Association for Computational Linguistics

Publication Name: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Research Interests:
Computer Science, Artificial Intelligence, Natural Language Processing, Speech Recognition, Encoder, and Speech Translation

Publisher: IEEE

Publication Name: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Research Interests:
Computer Science, Speech Recognition, Transformer, Encoder, and Speech Translation

Publisher: European Language Resources Association

Publication Date: Feb 4, 2020

Publication Name: Language Resources and Evaluation

Research Interests:
Computer Science, Artificial Intelligence, Natural Language Processing, Popularity, LICENSE, and Speech Translation

For automatic speech translation (AST), end-to-end approaches are outperformed by cascaded models that transcribe with automatic speech recognition (ASR), then translate with machine translation (MT). A major cause of the performance gap is that, while existing AST corpora are small, massive datasets exist for both the ASR and MT subsystems. In this work, we evaluate several data augmentation and pretraining approaches for AST by comparing them all on the same datasets. Simple data augmentation by translating ASR transcripts proves most effective on the English–French augmented LibriSpeech dataset, closing the performance gap from 8.2 to 1.4 BLEU, compared to a very strong cascade that could directly utilize copious ASR and MT data. The same end-to-end approach plus fine-tuning closes the gap on the English–Romanian MuST-C dataset from 6.7 to 3.7 BLEU. In addition to these results, we present practical recommendations for augmentation and pretraining approaches. Finally, we decrease...
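The augmentation the abstract reports as most effective, translating ASR transcripts to create synthetic speech-translation pairs, can be sketched as follows. This is a minimal sketch, not the authors' pipeline: `translate` is a hypothetical stand-in for any trained MT system, and the record types and field names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ASRExample:
    audio_path: str   # source-language speech
    transcript: str   # source-language transcript


@dataclass(frozen=True)
class ASTExample:
    audio_path: str
    transcript: str
    translation: str  # target-language text
    synthetic: bool   # True when the translation was produced by MT


def augment_with_mt(
    asr_data: Iterable[ASRExample],
    translate: Callable[[str], str],  # hypothetical MT system
) -> List[ASTExample]:
    """Turn ASR (audio, transcript) pairs into synthetic AST examples
    by machine-translating each transcript."""
    return [
        ASTExample(ex.audio_path, ex.transcript, translate(ex.transcript), True)
        for ex in asr_data
    ]


def build_training_set(
    ast_data: Iterable[ASTExample],
    asr_data: Iterable[ASRExample],
    translate: Callable[[str], str],
) -> List[ASTExample]:
    """Concatenate the small genuine AST corpus with MT-augmented ASR data."""
    return list(ast_data) + augment_with_mt(asr_data, translate)
```

The `synthetic` flag is an assumption of this sketch; it keeps genuine and MT-generated pairs distinguishable so they can be weighted, filtered, or used for a later fine-tuning stage.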

Publisher: Zenodo

Publication Date: Nov 2, 2019

Research Interests:
Computer Science

Publisher: Association for Computational Linguistics

Publication Date: 2021

Publication Name: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Research Interests:
Engineering, Computer Science, Artificial Intelligence, Natural Language Processing, Semi-supervised Learning, LICENSE, Representation Politics, and arXiv

Publisher: International Committee on Computational Linguistics

Publication Date: 2020

Publication Name: Proceedings of the 28th International Conference on Computational Linguistics

Research Interests:
Engineering, Computer Science, Architecture, Machine Translation, Speech Recognition, Transformer, and Speech Translation

Simultaneous machine translation models start generating a target sequence before they have read or encoded the entire source sequence. Recent approaches for this task either apply a fixed policy to a state-of-the-art Transformer model, or a learnable monotonic attention to a weaker recurrent neural network-based structure. In this paper, we propose a new attention mechanism, Monotonic Multihead Attention (MMA), which extends the monotonic attention mechanism to multihead attention. We also introduce two novel and interpretable approaches for latency control that are specifically designed for multiple attention heads. We apply MMA to the simultaneous machine translation task and demonstrate better latency-quality tradeoffs compared to MILk, the previous state-of-the-art approach. We also analyze how the latency controls affect the attention span, and we motivate the introduction of our model by analyzing the effect of the number of decoder layers and heads on quality and latency.
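The central idea of MMA, as described above, is that each attention head runs its own monotonic attention and the decoder may write a target token only once every head has stopped reading. The sketch below illustrates that inference-time rule under simplifying assumptions: stopping probabilities are supplied as a precomputed array `p_choose[head][step][source_pos]` (in a real model they depend on the decoder state and are computed on the fly), a head stops deterministically once its probability reaches a 0.5 threshold, and the latency-control losses are not modeled. The function name and array layout are hypothetical and do not reflect the authors' implementation.

```python
from typing import List, Sequence


def mma_read_schedule(
    p_choose: Sequence[Sequence[Sequence[float]]],
    threshold: float = 0.5,
) -> List[int]:
    """For each target step, return how many source tokens must have been
    read before that target token can be written.

    p_choose[h][i][j]: stopping probability of head h at target step i and
    source position j (precomputed here for illustration only).
    """
    num_heads = len(p_choose)
    num_steps = len(p_choose[0])
    src_len = len(p_choose[0][0])
    positions = [0] * num_heads  # current source position of each head
    schedule: List[int] = []
    for i in range(num_steps):
        for h in range(num_heads):
            j = positions[h]  # monotonic: a head never moves backwards
            # Keep reading while this head does not want to stop; a head that
            # reaches the last source token is forced to stop there.
            while j < src_len - 1 and p_choose[h][i][j] < threshold:
                j += 1
            positions[h] = j
        # The token is written only after every head has stopped, so the
        # furthest-reading head decides how much source must be available.
        schedule.append(max(positions) + 1)
    return schedule
```

Because the slowest head gates every write, one head that reads far ahead raises latency for the whole decoder, which is the kind of behavior that latency controls designed for multiple heads need to address.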

Publisher: ICLR

Publication Date: 2020

Publication Name: ArXiv

Research Interests:
Computer Science, Artificial Intelligence, Machine Translation, Transformer, and arXiv

In a speech-to-speech translation (S2ST) pipeline, the text-to-speech (TTS) module is an important component for delivering the translated speech to users. To enable incremental S2ST, the TTS module must be capable of synthesizing and playing utterances while its input text is still streaming in. In this work, we focus on improving the incremental synthesis performance of TTS models. With a simple data augmentation strategy based on prefixes, we are able to improve the incremental TTS quality to approach offline performance. Furthermore, we apply our incremental TTS system to a practical scenario in combination with an upstream simultaneous speech translation system, and show that the gains also carry over to this use case. In addition, we propose latency metrics tailored to S2ST applications, and investigate methods for latency reduction in this context.

Publisher: ArXiv

Publication Date: 2021

Publication Name: ArXiv

Research Interests:
Computer Science, Speech Recognition, and arXiv

Publisher: Association for Computational Linguistics

Publication Date: 2021

Publication Name: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Research Interests:
Computer Science, Artificial Intelligence, Machine Translation, Speech Recognition, and arXiv

Speech translation has recently become an increasingly popular topic of research, partly due to the development of benchmark datasets. Nevertheless, current datasets cover a limited number of languages. With the aim of fostering research in massive multilingual speech translation and speech translation for low-resource language pairs, we release CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. This represents the largest open dataset available to date in terms of total volume and language coverage. Data sanity checks provide evidence about the quality of the data, which is released under a CC0 license. We also provide extensive speech recognition, bilingual and multilingual machine translation, and speech translation baselines with an open-source implementation.

Publication Date: 2020

Publication Name: arXiv: Computation and Language

Research Interests:
Computer Science, Artificial Intelligence, Natural Language Processing, Machine Translation, and Speech Translation

We present the first direct simultaneous speech-to-speech translation (Simul-S2ST) model, which can start generating translation in the target speech before consuming the full source speech content, independently of intermediate text representations. Our approach leverages recent progress on direct speech-to-speech translation with discrete units. Instead of continuous spectrogram features, a sequence of discrete representations, learned in an unsupervised manner, is predicted by the model and passed directly to a vocoder for speech synthesis. The simultaneous policy then operates on source speech features and target discrete units. Finally, a vocoder synthesizes the target speech from discrete units on the fly. We carry out numerical studies to compare the cascaded and direct approaches on the Fisher Spanish-English dataset.
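The abstract does not name the simultaneous policy, so the sketch below substitutes a simple wait-k schedule as an illustrative stand-in to show how a policy can interleave reading source speech with writing target discrete units that are vocoded immediately. `predict_unit` and `vocode` are hypothetical callables standing in for the unit decoder and the unit-based vocoder; the source is passed as a list for offline simulation, but the predictor only ever sees the prefix read so far.

```python
from typing import Callable, List, Sequence, Tuple

Chunk = Sequence[float]  # one block of source speech features (hypothetical)


def wait_k_speech_to_units(
    source_chunks: Sequence[Chunk],
    k: int,
    predict_unit: Callable[[List[Chunk], List[int]], int],  # hypothetical
    vocode: Callable[[List[int]], List[float]],             # hypothetical
    eos_unit: int = -1,
    max_units: int = 1000,
) -> Tuple[List[int], List[int], List[List[float]]]:
    """Wait-k READ/WRITE loop over source chunks and target discrete units.

    Returns the emitted units, the number of source chunks read before each
    unit was written, and the audio produced for each unit.
    """
    read: List[Chunk] = []
    units: List[int] = []
    reads_before_write: List[int] = []
    audio: List[List[float]] = []
    while len(units) < max_units:
        # READ: stay k chunks ahead of the target (until the source runs out)
        while len(read) < min(len(units) + k, len(source_chunks)):
            read.append(source_chunks[len(read)])
        # WRITE: predict the next discrete unit from the source prefix so far
        unit = predict_unit(read, units)
        if unit == eos_unit:
            break
        units.append(unit)
        reads_before_write.append(len(read))
        # Synthesize right away instead of waiting for the full unit sequence
        audio.append(vocode([unit]))
    return units, reads_before_write, audio
```

Vocoding one unit at a time keeps the loop short but is a simplification; a practical unit vocoder would more likely synthesize small groups of units with some context.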

Publisher: ArXiv

Publication Date: 2021

Publication Name: ArXiv

Research Interests:
Engineering, Computer Science, Speech Synthesis, Machine Translation, Speech Recognition, Spectrogram, Direct Speech, Speech Translation, and arXiv

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq’s careful design for scalability and extensibility. We provide end-to-end workflows covering data pre-processing, model training, and offline (online) inference. We implement state-of-the-art RNN-based as well as Transformer-based models and open-source detailed training recipes. Fairseq’s machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. Fairseq S2T is available at https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text.

Publisher: AACL

Publication Date: 2020

Publication Name: ArXiv

Research Interests:
Engineering, Computer Science, Artificial Intelligence, Natural Language Processing, Documentation, Machine Translation, Workflow, Inference, Transformer, Scalability, Language Model, Speech Translation, and arXiv

In this paper, we describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign on the Multilingual Speech Translation shared task. Our system is built by leveraging transfer learning across modalities, tasks and languages. First, we leverage general-purpose multilingual modules pretrained with large amounts of unlabelled and labelled data. We further enable knowledge transfer from the text task to the speech task by training the two tasks jointly. Finally, our multilingual model is finetuned on speech translation task-specific data to achieve the best translation results. Experimental results show that our system outperforms the reported systems, including both end-to-end and cascade-based approaches, by a large margin. In some translation directions, our speech translation results evaluated on the public Multilingual TEDx test set are even comparable with the ones from a strong text-to-text translation system, which uses the oracle speech t...

Publisher: IWSLT

Publication Date: 2021

Publication Name: ArXiv

Research Interests:
Engineering, Computer Science, Artificial Intelligence, Natural Language Processing, Machine Translation, Speech Recognition, Oracle, Speech Translation, and arXiv

Publisher: IEEE

Publication Date: 2021

Publication Name: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Research Interests:
Computer Science, Artificial Intelligence, Natural Language Processing, Machine Translation, Speech Recognition, Autoencoder, and Word error rate

Publication Date: 2018

Research Interests:
Computer Science and Adversarial System

Publisher: ISCA

Publication Date: 2021

Publication Name: Interspeech 2021

Research Interests:
Computer Science and Speech Recognition

Publisher: ISCA

Publication Date: 2021

Publication Name: Interspeech 2021

Research Interests:
Computer Science, Artificial Intelligence, Natural Language Processing, Machine Translation, Speech Recognition, Language Model, Speech Translation, and arXiv

Publisher: ISCA

Publication Date: 2020

Publication Name: Interspeech 2020

Research Interests:
Engineering, Computer Science, Speech Recognition, and Speech Translation

Publisher: ISCA

Publication Date: 2020

Publication Name: Interspeech 2020

Research Interests:
Engineering, Computer Science, Speech Recognition, and Speech Translation

Publisher: Association for Computational Linguistics

Publication Date: 2020

Publication Name: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Research Interests:
Computer Science and Machine Translation

Publisher: Association for Computational Linguistics

Publication Date: 2020

Publication Name: Proceedings of the 17th International Conference on Spoken Language Translation

Research Interests:
Computer Science

Publisher: Association for Computational Linguistics

Publication Date: 2019

Publication Name: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Research Interests:
Computer Science, Artificial Intelligence, Natural Language Processing, Machine Translation, Syntax, and Nepali

Publisher: Association for Computational Linguistics

Publication Date: 2019

Publication Name: Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

Research Interests:
Computer Science, Artificial Intelligence, Natural Language Processing, Machine Translation, Salient, Robustness (evolution), and testbed

Publisher: Association for Computational Linguistics

Publication Date: 2019

Publication Name: Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

Research Interests:
Computer Science, Artificial Intelligence, and Natural Language Processing

