TASLP: Vol 30, No

Volume 30, 2022

Publisher: IEEE Press

ISSN: 2329-9290

EISSN: 2329-9304


research-article

RODA: Reverse Operation Based Data Augmentation for Solving Math Word Problems

Pages 1–11, https://doi.org/10.1109/TASLP.2021.3126932

Automatically solving math word problems is a critical task in the field of natural language processing. Recent models have reached their performance bottleneck and require more high-quality data for training. We propose a novel data augmentation method ...

research-article

Scalable and Efficient Neural Speech Coding: A Hybrid Design

Pages 12–25, https://doi.org/10.1109/TASLP.2021.3129353

We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural waveform codec (...

research-article

Text Generation From Data With Dynamic Planning

Pages 26–34, https://doi.org/10.1109/TASLP.2021.3129346

Transcribing structural data into readable text (data-to-text) is a fundamental language generation task. One of its challenges is to plan the input records for text realization. Recent works tackle this problem with a static planner, which performs ...

research-article

Occlusion Effect Cancellation in Headphones and Hearing Devices—The Sister of Active Noise Cancellation

Pages 35–48, https://doi.org/10.1109/TASLP.2021.3130966

The perception of one’s own voice influences the acceptance of hearing devices, such as headphones, headsets or hearing aids. When these devices fully or partially occlude the ear canal, the wearer’s own voice sounds boomy or like talking in ...

research-article

Which Apple Keeps Which Doctor Away? Colorful Word Representations With Visual Oracles

Pages 49–59, https://doi.org/10.1109/TASLP.2021.3130972

Recent pre-trained language models (PrLMs) offer a new performant method of contextualized word representations by leveraging the sequence-level context for modeling. Although the PrLMs generally provide more effective contextualized word representations ...

research-article

Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition

Pages 60–75, https://doi.org/10.1109/TASLP.2021.3130975

Adapting speaker recognition systems to new environments is a widely used technique to improve a well-performing model learned from large-scale data towards task-specific, small-scale data scenarios. However, previous studies focus on single domain ...

research-article

Open Access

Unsupervised Character Embedding Correction and Candidate Word Denoising

Pages 76–86, https://doi.org/10.1109/TASLP.2021.3129334

In this paper, we take Indonesian as the research object and propose a multiple filter correction framework (MFCF). The main idea of MFCF is to remove noise from candidate words to increase the probability of correct words being selected. In MFCF, we use ...

research-article

Extractive Dialogue Summarization Without Annotation Based on Distantly Supervised Machine Reading Comprehension in Customer Service

Pages 87–97, https://doi.org/10.1109/TASLP.2021.3133206

Given a long dialogue, the dialogue summarization system aims to obtain a shorter highlight which retains the important information in the original text. For the customer service scenarios, the summaries of most dialogues between an agent and a user focus ...

research-article

Efficient Combinatorial Optimization for Word-Level Adversarial Textual Attack

Pages 98–111, https://doi.org/10.1109/TASLP.2021.3130970

Over the past few years, various word-level textual attack approaches have been proposed to reveal the vulnerability of deep neural networks used in natural language processing. Typically, these approaches involve an important optimization step to ...

research-article

Comparison of Feature Extraction Methods for Sound-Based Classification of Honey Bee Activity

Pages 112–122, https://doi.org/10.1109/TASLP.2021.3133194

Honey bees are one of the most important insects on the planet since they play a key role in the pollination services of both cultivated and spontaneous flora. Recent years have seen an increase in bee mortality which points out the necessity of intensive ...

research-article

Enhancing Segment-Based Speech Emotion Recognition by Iterative Self-Learning

Pages 123–134, https://doi.org/10.1109/TASLP.2021.3133195

Despite the widespread utilization of deep neural networks (DNNs) for speech emotion recognition (SER), they are severely restricted due to the paucity of labeled data for training. Recently, segment-based approaches for SER have been evolving, which ...

research-article

Open Access

Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models

Pages 135–147, https://doi.org/10.1109/TASLP.2021.3133218

We investigate the problem of speaker-independent acoustic-to-articulatory inversion (AAI) in noisy conditions within the deep neural network (DNN) framework. In contrast with recent results in the literature, we argue that a DNN vector-to-vector ...

research-article

Open Access

Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models

Pages 148–161, https://doi.org/10.1109/TASLP.2021.3133216

Although Long Short-Term Memory (LSTM) networks and deep Transformers are now extensively used in offline ASR, it is unclear how best offline systems can be adapted to work with them under the streaming setup. After gaining considerable experience on this ...

research-article

End-to-End Neural Based Modification of Noisy Speech for Speech-in-Noise Intelligibility Improvement

Pages 162–173, https://doi.org/10.1109/TASLP.2021.3126947

Intelligibility of speech can be significantly reduced when it is presented in adverse near-end listening conditions, like background noise. Multiple approaches have been suggested to improve the perception of speech in such conditions. However, most of ...

research-article

VACE-WPE: Virtual Acoustic Channel Expansion Based on Neural Networks for Weighted Prediction Error-Based Speech Dereverberation

Pages 174–189, https://doi.org/10.1109/TASLP.2021.3133190

Speech dereverberation is an important issue for many real-world speech processing applications. Among the techniques developed, the weighted prediction error (WPE) algorithm has been widely adopted and advanced over the last decade, which blindly cancels ...

research-article

Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis

Pages 190–201, https://doi.org/10.1109/TASLP.2021.3133205

Generating natural speech with a diverse and smooth prosody pattern is a challenging task. Although random sampling with phone-level prosody distribution has been investigated to generate different prosody patterns, the diversity of the generated speech ...

research-article

Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning

Pages 202–217, https://doi.org/10.1109/TASLP.2021.3133189

Previous works have shown that automatic speaker verification (ASV) is seriously vulnerable to malicious spoofing attacks, such as replay, synthetic speech, and recently emerged adversarial attacks. Great efforts have been dedicated to defending ASV ...

research-article

Multi-View Speech Emotion Recognition Via Collective Relation Construction

Pages 218–229, https://doi.org/10.1109/TASLP.2021.3133196

Automatic emotion recognition from speech plays a fundamental role towards advanced emotional intelligence in human-machine interaction systems. The discriminative knowledge from speech for effective emotion recognition may come from multiple physical ...

research-article

Learning Phone Recognition From Unpaired Audio and Phone Sequences Based on Generative Adversarial Network

Pages 230–243, https://doi.org/10.1109/TASLP.2021.3138720

ASR has been shown to achieve great performance recently. However, most ASR systems rely on massive paired data, which is not feasible for low-resource languages worldwide. This paper investigates how to learn directly from unpaired phone sequences and speech ...

research-article

Open Access

Word-Region Alignment-Guided Multimodal Neural Machine Translation

Pages 244–259, https://doi.org/10.1109/TASLP.2021.3138719

We propose word-region alignment-guided multimodal neural machine translation (MNMT), a novel model for MNMT that links the semantic correlation between textual and visual modalities using word-region alignment (WRA). Existing studies on MNMT have mainly ...

research-article

Syntax-Aware Multi-Spans Generation for Reading Comprehension

Pages 260–268, https://doi.org/10.1109/TASLP.2021.3138679

This paper presents a novel method to generate answers for non-extraction machine reading comprehension (MRC) tasks whose answers cannot be simply extracted as one span from the given passages. Using a pointer network-style extractive decoder for such ...

research-article

DUMA: Reading Comprehension With Transposition Thinking

Pages 269–279, https://doi.org/10.1109/TASLP.2021.3138683

Multi-choice Machine Reading Comprehension (MRC) requires models to decide the correct answer from a set of answer options when given a passage and a question. Thus, in addition to a powerful Pre-trained Language Model (PrLM) as an encoder, multi-choice ...

research-article

Diverse Distractor Generation for Constructing High-Quality Multiple Choice Questions

Pages 280–291, https://doi.org/10.1109/TASLP.2021.3138706

The distractor generation task aims to generate incorrect options (i.e., distractors) for multiple choice questions from an article. Existing methods for this task often utilize a standard encoder-decoder framework. However, these methods often tend to ...

research-article

A Parametric Unconstrained Beamformer Based Binaural Noise Reduction for Assistive Hearing

Pages 292–304, https://doi.org/10.1109/TASLP.2021.3138675

For hearing-impaired listeners, it is required not only to enhance the target speech by suppressing ambient noises, but also to preserve the binaural cues of important directional sources, such that a complete spatial awareness of the acoustic scene is ...

research-article

Music Emotion Recognition: Intention of Composers-Performers Versus Perception of Musicians, Non-Musicians, and Listening Machines

Pages 305–316, https://doi.org/10.1109/TASLP.2021.3138709

This paper investigates to what extent state-of-the-art machine learning methods are effective in classifying emotions in the context of individual musical instruments, and how their performance compares with that of musically trained and untrained listeners. To ...

research-article

Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition

Pages 317–329, https://doi.org/10.1109/TASLP.2021.3138674

Cross-lingual speech adaptation aims to solve the problem of leveraging multiple rich-resource languages to build models for a low-resource target language. Since the low-resource language has limited training data, speech recognition models can easily ...

research-article

Integrating Prior Translation Knowledge Into Neural Machine Translation

Pages 330–339, https://doi.org/10.1109/TASLP.2021.3138714

Neural machine translation (NMT), which is an encoder-decoder joint neural language model with an attention mechanism, has achieved impressive results on various machine translation tasks in the past several years. However, the language model attribute of ...

research-article

Alleviating ASR Long-Tailed Problem by Decoupling the Learning of Representation and Classification

Pages 340–354, https://doi.org/10.1109/TASLP.2021.3138707

Recently, we have witnessed excellent improvement of end-to-end (E2E) automatic speech recognition (ASR). However, how to tackle the long-tailed data distribution problem while maintaining E2E ASR models' performance for high-frequency tokens is still ...

research-article

HPSG-Inspired Joint Neural Constituent and Dependency Parsing in $O(n^3)$ Time Complexity

Pages 355–366, https://doi.org/10.1109/TASLP.2021.3138715

Constituent and dependency parsing, the two classic forms of syntactic parsing, have been found to benefit from joint training and decoding under a uniform formalism, inspired by Head-driven Phrase Structure Grammar (HPSG). We thus refer to this joint ...

research-article

Open Access

Use of Speaker Recognition Approaches for Learning and Evaluating Embedding Representations of Musical Instrument Sounds

Pages 367–377, https://doi.org/10.1109/TASLP.2022.3140549

Constructing an embedding space for musical instrument sounds that can meaningfully represent new and unseen instruments is important for downstream music generation tasks such as multi-instrument synthesis and timbre transfer. The framework of Automatic ...

IEEE/ACM Transactions on Audio, Speech and Language Processing
