research-article
RODA: Reverse Operation Based Data Augmentation for Solving Math Word Problems

Automatically solving math word problems is a critical task in the field of natural language processing. Recent models have reached their performance bottleneck and require more high-quality data for training. We propose a novel data augmentation method ...

research-article
Scalable and Efficient Neural Speech Coding: A Hybrid Design

We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural waveform codec (...

research-article
Text Generation From Data With Dynamic Planning

Transcribing structural data into readable text (data-to-text) is a fundamental language generation task. One of its challenges is to plan the input records for text realization. Recent works tackle this problem with a static planner, which performs ...

research-article
Occlusion Effect Cancellation in Headphones and Hearing Devices—The Sister of Active Noise Cancellation

The perception of one’s own voice influences the acceptance of hearing devices, such as headphones, headsets or hearing aids. When these devices fully or partially occlude the ear canal, the wearer’s own voice sounds boomy or like talking in ...

research-article
Which Apple Keeps Which Doctor Away? Colorful Word Representations With Visual Oracles

Recent pre-trained language models (PrLMs) offer a new performant method of contextualized word representations by leveraging the sequence-level context for modeling. Although the PrLMs generally provide more effective contextualized word representations ...

research-article
Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition

Adapting speaker recognition systems to new environments is a widely-used technique to improve a well-performing model learned from large-scale data towards task-specific small-scale data scenarios. However, previous studies focus on single domain ...

research-article
Open Access
Unsupervised Character Embedding Correction and Candidate Word Denoising

In this paper, we take Indonesian as the research object, and propose a multiple filter correction framework (MFCF). The main idea of MFCF is to remove noise from candidate words to increase the probability of correct words being selected. In MFCF, we use ...

research-article
Extractive Dialogue Summarization Without Annotation Based on Distantly Supervised Machine Reading Comprehension in Customer Service

Given a long dialogue, the dialogue summarization system aims to obtain a shorter highlight which retains the important information in the original text. For the customer service scenarios, the summaries of most dialogues between an agent and a user focus ...

research-article
Efficient Combinatorial Optimization for Word-Level Adversarial Textual Attack

Over the past few years, various word-level textual attack approaches have been proposed to reveal the vulnerability of deep neural networks used in natural language processing. Typically, these approaches involve an important optimization step to ...

research-article
Comparison of Feature Extraction Methods for Sound-Based Classification of Honey Bee Activity

Honey bees are one of the most important insects on the planet since they play a key role in the pollination services of both cultivated and spontaneous flora. Recent years have seen an increase in bee mortality which points out the necessity of intensive ...

research-article
Enhancing Segment-Based Speech Emotion Recognition by Iterative Self-Learning

Despite the widespread utilization of deep neural networks (DNNs) for speech emotion recognition (SER), they are severely restricted due to the paucity of labeled data for training. Recently, segment-based approaches for SER have been evolving, which ...

research-article
Open Access
Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models

We investigate the problem of speaker independent acoustic-to-articulatory inversion (AAI) in noisy conditions within the deep neural network (DNN) framework. In contrast with recent results in the literature, we argue that a DNN vector-to-vector ...

research-article
Open Access
Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models

Although Long-Short Term Memory (LSTM) networks and deep Transformers are now extensively used in offline ASR, it is unclear how best offline systems can be adapted to work with them under the streaming setup. After gaining considerable experience on this ...

research-article
End-to-End Neural Based Modification of Noisy Speech for Speech-in-Noise Intelligibility Improvement

Intelligibility of speech can be significantly reduced when it is presented in adverse near-end listening conditions, like background noise. Multiple approaches have been suggested to improve the perception of speech in such conditions. However, most of ...

research-article
VACE-WPE: Virtual Acoustic Channel Expansion Based on Neural Networks for Weighted Prediction Error-Based Speech Dereverberation

Speech dereverberation is an important issue for many real-world speech processing applications. Among the techniques developed, the weighted prediction error (WPE) algorithm has been widely adopted and advanced over the last decade, which blindly cancels ...

research-article
Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis

Generating natural speech with a diverse and smooth prosody pattern is a challenging task. Although random sampling with phone-level prosody distribution has been investigated to generate different prosody patterns, the diversity of the generated speech ...

research-article
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning

Previous works have shown that automatic speaker verification (ASV) is seriously vulnerable to malicious spoofing attacks, such as replay, synthetic speech, and recently emerged adversarial attacks. Great efforts have been dedicated to defending ASV ...

research-article
Multi-View Speech Emotion Recognition Via Collective Relation Construction

Automatic emotion recognition from speech plays a fundamental role towards advanced emotional intelligence in human-machine interaction systems. The discriminative knowledge from speech for effective emotion recognition may come from multiple physical ...

research-article
Learning Phone Recognition From Unpaired Audio and Phone Sequences Based on Generative Adversarial Network

ASR has been shown to achieve great performance recently. However, most of them rely on massive paired data, which is not feasible for low-resource languages worldwide. This paper investigates how to learn directly from unpaired phone sequences and speech ...

research-article
Open Access
Word-Region Alignment-Guided Multimodal Neural Machine Translation

We propose word-region alignment-guided multimodal neural machine translation (MNMT), a novel model for MNMT that links the semantic correlation between textual and visual modalities using word-region alignment (WRA). Existing studies on MNMT have mainly ...

research-article
Syntax-Aware Multi-Spans Generation for Reading Comprehension

This paper presents a novel method to generate answers for non-extraction machine reading comprehension (MRC) tasks whose answers cannot be simply extracted as one span from the given passages. Using a pointer network-style extractive decoder for such ...

research-article
DUMA: Reading Comprehension With Transposition Thinking

Multi-choice Machine Reading Comprehension (MRC) requires models to decide the correct answer from a set of answer options when given a passage and a question. Thus, in addition to a powerful Pre-trained Language Model (PrLM) as an encoder, multi-choice ...

research-article
Diverse Distractor Generation for Constructing High-Quality Multiple Choice Questions

Distractor generation task aims to generate incorrect options (i.e., distractors) for multiple choice questions from an article. Existing methods for this task often utilize a standard encoder-decoder framework. However, these methods often tend to ...

research-article
A Parametric Unconstrained Beamformer Based Binaural Noise Reduction for Assistive Hearing

For hearing-impaired listeners, it is required not only to enhance the target speech by suppressing ambient noises, but also to preserve the binaural cues of important directional sources, such that a complete spatial awareness of the acoustic scene is ...

research-article
Music Emotion Recognition: Intention of Composers-Performers Versus Perception of Musicians, Non-Musicians, and Listening Machines

This paper investigates to what extent state-of-the-art machine learning methods are effective in classifying emotions in the context of individual musical instruments, and how their performances compare with musically trained and untrained listeners. To ...

research-article
Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition

Cross-lingual speech adaptation aims to solve the problem of leveraging multiple rich-resource languages to build models for a low-resource target language. Since the low-resource language has limited training data, speech recognition models can easily ...

research-article
Integrating Prior Translation Knowledge Into Neural Machine Translation

Neural machine translation (NMT), which is an encoder-decoder joint neural language model with an attention mechanism, has achieved impressive results on various machine translation tasks in the past several years. However, the language model attribute of ...

research-article
Alleviating ASR Long-Tailed Problem by Decoupling the Learning of Representation and Classification

Recently, we have witnessed excellent improvement of end-to-end (E2E) automatic speech recognition (ASR). However, how to tackle the long-tailed data distribution problem while maintaining E2E ASR models' performance for high-frequency tokens is still ...

research-article
HPSG-Inspired Joint Neural Constituent and Dependency Parsing in O($n^3$) Time Complexity

Constituent and dependency parsing, the two classic forms of syntactic parsing, have been found to benefit from joint training and decoding under a uniform formalism, inspired by Head-driven Phrase Structure Grammar (HPSG). We thus refer to this joint ...

research-article
Open Access
Use of Speaker Recognition Approaches for Learning and Evaluating Embedding Representations of Musical Instrument Sounds

Constructing an embedding space for musical instrument sounds that can meaningfully represent new and unseen instruments is important for downstream music generation tasks such as multi-instrument synthesis and timbre transfer. The framework of Automatic ...
