Information systems

Searched The ACM Guide to Computing Literature (3,836,077 records)

Showing 1 - 20 of 620 Results

research-article
November 2024
IPDnet: A Universal Direct-Path IPD Estimation Network for Sound Source Localization
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 5051–5064, https://doi.org/10.1109/TASLP.2024.3507560
Extracting direct-path spatial feature is crucial for sound source localization in adverse acoustic environments. This paper proposes IPDnet, a neural network that estimates direct-path inter-channel phase difference (DP-IPD) of sound sources from ...
research-article
November 2024
Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 5106–5116, https://doi.org/10.1109/TASLP.2024.3507568
In recent years, advancements in neural network designs and the availability of large-scale labeled datasets have led to significant improvements in the accuracy of piano transcription models. However, most previous work focused on high-performance ...
research-article
Open Access
November 2024
Blind Audio Bandwidth Extension: A Diffusion-Based Zero-Shot Approach
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 5092–5105, https://doi.org/10.1109/TASLP.2024.3507566
Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. ...
research-article
Open Access
November 2024
An Interpretable Deep Mutual Information Curriculum Metric for a Robust and Generalized Speech Emotion Recognition System
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 5117–5130, https://doi.org/10.1109/TASLP.2024.3507562
It is difficult to achieve robust and well-generalized models for tasks involving subjective concepts such as emotion. It is inevitable to deal with noisy labels, given the ambiguous nature of human perception. Methodologies relying on semi-...
research-article
November 2024
Online Neural Speaker Diarization With Target Speaker Tracking
- Weiqing Wang,
- Ming Li
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 5078–5091, https://doi.org/10.1109/TASLP.2024.3507559
This paper proposes an online target speaker voice activity detection (TS-VAD) system for speaker diarization tasks that does not rely on prior knowledge from clustering-based diarization systems to obtain target speaker embeddings. By adapting ...
research-article
November 2024
CLAPSep: Leveraging Contrastive Pre-Trained Model for Multi-Modal Query-Conditioned Target Sound Extraction
- Hao Ma,
- Zhiyuan Peng,
- Xu Li,
- Mingjie Shao,
- Xixin Wu,
- Ju Liu
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 4945–4960, https://doi.org/10.1109/TASLP.2024.3497586
Universal sound separation (USS) aims to extract arbitrary types of sounds from real-world recordings. This can be achieved by language-queried target sound extraction (TSE), which typically consists of two components: a query network that converts user ...
research-article
November 2024
Scalable-Complexity Steered Response Power Based on Low-Rank and Sparse Interpolation
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 5024–5039, https://doi.org/10.1109/TASLP.2024.3496317
The steered response power (SRP) is a popular approach to compute a map of the acoustic scene, typically used for acoustic source localization. The SRP map is obtained as the frequency-weighted output power of a beamformer steered towards a grid of ...
research-article
November 2024
$\mathcal{P}$owMix: A Versatile Regularizer for Multimodal Sentiment Analysis
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 5010–5023, https://doi.org/10.1109/TASLP.2024.3496316
Multimodal sentiment analysis (MSA) leverages heterogeneous data sources to interpret the complex nature of human sentiments. Despite significant progress in multimodal architecture design, the field lacks comprehensive regularization methods. This paper ...
research-article
November 2024
Enhancing Robustness of Speech Watermarking Using a Transformer-Based Framework Exploiting Acoustic Features
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 4822–4837, https://doi.org/10.1109/TASLP.2024.3486206
Digital watermarking serves as an effective approach for safeguarding speech signal copyrights, achieved by the incorporation of ownership information into the original signal and its subsequent extraction from the watermarked signal. While traditional ...
research-article
November 2024
FTDKD: Frequency-Time Domain Knowledge Distillation for Low-Quality Compressed Audio Deepfake Detection
- Bo Wang,
- Yeling Tang,
- Fei Wei,
- Zhongjie Ba,
- Kui Ren
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 4905–4918, https://doi.org/10.1109/TASLP.2024.3492796
In recent years, the field of audio deepfake detection has witnessed significant advancements. Nonetheless, the majority of solutions have concentrated on high-quality audio, largely overlooking the challenge of low-quality compressed audio in real-world ...
research-article
November 2024
TF-CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation
- Vahid Ahmadi Kalkhorani,
- DeLiang Wang
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 4999–5009, https://doi.org/10.1109/TASLP.2024.3492803
We introduce TF-CrossNet, a complex spectral mapping approach to speaker separation and enhancement in reverberant and noisy conditions. The proposed architecture comprises an encoder layer, a global multi-head self-attention module, a cross-band module, ...
research-article
November 2024
FlowHash: Accelerating Audio Search With Balanced Hashing via Normalizing Flow
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 4961–4970, https://doi.org/10.1109/TASLP.2024.3486227
Nearest neighbor search on context representation vectors is a formidable task due to challenges posed by high dimensionality, scalability issues, and potential noise within query vectors. Our novel approach leverages normalizing flow within a self-...
research-article
October 2024
MRC-PASCL: A Few-Shot Machine Reading Comprehension Approach via Post-Training and Answer Span-Oriented Contrastive Learning
- Ren Li,
- Qiao Xiao,
- Jianxi Yang,
- Luyi Zhang,
- Yu Chen
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 4838–4849, https://doi.org/10.1109/TASLP.2024.3490373
The rapid development of pre-trained language models (PLMs) has significantly enhanced the performance of machine reading comprehension (MRC). Nevertheless, the traditional fine-tuning approaches necessitate extensive labeled data. MRC remains a ...
research-article
October 2024
DeFTAN-II: Efficient Multichannel Speech Enhancement With Subgroup Processing
- Dongheon Lee,
- Jung-Woo Choi
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 4850–4866, https://doi.org/10.1109/TASLP.2024.3488564
In this work, we present DeFTAN-II, an efficient multichannel speech enhancement model based on transformer architecture and subgroup processing. Despite the success of transformers in speech enhancement, they face challenges in capturing local relations, ...
research-article
October 2024
WEDA: Exploring Copyright Protection for Large Language Model Downstream Alignment
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 4755–4767, https://doi.org/10.1109/TASLP.2024.3487419
Large Language Models (LLMs) have shown incomparable representation and generalization capabilities, which have led to significant advancements in Natural Language Processing (NLP). Before deployment, the pre-trained LLMs often need to be tailored to ...
research-article
October 2024
CL-MASR: A Continual Learning Benchmark for Multilingual ASR
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 4931–4944, https://doi.org/10.1109/TASLP.2024.3487410
Modern multilingual automatic speech recognition (ASR) systems like Whisper have made it possible to transcribe audio in multiple languages with a single model. However, current state-of-the-art ASR models are typically evaluated on individual languages ...
research-article
October 2024
Interference-Controlled Maximum Noise Reduction Beamformer Based on Deep-Learned Interference Manifold
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 4676–4690, https://doi.org/10.1109/TASLP.2024.3485551
Beamforming has been used in a wide range of applications to extract the signal of interest from microphone array observations, which consist of not only the signal of interest, but also noise, interference, and reverberation. The recently proposed ...
research-article
October 2024
EchoScan: Scanning Complex Room Geometries via Acoustic Echoes
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 4768–4782, https://doi.org/10.1109/TASLP.2024.3485516
Accurate estimation of indoor space geometries is vital for constructing precise digital twins, whose broad industrial applications include navigation in unfamiliar environments and efficient evacuation planning, particularly in low-light conditions. This ...
research-article
October 2024
Learning Dynamic and Static Representations for Extrapolation-Based Temporal Knowledge Graph Reasoning
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 4741–4754, https://doi.org/10.1109/TASLP.2024.3485500
Temporal knowledge graph reasoning aims to predict the missing links (facts) in the future timestamps. However, most existing methods have a common limitation: they focus on learning dynamic representations of temporal knowledge graphs and rarely consider ...
research-article
October 2024
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Volume 32, Pages 4700–4712, https://doi.org/10.1109/TASLP.2024.3485485
Recent advancements in diffusion models and large language models (LLMs) have significantly propelled the field of generation tasks. Text-to-Audio (TTA), a burgeoning generation application designed to generate audio from natural language prompts, is ...
