IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 28
Volume 28, 2020
- Jamal Amini, Richard Christian Hendriks, Richard Heusdens, Meng Guo, Jesper Jensen:
Rate-Constrained Noise Reduction in Wireless Acoustic Sensor Networks. 1-12
- Chitralekha Gupta, Haizhou Li, Ye Wang:
Automatic Leaderboard: Evaluation of Singing Quality Without a Standard Reference. 13-26
- Sefik Emre Eskimez, Ross K. Maddox, Chenliang Xu, Zhiyao Duan:
Noise-Resilient Training Method for Face Landmark Generation From Speech. 27-38
- Peidong Wang, Ke Tan, DeLiang Wang:
Bridging the Gap Between Monaural Speech Enhancement and Recognition With Distortion-Independent Acoustic Modeling. 39-48
- Yuki Mitsufuji, Stefan Uhlich, Norihiro Takamune, Daichi Kitamura, Shoichi Koyama, Hiroshi Saruwatari:
Multichannel Non-Negative Matrix Factorization Using Banded Spatial Covariance Matrices in Wavenumber Domain. 49-60
- Yaron Laufer, Sharon Gannot:
Scoring-Based ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in a Spatially Homogeneous Noise Field. 61-76
- Naveen Kumar Desiraju, Simon Doclo, Markus Buck, Tobias Wolff:
Online Estimation of Reverberation Parameters For Late Residual Echo Suppression. 77-91
- Mehdi Zohourian, Rainer Martin:
Binaural Direct-to-Reverberant Energy Ratio and Speaker Distance Estimation. 92-104
- Youhyun Shin, Sang-goo Lee:
Learning Context Using Segment-Level LSTM for Neural Sequence Labeling. 105-115
- Gongping Huang, Jingdong Chen, Jacob Benesty:
Design of Planar Differential Microphone Arrays With Fractional Orders. 116-130
- Ming-Hsiang Su, Chung-Hsien Wu, Liang-Yu Chen:
Attention-Based Response Generation Using Parallel Double Q-Learning for Dialog Policy Decision in a Conversational System. 131-143
- Satoru Emura:
Wave-Domain Residual Echo Reduction Using Subspace Tracking. 144-156
- Xin Wang, Shinji Takaki, Junichi Yamagishi, Simon King, Keiichi Tokuda:
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis. 157-170
- Falk-Martin Hoffmann, Philip Arthur Nelson, Filippo Maria Fazi:
DOA Estimation Performance With Circular Arrays in Sound Fields With Finite Rate of Innovation. 171-184
- Rongfeng Su, Xunying Liu, Lan Wang, Jingzhou Yang:
Cross-Domain Deep Visual Feature Generation for Mandarin Audio-Visual Speech Recognition. 185-197
- Titouan Parcollet, Mohamed Morchid, Xavier Bost, Georges Linarès, Renato De Mori:
Real to H-Space Autoencoders for Theme Identification in Telephone Conversations. 198-210
- Antonio Canclini, Fabio Antonacci, Stefano Tubaro, Augusto Sarti:
A Methodology for the Robust Estimation of the Radiation Pattern of Acoustic Sources. 211-224
- Yi Yu, Hongsen He, Badong Chen, Jianghui Li, Youwen Zhang, Lu Lu:
M-Estimate Based Normalized Subband Adaptive Filter Algorithm: Performance Analysis and Improvements. 225-239
- Haoxiang Wen, Senquan Yang, Yuanquan Hong, Huan Luo:
A Partial Update Adaptive Algorithm for Sparse System Identification. 240-255
- Martin Bo Møller, Jan Østergaard:
A Moving Horizon Framework for Sound Zones. 256-265
- Stylianos Ioannis Mimilakis, Konstantinos Drossos, Estefanía Cano, Gerald Schuller:
Examining the Mapping Functions of Denoising Autoencoders in Singing Voice Separation. 266-278
- Lachlan Birnie, Thushara D. Abhayapala, Prasanga N. Samarasinghe:
Reflection Assisted Sound Source Localization Through a Harmonic Domain MUSIC Framework. 279-293
- Wenhao Ding, Liang He:
Adaptive Multi-Scale Detection of Acoustic Events. 294-306
- Weijian Zhang, Peng Song:
Transfer Sparse Discriminant Subspace Learning for Cross-Corpus Speech Emotion Recognition. 307-318
- Bidisha Sharma, Ye Wang:
Automatic Evaluation of Song Intelligibility Using Singing Adapted STOI and Vocal-Specific Features. 319-331
- Hai Morgenstern, Boaz Rafaely:
Perceptually-Transparent Online Estimation of Two-Channel Room Transfer Function for Sound Calibration. 332-342
- Shaojin Ding, Guanlong Zhao, Christopher Liberatore, Ricardo Gutierrez-Osuna:
Learning Structured Sparse Representations for Voice Conversion. 343-354
- Mireia Díez, Lukás Burget, Federico Landini, Jan Cernocký:
Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors. 355-368
- Jia-Chen Gu, Zhen-Hua Ling, Quan Liu:
Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots. 369-379
- Ke Tan, DeLiang Wang:
Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement. 380-390
- Richeng Duan, Tatsuya Kawahara, Masatake Dantsuji, Hiroaki Nanjo:
Cross-Lingual Transfer Learning of Non-Native Acoustic Modeling for Pronunciation Error Detection and Diagnosis. 391-401
- Xin Wang, Shinji Takaki, Junichi Yamagishi:
Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis. 402-415
- Sanjeel Parekh, Slim Essid, Alexey Ozerov, Ngoc Q. K. Duong, Patrick Pérez, Gaël Richard:
Weakly Supervised Representation Learning for Audio-Visual Scene Analysis. 416-428
- Jianfei Yu, Jing Jiang, Rui Xia:
Entity-Sensitive Attention and Fusion Network for Entity-Level Multimodal Sentiment Classification. 429-439
- John G. Beerends, Niels M. P. Neumann, Egon L. van den Broek, Anna Llagostera Casanovas, Jovana Torres Menendez, Christian Schmidmer, Jens Berger:
Subjective and Objective Assessment of Full Bandwidth Speech Quality. 440-449
- Vikram C. Mathad, S. R. Mahadeva Prasanna:
Vowel Onset Point Based Screening of Misarticulated Stops in Cleft Lip and Palate Speech. 450-460
- Minh Nguyen, Gia H. Ngo, Nancy F. Chen:
Hierarchical Character Embeddings: Learning Phonological and Semantic Representations in Languages of Logographic Origin Using Recursive Neural Networks. 461-473
- Dani Cherkassky, Sharon Gannot:
Successive Relative Transfer Function Identification Using Blind Oblique Projection. 474-486
- Ivo Trowitzsch, Christopher Schymura, Dorothea Kolossa, Klaus Obermayer:
Joining Sound Event Detection and Localization Through Spatial Segregation. 487-502
- Shinichi Mogami, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo, Nobutaka Ono:
Independent Low-Rank Matrix Analysis Based on Time-Variant Sub-Gaussian Source Model for Determined Blind Source Separation. 503-518
- Hamzeh Ghasemzadeh, Meisam Khalil Arjmandi:
Toward Optimum Quantification of Pathology-Induced Noises: An Investigation of Information Missed by Human Auditory System. 519-528
- Fei Ma, Wen Zhang, Thushara Dheemantha Abhayapala:
Active Control of Outgoing Broadband Noise Fields in Rooms. 529-539
- Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai:
Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations. 540-552
- Tao Dai, Li Zhu, Yaxiong Wang, Kathleen M. Carley:
Attentive Stacked Denoising Autoencoder With Bi-LSTM for Personalized Context-Aware Citation Recommendation. 553-568
- Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, Satoshi Nakamura:
Multi-Source Neural Machine Translation With Missing Data. 569-580
- Jin Wang, Liang-Chih Yu, K. Robert Lai, Xuejie Zhang:
Tree-Structured Regional CNN-LSTM Model for Dimensional Sentiment Analysis. 581-591
- Abul Azad, Lamine Mili:
Robust Speech Filter and Voice Encoder Parameter Estimation Using the Phase-Phase Correlator. 592-604
- Abdullah Fahim, Prasanga N. Samarasinghe, Thushara D. Abhayapala:
Multi-Source DOA Estimation Through Pattern Recognition of the Modal Coherence of a Reverberant Soundfield. 605-618
- Yaron Laufer, Bracha Laufer-Goldshtein, Sharon Gannot:
ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in Rank-Deficient Noise Field. 619-634
- Zhongqing Wang, Qingying Sun, Shoushan Li, Qiaoming Zhu, Guodong Zhou:
Neural Stance Detection With Hierarchical Linguistic Representations. 635-645
- Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky:
Multi-Stream End-to-End Speech Recognition. 646-655
- Yu Maeno, Yuki Mitsufuji, Prasanga N. Samarasinghe, Naoki Murata, Thushara D. Abhayapala:
Spherical-Harmonic-Domain Feedforward Active Noise Control Using Sparse Decomposition of Reference Signals from Distributed Sensor Arrays. 656-670
- Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao:
A Joint Sentence Scoring and Selection Framework for Neural Extractive Document Summarization. 671-681
- Ivan Kukanov, Trung Ngo Trong, Ville Hautamäki, Sabato Marco Siniscalchi, Valerio Mario Salerno, Kong Aik Lee:
Maximal Figure-of-Merit Framework to Detect Multi-Label Phonetic Features for Spoken Language Recognition. 682-695
- Shoichi Koyama, Gilles Chardon, Laurent Daudet:
Optimizing Source and Sensor Placement for Sound Field Control: An Overview. 696-714
- Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono, Tomoki Toda:
Customer Satisfaction Estimation in Contact Center Calls Based on a Hierarchical Multi-Task Model. 715-728
- Thomas Dietzen, Simon Doclo, Marc Moonen, Toon van Waterschoot:
Integrated Sidelobe Cancellation and Linear Prediction Kalman Filter for Joint Multi-Microphone Speech Dereverberation, Interfering Speech Cancellation, and Noise Reduction. 740-754
- Thomas Dietzen, Simon Doclo, Marc Moonen, Toon van Waterschoot:
Square Root-Based Multi-Source Early PSD Estimation and Recursive RETF Update in Reverberant Environments by Means of the Orthogonal Procrustes Problem. 755-769
- Liwen Zhang, Ziqiang Shi, Jiqing Han:
Pyramidal Temporal Pooling With Discriminative Mapping for Audio Classification. 770-784
- Mengfan Zhang, Zhongshu Ge, Tiejun Liu, Xihong Wu, Tianshu Qu:
Modeling of Individual HRTFs Based on Spatial Principal Component Analysis. 785-797
- Laureano Moro-Velázquez, Estefanía Hernández-García, Jorge Andrés Gómez García, Juan Ignacio Godino-Llorente, Najim Dehak:
Analysis of the Effects of Supraglottal Tract Surgical Procedures in Automatic Speaker Recognition Performance. 798-812
- Yijia Liu, Wanxiang Che, Bing Qin, Ting Liu:
Exploring Segment Representations for Neural Semi-Markov Conditional Random Fields. 813-824
- Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen:
On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement. 825-838
- Yang Ai, Zhen-Hua Ling:
A Neural Vocoder With Hierarchical Generation of Amplitude and Phase Spectra for Statistical Parametric Speech Synthesis. 839-851
- Dongyan Yu, Huiping Duan, Jun Fang, Bing Zeng:
Predominant Instrument Recognition Based on Deep Neural Network With Auxiliary Classification. 852-861
- Ali Aroudi, Simon Doclo:
Cognitive-Driven Binaural Beamforming Using EEG-Based Auditory Attention Decoding. 862-875
- Christopher Gribben, Hyunkook Lee:
The Perception of Band-Limited Decorrelation Between Vertically Oriented Loudspeakers. 876-888
- Olivier Perrotin, Ian Vince McLoughlin:
Glottal Flow Synthesis for Whisper-to-Speech Conversion. 889-900
- Gongping Huang, Jacob Benesty, Israel Cohen, Jingdong Chen:
Differential Beamforming on Graphs. 901-913
- Bracha Laufer-Goldshtein, Ronen Talmon, Sharon Gannot:
Global and Local Simplex Representations for Multichannel Source Separation. 914-928
- Henning F. Schepker, Sven Nordholm, Simon Doclo:
Acoustic Feedback Suppression for Multi-Microphone Hearing Devices Using a Soft-Constrained Null-Steering Beamformer. 929-940
- Zhong-Qiu Wang, DeLiang Wang:
Deep Learning Based Target Cancellation for Speech Dereverberation. 941-950
- Yeongseok Kim, Youngjin Park:
Blockwise Weighted Least Square Active Noise Control for CPU-GPU Architecture. 951-963
- Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller:
Speech Technology for Unwritten Languages. 964-975
- Andros Tjandra, Sakriani Sakti, Satoshi Nakamura:
Machine Speech Chain. 976-989
- M. Khadem-hosseini, Shahrokh Ghaemmaghami, Azra Abtahi, Saeed Gazor, Farrokh Marvasti:
Error Correction in Pitch Detection Using a Deep Learning Based Classification. 990-999
- Enzo De Sena, Zoran Cvetkovic, Hüseyin Hacihabiboglu, Marc Moonen, Toon van Waterschoot:
Localization Uncertainty in Time-Amplitude Stereophonic Reproduction. 1000-1015
- Vera Erbes, Sascha Spors:
Localisation Properties of Wave Field Synthesis in a Listening Room. 1016-1024
- Jia Pan, Genshun Wan, Jun Du, Zhongfu Ye:
Online Speaker Adaptation Using Memory-Aware Networks for Speech Recognition. 1025-1037
- Weicheng Cai, Jinkun Chen, Jun Zhang, Ming Li:
On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition. 1038-1051
- George Sterpu, Christian Saam, Naomi Harte:
How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition. 1052-1064
- Christopher Schymura, Dorothea Kolossa:
Audiovisual Speaker Tracking Using Nonlinear Dynamical Systems With Dynamic Stream Weights. 1065-1078
- Gongping Huang, Jacob Benesty, Israel Cohen, Jingdong Chen:
A Simple Theory and New Method of Differential Beamforming With Uniform Linear Microphone Arrays. 1079-1093
- Chung-Ying Ho, Kuo-Kai Shyu, Cheng-Yuan Chang, Sen M. Kuo:
Efficient Narrowband Noise Cancellation System Using Adaptive Line Enhancer. 1094-1103
- Aditya Arie Nugraha, Kouhei Sekiguchi, Kazuyoshi Yoshii:
A Flow-Based Deep Latent Variable Model for Speech Spectrogram Modeling and Enhancement. 1104-1117
- Beat Gfeller, Christian Havnø Frank, Dominik Roblek, Matthew Sharifi, Marco Tagliasacchi, Mihajlo Velimirovic:
SPICE: Self-Supervised Pitch Estimation. 1118-1128
- Christoph Urbanietz, Gerald Enzner:
Direct Spatial-Fourier Regression of HRIRs from Multi-Elevation Continuous-Azimuth Recordings. 1129-1142
- Yaakov Buchris, Israel Cohen, Jacob Benesty, Alon Amar:
Joint Sparse Concentric Array Design for Frequency and Rotationally Invariant Beampattern. 1143-1158
- Tharindu Fernando, Sridha Sridharan, Mitchell McLaren, Darshana Priyasad, Simon Denman, Clinton Fookes:
Temporarily-Aware Context Modeling Using Generative Adversarial Networks for Speech Activity Detection. 1159-1169
- Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao:
Unsupervised Neural Machine Translation With Cross-Lingual Language Representation Agreement. 1170-1182
- Qiaoling Zhang, WeiQiang Xu, Weiwei Zhang, Jie Feng, Zhiyong Chen:
Multi-Hypothesis Square-Root Cubature Kalman Particle Filter for Speaker Tracking in Noisy and Reverberant Environments. 1183-1197
- Yinhe Zheng, Guanyi Chen, Minlie Huang:
Out-of-Domain Detection for Natural Language Understanding in Dialog Systems. 1198-1209
- Ina Kodrasi, Hervé Bourlard:
Spectro-Temporal Sparsity Characterization for Dysarthric Speech Detection. 1210-1222
- Bharat Padi, Anand Mohan, Sriram Ganapathy:
Towards Relevance and Sequence Modeling in Language Recognition. 1223-1232
- Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen:
Improved External Speaker-Robust Keyword Spotting for Hearing Assistive Devices. 1233-1247
- Vishnuvardhan Varanasi, Harshit Gupta, Rajesh M. Hegde:
A Deep Learning Framework for Robust DOA Estimation Using Spherical Harmonic Decomposition. 1248-1259
- Sahar Hashemgeloogerdi, Mark F. Bocko:
Adaptive Feedback Cancellation in Hearing Aids Based on Orthonormal Basis Functions With Prediction-Error Method Based Prewhitening. 1260-1269
- Maximo Cobos, Fabio Antonacci, Luca Comanducci, Augusto Sarti:
Frequency-Sliding Generalized Cross-Correlation: A Sub-Band Time Delay Estimation Approach. 1270-1281
- Yingying Zhu, Haiquan Zhao, Xiangping Zeng, Badong Chen:
Robust Generalized Maximum Correntropy Criterion Algorithms for Active Noise Control. 1282-1292
- Hassan Taherian, Zhong-Qiu Wang, Jorge Chang, DeLiang Wang:
Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement. 1293-1302
- Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen, Xuefei Liu:
End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features. 1303-1314
- T. Lavanya, T. Nagarajan, P. Vijayalakshmi:
Multi-Level Single-Channel Speech Enhancement Using a Unified Framework for Estimating Magnitude and Phase Spectra. 1315-1327
- Adrien Ycart, Emmanouil Benetos:
Learning and Evaluation Methodologies for Polyphonic Music Sequence Prediction With LSTMs. 1328-1341
- Takatomo Kano, Sakriani Sakti, Satoshi Nakamura:
End-to-End Speech Translation With Transcoding by Multi-Task Learning for Distant Language Pairs. 1342-1355
- Huanyu Zuo, Prasanga N. Samarasinghe, Thushara D. Abhayapala:
Intensity Based Spatial Soundfield Reproduction Using an Irregular Loudspeaker Array. 1356-1369
- Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li:
SpEx: Multi-Scale Time Domain Speaker Extraction Network. 1370-1384
- Wangyou Zhang, Xuankai Chang, Yanmin Qian, Shinji Watanabe:
Improving End-to-End Single-Channel Multi-Talker Speech Recognition. 1385-1394
- Alakananda Vempala, Eduardo Blanco:
Extracting Biographical Spatial Timelines: Corpus and Experiments. 1395-1403
- Qiquan Zhang, Aaron Nicolson, Mingjiang Wang, Kuldip K. Paliwal, Chenxu Wang:
DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation. 1404-1415
- Dhananjay Ram, Lesly Miculicich, Hervé Bourlard:
Neural Network Based End-to-End Query by Example Spoken Term Detection. 1416-1427
- Enea Ceolini, Ilya Kiselev, Shih-Chii Liu:
Evaluating Multi-Channel Multi-Device Speech Separation Algorithms in the Wild: A Hardware-Software Solution. 1428-1439
- Su Zhu, Zijian Zhao, Rao Ma, Kai Yu:
Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding. 1440-1451
- Haoran Miao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan:
Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture. 1452-1465
- Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian:
Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection. 1466-1478
- Dong-Yuan Shi, Woon-Seng Gan, Bhan Lam, Shulin Wen:
Feedforward Selective Fixed-Filter Active Noise Control: Algorithm and Implementation. 1479-1492
- Zhihao Du, Xueliang Zhang, Jiqing Han:
A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement. 1493-1505
- Yue Zhang, Yile Wang, Jie Yang:
Lattice LSTM for Chinese Sentence Representation. 1506-1519
- Zhuo Tang, Boyan Wan, Li Yang:
Word-Character Graph Convolution Network for Chinese Named Entity Recognition. 1520-1532
- Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen:
Speaker Verification by Partial AUC Optimization With Mahalanobis Distance Metric Learning. 1533-1548
- Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna, Prithwijit Guha:
Speech/Music Classification Using Features From Spectral Peaks. 1549-1559
- Liming Wang, Mark Hasegawa-Johnson:
Multimodal Word Discovery and Retrieval With Spoken Descriptions and Visual Concepts. 1560-1573
- Yang Fan, Fei Tian, Yingce Xia, Tao Qin, Xiang-Yang Li, Tie-Yan Liu:
Searching Better Architectures for Neural Machine Translation. 1574-1585
- Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao, Muyun Yang, Hai Zhao:
Towards More Diverse Input Representation for Neural Machine Translation. 1586-1597
- Yan Zhao, DeLiang Wang, Buye Xu, Tao Zhang:
Monaural Speech Dereverberation Using Temporal Convolutional Networks With Self Attention. 1598-1607
- Yanhui Tu, Jun Du, Tian Gao, Chin-Hui Lee:
A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement. 1608-1619
- Christine Evers, Heinrich W. Löllmann, Heinrich Mellmann, Alexander Schmidt, Hendrik Barfuss, Patrick A. Naylor, Walter Kellermann:
The LOCATA Challenge: Acoustic Source Localization and Tracking. 1620-1643
- Hiroaki Tsushima, Eita Nakamura, Kazuyoshi Yoshii:
Bayesian Melody Harmonization Based on a Tree-Structured Generative Model of Chord Sequences and Melodies. 1644-1655
- Keunhyoung Luke Kim, Jongpil Lee, Sangeun Kum, Chae Lin Park, Juhan Nam:
Semantic Tagging of Singing Voices in Popular Music Recordings. 1656-1668
- Liner Yang, Cunliang Kong, Yun Chen, Yang Liu, Qinan Fan, Erhong Yang:
Incorporating Sememes into Chinese Definition Modeling. 1669-1677
- Ryo Nishikimi, Eita Nakamura, Masataka Goto, Katsutoshi Itoyama, Kazuyoshi Yoshii:
Bayesian Singing Transcription Based on a Hierarchical Generative Model of Keys, Musical Notes, and F0 Trajectories. 1678-1691
- Byeongho Jo, Franz Zotter, Jung-Woo Choi:
Extended Vector-Based EB-ESPRIT Method. 1692-1705
- Andros Tjandra, Sakriani Sakti, Satoshi Nakamura:
Corrections to "Machine Speech Chain". 1706
- Zaixiang Zheng, Shujian Huang, Rongxiang Weng, Xin-Yu Dai, Jiajun Chen:
Improving Self-Attention Networks With Sequential Relations. 1707-1716
- Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard:
Automatic Pathological Speech Intelligibility Assessment Exploiting Subspace-Based Analyses. 1717-1728
- Cagdas Tuna, Antonio Canclini, Federico Borra, Philipp Götz, Fabio Antonacci, Andreas Walther, Augusto Sarti, Emanuël A. P. Habets:
3D Room Geometry Inference Using a Linear Loudspeaker Array and a Single Microphone. 1729-1744
- Xianjun Xia, Roberto Togneri, Ferdous Sohel, Yuanjun Zhao, Defeng David Huang:
Sound Event Detection Using Multiple Optimized Kernels. 1745-1754
- Federico Borra, Alberto Bernardini, Fabio Antonacci, Augusto Sarti:
Efficient Implementations of First-Order Steerable Differential Microphone Arrays With Arbitrary Planar Geometry. 1755-1766
- Moti Lugasi, Boaz Rafaely:
Speech Enhancement Using Masking for Binaural Reproduction of Ambisonics Signals. 1767-1777
- Zhong-Qiu Wang, Peidong Wang, DeLiang Wang:
Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR. 1778-1787
- Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud:
Audio-Visual Speech Enhancement Using Conditional Variational Auto-Encoders. 1788-1800
- Kai Song, Xiaoqing Zhou, Heng Yu, Zhongqiang Huang, Yue Zhang, Weihua Luo, Xiangyu Duan, Min Zhang:
Towards Better Word Alignment in Transformer. 1801-1812
- Lian Huang, Chi-Man Pun:
Audio Replay Spoof Attack Detection by Joint Segment-Based Linear Filter Bank Feature Extraction and Attention-Enhanced DenseNet-BiLSTM Network. 1813-1825
- Yang Xiang, Changchun Bao:
A Parallel-Data-Free Speech Enhancement Method Using Multi-Objective Learning Cycle-Consistent Generative Adversarial Network. 1826-1838
- Hao Fei, Donghong Ji, Yue Zhang, Yafeng Ren:
Topic-Enhanced Capsule Network for Multi-Label Emotion Classification. 1839-1848
- Hirokazu Kameoka, Kou Tanaka, Damian Kwasny, Takuhiro Kaneko, Nobukatsu Hojo:
ConvS2S-VC: Fully Convolutional Sequence-to-Sequence Voice Conversion. 1849-1863
- Huayang Li, Guoping Huang, Deng Cai, Lemao Liu:
Neural Machine Translation With Noisy Lexical Constraints. 1864-1874
- Chien-Yao Wang, Tzu-Chiang Tai, Jia-Ching Wang, Andri Santoso, Seksan Mathulaprangsan, Chin-Chin Chiang, Chung-Hsien Wu:
Sound Events Recognition and Retrieval Using Multi-Convolutional-Channel Sparse Coding Convolutional Neural Networks. 1875-1887
- Chang-Le Liu, Sze-Wei Fu, You-Jin Li, Jen-Wei Huang, Hsin-Min Wang, Yu Tsao:
Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks. 1888-1900
- Dhananjaya N. Gowda, Sudarsana Reddy Kadiri, Brad H. Story, Paavo Alku:
Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals. 1901-1914
- Sebastian J. Schlecht, Emanuël A. P. Habets:
Scattering in Feedback Delay Networks. 1915-1924
- Irene Martín-Morató, Maximo Cobos, Francesc J. Ferri:
Adaptive Distance-Based Pooling in Convolutional Neural Networks for Audio Event Classification. 1925-1935
- Su Zhu, Ruisheng Cao, Kai Yu:
Dual Learning for Semi-Supervised Natural Language Understanding. 1936-1947
- Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari:
Blind Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation With Multivariate Generalized Gaussian Distribution. 1948-1963
- Vinayak Abrol, Pulkit Sharma:
Learning Hierarchy Aware Embedding From Raw Audio for Acoustic Scene Classification. 1964-1973
- Daniele Mirabilii, Emanuël A. P. Habets:
Spatial Coherence-Aware Multi-Channel Wind Noise Reduction. 1974-1987
- Yougen Yuan, Lei Xie, Cheung-Chi Leung, Hongjie Chen, Bin Ma:
Fast Query-by-Example Speech Search Using Attention-Based Deep Binary Embeddings. 1988-2000
- Daniele Salvati, Carlo Drioli, Gian Luca Foresti:
Diagonal Unloading Beamforming in the Spherical Harmonic Domain for Acoustic Source Localization in Reverberant Environments. 2001-2012
- Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien:
Variational Domain Adversarial Learning With Mutual Information Maximization for Speaker Verification. 2013-2024
- Thomas Sgouros, Nikolaos Mitianoudis:
A novel Directional Framework for Source Counting and Source Separation in Instantaneous Underdetermined Audio Mixtures. 2025-2035
- Kenta Niwa, Hironobu Chiba, Noboru Harada, Guoqiang Zhang, W. Bastiaan Kleijn:
Microphone Array Wiener Post Filtering Using Monotone Operator Splitting. 2036-2046
- Hui Luo, Jiqing Han:
Nonnegative Matrix Factorization Based Transfer Subspace Learning for Cross-Corpus Speech Emotion Recognition. 2047-2060
- Ming-Hsiang Su, Chung-Hsien Wu, Hao-Tse Cheng:
A Two-Stage Transformer-Based Approach for Variable-Length Abstractive Summarization. 2061-2072
- Boqing Zhu, Kele Xu, Qiuqiang Kong, Huaimin Wang, Yuxing Peng:
Audio Tagging by Cross Filtering Noisy Labels. 2073-2083
- Sangeeta Bagha, Debi Prasad Das, Santosh Kumar Behera:
An Efficient Narrowband Active Noise Control System for Accommodating Frequency Mismatch. 2084-2094
- Weiwei Zhang, Zhe Chen, Fuliang Yin:
Multi-Pitch Estimation of Polyphonic Music Based on Pseudo Two-Dimensional Spectrum. 2095-2108
- Yuzhou Liu, DeLiang Wang:
Causal Deep CASA for Monaural Talker-Independent Speaker Separation. 2109-2118
- Huanyu Zuo, Thushara D. Abhayapala, Prasanga N. Samarasinghe:
Particle Velocity Assisted Three Dimensional Sound Field Reproduction Using a Modal-Domain Approach. 2119-2133
- Shun Kiyono, Jun Suzuki, Tomoya Mizumoto, Kentaro Inui:
Massive Exploration of Pseudo Data for Grammatical Error Correction. 2134-2145
- Bin Wang, C.-C. Jay Kuo:
SBERT-WK: A Sentence Embedding Method by Dissecting BERT-Based Word Models. 2146-2157
- Guillaume Carbajal, Romain Serizel, Emmanuel Vincent, Eric Humbert:
Joint NN-Supported Multichannel Reduction of Acoustic Echo, Reverberation and Noise. 2158-2173
- Qi Liu, Zhehuai Chen, Hao Li, Mingkun Huang, Yizhou Lu, Kai Yu:
Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model. 2174-2183
- Hanan Beit-On, Boaz Rafaely:
Focusing and Frequency Smoothing for Arbitrary Arrays With Application to Speaker Localization. 2184-2193
- Christoph Pörschmann, Johannes M. Arend, Fabian Brinkmann:
Correction to "Directional Equalization of Sparse Head-Related Transfer Function Sets for Spatial Upsampling". 2194
- Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds:
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals. 2195-2210
- Sheng-Hua Zhong, Peiqi Liu, Zhong Ming, Yan Liu:
How to Evaluate Single-Round Dialogues Like Humans: An Information-Oriented Metric. 2211-2223
- Wilmer Lobato, Márcio Holsbach Costa:
Worst-Case-Optimization Robust-MVDR Beamformer for Stereo Noise Reduction in Hearing Aids. 2224-2237
- Luca Comanducci, Federico Borra, Paolo Bestagini, Fabio Antonacci, Stefano Tubaro, Augusto Sarti:
Source Localization Using Distributed Microphones in Reverberant Environments Based on Deep Learning and Ray Space Transform. 2238-2251
- Feiran Yang, Jianfeng Guo, Jun Yang:
Stochastic Analysis of the Filtered-x LMS Algorithm for Active Noise Control. 2252-2266
- Tomohiro Nakatani, Christoph Böddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach:
Jointly Optimal Denoising, Dereverberation, and Source Separation. 2267-2282
- Haytham M. Fayek, Justin Johnson:
Temporal Reasoning via Audio Question Answering. 2283-2294
- Amulya Gupta, Zhu (Drew) Zhang:
Swings and Roundabouts: Attention-Structure Interaction Effect in Deep Semantic Matching. 2295-2307
- Chang Huai You, Jichen Yang:
Device Feature Extraction Based on Parallel Neural Network Training for Replay Spoofing Detection. 2308-2318
- Santosh Kesiraju, Oldrich Plchot, Lukás Burget, Suryakanth V. Gangashetty:
Learning Document Embeddings Along With Their Uncertainties. 2319-2332
- Mirco Pezzoli, Federico Borra, Fabio Antonacci, Stefano Tubaro, Augusto Sarti:
A Parametric Approach to Virtual Miking for Sources of Arbitrary Directivity. 2333-2348
- Shengbei Wang, Weitao Yuan, Masashi Unoki:
Multi-Subspace Echo Hiding Based on Time-Frequency Similarities of Audio Signals. 2349-2363
- Yujia Qin, Fanchao Qi, Sicong Ouyang, Zhiyuan Liu, Cheng Yang, Yasheng Wang, Qun Liu, Maosong Sun:
Improving Sequence Modeling Ability of Recurrent Neural Networks via Sememes. 2364-2373
- Yuchen Dong, Jie Chen, Wen Zhang:
Distributed Wave-Domain Active Noise Control Based on the Diffusion Adaptation. 2374-2385
- Fatemeh Pishdadian, Gordon Wichern, Jonathan Le Roux:
Finding Strength in Weakness: Learning to Separate Sounds With Weak Supervision. 2386-2399
- Zhi Chen, Lu Chen, Xiaoyuan Liu, Kai Yu:
Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management. 2400-2411
- Taewoong Lee, Jesper Kjær Nielsen, Mads Græsbøll Christensen:
Signal-Adaptive and Perceptually Optimized Sound Zones With Variable Span Trade-Off Filters. 2412-2426
- Hao Fei, Meishan Zhang, Fei Li, Donghong Ji:
Cross-Lingual Semantic Role Labeling With Model Transfer. 2427-2437
- Kai Yu, Rao Ma, Kaiyu Shi, Qi Liu:
Neural Network Language Model Compression With Product Quantization and Soft Binarization. 2438-2449
- Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley:
Sound Event Detection of Weakly Labelled Data With CNN-Transformer and Automatic Threshold Optimization. 2450-2460
- Adrian Herzog, Emanuël A. P. Habets:
Direction and Reverberation Preserving Noise Reduction of Ambisonics Signals. 2461-2475
- Yu Wang, Yun Li, Ziye Zhu, Hanghang Tong, Yue Huang:
Adversarial Learning for Multi-Task Sequence Labeling With Attention Mechanism. 2476-2488
- Ashutosh Pandey, DeLiang Wang:
On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement. 2489-2499
- Mantong Zhou, Minlie Huang, Xiaoyan Zhu:
Robust Reading Comprehension With Linguistic Constraints via Posterior Regularization. 2500-2510
- Michael Saxon, Ayush Tripathi, Yishan Jiao, Julie M. Liss, Visar Berisha:
Robust Estimation of Hypernasality in Dysarthria With Acoustic Model Likelihood Features. 2511-2522
- Lin Wang, Andrea Cavallaro:
A Blind Source Separation Framework for Ego-Noise Reduction on Multi-Rotor Drones. 2523-2537
- Bowen Zhang, Xutao Li, Xiaofei Xu, Ka-Cheong Leung, Zhiyao Chen, Yunming Ye:
Knowledge Guided Capsule Attention Network for Aspect-Based Sentiment Analysis. 2538-2551
- Qi Qi, Xiaolu Wang, Haifeng Sun, Jingyu Wang, Xiao Liang, Jianxin Liao:
A Novel Multi-Task Learning Framework for Semi-Supervised Semantic Parsing. 2552-2560
- Haisong Ding, Kai Chen, Qiang Huo:
Improving Knowledge Distillation of CTC-Trained Acoustic Models With Alignment-Consistent Ensemble and Target Delay. 2561-2571
- Ayana, Yun Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun:
Reinforced Zero-Shot Cross-Lingual Neural Headline Generation. 2572-2584
- Mingming Yang, Rui Wang, Kehai Chen, Xing Wang, Tiejun Zhao, Min Zhang:
A Novel Sentence-Level Agreement Architecture for Neural Machine Translation. 2585-2597
- Shuai Wang, Yexin Yang, Zhanghao Wu, Yanmin Qian, Kai Yu:
Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition. 2598-2609
- Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara:
Fast Multichannel Nonnegative Matrix Factorization With Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation. 2610-2625
- Thi Ngoc Tho Nguyen, Woon-Seng Gan, Rishabh Ranjan, Douglas L. Jones:
Robust Source Counting and DOA Estimation Using Spatial Pseudo-Spectrum and Convolutional Neural Network. 2626-2637
- Ondrej Cífka, Umut Simsekli, Gaël Richard:
Groove2Groove: One-Shot Music Style Transfer With Supervision From Synthetic Data. 2638-2650
- Judy Najnudel, Thomas Hélie, David Roze, Henri Boutin:
Simulation of an Ondes Martenot Circuit. 2651-2660
- R. Jyothi, Prabhu Babu:
SOLVIT: A Reference-Free Source Localization Technique Using Majorization Minimization. 2661-2673
- Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai:
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification. 2674-2683
- Vicent Molés-Cases, Gema Piñero, Maria de Diego, Alberto González:
Personal Sound Zones by Subband Filtering and Time Domain Optimization. 2684-2696
- Srinivas Parthasarathy, Carlos Busso:
Semi-Supervised Speech Emotion Recognition With Ladder Networks. 2697-2709
- Artuur Leeuwenberg, Marie-Francine Moens:
Towards Extracting Absolute Event Timelines From English Clinical Reports. 2710-2719
- Lin Sun, Yuxuan Sun, Fule Ji, Chi Wang:
Joint Learning of Token Context and Span Feature for Span-Based Nested NER. 2720-2730
- Jamal Amini, Richard Christian Hendriks, Richard Heusdens, Meng Guo, Jesper Jensen:
Spatially Correct Rate-Constrained Noise Reduction for Binaural Hearing Aids in Wireless Acoustic Sensor Networks. 2731-2742
- Zuchao Li, Chaoyu Guan, Hai Zhao, Rui Wang, Kevin Parnow, Zhuosheng Zhang:
Memory Network for Linguistic Structure Parsing. 2743-2755
- Cheng Yu, Ryandhimas E. Zezario, Syu-Siang Wang, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao:
Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders. 2756-2769
- Xin Liu, Qingcai Chen, Xiangping Wu, Yang Hua, Jing Chen, Dongfang Li, Buzhou Tang, Xiaolong Wang:
Gated Semantic Difference Based Sentence Semantic Equivalence Identification. 2770-2780 - Gilles Boulianne:
A Study of Inductive Biases for Unsupervised Speech Representation Learning. 2781-2795 - Yu-Te Wu, Berlin Chen, Li Su:
Multi-Instrument Automatic Music Transcription With Self-Attention-Based Instance Segmentation. 2796-2809 - Weiwei Lin, Man-Wai Mak, Na Li, Dan Su, Dong Yu:
A Framework for Adapting DNN Speaker Embedding Across Languages. 2810-2822 - Purvi Agrawal, Sriram Ganapathy:
Interpretable Representation Learning for Speech and Audio Signals Based on Relevance Weighting. 2823-2836 - Xingwei Sun, Ze-Feng Gao, Zhong-Yi Lu, Junfeng Li, Yonghong Yan:
A Model Compression Method With Matrix Product Operators for Speech Enhancement. 2837-2847 - Koby Weisberg, Bracha Laufer-Goldshtein, Sharon Gannot:
Simultaneous Tracking and Separation of Multiple Sources Using Factor Graph Model. 2848-2864 - Chao Pan, Jingdong Chen, Guangming Shi:
On Estimation of Time-Varying Variances of Source and Noise for Sensor Array Processing. 2865-2879 - Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, Mark D. Plumbley:
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition. 2880-2894 - Shuyang Zhao, Toni Heittola, Tuomas Virtanen:
Active Learning for Sound Event Detection. 2895-2905 - Ondrej Mokrý, Pavel Rajmic:
Audio Inpainting: Revisited and Reweighted. 2906-2918 - Christof Weiß, Hendrik Schreiber, Meinard Müller:
Local Key Estimation in Music Recordings: A Case Study Across Songs, Versions, and Annotators. 2919-2932 - Leilei Gan, Yue Zhang:
Investigating Self-Attention Network for Chinese Word Segmentation. 2933-2941 - Nico Gößling, Elior Hadad, Sharon Gannot, Simon Doclo:
Binaural LCMV Beamforming With Partial Noise Estimation. 2942-2955 - Yiming Wu, Tristan Carsault, Eita Nakamura, Kazuyoshi Yoshii:
Semi-Supervised Neural Chord Estimation Based on a Variational Autoencoder With Latent Chord Labels and Features. 2956-2966 - Hieu-Thi Luong, Junichi Yamagishi:
NAUTILUS: A Versatile Voice Cloning System. 2967-2981 - Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo:
Nonparallel Voice Conversion With Augmented Classifier Star Generative Adversarial Networks. 2982-2995 - Pierre Lecomte, Manuel Melon, Laurent Simon:
Spherical Fraction Beamforming. 2996-3009 - Constantinos Papayiannis, Christine Evers, Patrick A. Naylor:
End-to-End Classification of Reverberant Rooms Using DNNs. 3010-3017 - Bhusan Chettri, Emmanouil Benetos, Bob L. T. Sturm:
Dataset Artefacts in Anti-Spoofing Systems: A Case Study on the ASVspoof 2017 Benchmark. 3018-3028 - Alexios Gidiotis, Grigorios Tsoumakas:
A Divide-and-Conquer Approach to the Summarization of Long Documents. 3029-3040 - Huiyuan Sun, Thushara D. Abhayapala, Prasanga N. Samarasinghe:
A Realistic Multiple Circular Array System for Active Noise Control Over 3D Space. 3041-3052 - Sasan Asadiabadi, Engin Erzin:
Vocal Tract Contour Tracking in rtMRI Using Deep Temporal Regression Network. 3053-3064 - Hung-Shin Lee, Yu Tsao, Shyh-Kang Jeng, Hsin-Min Wang:
Subspace-Based Representation and Learning for Phonotactic Spoken Language Recognition. 3065-3079 - Juan M. Martín-Doñas, Jesper Jensen, Zheng-Hua Tan, Angel M. Gomez, Antonio M. Peinado:
Online Multichannel Speech Enhancement Based on Recursive EM and DNN-Based Speech Presence Estimation. 3080-3094 - Rui Wang, Zhe Chen, Fuliang Yin:
Active Sampling Rate Calibration Method for Acoustic Sensor Networks. 3095-3107 - Yonggang Hu, Prasanga N. Samarasinghe, Sharon Gannot, Thushara D. Abhayapala:
Semi-Supervised Multiple Source Localization Using Relative Harmonic Coefficients Under Noisy and Reverberant Environments. 3108-3123 - Ashwin Bellur, Mounya Elhilali:
Audio Object Classification Using Distributed Beliefs and Attention. 729-739