Search | arXiv e-print repository

arXiv:2211.09166

A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training

Authors: Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

Abstract: This paper focuses on leveraging deep representation learning (DRL) for speech enhancement (SE). In general, the performance of the deep neural network (DNN) is heavily dependent on the learning of data representation. However, the DRL's importance is often ignored in many DNN-based SE algorithms. To obtain a higher quality enhanced speech, we propose a two-stage DRL-based SE method through adversarial training. In the first stage, we disentangle different latent variables because disentangled representations can help DNN generate a better enhanced speech. Specifically, we use the $β$-variational autoencoder (VAE) algorithm to obtain the speech and noise posterior estimations and related representations from the observed signal. However, since the posteriors and representations are intractable and we can only apply a conditional assumption to estimate them, it is difficult to ensure that these estimations are always pretty accurate, which may potentially degrade the final accuracy of the signal estimation. To further improve the quality of enhanced speech, in the second stage, we introduce adversarial training to reduce the effect of the inaccurate posterior towards signal reconstruction and improve the signal estimation accuracy, making our algorithm more robust for the potentially inaccurate posterior estimations. As a result, better SE performance can be achieved.
The experimental results indicate that the proposed strategy can help similar DNN-based SE algorithms achieve higher short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and scale-invariant signal-to-distortion ratio (SI-SDR) scores. Moreover, the proposed algorithm can also outperform recent competitive SE algorithms.

Submitted 27 September, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing

arXiv:2205.05581

A deep representation learning speech enhancement method using $β$-VAE

Authors: Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

Abstract: In previous work, we proposed a variational autoencoder-based (VAE) Bayesian permutation training speech enhancement (SE) method (PVAE) which indicated that the SE performance of the traditional deep neural network-based (DNN) method could be improved by deep representation learning (DRL). Building on that work, this paper proposes using $β$-VAE to further improve PVAE's representation learning ability. More specifically, our $β$-VAE can improve PVAE's capacity for disentangling different latent variables from the observed signal without the trade-off between disentanglement and signal reconstruction. This trade-off is widespread in previous $β$-VAE algorithms. Unlike the previous $β$-VAE algorithms, the proposed $β$-VAE strategy can also be used to optimize the DNN's structure. This means that the proposed method can not only improve PVAE's SE performance but also reduce the number of PVAE training parameters. The experimental results show that the proposed method can acquire better speech and noise latent representation than PVAE. Meanwhile, it also obtains a higher scale-invariant signal-to-distortion ratio, speech quality, and speech intelligibility.

Submitted 11 May, 2022; originally announced May 2022.

Comments: Submitted to Eurosipco

arXiv:2201.09875

A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder

Authors: Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

Abstract: Recently, variational autoencoder (VAE), a deep representation learning (DRL) model, has been used to perform speech enhancement (SE). However, to the best of our knowledge, current VAE-based SE methods only apply VAE to model the speech signal, while noise is modeled using the traditional non-negative matrix factorization (NMF) model. One of the most important reasons for using NMF is that these VAE-based methods cannot disentangle the speech and noise latent variables from the observed signal. Based on Bayesian theory, this paper derives a novel variational lower bound for VAE, which ensures that VAE can be trained in a supervised manner and can disentangle speech and noise latent variables from the observed signal. This means that the proposed method can apply the VAE to model both speech and noise signals, which differs from the previous VAE-based SE works. More specifically, the proposed DRL method can learn to impose speech and noise signal priors on different sets of latent variables for SE. The experimental results show that the proposed method can not only disentangle speech and noise latent variables from the observed signal but also obtain a higher scale-invariant signal-to-distortion ratio and speech quality score than a similar deep neural network-based (DNN) SE method.

Submitted 24 January, 2022; originally announced January 2022.

Comments: Accepted by ICASSP 2022

arXiv:2006.16689

A Speech Enhancement Algorithm based on Non-negative Hidden Markov Model and Kullback-Leibler Divergence

Authors: Yang Xiang, Liming Shi, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

Abstract: In this paper, we propose a novel supervised single-channel speech enhancement method combining the Kullback-Leibler divergence-based non-negative matrix factorization (NMF) and hidden Markov model (NMF-HMM). With the application of HMM, the temporal dynamics information of speech signals can be taken into account. In the training stage, the sum of Poisson, leading to the KL divergence measure, is used as the observation model for each state of HMM. This ensures that a computationally efficient multiplicative update can be used for the parameter update of the proposed model. In the online enhancement stage, we propose a novel minimum mean-square error (MMSE) estimator for the proposed NMF-HMM. This estimator can be implemented using parallel computing, reducing computation time. The performance of the proposed algorithm is verified by objective measures. The experimental results show that the proposed strategy achieves better speech enhancement performance than state-of-the-art speech enhancement methods. More specifically, compared with the traditional NMF-based speech enhancement methods, our proposed algorithm achieves a 5% improvement for short-time objective intelligibility (STOI) and 0.18 improvement for perceptual evaluation of speech quality (PESQ).

Submitted 30 June, 2020; originally announced June 2020.

arXiv:1807.07376

doi: 10.1038/s41467-018-07995-0

Long-lasting field-free alignment of large molecules inside helium nanodroplets

Authors: Adam S. Chatterley, Constant Schouder, Lars Christiansen, Benjamin Shepperson, Mette H. Rasmussen, Henrik Stapelfeldt

Abstract: Molecules with their axes sharply confined in space, available through laser-induced alignment methods, are essential for many current experiments, including ultrafast molecular imaging. Most of these applications require both that the aligning laser field is turned off, to avoid undesired perturbations, and that the molecules remain aligned sufficiently long that reactions and dynamics can be mapped out. Presently, this is only possible for small, linear molecules and for times less than 1 picosecond. Here, we demonstrate strong, field-free alignment of large molecules inside helium nanodroplets, lasting tens of picoseconds. Molecular alignment in either one or three dimensions is created by a slowly switched-on laser pulse, made field-free through rapid pulse truncation, and retained thanks to the impeding effect of the helium environment on molecular rotation. We illustrate the opportunities that field-free aligned molecules open by measuring the alignment-dependent strong-field ionization yield of a thiophene oligomer. Our technique will enable molecular-frame experiments, including ultrafast excited state dynamics, on a variety of large molecules and complexes.

Submitted 19 July, 2018; originally announced July 2018.

Showing 1–5 of 5 results for author: Rasmussen, M H