
    Timo Gerkmann

    To improve the quality of single-channel speech enhancement algorithms, various approaches include additional prior knowledge about speech, e.g., in the form of pre-trained speech models. In this paper, we consider a vector Taylor series based approach with a low-rank speech model. While employing a low-rank speech model keeps the complexity feasible, only speech spectral envelopes are represented and noise reduction between spectral harmonics is not possible. To counteract this issue, we propose a combination of generic, single-channel enhancement methods and the pre-trained vector Taylor series approach. Compared to a competing harmonic post-filter approach, the proposed combination is derived within a statistical framework and yields a better quality for the enhanced signal. This is verified using instrumental quality measures.
    Due to the low computational complexity and the low memory consumption, first-order recursive smoothing is a technique often applied to estimate the mean of a random process. For instance, recursive smoothing is used in noise power estimators where adaptively changing smoothing factors are used instead of fixed ones to prevent the speech power from leaking into the noise estimate. However, in general, the usage of adaptive smoothing factors leads to a biased estimate of the mean. In this paper, we propose a novel method to correct the bias evoked by adaptive smoothing factors. We compare this method to a recently proposed compensation method in terms of the log-error distortion using real world signals for two noise power estimators. We show that both corrections reduce the distortion measure in noisy speech while the novel method has the advantage that no iteration is required for determining the correction factor.
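The estimator described above can be sketched in a few lines. The adaptive-factor variant below uses a crude outlier-freezing rule purely to illustrate the bias that adaptive smoothing factors introduce; it is not the paper's noise power estimator or its correction method, and all names and thresholds are illustrative.

```python
import numpy as np

def recursive_smooth(x, alpha):
    """First-order recursive smoothing: y[n] = alpha*y[n-1] + (1-alpha)*x[n]."""
    y = np.empty(len(x))
    acc = x[0]
    for n, xn in enumerate(x):
        acc = alpha * acc + (1.0 - alpha) * xn
        y[n] = acc
    return y

def recursive_smooth_adaptive(x, alpha=0.98, freeze_alpha=0.999):
    """Same filter, but large inputs (a made-up stand-in for speech activity)
    are smoothed with a factor close to 1; this biases the estimate low."""
    y = np.empty(len(x))
    acc = x[0]
    for n, xn in enumerate(x):
        a = freeze_alpha if xn > 2.0 * acc else alpha
        acc = a * acc + (1.0 - a) * xn
        y[n] = acc
    return y

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=50_000)  # periodogram-like samples, true mean 2.0
fixed = recursive_smooth(x, 0.98)[-5000:].mean()
adaptive = recursive_smooth_adaptive(x)[-5000:].mean()
print(fixed, adaptive)  # the adaptive estimate settles noticeably below 2.0
```

With a fixed factor the estimate converges to the true mean; the adaptive rule systematically down-weights large samples and therefore underestimates it, which is the bias the paper's correction factor compensates for.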
    Magnetic particle imaging (MPI) is a tracer-based imaging technique that can be used for imaging vessels and organ perfusion with high temporal resolution. Background signals are a major source of image artifacts and in turn restrict the sensitivity of the method in practice. While static background signals can be removed from the measured signal by taking a dedicated background scan and performing subtraction, this simple procedure is not applicable to non-stationary background signals, which occur in practice due to, e.g., temperature drifts in the electromagnetic coils of the MPI scanner. Within this work, we investigate a dynamic background subtraction method that is based on two background measurements taken before and after the object measurement. Using first-order interpolation, it is possible to remove linear background changes and in turn significantly suppress artifacts. The method is evaluated using static and dynamic phantom measurements, and it is shown that dynamic background subtraction is capable of reducing the artifact level by approximately a factor of four.
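The first-order interpolation described in the abstract above can be sketched as follows. Variable names and array shapes are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def dynamic_background_subtract(u_meas, u_bg_before, u_bg_after, t_frac):
    """Subtract a linearly interpolated background from a measured signal.

    u_bg_before / u_bg_after: background scans taken at the start (t_frac = 0)
    and end (t_frac = 1) of the object measurement.
    """
    u_bg = (1.0 - t_frac) * u_bg_before + t_frac * u_bg_after
    return u_meas - u_bg

# A purely linear background drift is removed exactly:
bg0 = np.array([1.0, 2.0, 3.0])
bg1 = np.array([2.0, 4.0, 6.0])          # background after the drift
obj = np.array([0.5, 0.0, -0.5])         # object contribution
meas = obj + 0.5 * (bg0 + bg1)           # measured halfway through the drift
recovered = dynamic_background_subtract(meas, bg0, bg1, 0.5)
print(recovered)  # recovers obj
```

For background changes that are exactly linear in time the subtraction is exact; curvature in the drift leaves a residual, which is why the abstract reports a roughly fourfold artifact reduction rather than complete removal.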
    Enhancing noisy speech is an important task to restore its quality and to improve its intelligibility. In traditional approaches that are not based on machine learning (ML), the parameters required for noise reduction are estimated blindly from the noisy observation, while the actual filter functions are derived analytically based on statistical assumptions. Even though such approaches generalize well to many different acoustic conditions, their noise suppression capability in transient noises is low. To amend this shortcoming, ML methods such as deep learning have been employed for speech enhancement. However, due to their data-driven nature, the generalization of ML-based approaches to unknown noise types is still under discussion. To improve the generalization of ML-based algorithms and to enhance the noise suppression of non-ML-based methods, we propose a combination of both approaches. For this, we employ estimates of the a priori signal-to-noise ratio (SNR) and the a posteriori SNR as input features in a deep neural network (DNN) based enhancement scheme. We show that this approach allows ML-based speech estimators to generalize quickly to unknown noise types even if only a few noise conditions have been seen during training. Further, the proposed features outperform a competing approach in which an estimate of the noise power spectral density is appended to the noisy spectra. Instrumental measures such as the Perceptual Evaluation of Speech Quality (PESQ) and short-time objective intelligibility (STOI) indicate strong improvements in unseen conditions when the proposed features are used. Listening experiments confirm the improved generalization of our proposed combination.
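The a priori and a posteriori SNRs used as DNN input features are standard quantities in statistical speech enhancement; a minimal per-bin sketch, using the classic decision-directed a priori SNR estimate of Ephraim and Malah, is given below. The DNN itself is not shown, and the function and variable names are illustrative rather than taken from the paper.

```python
import numpy as np

def snr_features(noisy_power, noise_power, prev_clean_power, alpha=0.98):
    """Per-frequency-bin SNR features (sketch).

    a posteriori SNR:  gamma = |Y|^2 / sigma_n^2
    a priori SNR (decision-directed, Ephraim & Malah):
        xi = alpha * |S_prev|^2 / sigma_n^2 + (1 - alpha) * max(gamma - 1, 0)
    """
    gamma = noisy_power / noise_power
    xi = (alpha * prev_clean_power / noise_power
          + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))
    return xi, gamma

xi, gamma = snr_features(np.array([4.0]), np.array([2.0]), np.array([2.0]))
print(gamma[0], xi[0])  # gamma = 4/2 = 2.0; xi = 0.98*1 + 0.02*1 = 1.0
```

Because both features are normalized by the noise power, they are largely invariant to the absolute noise level, which is one intuition for why they help a DNN generalize to unseen noise types.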
    For single-microphone noise reduction, a minimum variance distortionless response (MVDR) filter has been proposed recently. This filter takes the speech correlations of consecutive time frames into account and achieves impressive results in terms of speech distortions even in a blind implementation where we only have access to the noisy speech signal. However, compared to conventional approaches less noise reduction is achieved. Therefore, we propose to combine the single-microphone MVDR with a Wiener post-filter as the minimum-mean-square error optimal solution when multiple time frames are considered. We propose to pre-train the required interframe coherence matrices of the interferences for a large database, while speech correlations and interference power spectral densities are estimated online. In an experimental study based on instrumental measures, the proposed approach achieves a good trade-off between a single-channel Wiener filter and a multi-frame MVDR.
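The core of the multi-frame MVDR filter mentioned above can be sketched for a single frequency bin. This is the generic MVDR solution, not the paper's full system: the pre-trained interframe coherence matrices and the online estimation of the speech correlations are omitted, and the toy numbers below are made up.

```python
import numpy as np

def multiframe_mvdr(Phi_n, d):
    """MVDR weights over N consecutive STFT frames of one frequency bin.

    Phi_n : (N, N) Hermitian interference correlation matrix
    d     : (N,) speech interframe correlation vector, normalized so d[0] = 1
    Returns w = Phi_n^{-1} d / (d^H Phi_n^{-1} d), which satisfies the
    distortionless constraint w^H d = 1 while minimizing w^H Phi_n w.
    """
    z = np.linalg.solve(Phi_n, d)
    return z / np.vdot(d, z)

def wiener_postfilter_gain(xi):
    """Single-channel Wiener gain applied to the MVDR output (xi: a priori SNR)."""
    return xi / (1.0 + xi)

# Toy example with N = 3 frames:
Phi_n = np.array([[2.0, 0.5, 0.0],
                  [0.5, 2.0, 0.5],
                  [0.0, 0.5, 2.0]])
d = np.array([1.0, 0.6, 0.3])
w = multiframe_mvdr(Phi_n, d)
print(np.vdot(w, d).real)  # distortionless constraint: w^H d = 1
```

The MVDR stage leaves the (correlated) speech component undistorted but only suppresses part of the noise; cascading the Wiener post-filter on its output trades some distortion for additional noise reduction, which is the combination the abstract evaluates.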
    Most Bayesian clean speech estimators, like the Wiener filter or Ephraim and Malah's amplitude estimators, are derived under the assumption that the true power spectral density (PSD) of speech is known. In practice, however, only estimates are available. When the PSD estimation errors are neglected, they propagate through to the final speech estimate, resulting in undesired artifacts such as musical noise and speech distortions. To increase the robustness to PSD estimation errors, recently a linear estimator has been proposed that explicitly takes into account the uncertainty of the available speech PSD estimate. In this paper, we show that in the derivation of this estimator a limiting statistical assumption is made, and that avoiding this assumption leads to a novel, potentially more powerful nonlinear estimator under PSD uncertainty. In combination with a sophisticated speech PSD estimator, the proposed approach achieves a higher predicted speech quality than the linear alternative.
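The error propagation the abstract describes is easy to see with the standard Wiener gain. This sketch shows only the problem being addressed, not the paper's uncertainty-aware estimator; the numbers are illustrative.

```python
import numpy as np

def wiener_gain(speech_psd, noise_psd):
    """Standard Wiener gain G = sigma_s^2 / (sigma_s^2 + sigma_n^2)."""
    return speech_psd / (speech_psd + noise_psd)

noise = 1.0
true_speech = 1.0
g_true = wiener_gain(true_speech, noise)          # 0.5 with the true PSD
g_under = wiener_gain(0.25 * true_speech, noise)  # gain too low -> speech distortion
g_over = wiener_gain(4.0 * true_speech, noise)    # gain too high -> residual/musical noise
print(g_true, g_under, g_over)
```

A 6 dB error in the speech PSD moves the gain from 0.5 to 0.2 or 0.8, so random per-bin estimation errors translate directly into random gain fluctuations; estimators that model the PSD uncertainty aim to temper exactly this effect.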
    In conventional speech enhancement, statistical models for speech and noise are used to derive clean speech estimators. The parameters of the models are estimated blindly from the noisy observation using carefully designed algorithms. These algorithms generalize well to unseen acoustic conditions, but are unable to reduce highly non-stationary noise types. This shortcoming motivated the usage of machine-learning-based (ML-based) algorithms, in particular deep neural networks (DNNs). But if only limited training data are available, the noise reduction performance in unseen acoustic conditions suffers. In this paper, motivated by conventional speech enhancement, we propose to use the a priori and a posteriori signal-to-noise ratios (SNRs) for DNN-based speech enhancement systems. Instrumental measures show that the proposed features increase the robustness in unknown noise types even if only limited training data are available.
    The proposed algorithm adopts the LP residual as one sparse representation of speech, since this is feasible and advantageous, as analysed in the previous section. To make full use of the sparsity of speech, DCT coefficients are also included as an additional measurement. The proposed algorithm aims to recover the clean speech, whose LP residual and DCT coefficients are both sparse, by solving an optimization problem under a series of constraints.
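The sparsity of the LP residual can be illustrated with a toy voiced-speech model: a sparse impulse train (standing in for glottal pulses) driving a resonant AR filter. The optimization problem itself is not shown; the least-squares AR fit and all signal parameters below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def lp_residual(x, order):
    """Linear-prediction residual via a least-squares AR fit (illustrative)."""
    N = len(x)
    # Column k holds x delayed by k+1 samples, aligned with target x[order:].
    X = np.column_stack([x[order - 1 - k : N - 1 - k] for k in range(order)])
    a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return x[order:] - X @ a

# Sparse excitation through a resonant AR(2) filter:
e = np.zeros(800)
e[::80] = 1.0                       # impulse "pulses" every 80 samples
x = np.zeros(800)
for n in range(800):
    x[n] = e[n] + 1.5 * x[n - 1] - 0.9 * x[n - 2]

r = lp_residual(x, order=2)
ratio = np.sum(r**2) / np.sum(x[2:]**2)
print(ratio)  # small: the residual concentrates energy back into the sparse pulses
```

The dense, oscillatory signal x is compressed by linear prediction into a residual that is close to the sparse excitation, which is the property the proposed algorithm exploits as one of its sparsity constraints.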
