https://doi.org/10.1007/s00500-021-06291-2
Abstract
Nowadays, deep neural networks have become the prime approach for enhancing speech signals, as they yield better results than traditional methods. This paper describes the transformation of the enhanced speech signal obtained by applying a deep convolutional neural network (Deep CNN), which can model nonlinear relationships, and compares it with the Wiener filtering method, the best-performing technique for speech enhancement among the traditional methods. Denoising is performed in the frequency domain and the result is converted back to the time domain to analyze performance metrics for speech quality and speech intelligibility. Speech quality is analyzed based on the signal-to-noise ratio (SNR) and the perceptual evaluation of speech quality (PESQ); speech intelligibility is analyzed by the short-time objective intelligibility (STOI) measure. The denoised speech from both methods was evaluated, and the analysis of the results shows that the SNR of the conventional Wiener filtering method is much higher than that of the Deep CNN. However, the PESQ and STOI of the Deep CNN-based enhanced speech outperform the Wiener filtering method, and these performance metrics indicate that the Deep CNN achieves better results overall than the conventional technique.
Keywords Deep convolutional neural network · Noisy speech · Speech enhancement · Speech quality · Intelligibility
… predicting the variations because of the dynamic nature of noisy speech signals. The statistical assumptions that need to be made in the unsupervised models do not improve the performance of the denoised speech. The supervised models are data driven, which eliminates the statistical assumptions made on the clean and noisy speech signals.

Nowadays, enhancement techniques incorporate the taxonomy of artificial intelligence, in which machine learning (Srinivasan et al. 2006) and deep learning techniques (Kolbæk et al. 2017; Wang and Chen 2018; Chai et al. 2019) are widely applied to improve the clarity of speech (intelligibility) and to increase the listening capability (based on quality) so that the speech is well perceived. This is imperative, as listeners are interested in and focused on listening to speech of excellent quality and intelligibility. Denoising is a fundamental strategy implemented in applications that deal with speech signals, such as telecommunication (Rix et al. 2001), speaker recognition in biometrics (Jain et al. 2004), hearing aids (Healy et al. 2017), hands-free communication (Thiergart and Taseska 2014) and many more.

The drawbacks of the unsupervised techniques can be overcome by applying a deep neural network, which involves training the network with massive data in multiple noise conditions. The data-driven approach (Zhao et al. 2018) of the deep neural network makes it more efficient and responsive to untrained conditions and unseen noises. In the recent past, the commonly used techniques for supervised speech enhancement (Nossier et al. 2021) include mapping in the frequency domain or time–frequency masking, where the denoised speech is converted from the frequency domain back to the time domain. These methodologies enable the reconstruction of the speech signal from the frequency domain to the time domain with the phase of the noisy signal (Li et al. 2019).

The remainder of this paper is organised as follows: the recent work carried out in speech enhancement is discussed in Sect. 2. A clear explanation of the proposed Deep CNN system and a comparison with the Wiener filter are given in Sect. 3. Section 4 discusses the dataset used, the features extracted, the algorithm and its description. The results obtained and the conclusion are presented in Sects. 5 and 6, respectively.

2 Related works

Similar works carried out in the speech enhancement area that help in removing the background noise affecting the speech signal include the weighted noise encoder, which enhances the speech signal by considering the power spectrum of clean speech and the SNR to build the Wiener filter in the frequency domain (Xia and Bao 2014). Modeling of the time and frequency correlation dimensions by applying the improved minima controlled recursive averaging (IMCRA), and also incorporating the long short-term memory (LSTM) recurrent neural network (RNN) architecture and the CNN, exhibits good results in terms of the performance metrics (Yuan 2020). Cycle-consistent training (Meng et al. 2018) for enhancement optimizes the clean-to-noisy and noisy-to-clean speech mappings simultaneously.

The different DNN-based speech enhancement methodologies vary based on the neural network architecture, the training target and the selection of training features. Nowadays, the deep learning models that are becoming popular in the field of speech enhancement are the CNN (Zheng et al. 2020; Li et al. 2020), LSTM (Li et al. 2019) and RNN (Xian et al. 2021), which incorporate a transformation function between the spectral features of the noisy speech signal and the clean speech signal. As the CNN is widely used for image processing and recognition, it is a good candidate for the problems caused by the degradation of speech signals due to background noise. The SNR-aware CNN (Fu et al. 2016) for the enhancement process shows that the CNN is well suited to extracting time–frequency features and moves forward in achieving this goal. Loss functions based on the performance metric STOI (Fu et al. 2018; Li et al. 2020) are used for modeling the utterance as a whole.

A CNN implemented to perform the end-to-end speech enhancement task (Du et al. 2017) can estimate the phase of clean speech, which improves the quality and intelligibility of speech. Some speech enhancement methods perform direct enhancement on the raw speech waveforms by mapping (Fu et al. 2017; Pandey and Wang 2019) and are referred to as waveform-based approaches. The fully convolutional neural network (Park and Lee 2017) is one among them, allowing direct mapping and feature selection from a convolutional encoder–decoder model (Lan et al. 2020). The mean absolute error loss for training the CNN is obtained from the magnitudes of the enhanced STFT and the clean STFT (Pandey and Wang 2019). In some cases, a combination of CNN and RNN models (Hsieh et al. 2020) works out to be more suitable for capturing local and sequential correlations (Wang et al. 2021). Another approach uses a sequence-to-sequence model (Kameoka et al. 2020) with an LSTM RNN, where the encoder encodes the input sequence and the decoder decodes the output sequence for voice conversion.

The mapping function created from the noisy and clean speech signals by a nonlinear regression model (Xu et al. 2013) shows that the ability to handle unseen noise is diminished. In the ILMSAF-based speech enhancement, the performance of the network is reduced for the Volvo noise (Li et al. 2016; Sungheetha and Rajesh 2021; Kumar 2021).
As the task is to enhance the speech signal by removing the noise, the CNN is applied for speech enhancement, as it has been observed to give improved results compared to the multi-layer perceptron (Grais and Plumbley 2017).

The CNN is robust and well suited to speech enhancement. Therefore, in the proposed work, the Deep CNN is designed to give outperforming results. The Deep CNN takes the noisy speech signal as the input and converts it into the frequency domain to train the network, because the noise and the clean speech signal can be discriminated only in the frequency domain. The training is performed until the mean squared error between the clean speech signal and the denoised or enhanced speech signal is minimal.

3 Speech enhancement system

In today's scenario, the best of all techniques are the deep learning algorithms, as they can handle a lot of data and design a model by themselves. In this work, the Deep CNN is designed to perform speech enhancement, and a comparative study is done by analyzing its performance against the best conventional technique, i.e., the Wiener filter, as shown in Fig. 1. Therefore, the best conventional Wiener filter and the Deep CNN are taken for comparison. The comparison results show that each technique is best in its own way.

3.1 Model of speech signal

The noisy speech signal is acquired by adding the clean speech signal to the different types of noise, as given in Eq. 1. The task is to retrieve the clean speech signal from the noisy speech signal by eliminating the noise.

c(n): clean speech signal
b(n): noise signal
s(n): noisy speech signal

s(n) = c(n) + b(n)    (1)

3.2 Wiener filtering

The presence of noise is unavoidable in real-world scenarios of speech processing. The most fundamental methodology for noise reduction of a speech signal is the optimal Wiener filter. The Wiener filter acts as a linear filter that can be utilized to separate the clean speech signal from the noisy speech signal by reducing the MSE between the estimated signal and the original signal. Although the Wiener filter can achieve noise reduction, it also has the disadvantage of losing the speech signal's integrity. Therefore, the speech distortion should be managed by adequately manipulating the Wiener filter or by having explicit knowledge of the speech signal. In any speech communication system, the speech signal can be distorted by background noise and reverberation. Therefore, noise reduction methodologies and speech enhancement techniques are needed to obtain the desired speech signal from the corrupted one.

R(\omega) = \frac{C(\omega)}{S(\omega)} = \frac{C(\omega)}{C(\omega) + B(\omega)}    (2)

where C(\omega) is the signal spectrum, B(\omega) the noise power spectrum and S(\omega) the noisy speech spectrum.

R_{Wiener}(\omega) = \frac{C(\omega)}{S(\omega)} = \frac{S(\omega) - B(\omega)}{S(\omega)}    (3)

With \hat{E}_s denoting the estimate of the enhanced signal,

\hat{E}_s(\omega, k) = R_{Wiener}(\omega)\, S(\omega, k)    (4)

\hat{d}[n] = \mathrm{IFFT}\left(\sqrt{\hat{E}_s(\omega, k)}\right)    (5)

By combining the magnitude of the clean speech spectral data with the phase of the noisy speech, the estimate of the enhanced speech is obtained. It is given as

\hat{d}[n] = \hat{d}[n]\,\angle s[n]    (6)

where \hat{d} is the estimate of the enhanced speech.
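As a rough illustration of Eqs. (2)–(6), the sketch below applies a Wiener-type gain in the STFT domain and reconstructs the time-domain signal with the noisy phase. It is a minimal sketch, not the authors' implementation: the noise power spectrum B(\omega) is assumed to be estimated from the first few (noise-only) frames, the gain is applied directly to the STFT magnitude rather than through the square-root form of Eq. (5), and all function names and parameters are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

def wiener_enhance(noisy, fs, frame_ms=10, noise_frames=10):
    """Wiener-type gain (S - B) / S applied per frequency bin, with the
    enhanced magnitude recombined with the noisy phase (cf. Eq. 6).
    The leading `noise_frames` frames are assumed to contain noise only."""
    nperseg = int(fs * frame_ms / 1000)
    _, _, S = stft(noisy, fs=fs, window='hamming', nperseg=nperseg)

    noisy_power = np.abs(S) ** 2                                              # S(w) per frame
    noise_power = noisy_power[:, :noise_frames].mean(axis=1, keepdims=True)   # B(w) estimate

    # R_Wiener(w) = (S(w) - B(w)) / S(w), floored at zero to avoid negative gains
    gain = np.maximum(noisy_power - noise_power, 0.0) / (noisy_power + 1e-12)

    enhanced_stft = gain * np.abs(S) * np.exp(1j * np.angle(S))   # magnitude with noisy phase
    _, enhanced = istft(enhanced_stft, fs=fs, window='hamming', nperseg=nperseg)
    return enhanced[:len(noisy)]
```

In practice the noise estimate would come from a voice activity detector or a minimum-statistics tracker rather than a fixed number of leading frames; the fixed-frame estimate is used here only to keep the sketch self-contained.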
3.3 Speech denoising and enhancement using deep convolutional neural network
The CNN is designed with multiple hidden layers and rectified linear unit (ReLU) activation functions for speech enhancement. The input applied to the Deep CNN system is the frames of the noisy speech signal, and the expected output is the denoised speech signal.

The clean and noisy speech signals are converted to the frequency domain using the STFT. The magnitude spectrum of the clean speech signal is taken as the target, while the noisy speech signal is taken as the predictor and presented to the Deep CNN for denoising, as shown in Fig. 2. The regression network uses the magnitude of the noisy speech signal to reduce the mean square error between the denoised speech signal and the clean speech signal. The output from the Deep CNN gives the denoised signal in the frequency domain. The denoised speech signal is then converted to the time domain using the output magnitude spectrum from the Deep CNN network and the phase of the noisy speech signal.
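To make the regression setup concrete, a minimal PyTorch sketch is given below: a stack of 2-D convolutional layers with ReLU activations maps the noisy magnitude spectrogram to the clean magnitude spectrogram under an MSE loss. The layer count, channel widths, kernel sizes, optimizer and placeholder tensor shapes are assumptions for illustration only; the paper does not specify its architecture in this excerpt.

```python
import torch
import torch.nn as nn

class DenoisingCNN(nn.Module):
    """Sketch of a Deep CNN regressor: noisy STFT magnitude in,
    denoised STFT magnitude out (depth and widths are assumed)."""
    def __init__(self, channels=(1, 18, 30, 8, 1)):
        super().__init__()
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=(9, 5), padding=(4, 2)),
                       nn.ReLU()]
        self.net = nn.Sequential(*layers[:-1])   # no ReLU after the output layer

    def forward(self, noisy_mag):
        # noisy_mag: (batch, 1, freq_bins, time_frames)
        return self.net(noisy_mag)

model = DenoisingCNN()
criterion = nn.MSELoss()          # minimised between denoised and clean magnitudes
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# one hypothetical training step on placeholder magnitude spectrograms
noisy_mag = torch.rand(8, 1, 129, 100)
clean_mag = torch.rand(8, 1, 129, 100)
loss = criterion(model(noisy_mag), clean_mag)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```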
4 Algorithm description

The noisy speech signal is generated for feeding the Deep CNN. The noisy data set is created by mixing the clean speech with different noise types, such as washing machine noise, rainbow noise, jet airplane noise and train whistle noise, at different noise levels of 0 dB, 5 dB, 10 dB and 15 dB.

The dataset contains 400 utterances and is split 3:1 for training and testing: the Deep CNN is trained with 300 sentences and tested with 100 sentences. The training set is created by mixing the noise with the clean speech signal at the different noise levels. From the testing set, a noisy speech signal is randomly chosen to check the denoising ability of the network.
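A minimal sketch of how one noisy utterance could be generated at a chosen SNR level (0, 5, 10 or 15 dB) by scaling the noise before adding it to the clean speech, following s(n) = c(n) + b(n) of Eq. 1. The energy-ratio scaling rule and the variable names are assumptions; the paper does not spell out its mixing procedure.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that the clean-to-noise energy ratio equals `snr_db`
    and add it to `clean` (s(n) = c(n) + b(n))."""
    noise = noise[:len(clean)]                     # trimming/looping of short noise omitted
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

# e.g. one training example per noise level (clean_speech and machine_noise are placeholders)
# for snr in (0, 5, 10, 15):
#     noisy = mix_at_snr(clean_speech, machine_noise, snr)
```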
4.2 Feature extraction

The first step is to convert the speech signal from the time domain to the frequency domain using the STFT to extract features. The magnitude STFT vectors of the clean speech and the noisy speech are the input features to the Deep CNN model. The speech signal is divided into 10 ms frames with no frame shift. In converting from the time domain to the frequency domain using the STFT, the Hamming window …
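The surviving description above (10 ms frames, no frame shift, Hamming window, magnitude STFT) might be realised roughly as follows; the scipy-based implementation and the sampling-rate handling are assumptions.

```python
import numpy as np
from scipy.signal import stft

def magnitude_stft(signal, fs, frame_ms=10):
    """Magnitude STFT features: 10 ms Hamming-windowed frames with no
    frame shift (here interpreted as non-overlapping frames)."""
    nperseg = int(fs * frame_ms / 1000)
    _, _, spec = stft(signal, fs=fs, window='hamming',
                      nperseg=nperseg, noverlap=0)
    return np.abs(spec)                  # shape: (freq_bins, time_frames)

# predictors = magnitude_stft(noisy_speech, fs)   # input to the Deep CNN (placeholder names)
# targets    = magnitude_stft(clean_speech, fs)   # regression target
```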
Table 1 PESQ description

PESQ score   Description
4–5          Excellent
3–4          Good
2–3          Fair
1–2          Poor
0–1          Bad

Fig. 4 Performance improvement comparison for noise levels 0 dB, 5 dB, 10 dB and 15 dB: a washing machine noise, b rainbow noise, c train whistle noise, d jet airplane noise

5 Results and discussions

The clean speech signal is added to different noise types, such as washing machine noise, rainbow noise, train whistle noise and jet airplane noise, at different noise levels of 0 dB, 5 dB, 10 dB and 15 dB. The noisy speech signal generated by adding washing machine noise is given as input to the Wiener filter and to the DNN-based speech enhancement system. The SNR of the denoised signal is improved compared to the SNR of the noisy signal.

The noisy signals are taken at the different noise levels of 0 dB, 5 dB, 10 dB and 15 dB for the different noise types, which were added to the clean speech signal to form the noisy speech signal. For analyzing the enhanced speech signal, the performance metrics considered are SNR, PESQ and STOI. The performance metrics are calculated as follows:

• Signal to Noise Ratio (SNR)
Table 2 Comparison of SNR, PESQ and STOI of noisy signal and denoised signal using Wiener filter and Deep CNN

SNR
Noise level (dB)   Washing machine noise          Rainbow noise                  Train whistle noise            Jet airplane noise
                   Noisy     Wiener    Deep CNN   Noisy     Wiener    Deep CNN   Noisy     Wiener    Deep CNN   Noisy     Wiener    Deep CNN
0                  19.4898   34.7837   30.5801    19.3679   37.2929   30.5738    19.2925   39.801    35.7682    19.5242   36.4482   33.2925
5                  23.0452   35.3452   33.4311    22.9899   37.7086   32.5593    23.0805   38.0964   36.0604    23.1936   38.9859   34.2398
10                 25.5831   36.5445   34.8537    25.5454   37.9403   34.6552    25.6373   38.9824   36.0091    25.5068   44.103    35.1301
15                 26.6226   38.3302   35.7433    26.7152   37.4407   35.5169    26.6514   39.346    36.1191    27.0348   42.8054   36.1123

PESQ
Noise level (dB)   Washing machine noise          Rainbow noise                  Train whistle noise            Jet airplane noise
                   Noisy     Wiener    Deep CNN   Noisy     Wiener    Deep CNN   Noisy     Wiener    Deep CNN   Noisy     Wiener    Deep CNN
0                  2.0374    1.8916    2.3326     1.91      1.812     2.0966     1.8663    2.1188    2.6372     2.2741    1.5465    2.3693
5                  2.2703    2.2672    2.4992     2.2918    1.9707    2.3908     2.5498    2.2221    2.7768     2.5966    1.6103    2.6944
10                 2.6184    2.2699    2.6496     2.4612    1.995     2.56       2.7473    2.4913    2.8795     2.6331    1.7719    2.8764
15                 2.6983    2.3078    2.816      2.6623    1.9265    2.6776     2.9022    2.4074    2.9699     2.6785    1.8318    2.7983

STOI
Noise level (dB)   Washing machine noise          Rainbow noise                  Train whistle noise            Jet airplane noise
                   Noisy     Wiener    Deep CNN   Noisy     Wiener    Deep CNN   Noisy     Wiener    Deep CNN   Noisy     Wiener    Deep CNN
0                  0.5166    0.0173    0.5809     0.5252    0.0039    0.5812     0.7598    0.0058    0.7923     0.6334    0.0718    0.6726
5                  0.6569    0.0278    0.6814     0.648     0.0459    0.6609     0.8047    0.0535    0.8164     0.6951    0.0724    0.7074
10                 0.7284    0.0174    0.7501     0.7291    0.0263    0.744      0.7781    0.0783    0.7814     0.7459    0.0214    0.7685
15                 0.7542    0.048     0.8099     0.7331    0.0686    0.7704     0.8256    0.0177    0.8912     0.7463    0.0397    0.7693
Fig. 5 Spectrogram analysis of clean speech, noisy speech and denoised speech signal: a 0 dB, washing machine noise; b 5 dB, rainbow noise; c 10 dB, train whistle noise; d 15 dB, jet airplane noise
• Perceptual Evaluation of Speech Quality (PESQ)

PESQ is standardized by the International Telecommunications Union (ITU). The PESQ value ranges as described in Table 1.

• Short Time Objective Intelligibility (STOI)

STOI is an objective measure that correlates with subjective speech intelligibility; the larger the value, the better the intelligibility. The STOI value ranges between 0 and 1.
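For reference, the three metrics could be computed along the following lines. The SNR formula is the standard energy-ratio definition (the paper's exact formulation is not reproduced in this excerpt), and the PESQ and STOI calls assume the third-party pesq and pystoi Python packages.

```python
import numpy as np
from pesq import pesq      # ITU-T P.862 PESQ (third-party package, assumed available)
from pystoi import stoi    # short-time objective intelligibility (third-party package)

def snr_db(clean, processed):
    """Standard SNR in dB between the clean reference and a processed signal."""
    residual = processed - clean
    return 10 * np.log10(np.sum(clean ** 2) / (np.sum(residual ** 2) + 1e-12))

# fs = 16000                                   # assumed sampling rate
# print(snr_db(clean, denoised))               # quality (dB)
# print(pesq(fs, clean, denoised, 'wb'))       # PESQ, roughly -0.5 to 4.5
# print(stoi(clean, denoised, fs))             # STOI, 0 to 1
```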
The audio of the noisy speech signal was inferior in quality as well as intelligibility. When the signals were fed to the Deep CNN system for speech enhancement, the performance of the denoised speech was well improved in terms of quality, which was clearly observed from the values of SNR and PESQ. The intelligibility was also improved, as analyzed from the STOI scores. Table 2 shows the quality (SNR and PESQ) and intelligibility (STOI) of the noisy signals and the improvement in the denoised signal's performance metrics.

In order to analyze the quality, SNR and PESQ are considered, and to evaluate the clarity of speech, the metric STOI is taken. The subjective quality of the spoken speech signal is analyzed by PESQ. The value of PESQ ranges between −0.5 and 4.5; the higher the PESQ value on the scale, the greater the improvement in quality of the denoised speech. STOI reflects the subjective intelligibility of speech and ranges between 0 and 1, with higher values indicating better intelligibility.

As per the observations from the performance metrics shown in Table 2, the SNR of the denoised signal through Wiener filtering shows good improvement compared to the Deep CNN model for the different noise levels as well as the different noise types. The PESQ value of the Wiener filter is in the poor range (1–2) for the rainbow and jet airplane noise, as per the PESQ scores given in Table 1, but the PESQ value of the Wiener filter for the washing machine noise and train whistle noise is in the fair range (2–3).

For the Deep CNN, the PESQ values for all the noise levels and noise types fall in the fair (2–3) category of the mean opinion score. As the Wiener filter focuses more on the quality of the speech signal, it gives good results in terms of SNR and moderate results for PESQ, but the intelligibility of speech is compromised, which reduces the clarity of the speech signal. The STOI scores show that the Wiener filter is not capable of improving the intelligibility. On the other hand, the Deep CNN shows drastic improvements in the STOI values, which in turn represent the intelligibility of the denoised speech signal.

The consolidated results in Table 2 show the improvement in the performance metrics of the Deep CNN compared to the conventional Wiener filtering algorithm for denoising the speech signal. The Wiener filtering method shows outstanding results on the SNR and the PESQ. It is clearly observed that the Wiener filter has good capability in improving the quality of the speech signal. When the intelligibility of the speech signal is considered, the performance of the Wiener filter is deficient. However, the DNN shows a drastic increase in terms of the clarity of the speech signal.

The denoised signal results shown in Fig. 4 indicate that the SNR of the noisy signal is much more improved by the Wiener filter than by the Deep CNN. However, in terms of the other performance metric representing the quality of speech, i.e., PESQ, the denoised speech signal is much more improved by the Deep CNN than by the Wiener filter. When the intelligibility of the denoised speech is analyzed, it is evident that the STOI scores of the Deep CNN give an excellent improvement in the clarity of speech. The spectrograms of the clean speech, noisy speech and denoised speech for the different types of noise and noise levels are shown in Fig. 5.

6 Conclusion

The proposed single channel speech enhancement system estimates the magnitude of the speech signal in the frequency domain. The Deep CNN-based single channel speech enhancement system is compared with the traditional Wiener filtering method. Evaluation is carried out under multiple noise conditions to analyze the denoising capability of the speech enhancement system, and the results indicate that the Deep CNN-based system outperforms the best-performing traditional technique, Wiener filtering, in terms of quality and intelligibility. The quality of the denoised speech signal based on the SNR shows a drastic improvement for the Wiener-filtered denoised signal. However, the Deep CNN yields excellent results in terms of quality and intelligibility, as analyzed based on the PESQ and STOI scores. Thus, it should be recorded that the Deep CNN outperforms the traditional Wiener filter technique.

Funding No funding.

Declarations

Conflict of interest We don't have any conflict of interest.

Human and animal rights statement Humans/animals are not involved in this research work.

Data availability statements The datasets analyzed during the current study are available from the University of Edinburgh, Centre for Speech Technology Research (CSTR), https://datashare.is.ed.ac.uk/handle/10283/2791.
Xu Y, Du J, Dai L-R, Lee C-H (2013) An experimental study on speech enhancement based on deep neural networks. IEEE Signal Process Lett 21(1):65–68
Yuan W (2020) A time–frequency smoothing neural network for speech enhancement. Speech Commun 124:75–84
Zhao H, Zarar S, Tashev I, Lee C (2018) Convolutional-recurrent neural networks for speech enhancement. In: International conference on acoustics, speech, and signal processing, pp 2401–2405
Zheng N, Shi Y, Rong W, Kang Y (2020) Effects of skip connections in CNN-based architectures for speech enhancement. J Signal Process Syst 92:875–884