Introduction
1.1 Preamble
Speech processing is an interdisciplinary field that studies acoustic signals using both signal
processing techniques and knowledge from linguistics. An efficient speech recognition system
needs both speech enhancement and speech feature extraction methods to handle clean and noisy
data. The goal of error-free speech recognition has remained unsolved over the years for noisy
speech signals, although this is not the case for clean speech. Speech enhancement focuses on
finding and estimating optimal parameters by reducing the noise in, or enhancing the speech
component of, a noisy signal. Speech recognition systems normally work well in a controlled
environment but perform poorly in noisy environments. This work addresses the handling of
clean and noisy speech signals using wavelet and fuzzy techniques, and aims to improve
recognition performance by proposing better feature enhancement and extraction algorithms.
The proposed algorithms are evaluated on speech recognition systems for both clean and noisy
speech signals.
If a recognition system is used in a noisy environment, it must be robust to many different
types and levels of noise and to changes in the speaker’s voice. Noise is categorized as either
additive noise or convoluted noise. The noise vectors contaminate the speech signal, changing
the data vectors that represent the speech. Noise can be introduced by various types of
background or environmental sound, such as babble noise, street noise, car noise, or noise
recorded while a phone or a fan is in use. Variability can also be introduced by a change of
speaker. Changes in the speaker’s voice are caused by modifications of the articulation
parameters. The main variations are an increase in the speaker’s pitch, amplitude, vowel
duration, and spectral tilt, as well as a shift in the formant frequencies F1 and F2. However,
these changes are by no means constant, even for the same speaker under similar noise
conditions. It is very difficult to either remove or model these variations. Hence it is
necessary to use efficient techniques such as wavelets and fuzzy logic to handle these issues
in the feature extraction process.
The wavelet transform is well established as a good tool for handling non-stationary signals,
and algorithms derived from wavelet theory have become standards in digital signal processing.
Alongside it, Soft Computing (SC) techniques such as fuzzy logic are recognized as a foundation
and a powerful tool for designing and developing intelligent systems that provide feasible
solutions with better features. Hence these two techniques are proposed in this thesis for
extracting the features of speech signals. The next section presents a detailed study of speech
feature extraction techniques.
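The distinction between additive and convoluted noise drawn in this section can be made concrete
with a small sketch. It assumes the usual signal models, y[n] = x[n] + d[n] for additive noise
and y[n] = (h * x)[n] for convoluted (channel) noise. The test tone, the target SNR and the
impulse response are hypothetical values chosen for illustration only and are not taken from the
experiments in this thesis.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in signal) / len(signal)


def add_noise(clean, target_snr_db, rng):
    """Additive model: y[n] = x[n] + d[n], with d scaled to reach the target SNR."""
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    gain = math.sqrt(power(clean) / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    return [c + gain * d for c, d in zip(clean, noise)]


def convolve_channel(clean, impulse_response):
    """Convolutional model: y[n] = sum_k h[k] * x[n - k], truncated to the input length."""
    out = []
    for n in range(len(clean)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                acc += h * clean[n - k]
        out.append(acc)
    return out


rng = random.Random(0)
# Hypothetical test signal: a 200 Hz tone sampled at 8 kHz.
clean = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(800)]
noisy_additive = add_noise(clean, 10.0, rng)
# Hypothetical channel: direct path plus one attenuated echo three samples later.
noisy_convolved = convolve_channel(clean, [1.0, 0.0, 0.0, 0.5])

added = [y - x for x, y in zip(clean, noisy_additive)]
print(10.0 * math.log10(power(clean) / power(added)))  # achieved SNR in dB
```

In the additive case the distortion can be separated from the speech by subtraction, whereas in
the convolved case every output sample mixes several input samples, which is one reason the two
noise types usually call for different handling.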
Wiener filter, time-varying speech model-based or state-based methods fail to handle
noisy speech signals, Yuan [Yuan, 2003] proposed the adaptive Bionic Wavelet Transform. The
algorithm was proposed to handle various types of additive noise and was evaluated in two
ways: i) by segmental signal-to-noise ratio (SSNR) and ii) by signal-to-noise ratio (SNR),
using Morlet as the mother wavelet. Experiments were conducted on the TIMIT dataset, and the
results are tabulated for the Bionic wavelet against other thresholding methods. The Risk
thresholding function, when used with the adaptive Bionic wavelet, consistently gives the best
performance as the SNR increases. Extending the above work [Yuan 2003 (23,24)],
[Mourad TALBI 2009, (25)] used a Recurrent Neural Network to classify adaptive bionic
features. These features are enhanced using an Elman filter along with the Bionic wavelet. The
results are compared with spectral subtraction, the Bionic wavelet, and the Elman filter
adapted for Bionic Wavelet thresholding.
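The two evaluation measures named above, SNR and segmental SNR, can be sketched as follows.
This is a minimal illustration and not the evaluation code of the cited works: the frame length
of 256 samples and the clamping of per-frame values to the range -10 dB to 35 dB are assumptions
that follow common practice, and the test signals are hypothetical.

```python
import math


def snr_db(clean, enhanced):
    """Global SNR in dB: signal energy over the energy of (clean - enhanced)."""
    signal_energy = sum(c * c for c in clean)
    error_energy = sum((c - e) ** 2 for c, e in zip(clean, enhanced))
    return 10.0 * math.log10(signal_energy / error_energy)


def segmental_snr_db(clean, enhanced, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNRs, each clamped to [floor_db, ceil_db] (assumed limits)."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(v * v for v in c)
        error_energy = sum((a - b) ** 2 for a, b in zip(c, e))
        if signal_energy == 0.0 or error_energy == 0.0:
            continue  # skip silent or perfectly reconstructed frames
        frame_snr = 10.0 * math.log10(signal_energy / error_energy)
        values.append(min(max(frame_snr, floor_db), ceil_db))
    if not values:
        raise ValueError("no usable frames")
    return sum(values) / len(values)


# Hypothetical example: the "enhanced" signal is the clean tone scaled by 0.9,
# so the error is 0.1 times the clean signal in every frame.
clean = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(1024)]
enhanced = [0.9 * c for c in clean]
global_snr = snr_db(clean, enhanced)
seg_snr = segmental_snr_db(clean, enhanced)
print(global_snr, seg_snr)
```

Because the segmental measure averages in the dB domain frame by frame, low-energy frames weigh
as much as loud ones, which is why it can rank enhancement methods differently from the global
SNR.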
The Elman adaptive Bionic wavelet improves the SNR compared to spectral subtraction and the
Bionic wavelet. The improvement in SNR is greater for additive white noise and Volvo noise
than for F16 noise. [Rajkumar, Angamba Singh, K. Pritamdas, 2016, (26)] proposed a modified
speech enhancement algorithm based on the Continuous Wavelet Transform, using the Morlet
wavelet as the mother wavelet. The paper discusses results for additive Gaussian noise in
comparison with other noises such as Babble, Airport and Car noise from the NOIZEUS dataset.
The continuous adaptive Bionic wavelet is compared with the traditional DWT at high noise
levels, i.e. at 0 dB input SNR. A range of scales is fixed for each SNR to enhance the signals.
Experiments are conducted for SNRs varying from 0 to 30 dB. An improvement in SNR is observed
when the continuous wavelet is applied instead of the discrete wavelet packet. In 2018,
[Hangting Chen, Pengyuan Zhang 2018 (27,28)] proposed a deep Convolutional Neural Network
classifier with scalograms for audio scene modeling, using Morlet wavelets to extract features
from the DCASE dataset. An approach to learning is presented based on the short-time Fourier
transform and hand-tailored filters. The scalogram features are compared with classical Mel
energy features. An improved accuracy of 90.5% is observed for the multi-scale features of
continuous Morlet wavelets.
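To illustrate what a Morlet scalogram is, the following sketch computes the magnitude of a
continuous wavelet transform by direct summation. It is not the pipeline of any of the cited
works: the centre frequency omega0 = 6, the truncation of the wavelet to four scale units on
each side, the scale grid and the 500 Hz test tone are all assumptions made for illustration,
and the small admissibility correction term of the Morlet wavelet is omitted.

```python
import cmath
import math


def morlet(t, omega0=6.0):
    """Complex Morlet wavelet without the admissibility correction term."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * t) * math.exp(-0.5 * t * t)


def cwt_morlet(signal, scales, omega0=6.0):
    """Return |CWT| as one row per scale; scales are expressed in samples."""
    rows = []
    for s in scales:
        half = int(math.ceil(4.0 * s))  # truncate the wavelet to +/- 4 scale units
        kernel = [morlet(k / s, omega0).conjugate() / math.sqrt(s)
                  for k in range(-half, half + 1)]
        row = []
        for n in range(len(signal)):
            acc = 0j
            for j, w in enumerate(kernel):
                idx = n + j - half
                if 0 <= idx < len(signal):
                    acc += signal[idx] * w
            row.append(abs(acc))
        rows.append(row)
    return rows


fs = 8000.0
# Hypothetical test signal: a 500 Hz tone.
tone = [math.cos(2.0 * math.pi * 500.0 * n / fs) for n in range(400)]
scales = [4.0, 8.0, 16.0, 32.0]
scalogram = cwt_morlet(tone, scales)

# Average magnitude per scale over the middle of the signal.
mean_mag = [sum(row[100:300]) / 200.0 for row in scalogram]
best_scale = scales[mean_mag.index(max(mean_mag))]
print(best_scale, omega0_hz := 6.0 * fs / (2.0 * math.pi * best_scale))
```

Each scale s corresponds approximately to the frequency omega0 * fs / (2 * pi * s), so the row
with the largest response indicates where the energy of the tone lies. Stacking the rows over
time gives the scalogram image that a classifier can take as input.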
Fuzzy logic has evolved as a prominent method for developing speech recognition systems.
[Ramin Halavati, Saeed Bagheri Shouraki, et al., 2006 (28,29)] proposed a fuzzy approach for
phoneme recognition on the TIMIT database. The authors apply fuzzy modeling to ignore noise
instead of reducing or removing it. The speech features are extracted by converting the speech
spectrogram into a fuzzy linguistic description instead of precise acoustic features. The fuzzy
model uses features defined by linguistic terms for phonemes. A genetic algorithm (GA) has been
used as a classifier to find appropriate definitions for the phonemes, and the results are
compared with a Hidden Markov Model. The GA method gives better results in noisy environments.
Each test is repeated five times at six different noise levels, i.e. clean, 30, 20, 10, 0 and
-10 dB SNR. The fuzzy approach shows 20–28% more immunity against noise in normal noisy
environments (SNR: 30–10 dB). This immunity decreases as the noise level approaches amounts
that make the input unidentifiable to human beings, but it always remains above that of
HMM–MFCC.
[Ingjr Ding, 2013 (30)] proposed speech recognition using variable-length frame overlaps set
by a fuzzy logic control (FLC) mechanism. The FLC determines a variable-length overlap between
two consecutive frames by regulating the frame overlap size. The fuzzy overlapped features are
compared on LPC, LPCC and MFCC feature test cases. The proposed variable-length frame
overlapping scheme is observed to be superior to fixed-length frame overlapping. The fuzzy
overlapping is applied with a 2 ms decrement of the overlap for all classes. An adaptive
learning approach is used to derive accurate acoustic features. These variable frame features
have been evaluated with the conventional LPC, LPCC and MFCC approaches. Frame analysis with
MFCC yields 99.33% when modeled with an HMM, higher than with the LPC and LPCC methods.
In 2014, [Seyed Mostafa Mirhassani, Hua-Nong Ting (2014) (31)] proposed fuzzy-based
discriminative, complementary feature extraction and selection procedures for a
speaker-independent phoneme recognition system for Malay children’s vowels. Fuzzification is
applied using discriminative criteria, fuzzy codification and fuzzy aggregation criteria to
extract and select the optimal features from MFCC. The resulting features are compared with
other feature selection methods such as Sequential Forward Floating Search (SFFS) and
Sequential Backward Floating Selection (SBFS). The features are classified using a Multi Layer
Perceptron (MLP) and a Hidden Markov Model (HMM) to measure phoneme recognition accuracy. An
improved recognition accuracy of 95.28% is observed for the fuzzy features with the HMM
classifier, whereas the MLP classifier gave 93.14%.
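The idea of describing spectral energy by linguistic terms rather than by precise values,
mentioned above for fuzzy phoneme recognition, can be illustrated with a short sketch. The
three terms LOW, MEDIUM and HIGH, their piecewise-linear membership functions on a normalized
energy range of 0 to 1, and the sample frame are hypothetical choices and are not the
definitions used in the cited works.

```python
def low(x):
    """Membership in LOW: 1 at x <= 0, falling linearly to 0 at x = 0.5."""
    return min(1.0, max(0.0, (0.5 - x) / 0.5))


def medium(x):
    """Membership in MEDIUM: triangle with feet at 0 and 1 and peak at 0.5."""
    return max(0.0, 1.0 - abs(x - 0.5) / 0.5)


def high(x):
    """Membership in HIGH: 0 at x <= 0.5, rising linearly to 1 at x >= 1."""
    return min(1.0, max(0.0, (x - 0.5) / 0.5))


TERMS = {"LOW": low, "MEDIUM": medium, "HIGH": high}


def fuzzify(x):
    """Degree of membership of x in every linguistic term."""
    return {name: fn(x) for name, fn in TERMS.items()}


def describe(energies):
    """Replace each normalized band energy by its best-matching linguistic term."""
    labels = []
    for e in energies:
        degrees = fuzzify(e)
        labels.append(max(degrees, key=degrees.get))
    return labels


# Hypothetical normalized band energies for one spectrogram frame.
frame = [0.05, 0.2, 0.9, 0.55]
labels = describe(frame)
print(fuzzify(0.2))
print(labels)
```

A small perturbation of a band energy, for example from 0.20 to 0.25, changes the membership
degrees slightly but leaves the label unchanged, which gives an intuition for why such a
description can tolerate moderate noise without removing it.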
Amane [Amane Taleeb (2012)(32)] proposed speech recognition using a fuzzy-neuro ANFIS network
and genetic algorithms for the TIMIT speech database. The learning algorithm is applied on
TIMIT to extract the MFCC coefficients, whereas the genetic algorithm is applied to minimize
the number of input parameters of ANFIS and to minimize the classification error rate by
determining the optimal parameters. The work addresses continuous speech recognition in
noise-free conditions, and it is observed that GA with ANFIS yields better results with the
one-point crossover technique. [Lubna Eljawad, Rami Aljamaeen, 2019(33)] proposed speech
recognition for the Arabic language using an adaptive neuro-fuzzy inference system. The process
involves preprocessing using DC level removal and resizing, feature extraction, and MLP and
fuzzy logic classifiers. These are used to construct two intelligent recognizers, i.e. a fuzzy
logic recognizer and a neural network recognizer. These models are used to study the
recognition ability of MLP and fuzzy logic systems for the Arabic and English languages.
Testing is performed on male and female speakers using cross validation. The recognition
accuracies of the fuzzy logic recognizer and the neural network recognizer are compared. An
improved accuracy of 94.5% is obtained for the MLP, compared to 77% for the Sugeno fuzzy
inference model. The authors suggest combining these two intelligent recognition systems to
increase the recognition accuracy. [Samiya Silarbi, Bendahmane et al., 2014 (34)] proposed
ANFIS for phoneme recognition using the TIMIT database. Pre-processing and feature extraction
are carried out for MFCC parameters. The network is trained using subtractive clustering to
define an optimal structure with a small number of rules. Hybrid learning using gradient
descent and least squares estimation is used to find a feasible set of parameters. A
recognition accuracy of 100% is obtained for 6 vowel, 6 fricative and 6 plosive phonemes.
[Sankar K. Pal (1992)(36)] developed a linguistic recognition system based on approximate
reasoning for handling various imprecise input patterns. Natural decisions are provided using
fuzzy rules to design a Fuzzy Inference System for the classification of phonemes. This model
obtained a recognition accuracy of 80%. Here the authors considered only three properties,
SMALL, MEDIUM and HIGH, and suggested using linguistic hedges such as very small and more or
less small to reduce the impreciseness in the input linguistic information.
Extending this, [Cetisli, B. (2010)(37)] proposed the concept of a Neuro Fuzzy Classifier using
Linguistic Hedges (LH) to classify non-linear signals and to handle overlaps, for the Pima
Indian Diabetes dataset and a spam e-mail dataset. LH has not been applied to speech
processing applications.
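The effect of a linguistic hedge on a fuzzy set can be shown with a short worked example. It
uses the standard power-based definition, in which "very" squares the membership degree
(concentration) and "more or less" takes its square root (dilation). The linear membership
function for SMALL and the input value are hypothetical, and the exponent is treated here as a
fixed number rather than as a parameter learned by a classifier.

```python
def small(x, upper=1.0):
    """Hypothetical membership in SMALL on [0, upper]: 1 at 0, falling linearly to 0 at upper."""
    return min(1.0, max(0.0, 1.0 - x / upper))


def hedge(membership, power):
    """Apply a power hedge: power > 1 concentrates the set, power < 1 dilates it."""
    return membership ** power


x = 0.36
mu = small(x)                          # degree to which x is SMALL
very_small = hedge(mu, 2.0)            # "very small"
more_or_less_small = hedge(mu, 0.5)    # "more or less small"
print(mu, very_small, more_or_less_small)
```

For the same input, "very small" gives a lower degree and "more or less small" a higher degree
than the unmodified term, so hedges sharpen or soften the boundary of a linguistic term without
changing the point where membership reaches 0 or 1.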
Observations:
The literature review shows that hybrid models have the potential to improve accuracy in
speech recognition applications. These models have not been tried with convoluted noise.
Continuous wavelets have been experimented with and shown to produce improved accuracy,
specifically with additive noise. Classification is achieved using well-known methods. An
accuracy of up to 90.5% is reported in the above works. They have not been tried for
convoluted noise.
The LH concept has been applied to the Pima Indian Diabetes dataset and a spam e-mail dataset.
ANFIS is proposed for phone recognition but not for word recognition with noisy data.
Fuzzy logic is used for ignoring noise, not for removing it. Fuzzy tools are mainly used for
feature extraction and selection.
1.4 Motivation:
From the literature it is evident that previous methods for hybrid feature extraction, speech
enhancement and classification using wavelet and fuzzy techniques have not been analyzed for
convoluted noisy speech signals across various types of noise. The study of speech recognition
under degraded conditions is a difficult problem, and thus an interesting area in which to
handle challenges such as the ambiguity, impreciseness and incompleteness present in speech
data. These challenges motivated us to conduct research and propose novel algorithms for
feature enhancement and extraction of speech signals using fuzzy and wavelet techniques.
1.6 Objective:
The main objective is to propose feature extraction algorithms based on wavelet and fuzzy
techniques, with MFCC as the base model. It is proposed to use hybrid techniques and continuous
wavelets to enhance and extract speech features that increase the recognition accuracy.
Fuzzy-based framing and linguistic hedges are proposed and evaluated on various homogeneous
and heterogeneous data.
Chapter 1 presents a review of state-of-the-art methods in feature extraction using fuzzy and
wavelet techniques, along with the motivation, objectives and challenges in speech recognition
systems. Chapter 2 gives a general introduction to Automatic Speech Recognition (ASR), with
the relevant issues and designs of ASR systems. Chapter 3 presents hybrid Hilbert-Huang
Transform methods with wavelets to extract the speech features and cluster them using various
clustering algorithms.
Chapter 4 discusses feature extraction algorithms using Adaptive Bionic and continuous
perceptual Morlet wavelets, which enhance and extract the speech features using thresholding
functions.
Chapter 5 presents the fuzzy framing and Linguistic Hedge classifier used to extract and
classify features for homogeneous and heterogeneous data sets. Chapter 6 presents the
conclusions as well as suggestions for future work.