An Experimental Analysis of Speech Features For Tone Speech Recognition
The information present in different speech features is often redundant and overlapping. Therefore, it is difficult to identify and separate which aspect of the speech signal is represented by which feature. In speech research, features are very often selected on an experimental basis, and sometimes using a mathematical approach such as Principal Component Analysis (PCA).

The Apatani language of Arunachal Pradesh in North East India belongs to the Tani group of languages. The Tani languages constitute a distinct subgroup within the Tibeto-Burman group of languages [10]. The other languages of the group are Adi, Bangni, Bokar, Bori, Damu, Galo, Hill Miri, Milang, Na, Nyishi, Tagin, Tangam and Yano. The Tani languages are spoken mainly in the contiguous area from the Kameng river to the Siang river of Arunachal Pradesh. A small number of Tani speakers are found in the adjoining areas of Tibet, and only the speakers of the Mising language are found in the Brahmaputra valley of Assam [11]. The Apatani language has six vowels and seventeen consonants [12]. Table 1 presents the Apatani vowels, and Table 2 presents the Apatani consonants with their manner and place of articulation.

Table 1: Apatani vowels.

Tongue height    Tongue position
                 Front    Central    Back
High             ɪ                   ʊ
Mid              ɛ        ə          ɔ
Low                       ɑ:

Table 2: Apatani consonants with their manner and place of articulation.

Manner of Articulation    Labial    Alveolar    Palatal    Velar    Glottal
Stop                      p, b      t, d        ʧ, ʤ       k, g
Nasal                     m         n                      ŋ
Fricative                           s                      kʰ       h
Flap                                r
Approximant                         ɭ           ȷ

II. THE SPEECH FEATURES

Speech is the output of a vocal tract system excited by an excitation source signal. The characteristics of both the vocal tract response and the excitation source signal vary with time to produce different sounds. At the time of speech production, human beings impose duration and intonation patterns on top of the vocal tract response to convey the intended message [13]. The speech signal conveys not only linguistic information but also much other information, such as information about the speaker, gender, social and regional identity, and health and emotional status. The first step of an automatic speech recognition (ASR) system is therefore to form a compact representation of the speech signal that emphasizes the phonetic information of the signal over the other information. Choosing suitable features for developing a speech-based system is one of the most crucial design decisions for such system development. Speech features can be categorized into three classes: excitation source features, vocal tract features and prosodic features.

Speech features extracted from the excitation source signal are called source features. The excitation source signal is obtained by discarding the vocal tract information from the speech signal. This is achieved by first predicting the vocal tract information using linear predictor filter coefficients extracted from the speech signal and then separating it out by inverse filtering. The resulting signal is called the linear prediction (LP) residual signal [14]. The features extracted from the LP residual signal are called excitation source features, or simply source features.

A sound unit is characterized by the sequence of shapes assumed by the vocal tract during production of the sound. The vocal tract system can be considered as a cascade of cavities of varying cross-sectional areas. During speech production, the vocal tract acts as a resonator and emphasizes certain frequency components depending on the shape of the oral cavity. The information about the sequence of vocal tract shapes that produces the sound unit is captured by the vocal tract features, also called system or spectral features. The vocal tract characteristics can be approximately modelled by spectral features like linear predictor coefficients (LPC) and cepstral coefficients (CC) [13].

Prosody plays a key role in the perception of human speech. The information contained in prosodic features is partly different from the information contained in source and spectral features. Therefore, more and more researchers in the speech recognition area are showing interest in prosodic features. Generally, prosody means "the structure that organizes sound". Pitch (tone), energy (loudness) and normalized duration (rhythm) are the main components of prosody for a speaker. Prosody can vary from speaker to speaker and relies on long-term information of speech. Very often, prosodic features are extracted with a larger frame size than acoustic features, as prosodic features exist over long speech segments such as syllables. The pitch and energy contours change slowly compared to the spectrum, which implies that their variation can be captured over a long speech segment [15].

The source, system and prosodic features are distinct from each other from the speech production, feature extraction and perception points of view. They are mostly non-overlapping in nature and represent different aspects of the speech production system. The basic objective of an ASR system is to recognize the phonetic content of the speech signal while discarding the other, irrelevant information.
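As an illustration, the LP residual described above can be obtained with a few lines of Python; the sketch below assumes the librosa and scipy libraries, and the file name, sampling rate and predictor order are illustrative choices rather than settings taken from this paper.

    import librosa
    import scipy.signal

    # Estimate the vocal tract filter with a 10th-order LP model and remove it
    # by inverse filtering; what remains approximates the excitation source.
    speech, sr = librosa.load("utterance.wav", sr=16000)
    lpc = librosa.lpc(speech, order=10)                  # [1, a1, ..., a10]
    residual = scipy.signal.lfilter(lpc, [1.0], speech)  # LP residual signal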
III. PROPOSED METHOD

The block diagram of the proposed model is given in Fig. 1. The pre-emphasized speech signal is first blocked into frames of 100 ms duration with 50% overlap. From each block, two types of features are extracted: spectral features and prosodic features. The spectral features considered in the present study are Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictor Cepstral Coefficients (LPCC). To extract the spectral features, each 100 ms speech frame is re-framed into frames of size 20 ms with 50% overlap, and the MFCC and LPCC features are extracted from each 20 ms frame separately. In the present study we have proposed a temporal k-means clustering of these frame-level spectral features, carried out as follows (a code sketch of our reading is given after the steps):

1. Divide the temporal sequence of frames uniformly into clusters of M consecutive frames each and initialize every cluster centroid from its own frames.

2. Use a data structure (centroid_values, proximity_index) for each centroid, where the proximity_index refers to the central location of the cluster on the time scale.

3. For each frame j, repeat steps 4 to 6.

4. Select the two nearby clusters m and k for the jth frame based on the proximity index. The clusters with consecutive proximity indices m and k are the nearby clusters of j if

    M·m ≤ j ≤ k·M    … (2)

5. Compute the distance of the jth frame from these two cluster centroids.

6. Assign the frame to the nearer cluster and update that cluster's centroid.
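The following Python sketch shows one reading of the steps above. It is a sketch only: the Euclidean distance and the running-mean centroid update are our assumptions where the text is silent, and the frames argument is a (T x D) array of per-frame spectral features for one 100 ms block.

    import numpy as np

    def temporal_kmeans(frames, k=3):
        T = len(frames)
        M = T // k  # frames per initial temporal segment

        # Steps 1-2: uniform temporal segments; each cluster is stored as a
        # (centroid_values, proximity_index) pair, the proximity index being
        # the central frame position of the cluster on the time scale.
        centroids = [frames[c * M:(c + 1) * M].mean(axis=0) for c in range(k)]
        proximity = [c * M + M // 2 for c in range(k)]
        counts = [M] * k

        for j in range(T):  # step 3
            # Step 4: the two clusters temporally nearest to frame j according
            # to their proximity indices (the consecutive pair of Eq. (2)).
            pair = np.argsort([abs(j - p) for p in proximity])[:2]
            # Step 5: Euclidean distance (our assumption) to both centroids.
            d = [np.linalg.norm(frames[j] - centroids[c]) for c in pair]
            # Step 6: assign to the nearer cluster; running-mean update.
            win = pair[int(np.argmin(d))]
            counts[win] += 1
            centroids[win] += (frames[j] - centroids[win]) / counts[win]

        # Clubbing the k centroids yields one k*D-dimensional spectral vector
        # (3 clusters x 13 coefficients = 39 dimensions in Section V).
        return np.concatenate(centroids)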
The algorithm has been applied separately to both the MFCC and the LPCC features, and reduced feature sets have been extracted which represent the spectral characteristics of the speech signal over the entire 100 ms duration. These features are combined with prosodic features extracted from the 100 ms frame, considering it as a single unit. The prosodic features extracted are the maximum, minimum and average values of F0 and energy computed over the entire 100 ms period. These prosodic features are combined with the MFCC and LPCC features separately, and two different feature sets are computed. Each feature set is evaluated for its relative performance in tonal speech recognition.

IV. EXPERIMENTAL SETUP

In the present study, each tonal instance of a vowel has been considered as a different tonal vowel. For example, the vowel [ɑ:] has three associated tones: rising, falling and level. Thus the vowel [ɑ:] gives rise to the tonal vowels [ɑ́:] ([ɑ:] rising), [ɑ̀:] ([ɑ:] falling) and [ɑ̄:] ([ɑ:] level). We refer to these as tonal vowels. Considering the tonal instances as separate vowels, we get sixteen tonal vowels in the Apatani language; they are given in Table 3. Since the vowel [ə] has only one tone, it is not taken into consideration while evaluating the performance of the feature vectors.

A speech database of Apatani tonal words has been prepared to carry out the experiments. The database consists of 12 isolated tonal words spoken by 20 different speakers (13 males and 7 females). The recording has been done in a controlled acoustic environment at a 16 kHz sampling frequency in 16-bit mono format. A headphone microphone has been used for recording the database. The words are selected in such a way that each tonal instance of a vowel occurs at least 5 times among the words. Thus, for each tonal vowel, we have a minimum of 100 instances recorded from the 20 speakers.
Table 3: Apatani tonal vowels.

[ɑ̄:]   Vowel ɑ: with level tone
[ɑ́:]   Vowel ɑ: with rising tone
[ɑ̀:]   Vowel ɑ: with falling tone
[ɪ̄]    Vowel ɪ with level tone
[ɪ́]    Vowel ɪ with rising tone
[ɪ̀]    Vowel ɪ with falling tone
[ɔ̄]    Vowel ɔ with level tone
[ɔ́]    Vowel ɔ with rising tone
[ɔ̀]    Vowel ɔ with falling tone
[ɛ̄]    Vowel ɛ with level tone
[ɛ́]    Vowel ɛ with rising tone
[ɛ̀]    Vowel ɛ with falling tone
[ʊ̄]    Vowel ʊ with level tone
[ʊ́]    Vowel ʊ with rising tone
[ʊ̀]    Vowel ʊ with falling tone
[ə̄]    Vowel ə with level tone

A feature would be effective in discriminating between different tonal vowels if the distributions of the different tonal vowels are concentrated at widely different locations in the parameter space, even though the vowels differ from each other only in the associated tone [16]. A good measure of effectiveness is the ratio of inter-vowel to intra-vowel (within-class) variance for the tonal vowels, referred to as the F-ratio:

    F = Variance of the tonal-vowel means across all classes / Average variance within the classes    … (3)

The overall F-ratio value of a coefficient across all classes is computed as

    F = [ (1/N) Σ_{i=1}^{N} (μ_i − μ̄)² ] / [ (1/N) Σ_{i=1}^{N} S_i ]    … (4)

where N is the number of tonal vowels, μ_i is the mean of a particular coefficient of the feature vector for the ith tonal vowel, and μ̄ is the overall mean value of that coefficient over all the tonal vowels. S_i, the within-class variance for the ith tonal vowel, is given by

    S_i = (1/M_i) Σ_{j=1}^{M_i} (x_{ij} − μ_i)²    … (5)

where x_{ij} is the value of the coefficient for the jth observation of the ith tonal vowel and M_i is the number of observations for the ith tonal vowel. A higher F-ratio value for a coefficient indicates that it can be used for good classification.

Another metric used for measuring the performance of features in discriminating among the tonal instances of a vowel is the Kullback-Leibler distance (KLD). The KLD provides a natural distance between a probability distribution and a target probability distribution. KL distances have been measured among the features extracted from the tonal vowels, and their average has been taken. The higher the distance, the better the tonal phoneme discrimination capability of the feature.
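For concreteness, Eqs. (3)-(5) and the average KL distance can be computed per coefficient as sketched below. Here features is a list of (M_i x D) observation matrices, one per tonal vowel, and the univariate Gaussian class model used for the KLD is our assumption, since the paper does not state how the distributions are estimated.

    import numpy as np

    def f_ratio(features):
        # Per-coefficient F-ratio, Eq. (4).
        mus = np.stack([f.mean(axis=0) for f in features])      # class means
        s = np.stack([f.var(axis=0) for f in features])         # S_i, Eq. (5)
        between = ((mus - mus.mean(axis=0)) ** 2).mean(axis=0)  # numerator
        return between / s.mean(axis=0)

    def avg_gaussian_kld(features):
        # Average symmetric KL distance over all class pairs, assuming one
        # univariate Gaussian per coefficient (our modelling assumption).
        mus = np.stack([f.mean(axis=0) for f in features])
        var = np.stack([f.var(axis=0) for f in features])
        ds = []
        for i in range(len(features)):
            for j in range(i + 1, len(features)):
                kl_ij = 0.5 * (var[i] / var[j] + (mus[i] - mus[j]) ** 2 / var[j]
                               - 1 + np.log(var[j] / var[i]))
                kl_ji = 0.5 * (var[j] / var[i] + (mus[j] - mus[i]) ** 2 / var[i]
                               - 1 + np.log(var[i] / var[j]))
                ds.append(0.5 * (kl_ij + kl_ji))
        return np.mean(ds)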
V. RESULTS AND DISCUSSIONS

All the experiments were carried out using the database described in Section IV. The vowels, in all their tonal instances, were segmented from the isolated words. The segmentation was done using the PRAAT software and was followed by subjective verification. The speech signal is first segmented into frames of 100 ms with 50% overlap; we will refer to these as 1st-level frames. Each 1st-level frame is then passed through two parallel systems. The first system extracts the spectral features, MFCC and LPCC, separately.
To extract the spectral features, whose characteristics are correctly visible only in short-duration frames, we re-frame each 1st-level frame into frames of size 20 ms with 50% overlap; we refer to these as 2nd-level frames. The MFCC and LPCC features are extracted from each 2nd-level frame. The MFCC features are computed using a 21-channel filter bank, resulting in 13-dimensional cepstral features consisting of the coefficients c0 to c12. The LPCC features are computed from a 10th-order linear predictor and extended to 13-dimensional cepstral coefficients. The MFCC and LPCC features are then clustered into 3 clusters using the temporal k-means algorithm. The cluster centroids are clubbed together, giving a 39-dimensional MFCC and a 39-dimensional LPCC feature vector for each 1st-level frame of the speech signal. These two sets of features are then combined with the prosodic features separately. The prosodic features, i.e. the maximum, minimum and average F0 and energy, are computed from each 1st-level frame directly. Thus, we get two sets of 45-dimensional feature vectors (39 spectral features and 6 prosodic features) for each 1st-level frame. We will refer to these features as the High-level MFCC and High-level LPCC features respectively.
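A minimal sketch of assembling one 45-dimensional High-level MFCC vector for a single 1st-level frame is given below. It reuses temporal_kmeans from the sketch in Section III and assumes librosa; the yin pitch tracker and RMS energy are our choices, as the paper names only the F0 and energy statistics.

    import numpy as np
    import librosa

    def high_level_mfcc(frame_100ms, sr=16000):
        # 2nd-level framing: 20 ms windows, 50% overlap, 13 MFCCs (c0..c12)
        # from a 21-channel mel filter bank.
        mfcc = librosa.feature.mfcc(y=frame_100ms, sr=sr, n_mfcc=13, n_mels=21,
                                    n_fft=int(0.020 * sr),
                                    hop_length=int(0.010 * sr)).T
        spectral = temporal_kmeans(mfcc, k=3)  # 39-dim, defined earlier

        # Prosodic part: max/min/mean of F0 and of frame energy over 100 ms.
        f0 = librosa.yin(frame_100ms, fmin=60, fmax=400, sr=sr)
        energy = librosa.feature.rms(y=frame_100ms).flatten()
        prosodic = np.array([f0.max(), f0.min(), f0.mean(),
                             energy.max(), energy.min(), energy.mean()])
        return np.concatenate([spectral, prosodic])  # 39 + 6 = 45 dimensions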
To perform a comparative study of the proposed feature set, we have extracted baseline MFCC and LPCC features from the speech signal with a 20 ms frame size and 50% overlap, using the same experimental setup as described above. To capture the dynamic properties of the speech signal, the first-order and second-order derivatives of the coefficients are also added. Thus we get a 39-dimensional baseline MFCC feature vector and a 39-dimensional baseline LPCC feature vector. The results of the experiment are given in Table 4.

Table 4: Average F-ratio and KL Distance for the features.

Feature vector               F-ratio    KL Distance
Baseline MFCC + ∆ + ∆∆       2.0136     0.4404
Baseline LPCC + ∆ + ∆∆       2.5569     0.6956
High-Level MFCC              5.3350     0.8727
High-Level LPCC              4.3350     0.8754

From these experiments it is observed that, as a result of adding the prosodic features to the MFCC and LPCC features, the overall tonal phoneme discrimination capability increases considerably compared to the baseline MFCC and LPCC features.

In the second set of experiments, we have computed the intra-tone phoneme discrimination capability of the proposed feature set. We have computed the F-ratio value considering all the phonemes of a particular tone (level, rising or falling) as intra-class. Similarly, the KL-distance has been measured only against the other vowels of the same tone. The results are summarized in Table 5.

Table 5: Average F-ratio and KL Distance of the features for intra-tone phoneme discrimination.

Feature vector               F-ratio    KL Distance
Baseline MFCC + ∆ + ∆∆       3.0731     0.4721
Baseline LPCC + ∆ + ∆∆       3.7763     0.3846
High-Level MFCC              4.2870     0.4258
High-Level LPCC              4.4580     0.3516

From the above results it is observed that the proposed features have better intra-tone phoneme discrimination capability. This observation supports the claim that these features can be used for both tonal and non-tonal speech recognizers.

In the third set of experiments, we have evaluated the performance of the features for their inter-tone discrimination capability. In this experiment, we have computed the F-ratio value considering all the instances of a tonal vowel as intra-class and the other tonal instances of the same base vowel as inter-class. Further, the KL-distances have been measured among the tonal instances of the same base vowel only. The results of the experiments are given in Table 6.

Table 6: Average F-ratio and KL Distance of the features for inter-tone discrimination.

Feature vector               F-ratio    KL Distance
Baseline MFCC + ∆ + ∆∆       0.7365     0.0538
Baseline LPCC + ∆ + ∆∆       0.8383     0.293
High-Level MFCC              4.7813     0.5754
High-Level LPCC              3.9852     0.2958

From the above results it is observed that the proposed features perform significantly well in inter-tone discrimination, where the base phoneme is the same and the different tonal instances are distinct from each other only due to a change in tone. In this scenario, the baseline MFCC and LPCC features fail completely to discriminate among the phonemes.

VI. CONCLUSION

This paper presents a feature set for tonal speech recognition. Spectral and prosodic features are combined using a late fusion technique to produce a feature set for the classifier. The proposed feature extraction technique has been evaluated on a tonal phoneme discrimination task. It has been observed that the proposed feature set performs significantly well in both the tonal and the tone-independent evaluation scenarios. Therefore, the proposed feature set can be used as a universal feature vector for both tonal and non-tonal speech recognition systems, which is a long-standing need for the global acceptability of automatic speech recognition systems.

ACKNOWLEDGMENT

This work is supported by UGC major project grant MRP-MAJOR-COM-2013-40580.

REFERENCES

1. M. Yip, The Tonal Phonology of Chinese, New York: Garland Publishing, 1991.
2. D. M. Beach, "The Science of Tonetics and Its Application to Bantu Languages", Bantu Studies, 2nd Series, Vol. 2, pp. 75-106, 1924.
3. N. H. Woo, Prosody and Phonology, Doctoral dissertation, MIT, 1969.

AUTHORS PROFILE