Voice Recognition

A PROJECT ON SPEECH RECOGNITION

MADE BY: SAKSHI PIMPLAPURE, RITUPARNA DAS, TARNI VERMA
GUIDED BY: JITENDRA SIR
INTRODUCTION
Voice signal identification is the process of converting a speech
waveform into features that are useful for further processing. Many
algorithms and techniques are in use; the choice depends on how well the
features capture time, frequency and energy in a set of coefficients
for cepstrum analysis. The human voice conveys a great deal of information,
such as the gender, emotion and identity of the speaker. The objective of
voice recognition is to determine which speaker is present, based on the
individual's utterance. Several techniques have been proposed for
reducing the mismatch between the testing and training environments;
many of them operate in either the spectral or the cepstral domain.
First, the human voice is converted into digital form, producing data
that represents the signal level at every discrete time step. The
digitized speech samples are then processed using MFCC to produce voice
features. After that, the coefficients of the voice features can go
through DTW to select the stored pattern in the database that best
matches the input frames, that is, the one that minimizes the resulting
error between them.
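The DTW matching step mentioned above can be sketched roughly as follows. This is only an illustration, not the project's implementation: the Euclidean local distance, the three-neighbour step pattern and the names dtw_distance and best_match are assumptions made for the example.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # assumed Euclidean distance
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def best_match(query, templates):
    """Label of the stored template with the smallest DTW cost to the query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))
```

Here templates is assumed to be a dictionary that maps a label to a stored sequence of feature vectors, so best_match(query, templates) returns the label whose sequence aligns with the input at the lowest cost.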
REVIEW
"Voice Recognition Algorithms using Mel Frequency Cepstral
Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques"
by Lindasalwa Muda, Mumtaj Begam and I. Elamvazuthi:
From this paper we learned how to implement the MFCC procedure
step by step and in the correct order.
METHODOLOGY
The voice algorithm consists of two distinct phases.
The first is the training session; the second is
referred to as the operation session or testing
phase.
Feature extraction
It is the process that extracts a small amount of data from the
voice signal that can later be used to represent each speaker.
Feature matching
It involves the actual procedure to identify the unknown
speaker by comparing extracted features from his/her voice
input with the ones from a set of known speakers.
Speech Feature Extraction:

The purpose of this module is to convert the speech
waveform, using digital signal processing (DSP) tools, to a set
of features (at a considerably lower information rate) for
further analysis. This is often referred to as the signal-processing
front end.
MEL-FREQUENCY CEPSTRUM
COEFFICIENTS PROCESSOR
DESCRIPTION OF MFCC:
Frame Blocking:
In this step the continuous speech signal is blocked into
frames of N samples, with adjacent frames being separated by
M samples (M < N). The first frame consists of the first N samples;
the second frame begins M samples later and overlaps it by N - M samples.
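A minimal sketch of frame blocking, assuming the signal is a plain Python list of samples. The function name frame_blocking is hypothetical, and dropping trailing samples that do not fill a whole frame is an assumption, not something the slides specify.

```python
def frame_blocking(signal, n, m):
    """Split signal into frames of n samples whose starts are m samples apart.

    Assumption: samples at the end that do not fill a complete frame are dropped.
    """
    if not 0 < m < n:
        raise ValueError("expected 0 < m < n")
    return [signal[start:start + n]
            for start in range(0, len(signal) - n + 1, m)]
```

For example, frame_blocking(list(range(10)), 4, 2) gives four frames that start at samples 0, 2, 4 and 6, each overlapping its neighbour by N - M = 2 samples.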
Windowing:

The next step in the processing is to window each individual
frame so as to minimize the signal discontinuities at the
beginning and end of each frame. The concept here is to
minimize the spectral distortion by using the window to taper
the signal to zero at the beginning and end of each frame.
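The windowing step can be illustrated with a Hamming window. The slides do not name a particular window, so the Hamming window is an assumption here (it is a common choice for MFCC front ends). It tapers the frame edges to 0.08 of their value rather than exactly to zero.

```python
import math

def hamming_window(n):
    """Hamming window of length n: w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(frame):
    """Multiply each sample of the frame by the matching window value."""
    window = hamming_window(len(frame))
    return [x * w for x, w in zip(frame, window)]
```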
Fast Fourier Transform
The next processing step is the Fast Fourier Transform,
which converts each frame of N samples from the time
domain into the frequency domain. The FFT is a fast
algorithm to implement the Discrete Fourier
Transform (DFT), which is defined on the set of N
samples {x_n}.
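To make the relation between the two concrete, the sketch below computes the DFT directly from its definition, X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N), and then with a recursive radix-2 FFT. The function names and the power-of-two length restriction are choices made for this example only; practical FFT libraries handle general lengths.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT straight from the definition."""
    n_samples = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_samples)
                for n in range(n_samples))
            for k in range(n_samples)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n_samples = len(x)
    if n_samples == 0 or n_samples & (n_samples - 1):
        raise ValueError("length must be a power of two")
    if n_samples == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    half = n_samples // 2
    out = [0j] * n_samples
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n_samples) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out
```

Both functions return the same spectrum up to rounding error; the FFT only reduces the amount of work from the order of N^2 to the order of N log N operations.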
Mel Frequency Wrapping
Psychophysical studies have shown that human perception
of the frequency content of sounds for speech signals
does not follow a linear scale. Thus, for each tone with
an actual frequency, f, measured in Hz, a subjective
pitch is measured on a scale called the 'Mel' scale.
The Mel-frequency scale has linear frequency spacing
below 1000 Hz and logarithmic spacing above 1000
Hz.
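One widely used approximation of the Mel scale is mel(f) = 2595 * log10(1 + f / 700). This formula is not given in the slides and is used here only as an assumption for illustration. It maps 1000 Hz to roughly 1000 mel, and it is only approximately linear below that point. The function names below are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Assumed approximation: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_spaced_frequencies(low_hz, high_hz, count):
    """Return count frequencies in Hz, equally spaced on the Mel scale (count >= 2)."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (count - 1)
    return [mel_to_hz(low_mel + i * step) for i in range(count)]
```

Frequencies that are equally spaced in mel get further apart in Hz as the frequency rises, which is the usual basis for spacing the filters of a Mel filter bank.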
Cepstrum
In this final step, we convert the log Mel spectrum back
to the time domain. The results are called the
Mel-frequency cepstrum coefficients (MFCC). The cepstral
representation of the speech spectrum provides a good
representation of the local spectral properties of the
signal for the given analysis frame.
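The conversion back to the time domain is commonly done with a discrete cosine transform (DCT) of the log Mel spectrum. The slides do not name the transform, so the DCT, the function name and the choice to skip the zeroth coefficient are assumptions in this sketch. The input is assumed to be a list of strictly positive Mel filter-bank energies for one frame.

```python
import math

def mfcc_from_mel_energies(mel_energies, num_coeffs):
    """Cepstral coefficients c_1..c_num_coeffs from one frame's Mel energies.

    Assumptions: energies are strictly positive, and the zeroth coefficient
    (which only reflects the mean log energy) is left out.
    """
    k_total = len(mel_energies)
    log_energies = [math.log(e) for e in mel_energies]
    return [sum(log_energies[k] * math.cos(n * (k + 0.5) * math.pi / k_total)
                for k in range(k_total))
            for n in range(1, num_coeffs + 1)]
```

A completely flat Mel spectrum yields coefficients that are all zero up to rounding error, because each cosine basis function sums to zero over the filter-bank channels.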
SPEECH VERIFICATION BLOCK DIAGRAM
Speaker verification is also called feature matching
or pattern matching. The Vector Quantization (VQ)
method is used because of its high accuracy and ease
of implementation.
Vector Quantization:
VQ is a process of mapping vectors from a large
vector space to a finite number of regions in that
space. Each region is called a cluster and can be
represented by its center, called a codeword. The
collection of all codewords is called a codebook.
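A small sketch of how a codebook could be used for matching: each feature vector is mapped to its nearest codeword, and the average distance (the distortion) measures how well a speaker's codebook fits the input. The Euclidean distance, the lowest-distortion decision rule and all function names are assumptions made for this example.

```python
import math

def nearest_codeword(vector, codebook):
    """Index of the codeword closest to vector (assumed Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(vector, codebook[i]))

def average_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codeword."""
    total = sum(math.dist(v, codebook[nearest_codeword(v, codebook)])
                for v in vectors)
    return total / len(vectors)

def identify_speaker(vectors, codebooks):
    """Name of the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks,
               key=lambda name: average_distortion(vectors, codebooks[name]))
```

Here codebooks is assumed to be a dictionary that maps each enrolled speaker's name to that speaker's list of codewords.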
Clustering the training Vectors:
After the enrolment session, the acoustic vectors
extracted from input speech of each speaker provide
a set of training vectors for that speaker. As
described above, the next important step is to build
a speaker-specific VQ codebook for each speaker
using those training vectors. There is a well-known
algorithm, the LBG algorithm [Linde, Buzo and
Gray, 1980], for clustering a set of L training vectors
into a set of M codebook vectors.
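The splitting form of the LBG procedure can be sketched as below. This is a simplified illustration under stated assumptions, not the project's code: the splitting factor eps, the stopping tolerance tol, the iteration cap and the helper names are all choices made for the example, and the target size M is assumed to be a power of two.

```python
import math

def _centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def _nearest(vector, codebook):
    return min(range(len(codebook)),
               key=lambda i: math.dist(vector, codebook[i]))

def lbg(training, size, eps=0.01, tol=1e-3, max_iter=100):
    """Cluster the training vectors into size codewords (size: power of two)."""
    codebook = [_centroid(training)]  # start from a one-vector codebook
    while len(codebook) < size:
        # Split each codeword into a slightly raised and a slightly lowered copy.
        # A codeword that is exactly zero would not separate under this rule.
        codebook = [[c * (1 + s) for c in word]
                    for word in codebook for s in (eps, -eps)]
        previous = float("inf")
        for _ in range(max_iter):
            clusters = [[] for _ in codebook]
            distortion = 0.0
            for v in training:
                i = _nearest(v, codebook)  # nearest-neighbour search
                clusters[i].append(v)
                distortion += math.dist(v, codebook[i])
            distortion /= len(training)
            # Centroid update; an empty cluster keeps its old codeword.
            codebook = [_centroid(c) if c else w
                        for c, w in zip(clusters, codebook)]
            if previous - distortion <= tol * distortion:
                break
            previous = distortion
    return codebook
```

With two well-separated groups of training vectors, lbg(training, 2) returns one codeword near the center of each group.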
APPLICATIONS
Banking by Telephone
Database Access Service
Voice Dialling
Telephone Shopping
Information Services
Voice Mail
Security Control for Secret information Areas
Remote Access to Computer
Thank You
