Digital Signal Processing: The Final
DIGITAL SIGNAL
PROCESSING
The Final
REPORT
BY :
• AAKASH SARAWGI 17BEC0918
SLOT:- G1+TG1
Topic :
Speaker Recognition
System Based on MFCC
Table of Content:
Sr. No.   Topics
1 Abstract
2 Introduction
3 Data processing or Speech Feature Extraction
4 Feature Matching
5 RESULT AND CONCLUSION
6 Acknowledgement
7 References
Abstract :
• Speaker recognition is the process of automatically recognizing who
is speaking on the basis of individual information included in speech
waves. This technique makes it possible to use the speaker's voice to
verify their identity and control access to services such as voice
dialing, banking by telephone, telephone shopping, database access
services, information services, voice mail, security control for
confidential information areas, and remote access to computers. This
document describes how to build a simple, yet complete and
representative automatic speaker recognition system. Such a speaker
recognition system has potential in many security applications. For
example, users have to speak a PIN (Personal Identification Number)
in order to gain access to the laboratory door, or users have to speak
their credit card number over the telephone line to verify their
identity. By checking the voice characteristics of the input utterance,
using an automatic speaker recognition system similar to the one that
we will describe, the system is able to add an extra level of security.
Introduction :
• Speaker recognition can be classified into identification and verification.
Speaker identification is the process of determining which registered speaker
provides a given utterance. Speaker verification, on the other hand, is the
process of accepting or rejecting the identity claim of a speaker. Figure 1
shows the basic structures of speaker identification and verification systems.
The system that we will describe is classified as a text-independent speaker
identification system, since its task is to identify the person who speaks
regardless of what is being said. At the highest level, all speaker recognition
systems contain two main modules (refer to Figure 1): feature extraction and
feature matching. Feature extraction is the process that extracts a small amount
of data from the voice signal that can later be used to represent each speaker.
Feature matching involves the actual procedure to identify the unknown
speaker by comparing extracted features from his/her voice input with the ones
from a set of known speakers.
All speaker recognition systems operate in two distinct phases. The
first one is referred to as the enrolment or training phase, while the second one
is referred to as the operational or testing phase. In the training phase, each
registered speaker has to provide samples of their speech so that the system can
build or train a reference model for that speaker. In speaker verification
systems, a speaker-specific threshold is additionally computed from the
training samples. In the testing phase, the input speech is matched with the stored
reference model(s) and a recognition decision is made.
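The two phases, and the difference between identification and verification, can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the method used in this report: the reference model is assumed to be a simple mean of the training feature vectors, matching is assumed to use Euclidean distance, and the names train_model, identify, verify, the speaker labels, and the threshold value are all hypothetical.

```python
import math

def train_model(feature_vectors):
    """Training phase: average a speaker's feature vectors into one
    reference vector (a placeholder model, assumed for illustration)."""
    n = len(feature_vectors)
    return [sum(column) / n for column in zip(*feature_vectors)]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(models, test_vector):
    """Identification: return the registered speaker whose model is closest."""
    return min(models, key=lambda name: distance(models[name], test_vector))

def verify(models, claimed, test_vector, threshold):
    """Verification: accept the identity claim only if the distance to the
    claimed speaker's model is within the threshold."""
    return distance(models[claimed], test_vector) <= threshold

# Enrolment with made-up two-dimensional feature vectors.
models = {
    "alice": train_model([[1.0, 2.0], [1.2, 1.8]]),
    "bob": train_model([[4.0, 0.5], [3.8, 0.7]]),
}

# Testing phase with an unknown utterance's feature vector.
test_vector = [1.0, 2.1]
best_match = identify(models, test_vector)
accept_alice = verify(models, "alice", test_vector, threshold=0.5)
accept_bob = verify(models, "bob", test_vector, threshold=0.5)
```

Identification always returns one of the registered speakers, whereas verification returns an accept or reject decision for a single claimed identity, which is why only verification needs a threshold.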
Speaker recognition is a difficult task. Automatic speaker recognition is
based on the premise that a person's speech exhibits characteristics that are
unique to the speaker. However, this task is made challenging by the high
variability of input speech signals. The principal source of variance is the
speaker himself/herself. Speech signals in training and testing sessions can be
greatly different due to many factors, such as changes in a person's voice over
time, health conditions (e.g. the speaker has a cold), speaking rates, and so on.
There are also other factors, beyond speaker variability, that present a
challenge to speaker recognition technology. Examples of these are acoustical
noise and variations in recording environments (e.g. the speaker uses a
different telephone handset).
Data processing or Speech Feature Extraction
1. Introduction
The purpose of this module is to convert the speech waveform, using digital
signal processing (DSP) tools, to a set of features (at a considerably lower
information rate) for further analysis. This is often referred to as the
signal-processing front end. The speech signal is a slowly time-varying signal
(it is called quasi-stationary). An example of a speech signal is shown in Figure 2.
When examined over a sufficiently short period of time (between 5 and 100
msec), its characteristics are fairly stationary. However, over long periods of
time (on the order of 1/5 of a second or more) the signal characteristics change to
reflect the different speech sounds being spoken. Therefore, short-time spectral
analysis is the most common way to characterize the speech signal.
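As a rough illustration of short-time spectral analysis, the Python sketch below splits a signal into short overlapping frames, applies a window to each frame, and computes a magnitude spectrum per frame. The frame length (64 samples, i.e. 8 msec at an assumed 8 kHz sampling rate), the hop size, the Hamming window, and the synthetic 1 kHz test tone are all assumptions chosen for the example, and the naive DFT is used only to keep the code dependency-free; a practical system would use an FFT routine.

```python
import cmath
import math

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames of frame_len samples."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n, used to taper the frame edges."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def magnitude_spectrum(frame):
    """Magnitude of the DFT for bins 0..N/2 (naive O(N^2) implementation)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def short_time_spectra(signal, frame_len, hop):
    """Return one windowed magnitude spectrum per frame."""
    window = hamming(frame_len)
    return [magnitude_spectrum([w * x for w, x in zip(window, frame)])
            for frame in frame_signal(signal, frame_len, hop)]

# Synthetic test signal: a 1 kHz tone sampled at 8 kHz (assumed values).
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(256)]

spectra = short_time_spectra(tone, frame_len=64, hop=32)

# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
peak_hz = peak_bin * fs / 64
```

Because each frame is short enough for the signal to be treated as stationary, the sequence of per-frame spectra tracks how the spectral content changes from one speech sound to the next.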