
ECE 2006
DIGITAL SIGNAL PROCESSING

Final Report

By:
• AAKASH SARAWGI 17BEC0918
Slot: G1+TG1

Topic:
Speaker Recognition System Based on MFCC
Table of Contents:
1. Abstract
2. Introduction
3. Data Processing or Speech Feature Extraction
4. Feature Matching
5. Result and Conclusion
6. Acknowledgement
7. References
Abstract:
• Speaker recognition is the process of automatically recognizing who
is speaking on the basis of individual information included in speech
waves. This technique makes it possible to use the speaker's voice to
verify their identity and control access to services such as voice
dialing, banking by telephone, telephone shopping, database access
services, information services, voice mail, security control for
confidential information areas, and remote access to computers. This
document describes how to build a simple, yet complete and
representative automatic speaker recognition system. Such a speaker
recognition system has potential in many security applications. For
example, users have to speak a PIN (Personal Identification Number)
in order to gain access to the laboratory door, or users have to speak
their credit card number over the telephone line to verify their
identity. By checking the voice characteristics of the input utterance,
using an automatic speaker recognition system similar to the one that
we will describe, the system is able to add an extra level of security.
Introduction:
• Speaker recognition can be classified into identification and verification.
Speaker identification is the process of determining which registered speaker
provides a given utterance. Speaker verification, on the other hand, is the
process of accepting or rejecting the identity claim of a speaker. Figure 1
shows the basic structures of speaker identification and verification systems.
The system that we will describe is classified as a text-independent speaker
identification system, since its task is to identify the person speaking
regardless of what is being said. At the highest level, all speaker recognition
systems contain two main modules (refer to Figure 1): feature extraction and
feature matching. Feature extraction is the process that extracts a small amount
of data from the voice signal that can later be used to represent each speaker.
Feature matching involves the actual procedure to identify the unknown
speaker by comparing extracted features from his/her voice input with the ones
from a set of known speakers.
All speaker recognition systems have to serve two distinct phases. The
first is referred to as the enrolment or training phase, while the second is
referred to as the operational or testing phase. In the training phase, each
registered speaker has to provide samples of their speech so that the system can
build or train a reference model for that speaker. In the case of speaker
verification systems, a speaker-specific threshold is additionally computed from
the training samples. In the testing phase, the input speech is matched with the
stored reference model(s) and a recognition decision is made.
Speaker recognition is a difficult task. Automatic speaker recognition works
on the premise that a person's speech exhibits characteristics that are
unique to the speaker. However, this task is challenged by the high
variability of input speech signals. The principal source of variance is the
speaker himself/herself. Speech signals in training and testing sessions can be
greatly different due to many factors: a person's voice changes with time, health
conditions (e.g. the speaker has a cold), speaking rates, and so on. There are
also other factors, beyond speaker variability, that present a challenge to
speaker recognition technology. Examples of these are acoustical noise and
variations in recording environments (e.g. speaker uses different telephone
handsets).
Data Processing or Speech Feature Extraction
1. Introduction
The purpose of this module is to convert the speech waveform, using digital
signal processing (DSP) tools, to a set of features (at a considerably lower
information rate) for further analysis. This is often referred to as the signal-
processing front end. The speech signal is a slowly time-varying signal (it is
called quasi-stationary). An example of a speech signal is shown in Figure 2.
When examined over a sufficiently short period of time (between 5 and 100
msec), its characteristics are fairly stationary. However, over longer periods of
time (on the order of 1/5 second or more) the signal characteristics change to
reflect the different speech sounds being spoken. Therefore, short-time spectral
analysis is the most common way to characterize the speech signal.
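As a concrete illustration of short-time analysis, the waveform can be split into short overlapping frames so that each frame is approximately stationary. The sketch below is in Python (the report's own experiments used MATLAB); the frame length and step sizes are illustrative assumptions, not values taken from the report.

```python
import math

def frame_signal(signal, frame_len, frame_step):
    """Split a sampled signal into short overlapping frames.

    frame_len and frame_step are in samples; e.g. at 8 kHz,
    a 25 ms frame with a 10 ms step is frame_len=200, frame_step=80.
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_step):
        frames.append(signal[start:start + frame_len])
    return frames

# toy example: 1 s of a 100 Hz sine sampled at 8 kHz
fs = 8000
signal = [math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]
frames = frame_signal(signal, frame_len=200, frame_step=80)  # 25 ms / 10 ms
```

Each frame is then analysed independently, which is exactly what "short-time spectral analysis" refers to.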

A wide range of possibilities exists for parametrically representing the speech
signal for the speaker recognition task, such as Linear Prediction Coding
(LPC), Mel-Frequency Cepstrum Coefficients (MFCC), and others. MFCC is
perhaps the best known and most popular, and will be described in this paper.
MFCCs are based on the known variation of the human ear's critical
bandwidths with frequency: filters spaced linearly at low frequencies and
logarithmically at high frequencies have been used to capture the phonetically
important characteristics of speech. This is expressed in the mel-frequency
scale, which has linear frequency spacing below 1000 Hz and logarithmic
spacing above 1000 Hz.
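The mel scale described above is commonly approximated by a simple closed-form mapping. The Python sketch below shows one widely used approximation (the report does not specify which formula it used, so this is an assumption):

```python
import math

def hz_to_mel(f_hz):
    """Common mel-scale approximation: roughly linear below ~1 kHz,
    logarithmic above, with mel(1000 Hz) close to 1000 mels."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used when placing mel filterbank centre frequencies."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

With this mapping, filterbank centres chosen uniformly in mels are spaced roughly linearly below 1000 Hz and logarithmically above it, matching the description in the text.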

2. Mel-frequency cepstrum coefficients processor


A block diagram of the structure of an MFCC processor is given in Figure 3.
The speech input is typically recorded at a sampling rate above 10000 Hz. This
sampling frequency is chosen to minimize the effects of aliasing in the
analog-to-digital conversion. Such sampled signals can capture all frequencies
up to 5 kHz, which covers most of the energy of sounds generated by humans.
As discussed previously, the main purpose of the MFCC processor is to
mimic the behaviour of the human ear. In addition, MFCCs have been shown to
be less susceptible than the raw speech waveforms to the variations mentioned
above.
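One standard step inside an MFCC front end, after framing, is to taper each frame with a window (commonly a Hamming window) before computing its spectrum. The Python sketch below illustrates this step; it is a generic MFCC-front-end detail, not code taken from the report:

```python
import math

def hamming(n_samples):
    """Hamming window coefficients for one analysis frame."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def window_frame(frame):
    """Taper a frame before the FFT to reduce spectral leakage
    caused by cutting the signal into finite-length blocks."""
    w = hamming(len(frame))
    return [s * wn for s, wn in zip(frame, w)]
```

The windowed frames would then pass through the FFT, mel filterbank, log, and DCT stages of the processor shown in Figure 3.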
Feature Matching
The problem of speaker recognition belongs to the much broader topic of pattern
recognition in science and engineering. The goal of pattern recognition is to
classify objects of interest into one of a number of categories or classes. The objects
of interest are generically called patterns, and in our case are sequences of acoustic
vectors that are extracted from an input speech signal using the techniques described
in the previous section. The classes here refer to individual speakers. Since the
classification procedure in our case is applied to extracted features, it can also be
referred to as feature matching. Furthermore, if there exists a set of patterns
whose individual classes are already known, then one has a problem in
supervised pattern recognition. These patterns comprise the training set and are used
to derive a classification algorithm. The remaining patterns are then used to test the
classification algorithm; these patterns are collectively referred to as the test set. If
the correct classes of the individual patterns in the test set are also known, then one
can evaluate the performance of the algorithm. State-of-the-art feature
matching techniques used in speaker recognition include Dynamic Time Warping
(DTW), Hidden Markov Modelling (HMM), and Vector Quantization (VQ). In this
project, the VQ approach is used, due to its ease of implementation and high
accuracy. VQ is a process of mapping vectors from a large vector space to a finite
number of regions in that space. Each region is called a cluster and can be
represented by its centre, called a codeword. The collection of all codewords is
called a codebook.

Figure 5 shows a conceptual diagram illustrating this recognition process. In the
figure, only two speakers and two dimensions of the acoustic space are shown. The
circles refer to the acoustic vectors from speaker 1, while the triangles are from
speaker 2. In the training phase, a speaker-specific VQ codebook is generated for
each known speaker by clustering his/her training acoustic vectors. The resulting
codewords (centroids) are shown in Figure 5 as black circles and black triangles
for speakers 1 and 2, respectively. The distance from a vector to the closest
codeword of a codebook is called the VQ distortion. In the recognition phase, an
input utterance from an unknown voice is "vector-quantized" using each trained
codebook and the total VQ distortion is computed. The speaker corresponding to
the VQ codebook with the smallest total distortion is identified as the speaker of
the input utterance.
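The decision rule in the previous paragraph can be sketched in a few lines of Python (the report's implementation was in MATLAB; the speaker names and toy codebooks below are purely illustrative):

```python
def vq_distortion(vectors, codebook):
    """Average squared distance from each acoustic vector to its
    nearest codeword; this is the total VQ distortion, normalized."""
    total = 0.0
    for v in vectors:
        total += min(sum((a - b) ** 2 for a, b in zip(v, cw)) for cw in codebook)
    return total / len(vectors)

def identify_speaker(test_vectors, codebooks):
    """codebooks: dict mapping speaker name -> trained codebook.
    Returns the speaker whose codebook yields the smallest distortion."""
    return min(codebooks, key=lambda s: vq_distortion(test_vectors, codebooks[s]))

# toy 2-D example with hypothetical speakers
codebooks = {"spk1": [(0.0, 0.0), (1.0, 1.0)],
             "spk2": [(5.0, 5.0), (6.0, 6.0)]}
```

An utterance whose vectors lie near a speaker's codewords produces a small distortion against that speaker's codebook and is labelled accordingly.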
RESULT AND CONCLUSION
• We have speech data from 11 speakers, all speaking the same word, "one". We
processed this data with the method described above to obtain a codebook for
each speaker, which we use as a reference for matching. After saving these
codebooks, we took a second set of recordings from the same speakers and ran
them through our MATLAB test function to check whether our code and process
could identify them. Our system was able to detect and identify every speaker
with good accuracy. This method of feature extraction is accurate and useful for
security applications such as PIN verification and the other purposes stated
above. We can therefore build a database of voice data from various users and
use it for identification, which considerably improves security. Hence this
MFCC-based method should be applied in various areas for identification; for
this task it proved a better choice for recognition than an HMM model.
Acknowledgement:
The satisfaction that accompanies the successful completion of any
task would be incomplete without mention of the people whose ceaseless
cooperation made it possible, and whose constant guidance and encouragement
crown all efforts with success.

I am grateful to my DIGITAL SIGNAL PROCESSING faculty, Prof.
ABHIJIT BHOWMIK, for his constant support, guidance, inspiration and
constructive suggestions that helped me in the preparation of this report.
References:
1. Preliminary design of an ASR system, University of Maryland Eastern Shore.
2. Speech coding and recognition, University of Copenhagen.
3. Human-computer interface for the Kinyarwanda language.
4. Hearing-aid systems for hearing-impaired people.
5. Algorithms for speech recognition and simulation in MATLAB, University of Gävle.
6. Control of devices through voice recognition using MATLAB.
7. www.mathwork.com
8. www.cryptography.com
