Methodology For Gender Identification, Classification and Recognition of Human Age
International Journal of Computer Applications (0975 – 8887)
National Conference on Advances in Computing (NCAC 2015)
approaches studied: the first is age group classification (senior, adult, and young) and the second is accurate age estimation using a regression technique. These two approaches use GMM supervectors as features for a classifier model. Age group classification assigns an age group to the speaker, and age regression estimates the speaker's precise age in years.

Paper [5] presents gender detection as an extremely useful task for a wide variety of voice- or speech-based applications. In the INESC-ID spoken language systems, the gender identification component is the initial and basic component of the voice processing system, where it is applied prior to speaker clustering in order to avoid mixing male and female speakers in the same cluster. Gender information (male or female) is also used to build gender-dependent acoustic models for speech recognition.

In [6], a new gender detection and age estimation approach is proposed. To develop this method, after deciding an acoustic feature model for all speakers of the sample database, Gaussian mixture weights are extracted and concatenated to build a supervector for each speaker. Then a hybrid architecture of a General Regression Neural Network (GRNN) and Weighted Supervised Non-Negative Matrix Factorization (WSNMF) is developed using the supervectors created from the training data set. The hybrid method is used to detect the speaker's gender during testing and to estimate their age. Different biometric features can be used for forensic identification; choosing a method depends on its use, the reliability required by the particular application, and the available data type. In some crime cases the available evidence might be in the form of a recorded voice, and speech patterns can carry unique and important information for law enforcement personnel.

[7] mainly focuses on enhancing emotion recognition performance through a two-stage combination of a gender recognizer and an emotion recognizer. The system is a gender-dependent, text-independent and speaker-independent emotion recognizer. Both Hidden Markov Models (HMM) and Suprasegmental Hidden Markov Models (SPHMM) are used as classifiers in the two-stage architecture, which has been evaluated on two separate speech databases: the Emotional Prosody Speech and Transcripts database and a collected human voice database.

[8] explores the detection of specific emotions using discourse and language information in combination with acoustic features of emotion in speech signals. The main focus is on detecting emotion types in spoken language data obtained from a call center application. Most previous work in emotion recognition has used only the acoustic information contained in the speech; this system combines three sources of information, lexical, acoustic and discourse, for speaker emotion recognition.

In [9], models are developed for detecting various characteristics of a speaker from the spoken text alone. These characteristics or attributes include whether the speaker is speaking a native language, the speaker's age and gender, and regional information reported by the speakers. The research explores various lexical features as well as features inspired by linguistic (language-related) information, word counts and a dictionary of affect in language. The work suggests that when audio or voice data is not available, by exploring effective feature sets from uttered text only and combining multiple classification algorithms, researchers can build statistical models to detect these speaker attributes that are comparable to frameworks which can exploit the audio information.

[10] presents speaker characteristic recognition; the identification field has made extensive use of speaker MAP adaptation techniques. Adaptation allows speaker model parameters to be estimated using less speech data than is needed for Maximum Likelihood (ML) training. Maximum Likelihood Linear Regression (MLLR) and Maximum a Posteriori (MAP) techniques have typically been used for speaker model adaptation. Recently, these adaptation techniques have been incorporated into the feature extraction stage of SVM-based speaker identification and recognition systems.

In [15], emotional speech recognition is motivated by its contribution to harmonious human-machine interaction, with many potential applications. Three approaches to combining parallel classifiers are compared for recognizing emotions on a speech database. The classifiers are applied to prosody, spectral, MFCC and other common features. One is the standard classification scheme (one versus one); the other two, the Directed Acyclic Graph (DAG) and the Unbalanced Decision Tree (UDT), form binary decision tree classifiers. A hierarchical classification technique of feature-driven hierarchical SVM classifiers is designed, which uses different feature parameters to drive each layer so that emotions can be subdivided layer by layer. Finally, an analysis of the classification rates of the three extended binary classifications shows that the DAG system performs best on the test database, the standard classifier is not far behind, and the UDT is the poorest because it relies on the order of upper-layer processing.

In [16], the extraction and matching process is implemented after signal preprocessing. A non-parametric method is used for modeling the human voice processing system, and a nonlinear sequence alignment called Dynamic Time Warping (DTW) is used as the feature matching technique. The paper presents the MFCC feature extraction technique together with a warping technique to compare the test patterns.

3. PROBLEM STATEMENT
The system first identifies the gender, then classifies the speaker's age group into a certain category, then further processes the signal to recognize the exact age, and finally displays the emotional state of the speaker together with his/her profile details stored in the database. The objective of the system is to extract features and compare them with the database to identify the gender and classify the speaker's age group; this staged approach helps to increase the system's performance and accuracy. In addition, feature selection is performed for speaker classification and matching using popular classification techniques, so that an efficient classifier can classify the speaker's characteristics. The system applies techniques such as the MFCC feature extraction algorithm, GMM modeling, an SVM classifier and a matching technique. The main issues in voice and speech processing research are achieving high efficiency and performance across different age groups and language-dependent speakers, and reducing the large dimension of the feature matrix using suitable techniques.

4. SYSTEM DESIGN
4.1 System Architecture
The system architecture is divided into two phases:
A. Training phase
B. Testing phase

Most of the operations are the same in the training phase and the testing phase, as shown in figure 1.
Training Phase
The training phase uses a large audio dataset to train the system. The MFCC feature extraction technique is applied to extract the unique features of each audio/voice file and create the feature vector, followed by GMM supervector representation, dimension reduction for each feature type, etc.

1. The system is trained over MFCC features extracted from the speech utterances of the speech sessions. The speech sessions used to train the background model should be diversified and uniformly distributed over speaker ages and genders.

2. An adaptation of the speaker model is constructed: MAP estimation is used to adapt the background model so that it represents a specific speaker. The adaptation is done using the MFCC features extracted from the speaker's session.

3. The supervector is built with the help of the GMM: each model is represented by one supervector, formed by concatenating all the M component Gaussian means,

V = (u1, u2, ..., uM)^T

where ui is the mean vector of the ith Gaussian. The training supervectors are formed using the MAP adaptation models. In the baseline system, the supervectors are used as feature vectors [1], [3], [7].
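Steps 2 and 3 can be sketched in miniature. The relevance-MAP mean update and the supervector concatenation below use made-up component means, and every frame is hard-assigned to one component instead of using posterior-weighted soft counts; this is an illustration of the formulas, not the trained models from the paper.

```python
# Illustration of steps 2 and 3: MAP-adapt each background-model
# component mean toward the speaker's data, then concatenate the
# adapted means u_1..u_M into one supervector V = (u1, ..., uM)^T.
# Simplification (not in the paper): frames are hard-assigned to
# their component rather than weighted by posterior probabilities.

def map_adapt_mean(prior_mean, frames, relevance=16.0):
    """Relevance-MAP update of one Gaussian mean:
    u = alpha * E[x] + (1 - alpha) * prior, with alpha = n / (n + r)."""
    n = len(frames)
    dim = len(prior_mean)
    emp = [sum(f[d] for f in frames) / n for d in range(dim)]
    alpha = n / (n + relevance)
    return [alpha * emp[d] + (1 - alpha) * prior_mean[d]
            for d in range(dim)]

def build_supervector(component_means):
    """Concatenate the M component means into one M*D-dimensional vector."""
    v = []
    for mean in component_means:
        v.extend(mean)
    return v

# Toy background model: M = 2 components with 2-dimensional means.
prior = [[0.0, 0.0], [4.0, 4.0]]
# Made-up speaker frames, one group per component.
frames = [[[1.0, 1.0]] * 16, [[5.0, 3.0]] * 16]

adapted = [map_adapt_mean(m, f) for m, f in zip(prior, frames)]
V = build_supervector(adapted)
print(V)  # [0.5, 0.5, 4.5, 3.5]
```

With few frames the adapted mean stays close to the background prior; with many frames it moves toward the speaker's data, which is why MAP adaptation needs less speech than ML training.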
Fig 2: Support Vector Machine

f(x) = Σ (i = 1 to n) ai ti K(X, Xi) + d

where the ti are the ideal outputs and ai > 0. The vectors Xi are the support vectors, formed from the training set by an optimization process. The classifier output is either 1 or -1, depending on whether the corresponding input support vector is in class 0 or class 1, as shown in figure 2. A class decision is based on whether the value f(x) is above or below a threshold [13], [17], [21].

The system gives different and dependent age and gender identification and recognition results for the testing audio files. The accuracy and performance of the results depend on the quality of the various kinds of training data, as shown in table 2.
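The decision function above can be evaluated directly once the support vectors Xi, multipliers ai, targets ti and bias d are given. The sketch below uses an RBF kernel and made-up values for all of these quantities, purely to illustrate the computation; it is not a trained model.

```python
import math

# Sketch of the SVM decision function shown above:
#   f(x) = sum_{i=1..n} a_i t_i K(X, X_i) + d
# Support vectors, multipliers a_i and targets t_i are made-up values.

def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq)

def svm_decision(x, support_vectors, alphas, targets, bias=0.0):
    """Evaluate f(x); the class decision compares f(x) to a threshold."""
    return sum(a * t * rbf_kernel(x, sv)
               for a, t, sv in zip(alphas, targets, support_vectors)) + bias

# Toy model: two support vectors, one per class (t_i in {+1, -1}).
svs     = [[0.0, 0.0], [2.0, 2.0]]
alphas  = [1.0, 1.0]
targets = [+1, -1]

f = svm_decision([0.1, 0.1], svs, alphas, targets)
print(1 if f >= 0 else -1)  # -> 1 (nearer the t = +1 support vector)
```

The RBF kernel makes each support vector's vote decay with distance, so f(x) is dominated by the nearby support vectors, matching the threshold-based class decision described above.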
Table 2 and figure 3 show the audio file dataset created for each age group and each type of emotion; some audio files are used for system training and the remaining samples are given for testing. The result percentage is calculated for each group individually, and age recognition depends on the gender identification. The system gives a high average accuracy and performance result of 80.3 % for gender identification and 89.3 % for age recognition.

6. CONCLUSION
The proposed system helps to identify, classify and recognize the exact speaker age together with emotion, and displays the speaker's profile using the trained database. The speaker profile is helpful in many applications, such as advertisement targeting of particular people, automatic identification of these features (age, emotion) to provide facilities and services to customers in a call center, and, in some fields, using the speaker's voice as biometric security, because each human has a unique voice pattern and unique features. The results point to a feasible way to increase the accuracy and efficiency of the system output.

As a future enhancement, the system can be extended to recognize more complicated noisy samples (.wav files). The health condition of the speaker could also be identified, individual speaker classification could be separated, and age detection may also be possible for mixed-gender speech.

7. ACKNOWLEDGMENTS
The authors would like to thank the researchers as well as publishers for making their resources available, and the teachers for their guidance. We are thankful to Ramesh M Kagalkar for his valuable guidance and constant guidelines, and also thank the computer department staff of DYPSOET, Lohegaon, Pune for their support. Finally, we would like to extend heartfelt gratitude to friends and family members.

8. REFERENCES
[1] Gil Dobry, Ron M. Hecht, Mireille Avigal and Yaniv Z, September 2011, Supervector Dimension Reduction for Efficient Speaker Age Estimation Based on the Acoustic Speech Signal, IEEE Transactions, Vol. 19, No. 7.
[2] Hugo Meinedo and Isabel Trancoso, 2008, Age and Gender Classification using Fusion of Acoustic and Prosodic Features, Spoken Language Systems Lab, INESC-ID Lisboa, Portugal, Instituto Superior Tecnico, Lisboa, Portugal.
[3] Ismail Mohd Adnan Shahin, 2013, Gender-dependent Emotion Recognition Based on HMMs and SPHMMs, Int. J. Speech Technol., Springer, 16:133-141.
[4] Mohamad Hasan Bahari and Hugo Van h, ITN2008, Speaker Age Estimation and Gender Detection Based on Supervised Non-Negative Matrix Factorization, Centre for Processing Speech and Images, Belgium.
[5] Shivaji J Chaudhari and Ramesh M Kagalkar, May 2015, Automatic Speaker Age Estimation and Gender Dependent Emotion Recognition, International Journal of Computer Applications (IJCA) (0975-8887), Volume 117, No. 17.
[6] Shivaji J. Chaudhari and Ramesh M. Kagalkar, July 2015, A Methodology for Efficient Gender Dependent Speaker Age and Emotion Identification System, International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), ISSN 2319-5940, Volume 4, Issue 7.
[7] Chul Min Lee and Shrikanth S. Narayanan, 2005, Toward Detecting Emotions in Spoken Dialogs, IEEE Transactions, 1063-6676.
[8] Tetsuya Takiguchi and Yasuo Ariki, 2006, Robust Feature Extraction Using Kernel PCA, Department of Computer System Engg., Kobe University, Japan, ICASSP, 1-4244-0469.
[9] Michael Feld, Felix Burkhardt and Christian Muller, 2010, Automatic Speaker Age and Gender Recognition in the Car for Tailoring Dialog and Mobile Services, German Research Center for Artificial Intelligence, INTERSPEECH.
[10] M A. Hossan, Sheeraz Memon and Mark A Gregory, 2010, A Novel Approach for MFCC Feature Extraction, RMIT University, Melbourne, Australia, IEEE.
[11] Ruben Solera-Ure, 2008, Real-time Robust Automatic Speech Recognition Using Compact Support Vector Machines, TEC 2008-06382 and TEC 2008-02473.
[12] Wei Han and Cheong Fat Chan, 2006, An Efficient MFCC Extraction Method in Speech Recognition, Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, 7803-9390-06/IEEE ISCAS.
[13] A U Khan and L. P. Bhaiya, 2008, Text Dependent Method for Person Identification through Voice Segment, ISSN 2277-1956, IJECSE.
[14] Felix Burkhardt, Martin Eckert, Wiebke Johannsen and Joachim Stegmann, 2010, A Database of Age and Gender Annotated Telephone Speech, Deutsche Telekom AG Laboratories, Ernst-Reuter-Platz 7, 10587 Berlin, Germany.
[15] Lingli Yu and Kaijun Zhou, March 2014, A Comparative Study on Support Vector Machine Classifiers for Emotional Speech Recognition, Immune Computation (IC), Volume 2, Number 1.
[16] Rui Martins, Isabel Trancoso, Alberto Abad and Hugo Meinedo, 2009, Detection of Children's Voices, Instituto Superior Tecnico, Lisboa, Portugal, INESC-ID Lisboa, Portugal.
[17] Chao Gao, Guruprasad Saikumar, Amit Srivastava and Premkumar Natarajan, 2011, Open-Set Speaker Identification in Broadcast News, IEEE, 978-1-4577-0539.
[18] Shivaji J Chaudhari and Ramesh M Kagalkar, 2014, A Review of Automatic Speaker Age Classification, Recognition and Identifying Speaker Emotion Using Voice Signal, International Journal of Science and Research (IJSR), ISSN (Online): 2319-7064, Volume 3.
[19] M Ferras, C C Leung, C Barras and Jean-Luc Gauvain, 2010, Comparison of Speaker Adaptation Methods as Feature Extraction for SVM-Based Speaker Recognition, IEEE Transactions, 1558-7916.
[20] Chao Gao, Guruprasad Saikumar, Amit Srivastava and Premkumar Natarajan, 2011, Open-Set Speaker Identification in Broadcast News, IEEE, 978-1-4577-0539.
[21] Chao Wang, Ruifei Zhu, Hongguang Jia, Qun Wei, Huhai Jiang, Tianyi Zhang and Linyao Yu, 2013, Design of Speech Recognition System, IEEE, 978-1-4673-2764-0/13.
[22] Manan Vyas, 2013, "Gaussian Mixture Model Based Speech Recognition System Using Matlab", Signal and Image Processing: An International Journal (SIPIJ), Vol. 4, No. 4.

9. AUTHORS PROFILE
Shivaji J Chaudhari is a Research Scholar at Dr. D. Y. Patil School of Engineering and Technology, Charoli, B.K. Via Lohegaon, Pune, Maharashtra, India, University of Pune. He received the B.E. in Information Technology from SVPM COE Malegaon, Baramati, Pune University, and has completed the M.E. in Computer Network from Dr. D. Y. Patil School of Engineering & Technology, Pune, University of Pune.

Prof. Ramesh M. Kagalkar was born on 1 June 1979 in Karnataka, India, and is presently working as an Assistant Professor in the Department of Computer Engineering, Dr. D. Y. Patil School of Engineering and Technology, Charoli, B.K. Via Lohegaon, Pune, Maharashtra, India. He has 13.5 years of teaching experience at various institutions. He is a Research Scholar at Visveswaraiah Technological University, Belgaum. He obtained the M.Tech (CSE) degree in 2006 from VTU Belgaum and received the BE (CSE) degree in 2001 from Gulbarga University, Gulbarga. He is the author of the textbook Advance Computer Architecture, which covers the final-year computer science and engineering syllabus of Visveswaraiah Technological University, Belgaum. One of his research articles, "A Novel Approach for Privacy Preserving", has been considered as a text by LAMBERT Academic Publishing, Germany (available online). He is awaiting submission of two research articles for patent rights. He has published more than 25 research papers in international journals and presented some of them at international conferences. His main research interests include image processing, gesture recognition, speech processing, voice to sign language and CBIR. Under his guidance, four ME students have been awarded degrees at SPPU, Pune, five students are close to completing their ME final dissertation reports, and two students have started new research work and published their research papers in international journals and at international conferences.