Histogram equalization using a reduced feature set of background speakers’ utterances for speaker recognition

Kim, Myung-jae; Yang, Il-ho; Kim, Min-seok; Yu, Ha-jin

doi:10.1631/FITEE.1500380

Published: 27 May 2017

Frontiers of Information Technology & Electronic Engineering, Volume 18, pages 738–750 (2017)

Abstract

We propose a method for histogram equalization using supplement sets to improve the performance of speaker recognition when the training and test utterances are very short. The supplement sets are derived using outputs of selection or clustering algorithms from the background speakers’ utterances. The proposed approach is used as a feature normalization method for building histograms when there are insufficient input utterance samples. In addition, the proposed method is used as an i-vector normalization method in an i-vector-based probabilistic linear discriminant analysis (PLDA) system, which is the current state-of-the-art for speaker verification. The ranks of sample values for histogram equalization are estimated in ascending order from both the input utterances and the supplement set. New ranks are obtained by computing the sum of different kinds of ranks. Subsequently, the proposed method determines the cumulative distribution function of the test utterance using the newly defined ranks. The proposed method is compared with conventional feature normalization methods, such as cepstral mean normalization (CMN), cepstral mean and variance normalization (MVN), histogram equalization (HEQ), and the European Telecommunications Standards Institute (ETSI) advanced front-end methods. In addition, performance is compared for a case in which the greedy selection algorithm is used with fuzzy C-means and K-means algorithms. The YOHO and Electronics and Telecommunications Research Institute (ETRI) databases are used in an evaluation in the feature space. The test sets are simulated by the Opus VoIP codec. We also use the 2008 National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) corpus for the i-vector system. The results of the experimental evaluation demonstrate that the average system performance is improved when the proposed method is used, compared to the conventional feature normalization methods.
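The rank-combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function name `heq_with_supplement` is hypothetical, the "sum of ranks" is taken to be a sample's ascending rank within the input utterance plus the number of supplement values that do not exceed it, the CDF estimate is `(rank - 0.5) / (N + M)`, and the reference distribution is a standard Gaussian. It is not the authors' implementation.

```python
from statistics import NormalDist

import numpy as np


def heq_with_supplement(features, supplement):
    """Rank-based histogram equalization with a supplement set (illustrative sketch).

    features:   (N, D) feature frames of a short input utterance.
    supplement: (M, D) reduced feature set from background speakers
                (for example, cluster centroids); an assumed input format.
    Returns an (N, D) array mapped toward a standard Gaussian, per dimension.
    """
    features = np.asarray(features, dtype=float)
    supplement = np.asarray(supplement, dtype=float)
    n, d = features.shape
    m = supplement.shape[0]
    inv_cdf = np.vectorize(NormalDist().inv_cdf)  # inverse standard normal CDF

    out = np.empty_like(features)
    for k in range(d):
        x = features[:, k]
        # Ascending rank (1..N) of each sample within the input utterance.
        # Ties receive distinct ranks in order of appearance.
        rank_in = np.argsort(np.argsort(x, kind="stable"), kind="stable") + 1
        # Number of supplement values <= each sample (0..M).
        rank_sup = np.searchsorted(np.sort(supplement[:, k]), x, side="right")
        # Assumed combination: sum the two ranks, then normalize by the
        # pooled sample count to estimate the CDF strictly inside (0, 1).
        cdf = (rank_in + rank_sup - 0.5) / (n + m)
        out[:, k] = inv_cdf(cdf)
    return out


# Usage: three one-dimensional frames supported by two supplement values.
frames = [[0.0], [1.0], [2.0]]
background = [[0.5], [1.5]]
equalized = heq_with_supplement(frames, background)
```

With no supplement values, the sketch reduces to conventional rank-based equalization over the utterance alone. The supplement set adds samples to the rank estimate, which is the role the abstract assigns to it when the input utterance is too short to build a reliable histogram.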

Author information

Authors and Affiliations

School of Computer Science, University of Seoul, Seoul, 02504, Korea
Myung-jae Kim, Il-ho Yang, Min-seok Kim & Ha-jin Yu

Corresponding author

Correspondence to Ha-jin Yu.

Additional information

Project supported by the IT R&D Program of MOTIE/KEIT (No. 10041610)

ORCID: Ha-Jin YU, http://orcid.org/0000-0003-3657-0665

About this article

Cite this article

Kim, Mj., Yang, Ih., Kim, Ms. et al. Histogram equalization using a reduced feature set of background speakers’ utterances for speaker recognition. Frontiers Inf Technol Electronic Eng 18, 738–750 (2017). https://doi.org/10.1631/FITEE.1500380

Received: 03 November 2015
Accepted: 18 April 2016
Published: 27 May 2017
Issue Date: May 2017
DOI: https://doi.org/10.1631/FITEE.1500380

CLC number

TN912.34
