
Histogram equalization using a reduced feature set of background speakers’ utterances for speaker recognition

Frontiers of Information Technology & Electronic Engineering

Abstract

We propose a method for histogram equalization using supplement sets to improve the performance of speaker recognition when the training and test utterances are very short. The supplement sets are derived using outputs of selection or clustering algorithms from the background speakers’ utterances. The proposed approach is used as a feature normalization method for building histograms when there are insufficient input utterance samples. In addition, the proposed method is used as an i-vector normalization method in an i-vector-based probabilistic linear discriminant analysis (PLDA) system, which is the current state of the art for speaker verification. The ranks of sample values for histogram equalization are estimated in ascending order from both the input utterances and the supplement set. New ranks are obtained by summing the ranks estimated from these two sources. Subsequently, the proposed method determines the cumulative distribution function of the test utterance using the newly defined ranks. The proposed method is compared with conventional feature normalization methods, such as cepstral mean normalization (CMN), cepstral mean and variance normalization (MVN), histogram equalization (HEQ), and the European Telecommunications Standards Institute (ETSI) advanced front-end methods. In addition, performance is compared for a case in which the greedy selection algorithm is used with fuzzy C-means and K-means algorithms. The YOHO and Electronics and Telecommunications Research Institute (ETRI) databases are used in an evaluation in the feature space. The test sets are simulated by the Opus VoIP codec. We also use the 2008 National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) corpus for the i-vector system. The results of the experimental evaluation demonstrate that the average system performance is improved when the proposed method is used, compared to the conventional feature normalization methods.
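
The rank-summing idea in the abstract can be illustrated with a minimal sketch for a single feature dimension. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function name `heq_with_supplement`, the 0.5 offset in the cumulative distribution estimate, and the choice of a standard normal as the target distribution. It should be read as one plausible reading of the description, not as the authors' implementation.

```python
from bisect import bisect_right
from statistics import NormalDist


def heq_with_supplement(values, supplement):
    """Equalize one feature dimension using ranks from two sources.

    values:     samples of one feature dimension from a short utterance
    supplement: samples of the same dimension from a supplement set
                (hypothetical stand-in for the reduced background set)
    """
    n, m = len(values), len(supplement)
    sorted_sup = sorted(supplement)

    # Ascending rank of each sample within the input utterance (1..n).
    order = sorted(range(n), key=lambda i: values[i])
    rank_in_utt = [0] * n
    for rank, idx in enumerate(order, start=1):
        rank_in_utt[idx] = rank

    target = NormalDist()  # assumed target: zero mean, unit variance
    equalized = []
    for i, x in enumerate(values):
        # Rank relative to the supplement set: how many of its values are <= x (0..m).
        rank_in_sup = bisect_right(sorted_sup, x)
        # New rank is the sum of the two ranks (1..n+m).
        combined = rank_in_utt[i] + rank_in_sup
        # Empirical CDF from the combined rank; the 0.5 offset keeps it inside (0, 1).
        cdf = (combined - 0.5) / (n + m)
        equalized.append(target.inv_cdf(cdf))
    return equalized


if __name__ == "__main__":
    utterance = [0.3, -1.2, 2.0]
    background = [-2.0, -0.5, 0.0, 0.5, 1.0, 3.0]
    print(heq_with_supplement(utterance, background))
```

With an empty supplement set the sketch reduces to ordinary rank-based histogram equalization over the utterance alone, which is the case the abstract identifies as unreliable when only a few samples are available.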



Author information

Corresponding author

Correspondence to Ha-jin Yu.

Additional information

Project supported by the IT R&D Program of MOTIE/KEIT (No. 10041610)

ORCID: Ha-Jin YU, http://orcid.org/0000-0003-3657-0665

About this article

Cite this article

Kim, Mj., Yang, Ih., Kim, Ms. et al. Histogram equalization using a reduced feature set of background speakers’ utterances for speaker recognition. Frontiers Inf Technol Electronic Eng 18, 738–750 (2017). https://doi.org/10.1631/FITEE.1500380

  • DOI: https://doi.org/10.1631/FITEE.1500380
