Abstract
In this paper, we present evaluation results for our proposed text-independent speaker recognition method based on the Earth Mover’s Distance (EMD), obtained on the ISCSLP2006 Chinese speaker recognition evaluation corpus developed by the Chinese Corpus Consortium (CCC). EMD-based speaker recognition (EMD-SR) was originally designed for a distributed speaker identification system, in which feature vectors are compressed by vector quantization at a terminal and sent to a server that performs the pattern matching. In this structure, speaker models have to be trained on quantized data, so we adopted a non-parametric speaker model and EMD. In experiments on a Japanese speech corpus, EMD-SR was more robust to quantized data than the conventional GMM technique. Moreover, it achieved higher accuracy than the GMM even when the data were not quantized. We therefore took up the ISCSLP2006 speaker recognition evaluation using EMD-SR. Since the identification tasks defined in the evaluation are open-set, we introduce a new speaker verification module in this paper. The evaluation results show that EMD-SR achieves a 99.3% Identification Correctness Rate in a closed-channel speaker identification task.
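As an illustration only, the following minimal Python sketch shows the general idea of open-set identification with an EMD-style distance: the closest enrolled speaker is found first, and the claim is accepted only if the distance falls below a threshold. It is not the authors' implementation. The function names (emd_uniform, identify_open_set), the toy 2-D feature vectors, the uniform weights, the equal set sizes and the threshold value are all assumptions made for this example. With uniform weights and equal-size sets, the optimal transport plan is a one-to-one matching, so a brute-force search over permutations is enough for tiny sets.

```python
from itertools import permutations
from math import dist, inf


def emd_uniform(a, b):
    """EMD between two equal-size point sets with uniform weights.

    Assumption for this sketch: every point carries weight 1/n, so the
    optimal flow is a one-to-one matching. Brute force is only feasible
    for very small n.
    """
    if not a or len(a) != len(b):
        raise ValueError("point sets must be non-empty and of equal size")
    n = len(a)
    # Try every one-to-one assignment and keep the cheapest total cost.
    best = min(
        sum(dist(a[i], b[p[i]]) for i in range(n))
        for p in permutations(range(n))
    )
    return best / n  # normalise by total weight


def identify_open_set(test, models, threshold):
    """Return (speaker_id or None, distance of the closest model).

    The closest enrolled speaker is accepted only if its distance does
    not exceed the (hypothetical) threshold; otherwise the test data is
    rejected as coming from an unknown speaker.
    """
    best_id, best_d = None, inf
    for speaker_id, model in models.items():
        d = emd_uniform(test, model)
        if d < best_d:
            best_id, best_d = speaker_id, d
    if best_d > threshold:
        return None, best_d
    return best_id, best_d


# Toy enrolled models: two speakers, two centroids each (made-up values).
models = {
    "A": [(0.0, 0.0), (1.0, 0.0)],
    "B": [(5.0, 5.0), (6.0, 5.0)],
}
known = identify_open_set([(0.1, 0.0), (1.1, 0.0)], models, threshold=0.5)
unknown = identify_open_set([(20.0, 20.0), (21.0, 20.0)], models, threshold=0.5)
```

In this toy setup the first test set is attributed to speaker "A", while the second is rejected because even its closest model lies far beyond the threshold. A realistic system would use weighted signatures of different sizes and a general transportation-problem solver in place of the permutation search.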
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kuroiwa, S., Tsuge, S., Kita, M., Ren, F. (2006). Evaluation of EMD-Based Speaker Recognition Using ISCSLP2006 Chinese Speaker Recognition Evaluation Corpus. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_56
DOI: https://doi.org/10.1007/11939993_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science (R0)