Abstract
Well-known vulnerabilities of voice-based biometrics include impersonation, replay attacks, artificial signals/speech synthesis, and voice conversion. Among these, voice impersonation is the most obvious and simplest attack to perform. Although voice impersonation by amateurs is not considered a severe threat to automatic speaker verification (ASV) systems, studies show that professional impersonators can successfully affect the performance of voice-based biometric systems. In this work, we create a novel voice impersonation attack dataset and study the impact of voice impersonation on ASV systems. The dataset consists of celebrity speeches in three different languages, together with their impersonations, all acquired from YouTube. Vulnerability of speaker verification is observed in all three languages for both the classical i-vector-based method and the deep neural network-based x-vector method.
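One common way to quantify the kind of vulnerability described above is to compare, at a fixed decision threshold, how often a verification system accepts ordinary (zero-effort) impostor trials and how often it accepts impersonation trials. The following Python sketch is a hypothetical illustration and not the authors' evaluation code: the function name `acceptance_rate`, the threshold, and all scores are assumptions invented for the example.

```python
def acceptance_rate(scores, threshold):
    """Return the fraction of trials whose score meets or exceeds the threshold."""
    return sum(score >= threshold for score in scores) / len(scores)


# Toy similarity scores, invented for illustration only.
# They are NOT results from the chapter.
zero_effort_scores = [-3.1, -2.4, -1.8, -0.9, 0.2]    # unrelated impostor speakers
impersonation_scores = [-1.2, -0.3, 0.4, 0.9, 1.5]    # impersonators of the target

# Hypothetical operating point; in practice it would be fixed on
# zero-effort impostor trials before scoring any attack trials.
threshold = 0.0

zero_effort_rate = acceptance_rate(zero_effort_scores, threshold)
impersonation_rate = acceptance_rate(impersonation_scores, threshold)

print(f"Zero-effort impostors accepted: {zero_effort_rate:.0%}")
print(f"Impersonation trials accepted:  {impersonation_rate:.0%}")
```

If impersonation trials are accepted noticeably more often than zero-effort impostor trials at the same threshold, the system is more vulnerable to impersonation than its baseline error rate suggests.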
Notes
1. VoxCeleb Models: http://kaldi-asr.org/models/m7
2. Kaldi GitHub: https://github.com/kaldi-asr/kaldi
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mandalapu, H., Ramachandra, R., Busch, C. (2021). Multilingual Voice Impersonation Dataset and Evaluation. In: Yildirim Yayilgan, S., Bajwa, I.S., Sanfilippo, F. (eds) Intelligent Technologies and Applications. INTAP 2020. Communications in Computer and Information Science, vol 1382. Springer, Cham. https://doi.org/10.1007/978-3-030-71711-7_15
DOI: https://doi.org/10.1007/978-3-030-71711-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71710-0
Online ISBN: 978-3-030-71711-7
eBook Packages: Computer Science, Computer Science (R0)