Multilingual Voice Impersonation Dataset and Evaluation

Mandalapu, Hareesh; Ramachandra, Raghavendra; Busch, Christoph

doi:10.1007/978-3-030-71711-7_15


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1382)

Included in the following conference series:

International Conference on Intelligent Technologies and Applications


Abstract

Well-known vulnerabilities of voice-based biometrics are impersonation, replay attacks, artificial signals/speech synthesis, and voice conversion. Among these, voice impersonation is the most obvious and simplest attack to perform. Although impersonation by amateurs is not considered a severe threat to automatic speaker verification (ASV) systems, studies show that professional impersonators can successfully influence the performance of voice-based biometric systems. In this work, we created a novel voice impersonation attack dataset and studied the impact of voice impersonation on ASV systems. The dataset, consisting of celebrity speeches in three different languages together with their impersonations, was acquired from YouTube. The vulnerability of speaker verification to impersonation is observed in all three languages, for both the classical i-vector-based method and the deep neural network (DNN)-based x-vector method.
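The abstract reports that both i-vector and x-vector verifiers are vulnerable to impersonation but does not state here how that vulnerability is scored. The sketch below shows one common approach. It rests on three assumptions. Embeddings are compared by cosine similarity, although the chapter's systems may use a different backend such as PLDA. The decision threshold is fixed at the equal error rate (EER) on bona fide genuine and zero-effort impostor trials. The vulnerability figure is the fraction of impersonation trials that clear that threshold, which ISO/IEC 30107-3 calls the impostor attack presentation match rate (IAPMR). All function names and scores are hypothetical and are not data from the dataset.

```python
import numpy as np


def cosine_score(enroll: np.ndarray, test: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings (e.g. i-vectors or x-vectors)."""
    return float(np.dot(enroll, test) / (np.linalg.norm(enroll) * np.linalg.norm(test)))


def eer_threshold(genuine: np.ndarray, impostor: np.ndarray) -> tuple[float, float]:
    """Return (eer, threshold) at the candidate threshold where FRR and FAR are closest."""
    candidates = np.sort(np.concatenate([genuine, impostor]))
    best_gap, best = np.inf, (1.0, float(candidates[0]))
    for t in candidates:
        frr = np.mean(genuine < t)     # genuine trials wrongly rejected
        far = np.mean(impostor >= t)   # zero-effort impostor trials wrongly accepted
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, best = gap, (float((frr + far) / 2), float(t))
    return best


def impersonation_match_rate(attack: np.ndarray, threshold: float) -> float:
    """Fraction of impersonation trials accepted as the target speaker (IAPMR-style)."""
    return float(np.mean(attack >= threshold))


# Illustrative, made-up similarity scores (not results from the paper).
genuine = np.array([0.82, 0.75, 0.91, 0.68, 0.79])   # target speaker vs. own enrolment
impostor = np.array([0.12, 0.25, 0.31, 0.05, 0.40])  # other speakers, no imitation
attack = np.array([0.45, 0.71, 0.38, 0.66, 0.52])    # impersonator vs. target enrolment

eer, threshold = eer_threshold(genuine, impostor)
rate = impersonation_match_rate(attack, threshold)
print(f"EER={eer:.2f}, threshold={threshold:.2f}, impersonation accepted={rate:.0%}")
# prints: EER=0.00, threshold=0.68, impersonation accepted=20%
```

Fixing the threshold on bona fide trials first, and only then scoring the attacks, reflects how a deployed verifier would be tuned without knowledge of impersonators. A rising acceptance rate for impersonation trials at that fixed operating point is what "vulnerability" means in this setting.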


Notes

1. VoxCeleb Models: http://kaldi-asr.org/models/m7
2. Kaldi GitHub: https://github.com/kaldi-asr/kaldi


Author information

Authors and Affiliations

Norwegian University of Science and Technology, 2815, Gjøvik, Norway
Hareesh Mandalapu, Raghavendra Ramachandra & Christoph Busch


Corresponding authors

Correspondence to Hareesh Mandalapu, Raghavendra Ramachandra or Christoph Busch.

Editor information

Editors and Affiliations

NTNU, Gjøvik, Norway
Sule Yildirim Yayilgan
The Islamia University of Bahawalpur, Punjab, Pakistan
Imran Sarwar Bajwa
University of Agder, Kristiansand, Norway
Filippo Sanfilippo


About this paper

Cite this paper

Mandalapu, H., Ramachandra, R., Busch, C. (2021). Multilingual Voice Impersonation Dataset and Evaluation. In: Yildirim Yayilgan, S., Bajwa, I.S., Sanfilippo, F. (eds) Intelligent Technologies and Applications. INTAP 2020. Communications in Computer and Information Science, vol 1382. Springer, Cham. https://doi.org/10.1007/978-3-030-71711-7_15


DOI: https://doi.org/10.1007/978-3-030-71711-7_15
Published: 15 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71710-0
Online ISBN: 978-3-030-71711-7
eBook Packages: Computer Science, Computer Science (R0)
