DOI: 10.1145/3658664.3659656

Is Audio Spoof Detection Robust to Laundering Attacks?

Published: 24 June 2024

Abstract

Voice-cloning (VC) systems have seen an exceptional increase in the realism of synthesized speech in recent years. The high quality of synthesized speech and the availability of low-cost VC services have given rise to many potential abuses of this technology. Several detection methodologies have been proposed over the years that can detect voice spoofs with reasonably good accuracy. However, these methodologies are mostly evaluated on clean audio databases, such as ASVspoof 2019. This paper evaluates state-of-the-art (SOTA) audio spoof detection approaches in the presence of laundering attacks. To that end, a new laundering attack database, called the ASVspoof Laundered Database, is created. This database is based on the ASVspoof 2019 (LA) eval database and comprises a total of 1388.22 hours of audio recordings. Seven SOTA audio spoof detection approaches are evaluated on this laundered database. The results indicate that SOTA systems perform poorly in the presence of aggressive laundering attacks, especially reverberation and additive noise attacks, suggesting the need for more robust audio spoof detection.
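The abstract does not spell out how the laundering attacks are implemented; as a rough illustration of how additive-noise and reverberation laundering are commonly simulated, the minimal sketch below mixes a noise signal into an utterance at a chosen SNR and convolves the utterance with a room impulse response. The SNR value, the synthetic impulse response, and the function names are illustrative assumptions, not the authors' actual pipeline (which, per the abstract, operates on the ASVspoof 2019 LA eval data).

    import numpy as np
    from scipy.signal import fftconvolve

    def add_noise_at_snr(clean, noise, snr_db):
        """Mix additive noise into a clean utterance at a target SNR (in dB)."""
        noise = np.resize(noise, clean.shape)        # loop/trim noise to the utterance length
        p_clean = np.mean(clean ** 2)
        p_noise = np.mean(noise ** 2) + 1e-12
        scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10.0)))
        return clean + scale * noise

    def add_reverb(clean, rir):
        """Convolve the utterance with a room impulse response and renormalize."""
        wet = fftconvolve(clean, rir)[: len(clean)]
        return wet / (np.max(np.abs(wet)) + 1e-12)

    # Toy example with synthetic stand-ins; a real pipeline would load ASVspoof 2019
    # LA eval utterances plus recorded noise and measured/simulated RIRs instead.
    fs = 16000
    utterance = 0.1 * np.random.randn(3 * fs)        # placeholder for a speech clip
    babble = np.random.randn(3 * fs)                 # placeholder for a noise recording
    rir = np.exp(-np.linspace(0, 8, fs // 2)) * np.random.randn(fs // 2)  # crude synthetic RIR

    laundered_noisy = add_noise_at_snr(utterance, babble, snr_db=10)
    laundered_reverberant = add_reverb(utterance, rir)

A full laundering pipeline of this kind would typically sweep several SNR levels, noise types, and room configurations to produce the range of mild-to-aggressive conditions the paper reports on.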


Published In

IH&MMSec '24: Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security
June 2024
305 pages
ISBN: 9798400706370
DOI: 10.1145/3658664


Publisher

Association for Computing Machinery, New York, NY, United States



Author Tags

  1. audio antispoofing
  2. audio deepfakes
  3. audio forensics
  4. machine learning

Qualifiers

  • Short-paper

Conference

IH&MMSec '24

Acceptance Rates

Overall Acceptance Rate: 128 of 318 submissions (40%)
