DOI: 10.1145/3460120.3484755

Robust Detection of Machine-induced Audio Attacks in Intelligent Audio Systems with Microphone Array

Published: 13 November 2021

Abstract

With the popularity of intelligent audio systems in recent years, their vulnerabilities have become an increasing public concern. Existing studies have designed a set of machine-induced audio attacks, such as replay attacks, synthesis attacks, hidden voice commands, inaudible attacks, and audio adversarial examples, which could expose users to serious security and privacy threats. To defend against these attacks, existing efforts have treated them individually. While these defenses have yielded reasonably good performance in certain cases, they can hardly be combined into an all-in-one solution that can be deployed on audio systems in practice. Additionally, modern intelligent audio devices, such as Amazon Echo and Apple HomePod, usually come equipped with microphone arrays for far-field voice recognition and noise reduction. Existing defense strategies have focused on single- and dual-channel audio, while only a few studies have explored using multi-channel microphone arrays to defend against specific types of audio attacks. Motivated by the lack of systematic research on defending against miscellaneous audio attacks and the potential benefits of multi-channel audio, this paper builds a holistic solution for detecting machine-induced audio attacks leveraging multi-channel microphone arrays on modern intelligent audio systems. Specifically, we utilize magnitude and phase spectrograms of multi-channel audio to extract spatial information and leverage a deep learning model to detect the fundamental difference between human speech and adversarial audio generated by playback machines. Moreover, we adopt an unsupervised domain adaptation training framework to further improve the model's generalizability in new acoustic environments. Evaluation is conducted under various settings on a public multi-channel replay attack dataset and a self-collected multi-channel audio attack dataset involving 5 types of advanced audio attacks.
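As a rough illustration of the feature idea described above (magnitude and phase spectrograms computed per microphone channel), the following sketch stacks a log-magnitude and a phase spectrogram for each channel. It is not the paper's implementation: the frame length, hop size, Hann window, log compression, and the function names `stft` and `multichannel_features` are all assumptions chosen for the example, and a naive DFT is used only to keep it dependency-free.

```python
import cmath
import math


def stft(signal, frame_len=64, hop=32):
    """Short-time Fourier transform with a periodic Hann window.

    Returns a list of frames; each frame holds the complex DFT bins
    0 .. frame_len // 2. Frame length and hop are illustrative values.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


def multichannel_features(channels, frame_len=64, hop=32, eps=1e-8):
    """Stack per-channel log-magnitude and phase spectrograms.

    Output layout (an assumption of this sketch):
    [mag_ch0, phase_ch0, mag_ch1, phase_ch1, ...],
    each entry shaped (n_frames, n_bins).
    """
    feats = []
    for ch in channels:
        spec = stft(ch, frame_len, hop)
        feats.append([[math.log(abs(c) + eps) for c in frame] for frame in spec])
        feats.append([[cmath.phase(c) for c in frame] for frame in spec])
    return feats


# Toy two-microphone recording: the same tone reaches mic2 two samples later.
N, K, DELAY = 64, 4, 2
mic1 = [math.sin(2 * math.pi * K * n / N) for n in range(128)]
mic2 = [math.sin(2 * math.pi * K * (n - DELAY) / N) for n in range(128)]
feats = multichannel_features([mic1, mic2])

# The inter-channel phase gap at the tone's bin encodes the arrival delay.
phase_gap = feats[3][0][K] - feats[1][0][K]
expected_gap = -2 * math.pi * K * DELAY / N
```

In this toy case the phase difference between the two channels at the tone's frequency bin matches the delay-induced shift `expected_gap`, which is the kind of spatial cue a multi-channel model can pick up from stacked phase spectrograms.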
The results show that our method can achieve an equal error rate (EER) as low as 6.6% in detecting a variety of machine-induced attacks. Even in new acoustic environments, our method can still achieve an EER as low as 8.8%.
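The EER reported above is the operating point at which the false acceptance rate (attacks accepted as genuine) equals the false rejection rate (genuine speech rejected). The sketch below computes it from detector scores by scanning candidate thresholds. The function name `equal_error_rate`, the convention that higher scores mean "more likely genuine", and the toy scores are assumptions made for illustration; they are not taken from the paper.

```python
def equal_error_rate(genuine_scores, attack_scores):
    """Return (eer, threshold) for a detector whose higher scores mean 'genuine'.

    Scans every observed score as a threshold and picks the one where the
    false acceptance and false rejection rates are closest; the EER is
    reported as their average at that threshold.
    """
    best = None
    for t in sorted(set(genuine_scores) | set(attack_scores)):
        # False rejection: genuine sample scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: attack sample scored at or above the threshold.
        far = sum(s >= t for s in attack_scores) / len(attack_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical detector scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.6, 0.3]
attack = [0.1, 0.2, 0.4, 0.5, 0.65]
eer, threshold = equal_error_rate(genuine, attack)
print(f"EER = {eer:.1%} at threshold {threshold}")  # EER = 20.0% at threshold 0.6
```

With finitely many scores the two error rates may never be exactly equal, so this sketch averages them at the closest threshold; evaluation toolkits often interpolate along the ROC curve instead.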

    Published In

    CCS '21: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security
    November 2021
    3558 pages
    ISBN:9781450384544
    DOI:10.1145/3460120
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. audio attack
    2. intelligent audio system
    3. microphone array

    Qualifiers

    • Research-article

    Conference

    CCS '21
    CCS '21: 2021 ACM SIGSAC Conference on Computer and Communications Security
    November 15 - 19, 2021
    Virtual Event, Republic of Korea

    Acceptance Rates

    Overall Acceptance Rate 1,261 of 6,999 submissions, 18%
