research-article

When the Differences in Frequency Domain are Compensated: Understanding and Defeating Modulated Replay Attacks on Automatic Speech Recognition

Authors:

Shu Wang,

Jiahao Cao,

Xu He,

Kun Sun,

Qi LiAuthors Info & Claims

CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security

Pages 1103 - 1119

https://doi.org/10.1145/3372297.3417254

Published: 02 November 2020 Publication History

Get Access

Abstract

Automatic speech recognition (ASR) systems have been widely deployed in modern smart devices to provide convenient and diverse voice-controlled services. Since ASR systems are vulnerable to audio replay attacks that can spoof and mislead ASR systems, a number of defense systems have been proposed to identify replayed audio signals based on the speakers' unique acoustic features in the frequency domain. In this paper, we uncover a new type of replay attack called modulated replay attack, which can bypass the existing frequency domain based defense systems. The basic idea is to compensate for the frequency distortion of a given electronic speaker using an inverse filter that is customized to the speaker's transform characteristics. Our experiments on real smart devices confirm the modulated replay attacks can successfully escape the existing detection mechanisms that rely on identifying suspicious features in the frequency domain. To defeat modulated replay attacks, we design and implement a countermeasure named DualGuard. We discover and formally prove that no matter how the replay audio signals could be modulated, the replay attacks will either leave ringing artifacts in the time domain or cause spectrum distortion in the frequency domain. Therefore, by jointly checking suspicious features in both frequency and time domains, DualGuard~can successfully detect various replay attacks including the modulated replay attacks. We implement a prototype of DualGuard~on a popular voice interactive platform, ReSpeaker Core v2. The experimental results show DualGuard~can achieve 98% accuracy on detecting modulated replay attacks.

Supplementary Material

MOV File (Copy of CCS2020_fp205_ShuWang - Brian Hollendyke.mov)

Presentation video

Download
311.38 MB

References

[1]

Hadi Abdullah, Washington Garcia, Christian Peeters, Patrick Traynor, Kevin R. B. Butler, and Joseph Wilson. 2019. Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems. In Proceedings of the 2019 The Network and Distributed System Security Symposium (NDSS '19).

Abstract

Supplementary Material

References

Cited By

Index Terms

Recommendations

Combining Mel frequency Cepstral coefficients and fractal dimensions for automatic speech recognition

Psycho-acoustics inspired automatic speech recognition

Measuring the intelligibility of dysarthric speech through automatic speech recognition in a pluricentric language

Comments

Information

Published In

Sponsors

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Conference

Acceptance Rates

Upcoming Conference

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

Get Access

Login options

Full Access

View options

PDF

eReader

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations