DOI: 10.1145/3666025.3699358
research-article
Open access

PiezoBud: A Piezo-Aided Secure Earbud with Practical Speaker Authentication

Published: 04 November 2024

Abstract

With the advancement of AI-powered personal voice assistants, speaker authentication via earbuds has become increasingly vital, serving as a critical interface between users and mobile devices. However, existing audio-based speaker authentication methods fail to defend against voice spoofing threats such as replay and deep-fake attacks. To counteract these risks, we introduce PiezoBud, a pioneering multi-modal user authentication system that is truly practical and lightweight for earbuds. PiezoBud uses miniature piezoelectric sensors to detect micro-vibrations on the skin, extracting user-specific biometric data to authenticate legitimate access on the local smartphone and protect against malicious attacks. Our exploratory study, involving 85 participants, demonstrates the effectiveness of PiezoBud in various everyday scenarios, including ambient noise, body movement, and in-ear media playback. Using only 15 seconds of enrollment data, PiezoBud achieves an Equal Error Rate (EER) of 1.05% and attains a mean authentication latency of 0.06 seconds on mobile devices. We also evaluate PiezoBud's effectiveness in countering challenging adaptive attack scenarios and its overall performance in various real-world situations. Our evaluation highlights that PiezoBud stands out as a practical, resilient, responsive, and secure option for earbud users.
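For readers unfamiliar with the Equal Error Rate quoted above, the following is a minimal sketch of how an EER is commonly computed from verification scores. It is not the authors' evaluation code: the function names and the score lists are hypothetical, and it assumes that a higher score means a closer match to the enrolled user. The EER is the operating point where the false acceptance rate (impostors accepted) equals the false rejection rate (legitimate users rejected).

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # genuine users rejected
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep every observed score as a threshold and return (eer, threshold)
    at the point where FAR and FRR are closest to each other."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Synthetic scores for illustration only (not data from the paper).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With finitely many scores, FAR and FRR may never be exactly equal, so this sketch reports their average at the closest crossing; evaluation toolkits often interpolate instead.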

Published In

SenSys '24: Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems
November 2024
950 pages
ISBN:9798400706974
DOI:10.1145/3666025
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. multi-modality
  2. piezoelectric
  3. earbuds
  4. user authentication

Qualifiers

  • Research-article

Funding Sources

  • NSF CAREER

Acceptance Rates

Overall Acceptance Rate 174 of 867 submissions, 20%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 259
  • Downloads (Last 12 months): 259
  • Downloads (Last 6 weeks): 74

Reflects downloads up to 12 Jan 2025
