research-article

Building a speech recognition system with privacy identification information based on Google Voice for social robots

Published: 01 September 2022

Abstract

Many smart speakers and social robots are now on the market to make people's lives more convenient. People typically use smart speakers to check their daily schedules or control home appliances, and many social robots include a smart speaker as well. What these devices have in common is that they are controlled by voice. Regardless of where a smart speaker is installed and used, starting a conversation with voice equipment exposes the user to security and privacy risks. In this paper, we therefore build a speech recognition (SR) system that incorporates a privacy identification information (PII) system, which we call the SR-PII system. We used the Google Artificial-Intelligence-Yourself (AIY) Voice Kit to build a simple smart dialog speaker that includes our SR-PII system. In our experiments, we tested SR accuracy and the reliability of the privacy settings in three environments (quiet, noisy, and with music playing), and we measured the cloud response time and the speaker response time. The results show a response time of approximately 3.74 s for the cloud and approximately 9.04 s for the speaker. We also report the response accuracy of the speaker, which successfully prevented the disclosure of personal information with the SR-PII system in all three environments. The speaker has a mean response time of approximately 8.86 s with 93% mean accuracy in a quiet room, approximately 9.18 s with 89% mean accuracy in a noisy environment, and approximately 9.62 s with 90% mean accuracy in an environment with music playing. We conclude that the SR-PII system can secure private information and that the most important factor affecting the speaker's response speed is the network connection status. We hope that our experiments offer guidelines for building social robots and installing the SR-PII system to protect users' personal identification information.
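
The abstract does not say how the SR-PII system detects personal information, so the following is only a minimal sketch of one possible approach: checking the recognized transcript against a few regular expressions before it is passed on. The names `PII_PATTERNS`, `find_pii`, and `respond`, as well as the two patterns themselves, are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical PII patterns for illustration only; the paper's actual
# detection rules are not described in the abstract.
PII_PATTERNS = {
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def find_pii(transcript):
    """Return the sorted names of the PII categories found in a transcript."""
    return sorted(
        name for name, pattern in PII_PATTERNS.items() if pattern.search(transcript)
    )


def respond(transcript):
    """Block an utterance that appears to contain PII; otherwise pass it on."""
    hits = find_pii(transcript)
    if hits:
        return "blocked: " + ", ".join(hits)
    return "forwarded: " + transcript


if __name__ == "__main__":
    for utterance in (
        "what is the weather today",
        "call me at 555-123-4567",
        "my email is bob@example.com",
    ):
        print(respond(utterance))
```

In a real speaker, a check like this would presumably sit between the speech-to-text step and the cloud request, so that a flagged utterance is never sent out; where the paper places its check is not stated in the abstract.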

Cited By

  • (2024) A literature review of user privacy concerns in conversational chatbots. Journal of the Association for Information Science and Technology 76:1 (121-154). DOI: 10.1002/asi.24898. Online publication date: 26-Dec-2024

      Published In

      The Journal of Supercomputing, Volume 78, Issue 13
      Sep 2022
      945 pages

      Publisher

      Kluwer Academic Publishers

      United States

      Publication History

      Published: 01 September 2022
      Accepted: 26 March 2022

      Author Tags

      1. Google AIY Voice Kit
      2. Speech recognition
      3. Personal identification information
      4. Artificial intelligence
      5. Social robots
      6. Robot computing
      7. Smart speaker
      8. Google assistant
