
Voice Privacy Through Time-Scale and Pitch Modification

  • Conference paper
Pattern Recognition and Machine Intelligence (PReMI 2021)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13102)


Abstract

An attacker can fraudulently gain access in place of the genuine user if the user's speech data is not protected. It is therefore important to protect users' speech data, and a voice privacy system can be employed for this purpose. A voice privacy system is not designed against any particular kind of attack; instead, it is designed in a generalized way, which makes it a universal system. This study presents time-scale and pitch modification-based anonymization methods that modify the speaker-dependent speech parameters (i.e., the fundamental frequency \(F_{0}\)) for better privacy preservation of speech data. The voice privacy performance of the proposed methods is compared with the signal processing-based baseline system of the INTERSPEECH 2020 voice privacy challenge. Several perturbation methods are evaluated, and speed perturbation with a factor of 0.8 is found to give adequate speaker anonymization (\(38.5\%\) Equal Error Rate (EER) and \(91.3\%\) De-IDentification (DeID)) together with acceptable speech intelligibility (\(4.86\%\) Word Error Rate (WER)) for female speakers. Speed and pitch perturbation are observed to be two important candidates for anonymization, whereas tempo perturbation is not found to be as useful for speaker anonymization.
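To make the effect of a speed factor of 0.8 concrete, the following is a minimal sketch, not the authors' implementation: the abstract does not say which tool or resampling method was used, so linear interpolation is assumed here purely for illustration. The names `speed_perturb` and `estimate_f0` are hypothetical helpers, and a synthetic 200 Hz tone stands in for speech. Speed perturbation resamples the waveform, so duration and \(F_{0}\) change together; tempo modification, in its usual sense, changes duration while leaving pitch largely unchanged.

```python
import math


def speed_perturb(samples, factor):
    """Resample by linear interpolation (an assumed method) so that playback
    at the original sample rate is `factor` times as fast."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(samples) / factor))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = min(i * factor, last)  # fractional read position in the input
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def estimate_f0(samples, sample_rate):
    """Crude F0 estimate for a clean tone: rising zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)
    return crossings * sample_rate / len(samples)


SAMPLE_RATE = 16000  # Hz, an illustrative choice
# One second of a 200 Hz sine as a stand-in for a voiced sound.
tone = [math.sin(2 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

slowed = speed_perturb(tone, 0.8)

duration_ratio = len(slowed) / len(tone)  # output is 1 / 0.8 times as long
f0_in = estimate_f0(tone, SAMPLE_RATE)
f0_out = estimate_f0(slowed, SAMPLE_RATE)  # roughly 0.8 times f0_in

print(f"duration ratio: {duration_ratio:.2f}")
print(f"F0 before: {f0_in:.1f} Hz, after: {f0_out:.1f} Hz")
```

With a factor of 0.8 the output is 1.25 times as long and the estimated pitch drops to about 80% of its original value, which is the kind of \(F_{0}\) shift the abstract associates with speaker anonymization. Real speech would need a proper pitch tracker and a band-limited resampler in place of these toy helpers.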



Author information


Corresponding author

Correspondence to Gauri P. Prajapati.


Copyright information

© 2024 Springer Nature Switzerland AG

About this paper


Cite this paper

Prajapati, G.P., Singh, D.K., Patil, H.A. (2024). Voice Privacy Through Time-Scale and Pitch Modification. In: Ghosh, A., King, I., Bhattacharyya, M., Sankar Ray, S., K. Pal, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2021. Lecture Notes in Computer Science, vol 13102. Springer, Cham. https://doi.org/10.1007/978-3-031-12700-7_8


  • DOI: https://doi.org/10.1007/978-3-031-12700-7_8


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-12699-4

  • Online ISBN: 978-3-031-12700-7

  • eBook Packages: Computer Science, Computer Science (R0)
