Vocal Emotion Conversion Using WSOLA and Linear Prediction

Vekkot, Susmitha; Tripathi, Shikha

doi:10.1007/978-3-319-66429-3_78


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

This paper addresses speech emotion conversion using Waveform Similarity Overlap-Add (WSOLA) followed by linear prediction (LP) analysis for spectral transformation. Duration is modified according to the ratio between the segment durations of the neutral and target speech. After modification with WSOLA, the duration-modified source speech is time-aligned with the target and subjected to linear prediction analysis to obtain the LP coefficients (LPCs). The target emotion is re-synthesised from the prosody-manipulated residual and the LPCs from the source. The waveform similarity property of WSOLA is exploited to produce output with minimal distortion. The proposed algorithm is evaluated subjectively and objectively alongside the popular TD-PSOLA algorithm. With the proposed technique, the correlation between the synthesised and real target speech shows an average improvement of 60% across all emotions.
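
The LP analysis and re-synthesis step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the autocorrelation method with Levinson-Durbin recursion, and the orientation of the duration ratio (target over neutral) are assumptions made for illustration, and WSOLA and time alignment are omitted. The sketch only shows that inverse filtering with A(z) yields a residual and that passing a residual through 1/A(z) rebuilds a waveform; in a conversion system the residual would be prosody-modified before synthesis.

```python
def time_scale_factor(neutral_seconds, target_seconds):
    """Assumed orientation: a factor > 1 lengthens the neutral segment."""
    return target_seconds / neutral_seconds


def autocorrelation(frame, max_lag):
    """Return r[0..max_lag] of the frame (biased, unnormalised)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for a[0..order], with a[0] = 1.

    The convention is A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error
    return a


def lp_residual(signal, a):
    """Inverse filter A(z): e[n] = sum_k a[k] * s[n-k]."""
    p = len(a) - 1
    return [sum(a[k] * signal[n - k] for k in range(min(p, n) + 1))
            for n in range(len(signal))]


def lp_synthesis(residual, a):
    """All-pole filter 1/A(z): s[n] = e[n] - sum_{k>=1} a[k] * s[n-k]."""
    p = len(a) - 1
    out = []
    for n, e in enumerate(residual):
        past = sum(a[k] * out[n - k] for k in range(1, min(p, n) + 1))
        out.append(e - past)
    return out
```

With an unmodified residual, `lp_synthesis(lp_residual(s, a), a)` returns the original samples up to rounding error, which is a convenient sanity check before any prosody manipulation is introduced.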

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, Amrita University, Bengaluru, India
Susmitha Vekkot & Shikha Tripathi

Corresponding author

Correspondence to Susmitha Vekkot.

Editor information

Editors and Affiliations

SPIIRAS, Saint Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Hertfordshire, Hatfield, United Kingdom
Iosif Mporas

About this paper

Cite this paper

Vekkot, S., Tripathi, S. (2017). Vocal Emotion Conversion Using WSOLA and Linear Prediction. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_78

DOI: https://doi.org/10.1007/978-3-319-66429-3_78
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)
