Abstract
This paper addresses speech emotion conversion using Waveform Similarity Overlap-Add (WSOLA) for duration modification, followed by linear prediction (LP) analysis for spectral transformation. The duration modification factor is taken as the ratio between the segment durations of neutral and target speech. After modification with WSOLA, the duration-modified source speech is time-aligned with the target and subjected to LP analysis to yield the LP coefficients (LPCs). The target emotion is re-synthesised from the prosody-manipulated residual and the LPCs from the source. The waveform similarity property of WSOLA is exploited to produce output with minimal distortion. The proposed algorithm is evaluated subjectively and objectively alongside the popular TD-PSOLA algorithm. With the proposed technique, the correlation between synthesised and real target speech shows an average improvement of 60% across all emotions.
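The LP analysis and residual-based re-synthesis step can be illustrated with a minimal sketch. This is not the authors' implementation: WSOLA itself and the time alignment are omitted, and the function names, the autocorrelation (Levinson-Durbin) method, and the direction of the duration ratio (target over neutral, so values above 1 lengthen the segment) are all assumptions made for illustration.

```python
def duration_ratio(neutral_dur, target_dur):
    """Assumed time-scale factor for one segment: target / neutral duration."""
    return target_dur / neutral_dur


def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(x, order):
    """LP coefficients a[0..order] with a[0] = 1, via Levinson-Durbin."""
    r = autocorr(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]  # prediction error energy, updated at each order
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for order i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(x, a):
    """Inverse filter: e[n] = sum_j a[j] * x[n - j], zero initial state."""
    return [
        sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
        for n in range(len(x))
    ]


def lp_synthesize(e, a):
    """All-pole synthesis: y[n] = e[n] - sum_{j>=1} a[j] * y[n - j]."""
    y = []
    for n in range(len(e)):
        past = sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0)
        y.append(e[n] - past)
    return y
```

Passing a frame through `lp_residual` and then `lp_synthesize` with the same coefficients reconstructs the frame up to rounding error. In a conversion setting, the residual would be prosody-modified before synthesis; how that modification is carried out is described in the paper, not in this sketch.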
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Vekkot, S., Tripathi, S. (2017). Vocal Emotion Conversion Using WSOLA and Linear Prediction. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_78
DOI: https://doi.org/10.1007/978-3-319-66429-3_78
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)