
Spectral Flatness Analysis for Emotional Speech Synthesis and Transformation

  • Conference paper
Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5641)


Abstract

According to psychological research on emotional speech, different emotions are accompanied by different amounts of spectral noise. We control this amount by means of spectral flatness, according to which high-frequency noise is mixed into voiced frames during cepstral speech synthesis. Our experiments are aimed at a statistical analysis of spectral flatness in three emotions (joy, sadness, anger) and, for comparison, a neutral state. The calculated histograms of the spectral flatness distribution are compared visually and modelled by the Gamma probability distribution. The obtained statistical parameters and the emotional-to-neutral ratios of their mean values show good correlation for both male and female voices and all three emotions.
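
The spectral flatness measure used here to control the amount of mixed-in noise is conventionally defined as the ratio of the geometric mean to the arithmetic mean of the short-time power spectrum, and the abstract states that the per-frame values are modelled by a Gamma distribution. The following Python sketch is only an illustration of that general procedure, not the authors' implementation; it assumes NumPy/SciPy and that voiced, windowed frames have already been extracted elsewhere.

import numpy as np
from scipy import stats

def spectral_flatness(frame, eps=1e-12):
    # Ratio of the geometric to the arithmetic mean of the frame's power
    # spectrum; values near 1 indicate noise-like frames, near 0 tonal ones.
    power = np.abs(np.fft.rfft(frame)) ** 2 + eps
    return np.exp(np.mean(np.log(power))) / np.mean(power)

def fit_gamma(flatness_values):
    # Fit a Gamma distribution (location fixed at zero) to the collected
    # per-frame flatness values of one emotion; returns shape and scale.
    shape, _, scale = stats.gamma.fit(flatness_values, floc=0)
    return shape, scale

# Hypothetical comparison of an emotional and a neutral recording via the
# emotional-to-neutral ratio of mean flatness (arrays of per-frame values):
# ratio = np.mean(sfm_emotional) / np.mean(sfm_neutral)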




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Přibil, J., Přibilová, A. (2009). Spectral Flatness Analysis for Emotional Speech Synthesis and Transformation. In: Esposito, A., Vích, R. (eds) Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions. Lecture Notes in Computer Science (LNAI), vol 5641. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03320-9_11


  • DOI: https://doi.org/10.1007/978-3-642-03320-9_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03319-3

  • Online ISBN: 978-3-642-03320-9

  • eBook Packages: Computer Science (R0)
