
Emotion recognition from speech using global and local prosodic features

Published: 01 June 2013

Abstract

In this paper, global and local prosodic features extracted at the sentence, word, and syllable levels are proposed for recognizing emotion (affect) from speech. Duration, pitch, and energy values are used to represent the prosodic information. Global prosodic features capture gross statistics of the prosodic contours, such as mean, minimum, maximum, standard deviation, and slope, whereas local prosodic features capture their temporal dynamics. The two feature sets are analyzed separately and in combination, at different levels, for the recognition of emotions. Words and syllables at different positions (initial, middle, and final) are also examined separately to assess their contribution to emotion recognition. All studies are carried out on a simulated Telugu emotion speech corpus (IITKGP-SESC), and the results are compared with those obtained on the internationally known Berlin emotion speech corpus (Emo-DB). Support vector machines are used to develop the emotion recognition models. The results indicate that recognition performance with local prosodic features is better than with global prosodic features, and that words in the final position of a sentence and syllables in the final position of a word carry more emotion-discriminative information than words and syllables in other positions.
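As a rough, hypothetical illustration (not the authors' implementation), the sketch below computes the global statistics named in the abstract (mean, minimum, maximum, standard deviation, and slope) from a single prosodic contour such as a per-frame pitch or energy track, approximates the local temporal dynamics by resampling the contour to a fixed length, and feeds the concatenated features to a support vector machine using NumPy and scikit-learn. All function and variable names are assumptions made for this example.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def global_prosodic_features(contour):
    # Gross statistics of one prosodic contour (e.g. per-frame F0 or energy):
    # mean, minimum, maximum, standard deviation, and the slope of a
    # least-squares line fitted over time.
    contour = np.asarray(contour, dtype=float)
    slope = np.polyfit(np.arange(len(contour)), contour, 1)[0] if len(contour) > 1 else 0.0
    return np.array([contour.mean(), contour.min(), contour.max(), contour.std(), slope])

def local_prosodic_features(contour, n_points=10):
    # One simple stand-in for "local" (temporal-dynamics) features: the contour
    # resampled to a fixed number of points so its shape over time is retained.
    contour = np.asarray(contour, dtype=float)
    idx = np.linspace(0, len(contour) - 1, n_points)
    return np.interp(idx, np.arange(len(contour)), contour)

# Hypothetical usage: `contours` is a list of per-utterance pitch tracks and
# `labels` holds the corresponding emotion classes.
# X = np.vstack([np.concatenate([global_prosodic_features(c), local_prosodic_features(c)])
#                for c in contours])
# model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# model.fit(X, labels)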



Information

Published In

International Journal of Speech Technology, Volume 16, Issue 2
June 2013
121 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 June 2013

Author Tags

  1. Emo-DB
  2. Emotion recognition
  3. Global prosodic features
  4. IITKGP-SESC
  5. Local prosodic features
  6. Region-wise emotion recognition
  7. Segment-wise emotion recognition
  8. Vowel onset point

Qualifiers

  • Article


Cited By

  • (2025) Speech emotion recognition in real static and dynamic human-robot interaction scenarios. Computer Speech and Language, 89:C. DOI: 10.1016/j.csl.2024.101666. Online publication date: 1-Jan-2025.
  • (2024) Hilbert Domain Analysis of Wavelet Packets for Emotional Speech Classification. Circuits, Systems, and Signal Processing, 43:4 (2224-2250). DOI: 10.1007/s00034-023-02544-7. Online publication date: 1-Apr-2024.
  • (2023) IIMH: Intention Identification In Multimodal Human Utterances. Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing (337-344). DOI: 10.1145/3607947.3608016. Online publication date: 3-Aug-2023.
  • (2023) Applying Segment-Level Attention on Bi-Modal Transformer Encoder for Audio-Visual Emotion Recognition. IEEE Transactions on Affective Computing, 14:4 (3231-3243). DOI: 10.1109/TAFFC.2023.3258900. Online publication date: 1-Oct-2023.
  • (2023) Speech emotion recognition approaches. Speech Communication, 154:C. DOI: 10.1016/j.specom.2023.102974. Online publication date: 1-Oct-2023.
  • (2023) Speech emotion recognition in Persian based on stacked autoencoder by comparing local and global features. Multimedia Tools and Applications, 82:23 (36413-36430). DOI: 10.1007/s11042-023-15132-3. Online publication date: 30-Mar-2023.
  • (2023) Trends in speech emotion recognition: a comprehensive survey. Multimedia Tools and Applications, 82:19 (29307-29351). DOI: 10.1007/s11042-023-14656-y. Online publication date: 22-Feb-2023.
  • (2023) Speaker and gender dependencies in within/cross linguistic Speech Emotion Recognition. International Journal of Speech Technology, 26:3 (609-625). DOI: 10.1007/s10772-023-10038-9. Online publication date: 1-Sep-2023.
  • (2023) Linguistic analysis for emotion recognition: a case of Chinese speakers. International Journal of Speech Technology, 26:2 (417-432). DOI: 10.1007/s10772-023-10028-x. Online publication date: 18-Mar-2023.
  • (2023) Investigating Acoustic Cues of Emotional Valence in Mandarin Speech Prosody - A Corpus Approach. Chinese Lexical Semantics (316-330). DOI: 10.1007/978-981-97-0586-3_25. Online publication date: 19-May-2023.
