Abstract
An emotion recognition framework based on sound processing could improve services in human–computer interaction. Various quantitative speech features extracted from acted speech were tested to determine whether they are sufficient to discriminate among seven emotions. Multilayer perceptrons were trained to classify gender and emotion from a 24-element input vector, which summarizes the prosody of the speaker over the entire sentence using statistics of sound features. Several experiments were performed and the results are presented in detail. Emotion recognition was successful when both the speakers and the utterances were “known” to the classifier; however, severe misclassifications occurred in the utterance-independent setting. Nevertheless, the proposed feature vector achieved promising results for utterance-independent recognition of high- and low-arousal emotions.
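To make the feature-vector idea concrete, the sketch below builds an utterance-level vector of prosody statistics and feeds it to a multilayer perceptron. The abstract does not enumerate the 24 features, so the choice of six contours (standing in for pitch, energy, and similar sound features), the four statistics per contour, and the scikit-learn MLP configuration are illustrative assumptions rather than the authors' implementation.

# Hypothetical sketch: utterance-level prosody statistics feeding an MLP classifier.
# The contours, statistics, and network settings are assumptions for illustration.
import numpy as np
from sklearn.neural_network import MLPClassifier

def utterance_statistics(contours):
    """Summarize frame-level contours (e.g., pitch, energy) with per-contour statistics."""
    feats = []
    for c in contours:                       # one contour per sound feature
        c = np.asarray(c, dtype=float)
        feats.extend([c.mean(), c.std(), c.min(), c.max()])
    return np.array(feats)                   # e.g., 6 contours x 4 statistics = 24 inputs

# Toy data: random "contours" in place of real frame-level measurements.
rng = np.random.default_rng(0)
X = np.vstack([utterance_statistics(rng.normal(size=(6, 120))) for _ in range(200)])
y = rng.integers(0, 7, size=200)             # seven emotion labels (toy targets)

clf = MLPClassifier(hidden_layer_sizes=(24,), max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))

With real data, the contours would come from frame-level analysis of each sentence (for example, an F0 track and short-time energy), so the classifier sees only sentence-level statistics rather than the raw signal.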
Copyright information
© 2009 Springer Science+Business Media, LLC
Cite this chapter
Anagnostopoulos, C.N., Vovoli, E. (2009). Sound Processing Features for Speaker-Dependent and Phrase-Independent Emotion Recognition in Berlin Database. In: Papadopoulos, G., Wojtkowski, W., Wojtkowski, G., Wrycza, S., Zupancic, J. (eds) Information Systems Development. Springer, Boston, MA. https://doi.org/10.1007/b137171_43
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-84809-9
Online ISBN: 978-0-387-84810-5