
A Novel Markovian Framework for Integrating Absolute and Relative Ordinal Emotion Information

Published: 01 July 2023

Abstract

There is growing interest in affective computing in the representation and prediction of emotions along ordinal scales. However, the term ordinal emotion label has been used to refer both to absolute notions, such as low or high arousal, and to relative notions, such as arousal being higher at one instant than at another. In this paper, we introduce the terminology of absolute and relative ordinal labels to make this distinction clear, and we investigate both with a view to integrating them and exploiting their complementary nature. We propose a Markovian framework, referred to as the Dynamic Ordinal Markov Model (DOMM), that makes use of both absolute and relative ordinal information to improve speech-based ordinal emotion prediction. Finally, the proposed framework is validated on two speech corpora commonly used in affective computing, the RECOLA and IEMOCAP databases, across a range of system configurations. The results consistently indicate that integrating relative ordinal information improves absolute ordinal emotion prediction.
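
To make the distinction concrete, the sketch below shows one generic way absolute and relative ordinal evidence can be fused in a Markov chain and decoded with the Viterbi algorithm. It illustrates the general idea only and is not the authors' DOMM implementation: the function name viterbi_ordinal, the array shapes, and the toy scores are all assumptions introduced for this example. Absolute evidence enters as per-frame log-posteriors over K ordinal levels; relative evidence enters as log-scores over level transitions between consecutive frames.

```python
import numpy as np

def viterbi_ordinal(abs_log_post, rel_log_trans):
    """Fuse absolute and relative ordinal evidence by Viterbi decoding.

    abs_log_post  : (T, K) array, log-posterior of each of K ordinal
                    levels at each of T frames ("absolute" evidence).
    rel_log_trans : (T-1, K, K) array, log-score for moving from level i
                    to level j between consecutive frames ("relative"
                    evidence, e.g. from a preference ranker).
    Returns the jointly most likely sequence of ordinal levels.
    """
    T, K = abs_log_post.shape
    delta = np.empty((T, K))            # best log-score ending at each level
    back = np.zeros((T, K), dtype=int)  # backpointers for the best path
    delta[0] = abs_log_post[0]
    for t in range(1, T):
        # cand[i, j]: score of being at level i at t-1 and moving to j
        cand = delta[t - 1][:, None] + rel_log_trans[t - 1]
        back[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) + abs_log_post[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):      # trace the best path backwards
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy example: 4 frames, 3 arousal levels (0 = low, 1 = mid, 2 = high).
# The absolute evidence is ambiguous at frame 1; relative evidence that
# arousal tends to rise between frames resolves the ambiguity.
abs_post = np.log(np.array([[0.7, 0.2, 0.1],
                            [0.4, 0.4, 0.2],
                            [0.1, 0.3, 0.6],
                            [0.1, 0.2, 0.7]]))
rise = np.log(np.array([[0.2, 0.5, 0.3],
                        [0.1, 0.3, 0.6],
                        [0.1, 0.2, 0.7]]))   # favours moving upwards
rel = np.stack([rise, rise, rise])           # (T-1, K, K)
print(viterbi_ordinal(abs_post, rel))        # -> [0 1 2 2]
```

Under these toy scores, frame 1 is decoded as the mid level even though its absolute posterior alone is tied between low and mid, which is the complementary effect the abstract describes.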

Cited By

  • (2024) Learning With Rater-Expanded Label Space to Improve Speech Emotion Recognition. IEEE Transactions on Affective Computing, vol. 15, no. 3, pp. 1539–1552. DOI: 10.1109/TAFFC.2024.3360428. Online publication date: 1 Jul. 2024.
  • (2024) Iterative minority oversampling and its ensemble for ordinal imbalanced datasets. Engineering Applications of Artificial Intelligence, vol. 127, no. PA. DOI: 10.1016/j.engappai.2023.107211. Online publication date: 1 Feb. 2024.

Published In

IEEE Transactions on Affective Computing, Volume 14, Issue 3 (July–Sept. 2023), 853 pages

Publisher

IEEE Computer Society Press, Washington, DC, United States

Qualifiers

  • Research-article
