1. Introduction
Could the future of virtual reality (VR) involve experiences that understand and respond to our emotions in real time? The promising field of virtual reality encompasses diverse applications in entertainment, therapy, and training [
1]. As VR technology matures, the potential to recognize the emotional state of a user and customize the virtual environment accordingly is immense. Such advancements could significantly enhance immersion, personalize experiences, and introduce innovative forms of human–computer interaction [
2].
Although the field of emotion classification has explored various modalities, its seamless integration into VR systems is still evolving [
3]. Current methods often depend on facial expressions or user input, which may not be consistently accurate or may interrupt the VR experience [
4,
5,
6]. The use of physiological signals, which more directly reflect emotional states, promises a more natural and continuous recognition of emotions within VR settings [
7].
Our research presents a method that processes and analyzes electrocardiography (ECG) and galvanic skin response (GSR) signals to classify emotions such as excitement, happiness, and anxiety in VR, leveraging the physiological reactions these emotions evoke. The established efficacy of biosignals, with ECG and GSR being the primary indicators of arousal and valence, underpins our approach [
8]. We extracted essential features, including heart rate variability (HRV), morphological aspects, and Hjorth parameters. Using a Chi-square-based feature selection in MATLAB, we carefully partitioned our dataset to maintain the integrity of the machine learning process. The application of an ensemble learning algorithm provided robust emotion classification, demonstrating the high accuracy of our method.
The overarching aim of this research was to forge a method for emotion classification within VR environments using physiological signals. This method strove not only for high accuracy but also for an enhanced user experience through instantaneous emotion detection. Our approach’s precision in classifying emotions in VR hints at the transformative potential of biosignal processing coupled with machine learning to redefine VR experiences.
The remainder of this paper is structured as follows.
Section 2 provides background information on existing approaches to emotion classification in VR.
Section 3 details our proposed methodology, including the data collection, feature extraction, and classification techniques.
Section 4 presents our experimental results and evaluation metrics. Finally,
Section 5 discusses the implications of our work and outlines directions for future research.
2. Background
Virtual reality (VR), as a transformative technology, has redefined human–computer interactions by creating immersive three-dimensional experiences that transcend traditional interfaces [
9]. Despite its advancements, the exploration of VR’s potential for customization and adaptability, especially in the realm of emotion recognition and response, remains relatively nascent. The ability to accurately classify and respond to user emotions could initiate a new era of VR applications, one that is more intuitive, interactive, and tailored to individual experiences [
10]. However, the complex nature of human emotions poses significant challenges in achieving precise emotion recognition within VR environments.
The integration of biosignal processing techniques with advanced machine learning algorithms offers a promising approach to navigate the complex landscape of human emotions in VR [
11]. Furthermore, the development of comprehensive datasets, such as DER-VREEG, which take advantage of low-cost wearable EEG headsets in VR settings, has facilitated the advancement of emotion recognition algorithms [
12]. The exploration of embodied interactions within VR systems also provides valuable insights into user experiences and emotional responses, underscoring the importance of a nuanced understanding of human–VR interactions [
13]. Moreover, the analysis of VR as a communication process highlights the potential of VR to mediate complex emotional and cognitive experiences [
14]. Finally, the examination of variation in VR gameplay and its impact on affective responses illuminates the complex relationship between VR experiences and emotional states [
15].
Traditionally, emotion recognition methods have predominantly utilized facial expression analysis and speech patterns to discern emotional states [
16,
17,
18]. Although effective to a certain extent, these techniques may not fully translate to the virtual reality (VR) environment due to user comfort considerations and the inherent limitations of VR headsets. This limitation calls for the exploration of more subtle and less invasive techniques for emotion recognition within VR settings. Biosignal processing, which involves the interpretation of physiological signals such as electrocardiography (ECG) and galvanic skin response (GSR), presents a viable alternative [
19]. These biosignals are closely related to human emotional responses and, as such, offer a promising avenue for emotion classification that bypasses the constraints of traditional methods.
In the domain of emotion recognition through physiological signals and machine learning, seminal works have established a solid foundation for our investigation. Bota et al. [
20] offered a comprehensive review, defining current challenges and prospective avenues in the field, particularly highlighting the significant role of machine learning techniques and physiological indicators in emotion recognition. Hsu et al. [
7] developed an algorithm based on electrocardiography (ECG) for the detection of emotions, employing a series of experiments involving varying durations of listening to music to evoke different emotional responses. The ECG signals recorded during these sessions were subjected to feature selection and reduction using machine learning approaches, resulting in classification accuracies of 82.78% for valence, 72.91% for arousal, and 61.52% for a four-class emotion model. In addition, Uyanık et al. [
21] explored the classification of emotional valence using electroencephalogram (EEG) signals within a virtual reality (VR) setting. This study extracted differential entropy features from various EEG bands and evaluated the accuracy of several classifiers, including support vector machines (SVM), k-nearest neighbor (kNN), naive Bayesian, decision trees, and logistic regression, with the SVM classifier achieving an accuracy of 76.22%. Zhang et al. [
17] focused on emotion recognition using galvanic skin response (GSR) signals, proposing a novel approach utilizing quantum neural networks. Given the diverse levels of information encapsulated in the features of the GSR signal corresponding to different emotions, the quantum neural network, optimized with particle swarm optimization, facilitated a more adaptable classification scheme, achieving an average accuracy of 84.9% across five emotion categories with data from 35 participants.
In the evolving field of emotion recognition through physiological signals, several studies have pioneered the use of singular biosignals for sophisticated classification tasks. Chen et al. [
22] investigated the efficacy of artificial neural networks by harnessing heart rate variability (HRV) data to classify emotions into five distinct categories: pleasure, happiness, fear, anger, and neutral. HRV data, obtained from wristband monitors during gameplay, facilitated the training of neural networks, with a notable configuration that achieved an average accuracy of 82.84%. In a quest to discern three emotional states, Dominguez-Jimenez et al. [
23] used photoplethysmography (PPG) and galvanic skin response (GSR) signals collected via an instrumented glove, employing classifiers such as SVM and LDA. Interestingly, the combination of GSR signals and the SVM classifier achieved a perfect accuracy rate in identifying amusement, sadness, and neutral states.
Further exploring the domain of wearable technologies in healthcare, Ayata et al. [
24] introduced an algorithm that assesses emotional states through the dimensions of arousal and valence using various machine learning techniques. Their research demonstrated the superior accuracy of a fused model incorporating a respiratory belt, PPG, and fingertip temperature signals, underscoring the potential of multimodal physiological data to enhance emotion recognition accuracy. The highest accuracies recorded were 73.08% for arousal and 72.18% for valence.
Bălan et al. [
25] investigated the utility of machine learning classifiers in a VR setting aimed at acrophobia treatment, utilizing EEG, GSR, and PPG signals to determine fear levels to tailor VR therapies. Although a 2-choice fear scale yielded an accuracy rate of 89.5%, the performance was markedly reduced with a more granular 4-choice scale. Complementing these empirical studies, Marín-Morales et al. [
26] provided a comprehensive review on emotion recognition within immersive VR, charting the field’s progression and setting the stage for future inquiries. These seminal contributions illustrate the importance of emotion recognition techniques and provide invaluable insights, laying the groundwork for our investigation.
Within the dynamic field of affective computing, the endeavor to devise reliable emotion recognition systems is accelerating, driven by technological advances in sensor capabilities and machine learning methodologies. Saganowski et al. [
27] delineated the shift of emotion recognition efforts from the confines of laboratory environments to real-world applications, underlining the critical role that readily available sensor technologies and advanced signal processing methods play. This shift has been further propelled by innovations in learning algorithms, as expounded by Pal et al. [
28], who highlighted the seamless integration of these advancements into digital platforms, thus expanding the horizons of emotion recognition applications.
Given the complex nature of human emotions, a multimodal approach is paramount to enhance the accuracy and dependability of emotion recognition systems. Wang et al. [
29] offered an extensive analysis of the field of affective computing, emphasizing the amalgamation of behavioral cues with physiological data to discern subtler emotional nuances. Their review not only illuminated the prevailing methodologies, but also paved the way for forthcoming investigative endeavors within this sphere. Echoing the multimodal paradigm, Dissanayake et al. [
30] examined the realm of wearable technologies for emotion detection, presenting SigRep, a pioneering framework that adopts contrastive representation learning to exploit the discreet and persistent data collection capabilities of wearable devices, thus fortifying the efficacy of emotion recognition frameworks.
The quest for generalizability in emotion recognition systems across varied demographic and physiological spectra presents a formidable challenge within the realm of affective computing. Addressing this concern, Ali et al. [
31] introduced a globally generalized framework for emotion recognition that demonstrated high accuracy rates, irrespective of physiological signal discrepancies or demographic variations among subjects. This approach, which transcends subject-specific constraints, indicates a new era in the applicability of universal emotion recognition solutions. Concurrently, Su et al. [
32] unveiled an ontology-based model that exhibited commendable precision in discerning emotional states through EEG signals, pinpointing the importance of certain signal attributes—specifically, the power ratio between the beta and theta waves—as pivotal for precise emotion categorization [
19]. Together, these pioneering investigations represent the forefront of research in emotion detection, illuminating the profound capacity of physiological signals to foster more intuitive and empathetic human–computer interactions.
In conclusion, the collective body of work reviewed herein underscores the dynamic and multifaceted nature of emotion recognition research within the realm of affective computing. From pioneering methodologies that leverage the latest in sensor and machine learning innovations to nuanced approaches that seek to transcend demographic and physiological variabilities, each study significantly contributes to our understanding and capabilities in this field. As we move forward, these insights not only pave the way for more sophisticated and universally applicable emotion recognition systems, but also hold the promise of fostering more intuitive and empathetic interactions between humans and technology. The convergence of these advances promises a future where digital systems can more accurately interpret and respond to the complex tapestry of human emotions, thus enhancing the user experience across a multitude of applications. Within this evolving landscape, the proposed work stands out for its novel approach, aiming to fill a gap that has not been addressed before and thus holds the potential to make a significant contribution to the field.
3. Methods
Our study introduces an innovative methodology for recognizing emotions within virtual reality (VR) environments, capitalizing on the integrated strengths of biosignal processing and advanced machine learning techniques. Central to our approach is an intricate analysis of electrocardiography (ECG) and galvanic skin response (GSR) signals, from which we extract key features indicative of various emotional states. This process involves a comprehensive application of signal processing techniques, culminating in sophisticated feature extraction. Harnessing ensemble machine learning algorithms and employing rigorous feature selection strategies, such as the Chi-square method [4], we have developed a predictive model that showcases remarkable precision and efficiency in emotion classification within VR contexts. This research sets forth a groundbreaking framework aimed at enhancing the capabilities of VR technologies, making strides toward creating VR experiences that are not only immersive but also emotionally responsive and adaptive. The methodology and its implementation are detailed in
Figure 1, providing a clear road map for the development of emotionally intelligent VR systems.
3.1. Dataset Summary
This study capitalizes on the innovative “VR Eyes: Emotions Dataset” (VREED), specifically curated to elicit emotional responses through immersive 360° virtual environments (360-VEs) presented via a virtual reality (VR) headset. The dataset encompasses both self-reported emotional assessments and physiological data, including electrocardiography (ECG) and galvanic skin response (GSR) readings, collected from 34 healthy individuals. Participants experienced a series of twelve unique 360-VEs, with durations ranging from one to three minutes each, designed to immerse and evoke distinct emotional states [
33].
3.1.1. Subject Properties
The dataset utilized in this study is composed of 34 healthy individuals, providing a varied but methodically curated sample. Participants provided demographic details including gender, sexual orientation, age, and ethnicity, in addition to responses to a health screening questionnaire.
Although the VR Eyes: Emotions Dataset does not predefine specific emotional categories, we identified five distinct emotions for this study: excitement, happiness, anxiety, calmness, and sadness. These categories were selected based on their relevance to the VR scenarios used and their distinct physiological signatures in terms of heart rate variability (HRV) and galvanic skin response (GSR).
The selection of the five emotional categories (i.e., excitement, happiness, anxiety, calmness, and sadness) was informed by a combination of empirical data and prior studies on emotion recognition using physiological signals. Several studies have highlighted the distinct physiological patterns associated with these emotions, particularly in terms of heart rate variability (HRV) and galvanic skin response (GSR) [
34,
35,
36,
37]. For example, high-arousal emotions such as anxiety and excitement, although similar, show different HRV profiles [
35,
37], while low-arousal emotions such as calmness and sadness are associated with unique GSR responses [
34]. The VR scenarios used in this study were also designed to evoke emotional responses according to these categories, ensuring that the emotional states selected for classification were relevant to the stimuli provided [
36]. This combination of physiological evidence and the relevance of the VR scenario guided the selection process, ensuring that the categories chosen were both empirically grounded and contextually appropriate.
To maintain the integrity of the sample, exclusion criteria were rigorously enforced, disqualifying individuals with histories of seizure episodes, cardiac anomalies, vestibular disturbances, recurrent migraines, or psychological conditions. In addition, those susceptible to motion sickness or with compromised visual or auditory abilities were also omitted from the study. This selection process was designed to cultivate a participant pool reflecting broad demographic diversity, while adhering to stringent health and safety standards.
3.1.2. Signal Properties
The dataset encompasses physiological signals, specifically electrocardiography (ECG) and galvanic skin response (GSR) obtained with precise configurations to ensure data integrity. ECG recordings were performed using a Lead II setup, in which electrodes were strategically placed on the participant’s right arm (Vin−) at the wrist and on the left calf (Vin+), facilitating comprehensive monitoring of cardiac activity. For GSR measurements, disposable adhesive electrodes were affixed to the index and middle fingers of the participant’s right hand, capturing the subtle changes in skin conductance associated with emotional responses. The acquisition of these signals was meticulously performed, with ECG data sampled at a high resolution of 2000 Hz to capture the intricate details of heart activity, while GSR data were recorded at a frequency of 15 Hz, appropriate for tracking slower fluctuations in skin conductance [
33].
3.1.3. Selection and Evaluation Process
The curatorial process for the 360 virtual environments (360-VEs) was rigorous, involving detailed focus group discussions and a pilot study to confirm that each environment reliably induced specific emotional reactions. Participants experienced the 360-VEs in a randomized order and subsequently evaluated their emotional experiences through both the Self-Assessment Manikin (SAM) and the Visual Analog Scale (VAS), in addition to reporting their immersion levels in each 360-VE.
To validate the effectiveness of the selected 360-VEs, a preliminary trial was carried out with a group of 12 volunteers. These individuals assessed the emotional impact of the VEs employing VAS, articulating their feelings across a spectrum of emotions, including anger, calmness, sadness, happiness, and fear. This rich dataset underpinned our investigation into emotion recognition, leveraging biosignal processing and advanced machine learning techniques to discern emotional states.
3.2. Feature Extraction
We extracted 60 initial features from the biosignals, including heart rate variability (HRV), morphological characteristics, and Hjorth parameters. These features were chosen for their established relevance in emotion classification studies. After applying a feature selection process, 10 features were retained based on their statistical relevance and contribution to classification accuracy.
3.2.1. Electrocardiography (ECG)
Electrocardiography (ECG) represents a non-intrusive technique to monitor the heart’s electrical dynamics, proving indispensable for deciphering emotional states. This method involves the application of electrodes on the skin to capture the heart’s electrical signals during its contraction and relaxation phases, thus offering a detailed view of cardiac electrical activity. Such detailed cardiac measurements are crucial for understanding the physiological underpinnings related to various emotional states.
The utility of ECG in categorizing emotional states is attributed to the integral connection of the heart with the autonomic nervous system, which orchestrates emotional reactions. Emotional stimuli cause specific physiological changes, such as variations in heart rate, heart rate variability (HRV), and other ECG-derived metrics, all of which are instrumental in identifying different emotions.
Moreover, the analysis extended to the morphological aspects of ECG signals, particularly the QRS complex width. This parameter, representing the time span from the QRS complex’s start to its conclusion, reflects the duration of ventricular depolarization and is susceptible to alterations under different emotional conditions.
Figure 2 illustrates ECG traces corresponding to various scenarios, with the R peaks formally annotated, providing visual information on the cardiac responses elicited by various emotional stimuli.
The Teager energy operator was used to detect R peaks [38,39]:

$$\Psi[x(n)] = x^{2}(n) - x(n-1)\,x(n+1),$$

where $x(n)$ is our ECG signal.
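As an illustration of this step, the following Python sketch applies the discrete Teager energy operator to an ECG trace and picks R peaks from the resulting energy envelope. The sampling rate, threshold factor, refractory period, and use of SciPy's find_peaks are illustrative assumptions, not the exact detection settings used in this study.

```python
import numpy as np
from scipy.signal import find_peaks

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    psi = np.zeros_like(x, dtype=float)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def detect_r_peaks(ecg, fs=2000, min_rr_s=0.3, threshold_factor=0.2):
    """Locate R peaks from the Teager energy envelope of an ECG signal.

    fs matches the 2000 Hz ECG sampling rate of the dataset; min_rr_s and
    threshold_factor are illustrative choices, not values reported in the paper.
    """
    energy = teager_energy(ecg)
    threshold = threshold_factor * energy.max()
    # Enforce a refractory period so two detections cannot fall within one beat.
    peaks, _ = find_peaks(energy, height=threshold, distance=int(min_rr_s * fs))
    return peaks

# Example with a synthetic stand-in for an ECG: narrow "R-like" pulses every 0.8 s.
fs = 2000
t = np.arange(0, 10, 1 / fs)
ecg = sum(np.exp(-((t - beat) ** 2) / (2 * 0.01 ** 2)) for beat in np.arange(0.5, 10, 0.8))
ecg += 0.05 * np.random.randn(t.size)
r_peaks = detect_r_peaks(ecg, fs)
print(f"Detected {len(r_peaks)} R peaks")
```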
3.2.2. Heart Rate Variability (HRV)
Heart rate variability (HRV) signifies the physiological variations in the intervals between successive heartbeats and can be discerned from electrocardiogram (ECG) data. It embodies the ability of the cardiovascular system to adapt to an array of stimuli, both within and outside the body. The connection between HRV and the autonomic nervous system has led to its recognition as a significant metric for evaluating emotional states.
The autonomic nervous system consists of the sympathetic and parasympathetic nervous systems. The former triggers the body’s “fight-or-flight” response during stress, while the latter modulates rest and digestion. Together, they maintain a delicate equilibrium, modulating cardiovascular functions, such as heart rate, to meet emotional and environmental demands.
HRV analysis offers a window into this sympathetic–parasympathetic nexus, yielding vital clues about a person’s emotional condition. In
Figure 3, we illustrate the HRV patterns corresponding to five distinct emotional experiences, demonstrating the potential of HRV as a tool for emotional assessment.
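To make the link between detected R peaks and HRV concrete, the sketch below derives R-R intervals and two common time-domain HRV measures (SDNN and RMSSD). The chosen measures and the synthetic beat times are illustrative assumptions, not a listing of the exact HRV features computed in this work.

```python
import numpy as np

def hrv_time_domain(r_peak_indices, fs=2000):
    """Compute simple time-domain HRV features from R-peak sample indices.

    Returns mean heart rate (bpm), SDNN and RMSSD in milliseconds.
    The choice of features is illustrative.
    """
    r_times = np.asarray(r_peak_indices) / fs          # peak times in seconds
    rr = np.diff(r_times) * 1000.0                     # R-R intervals in ms
    mean_hr = 60000.0 / rr.mean()                      # beats per minute
    sdnn = rr.std(ddof=1)                              # overall variability
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))         # beat-to-beat variability
    return {"mean_hr_bpm": mean_hr, "sdnn_ms": sdnn, "rmssd_ms": rmssd}

# Example: R peaks roughly every 0.8 s with small jitter (about 75 bpm).
rng = np.random.default_rng(0)
rr_s = 0.8 + 0.02 * rng.standard_normal(100)
r_idx = (np.cumsum(rr_s) * 2000).astype(int)
print(hrv_time_domain(r_idx))
```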
3.2.3. Discrete Wavelet Transform (DWT)
For a thorough and precise analysis of ECG signals, our study employed the discrete wavelet transform (DWT), a mathematical tool designed for multi-level signal analysis. This technique allows for the simultaneous investigation of a signal’s behavior in both time and frequency domains. Through the DWT, the ECG signal undergoes a detailed decomposition, resulting in a range of wavelet coefficients that represent different frequency bands over specific time periods.
The application of the DWT meticulously segments the ECG signal into a collection of wavelet coefficients, each of which captures unique frequency details at corresponding time intervals. These coefficients give us a deeper understanding of the time–frequency dynamics of the ECG signal, enabling the detection of specific patterns and variations indicative of various physiological events. Each set of coefficients is linked to a discrete wavelet scale and a particular time frame, offering invaluable insights into the oscillatory components of the ECG during that interval.
The DWT operates by breaking down the signal into multiple frequency bands through wavelet filters, isolating lower frequency components that provide an approximation of the original signal, and higher frequency components that detail finer aspects. The process is inherently dyadic, meaning that with each successive decomposition level, the signal's frequency range is halved, providing a hierarchical analysis. In our research, we performed a 12-level DWT decomposition of the ECG signal, sampled at 2000 Hz, using the
Daubechies 4 wavelet. The computation of the DWT and its inverse for a signal $x(t)$ is defined by the following equations:

$$W_x(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt,$$

$$x(t) = \frac{1}{C_\psi} \int_{0}^{\infty} \int_{-\infty}^{\infty} W_x(a,b)\, \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) db\, \frac{da}{a^{2}},$$

where $\psi$ is the wavelet function, $a$ denotes the scaling factor, and $b$ represents the translation factor; in the DWT, $a$ and $b$ are sampled dyadically ($a = 2^{j}$, $b = k\,2^{j}$) [40].
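A minimal Python sketch of this decomposition, using the PyWavelets package with the Daubechies 4 (db4) wavelet, is given below. The package, helper name, and synthetic input are illustrative rather than the MATLAB routines used in this study; the achievable decomposition level also depends on the segment length, so the sketch caps it accordingly.

```python
import numpy as np
import pywt

def dwt_subbands(signal, wavelet="db4", level=12):
    """Decompose a signal into approximation + detail coefficients with the DWT.

    'db4' and 12 levels mirror the settings described in the text; the level is
    capped at the maximum permitted by the signal length.
    """
    max_level = pywt.dwt_max_level(len(signal), pywt.Wavelet(wavelet).dec_len)
    level = min(level, max_level)
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return coeffs  # [cA_level, cD_level, ..., cD_1]

# Example: 10 s of a 2000 Hz signal (stand-in for a real ECG segment).
fs = 2000
t = np.arange(0, 10, 1 / fs)
ecg_like = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(t.size)
coeffs = dwt_subbands(ecg_like)
for i, c in enumerate(coeffs):
    name = "approximation" if i == 0 else f"detail level {len(coeffs) - i}"
    print(f"{name}: {len(c)} coefficients")
```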
3.2.4. Galvanic Skin Response (GSR)
Galvanic skin response (GSR), also known as electrodermal activity (EDA), is a measurement of the electrical conductance of the skin, which fluctuates with physiological and psychological stimuli. These changes are predominantly due to the activity of sweat glands that are innervated by the sympathetic nervous system. When arousal levels increase, so does sweat production, leading to higher skin conductance. GSR sensors monitor these electrical changes, and the acquired data are then analyzed to deduce features indicative of physiological arousal and the corresponding emotional reactions. The application of GSR extends across disciplines such as psychology, neuroscience, and human–computer interaction, contributing to the study of emotions and stress. The interplay between GSR readings, sweat gland activity, and the workings of the autonomic nervous system provides researchers with valuable insights into the nuances of human behavior and affective experiences.
The GSR readouts shown in
Figure 4 are associated with five different emotional states.
3.2.5. Morphological Features
Extracting morphological features from ECG and GSR signals facilitates a detailed evaluation of their structural characteristics. The key morphological features derived from these signals include their width, area, skewness, kurtosis, and slope.
Width: In the context of signal analysis, ’width’ refers to the time span of a distinct segment of the signal. This metric is essential for understanding the timing and persistence of physiological processes. Analysis of the width of ECG and GSR signals enables the identification of temporal patterns and events within the physiological data, thus aiding in the differentiation of typical and atypical physiological responses. To calculate the width, the onset and conclusion of the relevant signal fragment are pinpointed, with the width being the interval between these two temporal markers.
$$\text{Width} = t_{\text{end}} - t_{\text{start}},$$

where $t_{\text{end}}$ is the ending time and $t_{\text{start}}$ is the starting time of the signal component of interest [41].
Area: The area feature is determined by computing the integral of the absolute value of the signal over a specified time interval. This metric offers a quantification of the total activity or magnitude of the signal within that interval. Analyzing the area under the curve for ECG and GSR signals allows researchers to gauge the cumulative magnitude of physiological responses. Variations in the area metric can reflect changes in the intensity or overall magnitude of physiological activities.
$$\text{Area} = \int_{t_1}^{t_2} \left| x(t) \right| dt,$$

where $x(t)$ represents the signal, and the integral is evaluated over the specified time interval of interest $[t_1, t_2]$ [42].
Skewness: Skewness is a statistical metric that evaluates the asymmetry in the distribution of signal amplitude values. This measure sheds light on the signal’s distribution, indicating whether it leans more towards positive or negative values. Analyzing the skewness of ECG and GSR signals allows for the identification of the dominant trend in the amplitude distributions, which is instrumental in delineating specific physiological response patterns. The skewness is determined by employing the following formula:
$$\text{Skewness} = E\!\left[\left(\frac{x(t) - \mu}{\sigma}\right)^{3}\right],$$

where $E$ denotes the expected value, $x(t)$ represents the signal, $\mu$ is the mean of the signal, and $\sigma$ is the standard deviation of the signal. The term $(x(t) - \mu)/\sigma$ is cubed to normalize the skewness measure, ensuring it is dimensionless and provides a consistent scale for comparison [43].
Kurtosis: In signal analysis, kurtosis is a statistical metric that quantifies the distribution’s “peakedness” or its deviation from being flat-topped. This measure is instrumental in detecting outliers or extreme values within a dataset. When applied to ECG and GSR signals, kurtosis analysis sheds light on the form and distribution tendencies of the signal amplitudes. Any departure from typical kurtosis values could signal unusual physiological reactions or anomalies within the signal. The computation of kurtosis is facilitated through the following formula:
$$\text{Kurtosis} = E\!\left[\left(\frac{x(t) - \mu}{\sigma}\right)^{4}\right] - 3$$

In this expression, $x(t)$ denotes the signal at time $t$, $\mu$ is the mean of the signal, $\sigma$ represents the standard deviation of the signal, and $E$ signifies the expected value. The subtraction of 3 adjusts the kurtosis value to zero for a normal distribution [44]. The term $(x(t) - \mu)/\sigma$ is raised to the fourth power to normalize the kurtosis measure; the "−3" yields excess kurtosis, which compares the kurtosis of the given distribution with that of a normal distribution.
Slope: The slope characteristic in signal analysis denotes the maximal rate at which the signal alters over a specific time span. This feature sheds light on the abruptness or speed of changes within the signal. Investigating the slope within ECG and GSR signals enables the detection of abrupt variations or tendencies in physiological reactions. Notably sharp slopes might signal substantial transitions in physiological conditions, reflecting swift fluctuations in arousal or emotional states. The slope, or the signal’s rate of change across a designated period, can be effectively estimated through differentiation.
$$\text{Slope} = \max_{t} \left| \frac{dx(t)}{dt} \right|,$$

where $dx(t)/dt$ represents the derivative of the signal with respect to time, approximating the signal's rate of change over a designated time interval.
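For illustration, the sketch below collects the width, area, skewness, kurtosis, and slope of a single signal segment in Python; the energy feature is described next. The segment boundaries, discretization choices, and the synthetic GSR-like input are assumptions made for the example, not the study's extraction code.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def morphological_features(segment, fs):
    """Width, area, skewness, excess kurtosis and maximum slope of one segment.

    'segment' is a 1-D array covering the component of interest; the segment
    boundaries are assumed to have been identified beforehand.
    """
    width = len(segment) / fs                          # width: t_end - t_start
    area = np.trapz(np.abs(segment), dx=1.0 / fs)      # integral of |x(t)|
    skw = skew(segment)                                 # asymmetry of amplitudes
    kurt = kurtosis(segment)                            # excess kurtosis (normal -> 0)
    slope = np.max(np.abs(np.diff(segment)) * fs)       # steepest rate of change
    return {"width_s": width, "area": area,
            "skewness": skw, "kurtosis": kurt, "slope": slope}

# Example on a synthetic GSR-like response sampled at 15 Hz (a Gaussian bump).
fs = 15
t = np.arange(0, 10, 1 / fs)
gsr_like = np.exp(-((t - 3) ** 2) / 2.0)
print(morphological_features(gsr_like, fs))
```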
Energy: The energy of a signal is determined through the application of the Fourier transform [
45].
$$H\!\left(e^{j\omega}\right) = \sum_{n} h(n)\, e^{-j\omega n},$$

where $H(e^{j\omega})$ represents the frequency response of a digital filter, with $h(n)$ denoting the filter's coefficients. This expression is the standard way to characterize how the filter will affect signals of different frequencies.
Figure 5 illustrates the energy profiles of ECG signals corresponding to five different emotional states.
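As a small, generic illustration of computing a signal's energy in the frequency domain, the snippet below contrasts the time-domain energy with the value obtained from the discrete Fourier transform via Parseval's relation. This is a sketch of the general idea, not the exact energy feature definition used in this study.

```python
import numpy as np

def signal_energy(x):
    """Energy of a discrete signal, computed in time and frequency domains.

    Parseval's relation: sum |x[n]|^2 == (1/N) * sum |X[k]|^2, where X = DFT(x).
    """
    time_energy = np.sum(np.abs(x) ** 2)
    X = np.fft.fft(x)
    freq_energy = np.sum(np.abs(X) ** 2) / len(x)
    return time_energy, freq_energy

x = np.random.randn(1024)
e_time, e_freq = signal_energy(x)
print(f"time-domain energy = {e_time:.3f}, frequency-domain energy = {e_freq:.3f}")
```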
3.2.6. Hjorth Parameters
The Hjorth parameters constitute a trio of mathematical descriptors conceived by Bengt Hjorth in the 1970s, designed for the quantitative analysis of time-series data characteristics.
These parameters, along with other features such as heart rate variability and GSR responses, were instrumental in differentiating the five defined emotional states: excitement, happiness, anxiety, calmness, and sadness. The selection of these emotions was driven by their distinct physiological patterns, which were consistently observed across the VR experiences. In addition, they have received widespread application across various domains, particularly in the realm of physiological signal analysis, encompassing electrocardiogram (ECG) and galvanic skin response (GSR) signals.
The principal Hjorth parameters—activity, mobility, and complexity—each unravel unique facets of signal dynamics, offering a multifaceted perspective on the underlying physiological processes.
Activity, one of the Hjorth parameters, quantifies the signal’s power and is determined by calculating the signal’s variance. Analyzing the activity of the ECG and GSR signals allows researchers to evaluate the overall power or intensity of the physiological responses these signals represent. Variations in activity can shed light on the strength or magnitude of the physiological processes in question. The activity, being synonymous with variance, is mathematically defined as
$$\text{Activity} = \operatorname{Var}\!\left(x(t)\right) = E\!\left[\left(x(t) - \mu\right)^{2}\right],$$

where $E$ denotes the expected value, $x(t)$ represents the signal at time $t$, and $\mu$ is the mean of the signal [46].
Mobility is a Hjorth parameter indicative of a signal’s dynamics, calculated as the square root of the ratio of the variance of the signal’s first derivative to the variance of the signal itself. This parameter essentially captures the mean frequency or spectral breadth of the signal. Analyzing the mobility of ECG and GSR signals provides researchers with an understanding of dynamic shifts and fluctuations within physiological responses, aiding in the detection of frequency variations or spectral characteristics. The formula for mobility is given by
$$\text{Mobility} = \sqrt{\frac{\operatorname{Var}\!\left(\frac{dx(t)}{dt}\right)}{\operatorname{Var}\!\left(x(t)\right)}},$$

where $\operatorname{Var}\!\left(\frac{dx(t)}{dt}\right)$ is the variance of the first derivative of the signal and $\operatorname{Var}\!\left(x(t)\right)$ is the variance of the signal itself [47].
Complexity, another Hjorth parameter, provides a measure of the frequency changes within a signal. Calculated as the mobility of the signal’s first derivative divided by the mobility of the signal itself, the complexity reflects the relative frequency changes or the rate of spectral modulation. By analyzing the complexity of physiological signals, such as ECG and GSR, researchers can uncover patterns and variations in frequency content. This sheds light on dynamic physiological processes and potential regulatory mechanisms. Put simply, complexity quantifies the changes in signal frequency, and its calculation is as follows:
$$\text{Complexity} = \frac{\text{Mobility}\!\left(\frac{dx(t)}{dt}\right)}{\text{Mobility}\!\left(x(t)\right)},$$

where $\text{Mobility}\!\left(\frac{dx(t)}{dt}\right)$ is the mobility of the signal's first derivative, and $\text{Mobility}\!\left(x(t)\right)$ is the mobility of the signal itself [48]. These measures provide valuable information on the dynamics of physiological responses captured by the ECG and GSR signals through quantitative assessments of signal properties.
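A compact Python sketch of the three Hjorth descriptors, following the definitions above, is shown below; the gradient-based finite-difference derivative and the synthetic test signal are assumed discretization and input choices.

```python
import numpy as np

def hjorth_parameters(x):
    """Hjorth activity, mobility and complexity of a 1-D signal.

    Activity   = Var(x)
    Mobility   = sqrt(Var(dx/dt) / Var(x))
    Complexity = Mobility(dx/dt) / Mobility(x)
    """
    dx = np.gradient(x)        # first derivative (finite differences)
    ddx = np.gradient(dx)      # second derivative
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

# Example on a noisy sine (stand-in for an ECG- or GSR-derived segment).
t = np.linspace(0, 10, 2000)
x = np.sin(2 * np.pi * 1.2 * t) + 0.05 * np.random.randn(t.size)
print(hjorth_parameters(x))
```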
3.3. Feature Selection
Feature selection is a fundamental process in machine learning that directly influences model performance. By pinpointing the most informative features within a dataset, it streamlines dimensionality, mitigates the risk of overfitting, and can significantly enhance a model’s predictive power. Thus, careful consideration of feature selection was paramount in our study.
Our feature selection process employed the chi-squared ($\chi^2$) statistical test. As a robust non-parametric method, $\chi^2$ assesses the independence between each feature and the target variable, aiding in the identification of the most relevant features.
The chi-square feature selection method serves as an algorithmic approach to identify and retain pertinent features within a dataset, while discarding those considered irrelevant [
49]. This method improves the performance of machine learning models by focusing on the most informative attributes.
To complement the statistical analysis, we employed scatter plots. These visualizations depict potential relationships between pairs of variables. By plotting each characteristic against the target variable (see
Figure 6), we gained insights into patterns and correlations, corroborating the results of
the $\chi^2$ test. This graphical approach facilitated the elimination of features that did not exhibit a discernible relationship to the target variable.
Our rigorous feature selection process allowed us to refine the dataset by isolating the most informative features. This distillation laid a strong foundation for the subsequent machine learning phase.
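As a rough Python analogue of this step (the study itself performed Chi-square feature selection in MATLAB), the sketch below ranks features by their chi-square score against the emotion labels and keeps the top 10, matching the feature count retained in Section 3.2. The binning step is an added assumption, since the chi-square test expects non-negative, count-like inputs; the random data are placeholders.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import KBinsDiscretizer

def select_top_features(X, y, k=10, n_bins=10):
    """Rank features by chi-square score against the emotion labels, keep top k.

    Features are discretized into non-negative bins first, because the
    chi-square test requires non-negative inputs; this preprocessing choice
    is illustrative.
    """
    binned = KBinsDiscretizer(n_bins=n_bins, encode="ordinal",
                              strategy="uniform").fit_transform(X)
    selector = SelectKBest(score_func=chi2, k=k).fit(binned, y)
    return selector.get_support(indices=True), selector.scores_

# Example with random stand-in data: 200 samples, 60 features, 5 emotion labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 60))
y = rng.integers(0, 5, size=200)
kept, scores = select_top_features(X, y)
print("Indices of retained features:", kept)
```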
3.4. Machine Learning
To ensure the reliability and generalizability of our model, we employed a 5-fold cross-validation strategy during the training process. This technique involved randomly partitioning the dataset into five equal-sized subsamples. In each iteration, four subsamples were used for training, while the remaining subsample served as the validation set. This process was repeated five times, ensuring that each subsample was used for validation exactly once. We then averaged the performance of the model in all five trials, providing a robust performance estimate [
50].
Five-fold cross-validation mitigated the risk of overfitting, a scenario where a model becomes overly attuned to the training data, hindering its ability to generalize to unseen data. By evaluating the model across different data subsets, we gained a more reliable picture of its true generalization potential.
We selected an ensemble of boosted trees as our machine learning model. Boosting is an ensemble technique that constructs a robust classifier by combining multiple weak classifiers, in our case, decision trees. Boosted trees operate sequentially: each tree is fitted to the data, while considering errors made by previous trees. By assigning higher weights to incorrectly classified instances, subsequent trees prioritize the correct classification. This iterative process continues for a defined number of rounds. The final model is a weighted combination of all decision trees, with weights reflecting the predictive power of each tree [
51].
Ensemble boosted-tree models offer exceptional performance and effectively handle the challenges associated with complex, high-dimensional datasets. Employing this algorithm ensured our model could learn from our data's multivariate and multi-class characteristics, resulting in superior emotion classification accuracy.
We fine-tuned our model by employing a learning rate of 0.1 and setting the number of learners to 30. To further mitigate overfitting, we limited the maximum number of splits per tree to 20.
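To make the training protocol concrete, the sketch below mirrors it with scikit-learn's gradient-boosted trees and 5-fold cross-validation, using the stated learning rate of 0.1 and 30 learners, and a leaf-count cap as a stand-in for the 20-split limit. The study's model was not built with scikit-learn, so this is an approximate analogue of the described configuration, not the original implementation; the data here are placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Approximate analogue of the boosted-tree ensemble described in the text:
# learning rate 0.1, 30 learners, and at most 20 splits per tree
# (max_leaf_nodes = 21 leaves implies at most 20 internal splits).
model = GradientBoostingClassifier(learning_rate=0.1,
                                   n_estimators=30,
                                   max_leaf_nodes=21,
                                   random_state=0)

# Random stand-in data: 10 selected features, 5 emotion classes.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 10))
y = rng.integers(0, 5, size=300)

# 5-fold cross-validation: each fold serves exactly once as the validation set.
scores = cross_val_score(model, X, y, cv=5)
print(f"fold accuracies: {np.round(scores, 3)}, mean = {scores.mean():.3f}")
```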
4. Results
To rigorously assess our model’s performance, we employed four essential metrics: accuracy, precision, recall, and the F1 score.
Accuracy: This fundamental classification metric measures the proportion of correct predictions (true positives and true negatives) out of the total dataset. Our model achieved a remarkable accuracy of 97.1% during validation (
Figure 1), a trend that persisted in testing with an accuracy of 97.4%. This signified exceptional efficacy in correctly classifying emotional states.
Precision: Precision quantifies a model’s exactness. It is the ratio of true positives (TP) to the sum of true positives and false positives (TP + FP). Our model’s high precision scores (
Table 1 and
Table 2) demonstrated a low false-positive rate, indicating that its predictions of specific emotional states were highly reliable.
Recall: Also termed sensitivity, recall measures a model’s ability to identify all relevant instances (true positives). It is calculated as TP/(TP + FN). Our model’s high recall values (
Table 1 and
Table 2) confirmed its capacity to detect most instances of each emotional state, missing few true positives.
F1 Score: The F1 score harmonizes precision and recall, making it especially valuable for potentially imbalanced datasets. Our model’s strong F1 scores (
Table 1 and
Table 2) reflected its ability to maintain both precise predictions and comprehensive identification of relevant instances.
Confusion matrices, constructed using values such as true positives (TP) and false positives (FP), were calculated to provide a detailed understanding of the model's accuracy across different emotional states.
The confusion matrix for the model is presented in
Figure 7. The matrix shows that the model performed well in distinguishing between most emotion classes, with a few misclassifications observed between closely related emotional states such as anxiety and excitement.
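For reference, the snippet below shows how these metrics and the confusion matrix can be computed from true and predicted labels using scikit-learn; the label values are placeholders rather than actual model outputs, and the class names follow the five emotion categories defined in Section 3.1.1.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

emotions = ["excitement", "happiness", "anxiety", "calmness", "sadness"]

# Placeholder labels standing in for the model's true and predicted outputs.
y_true = ["excitement", "anxiety", "calmness", "sadness", "happiness",
          "anxiety", "excitement", "calmness"]
y_pred = ["excitement", "excitement", "calmness", "sadness", "happiness",
          "anxiety", "excitement", "sadness"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=emotions, zero_division=0)
cm = confusion_matrix(y_true, y_pred, labels=emotions)

print(f"accuracy = {accuracy:.3f}")
for name, p, r, f in zip(emotions, precision, recall, f1):
    print(f"{name:>10}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print("confusion matrix (rows = true, cols = predicted):")
print(cm)
```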
4.1. Model Performance
Our emotion classification model achieved an overall accuracy of 97.78%, demonstrating high predictive capability across most emotional categories. However, a closer examination of the confusion matrix revealed misclassifications between emotions that share overlapping physiological features, such as anxiety and excitement (both high arousal emotions) and calmness and sadness (both low arousal emotions).
4.2. Error Analysis
Despite the high overall accuracy, certain classification errors emerged, particularly between the following emotional categories:
Anxiety vs. Excitement: These two high-arousal emotions exhibited similar patterns in both heart rate variability (HRV) and galvanic skin response (GSR), leading to confusion in the model's predictions. The overlap in their physiological markers made it difficult for the model to reliably distinguish between these two emotional states.
Calmness vs. Sadness: Both emotions were characterized by lower levels of arousal, which resulted in similar physiological responses. Misclassification between these categories indicates that the current feature set does not capture subtle differences in their physiological signatures.
Impact of Data Imbalance: A deeper inspection of the dataset revealed that some emotional categories, such as calmness and excitement, were more frequently represented in the data, while others, such as sadness, were underrepresented. This imbalance may have led the model to focus more on classes with a higher frequency of instances, resulting in higher error rates for underrepresented emotions.
4.3. Potential Reasons for Misclassifications
Physiological Signal Overlap: The primary challenge in classifying emotions arises from the physiological similarity between certain emotional states. For example, high-arousal emotions like anxiety and excitement typically increase both HRV and GSR. Without additional discriminative features, such physiological overlaps lead to misclassifications.
Limited Feature Diversity: The features currently extracted from HRV and morphological ECG aspects, while effective, may not fully capture the complexity of emotional experiences in dynamic VR environments. For emotions with similar arousal levels, the differences in physiological responses may be too subtle for the current feature set to detect.
5. Discussion
The multimodal design of the dataset, integrating self-reported responses with physiological signals, enabled a nuanced analysis of emotional states within VR environments. By assembling a diverse cohort, we achieved a balance between experimental control and representativeness, enhancing the generalizability of our findings.
The incorporation of biosignal analysis methodologies, particularly electrocardiography (ECG) and galvanic skin response (GSR), was crucial in understanding the physiological underpinnings of emotional states. The application of the discrete wavelet transform (DWT) on ECG signals revealed critical time–frequency attributes, while the GSR readings provided insights into aspects of physiological arousal and emotional reactions.
The deployment of advanced ensemble machine learning techniques, particularly boosted trees, enabled us to achieve notable precision in discerning emotions within VR contexts. The robust performance metrics of the model, including accuracy, precision, recall, and the F1 score, underscored its efficacy in accurate emotional categorization, while minimizing erroneous classifications. The adoption of 5-fold cross-validation further reinforced the model’s dependability and extrapolative power.
These findings have significant implications for the evolution of emotionally responsive VR technologies. The capacity for real-time emotional adjustment has the potential to dramatically enrich immersive quality and user engagement in VR experiences, with potential utility in entertainment, healthcare, education, and therapeutic domains. Future investigations could explore the refinement of the synergy between biosignal analysis and machine learning to enhance the emotional acuity of VR systems. The comparison of our findings with existing literature, outlined in
Table 3, highlights the superior accuracy of our model, marking a significant advancement in the field.
The classification errors observed in our model point to a few critical challenges in emotion recognition using physiological signals. Emotions that exhibit similar physiological markers, such as those with similar arousal levels, are more likely to be misclassified. This highlights the need for more discriminative features or additional biosignals to improve classification accuracy.
The incorporation of EEG data, for instance, could offer a substantial improvement in the model’s ability to distinguish between closely related emotional states. Similarly, the introduction of advanced signal processing techniques, such as frequency-domain analysis and entropy measures, could yield further improvements by capturing more subtle distinctions in physiological signals.
Data augmentation techniques, aimed at addressing the imbalance in emotional categories, could also bolster performance, particularly for underrepresented emotions. Furthermore, deep learning models, such as LSTM networks, offer a promising avenue for improving real-time emotion classification in VR by effectively handling the temporal dynamics of physiological signals.
Potential Implications
This study makes several key contributions to the field of emotion classification within virtual reality (VR) systems, with significant practical implications across various domains. Using biosignals such as electrocardiography (ECG) and galvanic skin response (GSR), we developed a machine-learning-based model capable of accurately classifying emotions in real time. This technology has the potential to enhance the emotional responsiveness of VR environments, making them more adaptive and personalized for individual users.
Emotion recognition has the potential to make virtual landscapes more responsive, adaptive, and emotionally engaging. Therefore, our findings have significant potential applications in a wide range of virtual environments, particularly in the educational, therapeutic, and entertainment domains.
Applications in Education: In the field of education, our model has the potential to assist educators in detecting and responding to students’ emotions in real time, thereby positively influencing the learning experience. Emotion recognition could be integrated into virtual learning environments to monitor students’ emotional states, providing real-time feedback to both lecturers and students. For instance, signs of frustration or confusion during a lecture could be identified, allowing the system to offer additional guidance or adjust the pace of instruction accordingly. Conversely, when a student exhibits signs of engagement and excitement, the system could introduce more challenging content to maintain momentum and encourage further interest. This adaptive learning process could enhance student motivation, reduce disengagement, and ultimately improve learning outcomes.
Applications in Therapy: From a psychotherapy perspective, the application of emotion recognition in virtual reality (VR) environments presents unique opportunities for personalized treatment. Virtual environments can be designed to simulate stressful or triggering scenarios (e.g., exposure therapy for anxiety disorders) while continuously monitoring the patient’s emotional responses. Therapists could utilize real-time data, such as heart rate variability (HRV) and galvanic skin response (GSR), to assess the patient’s progress and adjust therapeutic interventions accordingly. For example, when heightened anxiety is detected, the virtual environment could automatically shift to a calming scene, or the therapist could intervene to help the patient manage their emotional state. This real-time monitoring allows for dynamic adjustments, enhancing the effectiveness of therapeutic interventions by tailoring the treatment to the patient’s emotional condition.
Entertainment and Gaming: The gaming industry could leverage emotion recognition to create more immersive and personalized gaming experiences. By detecting players’ emotions in real time, developers could dynamically adjust in-game events, challenges, and storylines based on the player’s level of engagement. For instance, if a player exhibits signs of boredom, the system could increase the game’s difficulty or introduce new, stimulating elements to re-engage the player. This real-time adaptation would enhance immersion, personalize gameplay, improve user satisfaction, and potentially extend playtime.
Virtual Training and Simulation: Emotion recognition could play a pivotal role in virtual training and simulation environments, such as those used in military, medical, or emergency response training. By monitoring trainees' emotional responses to high-stress scenarios, instructors could gain valuable insights into how individuals manage pressure and stress. Such a system could dynamically adapt training scenarios in real time, gradually increasing intensity to help trainees develop resilience and learn to manage stress in demanding environments. This adaptive training approach could result in more effective preparation for real-world challenges.
Healthcare and Well-Being: Beyond psychotherapy, emotion recognition systems could be applied to general healthcare and well-being monitoring. In virtual fitness environments, these systems could detect signs of physical or emotional fatigue during a workout session. A virtual coach could then adjust the intensity or modify the exercise, promoting a balanced and mindful workout. Similarly, emotion recognition could be integrated into wellness applications, helping users manage stress and anxiety through real-time feedback in virtual relaxation or meditation environments.
6. Conclusions
This investigation introduced an innovative approach to emotion classification within virtual reality (VR) settings, combining biosignal processing with advanced machine learning techniques.
Five emotions—excitement, happiness, anxiety, calmness, and sadness—were specifically defined for this study based on their physiological signatures and relevance to the VR experiences utilized. Our tailored approach allowed for precise emotion classification, demonstrating the potential of biosignal analysis to improve emotionally intelligent VR environments.
The central aim of this endeavor was to increase VR frameworks' ability to discern and responsively adapt to users' emotional states in real time. The empirical results underscored the efficacy of harmonizing biosignal analytics with machine learning to create emotionally intuitive VR applications.
Leveraging the “VR Eyes: Emotions Dataset” (VREED), a bespoke multimodal affective dataset crafted to elicit emotions via immersive 360° Virtual Environments (360-VEs), this study captured a rich tapestry of self-reported and physiological data, including ECG and GSR metrics, from 34 healthy subjects. These participants navigated through 12 unique 360-VEs, providing a diverse array of emotional responses for analysis.
In particular, the research methodology used an ensemble machine learning paradigm coupled with advanced feature selection techniques, notably the Chi-square ($\chi^2$) method, to construct a predictive model tailored for emotion classification. This model distinguished itself with a remarkable accuracy rate of 97.5% in test scenarios, attesting to its ability to delineate precise emotions within VR contexts.
Despite its successes, this study is not without limitations. The reliance on a controlled dataset, while invaluable for foundational research, necessitates further validation in more varied and unstructured real-world settings to fully ascertain the generalizability and applicability of the developed model. Furthermore, the ethical considerations and privacy implications of biosignal-based emotional analysis warrant careful consideration as this technology progresses toward widespread adoption.
Ultimately, the implications of this research for the evolution of VR technology are profound. The integration of biosignal processing with machine learning paves the way for VR experiences that are not only immersive, but also dynamically attuned and emotionally resonant. This pioneering stride in emotion classification can propel a new era of VR systems capable of perceptive and real-time emotional interactivity.
The study demonstrates that by accurately identifying and responding to user emotions in real time, VR systems can become more immersive, personalized, and emotionally resonant. The potential applications of this technology are far-reaching. In education, emotion recognition could be used to create adaptive learning environments that respond to the emotional states of students, consequently enhancing engagement and learning outcomes. In psychotherapy, VR systems equipped with emotion recognition could provide more effective and personalized treatment options by dynamically adjusting therapeutic content based on the user’s emotional responses. Similarly, in the entertainment industry, this technology could be used to tailor experiences to the emotional preferences of users, offering more engaging and emotionally satisfying content. These applications underscore the transformative potential of emotion recognition technology in making digital experiences more intuitive and human-centric.
As we venture into the future, the horizon of possibilities for enhancing and broadening these methodologies is expansive. The exploration of supplementary biosignals, such as EEG, holds promise in deepening the emotional nuance and precision of classification frameworks. Moreover, to optimize user experience, VR designers must prioritize the development of adaptive systems that incorporate real-time user feedback. Importantly, future research should investigate explainability techniques such as LIME (local interpretable model-agnostic explanations) or SHAP (Shapley additive explanations) to illuminate the model’s decision-making process. This would not only foster trust, but also deepen our understanding of the complex links between physiological signals and emotions. Ultimately, such an approach would ensure that VR environments are not only emotionally intelligent, but also finely personalized for individual users. Finally, we believe that this research’s value extends beyond its current discoveries, setting a solid foundation for future explorations aimed at realizing the full potential of emotionally intelligent virtual environments.
This research has raised questions that need further investigation. Further research incorporating additional biosignals, such as electroencephalography (EEG) and electromyography (EMG), could provide more comprehensive data for emotion classification, particularly for distinguishing between closely related emotional states. It would also be worthwhile to explore more advanced feature extraction techniques and machine learning models, including deep learning approaches such as recurrent neural networks (RNN) or long short-term memory (LSTM) networks, to improve the temporal analysis of physiological signals. Furthermore, future studies should consider the integration of the model into more diverse and dynamic virtual environments, testing its applicability across different user populations and real-world scenarios. This would help to refine the system's accuracy and generalizability, broadening its potential applications in areas such as education, therapy, and entertainment.