1 Introduction

Emotions are dynamic multidimensional responses that integrate physiological, psychological, and cognitive processes, triggered by internal and external stimuli [1]. They are involved in logical decision-making, perception, human interaction, and human intelligence [2, 3]. In recent years, affective computing has emerged as a multidisciplinary field spanning psychology, physiology, computer science, and biomedical engineering [4]. As a result, the research and development of systems and interfaces capable of recognizing and emulating human emotions are increasingly becoming a priority [4,5,6]. In this regard, affective computing can deal with both emotional classification and elicitation, offering significant societal benefits through the implementation of emotion recognition systems across different working environments, including education, medicine, economics, and workload studies, as well as assisting in the identification of cognitive disorders [7], anxiety, or stress [8, 9].

Traditionally, self-reports have been developed to assess individuals' mental states. The most widely used is the Self-Assessment Manikin (SAM) [10,11,12], a picture-oriented instrument (Fig. 1) that determines emotional states by rating them on an image-based scale along two dimensions, also known as affective states or emotional dimensions/indicators: Valence, i.e., pleasantness/unpleasantness, and Arousal, i.e., how strongly the emotion is felt. In this regard, among multiple frameworks focused on human emotions, we adopted the Arousal Valence Space (AVS) model proposed by Russell [13]. The AVS model, also known as the Circumplex Model of Affect, serves as a fundamental structure for categorizing emotions based on Valence and Arousal as primary dimensions. As described by Suhaimi et al. [14], the AVS model visualizes emotions within this two-dimensional space, allowing for a comprehensive understanding of emotional experiences. Emotions are positioned within this space based on their Valence and Arousal characteristics, creating a map of emotional states.

Fig. 1 Self-Assessment Manikin

Although self-reports are commonly used for analyzing emotions, these can introduce personal bias, which may have a negative impact on the accuracy of emotion detection [6].

On the other hand, EEG devices enable the measurement of emotional metrics, such as Valence and Arousal, by analyzing the brain signals in the theta (θ), alpha (α), beta (β), and gamma (γ) frequency bands, making it possible to describe an individual's affective state [15]. Thus, the use of EEG signals in detecting and classifying emotions, besides being relevant for human–computer interaction [16, 17], could provide a more in-depth characterization of the emotional state.

Furthermore, emotion research has the additional purpose of understanding and profiling the correct stimuli to evoke emotions. One of the most known techniques for emotion elicitation is visual stimulation through images and videos; as a result, many databases containing images, videos, and audio have been created, such as the International Affective Picture System (IAPS) [18], the Geneva Affective Picture Database (GAPED) [19], and the International Affective Digitized Sounds (IADS) [20].

In the current context, virtual environments (VE) are gaining ground in creating immersive and interactive experiences [21] based on virtual reality (VR), due to their suitability to induce more emotional responses than the previously mentioned static approaches [22, 23]. Emotions play a crucial role in social interaction, mental health, education, entertainment, and many other aspects of human life. The ability to assess and monitor emotional states, particularly in immersive virtual environments, can lead to significant advancements in these domains. Recognizing the pressing need for accurate emotion classification, this study aims to analyze EEG as a method for emotion characterization and explore how user emotions are influenced by a set of VEs designed and validated to induce one specific emotion among happiness, sadness, anger, fear and disgust [24]. The EEG outcomes will then be associated with those obtained from the SAM reports. First, the brain activation in the separate frequency bands will be compared with the answers to the SAM questionnaires; then, the brain activation in terms of emotional indicators will be analyzed by means of the extreme gradient boosting and random forest algorithms, with the SAM questionnaires serving as ground truth labels for the EEG instances.

This paper is structured as follows: Sect. 2 presents the background research on the adoption of VR for the study of emotions, emotional assessment with brain signals and related machine learning techniques. Section 3 explains the methodology and the experimental setup. The results are discussed in Sect. 4; finally, conclusions are reported in Sect. 5.

2 Previous works on emotional elicitation and assessment with VEs, EEG and machine learning

Emotion elicitation and assessment have been addressed in the last decades thanks to the technological advancements in VR, used as a stimulus, in assessment techniques, in particular EEG, and in Artificial Intelligence approaches, used for the analysis. Specifically, the adoption of non-clinical cost-effective instruments (EEG headsets) boosted the interest and the research in the affective computing domain. The present section sums up the main findings of the current literature in this area.

2.1 Applications of emotional VEs

The study of emotions has practical implications that span from mental health to marketing and human–computer interaction. In particular, understanding and classifying human emotions using VR opens new possibilities for creating more personalized, compelling, and emotionally resonant experiences in various aspects of life. One of the primary practical applications is in the field of mental health. The ability to accurately assess and monitor a person's emotional state while interacting with different VEs can aid therapists in supporting neurocognitive therapy [25, 26]. For instance, it could be used to create personalized exposure therapy programs for individuals with specific fears or traumas [27]. Furthermore, VR has shown promise in enhancing immersive training experiences, enabling therapists to create controlled environments for exposure therapy, and assisting in the early recognition of emotional disorders. VR environments have shown promise in the treatment of conditions such as post-traumatic stress disorder (PTSD), phobias, anxiety [28,29,30] and emotion disorders in general [31]. Understanding how specific VEs trigger emotional responses can help adapt therapy sessions to individual needs, providing a more personalized and effective approach.

In marketing, knowing how consumers emotionally respond to advertisements and product displays in VR can support marketing strategies [32, 33]. Advertisers can design more emotional campaigns, leading to increased customer engagement [34]. Also, in the entertainment industry, such as the development of video games and virtual experiences, the ability to assess and adapt content based on users' emotional responses can enhance the overall experience [35,36,37]. For example, games can become more immersive by adjusting gameplay and narratives according to the player's emotions. In workplace environments, this research could be further adapted to assess and manage stress levels. Employers can create stress-reducing VEs or identify situations that cause excessive stress [38], contributing to a healthier and more well-being-centered workforce [39, 40]. In addition, human–computer interaction can be improved by creating adaptive interfaces that respond to users' emotional states [41, 42], for instance, detecting when a user is frustrated and offering help or suggesting breaks.

2.2 EEG contribution to emotional assessment

Within the EEG context, features such as Empirical Wavelet Transform (EWT), Event-Related Potentials (ERPs), event-related synchronization and steady-state visually evoked potentials, and frontal EEG asymmetry [43,44,45] have been explored for their emotional characteristics. In an EEG study regarding changes in regional brain activity, Ekman and Davidson found that positive emotions produce an activation of the brain's left frontal asymmetry [46]. Moreover, Davidson showed that the change in the α power corresponds to a relative right frontal activation for negative emotions (fear, disgust, and sadness), while a relatively greater activation of the left frontal area is related to positive emotions (joy, happiness) [47]. These observations resulted in what Davidson called "hemispheric lateralization", which states that the left frontal brain hemisphere activation is correlated with approach motivation, and in contrast, the right frontal with avoidance motivation [48].

Other studies have tried to establish a correlation between emotions and brain signals' band power. A decrease in the frontal brain's activity in people feeling scared has been detected [49]. Another study showed that disliking classical music corresponds to low Valence and high Arousal, and a higher right frontal activation, while enjoying it is related to high Valence, low Arousal, and a higher left frontal activation, suggesting an asymmetric frontal activation when considering the α band [50]. Conversely, β waves are more active in the frontal cortex during intense, focused mental activity [51]. Other studies showed that α and θ bands are related to emotional states [34, 52,53,54] and can be used as indicators of cognitive workload. These bands are known to be associated with a variety of mental states since α brain patterns are related to relaxation, while θ ones are commonly associated with drowsiness and deeply meditative conditions.

The study of the θ rhythm is still a matter of research; in fact, studies have found that an increase in cognitive effort is characterized by a higher θ power and lower α power, while the opposite happens during rest [8]. A higher θ activity in the frontal area can be associated with higher levels of task difficulty and complexity [55]. It has been related to inward-focused mental tasks, implicit learning, daydreaming and fantasizing [56], and it has also been found to be associated with exploring unfamiliar environments [57]. A recent study [58] found that the θ rhythm is greatly amplified during VR applications, which opens promising research opportunities given the role of θ in neuroplasticity and memory [59].

With respect to the activation of brain regions during emotional experiences, α waves are mainly observed in the frontal and occipital area, θ waves are commonly found in the frontal area, β waves in the frontal and parietal as well as in the occipital and temporal ones, and γ waves are typically observed in the temporal and occipital areas [15, 60].

Moreover, γ corresponds to high mental activity and is linked to perception and consciousness. Its frequencies are associated with positive feelings and Arousal increments during high-intensity visual stimuli [53, 61, 62], while β waves are often related to self-induced positive and negative emotions. Few studies were found on delta waves, mainly associated with deep sleep. Figure 2 provides a graphical summary of the most frequently reported correlations between brain lobes and brain waves associated with emotions in the literature.

Fig. 2 Brain lobes and frequency bands related to emotions

2.3 Machine learning for EEG classification

Machine learning (ML) is suitable for EEG emotional analysis due to its capability to find patterns in data. Several solutions have been adopted for EEG classification, in terms of both emotions and affective indicators, such as Valence and Arousal. Widely adopted traditional algorithms are support vector machine (SVM) [63,64,65,66,67], K-nearest neighbor (KNN) [68], Naïve Bayes [69], linear discriminant analysis (LDA) [70], quadratic discriminant analysis (QDA) [68], decision tree (DT) [71], random forest (RF) [71], bagged tree (BT) [72], AdaBoost [73], and extreme gradient boosting (XGBoost) [74]. SVM is one of the most common data-driven approaches among traditional ML algorithms, proving to be successful in affective studies [75, 76]. It is suitable for EEG classification thanks to its capability of handling multidimensional data on the basis of multivariate patterns. Although originally designed for binary classification, it has been modified to handle multiclass classification. On the other hand, SVM is prone to overfitting, making it a sub-optimal choice when dealing with small, noisy, and complex datasets [77, 78]. From this viewpoint, ensemble learning offers valid alternatives, both with bagging and boosting [72,73,74]. RF has been used to construct predictive models of mental states such as meditation and concentration, reaching accuracies around 75% [79], exceeding 90% accuracy when deep learning was used for feature extraction [80], and reaching 98% accuracy for Valence [71]. AdaBoost was successfully applied to classify human emotions in the 2D Valence-Arousal space, reaching 97% accuracy [82] on the DEAP dataset [81], and exceeding 88% accuracy on the same dataset for binary classification in the Dominance dimension, a quantification of the level of ‘control’ that is often used as a third dimension of the AVS model [73]. BT showed high performance on the DEAP dataset, with over 97% accuracy in the 2D Valence-Arousal space [72]. XGBoost is less commonly used in emotion recognition, although noteworthy results have been obtained [74], with almost 95% accuracy on the DEAP dataset [83].

Artificial neural networks (NNs) are gaining ground in this field; shallow and deep NNs have been proposed to classify complex data such as EEG. Compared to traditional ML, NNs require less specialized knowledge and reach notable classification accuracies, making them a widely adopted approach to emotional analyses [84]. In particular, remarkable results are obtained when combining NNs and traditional ML, the former for feature extraction and the latter for classification [85, 86]. A drawback of employing NNs is that a large amount of data is required; additionally, the high dimensionality of EEG leads to high computational costs.

3 Materials and methods

In accordance with similar studies that conducted EEG measurements in VR [87, 88], and considering the typical range of 10–20 subjects in lab-based EEG studies [89], we recruited a total of 30 participants who were involved in a controlled experiment; the age of the participants ranged from 19 to 36. Before the start of the experiment, all participants provided signed informed consent, indicating their understanding and willingness to participate in the study. All participants were informed about the nature of the research, their rights as research participants, and the ultimate use of their data. Moreover, they were informed that they could withdraw from the study at any time without any consequences. Inclusion criteria were (i) age between 18 and 40 years; (ii) no history of neurological or psychiatric disorders; (iii) no use of medications that may affect brain function; and (iv) normal or corrected-to-normal vision. Participants were recruited through flyers posted on the university campus and social media advertisements. Overall, the study sample consisted of young, predominantly Italian participants with an even distribution of gender.

3.1 Experiment setup

The participants were asked to sit, one at a time, in an adjustable chair in front of a 27-inch monitor equipped with a mouse, a keyboard, and an external speaker (an Ultimate Ears UE Boom). Before starting the experiment, the EEG headset was set up, and the initial baseline in closed-eyes condition, which lasted 15 s, was measured. Every participant was then asked to navigate five affective VEs selected from a total set of ten, one for each emotion (anger, disgust, fear, joy, sadness). Every VR navigation lasted 60 s, and we included a 90-s rest period between VE visits. Within the 90-s rest period, 30 s were allocated for participants to report their Valence and Arousal on a SAM questionnaire, inspired by the one shown in Fig. 1 but with a 9-point scale, followed by an additional 60-s interval before the next stimulus.

During the VE exploration, the participants’ brain activity was measured with the EEG headset. As the duration of rest periods in affective EEG experiments can vary [90], our choice of a 90-s rest period was based on the nature of our affective experiment and the need to find a balance between emotional response capture and participant comfort.

Given that the primary focus of our study was on assessing participants' emotional states, we opted for providing a resting time that was similar in duration to the stimuli. This approach allowed participants to report their Valence and Arousal using the SAM questionnaire and provided sufficient time for them to return to a neutral affective state. The experimental flow can be seen in Fig. 3.
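The session structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `draw_session` and `session_timeline` are hypothetical, the shuffling of the visiting order is assumed (the text only states that one VE per emotion is selected), and a full 90-s rest is assumed to follow every visit, including the last one.

```python
import random

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness"]
LAYOUTS = ["indoor", "outdoor"]

def draw_session(rng):
    """Pick one VE (indoor or outdoor) per emotion, then shuffle the order."""
    visits = [(emotion, rng.choice(LAYOUTS)) for emotion in EMOTIONS]
    rng.shuffle(visits)  # assumption: the visiting order is randomized
    return visits

def session_timeline(visits, baseline_s=15, visit_s=60, sam_s=30, recovery_s=60):
    """Return (label, start, end) tuples, in seconds, for one session."""
    timeline = [("baseline", 0, baseline_s)]
    t = baseline_s
    for emotion, layout in visits:
        timeline.append((f"{layout} {emotion}", t, t + visit_s))
        t += visit_s
        timeline.append(("SAM report", t, t + sam_s))   # first 30 s of the rest
        t += sam_s
        timeline.append(("recovery", t, t + recovery_s))  # remaining 60 s
        t += recovery_s
    return timeline
```

Calling `session_timeline(draw_session(random.Random(0)))` lists every phase with its start and end time, which makes it straightforward to align EEG segments with the corresponding VE visit.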

Fig. 3 Experimental procedure. Participants start with a 15-s baseline measurement, followed by five randomly selected VE visits, each lasting 60 s, with a 90-s rest period between visits. At the end of each visit, the participants are asked to complete a SAM questionnaire. EEG measurements are recorded during the VE visits

3.2 The virtual environments

Ten validated VEs were considered in this study to create an immersive experience for the participants. The VEs were composed of certain elements more likely to elicit a specific emotion: anger, disgust, fear, happiness and sadness [24]. In particular, for every emotion two VEs were designed using two layouts: indoor and outdoor spaces. Indoor spaces are typically planned to replicate real-life indoor settings, such as offices, living rooms, or classrooms. These environments often include furniture, decorations, and other objects commonly found in indoor spaces, as well as lighting and sound effects that create a realistic atmosphere. In comparison, outdoor spaces are designed to simulate outside settings, such as parks, forests, or beaches. These environments often feature natural elements such as trees, waterfalls, and open sky, as well as weather effects like rain, snow, and wind. The following is a brief description of the selected VEs, whose realization methodology and other details regarding their validation are described in Dozio et al. [24].

  • Outdoor Happiness (OH). This VE was created to represent a beautiful tropical beach, open and expansive, conveying a sense of vastness. Set during sunny daytime, almost sunset, the colors were warm so that a feeling of relaxation could be felt. The sound of waves characterized the background music (Fig. 4a).

  • Indoor Happiness (IH). In this case, the VE was a playful, colorful, and almost magical world. A starry atmosphere decorated the room, and some festive and bright lights lit it. The whole scenario was characterized by different sounds: the animals and the objects moving, but also the background music, which was an electronic psychedelic melody (Fig. 4b).

  • Outdoor Disgust (OD). This VE consisted of a grimy picnic scene full of dirty tables, cockroaches, and trash all over the place. There were dirty bathrooms along the path, and a person inside was audibly vomiting (Fig. 4c).

  • Indoor Disgust (ID). A dark grey room was designed, representing a dirty public latrine full of excrement and insects (Fig. 4d).

  • Outdoor Fear (OF). A dark and gloomy forest was realized for this VE, characterized by many startling elements such as crows, red eyes peeking in the dark, dead bodies, and zombies (Fig. 4e).

  • Indoor Fear (IF). The VE was set in a haunted house, entirely in the dark, where the only light source was the beam of a torch. There were many frightening elements, such as a monster appearing suddenly, a doll, creepy music, and evil laughter (Fig. 4f).

  • Outdoor Sadness (OS). The VE portrayed a desolate and polluted city that had suffered a nuclear disaster, with no signs of human activity present (Fig. 4g).

  • Indoor Sadness (IS). The VE represented a hospital waiting room rendered in dark grey tones, occupied by individuals in physical distress, visibly and audibly in pain (Fig. 4h).

  • Outdoor Anger (OA). The VE consisted of an inescapable maze with dead ends and openings that circled back to the starting point. To further intensify the emotional experience, a timer was implemented to evoke feelings of anxiety and frustration at the inability to reach the way out (Fig. 4i).

  • Indoor Anger (IA). A school on fire was simulated, with no visible human presence except for the distant sounds of ambulance sirens and desperate screams. A timer was added to emphasize the difficulty of reaching the end before being exposed to danger (Fig. 4j).

Fig. 4 Screenshots of the selected VEs. a Outdoor happiness OH. b Indoor happiness IH. c Outdoor disgust OD. d Indoor disgust ID. e Outdoor fear OF. f Indoor fear IF. g Outdoor sadness OS. h Indoor sadness IS. i Outdoor anger OA. j Indoor anger IA

3.3 EEG signal processing

This study was conducted using the Emotiv EPOC X EEG headset composed of 14 saline electrodes positioned according to the International 10–20 System [91]. The sensors are placed in AF3, AF4, F3, F4, F7, F8, FC5, FC6, P7, P8, T7, T8, O1, O2 and two additional common mode sense (CMS) and driven right leg (DRL) reference channels at P3 and P4 (Fig. 5). An even number means the electrodes are in the right brain hemisphere, and an odd number indicates a placement in the left hemisphere.

Fig. 5 International 10–20 EEG Electrode Placement System. Blue electrodes denote the left hemisphere, green the right hemisphere, and yellow designates reference channels (color figure online)

The EEG activity was recorded for 60 s at a sampling rate of 128 Hz. We used the fast Fourier transform (FFT) and a bandpass filter to obtain the band powers in \(\mu V^2 /Hz\). Before performing the FFT, the data were high-pass filtered to reduce noise. For data analysis and feature extraction, we used a 2-s Hanning window (256 EEG samples), slid by 16 samples to create each new window. Then, the squared magnitude of the complex FFT values was averaged in each frequency band (θ: 4–8 Hz, α: 8–12 Hz, β: 12–25 Hz, γ: 25–45 Hz), and only "artifact-free" signals were considered during the experimental phase.
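The windowed band-power computation described above can be sketched with NumPy as follows. This is a minimal sketch rather than the processing pipeline used in the study: the function name `band_powers` is hypothetical, removing the window mean is only a crude stand-in for the high-pass filter, the band edges are treated as half-open intervals by assumption, and no scaling to \(\mu V^2 /Hz\) is applied.

```python
import numpy as np

FS = 128    # sampling rate in Hz (from the text)
WIN = 256   # 2-s Hanning window
STEP = 16   # hop between consecutive windows, in samples
BANDS = {"theta": (4, 8), "alpha": (8, 12), "beta": (12, 25), "gamma": (25, 45)}

def band_powers(signal, fs=FS, win=WIN, step=STEP, bands=BANDS):
    """Return {band: array with the mean squared FFT magnitude per window}."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(win)
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    out = {name: [] for name in bands}
    for start in range(0, len(signal) - win + 1, step):
        segment = signal[start:start + win]
        # Remove the DC offset (assumed stand-in for high-pass filtering), then taper.
        segment = (segment - segment.mean()) * window
        power = np.abs(np.fft.rfft(segment)) ** 2
        for name, (lo, hi) in bands.items():
            mask = (freqs >= lo) & (freqs < hi)  # assumption: half-open band edges
            out[name].append(power[mask].mean())
    return {name: np.array(values) for name, values in out.items()}
```

With a 60-s recording at 128 Hz (7680 samples), this sliding scheme yields 465 overlapping windows per channel, so each band is described by a time series rather than a single value.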

Similar to previous studies, this methodology adopted a within-subject design experiment, meaning that independent acquisitions are performed for each subject. Then, the EEG measurement was divided into two phases: the EEG baseline and the EEG activity while carrying out the task. The analysis considered the difference between these two recordings to assess the contribution of the VE stimuli to the emotional state [23, 92].
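As an illustration of the baseline comparison, the following sketch subtracts a baseline level from the band power measured during a VE visit. The text only states that the difference between the two recordings was considered; summarizing the baseline by its mean, as well as the function name `baseline_corrected`, are assumptions made for this example.

```python
def baseline_corrected(task_power, baseline_power):
    """Subtract the mean baseline power from every task window, band by band."""
    corrected = {}
    for band, values in task_power.items():
        # Assumption: the baseline recording is summarized by its mean power.
        base = sum(baseline_power[band]) / len(baseline_power[band])
        corrected[band] = [value - base for value in values]
    return corrected
```

Positive values then indicate band power above the resting level for that participant, which keeps the comparison within-subject.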

3.4 EEG-based affective indicators

Two affective indicators, namely Valence and Arousal, were calculated from EEG waves, relying on formulas available in previous literature studies.

3.4.1 Valence

Valence is known to be an indicator of the pleasantness or unpleasantness of the perceived stimulus; the higher it is, the more positive the emotion; otherwise, the emotion is considered negative. Researchers have found that, according to the International 10–20 System [91], the most analyzed EEG electrodes for the definition of Valence are in positions F3 and F4 (frontal lobe). They are both located in the prefrontal cortex, which is known to play a central role in emotion regulation [93]. In addition, the main activity of α has been registered on F3 and F4 [94]. From these observations, valence can be estimated as an indicator of motivational direction by computing and comparing the hemispherical activation between the logarithmic band power of α and β ratio in F3 and F4 (Table 1, Formula 1) [51, 95, 96].

Table 1 EEG-based affective indicators

Other studies focused on frontal asymmetry as an expression of emotional states [53] and detected activity mainly in the α band. In particular, the change in the α power corresponds to a relative right frontal activation for negative emotions (fear, disgust, and sadness). In contrast, relatively greater activation of the left frontal area is associated with positive emotions [47]. Accordingly, Valence (Table 1, Formula 2) has also been computed by subtracting the natural logarithm of the left hemisphere α power (αF3) from the natural logarithm of the right hemisphere α power (αF4) [23, 55, 94, 97].
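The frontal-asymmetry definition of Valence described in this paragraph, ln(αF4) − ln(αF3), can be written as a short function. The function name is hypothetical and the inputs are assumed to be strictly positive α band powers at F3 and F4.

```python
import math

def valence_frontal_asymmetry(alpha_f3, alpha_f4):
    """Valence as ln(alpha power at F4) minus ln(alpha power at F3)."""
    if alpha_f3 <= 0 or alpha_f4 <= 0:
        raise ValueError("band powers must be strictly positive")
    return math.log(alpha_f4) - math.log(alpha_f3)
```

Since α power is generally read as an inverse marker of cortical activation, a positive value here would point to relatively greater left frontal activation, which the cited literature associates with positive emotions; equal α power on both sides gives a value of zero.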

In the literature, another Valence formula has been proposed, which quantifies Valence as the difference between the β/α power ratios measured at the prefrontal and frontal electrodes AF3, AF4, F3, and F4 (Table 1, Formula 3) [55].

3.4.2 Arousal

Arousal expresses the intensity of the emotion. It can also be described as the psychological and physiological state of being proactive (activation) or reactive (deactivation) to some stimuli. Since α waves are associated with states of relaxation and β waves with states of alertness, researchers have established that these waves are correlated through an inverse relationship. In addition, a link was found between brain inactivation and α activity. This suggests that the ratio β/α can express a person's arousal state. In Formula 4 listed in Table 1, Arousal is expressed as the β/α ratio over electrodes AF3, AF4, F3, and F4 [97]. Likewise, Arousal (Table 1, Formula 5) can also be expressed in terms of the power measured in F3 and F4 [23, 95]. A summary of the most relevant formulas found in the literature is reported in Table 1.
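A β/α arousal index over a set of electrodes could be sketched as below. The exact expressions are those given in Table 1; this example assumes one common form, in which the β powers and the α powers are each summed over the chosen electrodes before taking the ratio. The function name and the dictionary-based inputs are hypothetical.

```python
def beta_alpha_ratio(alpha, beta, channels):
    """Summed beta power divided by summed alpha power over the given channels."""
    # Assumption: powers are summed across electrodes before the ratio is taken.
    alpha_sum = sum(alpha[ch] for ch in channels)
    beta_sum = sum(beta[ch] for ch in channels)
    if alpha_sum <= 0:
        raise ValueError("summed alpha power must be strictly positive")
    return beta_sum / alpha_sum
```

Passing `channels=["F3", "F4"]` or `channels=["AF3", "AF4", "F3", "F4"]` switches between the two electrode sets mentioned in the text without changing the rest of the computation.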

Given the prevalence of Formulas 2 and 5 in the existing literature, these formulas were selected as the preferred method for this study.

3.5 The classifiers

For the classification task, gradient boosted decision tree (GBDT) and random forest (RF) were adopted.

Gradient boosting [98] is an implementation of ensemble learning, where the predictions of more than one model are involved [99]; it is a widespread supervised approach in multiclass classification tasks, and it is suitable for missing data [100], which is relevant here because some EEG data were lost after the removal of artifacts. GBDT sequentially adds classifiers aimed at correcting the predictions made by the previous classifiers and outputs a weighted average of the predictions. To correct the previous predictions, at each training step, correctly classified observations have a lower impact than misclassified ones. GBDT implements a series of decision trees that are considered weak classifiers; their predictions are combined by voting or averaging, and the final output is weighted by the contribution of each model based on its performance. For each tree, nodes are added to optimize a nonlinear objective, in this case the squared error. The extreme gradient boosting (XGBoost) implementation of GBDT was applied [101]. Previous literature adopted supervised ensemble machine learning for sentiment classification with encouraging results [2, 102, 103]. In particular, gradient boosting is suitable when dealing with missing values and outliers and when numerous environmental variables are present [104]. Recent works using VEs as a means of elicitation adopted gradient boosting to classify the evoked emotional state, showing its feasibility for the classification of EEG data [105, 106]. In fact, the acquisition and cleaning of brain activity data suffer from artifacts due to movement and environmental factors, eventually leading to missing data; XGBoost offers some extensions to GBDT, such as sparsity awareness, which allows handling missing values without prior imputation [101, 107].
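The sequential correction at the core of gradient boosting can be illustrated with a deliberately simplified sketch: one-split trees (stumps) on a single feature, fitted to the residuals of the current ensemble under the squared error. This is not XGBoost and omits its second-order terms, regularization, and sparsity awareness; all function names are hypothetical, and the feature is assumed to take at least two distinct values.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the threshold on a 1-D feature that best fits the residuals."""
    best = None
    for thr in np.unique(x)[:-1]:  # every candidate split leaves both sides non-empty
        left, right = residual[x <= thr], residual[x > thr]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, thr, left.mean(), right.mean())
    _, thr, left_value, right_value = best
    return thr, left_value, right_value

def predict_stump(stump, x):
    thr, left_value, right_value = stump
    return np.where(x <= thr, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Sequentially add stumps, each fitted to the residuals of the ensemble so far."""
    base = y.mean()
    pred = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of the squared error, up to a constant
        stump = fit_stump(x, residual)
        pred += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, pred
```

Each round only sees what the previous rounds got wrong, so samples that are already well predicted contribute small residuals and have little influence on the next stump.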

The XGBoost Python package (xgboost, version 1.7.4) was used in this study. XGBoost works by minimizing an objective function that combines a training loss and a regularization term; the former measures how well the model fits the training data, while the latter prevents overfitting. The objective function can be represented by (6):

$$Objective\left( T \right) = \sum_{i = 1}^{n} l\left( y_i , y_{pred,i} \right) + \sum_{k = 1}^{K} \Omega \left( f_k \right),$$
(6)

where \(n\) is the number of training samples, \(y_i\) is the true label and \(y_{pred,i}\) the prediction for the i-th sample, \(l\left( y_i , y_{pred,i} \right)\) is the loss function, \(K\) is the number of trees of the ensemble, and \(\Omega \left( f_k \right)\) is the regularization term of the k-th tree. The regularization term is defined as follows (7):

$$\Omega \left( f_k \right) = \gamma T + \frac{1}{2}\lambda \sum_{j = 1}^{T} \omega_j^2 .$$
(7)

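In (7), \(T\) is the number of leaves of the tree, \(\omega_j\) is the weight of the j-th leaf, and \(\gamma\) and \(\lambda\) are regularization coefficients. The Hessian is used to manage the nonlinearity of the objective since the second order derivative allows a more precise approximation of the direction of the maximum decrease in the loss function. A schematization of the XGBoost classifier is reported in Fig. 6.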
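Equations (6) and (7) can be evaluated by hand on a toy example. The sketch below reads \(T\) as the number of leaves and \(\omega_j\) as the leaf weights, as in the standard XGBoost formulation, and uses the squared error as the loss; the function names and the representation of a tree as a plain list of leaf weights are assumptions made for illustration.

```python
def regularization(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * sum of squared leaf weights."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)

def objective(y_true, y_pred, trees, gamma=1.0, lam=1.0):
    """Training loss (squared error here) plus the regularization of every tree."""
    loss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    penalty = sum(regularization(weights, gamma, lam) for weights in trees)
    return loss + penalty
```

For a single tree with leaf weights 1 and −2, \(\gamma = 0.5\) and \(\lambda = 2\), the penalty is 0.5·2 + 0.5·2·(1 + 4) = 6, so larger trees and larger leaf weights both increase the objective even when the training loss is unchanged.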

Fig. 6 Schematic representation of the XGBoost classifier

A second classification on the same dataset was performed using the random forest algorithm. A different classification approach was adopted to generalize the results obtained in the experiment and to reduce the risk of biased results due to the use of a single specific classifier. RF was chosen for its ability to efficiently handle missing data through imputation, which makes this algorithm generally robust [108]. Like XGBoost, RF is an ensemble learning algorithm built upon a multitude of trees, where each tree outputs the best solution to fit the problem. The algorithm seeks the characteristics that enable the input observations to be randomly divided into groups with the greatest difference between them and the least difference within each group. The randomness of the splits is a critical point, as it provides a low correlation between the trees. The mathematical representation can be expressed as in (8):

$$Y_{pred} = aggregation\left( f_1 \left( X \right), \ldots , f_k \left( X \right) \right),$$
(8)

where \(f_k \left( X \right)\) is the prediction made by the kth decision tree and the aggregation function is here represented by the majority of votes (as defined for the classification problems).
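The aggregation step in (8) can be sketched as a plain majority vote over the class labels returned by the individual trees. The function name and the list-of-lists input are hypothetical, and ties are resolved in favor of the label encountered first, which is an arbitrary choice of this example.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Aggregate per-tree class labels for each sample by majority vote."""
    n_samples = len(tree_predictions[0])
    aggregated = []
    for i in range(n_samples):
        votes = Counter(tree[i] for tree in tree_predictions)
        aggregated.append(votes.most_common(1)[0][0])
    return aggregated
```

Note that scikit-learn's random forest classifier combines its trees by averaging their predicted class probabilities rather than by counting hard votes, so its output may differ from this sketch when individual trees are uncertain.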

The RF Python implementation (scikit-learn package version 1.1.3) was used. A schematization of the random forest classifier is reported in Fig. 7.

Fig. 7 Schematic representation of the random forest classifier

4 Results and discussion

After the experiment, the EEG data were processed and the artifacts found were removed, leading to the exclusion of 3 subjects due to detrimental noise or missing data. A total of 27 artifact-free datasets were considered. Then, Valence and Arousal were computed from the EEG data for each participant's experience. The median was analyzed to determine the central tendency of the dataset. In addition, measuring the band power of α, β, θ, and γ at each sensor's location made it possible to determine the most active brain areas for each brain wave. Furthermore, data from each participant were analyzed by means of two different machine learning algorithms, namely the extreme gradient boosted decision tree and the random forest.

4.1 Results on EEG waves and affective indicators

By analyzing the resultant EEG activity in each VE, the perceived emotion can be characterized in terms of Valence and Arousal, obtained using Formulas 2 and 5. Figure 8 provides an instance of an individual's EEG activity with respect to the dimensions of Valence and Arousal, as observed during their interaction with the IH and OF VEs. The plots on the left-hand side correspond to the participant's Valence and Arousal during the IH scenario, while those on the right-hand side correspond to the OF scenario. In particular, at approximately 2 s into the OF scenario, the participant experienced a sharp increase in Arousal coupled with a simultaneous decrease in Valence. In this sample, the IH scenario also elicited a higher Valence than the OF scenario, which was predominantly characterized by negative Valence.

Fig. 8 15-s window sample of one participant’s resulting activity during the interaction with the IH and OF VEs, in terms of Valence and Arousal

The overall activation response in terms of band power for the ten VEs is presented in Fig. 9. For each VE, the median band power values of the θ, α, β, and γ frequencies are displayed. The data reflect the EEG band power activation responses of 27 participants, providing insights into the variations in band power activation across the different VEs and frequency bands.
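The aggregation behind such a summary can be sketched as follows: for one VE, the median of each band's power is taken across participants. The numeric values, the dictionary layout, and the name `median_band_power` are hypothetical and serve only to illustrate the computation; they are not the study's data or code.

```python
from statistics import median

# Hypothetical band power values (µV^2/Hz) for one VE, one record per participant
participants = [
    {"theta": 12.0, "alpha": 6.0, "beta": 4.0, "gamma": 1.5},
    {"theta": 15.0, "alpha": 5.0, "beta": 3.0, "gamma": 2.5},
    {"theta": 9.0, "alpha": 7.0, "beta": 5.0, "gamma": 2.0},
]


def median_band_power(records):
    """Return the per-band median across all participant records."""
    bands = records[0].keys()
    return {band: median(r[band] for r in records) for band in bands}


medians = median_band_power(participants)
for band, value in medians.items():
    print(f"{band}: {value}")
```

The median is used here, as in the text, because it is less sensitive than the mean to a single participant with unusually high or low band power.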

Fig. 9 EEG band power medians \((\mu V^2 /Hz)\) in the ten VEs analyzed across 27 participants

The results showed a strong activation of θ rhythms in all the VEs, which supports the recent findings about the ability of VR to boost θ frequencies [58]. In addition, one of the most effective VEs in terms of neural stimuli activation was the Indoor Fear, as it produced the highest γ and β activity, reflecting alertness and a high level of focus.

For all the scenarios, the highest measured band power in all locations was θ. In terms of brain area activation, the frontal lobe was found to be the most active across all scenarios, specifically in the θ, α, and β frequency bands, while for the γ frequency band, the activation was observed in either the frontal or temporal lobes (OH, IH, ID, IF). These results were expected since these areas of the brain have been particularly linked with emotion perception [61].

The levels of β increased noticeably, mostly in temporal and parietal lobes, along with the occipital one. This frequency band is commonly associated with alertness and concentration [55, 109] and is predominantly active in the frontal area, while also being present in the temporal, parietal, and occipital lobes. These lobes' primary functions are processing auditory and visual stimuli and memory formation [61].

Due to the association between θ and daydreaming mental states [109, 110], the high activation of θ in all the scenarios suggests that participants were mindful while navigating imaginary worlds. While our study primarily focused on the α, β, and γ frequency bands for EEG-based affective indicators, it is crucial to acknowledge the significance of θ frequency in understanding emotional experiences. It is worth noting that θ frequency has also been linked to states of emotional regulation, playing a crucial role in the modulation of emotional responses, particularly in contexts where relaxation and emotional regulation are essential [111, 112]. The observations of θ frequency in our EEG data suggest that participants in our study were in a condition conducive to emotional regulation.

Although the established EEG-based affective indicators defined in Table 1 do not use θ frequency as a direct input, our data analysis was extended to compute the θ band power for all the VEs, in recognition of the significant implications of θ frequency. This approach provided a holistic view of EEG activity in the θ range and its variations in response to different emotional stimuli throughout the participants' interactions with the VEs.

In detail, θ was the highest at frontal sensors, corresponding to increasing cognitive attention in the frontal area [8, 55]. The strong activation of θ frequency in our study is consistent with the recent findings of Safaryan and Mehta [58] regarding the ability of VR to amplify θ rhythm. These results are promising since the enhancement of θ through dedicated VEs could support the studies on deep relaxation, memory and emotion regulation, as VR can be used to boost and benefit individuals’ neural dynamics.

Concerning β activations, the results show that the participants were actively engaged in their task, attentively processing sensory information; in fact, β is associated with alertness and concentration [55]. The results obtained from the SAM questionnaires and the EEG outcomes were found to be consistent with the evaluated environments. The SAM questionnaires showed a high intensity of emotions in response to stimuli that were designed to elicit specific emotional states. The EEG outcomes demonstrated significant changes in the brain activity patterns that are known to be associated with various emotional states. The most stimulating VE was Fear (IF), reflecting a high activity in β and γ rhythms and positive Valence. On the contrary, Anger (IA) produced the lowest γ and β activity of all the VEs, possibly because of the uniform and uninterrupted stimuli it provided. As expected, the most active brain lobe for all the VEs was the frontal one, including the prefrontal cortex, since the frontal lobe is the most involved in emotion perception. The sensors' activation highlighted that the highest measured band powers were those of θ and β, which means that increased cognitive attention, processing, and elaboration of sensory stimuli were achieved with the VR.

The main findings of this study in terms of wave behaviors that are supported by previous studies in the literature are summarized in Table 2.

Table 2 Main literature-confirmed findings of this study

4.2 Classification results

The analysis of the EEG data was conducted on disaggregated data using the XGBoost and RF classifiers. Valence and Arousal EEG indices were calculated for each participant throughout the entire duration of the VE navigation; thus, a sample vector was obtained for each of the participants' experiences. In detail, 134 samples were considered, each with a length of 537 elements. SAM ratings were used to label the data. As SAMs were rated on a 9-point scale, three ranges were defined corresponding to three classes of activation for both Arousal and Valence: ratings from 1 to 3 were labeled as 'low', ratings from 4 to 6 as 'neutral', and ratings from 7 to 9 as 'high'. To evaluate the classification results, the f1-score was adopted. Concerning the Arousal, the f1-scores obtained with the XGBoost classifier on the test dataset (30% of the complete dataset) were 0.77, 0.94, and 0.97 for class 'low', 'neutral', and 'high', respectively. Due to imbalance in the class observations, instances of class 'low' in the test dataset were half those of the other two classes. Concerning the Valence, the f1-scores on the test dataset were 0.97, 1.00, and 0.93 for class 'low', 'neutral', and 'high', respectively. In this case, the class 'high' had the lowest number of instances, as most of the participants rated their experiences in the 'low' and 'neutral' ranges. Figure 10 reports the confusion matrices from which precision and recall were computed for Arousal and Valence, giving a tabular view of the XGBoost classifier's performance.
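The labeling rule described above can be sketched as a simple mapping from a 9-point SAM rating to one of the three classes. The function name `sam_to_class`, the integer encoding in `CLASS_INDEX` (0 for 'low', 1 for 'neutral', 2 for 'high', following the figure labels), and the example ratings are assumptions made for illustration; the study's actual labeling code is not shown in the text.

```python
# Integer encoding assumed to match the confusion-matrix labels (0, 1, 2)
CLASS_INDEX = {"low": 0, "neutral": 1, "high": 2}


def sam_to_class(rating):
    """Map a SAM rating on the 1-9 scale to 'low', 'neutral', or 'high'."""
    if not 1 <= rating <= 9:
        raise ValueError(f"SAM rating must be between 1 and 9, got {rating}")
    if rating <= 3:
        return "low"
    if rating <= 6:
        return "neutral"
    return "high"


# Hypothetical Arousal ratings given by participants after four VEs
ratings = [2, 5, 9, 4]
labels = [sam_to_class(r) for r in ratings]
encoded = [CLASS_INDEX[label] for label in labels]
print(labels, encoded)
```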

Fig. 10 Confusion matrix of XGBoost classification computed on the test dataset for Arousal (left) and Valence (right). Label 0 stands for 'low', label 1 stands for 'neutral', and label 2 stands for 'high'

The classification results obtained with the RF algorithm were consistent with those obtained with the XGBoost for the Arousal, while XGBoost performed better for the Valence. In detail, concerning the Arousal, the f1-scores on the test dataset were 0.77, 0.94 and 0.97 for class 'low', 'neutral', and 'high', respectively. Concerning the Valence, the f1-scores were 0.91, 0.95 and 0.86. As done for the XGBoost, a tabular visualization of the classification results is provided; thus, Fig. 11 reports the confusion matrices for Arousal and Valence obtained with the RF classifier.

Fig. 11 Confusion matrix of RF classification computed on the test dataset for Arousal (left) and Valence (right). Label 0 stands for 'low', label 1 stands for 'neutral', and label 2 stands for 'high'

In Table 3, precision, recall and f1-score are reported for the two adopted classifiers. In the case of the Arousal, the two algorithms attained nearly identical performances. For Valence, the f1-score of the ‘neutral’ class was similar for the two classifiers, with XGBoost scoring slightly higher than RF (1.00 and 0.95, respectively); for the ‘low’ and ‘high’ classes, however, XGBoost yielded clearly better outcomes. The RF results remain satisfactory, with an overall accuracy of 91%, consistent with the literature [113, 114]. Lower accuracies were obtained in previous studies in which RF was adopted to classify mental states; the previously mentioned work by Edla et al. [79] obtained accuracies of around 75%, while in this work the lowest result was an f1-score of 0.77 on ‘low’ Arousal. High accuracies were obtained previously with the adoption of deep learning for feature extraction; as an example, the work by Kumar et al. [80] reached accuracies of over 90%, while the work of Ramzan et al. [71] reached 98% accuracy specifically on Valence. This last result is higher than the accuracy obtained in this study; however, using deep learning algorithms for feature extraction entails higher computational costs than adopting traditional machine learning with Arousal and Valence as input features. This consideration is supported by the fact that the training time for RF in this study was below approximately two minutes; with RF, the most time-consuming phases were the labeling and the signal processing, but these did not represent a limit in terms of hardware-related computational costs. For the XGBoost, the training time was below five minutes; the identification of the best hyper-parameter ranges to search in was the most time-consuming phase, taking approximately 40 min.

On the other hand, XGBoost obtained 97% accuracy in this study, slightly lower than the 98% accuracy reached by Ramzan et al. [71]. The study by Liu et al. [83] reached accuracies slightly lower than 95% on Valence and Arousal from the DEAP dataset. It is remarkable that XGBoost fed with the 537-element Valence and Arousal vectors described in this study obtained accuracies of 93% and 97%, respectively. In more detail, the highest values were obtained for ‘high’ and ‘neutral’ Arousal, and ‘low’, ‘neutral’, and ‘high’ Valence. Moreover, in the current study, class-specific results are reported in terms of f1-score, allowing a more objective evaluation of the classification performances. Finally, it is important to stress that most previous work has used datasets in which the elicitation means differ significantly from the VEs of the current study.

Table 3 Precision, recall and f1-score of the two classifiers for each of the three classes (‘low’, ‘neutral’ and ‘high’) for Arousal and Valence indicators
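For reference, per-class precision, recall and f1-score can be derived from a confusion matrix as in the minimal sketch below. The matrix values are hypothetical and do not reproduce the study's results; the row/column convention (rows as true classes, columns as predicted classes) and the name `per_class_metrics` are assumptions made for this illustration.

```python
def per_class_metrics(cm):
    """Compute precision, recall and f1-score for each class.

    cm[i][j] is assumed to hold the number of samples whose true class
    is i and whose predicted class is j.
    """
    n = len(cm)
    metrics = []
    for k in range(n):
        true_positives = cm[k][k]
        predicted_as_k = sum(cm[i][k] for i in range(n))  # column sum
        actually_k = sum(cm[k])                           # row sum
        precision = true_positives / predicted_as_k if predicted_as_k else 0.0
        recall = true_positives / actually_k if actually_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


# Hypothetical 3-class confusion matrix (0 = 'low', 1 = 'neutral', 2 = 'high')
confusion = [
    [4, 1, 0],
    [0, 8, 0],
    [1, 0, 7],
]
results = per_class_metrics(confusion)
for label, m in zip(("low", "neutral", "high"), results):
    print(f"{label}: P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
```

Because the f1-score is the harmonic mean of precision and recall, it penalizes a class for both false positives and false negatives, which makes it more informative than overall accuracy when the classes are imbalanced, as they are here.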

The comparison between the outcomes of the XGBoost and RF classifiers and the SAM ratings shows consistency between self-assessments and EEG responses. This provides evidence that EEG can effectively characterize the Valence and Arousal dimensions, as the Arousal and Valence affective indicators reflect the SAM ratings given by the participants. In fact, the notable results obtained in the classification task, particularly with XGBoost, which exceeded 90% accuracy for both indicators, suggest a consistency between the labels extrapolated from the SAMs and the EEG records. The coherence observed between questionnaires and EEG implies that EEG-based emotional indicators could be considered as a complementary tool for emotional assessment, at least for Valence and Arousal quantification. Nonetheless, the preliminary nature of this research suggests that further studies are needed to explore and detail this possibility.

5 Conclusions

Emotion recognition is becoming crucial for developing machines that can be integrated into society for its benefit. In fact, it could be helpful in education to observe students' mental states toward teaching materials, in medicine to allow doctors to assess their patients' mental states and provide better treatments for their psychological conditions, or more generally to understand a worker's cognitive state during ordinary and stressful situations. In addition to self-reports, it is possible to examine complementary techniques for detecting and classifying emotions. In this context, this study aimed to explore the interpretation of emotional states using EEG. To do this, an experiment was conducted to investigate the emotional response of participants interacting with validated affective VEs, employing a portable EEG headset to record the subjects' data during the experiment. The analysis was conducted on the aggregated data for the VEs and, separately, on the affective indicators computed from the EEG acquisition of each participant.

The main findings of this study show that a coherence exists between the EEG affective indicators of Valence/Arousal and the respective values in the SAM-based self-assessment. In particular, the noteworthy results of the XGBoost and RF classifiers show that these EEG indicators are reliable, at least for experiments in which navigation in VEs is involved. The strengths of this study lie both in the adoption of validated VEs for eliciting basic emotions and in the self-reporting-based labeling of data; these conditions guarantee a desired (and diverse) level of Valence and Arousal, and an appropriate consistency between EEG emotional indicators and what the participants really felt. Future analyses are needed to evaluate the efficacy of the affective indicators and their applicability in a broader sense. Also, while this study has examined five basic emotions (happiness, disgust, fear, sadness, anger) and Valence/Arousal EEG affective indicators, further exploration of additional emotions, including surprise and stress, and of other EEG-based indicators, such as Stress and Dominance, remains a topic for future research.

This research underscores the need for a more comprehensive understanding of human emotions, utilizing the potential of VR settings. By accurately assessing emotional states, researchers and practitioners can offer tailored solutions in different domains. This, in turn, paves the way for more emotionally aware technology and applications that can enhance individuals' quality of life. In the context of disorder recognition, our research could have implications for early diagnosis and monitoring of emotional disorders. By continuously monitoring emotional states, we may be able to detect subtle changes in emotions that could signal the onset of a mood disorder, allowing for timely intervention and support.