In this section, we outline the methodologies and techniques used in our research on emotion classification using feature-selection and machine-learning algorithms. This section is divided into three main subsections corresponding to the distinct experimental approaches applied in the study.
Figure 1 describes our proposed experiment for the effective classification of the EEG brainwave datasets. We applied three different feature selection methods to obtain effective performance. First, SelectKBest with the ANOVA F-test was applied to calculate the correlation between the target variable and the features, and the features with the highest correlation were selected. Second, we applied LASSO for feature selection to remove multicollinearity between features through L1 regularization. Lastly, we employed a GA for wrapper-based feature selection to select the best feature subset for the classification of the EEG dataset. The classification models were constructed using RF, logistic regression, XGBoost, and SVM. We tuned the hyperparameters of the machine learning models using Bayesian optimization (BO), which has been widely used for hyperparameter tuning in machine learning [29]. Classification metrics were used to assess the performance of the models. For the DEAP dataset, we used only accuracy and F1 score as performance metrics. We employed 5-fold cross-validation to evaluate the performance of the machine learning models. All feature selection methods and machine learning models were trained and tested on a system with an MSI Intel® Core™ i7-7700HQ CPU @ 2.80 GHz and 16 GB of RAM (MSI Global, New Taipei City, Taiwan).
3.1. Filter-Based Feature Selection Using SelectKBest with ANOVA F-Test
SelectKBest with the ANOVA F-test, categorized as a filter method, evaluates the relevance of features based on statistical methods. Specifically, it employs the ANOVA F-test, a univariate measure that quantifies the significance of each feature with respect to the target variable. The goal is to select the ‘k’ best features, where ‘k’ is a user-defined parameter [30].
The ANOVA F-test function is utilized to compute the ANOVA F-statistic between each feature and the target variable. This statistic measures the degree of linear dependency between the feature and the target, enabling the identification of features most likely to be informative for classification. The formula for the F-statistic is as follows:

F = \frac{SSB / (k - 1)}{SSW / (n - k)}

where the sum of squares between (SSB) measures the variance between classes, the sum of squares within (SSW) measures the variance within each group, k is the number of classes, and n is the total number of data points [30]. SelectKBest is a feature selection algorithm, and when combined with the ANOVA F-test, it becomes SelectKBest with the ANOVA F-test. The primary purpose of this method is to evaluate the significance of individual features with respect to a target variable. It operates as a filter method, meaning it ranks features based on statistical measures without involving the learning algorithm. The ANOVA F-test, in this context, is a statistical test that assesses whether the means of different groups are equal. In feature selection, it helps quantify the relationship between each feature and the target variable. The methodology employed in this study was designed to explore and evaluate the effectiveness of various feature selection and machine learning methods for emotion classification using EEG brainwave data.
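As a minimal sketch of this step (using a synthetic matrix in place of the EEG features, which are not reproduced here), the filter can be implemented with scikit-learn’s SelectKBest and f_classif:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the EEG feature matrix and emotion labels.
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=10, random_state=0)

# Rank features by their ANOVA F-statistic and keep the k highest-scoring ones.
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)              # (200, 10)
print(selector.get_support().sum())  # 10 features retained
```

The selected column indices are available via `selector.get_support(indices=True)`, which is useful for mapping retained features back to EEG channels or frequency bands.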
3.2. Embedded-Based Feature Selection Using LASSO
LASSO, categorized as an embedded-based method, incorporates feature selection as an integral part of the model training process. This approach is particularly effective in handling high-dimensional datasets [
31]. Its primary objective is to add a penalty term to the loss function to encourage sparsity in the model. This is achieved by minimizing the following objective function:
\min_{\beta} \; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \mathbf{x}_i^{\top} \beta \right)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

where y_i represents the observed output for the i-th instance, \mathbf{x}_i denotes the input feature vector, and \beta is the vector of coefficients to be estimated. The first term, \frac{1}{2n} \sum_{i=1}^{n} ( y_i - \mathbf{x}_i^{\top} \beta )^2, represents the ordinary least squares loss, which aims to minimize the squared differences between the predicted and observed values. The second term, \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert, introduces the LASSO penalty, where \lambda is a hyperparameter controlling the strength of regularization. The key innovation of LASSO regularization lies in the regularization term \lVert \beta \rVert_1, which enforces sparsity by penalizing the absolute values of the coefficients [32].
This drives some coefficients to become exactly zero. This is beneficial because it eliminates less important predictors, thereby simplifying the model and enhancing interpretability. The zeroed-out coefficients correspond to features that do not significantly contribute to the model’s predictive power. By excluding these features, LASSO reduces model complexity and helps prevent overfitting, especially in scenarios with high-dimensional data. Thus, the presence of zero coefficients is crucial to achieving an effective and robust predictive model. Consequently, LASSO not only aids in fitting the model to the data but also serves as a valuable tool for identifying and emphasizing the most relevant features. The objective of LASSO is to minimize the mean squared error between the predicted and actual values while imposing a penalty on the absolute values of the model coefficients [33]. LASSO can help select the most relevant EEG features by pushing some feature coefficients to zero, effectively performing feature selection. This is crucial for optimizing the performance of emotion classification models based on EEG data, as it reduces overfitting and enhances the interpretability of the model [34]. It is particularly useful when dealing with high-dimensional data, as it helps create more parsimonious models that are easier to interpret and potentially more efficient. Owing to the absolute value, the LASSO penalty term is nondifferentiable, but methods nevertheless exist to minimize it. LASSO is also robust to outliers; it can effectively handle noisy data by eliminating less important features and preventing the inclusion of irrelevant features [35]. Additionally, it addresses multicollinearity, a strong correlation between features that affect the label concurrently, by tending to select only one feature from a group of highly correlated features [36].
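The sparsity-inducing behavior described above can be illustrated with scikit-learn’s Lasso on synthetic data (a sketch; the alpha value here is arbitrary, not the tuned value from our experiments):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 50 features, only 5 of which are truly informative.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)

# The L1 penalty (alpha) drives uninformative coefficients exactly to zero.
lasso = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # indices of surviving features

print(f"{len(selected)} of {X.shape[1]} features retained")
```

Features whose coefficients survive the penalty form the selected subset; the rest are discarded before classification.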
3.4. Hyperparameters Used for Feature Selection Methods and Machine Learning
Table 1 lists the hyperparameters used for the various feature selection methods in this study, detailing the specific values for SelectKBest, LASSO, and GA. It includes parameters such as scoring function, number of top features, regularization strength, population size, number of generations, crossover rate, and mutation rate.
SelectKBest with an ANOVA F-test played a crucial role in the methodology, aiming to identify the most informative features for emotion classification. This feature selection process involved ranking features based on the ANOVA F-value and selecting the top k features for further modeling. For the EEG Emotion dataset and the DEAP dataset, we applied the SelectKBest method to choose 100, 500, 1000, 1500, and 2000 features and 50, 100, and 150 features, respectively. The varying numbers of selected features were chosen to systematically explore the impact of feature reduction on model performance, ensuring classification accuracy in the experiments for both the Emotion and DEAP datasets. This approach was necessary because filter-based feature selection requires the pre-determination of the number of selected features.
Among the representative embedded-based feature selection methods, we employed LASSO, followed by hyperparameter optimization using BO. We optimized the ‘alpha’ value of LASSO using BO over a search space ranging from 10⁻⁶ to 10¹ with a log-uniform distribution. We used 5-fold cross-validation to find the most predictive and robust ‘alpha’ value.
A basic GA comprises three genetic operators: selection, mutation, and crossover.
In GAs, a solution is typically represented as a binary string, called a chromosome. It is essential to evaluate and select the most effective solutions to a particular problem. Each solution is assigned a fitness value that reflects its proximity to the overall specifications of the desired solution. We used the accuracy of each machine learning model as the fitness value of wrapper-based feature selection using a GA.
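A minimal sketch of such a fitness function follows, assuming synthetic data and RF as the wrapped classifier (any of the four models could be substituted); the function name and constants are illustrative, not taken from our implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the EEG feature matrix and labels.
X, y = make_classification(n_samples=150, n_features=20,
                           n_informative=5, random_state=0)

def fitness(chromosome, X, y):
    """Mean 5-fold CV accuracy of the model trained on the feature
    subset encoded by a binary chromosome (1 = feature kept)."""
    mask = np.asarray(chromosome, dtype=bool)
    if not mask.any():  # empty subsets get the worst possible score
        return 0.0
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(model, X[:, mask], y, cv=5).mean()

chromosome = np.random.default_rng(0).integers(0, 2, size=X.shape[1])
score = fitness(chromosome, X, y)
print(score)
```

The GA then evolves chromosomes toward subsets that maximize this cross-validated accuracy.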
Selection: This operator examines a set of individuals in a population based on their fitness values. It preferentially retains the best individuals but must also give less fit individuals a chance, to avoid premature convergence. We used a GA with a population size of 30, 20 generations, and 5-fold cross-validation.
Mutation: This operation introduces a small perturbation into the chromosome of an individual. We set the mutation probability to start at 0.8 and decay to 0.2, with a decay rate of 0.01.
Crossover: This operator explores the search space by diversifying the population. It typically manipulates the chromosomes of two parents to generate two children. These operations are applied iteratively in the GA, as shown in the flowchart (Figure 2). In our implementation, the crossover rate started at 0.2 and increased exponentially, approaching 0.8 over 20 generations. These parameters were set to balance the exploration and exploitation capabilities of the GA during the feature selection process. Initially, a low crossover rate (0.2) encouraged broader exploration of the solution space, which helps identify diverse and potentially high-quality solutions early in the optimization process. As generations progressed, the crossover rate increased exponentially, reaching 0.8. A higher crossover rate in later generations promotes exploitation, in which the algorithm focuses on refining and combining the best solutions. By starting with a lower crossover rate and increasing it gradually, we ensure that the algorithm does not converge to suboptimal solutions and has a higher chance of finding the optimum. This adaptive strategy helps maintain a good balance between diversity and convergence throughout the evolutionary process.
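One way to realize this schedule is exponential interpolation between the two endpoints; the exact curve is an assumption, since the text states only the start value, end value, and exponential growth:

```python
def crossover_rate(generation, start=0.2, end=0.8, n_generations=20):
    """Exponential schedule from `start` toward `end` across n_generations.

    rate(g) = start * (end/start) ** (g / (n_generations - 1)),
    so generation 0 yields `start` and the final generation yields `end`.
    """
    frac = generation / (n_generations - 1)
    return start * (end / start) ** frac

rates = [crossover_rate(g) for g in range(20)]
print(round(rates[0], 2), round(rates[-1], 2))  # 0.2 0.8
```

The rate rises slowly in early generations (favoring exploration) and steepens toward the end (favoring exploitation), matching the behavior described above.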
Table 2 lists the hyperparameters for the classification algorithms used in the experiment. As introduced above, we applied hyperparameter tuning using BO across several machine learning models, including random forest, logistic regression, XGBoost, and SVM. For RF, the number of trees (n_estimators) is set to an integer value between 50 and 200, the maximum depth of a tree (max_depth) is set to a categorical choice of 10, 20, or 30, and the minimum number of samples required for a split is set to an integer value between 2 and 10. For logistic regression, the penalty parameter is set to a categorical choice of ‘l1’ and ‘l2’, C is set to a real number between 10⁻³ and 10³, and the solver parameter is set to ‘liblinear’. For XGBoost, the number of gradient-boosted trees is set to an integer value between 50 and 200, the learning rate is set to a real number between 10⁻³ and 1, and the maximum tree depth (max_depth) is set to an integer value between 3 and 9. For SVM, the kernel parameter is set to a categorical choice of ‘linear’ and ‘rbf’, and the C parameter is set to a real number between 10⁻³ and 10³. The BO process was iterated over 30 trials for each model to tune the hyperparameters. For the baseline models, we used the default hyperparameter values provided by scikit-learn.