4.1. Experiment One: Sensor FDD Using the Joint LSTM Autoencoder and Classifier Approach
In this section, FDD of sensor faults in hydraulic test rigs is introduced, analyzed, and discussed, using a joint approach in which healthy-signal reconstruction detects sensor faults, followed by fault classification to diagnose the detected faults.
The following subsections elucidate each step of the described approach as applied to sensor faults and showcase their results.
Figure 2 shows the steps included in experiment one, where each step, stated below, is elaborated in comprehensive detail.
The dataset used for the sensor fault detection and diagnosis was the hydraulic test rig dataset described earlier. The mentioned data provided a wide range of component faults that varied from slightly damaged to total failure. However, the dataset did not provide any sensor faults. Thus, it was essential to inject sensor faults to build the sensor FDD model.
Although the sensor FDD architecture investigated in this work was meant for multivariate time-series data, for simplicity, only one sensor was considered to show the results of the sensor FDD process. Sensor PS1 (the first pressure sensor) was used to showcase the sensor FDD results during fault injection, the LSTM autoencoder reconstruction, and the sensor fault classification.
The sensor fault types, their equations, and some of their recent applications are briefly reviewed in [6,37,38,39]. The faults chosen for injection were as follows. (1) Stuck-at—three main types of stuck-at faults were injected, as stuck-at or constant faults are the most common form of data-centric faults, and they reflect the severity of the sensor condition. Moreover, constant faults are extremely easy to inject. Consider the input sensor signal to be x(t); the constant fault can then be injected by following f(t) = c, where c is a constant number representing the stationary condition of the sensor. Three main types of constant faults were added: constant zero, when the sensor was stuck at zero; constant high, when the sensor was stuck at the highest value in the window; and constant low, when it was stuck at the lowest point of the sensor readings during the observed window. We randomly injected 40 windows of size 60 s (because the sensor cycles in the dataset repeated over a duration of 60 s) with a constant-zero fault, and 7210 windows of 60 PS1 readings were injected with high and low constant faults, which brings the overall number of windows injected with stuck-at faults to 7250. (2) Gain faults; and (3) bias or offset faults—these are system-centric faults; hence, it is hard to observe their pattern through observation of the sensor signal alone. Therefore, it is significant to study these faults and build ML approaches to dynamically detect and diagnose them. Furthermore, both faults show a clear pattern that makes these fault types easy to inject into the data. The gain fault is also known as amplification, where the original signal x(t) is amplified by a constant g: f(t) = g · x(t). To inject this fault, an amplification factor randomly selected between 0.3 and 1.3 (equivalent to 30%–130% of the original signal, applied as a gain or multiplication) was chosen each time to regenerate the magnified faulty signal. A total of 7210 samples of 60 PS1 sensor readings were injected with randomly chosen gain values. The bias or offset fault is another example of a calibration, system-centric fault, where the original signal is shifted by a constant value. Consider the original signal to be x(t); the manipulated offset signal is then f(t) = x(t) + b, where b is the constant number representing the bias or offset added to the signal. The value of b can be too small to notice or observe, or too large to ignore. As a result, it is essential to inject both cases of b. To achieve this, 3480 windows of size 60 were injected with a random bias between 0.1 and 1 to represent the very small bias category, while the remaining 3730 windows of size 60 were injected with comparatively larger biases randomly chosen between 1.1 and 50 (percentages are not meaningful here, because the bias is an additive value rather than a multiplicative percentage of the original signal, as in the gain faults). Finally, the overall PS1 sensor data prepared after the fault injection process comprised windows of 60 readings as follows: (1) 7210 fully efficient windows as examples of healthy windows; (2) 7250 windows with constant faults (zero, high, and low); (3) 7210 windows with gain faults; and (4) 7210 windows with bias faults (low and high bias).
4.1.1. LSTM Autoencoder for Sensor Signal Reconstruction
To address the problem under investigation, the desired neural network should be able to perform sequence-to-sequence predictions. Hence, the input sequence was the sliding window of sensor PS1, the reconstructed signal was of the same nature as the input sequence, and both had the same size of 60. The encoder–decoder type required to fit the problem was therefore an autoencoder. The choice of LSTM as the type of DL algorithm was due to its ability to learn the hidden dependencies between many time points at once, which makes LSTM one of the most suitable forms of DL when it comes to time-series data, especially sequence-to-sequence (seq2seq) operations.
The LSTM autoencoder created for this experiment had only one batch of LSTM sequences. This batch was designed to be sequential in direction and nature, which means that the input layer was directly connected to the hidden layer, which in turn was connected to the output layer. The LSTM hidden layer consisted of one hundred hidden LSTM neurons. The activation function applied for the designed DL model was ReLU, chosen based on trial-and-error validation. The hidden layer was made fully connected by adding a dense layer with an output size equal to the overall output expected from the LSTM model. The optimizer chosen for the LSTM layer was the Adam optimization algorithm.
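A minimal Keras sketch of the described architecture follows. The layer sizes (one hundred ReLU LSTM neurons, a dense layer matching the window size, Adam optimizer) come from the text; the exact code is our reconstruction, not the authors' original:

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import LSTM, Dense

WINDOW = 60  # size of the PS1 sliding windows

model = Sequential([
    Input(shape=(WINDOW, 1)),      # (time points, features)
    LSTM(100, activation="relu"),  # one batch of 100 hidden LSTM neurons
    Dense(WINDOW),                 # fully connected output, one value per time point
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
```

The dense output layer makes the reconstruction the same length as the input window, which is what allows the later signal-difference comparison.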
In order to utilize the healthy windows of PS1 for LSTM use, the data must undergo heavy pre-processing and structuring to fit the LSTM criteria. The pre-processing and restructuring included the following—(1) flatten the data into a vector; (2) normalize the flattened data between zero and one so it can be used in LSTM; and (3) create the target sequence to reconstruct—the most important step of all, as it determines what to learn and what to predict. In our case, the input sequence was a sliding window of size 60, while the target sequence was the next sliding window. The shift or sliding step was set to only one step to guarantee a higher model accuracy, which means that if the input window started at point t, then the target window used to train the prediction model started at point t + 1. (4) Divide the flattened, normalized input and target vectors between the training and testing samples, where the training windows were the 80% selected from the overall data, while the remaining 20% was equally divided between testing and prediction. (5) The next step included converting the flat, normalized input and target vectors into two-dimensional arrays of shape (number of samples, window size); followed by (6) converting the training and testing 2D tensor samples into a 3D tensor suitable for use in LSTM. LSTM units in Keras only accept the training and testing data in a 3D tensor shape of size (number of samples, time points, number of features), where the z-axis (pages, or axis 0) is the number of samples; axis 1 (rows) is the number of time points to store in the memory of the LSTM and learn their dependencies; and axis 2 (columns) represents the number of features in the data. The training tensor used in this experiment was of size (11191, 60, 1), i.e., 11,191 samples each of size (60, 1), corresponding to one window of 60 healthy-only PS1 readings.
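The windowing and reshaping steps above can be sketched as follows; a minimal illustration assuming a flat NumPy vector of healthy PS1 readings (the function name is ours, and the train/test split of step (4) is omitted for brevity):

```python
import numpy as np

def make_sequences(signal, window=60):
    """Flatten, min-max normalize to [0, 1], and pair each input window
    with the next sliding window (shift of one step)."""
    s = np.ravel(signal).astype(float)
    s = (s - s.min()) / (s.max() - s.min())
    X = np.array([s[i : i + window] for i in range(len(s) - window)])
    y = np.array([s[i + 1 : i + window + 1] for i in range(len(s) - window)])
    # Reshape inputs into the 3D tensor Keras LSTM layers expect:
    # (number of samples, time points, number of features)
    return X[..., np.newaxis], y
```

Calling `make_sequences` on the full healthy PS1 vector yields the 3D input tensor and the 2D target matrix used to fit the autoencoder.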
The previously designed LSTM model was trained and validated using the intensely pre-processed healthy data of PS1. In this experiment, the LSTM parameters were set to one hundred epochs and the verbose equaled one.
The validation results of the LSTM healthy-signal reconstruction using the formulated testing data at the last epoch (number one hundred) showed a Mean Square Error (MSE) of 0.000039871 and a Mean Absolute Error (MAE) of 0.0029, both of which are considered exceedingly small loss values.
After training, evaluating, and testing the LSTM autoencoder model, it was time to start making fault detection decisions aided by the model. However, a question arises: how can faults be detected based on the quality of the reconstructed signal? This brings up another important question—how should the fault detection threshold be determined?
Taking a glance at the state-of-the-art methods helps answer the previous questions, e.g., [13]. The approach applied there was similar to ours, as it had separate phases for both detection using signal reconstruction and diagnosis by applying fault classification. To find the difference between the predicted sequence and the input sequence, the authors used the signal difference, which can easily be calculated by taking the magnitude of the subtraction between the two sequences (|x − x̂|). Although the signal difference shows accurate results, we propose a different signal similarity measure that showed higher accuracy and performance compared to the signal difference for fault detection.
The threshold determines what is faulty or healthy, based on the value of the signal difference. If the value was higher than the designated threshold, then the reconstructed signal was considered faulty; otherwise, it was healthy. The threshold was best measured by creating a pool of various threshold values between the minimum and maximum values of the calculated signal difference. This was followed by making the fault detection decisions on the prediction samples based on each threshold in the pool. For each threshold in the pool, a prediction sample whose signal difference was higher than the threshold was considered faulty, while a sample with a lower signal difference was considered healthy. Finally, the precision, recall, f1-score, and accuracy for the prediction results made by each threshold in the pool were calculated. The right threshold for the sensor fault detection was chosen as the one that guaranteed the best precision-to-recall trade-off, also known as the f1-score. In this experiment, a prediction dataset (500 windows of size 60 readings of PS1) consisting of various healthy and faulty samples was used to determine the fault detection accuracy of the LSTM autoencoder sequence reconstruction. The 500 reconstructed windows were then each compared to the original sequence to show how much they deviated from the original window with regard to their health status. The comparisons between the reconstructed windows and the original ones were made using two main metrics—(1) the signal difference, |x − x̂|; and (2) our new metric, which uses the complement of Pearson's autocorrelation.
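The threshold-pool search can be sketched as below; a minimal version assuming a vector of per-window dissimilarities and ground-truth labels (1 = faulty), with the function name being ours:

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(diffs, labels, n_candidates=100):
    """Sweep thresholds between the min and max observed dissimilarity and
    keep the one maximizing the f1-score (precision/recall trade-off)."""
    pool = np.linspace(diffs.min(), diffs.max(), n_candidates)
    best_t, best_f1 = pool[0], -1.0
    for t in pool:
        pred = (diffs > t).astype(int)  # above the threshold -> faulty
        f1 = f1_score(labels, pred, zero_division=0)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

The same sweep works for either dissimilarity metric; only the `diffs` vector changes.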
Pearson's correlation can be calculated using the formula shown below:

r(x, y) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / ( √(Σᵢ₌₁ⁿ (xᵢ − x̄)²) · √(Σᵢ₌₁ⁿ (yᵢ − ȳ)²) )

where r(x, y) is the correlation between the vectors x and y. Furthermore, x and y are expected to possess the same length n. Here, the correlation measures the similarity between two sequences, while subtracting the measured similarity from the highest possible value of resemblance (+1) represents another way of calculating the difference between the two sequences: d(x, y) = 1 − r(x, y).
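The proposed dissimilarity, 1 − Pearson's correlation, can be computed directly with NumPy (a sketch; we use `np.corrcoef` for the Pearson coefficient, and the function name is ours):

```python
import numpy as np

def correlation_complement(x, x_hat):
    """Dissimilarity between an original window x and its reconstruction x_hat:
    1 - Pearson's correlation. 0 means identical shape; larger means more deviation."""
    return 1.0 - np.corrcoef(x, x_hat)[0, 1]
```

Note that, unlike the plain signal difference, this measure is invariant to the scale and offset of the reconstruction, which is one reason it can rank deviations differently.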
The tables below show some of the candidate threshold values examined to find the optimal threshold for the sensor fault detection, together with their corresponding precision, recall, f1-score, and accuracy, using the traditional signal difference and our proposed correlation complement.
Based on the values shown in Table 2 and Table 3, the optimal threshold for each signal difference metric could be easily identified by choosing the threshold that provided the best precision and recall trade-off. It was apparent that 0.3 was the optimal threshold when using the regular signal difference metric, and the accuracy of the LSTM autoencoding sensor fault detection at this threshold was 0.62. On the other hand, the optimal threshold when using the correlation-based signal difference was 0.5, and the accuracy of the sensor fault detection at that threshold was 0.71.
As visualized in Figure 3 and Figure 4, the optimal threshold provided the best precision and recall trade-off, also known as the f1-score. The optimal threshold could be easily observed as the intersection point between the three metrics mentioned previously. Based on the visualization in Figure 3, the threshold selected was 0.3, which provided the detection with 0.62 accuracy. Compared to the intersection point shown in Figure 4, when the threshold was chosen to be 0.5, the corresponding accuracy was observed to be higher, at 0.71. This confirmed the higher accuracy of the proposed signal difference measure, compared to the traditional one, for achieving fault detection using the signal reconstruction technique.
4.1.2. Sensor Fault Diagnosis—Classification Schema
In this section, the experimental results of conducting sensor fault diagnosis using a variety of supervised learning algorithms are demonstrated. As shown in Figure 2, the second phase, following the detection of existing anomalies, involved applying the necessary means to diagnose their nature. The faults detected in the previous phase using the LSTM autoencoder were then fed into a fault classification schema to determine the type and nature of the detected faults. In other words, to perform the fault diagnosis, only faulty data were classified.
In this work, the classification results were compared when various feature engineering approaches were applied. The feature engineering approaches used for this section were PCA, feature importance (FI), manually extracted time-domain features, and a new cluster-based feature selection method called RkSE. Feature selection or extraction, when applied to univariate datasets in the shape of sliding windows, is simply a window compression method that minimizes the number of readings provided by each window and selects the features with the most contribution to the learning process. Therefore, the time and complexity constraints of the ML or DL models can be managed and minimized with a smarter choice of features. Various ML and DL classifiers were individually trained, validated, and tested with the selected features, using a diversity of feature engineering approaches. Then, their results were documented and compared. The ML approaches used were LR, LDA, KNN, CART, NB, SVM, and RF. The DL approaches selected to perform the classification tasks were CNN and LSTM.
The following experimental results tackled each of the feature engineering processes and their results when fed into the above-mentioned ML and DL classifiers, to eventually achieve the FDD for sensor faults, using the PS1 sensor as an example of sensors in the hydraulic test rig system.
The parameter selection for each classifier was made by trial and error, to ensure the highest possible accuracy when validating using 10-fold cross-validation over the original data, without any feature engineering applied. The table below describes the applied ML classifiers and their corresponding parameters using Scikit-learn in Python. Furthermore, the mean accuracy over the 10 folds and the corresponding standard deviation were calculated.
The CNN classifier had verbose, epochs, and batch size parameters of zero, one hundred, and twenty, respectively. The parameters were chosen by trial and error to provide the designed deep neural network with the highest 10-fold classification accuracy. The CNN was designed as a sequential model (input, hidden layers, and output). The convolutional layers applied here were 1-D layers, since the training dataset was time-series data of a one-dimensional nature, unlike the usual application of CNN where the data are typically of two-dimensional shape, such as images. The CNN design included six one-dimensional convolutional layers with 64 filters each and a kernel size of one. The kernel size, which sets the length of the convolution/masking window required for the convolution, was selected as one. The number of convolution layers was chosen to guarantee the highest possible accuracy and, through trial and error, was set to six 1-D convolutional layers. The activation function within the created layers was ReLU. The process following each convolution layer was the pooling layer. In this work, the pooling function was selected as maximum pooling, which means selecting the maximum entry in the kernel during the pooling phase. Two fully connected layers were added following the pooling phase: the first of size one hundred with the ReLU activation function, and the second with three outputs to match the number of classification outputs/faults designated for the training, with SoftMax as its activation function. Finally, the CNN optimizer chosen was the Adam optimizer. The LSTM model designed for classification differed from the one used in the previous step, as this model was a classifier, while the previous LSTM model was an autoencoder designed to solve a multi-regression problem rather than classification.
Only one batch of LSTM neurons was used; this batch had one hundred sequential hidden neurons. The layers were fully connected using a dense layer of size one hundred with the ReLU activation function, which was connected to another fully connected dense layer of size three (to match the number of outputs expected from the LSTM model) with the SoftMax activation function. The LSTM classifier parameters verbose, epochs, and batch size were equal to 0, 10, and 20, respectively.
The classification results for ML and DL classifiers when fed with only faulty data, to perform PS1 fault diagnosis and classification are listed.
Table 4 lists nine ML and DL classifiers that were trained separately on five different feature sets: the features selected/extracted using PCA, the manually extracted time-domain features, FI, and RkSE, as well as the entire faulty dataset without any feature selection. The number of features without feature selection was equivalent to the window size of 60. Four features were extracted using PCA and 45 features were selected using FI, compared to 46 features selected using RkSE. Finally, four time-domain features were extracted from each window, represented by the mean, variance, standard deviation, and signal-to-noise ratio. The number of features used for each method was the one that yielded the highest fault classification mean accuracy.
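The four per-window time-domain features can be extracted as follows. This is a sketch; the signal-to-noise ratio is assumed here to be the mean over the standard deviation, which is one common definition, and the original computation may differ:

```python
import numpy as np

def time_domain_features(window):
    """Per-window features: mean, variance, standard deviation, and SNR."""
    mu, var, std = window.mean(), window.var(), window.std()
    snr = mu / std if std > 0 else 0.0  # assumed SNR definition (mean/std)
    return np.array([mu, var, std, snr])
```

Applied to each window of 60 readings, this compresses the window into a four-value feature vector for the classifiers.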
When observing each row with respect to each feature engineering method, the feature engineering approach giving the highest or lowest 10-fold mean accuracy for each classification method is clearly shown. The mean feature accuracy row shows the overall accuracy for each feature engineering approach, with respect to all ML and DL classifiers combined. PCA had the highest mean feature engineering accuracy when applied to the nine classifiers, which proved the consistency of PCA and its validity across different classification techniques. It was also apparent that the selected time-domain features were the ones providing the highest accuracy for some ML and DL classifiers, namely LDA, KNN, CART, RF, and LSTM. However, this feature selection technique did not provide consistent accuracy results, since the LR and NB classifiers showed exceptionally low performance when applying the four selected time-domain features, compared to the rest of the feature engineering methods. This explains why the time-domain features resulted in a lower overall mean feature accuracy than PCA, even though more classifiers reached their maximum accuracy when applying time-domain features.
The selection of a suitable feature engineering method was highly dependent on each classifier type and its functionality. The table above serves the purpose of investigating the behavioral changes of some of the most common ML and DL classifiers with respect to various feature engineering methods commonly used with time-series datasets. Furthermore, finding the pair of feature engineering and classification approaches that provides the most optimal accuracy–complexity trade-off when performing sensor fault diagnosis is the number one aim of these comparisons. As a result, the highest measured sensor fault classification accuracy was obtained when CART was applied using the time-domain features, followed by LSTM, KNN, and RF using the same extracted features.
4.2. Experiment Two: Component FDD Using the Joint LSTM Autoencoder and Classifier Approach
In this experiment, the component faults existing in the hydraulic test rig were detected and diagnosed using a unique approach, in which the detection and diagnosis stages were carried out separately to ensure more accurate detection of rare occurrences.
Figure 5 shows the framework of this experiment.
The same steps and parameters created in experiment one (sensor FDD) were repeated in this experiment, excluding the data pre-processing and structuring, and the fault injection schema. The pre-processing step differed from the previous experiment, since this time it was a multivariate autoencoding and classification problem, without the application of sliding windows. Moreover, no fault injection was required in this experiment, because the component faults studied were already available in the hydraulic test rig dataset used for this experiment. In this section, the data used were the hydraulic test rig dataset of eleven sensors, which indicates that this was a multivariate FDD experiment. The healthy data were used for detection as the first step of the FDD system, represented by the LSTM autoencoder, while the faulty data, containing four main component faults—cooler, valve, and hydraulic accumulator total failures, and severe pump leakage—were used in the second stage, represented by the fault diagnosis using the supervised ML and DL methods.
In both stages, the data were organized as a 2D matrix of samples and features, expressed by the eleven sensors and their readings, at different time-points.
4.2.1. Component Fault Detection—LSTM Autoencoder
This section followed the same procedure as that explained in experiment one for fault detection in sensors. The LSTM autoencoder fault detection stage had the following main steps. (1) Design the LSTM autoencoder to fit the problem. (2) Prepare the data into a form acceptable to the LSTM. (3) Train and validate the LSTM autoencoder using only healthy data and calculate the MSE and other error metrics. (4) Predict the samples that contain the faulty and healthy readings. (5) Calculate the signal difference between the original samples and the predicted samples, using both the regular difference and the Pearson's correlation complement, to establish accuracy comparisons. (6) Find the best threshold of sequence difference to ensure the best trade-off between precision, recall, and f1-score. (7) Make component fault detection decisions using the trained, validated LSTM autoencoding model and the calculated sequence difference, as compared to the computed threshold.
For training the designed model, 1438 samples of the eleven sensors' readings were used to train and validate the model. The data should be normalized between zero and one, as well as converted to a three-dimensional tensor format (samples, time points for the LSTM to remember, number of features), before being fed to the LSTM model. The LSTM model designed for component fault detection was an autoencoder with a sequential layer of one hundred hidden LSTM neurons using the ReLU activation function, followed by a fully connected dense layer of size equal to the number of sensors or features. The dense layer contributed to improving the accuracy of the LSTM model, as well as ensuring that the LSTM model generated outcomes equal to the designated input signal in terms of size. Finally, the optimizer applied for the LSTM model was Adam. The training parameters of the LSTM autoencoder were epochs = 100 and batch size = 30.
After training and validating the designed model over a hundred epochs, the MSE error of the last epoch was 0.00057 and the MAE was equal to 0.0096. Both error metrics were exceedingly small, which was a high indication of the validity and accuracy of the created model in reconstructing healthy input sequences.
To select the optimal threshold corresponding to the allowed sequence difference between the original sequence and the one reconstructed by the LSTM autoencoder, 4800 samples of size eleven were predicted using the autoencoder, to reconstruct 4800 healthy versions of the prediction samples. The signal difference between each pair of corresponding original and reconstructed sequences was computed using (1) the traditional signal subtraction, finding the signal difference as a vector and then taking the magnitude of this vector; and (2) the sequence difference using (1 − Pearson's autocorrelation), the measurement created in this work, which is proposed to be a more accurate measurement for fault detection than the traditional signal subtraction.
To avoid repeating the explanation of each signal difference method, we jump straight to the results and their comparisons.
A pool of candidate threshold values was created, and then the labels of the 4800 prediction samples were obtained based on each threshold in the pool: if the value of the signal difference was higher than the threshold, a fault was considered detected and the sample received label 1; otherwise, it received label 0. The precision, recall, and f1-score were computed for each threshold in the pool, based on the generated labels and the original labels of the given prediction samples. When applying the signal difference using the Pearson's autocorrelation complement, the pool of chosen thresholds between the minimum and maximum observed values is shown in Table 5. Furthermore, for each chosen threshold, the accuracy, precision, recall, and f1-score were computed.
Figure 6 illustrates the process of choosing the component fault detection threshold based on the precision, recall, and f1-score trade-off shown in Table 5. As clearly shown in Figure 6, the selected threshold was the intersection between the three curves, which was approximately equal to 0.0007, and the accuracy observed for this threshold value was 0.71.
On the other hand, the optimal threshold was also calculated when the traditional signal difference was applied. In Figure 7, the intersection between the three curves is demonstrated. It is clearly shown that the optimal threshold for component fault detection using the sequence subtraction was approximately 0.03, with a fault detection accuracy of 0.69. This optimal accuracy of 0.69 using signal subtraction was less than the 0.71 measured using the optimal threshold computed with the Pearson's correlation complement. As a result, when comparing the accuracies of the optimal thresholds selected using the two signal deviation measurements—the autocorrelation complement and the traditional subtraction—it can be seen that the proposed method using the Pearson's correlation complement guaranteed higher component and sensor fault detection accuracies than its commonly used traditional subtraction counterpart, based on the comparisons measured in experiments one and two.
4.2.2. Component Fault Diagnosis—Classification Schema
In this section, the feature engineering methods compared were FI, PCA, and RkSE. The time-domain features, extracted from the multivariate time-series sequences without the application of sliding windows, were expected to yield lower accuracy values regardless of the ML or DL classifier used, since it does not make sense to compute the mean, standard deviation, and variance over a sample of readings extracted from sensors of different natures. However, the time-domain features were still extracted and applied to all classifiers, to prove the point mentioned earlier.
The optimal number of features for each feature engineering method was computed to ensure the best accuracy and complexity trade-off. The overall number of features in each sample was eleven, corresponding to the sensors in the hydraulic test rig; the optimal numbers of features for FI, PCA, the time-domain features, and RkSE were four, five, four, and nine, respectively.
The accuracies computed for all ML and DL classifiers were the results of dividing the component-fault data into training and testing sets, with percentages of 80% and 20% of the faulty data, respectively, followed by applying a 10-fold cross-validation technique to each classifier separately.
The parameters and optimizers for each ML method used in this section were identical to the ones used in the sensor FDD experiment earlier in this paper. Moreover, some minor changes were made to the CNN and LSTM classifiers' design and parameters, compared to the previous experiment.
The CNN design consisted of only one 1-D CNN layer with 64 filters and a kernel size of one. The layer was sequential, which means that the input layer was directly connected to the hidden layer(s), which was connected to the output layer. The activation function used was ReLU. This was followed by a pooling layer with a pooling size of one, with maximum pooling applied as the pooling function. Finally, a fully connected dense layer of size equivalent to the number of expected outputs, with the SoftMax activation function, was created to perform the classification over the features extracted by the convolutional and pooling layers. The CNN optimizer used was Adam, as a stochastic gradient descent approach to optimize the network. The LSTM classifier applied had only one LSTM batch with two hundred hidden neurons that were sequential in order and nature, followed by a fully connected dense layer with the SoftMax activation function; Adam was the applied optimizer for the LSTM as well. The verbose, epochs, and batch size parameters were set by testing various values and their effect on the classification accuracy, and they were set to zero, 10, and 20, respectively.
The table below summarizes the component fault classification results when trained on faulty hydraulic test rig data, applying various ML and DL approaches using numerous feature engineering methods.
As shown in Table 6, the feature selection approaches worked better than the extraction ones, such as PCA and time-domain features, when dealing with traditional ML classifiers to classify multivariate time-series datasets. FI consistently showed the highest accuracy results compared to the rest of the feature engineering methods when dealing with traditional ML classifiers. This was followed by RkSE, which had a slightly lower accuracy than FI for the traditional ML approaches but showed consistency across all ML classifiers. Moreover, PCA showed the highest accuracy when applying the DL classification algorithms, compared to the other feature engineering approaches. FI and RkSE were neck-and-neck when it came to the classification accuracy using the selected DL approaches. The time-domain features were the most accurate ones when applied to sliding windows for univariate classification, as shown in experiment one. However, as spotted earlier in this experiment, when extracting time-domain features from multivariate datasets without applying sliding windows, this feature extraction method proved to be weaker than all the other approaches, irrespective of the ML or DL classifier.
In comparison, KNN, CART, RF, and SVM showed greater consistency in achieving high accuracy results irrespective of the feature engineering method applied, including the time-domain features despite their weakness. CNN and LSTM had lower accuracies compared to the traditional ML approaches mentioned earlier, and their accuracies dropped radically when the time-domain features were applied.
To conclude experiment two, it is important to know how to apply the trained models and saved parameters from experiment two at run-time, to make new real-time predictions. The input vector for prediction should have one reading from each of the eleven sensors used during the offline or training phase. (1) Fault detection—fault detection for new samples can be done by feeding the new sample into the trained and validated LSTM autoencoder model, to reconstruct the healthy form of the sequence. Then, the reconstructed sequence and the original one are compared by calculating the signal difference using the Pearson's autocorrelation complement. Finally, whether the calculated signal difference is above or below the trained threshold determines the existence of faults. (2) Fault diagnosis—in case a fault is detected in the detection step, the fault should be diagnosed by applying the necessary feature engineering approach, and then the new processed sample should be fed to the most accurate trained ML or DL classifier suitable for the chosen features. In our case, based on the results shown in Table 6, it was more accurate to use RF combined with FI to guarantee a better accuracy and complexity trade-off.
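The two-stage run-time logic described above can be sketched as follows. This is a hypothetical illustration: `autoencoder`, `classifier`, `threshold`, and `select_features` stand for the artifacts produced by the offline phase, and all names are ours:

```python
import numpy as np

def fdd_decision(sample, autoencoder, classifier, threshold, select_features):
    """Two-stage run-time FDD for one new sample of eleven sensor readings:
    detect via reconstruction dissimilarity, then diagnose only if faulty."""
    recon = np.ravel(autoencoder.predict(sample.reshape(1, 1, -1), verbose=0))
    diff = 1.0 - np.corrcoef(sample, recon)[0, 1]  # Pearson complement
    if diff <= threshold:
        return "healthy", None
    fault = classifier.predict(select_features(sample).reshape(1, -1))[0]
    return "faulty", fault
```

The classifier is only invoked when the detection stage flags the sample, matching the paper's point that diagnosis is performed on faulty data alone.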