2. Related Work
In IoT networks, cyberattack detection is typically framed as a classification problem, which machine learning and deep learning approaches can address effectively. This section reviews recent machine learning- and deep learning-based studies on IoT intrusion detection. To detect atypical attacks, Wang et al. [8] developed a flow mining-based method. They validated its effectiveness using MIT Lincoln Laboratory DDoS data and traces from the Slammer and Code Red worms. Huang et al. [9] developed a deep defense network security architecture and suggested using data mining techniques to examine the alarms gathered by a distributed intrusion detection and prevention system (IDS/IPS). The authors implemented the prototype with three distinct types of data mining methods and applied it to DDoS attack detection to assess the efficacy of the proposed defense architecture. It performed well in terms of attack detection rate and false positive rate (FPR). Thaseen and Kumar [10] presented a technique that combines principal component analysis (PCA) with an optimized support vector machine (SVM). The SVM parameters and kernels were optimized using automatic parameter selection, which reduced the training time and improved accuracy for a few attacks, such as R2L and U2R. Other popular machine learning techniques, including multilayer perceptron (MLP), random forest (RF), and Naive Bayes (NB), have also been employed to identify threats in contemporary networks [11,12,13]. Unfortunately, the shallow nature of these conventional machine learning techniques limits their effectiveness, making them unable to offer a workable solution for large volumes of traffic data. Using several ML algorithms from the Weka data mining tool, Thanh and Lang [14] assessed the performance of Bagging, AdaBoost, Stacking, Decorate, Voting, and random forest. In [14], the ensembles that aggregate the results of their base classifiers using Stacking and Decorate took longer to train and test than both single classifiers and the other ensemble techniques.
As deep learning has advanced, a growing number of deep learning-based models have achieved exceptional performance [15,16,17]. Yin et al. [18] proposed an IDS based on the recurrent neural network (RNN). Their approach outperformed conventional classification techniques in accuracy and detection rate for both binary and multiclass classification. To obtain more accurate detection by aggregating traffic characteristics, He et al. [19] suggested an intrusion detection model based on long short-term memory (LSTM) and a multimodal deep autoencoder. To mitigate DDoS attacks in cloud computing, Jaber et al. [20] employed principal component analysis and linear discriminant analysis in conjunction with a hybrid, nature-inspired metaheuristic algorithm called Ant Lion optimization for feature selection, and artificial neural networks to classify and configure the cloud server. An IoT intrusion model leveraging a convolutional neural network (CNN) and grey wolf optimization (GWO) was applied to network intrusion detection (NID) [21]. To address class imbalance in the dataset under consideration, Zhang et al. [22] created a two-branch CNN and utilized feature fusion. Their approach was more efficient in execution time and more accurate in detecting minority classes of anomalies. Zhang et al. [23] applied NID to raw packet-level communication. To extract significant spatial and temporal information, they combined CNNs and LSTMs, which resulted in higher detection rates than using either component separately. The performance of eight machine learning algorithms was examined in [24] on six datasets: KDD-99, NSL-KDD, UNSW-NB15, Kyoto2006+, WSN-DS, and CICIDS2017. The algorithms were DNN, logistic regression (LR), NB, SVM, Adaptive Boosting (AB), KNN, DT, and RF. Despite having higher processing requirements, the deep learning classifier produced the best results when compared to the machine learning classifiers. Kasongo and Sun [25] created an intrusion detection system (IDS) based on a deep LSTM, a type of recurrent neural network (RNN), with 90 hidden units spread across three hidden layers. The accuracy of the model was 99.51%. There are 5 SoftMax neurons in the last layer of the DFFL structure and 29 sigmoid neurons in the first layer. The article demonstrated a notable improvement in performance by comparing the outcomes with those of different machine learning techniques. The authors intend to investigate the detection effectiveness for every attack in the NSL-KDD dataset in future research.
Additionally, recent studies have demonstrated the effectiveness of CNN-BiLSTM architectures in different applications. For instance, Zhang et al. [26] utilized a CNN-BiLSTM-attention model for stock price prediction. In [27], Staffini applied a CNN-BiLSTM to macroeconomic time series forecasting to highlight new techniques that could be added to the set of tools available to policymakers for forecasting macroeconomic data. Tang et al. [28] improved power load prediction using a CNN-BiLSTM model. Cui and Xia [29] developed an EEG signal anomaly detection algorithm based on CNN-BiLSTM, which exploits the CNN's ability to extract features automatically and the BiLSTM's ability to process time series data efficiently, and compared it with a support vector machine. These studies underscore the versatility and robustness of the CNN-BiLSTM architecture, further motivating its application in intrusion detection systems. Other deep learning models have also been employed in intrusion detection studies. In [30], Naseer et al. applied deep neural network structures, including CNNs, autoencoders, and RNNs, to the NSL-KDD dataset for real-world application in anomaly detection systems. In [31], Dan Dongseong Kim presented a comprehensive survey discussing the impact of taxonomy on adversarial learning in deep learning-based network intrusion detection systems. In [32], Minshu He et al. proposed a framework for anomaly detection based on deep reinforcement learning that gives priority to outliers, state effect, and model transferability.
4. Experiments
All experiments were performed on a single server running Windows 10. Its CPU was an AMD Ryzen 5 5600X 6-core processor, manufactured by Advanced Micro Devices in Santa Clara, CA, USA. The experiments were implemented in Python 3.9 with TensorFlow as the deep learning framework. These resources provided a reliable and effective environment for our research and evaluation.
4.1. Description of the Datasets
4.1.1. NSL-KDD Datasets
The University of New Brunswick made the NSL-KDD dataset public. The NSL-KDD dataset is an upgrade of the KDDCup'99 dataset, which has inherent flaws, as shown by numerous analyses. NSL-KDD comprises the core records of the entire KDD dataset. Along with UNSW-NB15 and CICIDS-2017, it is one of the most widely used datasets for analyzing network intrusion detection systems and serves as an effective benchmark for comparing different intrusion detection methods [43]. The elimination of redundant records, the availability of a reasonable number of records in the training and testing datasets, and the inverse relationship between the number of selected records from each difficulty group and the percentage of records in the original KDD dataset are just a few of the ways in which NSL-KDD differs from its predecessor. The dataset contains 41 features, categorized into four groups, as listed in Table 1. The first group (basic) consists of nine features that capture essential details, including the protocol, service, and length. The second group (content) consists of thirteen features that describe the content, including login activities. The third group (time) contains nine time-based features, including the number of connections to the same host within a two-second window. The fourth group (host) contains ten host-based features that describe the connection to the host, including the frequency of connections from other hosts attempting to access the same destination port number. The NSL-KDD dataset provides the KDDTrain+ dataset as the training set and the KDDTest+ and KDDTest-21 datasets as the testing sets. It contains normal traffic and four different attack types, namely denial of service (DoS), remote to local (R2L), user to root (U2R), and probing attacks (Probe), as shown in Table 2.
4.1.2. UNSW-NB15 Datasets
UNSW-NB15 is a sophisticated dataset used in IDS research and is widely referenced in the literature. The raw packets (network traces) that make up the UNSW-NB15 dataset were produced with the IXIA PerfectStorm tool in the Cyber Range Laboratory of the Australian Centre for Cyber Security (ACCS). The dataset comprises over 2.5 million simulated network packets [44]. It includes nine different attack types, namely exploits, reconnaissance, denial-of-service, shellcode, generic, backdoors, worms, fuzzers, and analysis attacks, as well as non-anomalous packets. The dataset is highly skewed, since over 87% of the packets are non-anomalous. The dataset features and descriptions are provided in Table 3. The first group (flow) includes the protocol feature, which identifies the protocols used by the hosts, such as TCP or UDP. The second group (basic) represents the essential connection data, including the duration and the number of packets exchanged between the hosts; it comprises fourteen features. The third group (content) provides TCP content information, including base sequence numbers and window advertisement values. It also offers certain details about HTTP connections, including the amount of data sent through the HTTP service. There are eight features in this group. The fourth group (time) includes eight features, such as packet arrival time and jitter. Table 4 describes the attacks in the dataset.
4.2. Data Pre-Processing
The primary goal of a network intrusion detection system (NIDS) is to identify attack traffic. We therefore start by filtering each of the two datasets independently, selecting the data samples that are marked as normal and classifying the rest as DoS attacks. One-hot encoding of the categorical features and normalization of the numerical features are typically used to pre-process these datasets. However, as previously mentioned, the NSL-KDD dataset contains a more adequate number of records for each attack category. Conversely, the UNSW-NB15 dataset contains a remarkably small number of records for categories such as fuzzers and worms. To address this problem, oversampling is applied to the training set to ensure that each attack type contains the same number of records. We employ one-hot encoding and normalization for both datasets. Both datasets contain categorical features, which must be translated into numerical values before the deep learning model can produce accurate predictions. Therefore, at the pre-processing stage, these columns were transformed into numerical values using the get_dummies function of the pandas Python library. Because label encoding maps the categories to different integers within a single column, the model may incorrectly interpret these values as having a specific order, which could affect the classification. For this reason, one-hot encoding is preferred over label encoding.
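The encoding and oversampling steps described above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the helper names `one_hot_encode` and `oversample_to_balance`, the toy column names, and the choice of random oversampling with replacement are all assumptions, since the text does not state which oversampling variant was used.

```python
import pandas as pd


def one_hot_encode(df, categorical_cols):
    """One-hot encode the listed categorical columns with pandas.get_dummies."""
    return pd.get_dummies(df, columns=categorical_cols, dtype=int)


def oversample_to_balance(df, label_col, seed=0):
    """Randomly duplicate rows (with replacement) until every class
    has as many records as the largest class. Hypothetical helper."""
    target = df[label_col].value_counts().max()
    parts = []
    for _, group in df.groupby(label_col):
        parts.append(group)
        deficit = target - len(group)
        if deficit > 0:
            parts.append(group.sample(n=deficit, replace=True, random_state=seed))
    return pd.concat(parts, ignore_index=True)


# Toy training records; column names and values are illustrative only.
toy = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "tcp", "icmp", "tcp"],
    "duration": [0, 12, 3, 0, 7],
    "label": ["normal", "normal", "normal", "dos", "dos"],
})

encoded = one_hot_encode(toy, ["protocol_type"])
balanced = oversample_to_balance(encoded, "label")
print(encoded.columns.tolist())
print(balanced["label"].value_counts().to_dict())
```

In this sketch, oversampling is applied only to the training records, so that duplicated rows cannot leak into the test set.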
Normalization is the process of rescaling the data into a specific range to minimize redundancy and speed up the model's training. This study employs min-max normalization [44], which rescales the data range to [0, 1].
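Min-max normalization maps each feature value x to (x - x_min) / (x_max - x_min). A minimal NumPy sketch is given below; the function names are hypothetical, and fitting the minimum and maximum on the training split only, as well as the handling of constant columns, are assumptions that the text does not specify.

```python
import numpy as np


def fit_min_max(train):
    """Return the per-feature minimum and maximum of the training data."""
    return train.min(axis=0), train.max(axis=0)


def apply_min_max(x, lo, hi):
    """Rescale x feature-wise with (x - lo) / (hi - lo).
    Constant columns (hi == lo) are mapped to 0 to avoid division by zero."""
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span


# Toy feature matrix: the second column is constant.
train = np.array([[0.0, 10.0], [5.0, 10.0], [10.0, 10.0]])
lo, hi = fit_min_max(train)
scaled = apply_min_max(train, lo, hi)
print(scaled)
```

When the same `lo` and `hi` are reused on test data, values outside the training range fall outside [0, 1]; whether to clip them is a design choice left open here.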
4.3. Evaluation Metrics
Five standard classification performance metrics are adopted in this study to comprehensively evaluate the machine learning and deep learning models. The metrics are all based on four elements: True Positives (TPs), True Negatives (TNs), False Positives (FPs), and False Negatives (FNs). The utilized metrics are defined as follows:
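Written in terms of these four counts, the standard definitions of the metrics are sketched below. The equation numbers are assumed to correspond to Equations (2)-(6) cited in the text, with DoS attacks taken as the positive class.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \tag{2}\\
\text{Precision} &= \frac{TP}{TP + FP} \tag{3}\\
\text{Recall}    &= \frac{TP}{TP + FN} \tag{4}\\
\text{F1-score}  &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}\\
\text{FAR}       &= \frac{FP}{FP + TN} \tag{6}
\end{align}
```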
Accuracy represents the proportion of network activities that are correctly classified, including both DoS attacks and normal traffic. Equation (2) defines its calculation. Equation (3) defines precision as the proportion of detected DoS attacks that are correctly classified. Recall measures the proportion of actual DoS attacks that are correctly detected, as indicated by Equation (4). The F1-score is a comprehensive assessment metric. Its definition is given in Equation (5) as the harmonic mean of precision and recall. The false alarm rate (FAR) is the proportion of normal traffic that is misclassified as DoS attacks. Equation (6) shows how it is calculated.
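As a worked illustration of these measures, the following minimal Python sketch computes them from confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the experiments; DoS attacks are assumed to be the positive class.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, F1-score, and false alarm rate
    from confusion-matrix counts. Empty denominators yield 0.0."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "far": far,
    }


# Hypothetical counts: 100 attack flows and 100 normal flows.
m = classification_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these counts, 20 of the 100 normal flows raise an alarm, so the FAR is 0.2 even though recall is 0.9, which illustrates why the two are reported separately.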
6. Conclusions
This study presents a comparative analysis of deep CNN-BiLSTM and machine learning methods in intrusion detection scenarios. It evaluates the performance of deep learning and machine learning models using two benchmark datasets, NSL-KDD and UNSW-NB15, focusing on their effectiveness in detecting DoS attacks. The results demonstrate that the CNN-BiLSTM model outperforms the machine learning models in terms of accuracy, precision, recall, and F1-score and exhibits lower false alarm rates. Despite not introducing a new algorithm, this study provides valuable insights into the comparative reliability of deep learning and lightweight machine learning models in network intrusion detection to balance the trade-off between system cost and model performance. The inclusion of standard deviation in the accuracy metrics adds robustness to the performance comparison.
In future work, we aim to propose a deep learning model that can be deployed in a real-world environment, compare the detection of specific attacks such as DDoS attacks, and assess the model’s complexity and robustness by varying the training datasets. This will further enhance the practical applicability and robustness of intrusion detection systems in diverse and dynamic network environments.