1. Introduction
In recent years, the Internet of Things (IoT) has expanded rapidly in fields such as healthcare, industry, and smart appliances, and has become indispensable in daily life. However, devices deployed in IoT environments face information security challenges: because of operational power and deployment cost constraints, they lack the resources needed to apply advanced security measures on the devices themselves [1]. In 2016, for example, the Mirai Botnet attacked vulnerable IoT devices [2]. The Mirai Botnet compromises specific IoT devices and runs malware that communicates with a Command and Control (C2) server, forming a botnet that is used to conduct large-scale Distributed Denial-of-Service (DDoS) attacks against its targets. Subsequently, Satori, a variant of the Mirai Botnet, emerged and formed a botnet by conducting zero-day attacks that mainly target unpatched vulnerabilities [3].
Zero-day attacks exploit undiscovered vulnerabilities in software before vendors or others can take countermeasures. One countermeasure against zero-day attacks on IoT devices is to install a Network Intrusion Detection System (NIDS) on the IoT devices' network to monitor network traffic and notify network administrators when signs of an intrusion are detected. A NIDS is generally installed on a network, and its detection is based on the traffic captured on that network. This eliminates the need to execute intrusion detection on resource-constrained IoT devices.
IDS detection methods can be broadly classified into two types: signature-based and anomaly-based. Signature-based IDSs predefine the communication patterns of known attacks as attack signatures and execute intrusion detection by examining the similarity of captured traffic to those signatures. However, signature-based IDSs cannot detect unknown attacks until IDS vendors release new attack signatures; that is, they cannot handle zero-day attacks because no signature yet represents them. Anomaly-based IDSs, on the other hand, predefine the normal state of a monitored network and, at detection time, compare the traffic to this normal state to determine whether there is a deviation, which is usually caused by an attack. Therefore, an anomaly-based IDS can detect unknown attacks, whereas a signature-based IDS requires a signature update to do so. However, there is a concern that anomaly-based IDSs may increase the false positive rate because normal observations may exceed the predefined normal range [4]. IoT networks generally consist of heterogeneous devices with different hardware and operating systems, which leads to more diverse attacks than in non-IoT networks [5]. An anomaly-based IDS, which can detect unknown attacks without signature updates, is therefore more suitable for IoT networks.
IoT networks have a distinctive characteristic: although they generally contain a large number of devices, the amount of traffic data per network is limited. This limits the sample size available for training the intrusion detection models of an IDS. Some methods therefore use distributed learning to build and improve intrusion detection models. With distributed learning, an anomaly-based IDS can collect attack-related samples from numerous networks, which enables the learning of intrusion detection models that can detect a variety of attacks even with a small number of samples per network. However, distributed learning involves the direct exchange of learning data between the participating devices and the aggregation server, which raises privacy concerns. The direct exchange of learning data may also consume significant communication resources in IoT networks and aggregation servers, potentially resulting in considerable communication overhead.
Federated Learning (FL) [6] is an algorithm that builds a global model by aggregating model update information from clients while keeping the learning data distributed on the client side, which addresses the aforementioned privacy and overhead concerns. In IoT networks, anomaly-based IDS and FL combine well in terms of resource limitations, device quantity, device diversity, zero-day attack countermeasures, and privacy protection, and some integrated IDSs have been proposed [7,8]. These proposals leverage the characteristics of IoT networks and FL to address zero-day attacks. However, challenges remain: for example, they cannot share attack information obtained from different device types, and they do not discuss how to extract and label zero-day attacks.
In this paper, we introduce IDAC, a novel method that uses FL to aggregate attack candidates extracted from the communication traffic of each IoT network, addressing the issue of sharing attack information obtained from different device types. Attack candidates are extracted by applying outlier detection, an unsupervised learning method, to the entire network traffic. Subsequently, by training on the extracted attack candidates using novelty detection, the proposed method builds an intrusion detection model that classifies whether a new input belongs to the learned attack candidates. This approach enables autonomous detection of zero-day attacks in environments with multiple device types.
The main contributions of this paper are as follows.
Proposal of IDAC, an intrusion detection method that can be applied to zero-day attacks by sharing attack candidate information through FL to address the issues of the conventional methods that cannot label attack candidates autonomously.
Confirmation, through computer simulation-based evaluations using the BoTIoT dataset [9], that the proposed IDS with IDAC can achieve comparable detection performance against various attacks, including zero-day attacks, by suppressing false positives and missed detections in the extraction of attack candidates.
Verification that sharing attack candidates can improve both attack detection performance and attack detection time.
Confirmation that IDAC has the capability for real-time processing of incoming traffic by resolving the issue in the flow conversion process.
3. IDAC: Intrusion Detection Based on Attack Candidate
3.1. System Overview
To address the aforementioned issues, namely that conventional intrusion detection methods fail to label captured traffic data autonomously and to share attack information among various kinds of devices, this paper proposes Intrusion Detection based on Attack Candidate (IDAC). IDAC is a novel approach that builds intrusion detection models by aggregating attack candidates, accepting a certain level of false positives (FP), with an Online One-Class Support Vector Machine (Online OC-SVM), and shares the intrusion detection models through Federated Learning (FL). This paper assumes that an intrusion detection system based on IDAC is installed on each IoT network, as shown in Figure 1a, and that a central server is connected to facilitate FL among the networks. Network traffic is mirrored at the IoT Gateway (IoT GW), and the IDS with IDAC receives the mirrored traffic for intrusion detection within each IoT network, as shown in Figure 1b.
Figure 2 depicts the intrusion detection process of IDAC installed on each network. IDS with IDAC executes intrusion detection sequentially each time the latest network traffic is inputted. The process from the input of network traffic to the detection of intrusions is carried out through the following phases:
Conversion phase: Convert traffic to flow;
Extraction phase: Extract attack candidates from flow;
Build and execution phase: Build detection model and execute intrusion detection;
Improvement phase: Improve detection models using FL.
The following subsections describe detailed procedures in each phase.
3.2. Conversion Phase
In this phase, time windows are created at a fixed interval for mirrored continuous real-time traffic. The set of packets existing within each time window is then converted into flow information, which is subsequently transformed into feature vectors. Based on the feature vectors of the flows, IDS with IDAC executes intrusion detection for each flow.
Since network traffic is continuously inputted into a conversion mechanism in real-time, detection target time windows of length
, denoted as
, are sequentially created as shown in
Figure 3. This mechanism adopts a concept similar to a sliding window: it invokes flow conversion for a time window once the window is full and then generates a new time window. However, flow conversion confined to
may fail to capture the unique characteristics of long-term attacks that last longer than
. Therefore, the mechanism simultaneously generates a reference time window
of length
to refer to long-term characteristics.
is continuously updated to maintain a time series
, and is referenced at the point of flow conversion in
. Although it can achieve high detection accuracy with
, processing real-time traffic under this condition leads to substantial resource consumption for intrusion detection. Reducing
and
can minimize resource consumption but raises concerns about the inability to capture long-term characteristics. Therefore, setting
is essential to strike a balance between detection effectiveness and resource efficiency.
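The two-window scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this work: the names `detect_len` and `ref_len` (lengths of the detection target and reference time windows, in seconds), the `Packet` tuple, and the `windows` generator are all hypothetical, packets are assumed to arrive in timestamp order, and the reference window is assumed to be at least as long as the detection target window.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    ts: float      # capture timestamp in seconds
    payload: str   # stand-in for the raw packet bytes


def windows(packets: Iterable[Packet], detect_len: float, ref_len: float
            ) -> Iterator[Tuple[List[Packet], List[Packet]]]:
    """Yield (detection_window, reference_window) pairs.

    A pair is emitted each time a detection window of `detect_len` seconds
    is full. The reference window holds the most recent `ref_len` seconds
    of traffic, so it also covers packets from earlier detection windows.
    """
    if ref_len < detect_len:
        raise ValueError("ref_len must be >= detect_len")
    reference: Deque[Packet] = deque()
    detection: List[Packet] = []
    window_start: Optional[float] = None
    for pkt in packets:
        if window_start is None:
            window_start = pkt.ts
        # Close every detection window that ended before this packet.
        while pkt.ts >= window_start + detect_len:
            window_end = window_start + detect_len
            # Drop packets older than ref_len from the reference window.
            while reference and reference[0].ts < window_end - ref_len:
                reference.popleft()
            yield detection, list(reference)
            detection = []
            window_start = window_end
        detection.append(pkt)
        reference.append(pkt)
```

In this sketch a larger `ref_len` keeps more history available to the flow converter at the cost of memory and processing time, which mirrors the trade-off discussed above. A trailing, partially filled detection window is never emitted.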
IDAC uses Argus [13], a tool designed for auditing network activity, to convert network traffic to flow information. Argus can transform traffic from network interfaces or from files in the Packet Capture (pcap) format into network flow information. In the flow conversion process, the packet data contained within the reference time window is passed to Argus, and Argus retrieves flow information for the packet data within the detection target time window. Subsequently, ten critical features that can be used for intrusion detection are extracted from the flow information. The selected features are listed in Table 1. They are based on the features used in the creation of the BoTIoT [9] dataset, which identifies, through statistical methods, the ten features that enable the most accurate detection.
After the features are extracted, min–max normalization is applied to them. Scaling the feature values to a fixed range aims to enhance the classification performance of the Support Vector Machine (SVM) used for candidate extraction and intrusion detection. Because the IDS with IDAC must process traffic in real time, the maximum and minimum values used for normalization are fixed in advance for each feature, based on the value ranges that the feature ordinarily takes. The values used for normalization in this study are provided in Table 2.
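Normalization with predefined bounds can be sketched as below. The feature names and bounds in `BOUNDS` are purely hypothetical placeholders (the actual values are those of Table 2), and clipping out-of-range values to [0, 1] is an assumption of this sketch, not something stated above.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical per-feature (min, max) bounds; the values actually used
# are the ones defined in Table 2, not these.
BOUNDS: Dict[str, Tuple[float, float]] = {
    "feature_a": (0.0, 100.0),
    "feature_b": (0.0, 5.0),
}


def normalize(flow: Mapping[str, float],
              bounds: Mapping[str, Tuple[float, float]] = BOUNDS
              ) -> Dict[str, float]:
    """Scale each feature to [0, 1] using fixed, predefined bounds."""
    scaled = {}
    for name, (lo, hi) in bounds.items():
        value = (flow[name] - lo) / (hi - lo)
        # Assumption: values outside the predefined range are clipped.
        scaled[name] = min(1.0, max(0.0, value))
    return scaled
```

Because the bounds are constants, each flow can be normalized as soon as it arrives, without scanning a dataset for its minimum and maximum first.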
3.3. Extraction Phase
In the extraction phase, unsupervised anomaly detection is executed on time series flow data converted in the previous phase. Anomaly detection is conducted by applying Outlier Detection within the latest preprocessed time series data
, thereby extracting outliers that are considered anomalies. One-class SVM is used for anomaly detection with the parameters listed in Table 3. The IDS with IDAC regards these outliers as autonomously extracted anomalous information and treats them as attack candidates.
However, attack flows may outnumber normal flows in the targeted time series data. To prevent attack flows from being continuously identified as attack candidates in that situation, flows identified as outliers are recorded for a certain period and excluded from the detection target time series data, which helps suppress false positives. The exclusion applies to flows whose tuples (Protocol, Source Address, Destination Address, Source Port, Destination Port) match exactly.
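The exclusion mechanism can be sketched as a small wrapper around any outlier detector. Everything named here (`Flow`, `CandidateExtractor`, `find_outliers`, `hold_time`) is hypothetical; in IDAC the detector would be the One-class SVM described above, whereas this sketch accepts an arbitrary callable so that it stays self-contained. The sketch simply withholds excluded flows from the detector and returns only newly flagged flows, which is an assumption.

```python
from typing import Callable, Dict, List, NamedTuple, Sequence, Tuple

# (protocol, source address, destination address, source port, destination port)
FlowKey = Tuple[str, str, str, int, int]


class Flow(NamedTuple):
    key: FlowKey
    features: Tuple[float, ...]


class CandidateExtractor:
    """Extract attack candidates while excluding recently flagged flows."""

    def __init__(self,
                 find_outliers: Callable[[Sequence[Flow]], List[bool]],
                 hold_time: float) -> None:
        self.find_outliers = find_outliers   # returns one flag per flow
        self.hold_time = hold_time           # exclusion period in seconds
        self._excluded: Dict[FlowKey, float] = {}  # key -> expiry time

    def extract(self, flows: Sequence[Flow], now: float) -> List[Flow]:
        # Forget exclusions whose holding period has elapsed.
        self._excluded = {k: t for k, t in self._excluded.items() if t > now}
        # Only flows whose 5-tuple is not excluded reach the detector.
        fresh = [f for f in flows if f.key not in self._excluded]
        flags = self.find_outliers(fresh) if fresh else []
        candidates = [f for f, is_outlier in zip(fresh, flags) if is_outlier]
        # Record the new outliers so they are excluded for hold_time seconds.
        for f in candidates:
            self._excluded[f.key] = now + self.hold_time
        return candidates
```

Keying the exclusion on the exact 5-tuple means that a flow from the same host on a different port is still examined by the detector.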
3.4. Build and Execution Phase
The attack candidate data require further transformation before an intrusion detection model can be built by aggregating the attack candidates extracted in the previous phase. The outline of the data transformation process in this phase is shown in Figure 4. Intrusion detection is executed by determining whether new traffic is included in the learned attack candidates. Moreover, it is crucial to develop a model that supports both attack information sharing among networks via FL and online learning. To achieve these objectives, the algorithm must satisfy the following requirements.
- Requirement 1: A one-class classification algorithm that can determine whether the inference data represent an attack based on the learned attack candidates.
- Requirement 2: A parametric model whose form is predetermined and can be explained by its parameters.
- Requirement 3: Support online learning that allows for updates based solely on new data rather than batch learning to facilitate real-time traffic processing.
- Requirement 4: Capable of prioritizing intrusion detection processing on the most recent learning data to adapt to changes in attack characteristics.
In IDAC, a linear Online OC-SVM based on Stochastic Gradient Descent (SGD), which fulfills the aforementioned requirements, is employed to learn from attack candidates. Online OC-SVM, a machine learning technique derived from OC-SVM, is a parametric model and therefore meets Requirements 1 and 2. Its capability for online learning satisfies Requirement 3, and it also meets Requirement 4 by virtue of its online learning nature. Furthermore, being an online learning model, it allows faster learning and inference than traditional OC-SVM.
In Online OC-SVM, and are the primary hyperparameters, and the classification performance of models significantly varies based on the parameters. This approach ensures that the model is not only adaptable and efficient in real-time environments but also capable of continuous improvement and adjustment to emerging threat patterns.
However, since linear models cannot classify linearly inseparable data, it is difficult to build a high-accuracy classifier from attack candidates using Online OC-SVM alone. Therefore, a kernel approximation technique is employed to map linearly inseparable data into a higher-dimensional feature space in which linear classification is executed. Various approximation methods, such as the Nyström approximation and Random Fourier Features (RFF), have been proposed, each with its own advantages. In this paper, IDAC uses RFF because it is compatible with FL and does not require initial parameter sharing.
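The combination of RFF and a linear, SGD-trained one-class SVM can be sketched as follows. This is a hedged illustration, not the model used in this work: the class names, the hyperparameter names `gamma`, `nu`, and `lr`, the choice of the RBF kernel, and the specific subgradient update (taken on the standard one-class SVM primal objective) are all assumptions.

```python
import math
import random
from typing import List, Sequence


class RandomFourierFeatures:
    """Approximate the RBF kernel exp(-gamma * ||x - y||^2) by an inner product."""

    def __init__(self, in_dim: int, out_dim: int, gamma: float, seed: int) -> None:
        rng = random.Random(seed)
        scale = math.sqrt(2.0 * gamma)  # std of the random projection weights
        self.weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(out_dim)]
        self.norm = math.sqrt(2.0 / out_dim)

    def transform(self, x: Sequence[float]) -> List[float]:
        return [self.norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(self.weights, self.offsets)]


class OnlineOCSVM:
    """Linear one-class SVM updated one sample at a time with SGD."""

    def __init__(self, dim: int, nu: float, lr: float) -> None:
        self.w = [0.0] * dim
        self.rho = 0.0
        self.nu = nu
        self.lr = lr

    def score(self, z: Sequence[float]) -> float:
        # >= 0 means z resembles the training data (here: attack candidates).
        return sum(w_i * z_i for w_i, z_i in zip(self.w, z)) - self.rho

    def partial_fit(self, z: Sequence[float]) -> None:
        # Subgradient step on 0.5*||w||^2 - rho + (1/nu)*max(0, rho - <w, z>).
        g = 1.0 / self.nu if self.score(z) < 0.0 else 0.0
        self.w = [w_i - self.lr * (w_i - g * z_i) for w_i, z_i in zip(self.w, z)]
        self.rho += self.lr * (1.0 - g)
```

Because the model is fully described by the vector `w` and the offset `rho`, its parameters can be averaged across networks. Sharing a model in this way presupposes that every network uses the same random projection, which in this sketch amounts to using the same `seed`.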
The threshold for anomaly scores varies depending on the learned attack candidates. Therefore, after the model is generated, it is essential to identify the anomaly score distribution and determine a threshold that marks outliers in this distribution. However, comprehending the distribution in a feature space of three or more dimensions is challenging because of the resources it consumes. Thus, Principal Component Analysis (PCA) is used to reduce the information of each flow to a lower-dimensional space. IDAC reduces the feature space to two dimensions using PCA after extracting the features listed in Table 1. The parameters for PCA are predetermined because IDAC processes traffic in real time. Although the dimensionality is arbitrary, the reduced feature space is set to two dimensions to allow swift scanning of the anomaly score distribution.
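Applying PCA with predetermined parameters amounts to a fixed linear projection, as in the sketch below. The function name `project` and the example `mean` and `components` values are hypothetical; in practice they would be fitted once, offline, and then kept constant.

```python
from typing import Sequence, Tuple


def project(x: Sequence[float],
            mean: Sequence[float],
            components: Sequence[Sequence[float]]) -> Tuple[float, ...]:
    """Project x onto precomputed principal components (one output per row)."""
    centered = [x_i - m_i for x_i, m_i in zip(x, mean)]
    return tuple(sum(c_i * v_i for c_i, v_i in zip(comp, centered))
                 for comp in components)


# Hypothetical parameters for a 3-feature input reduced to 2 dimensions.
MEAN = [0.5, 0.5, 0.5]
COMPONENTS = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]]
point_2d = project([0.75, 0.25, 0.9], MEAN, COMPONENTS)
```

Since no fitting happens at run time, each incoming flow costs only one small matrix-vector product.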
The search for anomaly scores is conducted after the dimensionality reduction and kernel approximation processes, followed by the intrusion detection model building using Online OC-SVM. The search for anomaly scores initially sets a grid of
points within the possible value range on the two-dimensional feature space
. Then, a set of anomaly scores
represented by Equation (1) is created by inputting the score
at
into the built intrusion detection model.
Subsequently, the z-score for each anomaly score is computed according to Equation (2), assuming that
follows a normal distribution.
where
is the mean of
and
is the standard deviation of
. Finally,
is determined according to Equation (3).
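The grid scan and z-score computation can be sketched as follows. The function names, the grid bounds `lo` and `hi`, the use of the population standard deviation, and in particular the cut-off rule in `threshold_from_grid` (mean plus `z_cut` standard deviations) are assumptions made for illustration; they are not a restatement of Equations (1) to (3).

```python
import statistics
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def grid_points(k: int, lo: float = 0.0, hi: float = 1.0) -> List[Point]:
    """Return a k x k grid covering [lo, hi] in both dimensions (k >= 2)."""
    step = (hi - lo) / (k - 1)
    return [(lo + i * step, lo + j * step) for i in range(k) for j in range(k)]


def z_scores(scores: Sequence[float]) -> List[float]:
    """Standardize scores, treating them as normally distributed."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    if sigma == 0.0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(s - mu) / sigma for s in scores]


def threshold_from_grid(score_fn: Callable[[Point], float],
                        k: int, z_cut: float) -> float:
    """Hypothetical rule: the score lying z_cut standard deviations above the mean."""
    scores = [score_fn(p) for p in grid_points(k)]
    return statistics.mean(scores) + z_cut * statistics.pstdev(scores)
```

Here `score_fn` stands for the built intrusion detection model evaluated at a point of the two-dimensional feature space. Restricting the scan to two dimensions keeps the grid at `k * k` evaluations.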
After intrusion detection based on the threshold has been conducted, the local intrusion detection model is refined through online learning with the identified attack candidates. This update process is referred to as local aggregation. After this step, the threshold must be updated to reflect the altered anomaly score distribution of the updated model. To enhance the intrusion detection model with FL, the local model must also be retrained with the gathered attack candidates. This retraining occurs after the last enhancement and before the next one begins, so the attack candidates collected during this interval must be preserved for the next model update.
3.5. Improvement Phase
In the IDS with IDAC, the central server receives the model parameters of the local intrusion detection models from each local node, aggregates them, and redistributes the aggregated parameters to each local node for model update; this takes place each time a certain amount of local model updating has occurred. This update process is referred to as global aggregation. Upon receiving the aggregated parameters, clients overwrite their model parameters with them to synchronize with the global model and then resume online learning. The aggregation is based on FedAvgM [14], which incorporates a momentum strategy into the FedAvg algorithm.
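Server-side aggregation with momentum can be sketched as follows. The class name `FedAvgMServer`, the weighting of clients by `client_sizes`, and the default `server_lr` of 1.0 are assumptions of this sketch; the update itself follows the usual FedAvgM form, in which the averaged client change is accumulated in a velocity term.

```python
from typing import List, Sequence


class FedAvgMServer:
    """Aggregate client parameters with FedAvg plus server-side momentum."""

    def __init__(self, init_params: Sequence[float],
                 momentum: float, server_lr: float = 1.0) -> None:
        self.params = list(init_params)
        self.momentum = momentum
        self.server_lr = server_lr
        self.velocity = [0.0] * len(self.params)

    def aggregate(self, client_params: Sequence[Sequence[float]],
                  client_sizes: Sequence[int]) -> List[float]:
        total = float(sum(client_sizes))
        # Weighted average of the client models (plain FedAvg).
        avg = [sum(n * p[i] for p, n in zip(client_params, client_sizes)) / total
               for i in range(len(self.params))]
        # Pseudo-gradient: difference between the global model and the average.
        delta = [g - a for g, a in zip(self.params, avg)]
        # Momentum accumulates past pseudo-gradients.
        self.velocity = [self.momentum * v + d
                         for v, d in zip(self.velocity, delta)]
        self.params = [g - self.server_lr * v
                       for g, v in zip(self.params, self.velocity)]
        return list(self.params)
```

With `momentum=0.0` and `server_lr=1.0` the result reduces to the plain weighted average of the client models, that is, FedAvg.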
The global aggregation is executed synchronously: each network participating in the intrusion detection model improvement temporarily pauses local aggregation once it has learned a certain number of attack candidates, and resumes the intrusion detection process after the global aggregation using FL is completed. Initially, the local intrusion detection models are initialized on the central server and on each network, and the aforementioned detection and learning processes are then repeated on each network. Once learning with a certain number of attack candidates has been conducted, a request for aggregation is sent to the central server. Upon receiving the request, the central server collects the current parameters of the local models from each network and aggregates them, and each network receives the aggregated parameters and applies them to its local model; this process is carried out for n rounds.