
Open Set Dandelion Network for IoT Intrusion Detection

Published: 22 February 2024
    Abstract

    As Internet of Things devices become widely used in the real world, it is crucial to protect them from malicious intrusions. However, the data scarcity of IoT limits the applicability of traditional intrusion detection methods, which are highly data-dependent. To address this, in this article, we propose the Open-Set Dandelion Network (OSDN) based on unsupervised heterogeneous domain adaptation in an open-set manner. The OSDN model performs intrusion knowledge transfer from the knowledge-rich source network intrusion domain to facilitate more accurate intrusion detection for the data-scarce target IoT intrusion domain. Under the open-set setting, it can also detect newly-emerged target domain intrusions that are not observed in the source domain. To achieve this, the OSDN model forms the source domain into a dandelion-like feature space in which each intrusion category is compactly grouped and different intrusion categories are separated, i.e., simultaneously emphasising inter-category separability and intra-category compactness. The dandelion-based target membership mechanism then forms the target dandelion. Then, the dandelion angular separation mechanism achieves better inter-category separability, and the dandelion embedding alignment mechanism further aligns both dandelions in a finer manner. To promote intra-category compactness, the discriminating sampled dandelion mechanism is used. Assisted by the intrusion classifier trained using both known and generated unknown intrusion knowledge, a semantic dandelion correction mechanism emphasises easily-confused categories and guides better inter-category separability. Holistically, these mechanisms form the OSDN model that effectively performs intrusion knowledge transfer to benefit IoT intrusion detection. Comprehensive experiments on several intrusion datasets verify the effectiveness of the OSDN model, which outperforms three state-of-the-art baseline methods by 16.9%. The contribution of each OSDN constituting component, as well as the stability and the efficiency of the OSDN model, are also verified.

    1 Introduction

    Internet of Things (IoT) devices have become prevalent in many real-world applications [6, 22, 38, 40]. However, they tend to be computationally and energy-constrained, which hinders the deployment of effective intrusion detection mechanisms. Together with the lack of maintenance, these limitations compromise the security of IoT devices and make them vulnerable to attacks [20, 42]. To protect IoT devices, an effective intrusion detection mechanism becomes indispensable [39].
    Intrusion detection for IoT has drawn wide attention from the academic community. For instance, signature-based intrusion detectors were proposed [8, 25, 26], which detect malicious behaviours by pattern matching against sophisticated rule repositories. With the rapid growth of machine learning techniques, machine learning and deep learning-based intrusion detectors were also proposed [28, 29, 44] and achieved satisfactory performance. However, these traditional intrusion detection methods require either a sophisticated, thorough and up-to-date rule repository or a fully annotated training dataset. Such prerequisites either demand comprehensive expert knowledge to build and update, or a tremendous amount of effort to annotate. Besides, the limited storage and communication capability of IoT devices and concerns over user privacy further hinder the availability of an IoT intrusion rule repository or training dataset. Under such data scarcity [39], these traditional intrusion detectors suffer from compromised performance.
    To work around the data scarcity, domain adaptation-based (DA) intrusion detection methods [17] can be leveraged, transferring the intrusion knowledge from a knowledge-rich source network intrusion (NI) domain to assist intrusion detection for the target IoT intrusion (II) domain. Popular solutions [39, 42] performed such intrusion knowledge transfer while masking the heterogeneities between domains, and achieved satisfactory outcomes.
    Despite the effectiveness of these DA-based methods, they operate under the assumption that both source and target domains share exactly the same types of intrusions. However, this assumption is often unrealistic in the real world, as the IoT intrusion domain constantly confronts newly-emerged intrusion strategies [23]. Therefore, this assumption hinders the applicability of traditional DA-based intrusion detectors. As a more general solution, Open-Set Domain Adaptation (OSDA) [10, 30] relaxes this assumption and allows the target domain to contain newly-emerged intrusions unobserved in the source NI domain. Several OSDA methods were proposed [14, 18, 21], which tackled this challenging setting via hyperspherical feature space learning, semantic recovery learning, progressive graph learning, and so on. However, these research efforts all suffer from drawbacks, such as failing to utilise graph embedding alignment in the learned hyperspherical feature space or lacking exploitation of the correction effect of semantics, leaving room for a more effective OSDA-based intrusion detector.
    In this article, inspired by the structure of the dandelion, we propose the Open-Set Dandelion Network (OSDN) based on unsupervised heterogeneous DA in an open-set manner. The OSDN model tackles the IoT data scarcity by transferring intrusion knowledge from a knowledge-rich source NI domain to assist the knowledge-scarce target II domain. It relaxes the closed-set assumption and can effectively detect both known and unknown intrusions faced by IoT devices, making it applicable in real-world settings. To achieve this, the OSDN model forms the source domain into a dandelion-like feature space with the goal of grouping each intrusion category compactly while separating different intrusion categories, i.e., achieving inter-category separability and intra-category compactness, the foundation for an accurate intrusion detector to work on. The dandelion-based target membership mechanism then constructs the target dandelion. Then, the dandelion angular separation mechanism is leveraged to enhance inter-category separability, together with the dandelion embedding alignment mechanism, which transfers intrusion knowledge from a graph embedding perspective. The discriminating sampled dandelion mechanism is also used to promote intra-category compactness. Besides, trained using both known and generated unknown intrusion knowledge, the intrusion classifier produces probabilistic semantics, which form a semantic dandelion that in turn emphasises easily-confused categories and provides correction for better inter-category separability. Holistically, these mechanisms form the OSDN model that can effectively transfer intrusion knowledge for more accurate IoT intrusion detection.
    In summary, the contributions of this article are three-fold as follows:
    We realise the benefits of the Open-Set DA technique to perform intrusion knowledge transfer and facilitate more accurate intrusion detection for data-scarce IoT scenarios. The OSDA-based intrusion detector also relaxes the closed-set assumption, making it a more robust intrusion detector in the real world.
    We formulate the intrusion feature space into a dandelion-like feature space. The proposed OSDN model leverages mechanisms such as the dandelion angular separation mechanism (DASM), the dandelion embedding alignment mechanism (DEAM), the discriminating sampled dandelion mechanism (DSDM) and the semantic dandelion correction mechanism (SDCM) to promote inter-category separability and intra-category compactness in the dandelion feature space, which is the foundation for an accurate intrusion detector to work on.
    We conduct comprehensive experiments on five widely recognised intrusion detection datasets and verify the effectiveness of the OSDN model against three state-of-the-art baselines. A \(16.9\%\) performance boost is achieved. Besides, the contribution of each OSDN constituting component, as well as the stability and the efficiency of the OSDN model, are also verified.
    The rest of the article is organised as follows: Section 2 categorises related works and summarises the research opportunities. Section 3 presents model preliminaries and the OSDN model architecture, followed by Section 4, in which the detailed mechanisms constituting the OSDN model are presented. Section 5 presents the experimental setup and detailed experimental analyses. The last section concludes the article. We provide an acronym table (Table 5) and notation tables (Tables 6 and 7) in Appendix A for better readability.
    Table 1. The Intrusion Detection Accuracy Results
    Methods | K→G (O=0.6) | N→W (O=0.4) | C→W (O=0.5) | K→B (O=0.5) | K→G (O=0.2)
            | ACC   IND   | ACC   IND   | ACC   IND   | ACC   IND   | ACC   IND
    AMS     | 42.90 54.26 | 36.02 58.30 | 42.70 58.98 | 42.12 62.48 | 44.02 57.14
    SROSDA  | 44.02 57.15 | 34.32 57.28 | 37.25 57.36 | 43.13 62.02 | 43.56 55.78
    PGL     | 40.42 57.17 | 43.85 58.38 | 45.63 62.18 | 42.80 59.52 | 39.95 57.14
    OSDN    | 76.18 89.94 | 61.79 64.51 | 59.20 63.10 | 53.78 67.42 | 75.83 90.11
    Methods | K→W (O=0.71) | C→B (O=0.5) | C→W (O=0.66) | C→W (O=0.33) | Average
            | ACC   IND    | ACC   IND   | ACC   IND    | ACC   IND    | ACC   IND
    AMS     | 38.36 59.44  | 49.26 65.92 | 41.44 57.98  | 42.98 59.22  | 42.20 59.30
    SROSDA  | 36.24 57.88  | 49.56 66.33 | 38.44 56.36  | 38.78 58.46  | 40.59 58.74
    PGL     | 34.52 46.89  | 51.53 67.68 | 39.92 60.10  | 45.05 61.41  | 42.63 58.94
    OSDN    | 75.05 78.31  | 56.22 69.56 | 57.32 62.31  | 56.39 64.94  | 63.53 72.24
    Table 2. Ablation Study Results for Five Ablation Study Groups
    Group | Experiment Setting          | N→W (O=0.40) | C→G (O=0.71) | K→W (O=0.71) | Average
          |                             | ACC   IND    | ACC   IND    | ACC   IND    | ACC   IND
    A     | \(\alpha _{U}=0\)           | 54.87 60.45  | 72.36 82.97  | 69.00 73.06  | 65.41 72.16
    B1    | \(\beta _{S}=0\)            | 52.62 59.82  | 68.51 80.45  | 60.24 67.08  | 60.46 69.12
    B2    | \(\beta _{T}=0\)            | 55.29 60.49  | 66.96 80.25  | 63.31 68.37  | 61.85 69.70
    B3    | \(\beta _{S}=\beta _{T}=0\) | 54.47 61.44  | 72.20 83.72  | 67.10 72.23  | 64.59 72.46
    C     | \(\delta =0\)               | 55.89 62.22  | 67.34 80.05  | 57.27 65.24  | 60.17 69.17
    D     | \(\theta =0\)               | 45.15 58.35  | 58.10 77.02  | 48.22 58.86  | 50.49 64.74
    E1    | \(\gamma =0\)               | 56.55 61.61  | 67.21 78.43  | 59.89 66.71  | 61.22 68.92
    E2    | Domain Adv                  | 54.13 60.03  | 72.55 83.69  | 60.03 66.70  | 62.24 70.14
    F     | No DA                       | 44.22 57.26  | 42.87 60.60  | 43.16 57.13  | 43.42 58.33
    Full  | \(\alpha _{U}=0.1\), \(\beta _{S}=\beta _{T}=0.75\), \(\delta =0.001\), \(\gamma =1.0\), \(\theta =1.0\) | 61.79 64.51 | 75.13 84.34 | 75.05 78.31 | 70.66 75.72
    Table 3. Total Training Time, Measured in Minutes
    Table 4. Inference Time Per Network Traffic Instance, Measured in Milliseconds ( \(10^{-3}\) Second)

    2 Related Work

    In this section, we introduce the related works in a categorised manner and outline our research opportunities. In Figure 1, we summarise the traditional IoT intrusion detection methods, their data dependency and their drawbacks, which reflect the merits of domain adaptation-based intrusion detection methods for data-scarce IoT scenarios. The OSDN method belongs to the open-set domain adaptation-based category.
    Fig. 1. Summarisation of IoT intrusion detection methods, the data dependency and drawback of traditional intrusion detection methods, and the merits of domain adaptation-based intrusion detection methods. The OSDN method belongs to the open-set domain adaptation-based intrusion detector.

    2.1 Traditional Intrusion Detection

    Intrusion detection has drawn wide attention from the research community. Traditional intrusion detection methods include signature-based intrusion detectors [7, 25, 26], which require a sophisticated rule repository for decision-making; they can only detect malicious intrusions whose patterns match certain rules in the repository. Anomaly-based intrusion detectors [4, 5, 33, 37] are also popular. These methods need to go through a comprehensive training process based on a well-annotated training dataset to learn the patterns of normal traffic behaviour, and then flag any traffic that deviates from the normal patterns. With the rapid advance of machine learning and deep learning techniques, ML and DL-based intrusion detectors are also widely used, including the multi-kernel SVM [29], the isolation forest [9] and deep learning models such as autoencoders [24, 28] and the capsule network [44].
    However, all these traditional intrusion detection methods may be hindered by the IoT data scarcity due to their strong dependency on a well-built intrusion rule repository or a finely-annotated training dataset. Building an intrusion rule repository requires sophisticated expert knowledge, and can hardly be thorough and up-to-date. Besides, finely annotating a training dataset is both labour- and time-intensive. Without enough annotated data, the learning process of anomaly-based, ML-based and DL-based methods is significantly hindered, resulting in compromised efficacy. This naturally leads to domain adaptation-based solutions, which can work under data-scarce IoT scenarios by performing intrusion knowledge transfer, a merit that traditional intrusion detection methods lack.

    2.2 Domain Adaptation for Intrusion Detection

    Domain adaptation can transfer intrusion knowledge from a knowledge-rich source domain to facilitate more accurate intrusion detection for the target domain. Hence, it can comfortably work under data-scarce IoT scenarios. Wu et al. [42] proposed a Joint Semantic Transfer Network, aiming to address the IoT intrusion detection problem under the semi-supervised heterogeneous DA setting. Later, the Geometric Graph Alignment method was proposed by Wu et al. [39] to tackle intrusion detection for completely unsupervised target IoT domains. Other DA methods [19, 41, 43] performed intrusion knowledge transfer via Wasserstein distance minimisation, adversarial learning, Pareto optimal solution searching, adaptive recommendation matching, and so on.
    However, traditional domain adaptation methods work under the closed-set assumption that the intrusion categories in both source and target domains are exactly the same. Hence, these methods cannot tackle the case in which new IoT intrusions emerge over time, limiting their applicability in the real world.

    2.3 Open-Set Domain Adaptation for Intrusion Detection

    Open-Set DA methods relax the closed-set assumption of traditional DA methods and allow the target IoT domain to possess new intrusions unobserved in the source domain. Jing et al. [14] presented an open-set DA method with semantic recovery to better exploit the semantic information of unknown target intrusions. However, it did not explore the hyperspherical structure formulation, which offers excellent inter-category distinguishability. Li et al. [18] explored the open-set DA problem via the angular margin separation network. Despite its effectiveness, it lacked the finer alignment achievable by graph embedding and ignored the correction effect of semantics. Besides, Luo et al. [21] investigated a graph embedding-based open-set DA solution. However, the proposed Progressive Graph Learning (PGL) method failed to investigate the usefulness of an angular-based hyperspherical space with excellent separability and compactness.

    2.4 Research Opportunity

    The OSDN model transfers intrusion knowledge via a dandelion-based feature space that emphasises both inter-category separability and intra-category compactness, which previous open-set DA methods such as [14, 21] lack. Besides, the graph embedding alignment achieves both finer feature space alignment and a tighter intra-category structure via adversarial learning; such mechanisms were not attempted in [14, 18]. Moreover, the OSDN model leverages the semantic dandelion correction mechanism: the utilisation of the semantic dandelion fills the void in [18], and semantic correction is also missing from the aforementioned methods. By combining these mechanisms into a holistic framework, the OSDN model can perform finer intrusion knowledge transfer and benefit IoT intrusion detection.

    3 Model Preliminary and Architecture

    In this section, we introduce the preliminaries and the architecture of the proposed OSDN model.

    3.1 Model Preliminary

    The OSDN model works under the unsupervised open-set DA setting in which heterogeneities exist between domains. Following common notations in [42], we denote the source NI domain \(\mathcal {D}_{S}\) as follows:
    \(\begin{equation} \begin{split} & \mathcal {D}_{S} = \lbrace \mathcal {X}_{S}, \mathcal {Y}_{S}\rbrace = \lbrace (x_{S_i}, y_{S_i})\rbrace , i \in [1, n_{S}], \\ & x_{S_i} \in \mathbb {R}^{d_{S}}, y_{S_i} \in [1, K]\,, \end{split} \end{equation}\)
    (1)
    where \(\mathcal {X}_{S}\) contains \(n_{S}\) source NI domain traffic features, each represented as a \(d_{S}\) -dimensional feature vector. \(\mathcal {Y}_{S}\) contains the corresponding intrusion category labels among a total of K categories: one normal category, while the others are intrusion categories. Similarly, the target II domain \(\mathcal {D}_{T}\) is defined as follows:
    \(\begin{equation} \begin{split} & \mathcal {D}_{T} = \lbrace \mathcal {X}_{T}, \mathcal {Y}_{T}\rbrace = \lbrace (x_{T_i}, y_{T_i})\rbrace , i \in [1, n_{T}], \\ & x_{T_i} \in \mathbb {R}^{d_{T}}, y_{T_i} \in [1, K^{\prime }], K^{\prime } \gt K\,. \end{split} \end{equation}\)
    (2)
    Under the open-set DA setting, the intrusion categories of the source NI domain are a subset of the intrusion categories of the target II domain, i.e., \(\mathcal {Y}_{S} \subset \mathcal {Y}_{T}\) , \(K^{\prime } \gt K\) . Both domains share K common intrusion categories, and the target II domain contains \(K^{\prime } - K\) new intrusion categories unobserved in the source domain. Under the unsupervised setting, the ground truth labels of the target II domain remain agnostic during the training process. As a heterogeneous DA problem, heterogeneities are present between domains, e.g., \(d_{S} \ne d_{T}\) .

    3.2 The OSDN Architecture

    The architecture of the OSDN model is presented in Figure 2. To perform intrusion knowledge transfer, features in each domain are normalised to form a unit hyperspherical space and then projected into a \(d_{C}\) -dimensional common feature subspace (the grey box) by the corresponding feature projector (the trapezoids). The feature projector E is defined as follows:
    \(\begin{equation} \begin{split} & f(x_i) = {\left\lbrace \begin{array}{ll} E_{S}(x_i) & \text{if $x_i \in \mathcal {X}_S$} \\ E_{T}(x_i) & \text{if $x_i \in \mathcal {X}_T$} \end{array}\right.} \\ & f(x_i) \in \mathbb {R}^{d_{C}}\,. \end{split} \end{equation}\)
    (3)
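    As a concrete illustration of Equation (3), the following is a minimal PyTorch sketch of a per-domain feature projector, assuming the single-layer, LeakyReLU-activated design described in Section 5.2; the class name, tensor shapes and dimension values are illustrative rather than taken from the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureProjector(nn.Module):
    """Single-layer projector (E_S or E_T) mapping one domain into the d_C-dimensional common subspace."""

    def __init__(self, d_in: int, d_common: int):
        super().__init__()
        self.linear = nn.Linear(d_in, d_common)
        self.act = nn.LeakyReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalise the raw features onto the unit hypersphere, then project (Equation (3)).
        x = F.normalize(x, p=2, dim=1)
        return self.act(self.linear(x))

# Hypothetical dimensions: d_S = 31, d_T = 10, d_C = 64.
E_S, E_T = FeatureProjector(31, 64), FeatureProjector(10, 64)
f_S = E_S(torch.randn(8, 31))  # source features f(x_i) in the common subspace
f_T = E_T(torch.randn(8, 10))  # target features f(x_i) in the common subspace
```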
    As illustrated in Figure 3, the common feature subspace aims to group each shared intrusion category in a compact manner (each pappus of the dandelion, i.e., intra-category compactness), while achieving excellent separability between intrusion categories, i.e., inter-category separability. Since the number of unknown new intrusions in the target II domain is agnostic, instead of deliberately grouping them in a brute-force manner, the dandelion-analogous common feature subspace allows them to spread in any gap between pappuses to promote distinguishability between shared and unknown intrusion categories. As visualised in Figure 3, by making the common feature subspace analogous to the structure of the dandelion, i.e., achieving excellent intra-category compactness and inter-category separability, the shared classifier C can make accurate intrusion detection decisions.
    Fig. 2. The architecture of the OSDN model and the interrelationships between the OSDN’s constituting components.
    Fig. 3. The analogy between the structure of a dandelion and the dandelion-like common feature subspace. Each pappus corresponds to a shared intrusion category; it needs to be compact and well-separated from other intrusion categories (pappuses), simultaneously achieving both intra-category compactness and inter-category separability. Target unknown intrusion categories reside in the gaps between pappuses to achieve distinguishability. The analogy between the dandelion and the ideal common feature subspace leads to the naming of the OSDN.
    The source dandelion can be formed directly since the source domain is completely supervised. Then, the dandelion-based target membership mechanism (the red box in Figure 2) is used to form the target dandelion based on the spatial relationship between target instances and the source dandelion. Once both source and target dandelions are formed, the dandelion angular separation mechanism (the orange boxes in Figure 2) is utilised to enhance the inter-category separability in each dandelion. Besides, a dandelion embedder is leveraged to generate graph embeddings for dandelions, and it is used in two ways (the brown boxes in Figure 2): the graph embeddings of source and target dandelions are aligned to promote better alignment between domains; moreover, sampled child dandelions are produced and their graph embeddings need to confuse a discriminator to achieve finer intra-category compactness. To better train the shared intrusion classifier C, the source domain data provides supervision information. To equip the intrusion classifier with knowledge of target unknown intrusions, unknown instances residing in the pappus gaps of the source dandelion are generated for unknown intrusion training. Lastly, the probabilistic semantics yielded by the shared intrusion classifier also work as a correction that deliberately emphasises easily-confused categories and reminds the dandelion angular separation mechanism to separate them, forming a correction loop (the purple box in Figure 2).
    Finally, by forming these mechanisms into a holistic model, fine-grained intrusion knowledge transfer can be achieved and the shared and unknown intrusion categories will be well-separated so that the shared classifier C can enjoy excellent intrusion detection efficacy for the target II domain.

    4 The OSDN Algorithm

    In this section, we present the detailed mechanism of each OSDN constituting component and the overall optimisation objective of the model.

    4.1 Dandelion-based Target Membership Mechanism (DTMM)

    The source dandelion can be easily formed based on its supervision information. Then, the source dandelion will guide the membership decision for unsupervised target instances to form the target dandelion. For each source intrusion category i, the maximum intra-category deviation \(d_{max}^{(i)}\) is calculated as follows:
    \(\begin{equation} \begin{split} d_{max}^{(i)} &= \text{max}\left(1 - \text{COS}\left(x_{S_j}^{(i)}, \mu _{S}^{(i)}\right)\!\right)\!, j \in \left[1, n_{S}^{(i)}\right]\!, \\ \mu _{S}^{(i)} &= \frac{1}{n_{S}^{(i)}} \sum _{j=1}^{n_{S}^{(i)}} x_{S_j}^{(i)}\,, \end{split} \end{equation}\)
    (4)
    where \(COS()\) stands for Cosine Similarity, \(n_{S}^{(i)}\) denotes the number of instances in the ith intrusion category in the source domain, \(\mu _{S}^{(i)}\) denotes the mean of the source intrusion category i and \(x_{S_j}^{(i)}\) denotes the jth instance of the ith source intrusion category. Then, each target instance will be assigned to its nearest source category i if it resides within the maximum deviation range of source category i, i.e.,
    \(\begin{equation} \begin{split} y_{T_j}^{D} &= {\left\lbrace \begin{array}{ll} \underset{i}{\mathrm{argmin}} (1 - COS(x_{T_j}, \mu _{S}^{(i)})) & \text{if $1 - COS(x_{T_j}, \mu _{S}^{(i)}) \le d_{max}^{(i)}$} \\ K+1 & \text{otherwise}\,, \end{array}\right.} \end{split} \end{equation}\)
    (5)
    where \(y_{T_j}^{D}\) represents the dandelion-based membership for the jth target instance \(x_{T_j}\) . Otherwise, that target instance will be assigned to the unknown category \(K+1\) to avoid deteriorating the compactness of its closest intrusion category. Unlike methods such as [14, 18] that perform K-means clustering of unknown intrusions, the OSDN assigns all unknown intrusions to a single category \(K+1\) ; hence it does not rely on prior knowledge of the number of unknown intrusion categories and is more practical in the real world. Besides, the OSDN model does not deliberately enforce all unknown target instances to reside at a single place; it allows unknown intrusions to reside in any pappus gap in the target dandelion. Deliberately aligning unknown target instances coming from different intrusion categories may cause negative transfer.
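    For clarity, Equations (4) and (5) can be sketched in PyTorch as follows. This is an illustrative implementation only: it assumes the target features and source category means already live in the common feature subspace, uses 0-indexed categories, and the function and variable names are hypothetical.

```python
import torch
import torch.nn.functional as F

def dandelion_membership(f_T: torch.Tensor, mu_S: torch.Tensor, d_max: torch.Tensor, K: int) -> torch.Tensor:
    """Assign each target instance to its nearest source pappus, or to the unknown category (Equations (4)-(5)).

    f_T:   (n_T, d_C) target features in the common subspace
    mu_S:  (K, d_C) source category means
    d_max: (K,) per-category maximum cosine deviation
    Returns memberships in [0, K-1] for shared categories and K for the unknown category
    (i.e., category K+1 in the paper's 1-indexed notation).
    """
    # Cosine distance between every target instance and every source category mean.
    dist = 1.0 - F.cosine_similarity(f_T.unsqueeze(1), mu_S.unsqueeze(0), dim=2)  # (n_T, K)
    nearest_dist, nearest_cat = dist.min(dim=1)
    # Fall back to the unknown category when outside the maximum deviation range.
    unknown = torch.full_like(nearest_cat, K)
    return torch.where(nearest_dist <= d_max[nearest_cat], nearest_cat, unknown)
```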

    4.2 Dandelion Angular Separation Mechanism (DASM)

    To increase the separability between known intrusion categories and meanwhile enhance the discriminability between known and unknown intrusion categories, i.e., enlarge the gap between pappuses, the OSDN model will achieve these goals from an angular perspective. First, the centroid of each intrusion category will be calculated. Then, the source category pair-wise Cosine similarity matrix \(CS_{S}\) will be calculated as follows:
    \(\begin{equation} \begin{split} CS_{S} &= \begin{bmatrix} CS_{S}^{11} & \boldsymbol {CS_{S}^{12}} & \boldsymbol {\cdots } & \boldsymbol {CS_{S}^{1K}}\\ CS_{S}^{21} & CS_{S}^{22} & \boldsymbol {\cdots } & \boldsymbol {CS_{S}^{2K}}\\ \vdots & \vdots & \ddots & \boldsymbol {\vdots }\\ CS_{S}^{K1} & CS_{S}^{K2} & \cdots & CS_{S}^{KK} \end{bmatrix}\!,\\ CS_{S}^{ij} &= COS\left(\mu _{S}^{(i)}, \mu _{S}^{(j)}\right)\!, \end{split} \end{equation}\)
    (6)
    where \(CS_{S}^{ij}\) represents the Cosine similarity between the ith and jth intrusion categories of the source NI domain. Minimising the sum of the upper triangle of the matrix \(CS_{S}\) enlarges the inter-category angular divergence. The source dandelion separation loss \(\mathcal {L}_{SS}\) is defined as follows:
    \(\begin{equation} \mathcal {L}_{SS} = \frac{2}{K(K-1)} \sum _{i=1}^{K-1} \sum _{j=i+1}^{K} CS_{S}^{ij}\,. \end{equation}\)
    (7)
    The target dandelion Cosine similarity matrix \(CS_{T}\) and the corresponding target dandelion separation loss \(\mathcal {L}_{ST}\) are defined similarly. Minimising both \(\mathcal {L}_{SS}\) and \(\mathcal {L}_{ST}\) promotes better dandelion inter-category separability from an angular perspective.
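    The separation loss of Equations (6) and (7) reduces to the mean of the strict upper triangle of a centroid cosine-similarity matrix. A minimal PyTorch sketch (with illustrative names) is given below; it applies unchanged to both the source and target dandelions.

```python
import torch
import torch.nn.functional as F

def dandelion_separation_loss(mu: torch.Tensor) -> torch.Tensor:
    """Angular separation loss over the category centroids of one dandelion (Equations (6)-(7)).

    mu: (K, d_C) centroids of the K known categories; the loss is the mean pairwise
    cosine similarity over the strict upper triangle, i.e., 2/(K(K-1)) * sum_{i<j} CS_ij.
    """
    mu = F.normalize(mu, p=2, dim=1)
    cs = mu @ mu.t()                          # (K, K) cosine similarity matrix
    K = mu.size(0)
    iu = torch.triu_indices(K, K, offset=1)   # strict upper-triangle indices
    return cs[iu[0], iu[1]].mean()

# L_SS and L_ST are obtained by applying the same function to the source and target centroids.
```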

    4.3 Dandelion Embedding Alignment Mechanism (DEAM)

    To further promote a finer alignment between the source and target dandelions, a dandelion graph embedder is used to produce the graph embeddings for both dandelions. To achieve this, each dandelion is formulated as a graph, defined as follows:
    \(\begin{equation} \begin{split} G_{S} &= \lt V_{S}, E_{S}\gt \\ V_{S} &= \left\lbrace V_{S}^{(i)}\right\rbrace \!, i \in [1, K], V_{S}^{(i)} = \mu _{S}^{(i)}\\ E_{S} &= \left\lbrace E_{S}^{i,j}\right\rbrace \!, E_{S}^{i,j} = \left|\left|\mu _{S}^{(i)} - \mu _{S}^{(j)}\right|\right|_2^2, i \in \lbrace \mathfrak {p}\rbrace \cup [1, K], j \in \lbrace \mathfrak {p}\rbrace \cup [1, K], i \ne j\,, \end{split} \end{equation}\)
    (8)
    where \(G_{S}\) denotes the source dandelion graph, \(V_{S}\) and \(E_{S}\) stand for the vertices and edges in \(G_{S}\) , respectively, and \(G_{T}\) is defined similarly. Each vertex \(V_{S}^{(i)}\) is the centroid of the corresponding intrusion category. The graph is fully connected and each vertex is also connected with the origin, denoted as \(\mathfrak {p}\) .
    In the OSDN model, we apply the Feather network [32] as the graph embedder. As a graph embedding algorithm, it enjoys several merits: first, the Feather network can work in an unsupervised manner, which suits the data-scarce IoT scenario; second, the Feather network enjoys a linear time complexity as proved in [32], and this low complexity enhances the efficiency of the intrusion detection model in real-world applications; finally, the Feather network is comprehensively verified [32] to have superior graph embedding performance.
    Using the graph embedder, each dandelion graph will be mapped into a \(d_{G}\) -dimensional graph embedding space, in which the more geometrically similar two dandelion graphs are, the more similar their graph embeddings will be. Then, the dandelion embedding alignment loss \(\mathcal {L}_{EA}\) is defined as follows:
    \(\begin{equation} \mathcal {L}_{EA} = ||\phi _{S} - \phi _{T}||_{2}^{2}, \phi _{S}, \phi _{T} \in \mathbb {R}^{d_{G}}\,, \end{equation}\)
    (9)
    where \(\phi _{S}\) and \(\phi _{T}\) denote the graph embeddings of the source and target domain dandelions, respectively. By minimising the dandelion embedding alignment loss, both dandelions will be further aligned, which promotes better intrusion knowledge transfer, as verified by the experimental evidence in Section 5.6.
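    A sketch of the dandelion graph construction in Equation (8) and the alignment loss in Equation (9) is given below. It treats the graph embedder as a black box (the OSDN model uses the Feather network [32]); the NetworkX-based construction and all names are illustrative assumptions.

```python
import networkx as nx
import torch

def build_dandelion_graph(mu: torch.Tensor) -> nx.Graph:
    """Fully connected dandelion graph over the K category centroids plus the origin (Equation (8)).

    mu: (K, d_C) category centroids; node 0 is the origin, edge weights are squared Euclidean distances.
    """
    nodes = torch.cat([torch.zeros(1, mu.size(1)), mu], dim=0)
    g = nx.complete_graph(mu.size(0) + 1)
    for i, j in g.edges():
        g[i][j]["weight"] = float(((nodes[i] - nodes[j]) ** 2).sum())
    return g

def embedding_alignment_loss(phi_S: torch.Tensor, phi_T: torch.Tensor) -> torch.Tensor:
    """Equation (9): squared L2 distance between the two d_G-dimensional dandelion graph embeddings."""
    return ((phi_S - phi_T) ** 2).sum()
```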

    4.4 Discriminating Sampled Dandelion Mechanism (DSDM)

    To further boost the intra-category compactness and hence promote better known-intrusion separability and unknown-intrusion discriminability, a discriminating sampled dandelion mechanism is proposed. As illustrated in Figure 4, one instance per intrusion category is randomly sampled to form a new child dandelion, such as the orange and the green dandelions in Figure 4. The more compact each intrusion category is, the more similar the embeddings of child dandelions will be. Hence, the OSDN achieves this goal from a discriminating perspective. First, both source and target domain intrusion features are fused in the common feature subspace to form a fused dandelion; then, N child dandelions are sampled, where the ith pappus in each child dandelion is a randomly selected instance from the ith category of the fused dandelion. Next, a discriminator is confused using the discriminating sampled dandelion loss \(\mathcal {L}_{CP}\) , defined as follows:
    \(\begin{equation} \mathcal {L}_{CP} = \frac{1}{2} (log(D(\phi _{S})) + log(D(\phi _{T}))) + \frac{1}{N} \sum _{j=1}^{N} \left(1 - log\left(D\left(\phi _{\mathcal {DD}_{*}}^{j}\right)\right)\right)\!, \end{equation}\)
    (10)
    in which \(\mathcal {DD}_{S}\) , \(\mathcal {DD}_{T}\) , and \(\mathcal {DD}_{*}^{j}\) denote the source, target and jth sampled dandelion, respectively, \(\phi\) denotes the dandelion graph embedding and \(D()\) denotes the discriminator. By assigning \(\mathcal {DD}_{S}\) and \(\mathcal {DD}_{T}\) label 1 and assigning the sampled child dandelions label 0, letting the network minimise \(\mathcal {L}_{CP}\) confuses the discriminator so that it cannot distinguish whether a given dandelion embedding is generated from a randomly sampled dandelion or not. Meanwhile, the discriminator tries to stay unconfused. Once the minimax game between the network and the discriminator reaches an equilibrium, the graph embeddings of the source, target and sampled child dandelions become indistinguishable, which in turn enhances the intra-category compactness, as illustrated in Figure 4.
    Fig. 4. Illustrating example of the OSDN discriminating sampled dandelion mechanism to enhance intra-category compactness.
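    The following sketch illustrates the child dandelion sampling and a standard binary cross-entropy adversarial objective in the spirit of Equation (10); the exact loss form, the discriminator architecture and all names here are simplifying assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def sample_child_dandelion(feats: torch.Tensor, labels: torch.Tensor, K: int) -> torch.Tensor:
    """Draw one random instance per category from the fused source/target features."""
    picks = []
    for k in range(K):
        idx = (labels == k).nonzero(as_tuple=True)[0]
        picks.append(feats[idx[torch.randint(len(idx), (1,))]])
    return torch.cat(picks, dim=0)  # (K, d_C), one pappus per category

def dsdm_discriminator_loss(D, phi_S, phi_T, phi_children):
    """Binary cross-entropy stand-in for Equation (10): source/target embeddings are 'real' (label 1),
    sampled child dandelion embeddings are 'fake' (label 0). D maps a graph embedding to a logit."""
    real = torch.stack([phi_S, phi_T])                       # (2, d_G)
    real_loss = F.binary_cross_entropy_with_logits(D(real), torch.ones(real.size(0), 1))
    fake_loss = F.binary_cross_entropy_with_logits(D(phi_children), torch.zeros(phi_children.size(0), 1))
    return 0.5 * real_loss + fake_loss
```

    Passing the embeddings through the gradient reversal layer described in Section 4.6 before the discriminator turns this single loss into the minimax game discussed above.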

    4.5 Semantic Dandelion Correction Mechanism (SDCM)

    The source NI domain is completely supervised; however, it lacks the knowledge of unknown intrusions in the target II domain. Therefore, directly using the source NI domain supervision to train the shared intrusion classifier C will significantly hinder its ability to detect unknown intrusions. To tackle this issue, the OSDN model generates \(n_{R}\) instances residing in the gaps between source dandelion pappuses, and treats these generated instances as unknown intrusions to equip the intrusion classifier C with the ability to detect both known and unknown intrusions under the open-set DA setting. The overall supervision loss of known and unknown training \(\mathcal {L}_{SUP}\) is defined as follows:
    \(\begin{equation} \begin{split} \mathcal {L}_{SUP} &= \mathcal {L}_{SUP_{S}} + \mathcal {L}_{SUP_{U}}\\ &= \frac{1}{n_{S}} \sum _{j=1}^{n_{S}} \mathcal {L}_{CE}(C(f(x_{j})), y_{j}) + \frac{1}{n_{R}} \sum _{j=1}^{n_{R}} \mathcal {L}_{CE}(C(f(x_{j})), y_{j})\\ y_{j} &= {\left\lbrace \begin{array}{ll} y_{S_j} & \text{if $x_{j} \in \mathcal {X}_{S}$}\\ K+1 & \text{if $x_{j} \in \mathcal {X}_{R}$} \end{array}\right.}\,, \end{split} \end{equation}\)
    (11)
    where \(\mathcal {L}_{CE}\) denotes the cross entropy loss and \(\mathcal {X}_{R}\) represents generated unknown instances for unknown training.
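    A minimal sketch of the combined supervision loss in Equation (11) follows. It assumes the generated unknown instances \(\mathcal {X}_{R}\) are provided externally (their generation in the pappus gaps is not reproduced here) and maps the unknown category \(K+1\) to class index K under 0-indexed labels; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def supervision_loss(C, f_S: torch.Tensor, y_S: torch.Tensor, f_R: torch.Tensor, K: int) -> torch.Tensor:
    """Equation (11): known-category supervision on source features plus unknown-category supervision
    on the n_R generated instances, all of which are labelled as the extra class K
    (category K+1 in the paper's 1-indexed notation).

    C:   shared intrusion classifier with K+1 output logits
    f_S: (n_S, d_C) projected source features, y_S: (n_S,) labels in [0, K-1]
    f_R: (n_R, d_C) generated unknown instances (their generation is not reproduced here)
    """
    loss_known = F.cross_entropy(C(f_S), y_S)
    loss_unknown = F.cross_entropy(C(f_R), torch.full((f_R.size(0),), K, dtype=torch.long))
    return loss_known + loss_unknown
```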
    Once the intrusion classifier C is well-trained, it can yield probabilistic semantics for each intrusion data instance j, i.e., the inter-category probabilistic correlations, denoted as \(p_{j}\) . Therefore, the semantic information can also form new semantic dandelions \(\mathcal {DD}_{\mathcal {S}*}\) in the semantic space, defined as follows:
    \(\begin{equation} \mathcal {DD}_{\mathcal {S}S}^{(i)} = \frac{1}{n_{S}^{(i)}} \sum _{j=1}^{n_{S}^{(i)}} p_{S_j}^{(i)}, \mathcal {DD}_{\mathcal {S}T}^{(i)} = \frac{1}{|y_{T}^{D}=i|} \sum _{j=1}^{|y_{T}^{D}=i|} p_{T_j}^{(i)}\,, \end{equation}\)
    (12)
    where \(\mathcal {DD}_{\mathcal {S}S}^{(i)}\) denotes the ith pappus of the source semantic dandelion \(\mathcal {DD}_{\mathcal {S}S}\) , \(n_{S}^{(i)}\) represents the number of source ith category instances, and \(y_{T}^{D}\) denotes the membership assigned to target instances by the source dandelion in Section 4.1. Then, the Cosine similarity matrix \(CS_{SM}\) between both semantic dandelions is calculated as follows:
    \(\begin{equation} \begin{split} CS_{SM} &= \begin{bmatrix} \boldsymbol {CS_{SM}^{11}} & \boldsymbol {CS_{SM}^{12}} & \boldsymbol {\cdots } & \boldsymbol {CS_{SM}^{1K}}\\ CS_{SM}^{21} & \boldsymbol {CS_{SM}^{22}} & \boldsymbol {\cdots } & \boldsymbol {CS_{SM}^{2K}}\\ \vdots & \vdots & \boldsymbol {\ddots } & \boldsymbol {\vdots }\\ CS_{SM}^{K1} & CS_{SM}^{K2} & \cdots & \boldsymbol {CS_{SM}^{KK}} \end{bmatrix}\!,\\ CS_{SM}^{ij} &= COS(\mathcal {DD}_{\mathcal {S}S}^{(i)}, \mathcal {DD}_{\mathcal {S}T}^{(j)})\,. \end{split} \end{equation}\)
    (13)
    Ideally, the ith intrusion category from both source NI and target II domain should share similar inter-category probabilistic semantics, while different intrusion categories from both domains should have their inter-category probabilistic semantics diverge from each other. To achieve this, the OSDN model minimises the semantic dandelion correction loss \(\mathcal {L}_{SC}\) as follows:
    \(\begin{equation} \mathcal {L}_{SC} = \frac{2}{K(K+1)} \sum _{i=1}^{K} \sum _{j=i}^{K} CS_{SM}^{ij}\,. \end{equation}\)
    (14)
    By minimising the \(CS_{SM}^{ij}, i \ne j\) , inter-category probabilistic semantics will diverge from each other, leading to better inter-category discriminability. It is worth noting that the \(\mathcal {L}_{SC}\) also minimises the \(CS_{SM}^{ii}\) , i.e., maximising the divergence between cross-domain same-category probabilistic semantics. The rationale is as follows: if minimising the \(CS_{SM}^{ii}\) can easily compromise the semantics of the ith intrusion category, then it indicates that the ith intrusion category can be easily confused with other categories from the probabilistic semantic perspective, as indicated in Figure 5. Therefore, deliberately minimising \(CS_{SM}^{ii}\) can exploit and emphasise easily-confused intrusion category pairs, i.e., pointing out a possible point of correction for the dandelion angular separation mechanism. Consequently, utilising this correction mechanism can further boost the dandelion separation efficacy, as supported by the experimental evidence in Section 5.6, and in turn enhances the intrusion detection accuracy.
    Fig. 5. The OSDN semantic dandelion correction mechanism. It will point out easily confused intrusion category pairs from the probabilistic semantic perspective (the orange part), which will act as a correction to the dandelion angular separation mechanism (the green part).
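    Equations (12) through (14) can be sketched as follows, assuming the classifier's softmax outputs and the dandelion-based target memberships from Section 4.1 are available and that every shared category has at least one assigned instance; the function names are illustrative.

```python
import torch
import torch.nn.functional as F

def semantic_dandelion(probs: torch.Tensor, labels: torch.Tensor, K: int) -> torch.Tensor:
    """Equation (12): per-category mean of the classifier's probabilistic semantics.
    Assumes every shared category has at least one assigned instance."""
    return torch.stack([probs[labels == k].mean(dim=0) for k in range(K)])  # (K, K+1)

def semantic_correction_loss(p_S, y_S, p_T, y_T_D, K: int) -> torch.Tensor:
    """Equations (13)-(14): mean cosine similarity over the upper triangle, diagonal included,
    i.e., 2/(K(K+1)) * sum_{i<=j} CS_SM^{ij}."""
    dd_S = F.normalize(semantic_dandelion(p_S, y_S, K), dim=1)
    dd_T = F.normalize(semantic_dandelion(p_T, y_T_D, K), dim=1)
    cs = dd_S @ dd_T.t()                      # (K, K) matrix CS_SM
    iu = torch.triu_indices(K, K, offset=0)   # indices with i <= j
    return cs[iu[0], iu[1]].mean()
```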

    4.6 Overall Optimisation Objective

    Overall, the optimisation objective of the OSDN model is defined as follows:
    \(\begin{equation} \begin{split} & \min _{E_{S}, E_{T}, C} (\alpha _{S} \mathcal {L}_{SUP_{S}} + \alpha _{U} \mathcal {L}_{SUP_{U}} + \beta _{S} \mathcal {L}_{SS} + \beta _{T} \mathcal {L}_{ST} + \delta \mathcal {L}_{EA} + \theta \mathcal {L}_{SC} + \gamma \mathcal {L}_{CP})\\ & \max _{D} (\gamma \mathcal {L}_{CP})\,, \end{split} \end{equation}\)
    (15)
    where \(\alpha _{S}\) , \(\alpha _{U}\) , \(\beta _{S}\) , \(\beta _{T}\) , \(\delta\) , \(\theta\) and \(\gamma\) are hyperparameters controlling the influence of the corresponding loss components. We utilise the gradient reversal layer [11] for the discriminator, which acts as an identity function during forward propagation and reverses the gradient during backpropagation to achieve an end-to-end optimisation process for the OSDN model. Once the above minimax game reaches an equilibrium, the intrusion knowledge is transferred in a fine-grained manner, and the intrusion detection efficacy can therefore benefit.
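    For reference, the gradient reversal layer [11] used for the discriminator can be implemented in PyTorch roughly as below; this is a generic sketch of the technique, and the scaling coefficient is an illustrative addition rather than part of the OSDN formulation.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated (optionally scaled) gradient backward."""

    @staticmethod
    def forward(ctx, x, lambd: float = 1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x: torch.Tensor, lambd: float = 1.0) -> torch.Tensor:
    # Inserted between the dandelion graph embeddings and the discriminator so that minimising
    # L_CP w.r.t. the discriminator simultaneously maximises it w.r.t. the projectors (Equation (15)).
    return GradReverse.apply(x, lambd)
```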

    5 Experiment

    To verify the effectiveness of the OSDN model, we perform experiments on five comprehensive and representative intrusion detection datasets with three state-of-the-art baseline counterparts. We also verify the performance stability of the OSDN model under varied openness settings and manipulated hyperparameter settings and demonstrate the contribution and necessity of each OSDN constituting component. Finally, we verify the computational efficiency of the OSDN model.

    5.1 Experimental Datasets

    We use five comprehensive intrusion detection datasets. Network intrusion detection datasets include NSL-KDD, UNSW-NB15 and CICIDS2017. IoT intrusion detection datasets include UNSW-BOTIOT and UNSW-TONIOT.
    Network Intrusion Dataset: NSL-KDD This dataset [36] contains benign network traffic and four types of real-world intrusions, such as probing attacks, Denial of Service (DoS) attacks, and so on. It enjoys excellent data quality compared with its previous version [13]. We follow [2] to use a reasonable amount of \(20\%\) of the dataset during experiments. Following [12], we use the top-31 most informative features out of 41 features as the feature representation and denote the dataset as K.
    Network Intrusion Dataset: UNSW-NB15 The dataset [27] was released in 2015 and was constructed on a comprehensive security testing platform commonly used by the industry. It includes normal network traffic with nine categories of modern intrusion patterns, such as DoS attack, reconnaissance attack, and so on, and possesses high data quality. We perform data preprocessing to remove four features out of the original 49 features that have a value of 0 for nearly all records. We denote the dataset as N.
    Network Intrusion Dataset: CICIDS2017 This dataset [34] was released in 2017 and contained up-to-date intrusion trends that include seven intrusion categories, represented in 77 dimensions. We use \(20\%\) of the dataset provided by its creator, and perform preprocessing steps such as categorical-numerical data conversion. We follow [35] to use the top-40 most informative features, and denote the dataset as C.
    IoT Intrusion Dataset: UNSW-BOTIOT This dataset [15] was released in 2017. It is constructed on a realistic testbed involving commonly-used IoT devices such as the weather station, smart fridge, and so on, and utilises the common lightweight IoT communication protocol MQTT. The dataset contains four up-to-date intrusion categories, represented in 46 dimensions. We follow the advice from the dataset creator to use the top-10 most informative features. The dataset is denoted as B.
    IoT Intrusion Dataset: UNSW-TONIOT The dataset [3] was released in 2021 and involved up-to-date IoT protocols and standards. The testbed used is sophisticated, with seven types of real IoT devices such as the GPS tracker, the weather meter, and so on, and captures heterogeneous features. The dataset contains nine types of common IoT intrusions [1], such as the DoS attack, the scanning attack, and so on. We follow [31] to leverage \(10\%\) of the dataset, and select two IoT devices, i.e., the GPS tracker and the weather meter, denoted as G and W, respectively.
    Dataset Comprehensiveness and Intrusion Methods The datasets used during experiments are comprehensive and representative. First, these datasets are widely recognised by the intrusion detection research community with a broad range of usage. Second, these datasets are recently released and contain modern intrusion trends and patterns; some of them were released as recently as 2021. Third, these datasets all involve widely recognised testbeds. The IoT datasets also involve real-world IoT devices deployed in a real-world environment. Finally, the network and IoT datasets have at most eight shared intrusion categories, with a coverage of \(100\%\) , \(55\%\) , \(100\%\) , \(100\%\) and \(98\%\) on NSL-KDD, UNSW-NB15, CICIDS2017, UNSW-BOTIOT and UNSW-TONIOT, respectively. The transferrable intrusion knowledge reflects modern intrusion trends. Hence, the datasets used are sufficient to verify the effectiveness of the OSDN model.

    5.2 Implementation Details

    We implement the OSDN model using the deep learning framework PyTorch. The feature projectors are implemented as single-layer neural networks with LeakyReLU as the activation function. Likewise, both the intrusion classifier C and the discriminator D are also implemented as single-layer neural networks.
    We apply cross validation with grid search to tune hyperparameters. Since all experiments share a single set of hyperparameter settings, the tuning effort is not too laborious. The default hyperparameter settings are as follows: \(\alpha _{S}=0.8\) , \(\alpha _{U}=0.1\) , \(\beta _{S}=\beta _{T}=0.75\) , \(\delta =0.001\) , \(\theta =1.0\) , \(\gamma =1.0\) , number of sampled dandelions \(N=10\) and number of sampled unknown instances \(n_{R}=100\) . Additionally, the stability and robustness of the OSDN model with manipulated hyperparameters in their corresponding reasonable ranges are also verified in Section 5.8.
    During evaluation, we follow [42] to use accuracy, category-weighted precision (P), recall (R) and F1-score (F) as evaluation metrics. Their definitions are as follows:
    \(\begin{equation} Accuracy = \frac{\sum _{k=1}^{K} (TP^{(k)} + TN^{(k)})}{n_{T}}\,, \end{equation}\)
    (16)
    \(\begin{equation} Precision = \sum _{k=1}^{K} \frac{|\mathcal {X}_{T}^{(k)}|}{n_{T}} \cdot Precision^{(k)} = \sum _{k=1}^{K} \frac{|\mathcal {X}_{T}^{(k)}|}{n_{T}} \cdot \frac{TP^{(k)}}{TP^{(k)} + FP^{(k)}}\,, \end{equation}\)
    (17)
    \(\begin{equation} Recall = \sum _{k=1}^{K} \frac{|\mathcal {X}_{T}^{(k)}|}{n_{T}} \cdot Recall^{(k)} = \sum _{k=1}^{K} \frac{|\mathcal {X}_{T}^{(k)}|}{n_{T}} \cdot \frac{TP^{(k)}}{TP^{(k)} + FN^{(k)}}\,, \end{equation}\)
    (18)
    \(\begin{equation} F1 = \sum _{k=1}^{K} \frac{|\mathcal {X}_{T}^{(k)}|}{n_{T}} \cdot \frac{2 \cdot Precision^{(k)} \cdot Recall^{(k)}}{Precision^{(k)} + Recall^{(k)}}\,, \end{equation}\)
    (19)
    where the true positive \(TP^{(k)}\) denotes the number of category k intrusions being correctly detected, and similarly for \(TN^{(k)}\) , \(FP^{(k)}\) and \(FN^{(k)}\) . During experiments, we evaluate the performance in two modes: the ACC mode, which evaluates the prediction against the corresponding ground truth intrusion label, and the IND mode, which treats all known and unknown intrusions as a single intrusion class.
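    A sketch of the category-weighted metrics in Equations (17) through (19) using scikit-learn is shown below, together with a hypothetical helper for the IND mode that collapses all known and unknown intrusions into a single intrusion class; the normal-class index is an assumption.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def weighted_metrics(y_true: np.ndarray, y_pred: np.ndarray):
    """Category-weighted precision, recall and F1 (Equations (17)-(19)): each per-category
    metric is weighted by that category's share of the target instances."""
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted", zero_division=0)
    return p, r, f1

def to_ind_mode(y: np.ndarray, normal_class: int = 0) -> np.ndarray:
    # IND mode: collapse all known and unknown intrusion categories into a single intrusion class,
    # keeping only the normal/intrusion distinction. The normal-class index 0 is an assumption.
    return (y != normal_class).astype(int)
```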
    As an open-set DA method, following Kundu et al. [16], we define openness \(\mathcal {O}\) as follows:
    \(\begin{equation} \mathcal {O} = 1 - \frac{K}{K^{\prime }}\,. \end{equation}\)
    (20)
    The openness \(\mathcal {O}\) lies between 0 and 1; the larger the openness, the more unknown classes exist in the target II domain.

    5.3 State-of-the-art Baselines

    We use three state-of-the-art baseline methods to verify the superiority of the OSDN model: AMS [18], SR-OSDA [14], and PGL [21]. The AMS method attempts the OSDA problem by formulating a framework with four phases. In phase 1, a discriminative representation of seen classes is learned to benefit the seen and unseen intrusion separation performed in the second phase. After the seen and unseen separation is performed and the target domain is pseudo-labelled, phase 3 further optimises the feature representation. Phases 2 and 3 also form an iterative loop, which gradually improves the intrusion recognition quality. Finally, phase 4 learns a re-projection, which promotes the generalisability of unseen intrusion recognition without sacrificing the ability to correctly recognise the seen classes. The SR-OSDA method deals with the OSDA problem by first separating seen and unseen intrusion instances progressively via a threshold-based pseudo-label assignment mechanism and K-means clustering. Then, the intrusion knowledge transfer is performed by mapping both domains into a domain-invariant and discriminative feature space. Finally, the semantic information is utilised to better exploit the unknown target intrusions, so that they are not deliberately confounded together, which would cause negative transfer. The PGL method integrates a graph neural network with the episodic training strategy and meanwhile applies adversarial learning to bridge the gap between two intrusion domains. In the episodic training strategy, the model progressively enlarges the labelled set via pseudo-labelling and utilises the pseudo-labelled target samples for episodic training. On top of this, the graph neural network benefits and performs more accurate intrusion detection. We summarise their differences with the OSDN model as follows:
    From the dandelion-based feature space perspective, the AMS method attempts this direction. However, it lacks other mechanisms such as the graph embedding-based dandelion alignment and the dandelion compactness enhancement, and also fails to form the semantic dandelion and explore its correction effect.
    From the graph embedding perspective, the PGL method utilises the graph embedding during knowledge transfer. However, the PGL completely ignores the benefit brought by utilising the graph embedding in a dandelion-based feature space.
    From the semantic alignment perspective, all these methods lack effort to build a semantic hyperspherical space to guide the inter-category separation in the dandelion-based feature space, leaving a void to be filled.
    Therefore, these state-of-the-art methods are comparable and representative to verify the effectiveness of the OSDN model.

    5.4 Intrusion Detection Performance

    The intrusion detection accuracy of nine randomly selected tasks with varied openness is presented in Table 1. As we can observe, the OSDN model outperforms the baseline counterparts by a large margin, achieving a \(20.9\%\) and \(12.9\%\) performance improvement under the two modes, respectively. We also measure the intrusion detection performance using three other metrics and present the results in Figure 6. Under both modes, the OSDN model is positioned at the top-right corner in all three tasks, indicating that the OSDN model achieves the best precision and recall performance compared with other methods; hence, it is natural to observe that the best F1-score is also yielded by the OSDN model. The best precision performance indicates that the highest proportion of intrusions flagged by the OSDN model are correct, while the best recall performance demonstrates that the OSDN model can successfully flag as many intrusions as possible. As a harmonic mean of precision and recall, the best F1-score performance further verifies that the OSDN model can elegantly balance flagging as many intrusions as possible while avoiding triggering too many false alarms. The same result is also verified by the OSDN's closest proximity to the red diagonal line among all methods, as shown in Figure 6. Hence, it demonstrates the real-world applicability of the OSDN model as an intrusion detector.
    Fig. 6. Precision, Recall and F1-Score performance on three tasks under two modes. A-A denotes the performance of method AMS under ACC mode. PGL, SR-OSDA and OSDN (Ours) are denoted as P, S and O, respectively. The X-axis and Y-axis represent precision and recall, respectively. The F1-score is marked as text in the diagram. The red diagonal line marks \(f(x)=x\) .

    5.5 Robustness and Stability under Varied Openness

    We first present the performance of the OSDN model and its baseline counterparts under varied openness in Figure 7. We can observe that the OSDN model stably outperforms its baseline counterparts under varied openness evaluated using both accuracy and F1-score. Besides, compared with other baseline methods, the OSDN model shows a flatter trend with less severe fluctuation. Hence, it demonstrates the robustness of the OSDN method under varied openness levels.
    Fig. 7. Intrusion detection accuracy and F1-score performance under varied openness levels. The accuracy results under two modes are shown in (a)–(b). The F1-score results under two modes are shown in (c)–(d).
    We further evaluate the OSDN model against two more tasks under both large and small openness ranges. The results are shown in Figure 8. The task in Figure 8(a) and (b) covers a relatively higher openness range and the task in Figure 8(c) and (d) covers a relatively lower openness range. From both Figure 7 and Figure 8, the OSDN model maintains a relatively stable trend without heavy fluctuation even when the openness varies significantly. Therefore, the OSDN's capability to detect unknown intrusions in the target II domain under varied openness levels is verified, which enhances its real-world usefulness.
    Fig. 8. Intrusion detection accuracy and F1-score performance for two tasks under large and small openness levels in (a)–(b) and (c)–(d), respectively.

    5.6 Ablation Study

    To verify the positive contribution and the necessity of each constituting component of the OSDN model, six groups of ablation studies are performed and the corresponding results are presented in Table 2. In the ablation group A, the unknown training mechanism is ablated, which causes the accuracy to drop by around \(5.3\%\) and \(3.6\%\) under the two modes, respectively. In the ablation group B, either the source or the target dandelion angular separation mechanism ( \(B_1\) and \(B_2\) ), or both of them ( \(B_3\) ), are dropped. As we can observe, lacking any of the DASMs results in a significant performance reduction, which verifies the necessity of the DASM for both domains. Besides, using the DASM for only one domain dandelion further deteriorates the intrusion detection efficacy. The reason is that when both DASMs are turned off, other mechanisms such as the semantic dandelion correction and the discriminating sampled dandelion mechanism will still partially achieve the dandelion separation effect; however, only using a single DASM ends up with a severe dandelion misalignment. Hence, worse performance is observed for the ablation groups \(B_1\) and \(B_2\) .
    The dandelion embedding alignment mechanism is removed in the ablation group C. Without it, the performance drops by \(10.5\%\) and \(6.6\%\) under the two modes, respectively. A heavier performance drop is observed in the ablation group D, in which the semantic dandelion correction mechanism is eliminated. Without this mechanism, there will be no semantic-assisted correction for under-separated intrusion categories, resulting in compromised intrusion detection efficacy. In the ablation group \(E_1\) , the discriminating sampled dandelion mechanism is turned off, and in the ablation group \(E_2\) , a traditional instance-level domain discriminator substitutes the proposed DSDM. As we can see, completely lacking adversarial learning significantly hinders the intrusion detection performance, yielding a \(9.4\%\) and \(6.8\%\) performance reduction under the two modes, respectively. Although the substituting domain adversarial learner slightly increases the performance compared with the ablation group \(E_1\) , it still performs much worse than the full OSDN model.
    Finally, in the ablation group F, the domain adaptation mechanism is completely turned off to verify that the heterogeneous domain adaptation (HDA) mechanism plays an indispensable role. As we can see, removing the intrusion knowledge transfer performed by the HDA mechanism significantly degrades the intrusion detection performance, which is the worst among all ablated groups. Therefore, it justifies the necessity of having the HDA mechanism, and shows that the HDA mechanism makes a non-negligible contribution towards more accurate IoT intrusion detection.
    Overall, the full OSDN model outperforms all its ablated counterparts by a significant margin, which indicates that all constituting components of the OSDN model contribute positively towards finer intrusion knowledge transfer and hence are indispensable for achieving excellent intrusion detection performance.
    We further verify the statistical significance of each component's contribution via the significance T-test with a significance threshold of 0.05. The results are presented in Figure 9. The grey area in the middle stands for the significance threshold \(-log(0.05)\) . Along each dimension of the radar chart, the higher the value is, the more statistically significant the contribution of the corresponding component is. As we can observe, under all three tasks and both modes, the coloured areas present a wider coverage than the grey shaded area. The results verify the statistical soundness of all components' contributions.
    Fig. 9. Hypothesis testing results under the significance threshold of 0.05 to verify the statistical significance of the contribution made by each OSDN constituting component. For better visualisation, we omit ablation group F due to its significant differences compared with the full OSDN method.

    5.7 Separability and Compactness Analysis

    The ideal common feature subspace should have the inter-category divergence as large as possible to achieve good separability and meanwhile have the intra-category variation as small as possible to achieve compactness. To verify that the constituting components of the OSDN model contribute positively towards these goals, we follow Equation (7) to calculate the separability from an angular perspective on the source-target combined dandelion \(\mathcal {DD}_{S\cup T}\) , defined as follows:
    \(\begin{equation} \begin{split} CS_{S\cup T} &= \begin{bmatrix} CS_{S\cup T}^{11} & \boldsymbol {CS_{S\cup T}^{12}} & \boldsymbol {\cdots } & \boldsymbol {CS_{S\cup T}^{1K}}\\ CS_{S\cup T}^{21} & CS_{S\cup T}^{22} & \boldsymbol {\cdots } & \boldsymbol {CS_{S\cup T}^{2K}}\\ \vdots & \vdots & \ddots & \boldsymbol {\vdots }\\ CS_{S\cup T}^{K1} & CS_{S\cup T}^{K2} & \cdots & CS_{S\cup T}^{KK} \end{bmatrix}\,,\\ CS_{S\cup T}^{ij} &= \text{COS}\left(\mu _{S\cup T}^{(i)}, \mu _{S\cup T}^{(j)}\right)\!,\\ \mu _{S\cup T}^{(i)} &= \frac{1}{n_{S}^{(i)} + n_{T}^{(i)}} \left(\sum _{j=1}^{n_{S}^{(i)}} x_{S_j}^{(i)} + \sum _{j=1}^{n_{T}^{(i)}} x_{T_j}^{(i)}\right)\!, \end{split} \end{equation}\)
    (21)
    where \(CS_{S\cup T}\) denotes the inter-pappus Cosine similarity matrix of the source-target combined dandelion and \(CS_{S\cup T}^{ij}\) represents the Cosine similarity between the ith and jth pappus of the source-target combined dandelion. Then, the separability measurement SP is defined as follows:
    \(\begin{equation} SP = \frac{2}{K(K-1)} \sum _{i=1}^{K-1} \sum _{j=i+1}^{K} CS_{S\cup T}^{ij}\,, \end{equation}\)
    (22)
    the smaller the separability measurement SP is, the better the separability is for the source-target combined dandelion. We present the separability measurement results between the hyperspherical-based baseline AMS, the separability-related ablated groups and the full OSDN model in Figure 10. As we observe, both the full OSDN model and its ablated groups enjoy better separability compared with the AMS baseline. Moreover, the full OSDN model presents the best inter-category separability by achieving the lowest SP measurement. Hence it verifies the positive contribution of OSDN’s constituting components towards enhancing inter-category separability, and the superior performance of the OSDN model over its hyperspherical-based counterpart.
    Fig. 10.
Fig. 10. Separability measurement results on four tasks for the hyperspherical-based baseline AMS, the full OSDN model, and the ablated groups of the OSDN model that affect separability.
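For clarity, the separability measurement in Equations (21) and (22) can be computed directly from the common-subspace features and category labels of both domains, as in the following minimal NumPy sketch; the function and variable names are illustrative assumptions, not part of the OSDN implementation.
```python
# Minimal sketch of the separability measurement SP (Equations (21)-(22)).
# Assumes features_s/features_t are (n, d_C) arrays in the common subspace and
# labels_s/labels_t hold category indices in {0, ..., K-1}; names are illustrative.
import numpy as np

def separability(features_s, labels_s, features_t, labels_t, num_categories):
    """Average pairwise Cosine similarity between pappus means of the
    source-target combined dandelion; smaller SP means better separability."""
    # mu[k]: mean of all source and target instances of category k (Equation (21)).
    mu = np.stack([
        np.concatenate([features_s[labels_s == k],
                        features_t[labels_t == k]]).mean(axis=0)
        for k in range(num_categories)
    ])
    mu = mu / np.linalg.norm(mu, axis=1, keepdims=True)  # unit-normalise the means
    cs = mu @ mu.T                                # inter-pappus Cosine similarity matrix
    upper = np.triu_indices(num_categories, k=1)  # entries with i < j
    return cs[upper].mean()                       # Equation (22)
```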
    We then follow Equation (4) to measure the compactness by the average category-wise maximum deviation \(d_{max}\) , defined as follows:
\(\begin{equation} d_{max} = \frac{1}{K} \sum _{i=1}^{K} d_{max}^{(i)}\,, \tag{23} \end{equation}\)
the smaller the \(d_{max}\) is, the better the compactness. Again, the measurement results presented in Figure 11 indicate that the OSDN model outperforms both the baseline method AMS and its compactness-related ablated groups by a large margin. Hence, the excellent intra-category compactness achieved by the OSDN model is verified.
    Fig. 11.
Fig. 11. Compactness measurement results on four tasks for the hyperspherical-based baseline AMS, the full OSDN model, and the ablated groups of the OSDN model that affect compactness.
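Similarly, the compactness measurement in Equation (23) can be sketched as follows, assuming the same feature and label arrays as above; the Euclidean deviation used here is an assumption standing in for the deviation of Equation (4).
```python
# Minimal sketch of the compactness measurement d_max (Equation (23)): the average,
# over categories, of the maximum deviation of each category's instances from its
# pappus mean. Euclidean distance is assumed as the deviation measure.
import numpy as np

def compactness(features, labels, num_categories):
    """Average category-wise maximum deviation; smaller means more compact."""
    deviations = []
    for k in range(num_categories):
        category = features[labels == k]
        mu = category.mean(axis=0)
        deviations.append(np.linalg.norm(category - mu, axis=1).max())  # d_max^(i)
    return float(np.mean(deviations))                                   # Equation (23)
```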
    Overall, by achieving the best inter-category separability and intra-category compactness, the OSDN model can lead to more accurate intrusion detection performance.

    5.8 Hyperparameter Sensitivity Analysis

We verify the stability and robustness of the OSDN model under varied hyperparameter settings within their corresponding reasonable ranges. The results are presented in Figures 12 and 13. The dashed and solid lines indicate the two modes, respectively, and the horizontal lines indicate the best-performing baseline counterpart.
    Fig. 12.
Fig. 12. Hyperparameter sensitivity analysis for hyperparameters \(\alpha _{S}\) , \(\alpha _{U}\) , \(\beta _{S}\) , \(\beta _{T}\) , and \(\delta\) under their corresponding reasonable ranges. The dashed and solid lines indicate the two modes, respectively, and the horizontal lines indicate the best-performing baseline counterpart.
    Fig. 13.
Fig. 13. Hyperparameter sensitivity analysis for hyperparameters \(\theta\) and \(\gamma\) , the unknown instance amount \(n_{R}\) used for unknown training of the intrusion classifier, and the sampled child dandelion number N used in the DSDM, under their corresponding reasonable ranges.
We observe that the OSDN model performs stably without significant fluctuation under nearly all hyperparameter settings. Moreover, it consistently outperforms the best-performing baseline method under nearly all settings. Notably, the OSDN model applies a single set of hyperparameters across different data domains and tasks and still achieves this level of stability. Hence, the stability and robustness of the OSDN model under varied hyperparameter settings are verified.

    5.9 Intrusion Detection Efficiency

We finally verify the training and intrusion detection inference efficiency of the OSDN model. The training time is summarised in Table 3, and the inference time per network traffic instance is summarised in Table 4. We compare the OSDN model only with the two best-performing baseline methods. As shown in Table 3, the OSDN model trains more efficiently than its counterparts in nearly all settings. Since model training can be performed on computationally sufficient devices such as network gateway servers, the training efficiency of the OSDN model is satisfactory. Besides, as indicated in Table 4, the OSDN model significantly outperforms its baseline counterparts in terms of the inference time taken to examine a network traffic instance. Therefore, the results verify the efficiency of the OSDN model and demonstrate its real-world applicability as an efficient and accurate intrusion detector.
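As an illustration of how the per-instance inference time in Table 4 can be measured, the sketch below times a trained detector over a batch of traffic instances; `model.predict` and the instance array are hypothetical placeholders rather than the article's actual interface.
```python
# Minimal sketch (assumed procedure) for measuring the average wall-clock inference
# time per network traffic instance; `model.predict` is a hypothetical interface.
import time
import numpy as np

def mean_inference_time(model, instances, repeats=3):
    """Return the average time (seconds) taken to classify one traffic instance."""
    per_instance_times = []
    for _ in range(repeats):
        start = time.perf_counter()
        for x in instances:
            model.predict(x[np.newaxis, :])  # classify one instance at a time
        elapsed = time.perf_counter() - start
        per_instance_times.append(elapsed / len(instances))
    return float(np.mean(per_instance_times))
```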

    6 Conclusion

In this article, we propose the OSDN model based on unsupervised heterogeneous domain adaptation in an open-set manner. The OSDN model tackles the IoT data-scarcity problem by transferring intrusion knowledge from the source NI domain to enable more accurate intrusion detection in the target IoT domain. Relaxing the closed-set assumption lets the OSDN model detect both known and newly-emerged unknown intrusions in the IoT intrusion domain, making it more applicable in the real world. The OSDN model achieves this by first forming the source domain into a dandelion-like feature space that emphasises inter-category separability and intra-category compactness. The dandelion-based target membership mechanism then constructs the target dandelion for intrusion knowledge transfer. The dandelion angular separation mechanism promotes inter-category separability, while the dandelion embedding alignment mechanism facilitates knowledge transfer from a graph embedding perspective. The discriminating sampled dandelion mechanism further promotes intra-category compactness. Trained using both known and generated unknown intrusion information, the intrusion classifier yields probabilistic semantics that emphasise easily-confused categories and hence provide correction for the inter-category separation mechanism. Holistically, these mechanisms form the OSDN model and enable more effective intrusion detection in IoT scenarios. Comprehensive experiments on five intrusion datasets are conducted, in which the OSDN model outperforms three state-of-the-art baseline methods by \(16.9\%\) . The effectiveness of each OSDN constituting component, as well as the stability and efficiency of the OSDN model, are also verified. For future research, it is worthwhile to extend the OSDN model to the multi-source setting, in which intrusion knowledge from multiple source domains can jointly benefit open-set intrusion knowledge transfer. Besides, a category-wise attention mechanism could be employed during intrusion knowledge transfer to account for the diverse knowledge transfer sufficiency of each intrusion category. We leave these as our future research directions.

    Acknowledgments

    We would like to express our sincere gratitude to Prof. André Brinkmann for his invaluable assistance in enhancing the quality of this article.

    A Appendix

    A.1 Acronym Table

Table 5.
Acronym: Interpretation
OSDN: Open-Set Dandelion Network
DA: Domain Adaptation
NI: Network Intrusion
II: IoT Intrusion
OSDA: Open-Set Domain Adaptation
DASM: Dandelion Angular Separation Mechanism
DEAM: Dandelion Embedding Alignment Mechanism
DSDM: Discriminating Sampled Dandelion Mechanism
SDCM: Semantic Dandelion Correction Mechanism
ML: Machine Learning
DL: Deep Learning
CS: Cosine Similarity
EA: Embedding Alignment
CP: Compactness
SUP: Supervision
SM: Semantic
SC: Semantic Correction
CE: Cross Entropy
Table 5. The Acronym Table and the Corresponding Interpretation (Based on the Order of Appearance in the Article)

    A.2 Notation Table

    Table 6.
Notation: Interpretation
    \(\mathcal {D}_{S}\) Source NI domain
    \(\mathcal {X}_{S}\) Source NI domain traffic features
    \(\mathcal {Y}_{S}\) Source NI domain traffic intrusion labels
    \(x_{S_i}\) The ith traffic instance in \(\mathcal {X}_{S}\)
    \(y_{S_i}\) The intrusion label of \(x_{S_i}\)
    \(n_S\) Number of instances in \(\mathcal {X}_{S}\)
    \(d_S\) Instance dimension of \(\mathcal {X}_{S}\)
K Number of intrusion categories in \(\mathcal {D}_{S}\)
    \(K^{\prime }\) Number of intrusion categories in \(\mathcal {D}_{T}\)
    \(f(x_i)\) The feature projector
    \(E_{S}\) The source feature projector
    \(E_{T}\) The target feature projector
    \(d_{C}\) The dimension of the common feature subspace
    \(d_{max}^{(i)}\) The maximum intra-category deviation of source intrusion category i
    \(COS()\) Cosine Similarity
    \(n_S^{(i)}\) Number of instances in the ith source intrusion category
    \(\mu _{S}^{(i)}\) Mean of the source intrusion category i
    \(x_{S_j}^{(i)}\) The jth instance of source ith intrusion category
    \(y_{T_j}^{D}\) The dandelion-based membership for the jth target instance \(x_{T_j}\)
    \(CS_{S}\) The source category pair-wise Cosine similarity matrix
    \(CS_{S}^{ij}\) The Cosine similarity between the ith and jth source intrusion category
    \(\mathcal {L}_{SS}\) Source dandelion separation loss
    \(\mathcal {L}_{ST}\) Target dandelion separation loss
    \(G_{S}\) The source dandelion graph
    \(V_{S}\) Vertices in \(G_{S}\)
    \(E_{G}\) Edges in \(G_{S}\)
    \(V_{S}^{(i)}\) The ith vertex in the \(G_{S}\)
    \(E_{S}^{ij}\) The edge connecting \(V_{S}^{(i)}\) and \(V_{S}^{(j)}\)
    \(𝔭\) The origin
    \(\mathcal {L}_{EA}\) Dandelion embedding alignment loss
    \(\phi _{S}\) The graph embedding of the source domain dandelion
    \(\mathcal {L}_{CP}\) Discriminating sampled dandelion loss
    \(D()\) The discriminator
    \(G_{\mathcal {DD}_{S}}\) The graph embedding of the source dandelion
    \(G_{\mathcal {DD}_{T}}\) The graph embedding of the target dandelion
    \(G_{\mathcal {DD}_{*}^{j}}\) The graph embedding of the jth sampled dandelion
N The number of child dandelions being sampled
    \(\mathcal {L}_{SUP}\) The overall supervision loss
    \(\mathcal {L}_{SUP_S}\) The source supervision loss
    \(\mathcal {L}_{SUP_U}\) The unknown supervision loss
    \(\mathcal {L}_{CE}\) The cross entropy loss
    \(n_R\) The amount of unknown instances being generated
    \(\mathcal {X}_{R}\) The generated unknown instances for unknown training
C The intrusion classifier
    Table 6. The Notation Table and the Corresponding Interpretation (Based on the Order of Appearance in the Article)
    Table 7.
Notation: Interpretation
    \(p_{S_j}^{(i)}\) The probabilistic semantic of the jth source instance in category i
    \(\mathcal {DD}_{\mathcal {S}S}\) The source semantic dandelion
    \(\mathcal {DD}_{\mathcal {S}S}^{(i)}\) The ith pappus of the source semantic dandelion
    \(CS_{SM}\) The Cosine similarity matrix between semantic dandelions
    \(CS_{SM}^{ij}\) The Cosine similarity between \(\mathcal {DD}_{\mathcal {S}S}^{(i)}\) and \(\mathcal {DD}_{\mathcal {S}T}^{(j)}\)
    \(\mathcal {L}_{SC}\) The semantic dandelion correction loss
    \(\alpha _{S}, \alpha _{U}\) Hyperparameter controlling \(\mathcal {L}_{SUP_S}\) and \(\mathcal {L}_{SUP_U}\) , respectively
    \(\beta _{S}\) , \(\beta _{T}\) Hyperparameter controlling \(\mathcal {L}_{SS}\) and \(\mathcal {L}_{ST}\) , respectively
    \(\delta\) Hyperparameter controlling \(\mathcal {L}_{EA}\)
    \(\theta\) Hyperparameter controlling \(\mathcal {L}_{SC}\)
    \(\gamma\) Hyperparameter controlling \(\mathcal {L}_{CP}\)
    \(TP^{(k)}\) True positive of category k
    \(|\mathcal {X}_T^{(k)}|\) Number of target instances in intrusion category k
    \(\mathcal {O}\) The openness level
    \(\mathcal {DD}_{S\cup T}\) The source-target combined dandelion
    \(CS_{S\cup T}\) The inter-pappus Cosine similarity matrix of \(\mathcal {DD}_{S\cup T}\)
\(\mu _{S\cup T}^{(i)}\) The mean of the ith pappus of \(\mathcal {DD}_{S\cup T}\)
    \(CS_{S\cup T}^{ij}\) The Cosine similarity between \(\mu _{S\cup T}^{(i)}\) and \(\mu _{S\cup T}^{(j)}\)
    \(d_{max}\) The average category-wise maximum deviation
    Table 7. The Notation Table and the Corresponding Interpretation (Continued)

