7.1 Detection of Cache Poisoning Attacks in Named Data Networks
Information Centric Networking (ICN) is a revolutionary paradigm in the context of communications: While most of the Internet follows a host-to-host perspective, ICN adopts a host-to-content vision [6]. The ICN architecture is more suitable for massive content diffusion (e.g., video streaming), which represents a major use case of modern networks. Despite providing multiple benefits in terms of bandwidth efficiency and scalability, ICN can fall victim to DoS attacks and, in particular, to poisoning attacks [5, 174]. In this case study, we analyze a real ML detection system that protects against such attacks targeting ICN architectures. The specific techniques are integrated into the Montimage Monitoring Tool, which is a module of the IDS framework developed by Montimage [118, 169].
Scenario and Challenges. This case study focuses on the well-known ICN approach of Named Data Networking (NDN) [181]. NDN leverages a pull-based mechanism using two kinds of packets: Interest (a request for content) and Data (the response with the content). When a given user wants to retrieve some content, the user (i) specifies the desired content's name (e.g., "/data/video.mp4") in an Interest, (ii) sends such Interest through the NDN network, and (iii) receives the corresponding Data—which can be provided either by the content producer or by any intermediate NDN node storing a copy of such Data. The practical implementation of NDN exposes it to the risk of new security attacks, such as the Content Poisoning Attack (CPA) [174]. In a CPA, a malicious producer (content creator) colludes with a malicious consumer (a user requesting content) to force any NDN node on their path to insert malicious content into its content store (CS), hence causing poisoning attacks. This results in nodes answering some requests with such malicious content: For example, a victim may ask for a specific webpage and instead be redirected to a malicious phishing website. CPA are a dangerous threat to NDN, as shown in Reference [119]: Analyses on real systems highlighted that identifying CPA is impossible via static and human-based approaches. This is due to the intrinsic characteristics of NDN, as each node in the network topology reacts differently. Moreover, NDN is also susceptible to Interest Flooding Attacks (IFA), a variant of DoS in which the NDN is "flooded" with Interests [148] for existing or even non-existing content, which can disrupt the distribution of content. Although IFA are easier to identify than CPA, countering both IFA and CPA is challenging and requires the usage of more dynamic analytical techniques—such as ML.
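To make the attack mechanics concrete, the following minimal Python sketch models the pull-based Interest/Data exchange and the role of the content store; all names (e.g., NdnNode) are illustrative and not part of any NDN library.

    from dataclasses import dataclass, field

    @dataclass
    class Interest:
        name: str                      # e.g., "/data/video.mp4"

    @dataclass
    class Data:
        name: str
        payload: bytes

    @dataclass
    class NdnNode:
        # Content Store (CS): caches Data packets by name.
        cs: dict = field(default_factory=dict)

        def on_interest(self, interest: Interest, producer) -> Data:
            # Serve from the CS on a cache hit; otherwise pull from the
            # producer and cache the response for future requests.
            if interest.name in self.cs:
                return self.cs[interest.name]
            data = producer(interest)
            self.cs[interest.name] = data
            return data

    # A colluding producer/consumer pair: the consumer requests a name that
    # the malicious producer answers with bogus content, which the
    # intermediate node caches and later serves to honest users.
    node = NdnNode()
    malicious_producer = lambda i: Data(i.name, b"<phishing page>")
    node.on_interest(Interest("/data/index.html"), malicious_producer)

    honest_producer = lambda i: Data(i.name, b"<real page>")
    victim_view = node.on_interest(Interest("/data/index.html"), honest_producer)
    print(victim_view.payload)         # b'<phishing page>': the CS is poisoned

The sketch shows why CPA is so insidious: once a single node caches the bogus Data, every subsequent request for that name is answered from the poisoned CS, regardless of the honest producer.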
Montimage ML-Solution. The ML solution developed by Montimage leverages ensembles of ML models organized in a Bayesian Network Classifier (BNC) [120]. The intuition is that detection of CPA is only possible by monitoring the behaviour of each node in an NDN network—and, specifically, by analyzing and cross-correlating the evolution of different metrics for each node. Such a goal is achieved by means of specific probes deployed on each node and monitoring its complete activity. In particular, each probe collects metrics related to the Data plane of NDN: CS, Pending Interest Table (PIT), and Faces. The latter, in particular, are an abstraction of a communication channel that NDN uses for packet forwarding. Such an abstraction represents data coming from diverse "faces," i.e., overlay tunnels over TCP and UDP, delivery of NDN network-layer packets (e.g., Interest and Data packets), inter-node communication channels that send packets to other nodes, and intra-node communication channels that send packets to another process on the same node. The information captured by these probes is then analyzed by ensembles of micro-anomaly-detectors, each focusing on deviations from the normal behaviour of a single metric captured by each probe. CPA can impact many metrics and in different ways, causing each micro-detector to raise hundreds of (likely false) alarms. However, correlating all the alarms with a BNC allows us to (i) increase the detection performance while (ii) mitigating the high rate of false alarms generated by individual micro-detectors.
A schematic representation of the considered BNC is provided in Figure 12: the "anomaly" node (denoted in red) represents the anomalies that can occur in the entire NDN, whereas the remaining nodes represent the individual micro-detectors. Hence, each node focuses on a single metric, specifically Faces, CS, or PIT (denoted in green, purple, and blue in Figure 12). The (directed) edges in the BNC represent the causal relationships between the Anomaly node and a metric (or pairs of metrics), each edge connecting the "causing" node to the "affected" node. The causal relationships are deduced based on the processing of each packet arriving at the NDN node.
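The exact parameters of Montimage's BNC are not public; as a hedged illustration of how a BNC can correlate micro-detector alarms, the sketch below uses the simplest such classifier (naive Bayes, where every micro-detector is a direct child of the anomaly node, dropping the metric-to-metric edges of Figure 12 for simplicity) with purely hypothetical probabilities.

    import numpy as np

    # P(alarm_i = 1 | anomaly) and P(alarm_i = 1 | normal) for four
    # micro-detectors (hypothetical values), plus the prior P(anomaly).
    p_alarm_given_anomaly = np.array([0.90, 0.75, 0.80, 0.60])
    p_alarm_given_normal  = np.array([0.10, 0.20, 0.15, 0.05])
    p_anomaly = 0.05

    def posterior_anomaly(alarms: np.ndarray) -> float:
        """P(anomaly | alarms) under a naive-Bayes factorization."""
        like_a = np.prod(np.where(alarms, p_alarm_given_anomaly,
                                  1 - p_alarm_given_anomaly))
        like_n = np.prod(np.where(alarms, p_alarm_given_normal,
                                  1 - p_alarm_given_normal))
        joint_a = like_a * p_anomaly
        joint_n = like_n * (1 - p_anomaly)
        return joint_a / (joint_a + joint_n)

    # A single alarm barely moves the posterior; correlated alarms do.
    print(posterior_anomaly(np.array([1, 0, 0, 0])))  # low: likely false alarm
    print(posterior_anomaly(np.array([1, 1, 1, 0])))  # high: likely a real CPA

This captures the core benefit claimed above: an isolated alarm from one micro-detector is discounted, while alarms that co-occur across metrics drive the posterior toward "anomaly."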
Evaluation and Results. It is necessary to conduct a preliminary assessment of the learning efficiency of the BNC before its deployment. This is because NDN generate a lot of traffic, and even though the BNC can "condense" the raised alarms, it is still important that such alarms—and, specifically, false alarms—are within acceptable levels. To this purpose, Montimage first collects huge amounts of real data from the probes and then uses such data (assumed to be benign) to train (and test) a BNC. Specifically, multiple BNC are assessed, each considering a different training size: The goal is finding the optimal size that minimizes the rate of false alarms. The results of such an assessment are reported in Figure 13, showing the misclassification error (as measured via fivefold cross-validation) as a function of the training size. We observe that an optimal value is achieved when the training set contains roughly 280 samples. For higher values, the error increases due to overfitting (this phenomenon confirms the misconception outlined in Section 5.2). Thus, for the considered deployment scenario, Montimage uses training sets of 280 samples—corresponding to 23 minutes of real reports.
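Such a training-size sweep can be sketched with scikit-learn's learning_curve utility; since the BNC implementation is not public, a Gaussian naive Bayes classifier and synthetic data serve as stand-ins here.

    import numpy as np
    from sklearn.model_selection import learning_curve
    from sklearn.naive_bayes import GaussianNB

    # Stand-in data: feature vectors from the probes and their labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(600, 18))             # 18 monitored metrics
    y = rng.integers(0, 2, size=600)

    sizes, _, test_scores = learning_curve(
        GaussianNB(), X, y,
        train_sizes=np.linspace(0.1, 1.0, 8),  # sweep the training size
        cv=5,                                  # fivefold cross-validation
        scoring="accuracy",
    )
    misclassification = 1 - test_scores.mean(axis=1)
    for n, err in zip(sizes, misclassification):
        print(f"train size {n:4d}: error {err:.3f}")
    # Pick the size minimizing the error (about 280 samples on Montimage's data).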
To evaluate the performance in production settings, Montimage reproduces the NDN topology in Reference [182] and creates two distinct environments, each adopting a specific NDN routing strategy: bestroute or multicast. Then, each environment is monitored for 10 minutes, and the attack is simulated in the last 5 minutes. Specifically, multiple CPA are launched, each considering an increasing payload, denoting the number of requests for content (i.e., Interests) per second; in our case, we consider payloads of 5, 10, 20, and 50 Interests per second. In comparison, legitimate clients produce 10 Interests per second (on average): Hence, the malicious traffic ranges from half to five times the legitimate traffic. The traffic generated during such simulations is collected and used to assess the quality of the BNC: The goal is to verify whether the BNC is capable of identifying the CPA, which occurs in the last 5 minutes.
To provide a twofold perspective of the performance (see Section 5.3), Montimage measures the True-Positive Rate (TPR) and False-Positive Rate (FPR) (cf. Table 1 in Section 2.1). The results of such evaluation, performed on a testing set of 240 samples, are reported in Table 2. We observe that the TPR increases for greater payloads, because the CPA become more conspicuous. Nonetheless, it is appreciable that even CPA with low payload can be effectively detected. Finally, the low FPR is crucial for real deployments, as false alarms are annoying to human operators. All such results are due to the advantages provided by the BNC: BNC use a probabilistic approach that allows us to take into account the underlying random nature of the observed metrics. Such a property makes BNC well suited for multi-variate anomaly detection in real environments. In contrast, other ML algorithms present significant drawbacks: For instance, "deep" neural networks are excessively difficult to develop in such settings (also due to their poor explainability), whereas other "shallow" algorithms, such as SVM, simply do not allow us to efficiently represent and correlate all the metrics affected by CPA.
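For reference, TPR and FPR follow directly from the confusion-matrix counts; a minimal sketch with illustrative predictions (the true results are those in Table 2):

    import numpy as np

    def tpr_fpr(y_true: np.ndarray, y_pred: np.ndarray) -> tuple[float, float]:
        """True-positive and false-positive rates (1 = attack, 0 = benign)."""
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        tn = np.sum((y_pred == 0) & (y_true == 0))
        return tp / (tp + fn), fp / (fp + tn)

    # Hypothetical example: 240 test samples, attack in the second half.
    y_true = np.array([0] * 120 + [1] * 120)
    y_pred = np.array([0] * 115 + [1] * 5 + [1] * 110 + [0] * 10)
    print(tpr_fpr(y_true, y_pred))   # (~0.92 TPR, ~0.04 FPR)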
The major limitation of BNC is their intrinsic function as anomaly detectors: Indeed, an anomaly is not necessarily malicious. For instance, in an NDN setting, a sudden demand for a video from legitimate users could lead to a temporary increase in traffic, indicating an abnormal activity. To mitigate this problem, Montimage considers four possible "states": normal state, IFA attack state, CPA attack state, and number-of-users increase. Each state is denoted by different "anomalous" combinations taking into account a total of 18 metrics: Such a solution allows keeping the FPR at acceptable levels (as shown in Table 2). We take this opportunity to make a crucial remark for real ML deployments: One may believe that defining more "states" and/or increasing the amount of considered metrics leads to better results. However, according to Montimage, such an approach can yield proficient results only in a lab environment, because it induces overfitting, and the true deployment performance may suffer from an excessive FPR.
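The binary BNC sketch shown earlier extends naturally to these four states by computing a posterior per state and taking the most probable one; all parameters below are purely illustrative.

    import numpy as np

    STATES = ["normal", "IFA", "CPA", "user_increase"]

    # P(alarm_i = 1 | state) for the 18 monitored metrics
    # (rows: states, columns: metrics; values are illustrative only).
    rng = np.random.default_rng(1)
    p_alarm = rng.uniform(0.05, 0.95, size=(4, 18))
    prior = np.array([0.90, 0.04, 0.04, 0.02])

    def classify(alarms: np.ndarray) -> str:
        """MAP state given the 18 binary alarm flags."""
        likes = np.prod(np.where(alarms, p_alarm, 1 - p_alarm), axis=1)
        post = likes * prior
        return STATES[int(np.argmax(post / post.sum()))]

    print(classify(rng.integers(0, 2, size=18)))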
Finally, an intriguing future development of such an ML solution involves the consideration of "stateful" analyses that take into account the time axis (as done, e.g., in Reference [56]) and allow the detection of anomalies occurring in the temporal domain. The next case study, by S2Grupo, considers a similar application.
7.2 Combining ML with Non-ML Methods to Protect Industry 4.0 Environments
With the rapid growth of the Industry 4.0 paradigm, industrial environments are ever more exposed to Advanced Persistent Threats (APT) [132]. Specifically, recent developments of ICS represent an attractive target for attackers [68]. In this case study, we share the experience gained in the design and operation of CAIAC, a non-intrusive device that leverages sequential ML to protect ICS against APT and other cyber-threats.
Scenario and Challenges. This case study highlights the advantages of ML applications for anomaly detection in time-series data. The intuition is that APT leverage zero-day vulnerabilities and hence cannot be detected via misuse-based detection approaches—irrespective of these being human- or data-driven. However, pointwise and static anomaly detection approaches are not enough to detect advanced cyberattacks, and the additional perspective provided by the temporal domain may facilitate the detection of refined offensive strategies [132].
In the specific ICS scenario, there are two crucial requirements that must be met by security systems. First, they should operate in a non-intrusive way, avoiding additional overhead and ensuring the regular functionalities of the ICS: This is a tough requirement, because ICS include hundreds of devices and, while excessive false alarms are annoying, slow reaction times may imply a fallout of the entire ICS. Second, they must take into account the complexity and variability of the data in ICS, which is difficult to manage due to the intrinsic heterogeneity of ICS. Such a requirement cannot be met just with traditional approaches for time-series anomaly detection based on heuristics: To address this problem, S2Grupo leverages the capabilities of deep learning.
S2Grupo ML-Solution. The ML solution developed by S2Grupo, CAIAC, is an intriguing example of ML orchestration (Section 6.4): CAIAC not only leverages the benefits provided by "small" ML models (as done in Section 7.1) but also exploits the potential of non-ML methods for time-series analyses. In particular, the idea is to combine deep learning algorithms, epitomized by Long Short-Term Memory (LSTM) neural networks, with statistical approaches for time-series forecasting, such as Seasonal Autoregressive Integrated Moving Average (SARIMA). The result is an ensemble of ML and non-ML models, exploiting the benefits of both approaches and overcoming their limitations: Statistical models can be more manageable, but when the data have high complexity, deep learning is superior. Such a design choice is particularly suited for real ICS deployments due to a threefold advantage with respect to "one-size-fits-all" ML architectures. Specifically:
•
individual ML models are easier to train, because they must deal only with a tiny portion of the data, resulting in better performance and lower false alarms;
•
it allows combining different algorithms, each addressing a specific problem and data type;
•
it makes the resulting system more "future proof," because each ML model can be individually updated, removed, or replaced.
Furthermore, CAIAC is based on passive monitoring in near real time, hence preventing excessive information overhead while still allowing timely responses.
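One possible reading of this orchestration, sketched below with illustrative class and method names (not CAIAC's actual API): one detector per (metric, granularity) time series, preferring the simpler statistical model whenever its measured performance is on par with the LSTM's.

    from typing import Callable, Dict, List, Tuple

    # Each detector maps a time series to a list of anomalous slot indices.
    Detector = Callable[[List[float]], List[int]]

    class Orchestrator:
        """Registry of per-time-series detectors feeding a correlation layer."""
        def __init__(self):
            self.detectors: Dict[Tuple[str, str], Detector] = {}

        def register(self, metric: str, granularity: str, detector: Detector):
            self.detectors[(metric, granularity)] = detector

        def pick(self, metric, granularity, sarima, lstm,
                 sarima_score, lstm_score, tolerance=0.02):
            # Prefer the statistical model unless the LSTM is clearly better:
            # SARIMA needs no training phase and is easier to maintain.
            best = sarima if sarima_score >= lstm_score - tolerance else lstm
            self.register(metric, granularity, best)

        def run(self, series_by_key):
            # Collect per-detector anomalies for the correlation layer.
            return {k: det(series_by_key[k])
                    for k, det in self.detectors.items()}

    orch = Orchestrator()
    orch.pick("packets", "5min", sarima=lambda s: [], lstm=lambda s: [],
              sarima_score=0.93, lstm_score=0.94)  # SARIMA kept: within tolerance

This mirrors the "future proof" advantage listed above: any entry in the registry can be replaced (e.g., SARIMA swapped for an LSTM once enough training data exist) without touching the rest of the system.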
Let us explain CAIAC in more detail. The intuition is to analyze the network traffic of the considered ICS from different perspectives, each associated with a specific time series. These time series can differ on the basis of two criteria: the network metric (e.g., transmitted packets) and the granularity used to aggregate the corresponding metric in time slots of fixed length. All such time series are used to devise multiple ML and non-ML models: The performance of each model can be assessed individually by forwarding its detected anomalies to a higher-level correlation layer (similarly to Reference [132]). The goal of this layer is to determine the nature of such anomalies: They can either be legitimate (i.e., a "normal" malfunctioning of a component that must be investigated) or illegitimate (i.e., an attack is taking place). Such a procedure allows us to identify the most suitable models to be integrated in CAIAC, depending on the pros and cons of each model. Indeed, LSTM models may yield a superior performance but require a training phase, whereas statistical models are easier to develop and only require some tuning. Hence, such (non-ML) models are the preferred choice when they exhibit similar performance to LSTM.
Evaluation and Results. To develop CAIAC, it is necessary to first assess the characteristics of the specific ICS: Indeed, it is not possible to use models trained on different environments (as explained in Section 5.3). Hence, S2Grupo monitors and collects the network traffic of the considered ICS and creates multiple time series, each considering a given metric and granularity. Some metrics are commonly adopted in NIDS (e.g., transmitted packets or bytes, in-/out-degree [132]); others are specific to ICS and require dedicated industrial dissectors that extract the relevant information (e.g., protocol, parameters, command density). Finally, each metric is aggregated in time slots of varying length, from 1 minute to 1 hour.
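Assuming the captured traffic is available as a per-packet log, such an aggregation can be sketched with pandas resampling (column names are hypothetical):

    import numpy as np
    import pandas as pd

    # Hypothetical per-packet log: one row per captured packet.
    packets = pd.DataFrame({
        "ts": pd.date_range("2023-01-02", periods=100_000, freq="s"),
        "bytes": np.random.default_rng(2).integers(60, 1500, size=100_000),
    }).set_index("ts")

    # One time series per (metric, granularity): packet/byte counts
    # aggregated in slots ranging from 1 minute to 1 hour.
    series = {}
    for slot in ["1min", "5min", "1h"]:
        series[("packets", slot)] = packets["bytes"].resample(slot).count()
        series[("bytes", slot)] = packets["bytes"].resample(slot).sum()

    print(series[("packets", "5min")].head())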
After this data collection phase, which in the considered setting typically amounts to about 10 GB of data per day, S2Grupo performs an exploratory analysis focused on determining the most proficient (ML and non-ML) algorithms for studying each time series. Let us elucidate the differences between two specific applications of SARIMA and LSTM, starting from the non-ML algorithm.
Specifically, SARIMA analyzes a time series by adopting a sliding-window approach: All data points within a given time window are considered by SARIMA to predict a "future" value, which is provided alongside a confidence range. We provide an example of SARIMA in Figure 14, showing the time series of the transmitted packets aggregated in time slots of 5 minutes, over a period of 1 week; the sliding window considered by SARIMA is 30 minutes. The actual values are reported in dark blue, whereas the values predicted via SARIMA are shown in orange; the confidence range of each predicted value is shown in light blue: Therefore, actual values that fall outside of this range are treated as anomalous. In particular, vertical gray lines denote the anomalies detected by SARIMA.
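The sliding-window forecast with a confidence range can be reproduced with statsmodels' SARIMAX; the order below is illustrative (seasonal terms omitted for brevity), and the series is a synthetic stand-in for a stationary stretch of the 5-minute packet counts, not S2Grupo's data.

    import warnings
    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    warnings.filterwarnings("ignore")  # tiny windows trigger harmless fit warnings

    rng = np.random.default_rng(3)
    y = 100 + rng.normal(0, 2, 288)    # one day of 5-minute slots
    y[200] += 40                       # inject one obvious deviation

    window = 6                         # sliding window: 30 minutes of 5-minute slots
    anomalies = []
    for t in range(window, len(y)):
        fit = SARIMAX(y[t - window:t], order=(1, 0, 0)).fit(disp=False)
        lo, hi = fit.get_forecast(steps=1).conf_int(alpha=0.05)[0]
        if not lo <= y[t] <= hi:       # outside the 95% range: flag as anomaly
            anomalies.append(t)

    print(len(anomalies), 200 in anomalies)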
From Figure 14, we observe that SARIMA accurately detects stationary deviations. However, SARIMA can only detect non-stationary changes when they happen within its sliding window. Furthermore, non-stationary (but legitimate) changes that occur after a long stationary interval are falsely detected as anomalies by SARIMA. Despite some incorrect predictions, the considered application of SARIMA obtained a performance that was deemed appropriate for the given task, and it was integrated in CAIAC.
Let us showcase an application of deep learning via LSTM. Since LSTM do not provide a confidence interval for each prediction, S2Grupo developed a custom anomaly threshold that takes into account the deviation between predicted and actual values, as well as the degree of accumulation of such deviation in the past history. An example of such an LSTM application is given in Figure 15, showing the time series of the transmitted packets (same as Figure 14) but with a time slot of 1 minute. The actual values are shown in blue, whereas the LSTM predictions are in orange. Vertical gray lines denote the anomalies detected by the LSTM, i.e., when the actual value falls outside the anomaly threshold derived from the LSTM prediction.
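S2Grupo's exact threshold is not public; the sketch below pairs a small Keras LSTM with one plausible accumulation rule (an exponentially weighted sum of deviations, so that both a single large error and a run of moderate errors can trip the alarm). All hyperparameters and the data are illustrative.

    import numpy as np
    from tensorflow.keras.layers import LSTM, Dense
    from tensorflow.keras.models import Sequential

    rng = np.random.default_rng(4)
    y = 100 + rng.normal(0, 5, 2000)   # stand-in 1-minute series
    win = 30

    # Supervised windows: predict the next value from the previous `win`.
    X = np.stack([y[i:i + win] for i in range(len(y) - win)])[..., None]
    t = y[win:]

    model = Sequential([LSTM(32, input_shape=(win, 1)), Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, t, epochs=3, batch_size=64, verbose=0)

    pred = model.predict(X, verbose=0).ravel()
    dev = np.abs(pred - t)

    # Accumulated deviation score and a simple mean + 3*std threshold.
    acc, score = 0.0, np.empty_like(dev)
    for i, d in enumerate(dev):
        acc = 0.9 * acc + d
        score[i] = acc
    threshold = score.mean() + 3 * score.std()
    print(f"{(score > threshold).sum()} anomalous minutes flagged")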
From Figure 15, we can observe that, by reducing the time slot from 5 to 1 minute, the resulting time series is less predictable, making statistical methods unfeasible and requiring the advanced capabilities of deep learning. Indeed, the considered LSTM can detect anomalous values without being affected by non-stationary changes—even after long stationary intervals. This example highlights the capabilities of (deep) ML to deal with data of high dimensionality: The LSTM takes into account a long "past" history, allowing it to better infer the "normal" behaviour. In contrast, applying SARIMA to the same time series yielded very poor results due to the intrinsic variability of the sequence, which forced us to aggregate data in 5-minute time slots.
However, it is important to take into account that the LSTM requires a training step, whereas SARIMA only requires some parameter adjustment. In this use case, the LSTM in Figure 15 was trained with data collected over 3 weeks. Such a characteristic implies that a similar LSTM model requires at least 3 weeks of data collection, since no previous network traffic data were available to train the model—alongside the additional computational resources to store such data and train the LSTM model (which were within acceptable levels). Hence, CAIAC would initially make use of SARIMA and then replace it after enough data have been collected to develop a more proficient LSTM model.
We can conclude that machine (and deep) learning are powerful instruments for protecting modern ICS, but methods that do not leverage ML are equally important to compensate for some of the limitations of ML. As such, future developments should not exclusively focus on ML and overlook the benefits provided by other data-driven methods.