Open access

Monitoring Runtime Metrics of Fog Manufacturing via a Qualitative and Quantitative (QQ) Control Chart

Published: 17 March 2022

Abstract

    Fog manufacturing combines Fog and Cloud computing in a manufacturing network to provide efficient data analytics and support real-time decision-making. Detecting anomalies, including imbalanced computational workloads and cyber-attacks, is critical to ensuring reliable and responsive computation services. However, such anomalies often co-occur with dynamic offloading events, where computation tasks are migrated from well-occupied Fog nodes to less-occupied ones to reduce overall latency and improve throughput. These co-occurrences jointly affect the system behavior, which makes anomaly detection inaccurate. We propose a qualitative and quantitative (QQ) control chart that monitors system anomalies by identifying changes in the relationships among monitored runtime metrics (quantitative variables) in the presence of dynamic offloading (a qualitative variable), using a risk-adjusted monitoring framework. Both the simulation and Fog manufacturing case studies show the advantage of the proposed method over the existing literature under the influence of dynamic offloading.

    1 Introduction

    Fog manufacturing combines Fog and Cloud computing to provide efficient data analytics and support real-time decision-making [1]. Cloud computing has the advantage of powerful, centralized computation resources for executing computational tasks. Fog computing, in contrast, performs computation on much cheaper yet still capable Fog nodes. These nodes are also physically close to the manufacturing processes and sensors, which reduces communication and computation latency. Moreover, by keeping sensitive raw data and computation local to dedicated Fog nodes, Fog computing offers better computational reliability and data privacy than Cloud computing. Recent work has improved Fog manufacturing's computational capability by introducing dynamic offloading (DO) among Fog nodes [1, 2]. DO is a mechanism that offloads part of the computation tasks from a busy node to other less-occupied ones according to the computational resource availability in the network [3, 4]. DO often yields better computational redundancy, meaning that tasks from failed Fog nodes can be finished by others. Furthermore, DO provides lower latency by evenly distributing computation tasks among the available Fog nodes in the network [8].
    This study is motivated by a Fog computing-enabled additive manufacturing (AM) network, which consists of multiple AM machines simultaneously producing various products fulfilling different customer needs. Although AM can efficiently fabricate products under varied designs and requirements (e.g., different mechanical and geometric properties) [5], it is still a manufacturing process that requires intensive sensor-based quality monitoring during fabrication to maintain the final product quality [6]. In the past few years, Fog computing has proven to be an effective yet budget-friendly real-time sensing, computation, and monitoring solution for an AM network [1, 7]. For simplicity, we also refer to Fog computing-enabled AM as Fog manufacturing.
    Despite the advantages of Fog manufacturing, anomalies such as imbalanced workloads among Fog nodes and cyber-attacks can quickly consume bandwidth and CPU resources (high utilization) [2]. Although Fog nodes can dedicate themselves to their own AM machines for data collection and analysis to improve the system's reliability and responsiveness, a Cloud orchestrator is still necessary to guide DO among the Fog nodes [1]. The traffic flowing through the Fog manufacturing network makes it prone to various attacks via external visits and software updates. Due to weak security configurations and limited computational capacity for running anti-virus software, Fog nodes have become easy prey for hackers in recent years. Besides disrupting the operations of attacked Fog nodes, hackers have also harnessed large numbers of compromised Fog nodes to amplify the efficiency of DDoS attacks on servers and other essential devices [9]. As a result, preventing cyber-attacks on Fog manufacturing has attracted attention in recent years [10]. Furthermore, the recent adoption of dynamic offloading (DO) has created additional challenges in accurately monitoring Fog manufacturing because DO and anomalies impact monitored metrics simultaneously. Therefore, securing DO-enabled Fog manufacturing is an essential yet challenging job.
    This research addresses the need for a statistically well-defined and adaptive anomaly detection framework under the influence of dynamic offloading in Fog manufacturing [1, 2]. The past literature detected Fog manufacturing anomalies by monitoring key runtime metrics, such as bandwidth and CPU utilization [1]. However, focusing on only one or two specific runtime metrics cannot differentiate whether irregular runtime metric values come from anomalies or from common network variations, such as the different computation tasks or data characteristics existing among Fog nodes [11]. We refer to the factors effectively reflecting the natural variations among individuals (e.g., Fog nodes) as risk factors [25]. In Fog manufacturing, the risk factors can include different Fog nodes’ data summary statistics, computational complexity, etc. These factors must be considered to monitor the actual anomalies accurately. The statistical process control (SPC) literature refers to model-based control charts that monitor changes in the model parameters associated with risk factors as risk-adjusted control charts. Instead of observing abnormal values on specific runtime metrics, such charts detect deviations in the baseline model's parameters, which reflect the relationships between the risk factors and the monitored runtime metrics. Therefore, although the variation of risk factors constantly leads to variation in the monitored runtime metrics, a system remaining in normal condition does not alter the model parameters. When actual anomalies occur and introduce unexpected variation into the relationships between the risk factors and the monitored runtime metrics, a risk-adjusted chart detects such events through the varying model parameters. An example of such an anomaly is a system performing the same routine tasks suddenly suffering from unusually high CPU utilization.
    However, another challenge is that the existing literature does not consider the impact of dynamic offloading (DO) on the risk-adjusted model. Since we study anomalies over Fog manufacturing's long-term operation, the impact of DO on the risk-adjusted model quickly becomes intractable. No specific risk factor or runtime metric fully quantifies the impact of dynamic offloading on the risk-adjusted model. Therefore, the existing risk-adjusted control charts cannot distinguish anomalies from DO. In fact, since 37% of computation tasks end up utilizing DO in our Fog manufacturing application, ignoring DO prevents accurate anomaly detection.
    Figure 1 illustrates Fog manufacturing anomaly scenarios in which cyber-attacks and imbalanced workloads occur. As shown in Figure 1, when Node 1 is under a DDoS attack, the communication bandwidth of Node 1 is fully occupied, and the node cannot execute any computation or communication requests from the manufacturing system. Although Node 2 is not attacked, the utilization of Node 2 is also extremely high compared with the normal nodes. This anomaly event, called an imbalanced workload, arises from inappropriate task assignment strategies among Fog nodes. The monitoring statistics should signal “out of control” when the above anomalies happen. This research aims to jointly monitor the relationships among quantitative variables (risk factors and runtime metrics) under the qualitative variable's influence (whether DO shows up or not) to detect anomalies accurately in Fog manufacturing.
    Fig. 1.
    Fig. 1. An illustration of anomalies in Fog manufacturing.
    Deng and Jin [12] proposed quantitative and qualitative (QQ) models using constrained likelihood estimation to obtain the relationships among quantitative variables under the impact of a qualitative variable. That study paved the way for joint monitoring of the two types of responses through a risk-adjusted control chart. Motivated by the QQ model, we propose a QQ-based risk-adjusted control chart that monitors the quantitative model's variation to detect anomalies under different qualitative variable values (e.g., whether DO exists). As the first work adopting risk-adjusted control charts for anomaly detection in Fog manufacturing, we mainly focus on developing the Phase I control chart, which identifies the anomaly/change point given the full data of the entire run [13].
    Compared with previous monitoring works in IoT, this work's contribution is three-fold: (1) it is among the first few research works that propose a statistically well-defined risk-adjusted anomaly detection method for Fog computing and manufacturing; (2) it identifies anomalies in a Fog system under the heavy influence of dynamic offloading (DO); (3) it adopts a well-defined approach to generate the out-of-control threshold/upper control limit for any Type-I error rate (false-alarm rate) to balance the trade-off between false alarms and missed detections. The rest of the paper is organized as follows: Section 2 introduces the proposed risk-adjusted control chart with QQ responses; Section 3 presents a simulation study examining the performance of the proposed method in discovering process changes; Section 4 implements a Fog manufacturing testbed to evaluate the proposed method; Section 5 concludes and discusses future research directions.

    2 Anomaly Detection in Cybersecurity and Statistical Process Control

    Before discussing the proposed method, we review the limitations of existing methods in the literature on anomaly detection in cybersecurity and statistical process control (SPC) [14, 15]. To mitigate security issues such as DDoS (distributed denial of service) attacks, Stolfo, Salem, and Keromytis [16] proposed a disinformation attack that confronts unauthorized access with a large amount of decoy information. Kulkarni, Saha, and Hockenbury [17] studied counter-measures for adversary models in Fog manufacturing. Wells, Camelio, Williams, and White [18] discussed the importance of cyber-security research in manufacturing and illustrated a case study of how a cyber-attack impacts manufacturing. Stojmenovic, Wen, Huang, and Luan [19] studied the man-in-the-middle attack, which compromises Fog devices with fake ones, based on monitoring CPU and memory consumption. Vishwanath, Peruri, and He [20] studied encryption methods to enhance the security of communication in Fog computing. Sohal, Sandhu, Sood, and Chang [21] proposed a two-stage Markov model to identify malicious Fog devices. Sklavounos, Edoh, and Plytas [22] applied EWMA and CUSUM control charts for Root to Local (R2L) intrusion detection. As an improvement, recurrence quantification analysis (RQA) considers important features, such as the number of packets and bytes, to jointly monitor for anomalies. For example, Jeyanthi, Vinithra, Thandeeswaran, and Iyengar [23] adopt recurrence plots and several classical RQA measurements, including recurrence rate and entropy, to detect DDoS attacks. Similarly, Righi and Nunes [24] adopt RQA with input variables identified from the literature and incorporate adaptive clustering to detect DDoS attacks. However, a challenge preventing a dynamic offloading-enabled Fog manufacturing network from adopting RQA is the lack of a rigorously defined framework for determining key RQA parameter values, such as the recurrence radius.
Due to the volatile nature of computation and communication patterns in Fog manufacturing under the dynamic offloading mechanism, adaptively adjusting these parameters is challenging. Existing research has adopted pre-tuned parameters from the prior literature, which performs its computational studies on a similar set of well-established public-domain DDoS datasets. There is a pressing need for a justified parameter selection framework to suit a broader range of applications in the future [24].
    For a well-defined method with key parameters automatically learned from the underlying Fog system, we adopt the statistical process control (SPC) framework, widely used for anomaly detection in manufacturing and cybersecurity. Specifically, there are two groups of control charts in SPC. The first group mainly focuses on monitoring qualitative outcomes (categorical variables). For example, Steiner, Cook, Farewell, and Treasure [25] proposed a risk-adjusted cumulative sum (CUSUM) control chart to monitor patients' survival status. They adopt Bernoulli distribution-based logistic regression to adjust each patient's risk when applying a CUSUM control chart to monitor the log-likelihood survival score. Spiegelhalter, Grigg, Kinsman, and Treasure [26] proposed a more general framework of risk-adjusted CUSUM control charts based on a sequential probability ratio test, which treats the work of Steiner, Cook, Farewell, and Treasure [25] as a special case. Cook, Steiner, Cook, Farewell, and Morton [27] proposed a Shewhart p-chart based on adjustable control limits and risk-adjusted exponentially weighted moving average (EWMA) control charts based on traditional EWMA charts [28]. However, the above methods only apply to qualitative response variables; hence, they cannot monitor Fog manufacturing's continuous computational metrics. Furthermore, these methods do not incorporate variable selection, such as LASSO regression [29], into the risk-adjusted control chart, and thus cannot remove irrelevant variables when estimating the risk-adjusted model.
    The second group mainly focuses on monitoring quantitative measures (continuous variables). For example, Biswas and Kalbfleisch [30] proposed a Cox-model-based risk-adjusted CUSUM control chart to monitor survival time. Steiner and Jones [31] proposed an updating EWMA control chart to monitor risk-adjusted survival time. To handle censored data, which is frequently seen in many healthcare and manufacturing applications, Sego, Reynolds, and Woodall [32] proposed a log-likelihood-based risk-adjusted survival time CUSUM control chart (RAST-CUSUM). Steward and Rigdon [33] proposed a Bayesian approach to monitor model parameter changes using a change-point method. Other risk-adjusted control charts for quantitative variables include risk-adjusted parametric control charts [34], auxiliary information-based control charts [35], and Cox-model-based adjusted control charts [30]. Nonparametric monitoring for quantitative variables has also been studied, such as nonparametric regression monitoring [36], mixed-effect model monitoring [37], and location-and-scale monitoring [38]. However, these methods cannot distinguish whether the changes in the risk-adjusted model parameters come from dynamic offloading or actual anomalies. Furthermore, these methods also do not incorporate variable selection to remove irrelevant variables when monitoring. Lastly, applying risk-adjusted control charts for cyber-security in Fog computing and manufacturing has not been widely studied.

    3 Phase I Risk-adjusted Control Chart with QQ Responses

    There are three assumptions for the proposed method. First, we assume that the monitored system has both quantitative and qualitative (QQ) response variables, formally known as a QQ system [12]. Second, the quantitative variable's input-output relationship (e.g., model parameters) is conditioned on the qualitative response value (e.g., with or without dynamic offloading). Third, we assume a change point/anomaly at time \( \tau \) over the time horizon \( t = 1, \ldots ,m \) . We propose a quantitative and qualitative variable-based control chart to jointly monitor the two types of response variables using the risk-adjusted likelihood ratio test framework (QQ-RA-LRT). The QQ-RA-LRT employs a conditional linear regression model that predicts the quantitative response based on the value of the qualitative model's output. We provide a table of notations (Table 1) that readers can refer to for a better understanding of the proposed methodology.
    In the QQ system, denote the quantitative response at time \( i \) as \( {y_i} \) following a Gaussian distribution, and the qualitative response at time \( i \) as \( {z_i} \) following a Bernoulli distribution. In Fog manufacturing, we treat the risk factors, reflecting the properties of the underlying computational tasks and the corresponding datasets, as quantitative variables. In the Fog manufacturing literature, experimental studies show that modeling these variables with a Gaussian distribution yields satisfactory performance [1]. Furthermore, we treat the indicator of whether dynamic offloading (DO) is present in Fog manufacturing as the Bernoulli-distributed variable due to its binary nature [39]. In summary, we have,
    \( \begin{equation} {y_i}\sim\left\{ {\begin{array}{@{}*{1}{c}@{}} {Gaussian\left( {{z_i}{\boldsymbol{x}}_i^{\rm{'}}{\boldsymbol{\beta }}_1^{\left( {1\tau } \right)} + \left( {1 - {z_i}} \right){\boldsymbol{x}}_i^{\rm{'}}{\boldsymbol{\beta }}_2^{\left( {1\tau } \right)},\sigma } \right),i \le \tau }\\ {Gaussian\left( {{z_i}{\boldsymbol{x}}_i^{\rm{'}}{\boldsymbol{\beta }}_1^{\left( {\tau m} \right)} + \left( {1 - {z_i}} \right){\boldsymbol{x}}_i^{\rm{'}}{\boldsymbol{\beta }}_2^{\left( {\tau m} \right)},\sigma } \right),i > \tau } \end{array}} \right\}, \end{equation} \)
    (1)
    and
    \( \begin{equation} {z_i} = \left\{ {\begin{array}{@{}*{1}{c}@{}} {Bernoulli\left( {\frac{{\exp \left( {{\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {1\tau } \right)}}} \right)}}{{1 + \exp \left( {{\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {1\tau } \right)}}} \right)}}} \right),i \le \tau }\\ {Bernoulli\left( {\frac{{\exp \left( {{\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {\tau m} \right)}}} \right)}}{{1 + \exp \left( {{\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {\tau m} \right)}}} \right)}}} \right),i > \tau } \end{array}} \right\}. \end{equation} \)
    (2)
    Denote the observation data as ( \( {{\boldsymbol{x}}_i} \) , \( {y_i} \) , \( {z_i} \) ), \( i = 1, \ldots ,m \) , where \( {y_i} \in ( { - \infty , + \infty } ) \) and \( {z_i} \in \{ {0,1} \} \) . The vector of predictors \( {{\boldsymbol{x}}_i} = {( {{x_1}, \ldots ,{x_p}} )^{\rm{'}}} \) represents the monitored process variables. In this case, a risk-adjusted change-point control chart for monitoring a QQ system tracks the likelihood ratio test score \( \Lambda ( \tau ) \) , the logarithm of the ratio between the alternative and null likelihoods, i.e., the difference between the log-likelihoods \( {l_a} \) and \( {l_0} \) [13]
    \( \begin{equation} \Lambda \left( \tau \right) = {l_a} - {l_0}, \end{equation} \)
    (3)
    where \( {l_a} \) represents the log-likelihood of the alternative hypothesis: \( ( {{\boldsymbol{\beta }}_1^{( {1\tau } )},{\boldsymbol{\beta }}_2^{( {1\tau } )},{{\boldsymbol{\eta }}^{( {1\tau } )}}} ) \ne ( {{\boldsymbol{\beta }}_1^{( {\tau m} )},{\boldsymbol{\beta }}_2^{( {\tau m} )},{{\boldsymbol{\eta }}^{( {\tau m} )}}} ) \) with
    \( \begin{eqnarray} {l_a} &=& \log \left\{ \mathop \prod \limits_{i = 1}^\tau f\left( {y_i},{z_i};{{\boldsymbol{x}}_i},{\boldsymbol{\beta }}_1^{\left( {1\tau } \right)},{\boldsymbol{\beta }}_2^{\left( {1\tau } \right)},{{\boldsymbol{\eta }}^{\left( {1\tau } \right)}} \right)\mathop \prod \limits_{i = \tau + 1}^m f\left( {y_i},{z_i};{{\boldsymbol{x}}_i},{\boldsymbol{\beta }}_1^{\left( {\tau m} \right)},{\boldsymbol{\beta }}_2^{\left( {\tau m} \right)},{{\boldsymbol{\eta }}^{\left( {\tau m} \right)}} \right) \right\}\nonumber\\ &=& \mathop \sum \limits_{i = 1}^\tau \left\{ {z_i}{\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {1\tau } \right)}} - \log \left( 1 + \exp \left( {\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {1\tau } \right)}} \right) \right) \right\} + \mathop \sum \limits_{i = \tau + 1}^m \left\{ {z_i}{\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {\tau m} \right)}} - \log \left( 1 + \exp \left( {\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {\tau m} \right)}} \right) \right) \right\}\nonumber\\ && -\ \frac{m}{2}\ln \left( {2\pi } \right) - \frac{m}{2}\ln \left( {{\sigma ^2}} \right) - \frac{1}{{2{\sigma ^2}}}\mathop \sum \limits_{i = 1}^\tau \left\{ {z_i}{{\left( {{y_i} - {\boldsymbol{x}}_i^{\rm{'}}{\boldsymbol{\beta }}_1^{\left( {1\tau } \right)}} \right)}^2} + \left( {1 - {z_i}} \right){{\left( {{y_i} - {\boldsymbol{x}}_i^{\rm{'}}{\boldsymbol{\beta }}_2^{\left( {1\tau } \right)}} \right)}^2} \right\} \\ && -\ \frac{1}{{2{\sigma ^2}}}\mathop \sum \limits_{i = \tau + 1}^m \left\{ {z_i}{{\left( {{y_i} - {\boldsymbol{x}}_i^{\rm{'}}{\boldsymbol{\beta }}_1^{\left( {\tau m} \right)}} \right)}^2} + \left( {1 - {z_i}} \right){{\left( {{y_i} - {\boldsymbol{x}}_i^{\rm{'}}{\boldsymbol{\beta }}_2^{\left( {\tau m} \right)}} \right)}^2} \right\}\nonumber \end{eqnarray} \)
    (4)
    and \( {l_0} \) represents the log-likelihood of the null hypothesis \( ( {{\boldsymbol{\beta }}_1^{( {1\tau } )},{\boldsymbol{\beta }}_2^{( {1\tau } )},{{\boldsymbol{\eta }}^{( {1\tau } )}}} ) = ( {{\boldsymbol{\beta }}_1^{( {\tau m} )},{\boldsymbol{\beta }}_2^{( {\tau m} )},{{\boldsymbol{\eta }}^{( {\tau m} )}}} ) \) ,
    \( \begin{eqnarray} {l_0} &=& \log \left\{ \mathop \prod \limits_{i = 1}^m f\left( {y_i},{z_i};{{\boldsymbol{x}}_i},{\boldsymbol{\beta }}_1^{\left( {1m} \right)},{\boldsymbol{\beta }}_2^{\left( {1m} \right)},{{\boldsymbol{\eta }}^{\left( {1m} \right)}} \right) \right\}\nonumber\\ &=& \mathop \sum \limits_{i = 1}^m \left\{ {z_i}{\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {1m} \right)}} - \log \left( 1 + \exp \left( {\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {1m} \right)}} \right) \right) \right\} - \frac{m}{2}\ln (2\pi ) - \frac{m}{2}\ln ({\sigma ^2})\\ && -\ \frac{1}{{2{\sigma ^2}}}\mathop \sum \limits_{i = 1}^m \left\{ {z_i}{{\left( {{y_i} - {\boldsymbol{x}}_i^{\rm{'}}{\boldsymbol{\beta }}_1^{\left( {1m} \right)}} \right)}^2} + \left( {1 - {z_i}} \right){{\left( {{y_i} - {\boldsymbol{x}}_i^{\rm{'}}{\boldsymbol{\beta }}_2^{\left( {1m} \right)}} \right)}^2} \right\}\!,\nonumber \end{eqnarray} \)
    (5)
    where \( f( \cdot ) \) denotes the joint density of the QQ model given \( {{\boldsymbol{x}}_i} \) . When \( \Lambda ( \tau ) \) exceeds the upper control limit (UCL), the process is out of control. We use Monte Carlo simulation to determine the UCL values of QQ-RA-LRT since the distribution of the auto-correlated \( \Lambda ( \tau ) \) is analytically intractable [13]. Following the Monte Carlo approach, we first use the baseline QQ models estimated from the in-control data to generate a set of \( m \) observations in time series and calculate \( P \) using (6).
    \( \begin{equation} P = \mathop {\max }\limits_{\tau = 1, \ldots ,m} \Lambda \left( \tau \right)\!, \end{equation} \)
    (6)
    We repeat this procedure \( r \) times, and the UCL is obtained as the \( 100( {1 - \alpha } ) \) th percentile of all \( P \) values collected over the \( r \) trials, where \( \alpha \) is the pre-determined Type-I error rate (e.g., \( \alpha = 0.05 \) as a common practice). This strategy is flexible, as we can adopt any value of \( \alpha \) . The joint estimation of the model parameters for the baseline QQ models (e.g., \( ( {{\boldsymbol{\beta }}_1^{( {1\tau } )},{\boldsymbol{\beta }}_2^{( {1\tau } )},{{\boldsymbol{\eta }}^{( {1\tau } )}}} ) \) or \( ( {{\boldsymbol{\beta }}_1^{( {\tau m} )},{\boldsymbol{\beta }}_2^{( {\tau m} )},{{\boldsymbol{\eta }}^{( {\tau m} )}}} ) \) ) based on in-control data is not trivial due to its non-convexity. As an alternative, we solve the problem using the reformulated constrained quadratic optimization proposed in [12], which is guaranteed to reach a global optimum.
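The Monte Carlo procedure for obtaining the UCL can be sketched as follows. Here `simulate_run` is a hypothetical user-supplied hook (not from the paper) that generates one in-control run of length \( m \) from the baseline QQ models and returns \( P = \max_\tau \Lambda(\tau) \) per Equation (6); numpy is assumed to be available.

```python
import numpy as np

def monte_carlo_ucl(simulate_run, alpha=0.05, r=1000, rng=None):
    """Phase I upper control limit via Monte Carlo.

    simulate_run(rng) -> P, the maximum likelihood-ratio score of one
    simulated in-control run (hypothetical hook wrapping the baseline
    QQ models and Equation (6)).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Collect P over r independent in-control replications
    p_values = np.array([simulate_run(rng) for _ in range(r)])
    # UCL is the 100(1 - alpha)th percentile of the r maxima
    return np.quantile(p_values, 1 - alpha)
```

Any value of \( \alpha \) can be plugged in, which is what gives the chart its adjustable false-alarm rate.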
    As benchmarks in the later computational studies, we consider either only monitoring a binary response variable \( {z_i} \) following Bernoulli distribution, or a continuous response variable \( {y_i} \) following a Gaussian distribution with a mean of \( {\theta _i} \) and a standard deviation of \( \sigma \) . Similar to the QQ-RA-LRT, a risk-adjusted change-point likelihood ratio test control chart for qualitative/binary responses (Qual-RA-LRT) is,
    \( \begin{equation} \Lambda \left( \tau \right) = {l_a} - {l_0}, \end{equation} \)
    (7)
    where \( {l_a} \) represents the log-likelihood of the alternative hypothesis ( \( {{\boldsymbol{\eta }}^{( {0\tau } )}} \ne {{\boldsymbol{\eta }}^{( {\tau m} )}} \) ) and \( {l_0} \) represents the log-likelihood of the null hypothesis ( \( {{\boldsymbol{\eta }}^{( {0\tau } )}} = {{\boldsymbol{\eta }}^{( {\tau m} )}} \) ). For \( {z_i} \) ,
    \( \begin{eqnarray} {l_a} &=& \log \left\{ \mathop \prod \limits_{i = 1}^\tau Bernoulli\left( {z_i};{{\boldsymbol{x}}_i},{{\boldsymbol{\eta }}^{\left( {0\tau } \right)}} \right)\mathop \prod \limits_{i = \tau + 1}^m Bernoulli\left( {z_i};{{\boldsymbol{x}}_i},{{\boldsymbol{\eta }}^{\left( {\tau m} \right)}} \right) \right\}\nonumber\\ &=& \mathop \sum \limits_{i = 1}^\tau \left\{ {z_i}{\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {0\tau } \right)}} - \log \left( 1 + \exp \left( {\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {0\tau } \right)}} \right) \right) \right\} + \mathop \sum \limits_{i = \tau + 1}^m \left\{ {z_i}{\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {\tau m} \right)}} - \log \left( 1 + \exp \left( {\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {\tau m} \right)}} \right) \right) \right\} \end{eqnarray} \)
    (8)
    and
    \( \begin{equation} {l_0} = \log \left\{ \mathop \prod \limits_{i = 1}^m Bernoulli\left( {z_i};{{\boldsymbol{x}}_i},{{\boldsymbol{\eta }}^{\left( {0m} \right)}} \right) \right\} = \mathop \sum \limits_{i = 1}^m \left\{ {z_i}{\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {0m} \right)}} - \log \left( 1 + \exp \left( {\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{\left( {0m} \right)}} \right) \right) \right\}. \end{equation} \)
    (9)
    Similarly, a risk-adjusted change-point likelihood ratio test control chart for quantitative/continuous responses (Quan-RA-LRT) is
    \( \begin{eqnarray} {l_a} &=& \log \left\{ \mathop \prod \limits_{i = 1}^\tau Gaussian\left( {y_i};{{\boldsymbol{x}}_i},{{\boldsymbol{\beta }}^{\left( {0\tau } \right)}} \right)\mathop \prod \limits_{i = \tau + 1}^m Gaussian\left( {y_i};{{\boldsymbol{x}}_i},{{\boldsymbol{\beta }}^{\left( {\tau m} \right)}} \right) \right\} \nonumber\\ &=& - \frac{m}{2}\ln \left( {2\pi } \right) - \frac{m}{2}\ln \left( {{\sigma ^2}} \right) - \frac{1}{{2{\sigma ^2}}}\mathop \sum \limits_{i = 1}^\tau {\left( {{y_i} - {\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\beta }}^{\left( {0\tau } \right)}}} \right)^2} - \frac{1}{{2{\sigma ^2}}}\mathop \sum \limits_{i = \tau + 1}^m {\left( {{y_i} - {\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\beta }}^{\left( {\tau m} \right)}}} \right)^2} \end{eqnarray} \)
    (10)
    and
    \( \begin{equation} {l_0} = \log \left\{ \mathop \prod \limits_{i = 1}^m Gaussian\left( {y_i};{{\boldsymbol{x}}_i},{{\boldsymbol{\beta }}^{\left( {0m} \right)}} \right) \right\} = - \frac{m}{2}\ln \left( {2\pi } \right) - \frac{m}{2}\ln \left( {{\sigma ^2}} \right) - \frac{1}{{2{\sigma ^2}}}\mathop \sum \limits_{i = 1}^m {\left( {{y_i} - {\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\beta }}^{\left( {0m} \right)}}} \right)^2} \end{equation} \)
    (11)
    To determine the UCLs of Qual-RA-LRT and Quan-RA-LRT, we adopt Monte Carlo simulation in the same fashion as for QQ-RA-LRT, since the distributions of the auto-correlated \( \Lambda ( \tau ) \) in these cases are also analytically intractable [13].
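As a minimal sketch of how the monitoring statistic \( \Lambda(\tau) = l_a - l_0 \) from Equation (3) can be evaluated for the QQ model, the code below assumes the segment-specific parameters are supplied by some fitting routine `fit` (e.g., the constrained estimator of Deng and Jin [12], which we do not reproduce here). All function names are ours, and numpy is assumed.

```python
import numpy as np

def qq_log_lik(x, y, z, beta1, beta2, eta, sigma):
    """Joint log-likelihood of the QQ model (Equations (1)-(2)) on one segment.

    x: (n, p) predictors; y: (n,) Gaussian response; z: (n,) binary response.
    """
    lin = x @ eta
    # Bernoulli (logistic) part: z_i x_i'eta - log(1 + exp(x_i'eta))
    ll_z = np.sum(z * lin - np.log1p(np.exp(lin)))
    # Gaussian part: the mean switches between beta1 and beta2 according to z_i
    mu = z * (x @ beta1) + (1 - z) * (x @ beta2)
    n = len(y)
    ll_y = -0.5 * n * np.log(2 * np.pi * sigma**2) - np.sum((y - mu) ** 2) / (2 * sigma**2)
    return ll_z + ll_y

def qq_lrt_statistic(x, y, z, tau, fit, sigma):
    """Lambda(tau) = l_a - l_0 from Equation (3).

    `fit(x, y, z)` is any routine returning (beta1, beta2, eta) for a data
    subset; parameter estimation itself is outside this sketch.
    """
    b1a, b2a, ea = fit(x[:tau], y[:tau], z[:tau])   # pre-change segment
    b1b, b2b, eb = fit(x[tau:], y[tau:], z[tau:])   # post-change segment
    b10, b20, e0 = fit(x, y, z)                     # pooled (null) model
    l_a = (qq_log_lik(x[:tau], y[:tau], z[:tau], b1a, b2a, ea, sigma)
           + qq_log_lik(x[tau:], y[tau:], z[tau:], b1b, b2b, eb, sigma))
    l_0 = qq_log_lik(x, y, z, b10, b20, e0, sigma)
    return l_a - l_0
```

When the segment and pooled fits coincide, the statistic is exactly zero; with maximum-likelihood fits it is non-negative, and large values indicate a change point at \( \tau \).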

    4 Simulation Study

    This simulation aimed to compare the proposed QQ control chart's detection accuracy with other well-defined and competitive detection methods in the literature. Specifically, these included the widely adopted CUSUM control chart [40], the EWMA control chart [28], and risk-adjusted control charts. The risk-adjusted charts compared consider one type of response at a time, namely the Qual-RA-LRT and Quan-RA-LRT introduced in the previous section. The CUSUM and EWMA charts only monitored the selected quantitative response variable's changes without considering the risk factors or their corresponding model parameters. Qual-RA-LRT and Quan-RA-LRT only monitored the change of input-output relationships (model parameters) for the qualitative and quantitative variables separately. To comprehensively review the performance, we varied the out-of-control threshold from 1 to 3 standard deviations for CUSUM (denoted CUSUM 1 and CUSUM 3) [40]. We also varied the value of \( \lambda \) (the weight assigned to the most recent observation) between 0.3 and 0.7 for EWMA (denoted as EWMA 0.3 and EWMA 0.7). The process mean and standard deviation for CUSUM and EWMA were estimated from the first 25 samples, per the literature's suggestion [41].
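The EWMA benchmark described above can be sketched as follows. This is a simplified two-sided chart with asymptotic control limits; the function name and the width constant `L` are our assumptions, not the paper's.

```python
import numpy as np

def ewma_chart(y, lam, n_baseline=25, L=3.0):
    """Two-sided EWMA chart on a single runtime metric.

    Baseline mean/std come from the first n_baseline samples, matching the
    setup above. Returns the EWMA statistic and the first flagged index
    (None if never flagged).
    """
    mu0 = np.mean(y[:n_baseline])
    s0 = np.std(y[:n_baseline], ddof=1)
    z = np.empty(len(y))
    prev = mu0
    for i, yi in enumerate(y):
        # EWMA recursion: weight lam on the newest observation
        prev = lam * yi + (1 - lam) * prev
        z[i] = prev
    # Asymptotic limits: mu0 +/- L * s0 * sqrt(lam / (2 - lam))
    half_width = L * s0 * np.sqrt(lam / (2 - lam))
    flags = np.where(np.abs(z - mu0) > half_width)[0]
    return z, (int(flags[0]) if flags.size else None)
```

Setting `lam` to 0.3 or 0.7 reproduces the EWMA 0.3 and EWMA 0.7 configurations used in the comparison.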
    We first adopted the probability-based metric from Tian, Jin, Huang, and Camelio [42] for performance evaluation. Specifically, we compared the probability that an out-of-control point (anomaly) can be accurately identified within small fixed-size windows around the exact shift location. Assuming an anomaly at \( \tau \) , we measured the probabilities that the identified shift location falls within an interval centered at \( \tau \) . For example, \( {P_0} \) represents the probability of the identified shift location being precisely at \( \tau \) , \( {P_1} \) represents the probability of the identified shift location being within the interval between \( \tau - 1 \) and \( \tau + 1 \) , and \( {P_2} \) represents the probability of the identified shift location being within the interval between \( \tau - 2 \) and \( \tau + 2 \) . In addition, we also summarized the distance between the identified shift location and \( \tau \) as root-mean-squared errors (RMSEs) [13].
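These evaluation metrics can be computed directly from the estimated change points across replications; a minimal sketch (function name ours):

```python
import numpy as np

def window_probabilities(estimates, tau, windows=(0, 1, 2)):
    """P_k: fraction of replications whose estimated change point falls
    within [tau - k, tau + k], plus the RMSE of the estimates around tau."""
    est = np.asarray(estimates, dtype=float)
    probs = {f"P{k}": float(np.mean(np.abs(est - tau) <= k)) for k in windows}
    rmse = float(np.sqrt(np.mean((est - tau) ** 2)))
    return probs, rmse
```

For instance, estimates [500, 501, 498, 500] against the true change point 500 give P0 = 0.5, P1 = 0.75, and P2 = 1.0.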
    In the simulation, the model parameters for the quantitative model, \( {\boldsymbol{\beta }}_1^{( {1\tau } )} \) and \( {\boldsymbol{\beta }}_2^{( {1\tau } )} \) , and the model parameters for the qualitative model, \( {{\boldsymbol{\eta }}^{( {1\tau } )}} \) , before the anomaly at \( \tau \) were three-dimensional vectors. The values of the parameter vector \( {{\boldsymbol{\eta }}^{( {1\tau } )}} \) were generated from a uniform distribution \( Uniform( { - 2,2} ) \) , and \( {\boldsymbol{\beta }}_1^{( {1\tau } )} \) and \( {\boldsymbol{\beta }}_2^{( {1\tau } )} \) were generated as random unit vectors with an angle of \( {0^\circ} \) , \( {45^\circ} \) , \( {90^\circ} \) , or \( {180^\circ} \) from each other. To simulate the perturbation of the quantitative model parameters after the anomaly, we randomly simulated three-dimensional perturbation vectors and added them to \( {\boldsymbol{\beta }}_1^{( {1\tau } )} \) to generate \( {\boldsymbol{\beta }}_1^{( {\tau m} )} \) . These perturbation vectors were \( {\varDelta _1} \) : [0.5068, 0.8620, 0.0111], \( {\varDelta _2} \) : [0.2186, 0.9697, −0.1088], and \( {\varDelta _3} \) : [0.7847, −0.6164, 0.0652]. Similarly, to simulate the perturbation of the qualitative model parameters, we generated \( {{\boldsymbol{\eta }}^{( {\tau m} )}} \) by adding a randomly generated vector, with each element following \( Normal( {0,0.1} ) \) , to \( {{\boldsymbol{\eta }}^{( {1\tau } )}} \) . Additionally, to observe the performance under different magnitudes of change, all perturbation vectors were multiplied by a magnitude multiplier chosen from (0.1, 0.25, 0.5, 1) in different experimental settings.
    During the simulation setup, we first generated the simulation data as follows: to mimic the time-series nature of the input features \( {{\boldsymbol{x}}_i} \) , we adopted the first-order moving average (MA) model \( {{\boldsymbol{x}}_i} = {\boldsymbol{\mu }} + {{\boldsymbol{w}}_i} + \theta {{\boldsymbol{w}}_{i - 1}} \) [43], where each element in \( {\boldsymbol{\mu }} \) followed \( Normal( {0,0.1} ) \) , \( \theta = 0.1 \) , \( {{\boldsymbol{w}}_i}\sim Gaussian( {0,{\rm{\Sigma }}} ) \) , and \( {\rm{\Sigma }} \) was the covariance matrix with diagonal entries of 1 and off-diagonal entries of 0.1. Then, we generated \( {y_i} \) and \( {z_i} \) based on the simulated \( {{\boldsymbol{x}}_i} \) using revised versions of Equations (1) and (2) that include error terms for randomness. Specifically, we added random errors \( \epsilon \sim Gaussian( {0,0.1} ) \) to the linear term \( {z_i}{\boldsymbol{x}}_i^{\rm{'}}{\boldsymbol{\beta }}_1^{( {1\tau } )} + ( {1 - {z_i}} ){\boldsymbol{x}}_i^{\rm{'}}{\boldsymbol{\beta }}_2^{( {1\tau } )} \) in Equation (1), yielding \( {z_i}{\boldsymbol{x}}_i^{\rm{'}}{\boldsymbol{\beta }}_1^{( {1\tau } )} + ( {1 - {z_i}} ){\boldsymbol{x}}_i^{\rm{'}}{\boldsymbol{\beta }}_2^{( {1\tau } )} + \epsilon \) , and random errors \( \epsilon \sim Gaussian( {0,0.1} ) \) to the linear term \( {\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{( {1\tau } )}} \) in Equation (2), yielding \( {\boldsymbol{x}}_i^{\rm{'}}{{\boldsymbol{\eta }}^{( {1\tau } )}} + \epsilon \) . Using this setting, we simulated a time-series sequence of 1,000 observations of ( \( {{\boldsymbol{x}}_i} \) , \( {y_i} \) , \( {z_i} \) ) with the change point at the 500th observation.
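The data-generating procedure above can be sketched as follows. This is a hedged Python illustration under our own assumptions: we take Equation (2) to be a logistic model for the binary \( z_i \), as is typical for QQ models with binary responses, and all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, tau, theta = 3, 1000, 500, 0.1

# Covariance with unit diagonal and 0.1 off-diagonal, as in the setup
Sigma = np.full((d, d), 0.1) + 0.9 * np.eye(d)
mu = rng.normal(0.0, 0.1, size=d)

# First-order moving average: x_i = mu + w_i + theta * w_{i-1}
w = rng.multivariate_normal(np.zeros(d), Sigma, size=n + 1)
x = mu + w[1:] + theta * w[:-1]

# Qualitative response z_i (DO on/off) via a logistic link on x_i' eta
eta = rng.uniform(-2, 2, size=d)
logits = x @ eta + rng.normal(0.0, 0.1, size=n)   # noisy linear term
z = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Quantitative response y_i switches between beta_1 and beta_2 with z_i;
# beta_1 is perturbed by Delta_1 after the change point tau
beta1 = rng.normal(size=d)
beta1 /= np.linalg.norm(beta1)
beta2 = beta1.copy()                              # 0-degree initial angle
delta1 = np.array([0.5068, 0.8620, 0.0111])
beta1_after = beta1 + delta1
b1 = np.where(np.arange(n)[:, None] < tau, beta1, beta1_after)
y = (z * np.einsum("ij,ij->i", x, b1)
     + (1 - z) * (x @ beta2) + rng.normal(0.0, 0.1, size=n))
```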
    There were two types of out-of-control (after-anomaly) scenarios. Scenario 1 added perturbation vectors only to \( {\boldsymbol{\beta }}_1^{( {1\tau } )} \) to generate \( {\boldsymbol{\beta }}_1^{( {\tau m} )} \) . Scenario 2 added perturbation vectors to both \( {\boldsymbol{\beta }}_1^{( {1\tau } )} \) and \( {{\boldsymbol{\eta }}^{( {1\tau } )}} \) to generate \( {\boldsymbol{\beta }}_1^{( {\tau m} )} \) and \( {{\boldsymbol{\eta }}^{( {\tau m} )}} \) . In theory, methods that monitor only the change of the qualitative model could hardly yield superior performance in Scenario 1, since only the quantitative model parameters changed in that case. We summarize all the simulation settings in Table 2. For each scenario, 100 replications were performed. For simplicity of presentation, the main manuscript includes the figures for \( {0^\circ} \) and \( {45^\circ} \) as the initial angles between \( {\boldsymbol{\beta }}_1^{( {1\tau } )} \) and \( {\boldsymbol{\beta }}_2^{( {1\tau } )} \) and \( {\varDelta _1} \) as the vector of change. In addition, we include all of the original result values in the supplementary material for future reference.
    Table 1.
    \( t \) : time index
    \( \tau \) : time when anomalies show up
    \( {{\boldsymbol{x}}_i} \) : model predictors at time \( i \)
    \( {y_i} \) : quantitative response (continuous) at time \( i \)
    \( {z_i} \) : qualitative response (binary) at time \( i \)
    \( {\boldsymbol{\beta }}_1^{( {1\tau } )} \) : parameters for the quantitative model before the anomaly (without DO)
    \( {\boldsymbol{\beta }}_2^{( {1\tau } )} \) : parameters for the quantitative model before the anomaly (with DO)
    \( {{\boldsymbol{\eta }}^{( {1\tau } )}} \) : parameters for the qualitative model (whether DO exists) before the anomaly
    \( {\boldsymbol{\beta }}_1^{( {\tau m} )} \) : parameters for the quantitative model after the anomaly (without DO)
    \( {\boldsymbol{\beta }}_2^{( {\tau m} )} \) : parameters for the quantitative model after the anomaly (with DO)
    \( {{\boldsymbol{\eta }}^{( {\tau m} )}} \) : parameters for the qualitative model (whether DO exists) after the anomaly
    \( \sigma \) : standard deviation of the distribution
    \( {l_0} \) : log-likelihood of the null hypothesis (no anomaly)
    \( {l_a} \) : log-likelihood of the alternative hypothesis (with anomaly)
    \( \Lambda ( \tau ) \) : likelihood ratio test score
    Table 1. Table of Notations
    Table 2.
    Initial angles between \( {\boldsymbol{\beta }}_1^{( {1\tau } )} \) and \( {\boldsymbol{\beta }}_2^{( {1\tau } )} \) : \( {0^\circ} \) , \( {45^\circ} \) , \( {90^\circ} \) , \( {180^\circ} \)
    Vector of change: \( {\varDelta _1} \) , \( {\varDelta _2} \) , \( {\varDelta _3} \)
    Magnitude-of-change multiplier: 0.10, 0.25, 0.50, 1
    Out-of-control scenario: Scenario 1, Scenario 2
    Table 2. Summary of Simulation Settings
    Figures 2–9 present the results collected as we gradually increased the magnitude of change (denoted as \( M \) ) from 0.10 to 1 by multiplying it with the change vector \( {\varDelta _1} \) of [0.5068, 0.8620, 0.0111]. Based on the simulation settings, we expected that small values of \( M \) would make detection difficult, while increasing \( M \) would gradually improve the performance. As a result, the RMSEs of identifying the anomaly would decrease while the probability of identification would increase as \( M \) increased. Specifically, based on Figures 2–5 (the initial angle between \( {\boldsymbol{\beta }}_1^{( {1\tau } )} \) and \( {\boldsymbol{\beta }}_2^{( {1\tau } )} \) is \( {0^\circ} \) ), we observed that when the magnitude of the model parameter change was small, all control charts, including the proposed QQ-based control chart, could hardly detect the shift. In this case, all methods yielded large RMSEs and a small probability of identifying the shift. When the magnitude of the shift grew larger, the QQ-based control chart significantly outperformed the benchmark methods, especially the non-risk-adjusted charts. Another observation was that Qual-RA-LRT always yielded a large variation of the RMSE even when \( M \) increased. This observation was consistent with our earlier expectation that methods monitoring only the change of the qualitative model (e.g., Qual-RA-LRT) could hardly yield superior performance in Scenario 1, as only the quantitative model parameters changed in that case.
    Fig. 2.
    Fig. 2. Averages and standard errors (in error bars) of RMSEs when identifying the anomaly with perturbation on \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) ( \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) and \( {\boldsymbol{\beta}}_2^{( {1{\rm{\tau }}} )} \) held an angle of \( {0^\circ} \) ).
    Fig. 3.
    Fig. 3. Averages and standard errors (in error bars) of RMSEs when identifying the anomaly with perturbation on \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) and \( {{\boldsymbol{\eta}}^{( {1{\rm{\tau }}} )}} \) ( \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) and \( {\boldsymbol{\beta}}_2^{( {1{\rm{\tau }}} )} \) held an angle of \( {0^\circ} \) ).
    Fig. 4.
    Fig. 4. The probability of identifying the anomaly with perturbation on \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) ( \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) and \( {\boldsymbol{\beta}}_2^{( {1{\rm{\tau }}} )} \) held an angle of \( {0^\circ} \) ).
    Fig. 5.
    Fig. 5. The probability of identifying the anomaly with perturbation on \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) and \( {{\boldsymbol{\eta}}^{( {1{\rm{\tau }}} )}} \) ( \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) and \( {\boldsymbol{\beta}}_2^{( {1{\rm{\tau }}} )} \) held an angle of \( {0^\circ} \) ).
    Figures 6–9 show the results of additional simulations with a different initial angle ( \( {45^\circ} \) ) between \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) and \( {\boldsymbol{\beta}}_2^{( {1{\rm{\tau }}} )} \) . We again observed that when the magnitude of the model parameter change was small, all control charts, including the proposed QQ-based control chart, could hardly detect the shift. However, the proposed QQ-based control chart quickly outperformed the others as the magnitude of change increased. Eventually, the proposed QQ-based chart detected the change with nearly zero RMSE and 100% probability. Based on the additional results in the supplementary material, we observed similar results across the vectors of change ( \( {\varDelta _1} \) , \( {\varDelta _2} \) , \( {\varDelta _3} \) ) and the initial angles between \( {\boldsymbol{\beta }}_1^{( {1\tau } )} \) and \( {\boldsymbol{\beta }}_2^{( {1\tau } )} \) ( \( {0^\circ} \) , \( {45^\circ} \) , \( {90^\circ} \) , \( {180^\circ} \) ). This study showed that when the varying response of the qualitative model jointly impacted the quantitative model (the model parameters) together with anomalies, the proposed method separated the two effects and detected the true anomalies.
    Fig. 6.
    Fig. 6. Averages and standard errors (in error bars) of RMSEs when identifying the anomaly with perturbation on \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) ( \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) and \( {\boldsymbol{\beta}}_2^{( {1{\rm{\tau }}} )} \) held an angle of \( {45^\circ} \) ).
    Fig. 7.
    Fig. 7. Averages and standard errors of RMSEs when identifying the anomaly with perturbation on \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) and \( {{\boldsymbol{\eta}}^{( {1{\rm{\tau }}} )}} \) ( \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) and \( {\boldsymbol{\beta}}_2^{( {1{\rm{\tau }}} )} \) held an angle of \( {45^\circ} \) ).
    Fig. 8.
    Fig. 8. The probability of identifying the anomaly with perturbation on \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) ( \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) and \( {\boldsymbol{\beta}}_2^{( {1{\rm{\tau }}} )} \) held an angle of \( {45^\circ} \) ).
    Fig. 9.
    Fig. 9. The probability of identifying the anomaly with perturbation on \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) and \( {{\boldsymbol{\eta}}^{( {1{\rm{\tau }}} )}} \) ( \( {\boldsymbol{\beta}}_1^{( {1{\rm{\tau }}} )} \) and \( {\boldsymbol{\beta}}_2^{( {1{\rm{\tau }}} )} \) held an angle of \( {45^\circ} \) ).

    5 Case Study

    The case study's objective was to evaluate the proposed QQ control chart's performance under anomalies in a real dynamic offloading-enabled Fog manufacturing testbed [3]. A testbed of five fused deposition modeling (FDM) machines fabricated personalized products based on customers' requirements. During the manufacturing process, Fog nodes collected various types of real-time data from sensors, such as temperatures of the extruder and the platform and vibrations of the extruder and the platform along the x-, y-, and z-axes, for in situ quality monitoring and control purposes. This study focused on anomaly detection while Fog manufacturing performed computation tasks, such as regression-based product quality prediction [1, 2]. The Fog nodes collaborated with each other to finish computational tasks under the DO strategy [44]. The structure of the dynamic offloading-enabled Fog manufacturing testbed is shown in Figure 10. The first layer of the testbed was the orchestrator, which organized the computation offloading, data transformation, and communications among Fog nodes. We adopted the Python dispy distributed computation platform for computation assignment and let the orchestrator collect information, including CPU utilization and bandwidth consumption, among Fog nodes during the operation. The second layer was the Fog node layer, which contained five identical Raspberry Pi 3 nodes. These nodes served both as data collectors, dedicated to each FDM machine, and as computing devices, subject to collaboration under dynamic offloading. The third layer contained the five FDM machines. Under the continuous influence of dynamic offloading, we compared the proposed method with the benchmark methods on their anomaly detection efficiency, which is crucial for Fog manufacturing's computation and communication.
    Fig. 10.
    Fig. 10. Architecture of the Fog manufacturing testbed (Redrawn based on [2] with authors' permission).
    Based on the Fog manufacturing literature, we focused on the two most frequently encountered types of anomalies: (1) imbalanced workload among Fog nodes and (2) DDoS cyber-attacks. An imbalanced workload arises from an unreasonable assignment of computation tasks (i.e., an improper offloading strategy) that leads to imbalanced computation workloads among Fog nodes [2]. Since the CPU utilization among Fog nodes varies significantly under an imbalanced workload anomaly (i.e., some Fog nodes are extremely busy while others are idle), we chose the response ( \( {y_{\boldsymbol{i}}} \) ) to be the standard deviation of CPU utilization among Fog nodes. Furthermore, we chose the predictors ( \( {{\boldsymbol{x}}_i} \) ) to be the summary statistics (size, mean value, standard deviation, kurtosis, and skewness) of the dataset analyzed by the Fog nodes, the computational complexity of the data analysis, and the in situ CPU temperature [44]. For detecting DDoS, a malicious attack that floods the bandwidth and/or CPU utilization of the target system [2], we chose the response ( \( {y_{\boldsymbol{i}}} \) ) to be the bandwidth utilization, and we chose the predictors ( \( {{\boldsymbol{x}}_i} \) ) to be the same as those adopted for detecting the imbalanced workload. For complete details on the variable selection rationale and the calculation of each variable, we refer readers to [1, 2], which adopt the same case study setup. The binary variable ( \( {z_{\boldsymbol{i}}} \) ) indicated whether dynamic offloading (DO) was running in the network. In our case, dispy automatically determined and executed DO based on the user's settings.
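For illustration, the predictor construction for one computation task could look like the following sketch. The function name is ours, kurtosis here is the excess kurtosis, and the exact definitions used in [1, 2] may differ.

```python
import numpy as np

def predictor_features(data, computation_complexity, cpu_temp):
    """Build a predictor vector x_i for one computation task:
    summary statistics of the analyzed dataset (size, mean, standard
    deviation, kurtosis, skewness), the computational complexity of
    the data analysis, and the in situ CPU temperature."""
    data = np.asarray(data, dtype=float)
    m = data.mean()
    s = data.std()                                       # population std for moments
    skewness = np.mean((data - m) ** 3) / s ** 3
    kurtosis = np.mean((data - m) ** 4) / s ** 4 - 3.0   # excess kurtosis
    return np.array([data.size, m, data.std(ddof=1),
                     kurtosis, skewness,
                     computation_complexity, cpu_temp])

# Hypothetical usage for a small dataset analyzed by a Fog node
x_i = predictor_features([1.0, 2.0, 3.0, 4.0],
                         computation_complexity=2, cpu_temp=55.1)
```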
    Recall from the motivating example in the introduction that the dynamic offloading strategy reduces the computation time latency and improves the computation service's responsiveness for Fog manufacturing. We compared the QQ control chart with the same set of benchmarks from the simulation section to evaluate anomaly detection performance under dynamic offloading. Specifically, we generated imbalanced workload scenarios through an improper offloading strategy in which one Fog node was assigned 90% of the computation. We simulated the DDoS cyber-attack by launching an overwhelming number of simultaneous computation tasks on the testbed. For each type of anomaly, we first generated 240 in-control computation tasks and then 120 out-of-control computation tasks (i.e., 360 computation tasks in total) in the testbed. We summarize the sample sizes in Table 3. Note that each computational task represents one observation along a time-series sequence consisting of 360 tasks executed over hours. In total, there are two time-series sequences, one for each type of anomaly.
    Table 3.
    Scenario 1, Balanced Computation: 211 under dynamic offloading, 29 without
    Scenario 1, Imbalanced Computation: 102 under dynamic offloading, 18 without
    Scenario 2, W/O DDoS Cyber-attack: 213 under dynamic offloading, 27 without
    Scenario 2, With DDoS Cyber-attack: 95 under dynamic offloading, 25 without
    Table 3. Number of Computation Tasks (Sample Sizes for Anomaly Detection) Generated in the Case Study
    We used the sequence containing the first 200 in-control tasks as the training dataset to determine the UCL of the risk-adjusted control chart at 5% Type I error (95th percentile of likelihood ratio test scores). Then, we used the rest of the computation tasks as the testing dataset to evaluate the anomaly detection performance. We evaluated the anomaly detection performance using the distance between the actual and detected system shifts. The distances/errors are summarized in Table 4. Furthermore, we generated Figure 11 to show the raw ratio test scores \( \Lambda ( \tau ) \) for the risk-adjusted methods (QQ-RA-LRT, Quan-RA-LRT, and Qual-RA-LRT).
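The Phase I thresholding step described above can be sketched as follows. This is a minimal Python illustration with hypothetical function names and toy scores; the paper's actual likelihood-ratio computation is not reproduced here.

```python
import numpy as np

def control_limit(train_scores, alpha=0.05):
    """UCL as the (1 - alpha) empirical percentile of in-control
    likelihood-ratio test scores (5% Type I error -> 95th percentile)."""
    return float(np.percentile(train_scores, 100 * (1 - alpha)))

def detect_shift(test_scores, ucl):
    """Flag the candidate change point with the maximum score if it
    exceeds the UCL; return None when no anomaly is signaled."""
    test_scores = np.asarray(test_scores)
    k = int(np.argmax(test_scores))
    return k if test_scores[k] > ucl else None

# Toy usage: in-control scores with chi-square-like noise,
# one clear peak injected into the test sequence at index 120
rng = np.random.default_rng(1)
train = rng.chisquare(2, size=200)
test = rng.chisquare(2, size=160)
test[120] += 50.0
ucl = control_limit(train)
print(ucl, detect_shift(test, ucl))
```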
    Table 4.
    CUSUM 1: Imbalanced Computation 53, DDoS Cyber-attack 28
    CUSUM 3: Imbalanced Computation N/A, DDoS Cyber-attack 5
    EWMA 0.3: Imbalanced Computation 76, DDoS Cyber-attack 28
    EWMA 0.7: Imbalanced Computation 72, DDoS Cyber-attack 2
    Qual-RA-LRT: Imbalanced Computation 8, DDoS Cyber-attack 14
    Quan-RA-LRT: Imbalanced Computation 12, DDoS Cyber-attack 1
    QQ-RA-LRT: Imbalanced Computation 2, DDoS Cyber-attack 0
    Table 4. Distance Between the Actual System Shift and the Detected System Shift (N/A: no abnormal event was detected)
    Fig. 11.
    Fig. 11. The raw log-likelihood ratio test scores ( \( \Lambda ( \tau ) \) ) with the detected change points (red dots) and actual change points (black line) ((a) imbalanced computation; (b) DDoS cyber-attack).
    In Table 4, we first observed that the proposed QQ-RA-LRT method yielded a significantly shorter distance (error) between the actual anomaly location and the detected location for the Imbalanced Computation case. This result indicated that the proposed method accurately detected the imbalanced computation and its impact on the model parameters under the influence of dynamic offloading. For the DDoS cyber-attack scenario, however, EWMA 0.7 and Quan-RA-LRT yielded only slightly worse results than the proposed method. We found that the DDoS cyber-attack increased the bandwidth utilization distinguishably regardless of dynamic offloading. As a result, QQ-RA-LRT was not the only method that efficiently detected the DDoS cyber-attack.
    Figure 11 shows that QQ-RA-LRT yielded a much smoother curve of likelihood ratio test scores and a more distinguishable peak at the exact change point than the other two types of risk-adjusted control charts (Quan-RA-LRT and Qual-RA-LRT). These observations indicate that QQ-RA-LRT offered more confident and smoother change-point detection than risk-adjusted control charts that consider only one type of response at a time (e.g., ignoring the impact of DO when monitoring the runtime metrics).

    6 Conclusion

    Fog manufacturing offers efficient data collection and analytics to support real-time decision-making. It also, however, suffers from various anomalies, such as imbalanced computational workloads and cyber-attacks. Such anomalies significantly reduce the reliability and responsiveness of Fog manufacturing's operation. Furthermore, detecting anomalies becomes more challenging with frequently encountered dynamic offloading (DO), as DO impacts Fog manufacturing jointly with anomalies. We formulated such a problem as anomaly detection in a quantitative and qualitative (QQ) system [12]. Specifically, we modeled the relationships among the risk factors and the monitored runtime metrics in a quantitative model but estimated separate sets of in-control model parameters depending on whether dynamic offloading was running. As a result, we could adopt the most accurate set of in-control model parameters to detect anomalies regardless of whether dynamic offloading was present. Along this direction, we derived a statistically well-defined QQ risk-adjusted control chart to monitor the anomalies by identifying their impact on the model parameters.
    We performed both the simulation study and the numerical study on the Fog manufacturing testbed to show that the proposed method better detects anomalies that impact Fog manufacturing jointly with dynamic offloading. We also showed that identifying the change of relationships (reflected in the model parameters) among the carefully selected risk factors and the monitored runtime metrics can effectively identify the anomalies. Several directions deserve further investigation. We focused only on the Phase I study of the monitoring to detect the anomaly. In the future, we would like to extend QQ-RA-LRT to Phase II to enable real-time anomaly detection without requiring the entire dataset.
    Furthermore, as advances in sensor instrumentation improve a sensor network's data collection capability, more variables are likely to be monitored, with more complex relationships among them. As a result, a nonparametric version of QQ-RA-LRT, which relaxes the model structure assumption of the QQ model, is a valuable direction for future work. Lastly, we did not specify where the attacks came from in this work, as we focused only on identifying the system's irregular behaviors after it is affected by anomalies. However, the different authentication policies among Fog nodes and the Cloud can significantly complicate Fog manufacturing anomaly detection. As a result, it is a promising direction to research anomaly detection under various authentication policies in Fog manufacturing and to study the interaction between the trust methods and the anomaly detection methods adopted.

    Supplementary Material

    li (li.zip): Supplemental movie, appendix, image, and software files for "Monitoring Runtime Metrics of Fog Manufacturing via a Qualitative and Quantitative (QQ) Control Chart"

    References

    [1]
    Yutong Zhang, Lening Wang, Xiaoyu Chen, and Ran Jin. 2019. Fog computing for distributed family learning in cyber-Manufacturing modeling. In Proceedings of the 2nd IEEE International Conference on Industrial Cyber Physical Systems (ICPS). IEEE, Taipei, 88--93.
    [2]
    Lening Wang, Yutong Zhang, and Ran Jin. 2020. A monitoring system for anomaly detection in fog manufacturing. In Proceedings of the 3rd IEEE Conference on Industrial Cyberphysical Systems (ICPS). IEEE, Tampere, Finland, 67--72.
    [3]
    Kai Guo, Mingcong Yang, Yongbing Zhang, and Yusheng Ji. 2018. An efficient dynamic offloading approach based on optimization technique for mobile edge computing. In Proceedings of the 6th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering (MobileCloud), IEEE, Bamberg, Germany, 29--36.
    [4]
    D. Huang, P. Wang, and D. Niyato. 2012. A dynamic offloading algorithm for mobile computing. IEEE Transactions on Wireless Communications. 11, 6 (2012), 1991–1995.
    [5]
    I. Gibson, D. Rosen, B. Stucker, and M. Khorasani. 2014. Additive manufacturing technologies. Springer.
    [6]
    H. Kim, Y. Lin, and T.-L. B. Tseng. 2018. A review on quality control in additive manufacturing. Rapid Prototyping Journal.
    [7]
    L. Wang, X. Chen, D. Henkel, and R. Jin. 2020. Family learning: A process modeling method for cyber-additive manufacturing network. IISE Transactions (2020), 1–20.
    [8]
    Lening Wang, Yutong Zhang, Xiaoyu Chen, and Ran Jin. 2020. Online computation performance analysis for distributed machine learning pipelines in fog manufacturing. In Proceedings of the 16th International Conference on Automation Science and Engineering (CASE). IEEE, Virtual, 1628--1633.
    [9]
    C. Kolias, G. Kambourakis, A. Stavrou, and J. Voas. 2017. DDoS in the IoT: Mirai and other botnets. Computer. 50, 7 (2017), 80–84.
    [10]
    Z. Liu, X. Yin, and Y. Hu. 2020. CPSS LR-DDoS detection and defense in edge computing utilizing DCNN Q-Learning. IEEE Access, 8 (2020), 42120–42130.
    [11]
    Lening Wang, Yutong Zhang, Xiaoyu Chen, and Ran Jin. 2020. Online computation performance analysis for distributed machine learning pipelines in fog manufacturing. In Proceedings of the 16th International Conference on Automation Science and Engineering (CASE). IEEE, Virtual, 1628--1633.
    [12]
    X. Deng and R. Jin. 2015. QQ models: Joint modeling for quantitative and qualitative quality responses in manufacturing systems. Technometrics 57, 3 (2015), 320–331.
    [13]
    K. Paynabar, J. Jin, and A. B. Yeh. 2012. Phase I risk-adjusted control charts for monitoring surgical performance by considering categorical covariates. Journal of Quality Technology 44, 1 (2012), 39–53.
    [14]
    S. Khan, S. Parkinson, and Y. Qin. 2017. Fog computing security: A review of current applications and security solutions. Journal of Cloud Computing 6, 1 (2017), 19.
    [15]
    G. Javadzadeh and A. M. Rahmani. 2020. Fog computing applications in smart cities: A systematic survey. Wireless Networks 26, 2 (2020), 1433–1457.
    [16]
    Salvatore J. Stolfo, Malek Ben Salem, and Angelos D. Keromytis. 2012. Fog computing: mitigating insider data theft attacks in the cloud. In Proceedings of the 2012 IEEE Symposium on Security and Privacy Workshops. IEEE, San Francisco, CA, 125--128.
    [17]
    Saurabh Kulkarni, Shayan Saha, and Ryler Hockenbury. 2014. Preserving privacy in sensor-fog networks. In Proceedings of the 9th International Conference for Internet Technology and Secured Transactions (ICITST'14). IEEE, London, UK, 96--99.
    [18]
    L. J. Wells, J. A. Camelio, C. B. Williams, and J. White. 2014. Cyber-physical security challenges in manufacturing systems. Manufacturing Letters 2, 2 (2014), 74–77.
    [19]
    I. Stojmenovic, S. Wen, X. Huang, and H. Luan. 2016. An overview of Fog computing and its security issues. Concurrency and Computation: Practice and Experience 28, 10 (2016), 2991–3005.
    [20]
    A. Vishwanath, R. Peruri, and J. S. He. 2016. Security in Fog computing through encryption. International Journal of Information Technology and Computer Science 8, 5 (2016), 28.
    [21]
    A. S. Sohal, R. Sandhu, S. K. Sood, and V. Chang. 2018. A cybersecurity framework to identify malicious edge device in Fog computing and Cloud-of-Things environments. Computers & Security 74 (2018), 340–354.
    [22]
    D. Sklavounos, A. Edoh, and M. Plytas. 2017. A statistical approach based on EWMA and CUSUM control charts for R2L intrusion detection. IEEE.
    [23]
    N. Jeyanthi, J. Vinithra, Sneha, R. Thandeeswaran, and N. Ch S. N. Iyengar. 2011. A recurrence quantification analytical approach to detect DDoS attacks. In Proceedings of the 2011 International Conference on Computational Intelligence and Communication Networks. IEEE, Gwalior, India, 58--62.
    [24]
    M. A. Righi and R. C. Nunes. 2019. Combining recurrence quantification analysis and adaptive clustering to detect DDoS attacks. The Cyber Defense Review (2019), 15–30.
    [25]
    S. H. Steiner, R. J. Cook, V. T. Farewell, and T. Treasure. 2000. Monitoring surgical performance using risk-adjusted cumulative sum charts. Biostatistics 1, 4 (2000), 441–452.
    [26]
    D. Spiegelhalter, O. Grigg, R. Kinsman, and T. Treasure. 2003. Risk-adjusted sequential probability ratio tests: Applications to Bristol, Shipman and adult cardiac surgery. International Journal for Quality in Health Care. 15, 1 (2003), 7–13.
    [27]
    D. A. Cook, S. H. Steiner, R. J. Cook, V. T. Farewell, and A. P. Morton. 2003. Monitoring the evolutionary process of quality: Risk-adjusted charting to track outcomes in intensive care. Critical Care Medicine. 31, 6 (2003), 1676–1682.
    [28]
    S. W. Roberts. 1959. Control chart tests based on geometric moving averages. Technometrics 1, 3 (1959), 239–250.
    [29]
    R. Tibshirani. 1996. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society. Series B (Methodological) 58, 1 (1996), 267–288.
    [30]
    P. Biswas and J. D. Kalbfleisch. 2008. A risk-adjusted CUSUM in continuous time based on the Cox model. Statistics in Medicine 27, 17 (2008), 3382–3406.
    [31]
    S. H. Steiner and M. Jones. 2010. Risk-adjusted survival time monitoring with an updating exponentially weighted moving average (EWMA) control chart. Statistics in Medicine 29, 4 (2010), 444–454.
    [32]
    L. H. Sego, M. R. Reynolds, and W. H. Woodall. 2009. Risk-adjusted monitoring of survival times. Statistics in Medicine 28, 9 (2009), 1386–1401.
    [33]
    R. M. Steward and S. E. Rigdon. 2017. Risk-adjusted monitoring of healthcare quality: Model selection and change-point estimation. Quality and Reliability Engineering International 33, 5 (2017), 979–992.
    [34]
    W. Albers, W. C. Kallenberg, and S. Nurdiati. 2004. Parametric control charts. Journal of Statistical Planning and Inference 124, 1 (2004), 159–184.
    [35]
    M. Riaz. 2008. Monitoring process mean level using auxiliary information. Statistica Neerlandica 62, 4 (2008), 458–481.
    [36]
    C. Zou, F. Tsung, and Z. Wang. 2008. Monitoring profiles based on nonparametric regression methods. Technometrics 50, 4 (2008), 512–526.
    [37]
    P. Qiu, C. Zou, and Z. Wang. 2010. Nonparametric profile monitoring by mixed effects modeling. Technometrics 52, 3 (2010), 265–277.
    [38]
    G. J. Ross, D. K. Tasoulis, and N. M. Adams. 2011. Nonparametric monitoring of data streams for changes in location and scale. Technometrics 53, 4 (2011), 379–389.
    [39]
    H. Bashir, S. Lee, and K. H. Kim. 2019. Resource allocation through logistic regression and multicriteria decision making method in IoT fog computing. Transactions on Emerging Telecommunications Technologies (2019), e3824.
    [40]
    E. S. Page. 1954. Continuous inspection schemes. Biometrika 41, 1/2 (1954), 100–115.
    [41]
    M. B. Khoo and S. Quah. 2004. Alternatives to the multivariate control chart for process dispersion. Quality Engineering 16, 3 (2004), 423–435.
    [42]
    W. Tian, R. Jin, T. Huang, and J. A. Camelio. 2017. Statistical process control for multistage processes with non-repeating cyclic profiles. IISE Transactions 49, 3 (2017), 320–331.
    [43]
    S. C. Hillmer and G. C. Tiao. 1979. Likelihood function of stationary multiple autoregressive moving average models. Journal of the American Statistical Association 74, 367 (1979), 652–660.
    [44]
    Lening Wang, Yutong Zhang, and Ran Jin. 2020. A monitoring system for anomaly detection in fog manufacturing. In Proceedings of the 3rd IEEE Conference on Industrial Cyberphysical Systems (ICPS). IEEE, Tampere, Finland, 67--72.

      Published In

      ACM Transactions on Internet of Things, Volume 3, Issue 2, May 2022, 214 pages. EISSN: 2577-6207. DOI: 10.1145/3505220.

      Publisher

      Association for Computing Machinery, New York, NY, United States

      Publication History

      Published: 17 March 2022
      Accepted: 01 November 2021
      Revised: 01 March 2021
      Received: 01 September 2020
      Published in TIOT Volume 3, Issue 2

      Author Tags

      1. Fog manufacturing
      2. Phase I monitoring
      3. quantitative and qualitative control chart
      4. runtime metrics

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • National Science Foundation
