In this section, two case studies are introduced to evaluate the performance of the proposed method. The first case study involves a numerical example, illustrating the monitoring results under different faulty conditions. The second case study is a fed-batch penicillin fermentation process, which is a benchmark process that is widely used in batch process simulations.
4.1. A Numerical Simulation
We consider a nonlinear system with 3 variables as follows:
where $x_1$, $x_2$, and $x_3$ are the process variables, $t$ represents the latent variable, $e_1$, $e_2$, and $e_3$ are independent Gaussian noises, and $a_1$, $a_2$, and $a_3$ represent the model parameters in the equation between the process variables and the output variable. The above system is run for a finite time, and a total of 200 samples are generated in each batch. Several normal batches and two faulty batches are introduced to test the monitoring performance of the proposed method. To evaluate the performance under the conditions of missing and limited data, only 10 normal batches are generated as training data and 40 samples in each normal batch are randomly missing, which results in an incomplete dataset for offline modeling. The missing sample rate is therefore 20%, and this system becomes a batch process example with limited historical batches and missing data.
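As a rough sketch of how such a training dataset could be generated, consider the following Python code. The nonlinear mappings from the latent variable to the process variables, the noise level, and the parameter values are illustrative assumptions, since only the structure of the system (three process variables, a latent variable, Gaussian noises, and output parameters) is stated above.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_batch(n_samples=200, a=(1.0, 0.8, 1.2)):
    """Simulate one normal batch; the nonlinear mappings are illustrative placeholders."""
    t = np.linspace(0, 1, n_samples)                  # latent variable trajectory
    e = rng.normal(scale=0.05, size=(n_samples, 3))   # independent Gaussian noises
    # Hypothetical nonlinear mappings from the latent variable to x1, x2, x3
    x = np.column_stack([np.sin(2 * t), t ** 2, np.exp(-t)]) + e
    y = x @ np.asarray(a)                              # output formed by parameters a1, a2, a3
    return x, y

def mask_missing(x, n_missing=40):
    """Randomly remove 40 of the 200 samples (20% missing rate) to mimic incomplete data."""
    x = x.copy()
    idx = rng.choice(len(x), size=n_missing, replace=False)
    x[idx] = np.nan
    return x

# 10 normal training batches, each with 40 randomly missing samples
training = [mask_missing(simulate_batch()[0]) for _ in range(10)]
```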
The first fault considered is a gradual fault affecting variable 1, which is introduced from the 101st sample and returns to normal after the 150th sample. The system equations of the process variables under this fault are given by:
The second fault is a step bias of variable 2, introduced between the 101st sample and the 150th sample. The effect of this step fault on the process variables is described by:
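As a minimal illustration of how such faults could be injected into a simulated batch, a sketch is given below; the ramp slope and step magnitude are assumed values, not those of the original fault equations.

```python
import numpy as np

def add_gradual_fault(x, start=100, end=150, slope=0.01):
    """Fault 1: gradual drift added to variable 1 over the 101st-150th samples
    (0-based indices 100-149). The slope is an illustrative assumption."""
    x = x.copy()
    x[start:end, 0] += slope * np.arange(1, end - start + 1)
    return x

def add_step_fault(x, start=100, end=150, bias=0.5):
    """Fault 2: step bias added to variable 2 over the same window; the magnitude is assumed."""
    x = x.copy()
    x[start:end, 1] += bias
    return x
```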
Due to the limited batches and missing data in the historical dataset, the bagging method is used to generate 40 sub-datasets based on the original dataset and the four output modes, in preparation for the construction of the four sub-models. In each sub-model, the process variables $x_1$, $x_2$, and $x_3$ are extracted as the process dataset and the process output is used as the quality data. Since the stochastic nature of the process output is taken into consideration, we assume that the parameters $a_1$, $a_2$, and $a_3$ in Equation (14) vary from mode to mode, as shown in Table 2. Hence, the optimal trajectories of the individual sub-models are computed using the proposed stochastic programming approach. All weighting parameters are set to 1 for simplicity.
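A minimal sketch of the bagging step is shown below. Bootstrap resampling at the batch level is an assumption made here for illustration; the text only states that bagging is used to produce the sub-datasets.

```python
import numpy as np

rng = np.random.default_rng(1)

def bagging_subdatasets(batches, n_subsets=40):
    """Draw bootstrap replicates of the historical batches (sampling batches with replacement).
    Batch-level resampling is an assumption; only the use of bagging is stated in the text."""
    subsets = []
    for _ in range(n_subsets):
        idx = rng.choice(len(batches), size=len(batches), replace=True)
        subsets.append([batches[i] for i in idx])
    return subsets

# e.g. sub_datasets = bagging_subdatasets(training, n_subsets=40)
```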
During the online monitoring procedure, one normal batch and two faulty batches are simulated to calculate the monitoring statistics at each time instant. After stochastic programming is carried out with the four output modes, the optimal trajectory is obtained as shown in Figure 3. Then, the residuals between the optimal trajectories and the actual output are calculated to obtain the monitoring statistics. Thus, the monitoring results of the normal batch for each sub-model are obtained, as shown in Figure 4.
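The exact form of the monitoring statistic and its control limit is not reproduced in this excerpt; the sketch below assumes a squared-residual statistic with an empirical percentile limit estimated from the residuals of the normal training batches.

```python
import numpy as np

def monitoring_statistic(y_actual, y_optimal):
    """Squared residual between the measured output and the optimal quality trajectory.
    The squared form is an assumption; the paper's exact statistic may differ."""
    return (np.asarray(y_actual) - np.asarray(y_optimal)) ** 2

def control_limit(normal_stats, confidence=0.99):
    """Empirical percentile limit estimated from normal-batch statistics (99% assumed)."""
    return np.percentile(normal_stats, 100 * confidence)

# alarms = monitoring_statistic(y_test, y_opt) > control_limit(train_stats)
```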
Existing methods without an ensemble step usually establish individual models to perform local monitoring tasks, and when an ensemble strategy is introduced, a voting-based strategy is widely used to integrate the monitoring results of the individual models. Therefore, to demonstrate the advantages of the proposed method, both conditions are considered for comparison in this case. In the first comparison, the individual monitoring results of each sub-model are used directly without any ensemble learning strategy; in the second, a voting-based strategy is adopted as the ensemble step. These two comparisons illustrate the advantages of the Bayesian fusion strategy based on stochastic programming.
In this case, the cutoff of the voting-based strategy is set to three violating sub-models, i.e., an ensemble alarm is raised only when most of the sub-models indicate an abnormal condition. As a result, the false alarm rate (FAR) of the individual sub-models, the voting-based strategy, and the Bayesian fusion strategy can be calculated from the testing normal batch.
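A minimal sketch of this voting rule, operating on the per-sub-model alarm flags, is given below.

```python
import numpy as np

def voting_ensemble(alarms, cutoff=3):
    """alarms: boolean array of shape (n_submodels, n_samples) with per-sub-model alarm flags.
    An ensemble alarm is raised when at least `cutoff` of the sub-models (here 3 of 4) alarm."""
    alarms = np.asarray(alarms, dtype=bool)
    return alarms.sum(axis=0) >= cutoff
```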
Next, the two abnormal batches are introduced as the testing faulty batches. The corresponding monitoring results of these two faulty batches based on different sub-models are shown in
Figure 5 and
Figure 6.
Therefore, the FAR and the fault detection rate (FDR) can be calculated for each ensemble strategy. The overall monitoring results of the normal batch and the two faulty batches are listed in Table 3.
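The FAR and FDR used below follow the standard definitions, i.e., the fraction of normal samples that trigger an alarm and the fraction of faulty samples that are correctly flagged; the sketch assumes these standard definitions.

```python
import numpy as np

def far(alarms, fault_mask):
    """False alarm rate: fraction of normal samples on which an alarm is raised."""
    normal = ~np.asarray(fault_mask, dtype=bool)
    return np.asarray(alarms)[normal].mean()

def fdr(alarms, fault_mask):
    """Fault detection rate: fraction of faulty samples on which an alarm is raised."""
    faulty = np.asarray(fault_mask, dtype=bool)
    return np.asarray(alarms)[faulty].mean()

# Fault window for the numerical example: 101st-150th samples (0-based indices 100-149)
# fault_mask = np.zeros(200, dtype=bool); fault_mask[100:150] = True
```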
It can be inferred that both sets of results are acceptable owing to the use of stochastic programming and the solution of the optimal trajectory. Among them, the results of the individual sub-models are not reliable enough, since the FDR varies from one sub-model to another and the FAR is much higher than that of the ensemble strategies. The FARs of the voting-based strategy and the Bayesian fusion (BF) strategy are close, while the FDR of the BF strategy is clearly higher than that of the voting-based strategy. In addition, the monitoring results obtained using the average trajectory of the different output modes are also reported; with the same BF strategy, this yields a higher FAR and a lower FDR.
According to the monitoring results of the numerical example, once stochastic programming has been implemented and the optimal quality trajectory has been obtained, Bayesian fusion is the better ensemble learning strategy, providing superior monitoring performance for batch processes with limited batches, missing elements, and multiple output modes.
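The paper's exact Bayesian fusion formulation is not reproduced in this excerpt. The sketch below assumes a commonly used Bayesian-inference combination in which each sub-model's statistic is mapped to a conditional fault probability and the sub-model probabilities are combined with the (unit) weights; the exponential mapping and the significance threshold name are assumptions.

```python
import numpy as np

def fault_probability(stat, limit, alpha=0.01):
    """Map a monitoring statistic to a fault probability via Bayes' rule, using an
    exponential likelihood around the control limit (an assumed, commonly used mapping)."""
    p_fault = np.exp(-limit / np.maximum(stat, 1e-12))   # likelihood under fault
    p_normal = np.exp(-np.asarray(stat) / limit)          # likelihood under normal operation
    num = p_fault * alpha
    return num / (num + p_normal * (1 - alpha))

def bayesian_fusion(stats, limits, weights=None):
    """Weighted combination of the sub-model fault probabilities (all weights set to 1 here)."""
    probs = np.array([fault_probability(s, l) for s, l in zip(stats, limits)])
    w = np.ones(len(probs)) if weights is None else np.asarray(weights, float)
    return (w[:, None] * probs).sum(axis=0) / w.sum()

# ensemble_alarm = bayesian_fusion(stats, limits) > significance_threshold  # threshold assumed
```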
4.2. Penicillin Fermentation Process
In this subsection, a simulation based on a fed-batch penicillin benchmark process is introduced to demonstrate the effectiveness of the proposed method. The benchmark software named PenSim v2.0 was developed by the Illinois Institute of Technology and can be found online [
36]. The flowsheet of the penicillin process is shown in
Figure 7. The process can be divided into two phases: a pre-culture phase of 40 hours for biomass growth, and a fed-batch phase for penicillin production. During the first stage, glucose is consumed and biomass grows in preparation for penicillin production; in the second stage, substrate is fed continuously and penicillin is produced until the end of the batch.
In this case study, 10 normal batches are generated by PenSim with 11 process variables, which are listed in
Table 4. Similar to the numerical example presented in
Section 4.1, the amount of historical data is small and insufficient. In each batch, 400 samples are generated, and the sampling interval is set to one hour.
To estimate the penicillin concentration for stochastic programming, the relationship between the penicillin concentration and the quality-relevant variables is modeled with different methods. The model equations used for the calculation are given by:
$$\frac{dP}{dt} = \mu_{pp}X - KP - \frac{P}{V}\frac{dV}{dt}, \qquad \mu_{pp} = \mu_{p}\,\frac{S}{K_{p} + S + S^{2}/K_{I}}\cdot\frac{C_{L}^{\,p}}{K_{op}X + C_{L}^{\,p}},$$
where $P$ is the penicillin concentration, $X$ is the biomass concentration, $S$ is the substrate concentration, and $V$ is the culture volume. The model parameters include $K$, the penicillin hydrolysis rate constant; $\mu_{p}$, the specific rate of penicillin production; $K_{p}$, the inhibition constant; $K_{I}$, the inhibition constant for product formation; $K_{op}$, the oxygen limitation constant; $C_{L}$, the dissolved oxygen concentration; and $p$, the exponent of $C_{L}$.
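For illustration, a minimal numerical sketch of evaluating the penicillin production rate from the model equation above is given below. The default parameter values are illustrative placeholders, not the values used in the paper or in Table 5.

```python
import numpy as np

def penicillin_rate(P, X, S, V, dVdt,
                    K=0.04, mu_p=0.005, K_p=0.0002, K_I=0.10,
                    K_op=0.0002, C_L=1.16, p=3):
    """dP/dt from the model equation above; parameter values are illustrative placeholders."""
    mu_pp = mu_p * S / (K_p + S + S ** 2 / K_I) * C_L ** p / (K_op * X + C_L ** p)
    return mu_pp * X - K * P - (P / V) * dVdt

# Simple forward-Euler update of the penicillin concentration (illustrative usage):
# P_next = P + dt * penicillin_rate(P, X, S, V, dVdt)
```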
As discussed in
Section 2.1, these parameters are not pure constants, and some of them show stochastic variations under different conditions. The parameter estimation results can also vary depending on which identification method is used. For this case study, three of these model parameters are chosen as the stochastic parameters, since they may influence the penicillin concentration more than the other parameters. Hence, these model parameters are set to different values according to NPM, PSO, GSA, and PenSim, respectively. The values of the parameters are listed in Table 5, which represents the four output modes.
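A minimal sketch of organizing the four output modes as parameter scenarios is shown below. The parameter names and values are placeholders, since this excerpt does not reproduce Table 5 or identify which three parameters were selected as stochastic.

```python
# Four output modes as parameter scenarios (names and values are placeholders;
# the actual stochastic parameters and their values come from Table 5).
output_modes = {
    "NPM":    {"mu_p": 0.0050, "K_I": 0.10, "K_p": 0.00020},
    "PSO":    {"mu_p": 0.0048, "K_I": 0.11, "K_p": 0.00021},
    "GSA":    {"mu_p": 0.0052, "K_I": 0.09, "K_p": 0.00019},
    "PenSim": {"mu_p": 0.0050, "K_I": 0.10, "K_p": 0.00020},
}

# One sub-model would then be built per mode, e.g.:
# for mode, params in output_modes.items():
#     rate = penicillin_rate(P, X, S, V, dVdt, **params)
```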
Similar to the numerical case presented earlier, all weighting parameters are set to 1, and four sub-models are constructed based on stochastic programming for subsequent online monitoring. Although different output modes exist, the constraints according to Equation (2) must first be defined to ensure that the critical variables vary within reasonable ranges. The corresponding constraints are listed in Table 6. Then, 40 randomly missing samples are considered in each historical batch during the offline modeling step, giving a missing sample rate of 10% for each batch. Thus, the optimal quality trajectory is calculated as shown in
Figure 8.
Unlike the average trajectory, the optimal trajectory provides a better reference for the process output in the presence of multiple output modes, limited batches, and missing elements in the process data, which helps to construct a more accurate monitoring model. In addition, the optimal solution is also of great significance for the further quality-relevant optimization and control of batch processes.
For performance evaluation, one normal batch and three output-relevant faulty batches are generated as online data to test the proposed method. Fault 1 is a step decrease of the substrate feed rate introduced from the 61st sample to the end of the batch, fault 2 is a temperature controller failure lasting from the beginning to the end of the batch, and fault 3 is a pH controller failure lasting from the beginning to the end of the batch.
The monitoring results under normal conditions are shown in
Figure 9. For the proposed method, the results are satisfactory and demonstrate that the process is operating under normal conditions during the run time. Furthermore, all sub-models provide consistently correct results since the control limits are tight and accurate.
Then the process monitoring results for fault 1, fault 2, and fault 3 are shown in
Figure 10,
Figure 11 and
Figure 12, respectively. It can be seen that the proposed method is able to detect quality-relevant abnormal conditions after these faults occur.
Each sub-model is able to detect the faults individually; however, the detailed monitoring performance varies from one sub-model to another. As in the first case, three kinds of decision-making strategies are implemented for performance evaluation. The corresponding FAR and FDR are calculated for each strategy and presented in Table 7. The FARs of fault 2 and fault 3 are not listed in the table because these faults occur at the beginning of the batch and persist throughout the process. In addition, the monitoring results obtained using the average trajectory of the different output modes with the same BF ensemble strategy are provided as well.
As illustrated in
Table 7, no false alarms occur for either method under the framework of batch process monitoring based on stochastic programming. According to the comparison of the FDRs under faulty conditions, the BF strategy offers the most reliable monitoring performance among these decision-making strategies, as it achieves the highest overall FDR.