1. Introduction
In the industrial sector, and particularly in traditional processes such as steelmaking, more and more elements of the production chain are being digitized. The availability of data is increasing, but these data are often difficult to structure and process, and extracting valuable information from them is challenging. In this context, efficiency in industrial processes is of great importance, especially when the aim is to reduce the ecological footprint while maintaining production availability.
The quality of the final product is one of the most important indicators in any production process, but especially in iron and steel processes, where production conditions are very harsh, where there is hardly any theoretical framework, and where disturbances therefore dominate production.
Large amounts of data are available in these processes, but they are underused due to the difficulty of interpreting them, as well as to the nature of data that come from hostile, erratic and highly variable environments. Accordingly, data analysis tasks require substantial domain expertise to generate useful conclusions for business objectives. One of the biggest difficulties when working with data from an industrial process, and from the steel industry in particular, is uncertainty: we have to face a generalized uncertainty of different origins [1] due to the hostile and highly variable conditions.
Currently, the production of high-quality steel is supported by modern measuring systems that gather an increasing amount of high-resolution quality and process data along the complete flat steel production chain [2].
The benefits of higher product quality, lower internal rejection rates and higher productivity will directly result in a reduction of the overall production costs. Moreover, they will provide access to some very stringent product applications for which the quality requirements are very high. The competitiveness of steel companies will be significantly improved through higher productivity at the finishing shop and reduced raw material and energy consumption.
Despite digitization, however, obtaining valuable conclusions through smart tools is no easy task. Machine learning techniques have also burst into steel production, as described in the work by Laha et al. [3].
Although there have been several studies on the application of machine learning techniques, few works in the literature apply these techniques specifically to the prediction of the output performance of the steel manufacturing process.
There are some works on the use of neural networks to predict output parameters, such as the temperature of the liquid metal and the volume of oxygen blowing [4], the metallurgical length in continuous casting (CC), where the steel solidifies, together with the shell thickness at the end of the mold and the billet surface temperature [5], and the percentage of phosphorus in the final composition of the steel [6,7]. Mazumdar and Evans [8] provide a complete description of modern steelmaking processes, together with physical and mathematical models and solution methodologies based on artificial intelligence. In the work presented in [9], prediction models based on production data are developed for casting surface defects; the authors show the importance of quality in the foundry industry by comparing six machine learning methods. Soft-sensing approaches have also been proposed [10,11,12] to predict the silicon content of the molten iron, to forecast the hot metal temperature, or to predict the slag amount in an electric arc furnace (EAF). These works employ time-series decomposition, neural networks, multivariate adaptive regression splines and ensemble learning approaches. The optimization of process parameters as a prescriptive strategy, carried out in [13] using a surrogate model based on neural networks within a multi-objective problem, is one of the most demanding tasks in the steel industry today. Multi-output support vector regression and particle swarm optimization were used in [14] to optimize the process parameters in steel production.
This work is part of a project framework in which activities aimed at better product quality, improved productivity and cost reduction in the steelmaking process have been developed. Strategies have been built around the modelling and control issues related to secondary metallurgy (SM), CC and hot rolling (HR), with the aim of optimizing these sub-processes. These models are the basis of an optimization methodology in which the use of data-driven models, and therefore of machine learning techniques, is a key point. This publication presents only the work developed on the data-driven models within this methodology.
The aim of this work is to develop process models that improve the steelmaking process and reduce the number of surface and sub-surface defects in the final product for micro-alloyed steels, ensuring a good performance of the three sub-processes that influence the generation of surface defects: SM, CC and HR. These models are the basis of an optimization strategy for each sub-process, based on the relationship established between the control parameters and the performance indicators.
Based on the growing demand for microalloyed steels and the critical points of their correct manufacturing, the following data-based models were developed and are presented in the following sections:
An SM model to predict the castability index of a heat. This index is a measure of the performance of the refining process, mainly influenced by the formation of solid micro-inclusions in the liquid metal [15], which affect the steel cleanliness.
A CC model to predict the temperature at the middle point of the upper face of the billet before the straightener during the continuous casting process.
An HR model to predict the minimum and average temperatures of the billet before the continuous rolling mill.
The description of the proposed approach goes from the data processing to the generation of the models through the analysis of the most relevant parameters. An important part of this approach is the comparison of the feature selection strategies that were applied, as well as the comparison of different regression paradigms. Within this strategy, it is worth highlighting the use of ensemble learning; a novel strategy based on different feature subspaces is presented, which combines four commonly employed feature selection techniques to build the base learners.
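Such a feature-subspace ensemble can be sketched as follows. This is a minimal illustration with synthetic data and scikit-learn: the four selectors shown (univariate F-test, mutual information, recursive feature elimination and model-based importance) are common examples assumed for the sketch, not necessarily the exact four techniques employed in this work.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import (RFE, SelectFromModel, SelectKBest,
                                       f_regression, mutual_info_regression)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the plant data (illustrative only).
X, y = make_regression(n_samples=400, n_features=20, n_informative=8,
                       noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Four feature selection techniques, each defining a different
# (non-random) feature subspace for one base learner.
selectors = [
    SelectKBest(f_regression, k=8),
    SelectKBest(mutual_info_regression, k=8),
    RFE(LinearRegression(), n_features_to_select=8),
    SelectFromModel(RandomForestRegressor(n_estimators=50, random_state=0),
                    max_features=8),
]

# One base learner per subspace; the meta-estimator averages them,
# which increases diversity and reduces prediction variance.
ensemble = [make_pipeline(sel, DecisionTreeRegressor(random_state=0))
            for sel in selectors]
for model in ensemble:
    model.fit(X_tr, y_tr)

y_pred = np.mean([m.predict(X_te) for m in ensemble], axis=0)
print("ensemble prediction shape:", y_pred.shape)
```

The design choice here is that, unlike bagging's random feature subsampling, each base learner sees a subspace chosen deliberately by a different selection criterion, so the diversity among learners is driven by disagreement between selection methods rather than by chance.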
The article is structured as follows. Section 2 describes the key performance indicators in the steel production process and the machine learning methodology employed. In Section 3, the data used are explained, the preprocessing results and the performance metrics of the models are provided, and the results are discussed. Finally, in Section 4, some conclusions are drawn.
4. Conclusions
The defects in the final product of a steelmaking process are related to inefficiencies in the different sub-processes involved. SM, CC and HR are the ones with the greatest impact, and each has its own indicators to determine these inefficiencies. The estimation of these indicators through data-driven models is a utility that allows the causes of poor production to be established and operations to be optimized, moving towards efficient production. For these models to be used in plant operation, two visual decision-making tools are being developed. The first is a simulator of the indicators as a function of some process inputs, so that operators themselves can explore the parameter space of the process. The second is a search tool for optimal process parameters, where the cost function is based on the generated model itself.
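The second tool, the search for optimal process parameters with the model as cost function, could be sketched as follows. The surrogate model, the parameter bounds and the target indicator value are all hypothetical placeholders for this illustration; the sketch uses SciPy's differential evolution as one example of the global search methods mentioned.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Illustrative surrogate: a model fitted on synthetic data stands in
# for the trained sub-process indicator model.
X, y = make_regression(n_samples=300, n_features=5, noise=1.0, random_state=1)
model = RandomForestRegressor(n_estimators=50, random_state=1).fit(X, y)

# Cost function built on the model itself: deviation of the predicted
# indicator from a (hypothetical) target value.
TARGET = 100.0

def cost(params):
    pred = model.predict(params.reshape(1, -1))[0]
    return abs(pred - TARGET)

# Evolutionary search over hypothetical process-parameter bounds.
bounds = [(-3.0, 3.0)] * X.shape[1]
result = differential_evolution(cost, bounds, seed=1, maxiter=30)
print("best parameters:", np.round(result.x, 2))
```

In practice the bounds would come from the operational limits of the plant, and the same pattern extends to multi-objective formulations by combining several indicator models in the cost function.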
In this work, various data-driven models of a steel production process have been developed. These models, developed independently for each sub-process, estimate a relevant indicator: the castability index for secondary metallurgy, the temperature of the billet before the straightener in the continuous casting machine, and the temperatures of the billet before the continuous rolling mill.
In the methodology used in this study, different feature selection methods and different regression strategies come into play. A novel ensemble approach with a generative character is also presented, in which several selection methods are used to generate different base learners in order to obtain greater diversity in the predictions.
In the tests carried out to validate the methodology, experimental results are presented for the models of the three sub-processes. Firstly, it has been verified that feature selection with the presented methods maintains the reliability of the models in most cases. Secondly, the ensemble approach based on non-random subsampling of features likewise maintains reliability while providing more stability and reducing the variance of the prediction; in the case of SM, a slight improvement is even observed.
A study has been carried out with different regression strategies, based on the proposed ensemble approach. Analyzing the coefficient of determination, the clearest conclusion is that the Random Forest method obtains the best results, even above neural networks (NNs). From a statistical point of view, the non-parametric Friedman and Wilcoxon tests reveal significant differences between the boosting and bagging paradigms and NNs. With the individual models alone (not the meta-estimator), the boosting strategy produces better results, which is not the case for the meta-estimator; this fact requires further analysis in future work.
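Non-parametric comparisons of this kind can be run with SciPy as sketched below. The per-fold scores here are invented for illustration only and do not correspond to the results of this work; the pattern is a Friedman omnibus test over paired scores followed by a pairwise Wilcoxon signed-rank post-hoc test.

```python
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical R^2 scores of three regression paradigms over the same
# cross-validation folds (illustrative values only).
boosting = [0.81, 0.78, 0.83, 0.80, 0.79, 0.82, 0.84, 0.77]
bagging  = [0.80, 0.76, 0.82, 0.79, 0.78, 0.81, 0.83, 0.76]
nn       = [0.72, 0.70, 0.75, 0.71, 0.69, 0.74, 0.73, 0.68]

# Friedman test: is there any difference among the three paradigms?
stat, p = friedmanchisquare(boosting, bagging, nn)
print(f"Friedman: stat={stat:.2f}, p={p:.4f}")

# Pairwise post-hoc comparison (boosting vs. NN) with Wilcoxon.
w_stat, w_p = wilcoxon(boosting, nn)
print(f"Wilcoxon boosting vs NN: p={w_p:.4f}")
```

Both tests operate on paired samples, so the scores of each method must come from the same folds or data splits; this is what makes them suitable for comparing learners evaluated under identical conditions.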
The direct prediction of defects in the final product is a very challenging task, further complicated by the difficulties of product traceability along the process; nevertheless, a data-driven model for the direct prediction of defects is being developed. In any case, as mentioned before, the indicators of the sub-processes can be understood as indirect measurements of that quality.
Finally, all these process models are embedded in a process optimization methodology and platform that uses global methods such as evolutionary algorithms, which is of great interest to the steel industry.