1 Introduction

Mankind is confronted with tremendous challenges when exploiting the underground because underground disasters are frequent and unpredictable [1,2,3]. One of the most frequent is the roof accident, which is caused by underground pressure that is hard to forecast: the underground structure is characterized by complex rock formations, and its environment is difficult to monitor [4]. UnderGround Pressure Prediction (UGPP) is therefore a crucial technology for increasing the safety of industrial production.

Fig. 1

An example of coal mining in which human mining activity changes the natural state. Workers advance the hydraulic supports to extract coal, which causes the immediate roof to collapse and, in turn, changes the underground pressure

Two basic approaches are currently adopted in UGPP research. One is the conventional expert system approach, based on practical engineering experience and physical modeling [5,6,7,8]. However, experts with limited experience, and without the accurate underground monitoring information that these methods strictly require, are unable to develop such technologies to guide the exploitation of underground resources [9]. The other is machine learning, whose methods [4, 10, 11] have gradually become key techniques for the UGPP. Regrettably, time-series UGPP techniques such as regression analysis, SVM and BPNN have serious defects: overfitting, low prediction accuracy and low training efficiency. These defects arise because such methods capture the composite characteristics of time series only weakly and lack an analysis of the pressure time series data.

It is well known that deep learning methods, especially LSTM, can effectively extract temporal features, and they have recently been applied to time series forecasting in fields such as FireCast [12], Rain Alarm Pro [13], Oilfield Production Forecast [14], and COVID-19 [15]. However, much of the research to date has not adequately considered the latent causal interrelation between human activity and natural disasters. Human activity frequently changes natural states and thereby generates disasters [16, 17]. Underground pressure prediction likewise contains an implicit causal logic. Specifically, mining activities change the spatial structure of the underground rocks, which then causes the underground pressure to change, as sketched in Fig. 1. Graph neural networks, which have properties such as permutation invariance and local connectivity, are now used successfully to handle many kinds of graph data [18,19,20]. Therefore, we reexamine the UGPP from the perspective of graphs. We take the view that underground pressure data can be represented as nodes in a graph whose links carry an explicit causal logic.

In this paper, we propose RC-GNN, a novel Reinforced and Causal Graph Neural Network for the UGPP task, which considers the causal relation between mining and underground pressure change. First, we model a hierarchical causal graph based on practical engineering experience and physical common sense to reflect the causal logic of industrial production. This experience covers the advancement of the fully-mechanized face, underground pressure testing, and the deformation of rock under stress. In addition, we provide an inference technique based on prior knowledge that estimates the graph structure to obtain an initial causal graph. Then, we build a prediction network consisting of graph convolutional network modules and long short-term memory modules to perform the UGPP task. To address the problem that a lack of prior knowledge leads to inaccurate causal graph modeling, we propose a reinforcement learning algorithm based on the performance index of prediction. The main idea is that the structure of the underground pressure causal graph is iteratively optimized through reinforcement learning. The reward space is designed according to the desired prediction performance in terms of RMSE and MAE, and the environmental characteristics of the mine are used to design the action and state spaces.

We evaluate the proposed framework on real datasets, which include advancement information of the fully-mechanized face, underground pressure, and geologic features. To verify the prediction accuracy of our proposal, we compare it with six representative methods, using both RMSE and MAE as evaluation metrics. The main contributions of this work are summarized as follows:

  1) To the best of our knowledge, this is the first study on underground pressure prediction using reinforced and causal graph neural networks.

  2) The effect of mining on underground pressure is modeled as a causal graph for the first time.

  3) A reinforcement learning method is proposed to iteratively optimize the graph structure.

  4) The proposed RC-GNN framework achieves better prediction accuracy on real underground pressure datasets than state-of-the-art deep learning approaches.

2 Related work

In this section, we first review research on multivariate time series forecasting and then briefly discuss methods for graph structure learning. Finally, we give the problem definition that this work requires.


Multivariate time series forecasting


Multivariate time series forecasting is a crucial problem in multiple domains (e.g. environmental monitoring, industrial security). With increasing data size and computing power, machine learning will become a key component of how these technologies develop in the future [21, 22]. In contrast to classical methods such as the autoregressive model [23], which have difficulty fitting complex nonlinear characteristics, machine learning methods can fit such characteristics while maintaining robustness, and they also offer better generalization performance. Deep learning approaches are widely used in multivariate time series forecasting to capture the temporal features of nonlinear high-dimensional time series and improve forecasting performance. For instance, [12] designed FireCast, which combines Convolutional Neural Networks and Geographic Information Systems, to predict regions at high risk of wildfire on the basis of historical data. To forecast well performance effectively, [14] proposed an LSTM framework that considers the coupling among multiple influencing factors on the basis of their statistical properties. However, that work treats the statistical property only as a threshold for eliminating weakly correlated variables, and it does not study the latent hierarchical structure between different data sets. Graph-based methods can represent latent structural correspondences in heterogeneous information via the nodes and edges of a graph [24]. The studies above all suggest that discovering latent interrelationships in heterogeneous information is conducive to improving pressure prediction performance. Hence, to reveal the causal association between mining activity and underground pressure fluctuation, we design a graph-based multivariate time series forecasting method.


Graph structure learning


Graph Neural Networks (GNNs) are a powerful tool that has achieved tremendous success in many fields. In general, GNNs assume that the input graph has a well-defined structure and accurate relations between nodes. In the real world, however, graphs in most cases have unavoidably noisy or ambiguous structures. To acquire the optimal graph structure, a number of studies on Graph Structure Learning (GSL) have emerged in recent years [25]; they can be roughly divided into metric learning, probabilistic modeling, and direct optimization approaches. Direct optimization approaches have developed rapidly. Some researchers designed a GSL method specifically for extracting unidirectional relationships [18]. Others proposed a flexible GSL framework that regards prior information as a set of candidate relations [24, 26]. [27] proposed a graph embedding method based on deep learning (DL) to learn the features of the graph automatically. Nevertheless, this research hardly considers the correlation between GSL and specific tasks. Therefore, we present a new GSL method based on reinforcement learning (RL) that designs the reward function on the basis of both the prior information and the specific task.

3 Model

3.1 Problem definition

The main research target of this paper is to forecast the underground pressure series. In practice, the underground pressure series does not satisfy the fundamental assumption of time series forecasting that the sampling period is fixed. Therefore, we let \({\mathrm{{\gamma }}_s} \in {\mathrm{{R}}^N}\) denote the value of an N-dimensional variable at mining step s, where \({\gamma _s}\left[ i \right] \in \mathrm{{R}}\) denotes the value of the i-th variable at mining step s. Because of faults in data storage equipment and human factors, missing data and outliers are possible. We deal with these issues by treating outliers as missing data and then filling the gaps with Lagrange interpolation. For simplicity of the problem description, we give some key definitions as follows:
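The preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `lagrange_fill`, the z-score rule for flagging outliers, and the choice of two neighbouring points on each side are all assumptions, since the text only states that outliers are treated as missing data and filled by Lagrange interpolation.

```python
import numpy as np

def lagrange_fill(series, window=2, z_thresh=3.0):
    """Mark outliers as missing, then fill gaps by local Lagrange interpolation."""
    y = np.asarray(series, dtype=float).copy()
    valid = ~np.isnan(y)
    # Assumption: a z-score rule flags outliers; the paper does not state its criterion.
    mu, sd = y[valid].mean(), y[valid].std()
    if sd > 0:
        y[valid & (np.abs(y - mu) > z_thresh * sd)] = np.nan
    known = np.flatnonzero(~np.isnan(y))
    for idx in np.flatnonzero(np.isnan(y)):
        # Use up to `window` known points on each side of the gap.
        left = known[known < idx][-window:]
        right = known[known > idx][:window]
        xs = np.concatenate([left, right]).astype(float)
        ys = y[xs.astype(int)]
        total = 0.0
        for k in range(len(xs)):
            others = np.delete(xs, k)
            # Lagrange basis polynomial for node k, evaluated at the gap position.
            basis = np.prod((idx - others) / (xs[k] - others))
            total += ys[k] * basis
        y[idx] = total
    return y
```

Using only a few neighbouring points keeps the interpolating polynomial low-order, which avoids the oscillation that a single high-order polynomial over the whole series would introduce.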

Definition 1

Underground pressure graph \({G^s}\). We use a digraph \({G^s} = \left( {{V^s},{E^s}} \right)\) to represent the causal relationships that exist in the mining process, where \({V^s} = \left\{ {v_1^s,v_2^s, \dots ,v_n^s} \right\}\) is the set of nodes, and \({E^s} = \left\{ {e_1^s,e_2^s, \dots ,e_m^s} \right\}\) is the set of edges. We use n and m to denote the number of nodes and edges, respectively, in the causal graph. Let \({e_k} = \left( {{v_i},{v_j}} \right) \in E\) denote a unidirectional edge pointing from \({v_i}\) to \(v_j\). We characterize the graph \({G^s}\) mathematically by its adjacency matrix, denoted as \(\mathbf{{A}} \in {R^{n \times n}}\).

Definition 2

Multivariate mining series \(\mathbf{{\gamma }}\). Since mining series often have an evident hierarchical structure, we denote this structure as \(\mathbf{{\gamma }} = \left( {{\mathbf{{\gamma }}^\alpha },{\mathbf{{\gamma }}^\beta },{\mathbf{{\gamma }}^\chi }} \right)\). Let \(\mathbf{{\gamma }}^\alpha\) denote the value of the independent variable in the causal relationship (e.g. continuous mining activities). We then use \(\mathbf{{\gamma }}^\beta\) to represent the value of the hidden variable, which needs to be inferred from prior knowledge. Finally, we describe the value of each underground pressure unit as \(\mathbf{{\gamma }}^\chi\).

Definition 3

Prediction task. An underground pressure forecasting task is to predict the underground pressure \(\gamma _{s + n}^\chi\) at mining step \(s+n\) from the lookback window \(\left\{ {\gamma _1^\chi ,\gamma _2^\chi , \cdots ,\gamma _n^\chi } \right\}\). We treat the task as long-term prediction if the number of mining steps predicted ahead is greater than 1, and as short-term prediction otherwise.
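As an illustration of Definition 3, the sketch below pairs each lookback window with the value a given number of mining steps after it. The helper `make_windows` and its parameter names `lookback` and `horizon` are hypothetical and only serve to make the windowing concrete; they are not taken from the paper.

```python
import numpy as np

def make_windows(pressure, lookback, horizon):
    """Pair each lookback window with the value `horizon` steps after it."""
    pressure = np.asarray(pressure, dtype=float)
    inputs, targets = [], []
    last_start = len(pressure) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback                         # first index after the window
        inputs.append(pressure[start:end])
        targets.append(pressure[end + horizon - 1])    # value `horizon` steps ahead
    return np.array(inputs), np.array(targets)
```

With `horizon = 1` this yields short-term samples, and with `horizon > 1` long-term samples. The same function accepts an array of shape (steps, units), in which case each window keeps one column per underground pressure unit.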

Fig. 2

Knowledge-based and data-driven underground pressure forecasting framework diagram

3.2 Framework overview

Fig. 2 shows an overview of our RC-GNN framework for underground pressure forecasting. The framework can be roughly divided into three parts:

Underground pressure causal graph modeling: We propose a heterogeneous and hierarchical causal graph to reflect the relationship between human mining activities and the variation of underground pressure. The graph contains three types of nodes with different hierarchical characteristics: mining activities, underground environment, and underground pressure units. The edge weights express the probability that a cause node affects an effect node. Finally, we use prior knowledge embedding approaches to learn the original graph structure.

Prediction network of underground pressure: We use graph convolution modules to capture the sequence of spatial features from the original causal graph. The LSTM modules then take this sequence of spatial features as input and extract the mining-step features by adjusting the positional encoding and the skip connection.

Causal graph structure optimization: It is difficult for the original causal graph to describe accurately the complex causal relations among mining, the natural environment, and underground pressure. Hence, the GCN modules may fail to extract realistic spatial features, which reduces the accuracy of underground pressure prediction. For this reason, we use a reinforcement learning algorithm based on the performance index of prediction to optimize the causal graph structure. The algorithm combines underground pressure and physical significance in designing the action and state spaces.

In conclusion, for the underground pressure prediction task, we build a causal graph with prior knowledge embedding approaches, then use a series forecasting network consisting of GCN and LSTM modules, and optimize the causal graph structure through a reinforcement learning strategy. The RC-GNN framework is designed to meet real-world demands in the mining industry and continuously optimizes the graph structure based on prediction results and expert knowledge. Algorithm 1 outlines the training process of underground pressure causal graph learning. The detailed implementation of these modules is described below.

3.3 Underground pressure causal graph modeling

In this section, we introduce how to model a graph that represents the relationships between mining activity and underground pressure in the mining industry. According to prior knowledge, cracking of the roof and deformation of the surrounding rock, both caused by mining activity, induce changes in underground pressure. We use a causal graph whose nodes are divided into three types, as presented earlier. Two of these types, mining activities and underground pressure units, have a clear causal relationship, and their data can be measured by sensors. We use the velocity of the fully-mechanized face \({\mathbf{{\gamma }}^\alpha } = \left[ {\gamma _1^\alpha ,\gamma _2^\alpha , \cdots ,\gamma _s^\alpha } \right]\) as the feature of the mining activity node \({\mathbf{{X}}^\alpha }\), and the value of underground pressure \({\mathbf{{\gamma }}^\chi } = \left[ {\gamma _1^\chi ,\gamma _2^\chi , \cdots ,\gamma _s^\chi } \right]\) as the feature of the underground pressure unit nodes \({\mathbf{{X}}^\chi }\). The underground environment acts as a link between the mining activities and the underground pressure units, and its feature \({\mathbf{{X}}^\beta }\) relies on the embedding of prior knowledge \({\mathbf{{\gamma }}^\beta } = \left[ {\gamma _1^\beta ,\gamma _2^\beta , \cdots ,\gamma _s^\beta } \right]\). The mining height and the burial depth can be predefined according to actual engineering requirements. The relation between the velocity of the fully-mechanized face and the sag of the overlying strata is given by Eqs. (1) \(\sim\) (3).

| Symbol | Definition |
| --- | --- |
| \(\gamma _i^\alpha\) | The velocity value of a fully-mechanized face in the i-th step |
| \(\gamma _i^\chi\) | The underground pressure value in the i-th step |
| \(\gamma _i^ {\beta _1}\) | The mining height value in the i-th step |
| \(\gamma _i^ {\beta _2}\) | The burial depth value in the i-th step |
| \(\gamma _i^{\beta _3}\) | The rupture of overburden strata in the i-th step |
| \(\gamma _i^{\beta _4}\) | The degree of deformation and fracturing of the surrounding rock in the i-th step |
| \(\lambda _1\) | The degree of sag of the main roof |
| \(\lambda _2\) | The degree of unconsolidated formation |
| \({\hat{\lambda }}\) | Other characteristics of the rock stratum |

$$\begin{aligned}&{\gamma ^{{\beta _3}}} = \lambda = {\lambda _1} + {\lambda _2} + {\hat{\lambda }}, \end{aligned}$$
(1)
$$\begin{aligned}&{\lambda _1} = - 125.26{e^{\frac{{\gamma _i^\alpha }}{{24.8}}}} + 1022.8, \end{aligned}$$
(2)
$$\begin{aligned}&{\lambda _2} = - 55.61{e^{\frac{{\gamma _i^\alpha }}{{20.8}}}} + 673.97, \end{aligned}$$
(3)

where \(\lambda\) is the rupture of overburden strata, and \({\lambda _1},{\lambda _2},{\hat{\lambda }}\) represent the degree of sag of the main roof, the degree of unconsolidated formation, and other characteristics of the rock stratum, respectively. Because the deformation and fracturing of the surrounding rock are difficult to measure accurately, we manually divide them into ten levels, represented by the integers from one to ten, \({\gamma ^{{\beta _4}}} \in \left\{ {1, \cdots ,10} \right\}\). We assume that the population of underground pressure units follows a normal distribution \(\Upsilon \left( {\mu ,{\sigma ^2}} \right)\) and regard the s-step samples as a sample drawn from it, with the sample mean and variance as follows:

$$\begin{aligned}&\bar{\mathrm{X}} = \frac{1}{{sm}}\sum \limits _{i = 1}^{sm} {\gamma _i^\chi }, \end{aligned}$$
(4)
$$\begin{aligned}&{S^2} = \frac{1}{{sm - 1}}\sum \limits _{i = 1}^{sm} {{{\left( {\gamma _i^\chi - \bar{\mathrm{X}}} \right) }^2}}, \end{aligned}$$
(5)

where m represents the number of underground pressure units. According to previous research, an increase in the underground pressure means that the underground rock structure has changed. Therefore, we calculate the mean value of the underground pressure \({\bar{\gamma }} _i^\chi\) at each step and use it as the basis for generating the feature of deformation of the surrounding rock \({\mathbf{{X}}^{{\beta _4}}}\), as follows:

$$\begin{aligned}&\mathbf{{X}}_i^{{\beta _4}} = \left\lceil {\Theta \left( {\bar{\gamma }_i^\chi } \right) } \right\rceil , \end{aligned}$$
(6)
$$\begin{aligned}&\Theta \left( {{\bar{\gamma }} _i^\chi } \right) = 10\left( {\int _{ - \infty }^{{\bar{\gamma }} _i^\chi } {\frac{1}{{\sqrt{2\pi } S}}{e^{\frac{{ - {{\left( {x - \bar{\mathrm{X}}} \right) }^2}}}{{\left( {2{S^2}} \right) }}}}dx} } \right) . \end{aligned}$$
(7)
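To make Eqs. (1)-(7) concrete, the following sketch computes the two derived environment features: the rupture of overburden strata from the face velocity, and the ten-level deformation feature from the normal distribution function of the step-mean pressure. The function names, the default `lambda_hat = 0.0`, and the clamping of levels to the range 1 to 10 are assumptions made for illustration; the variance is computed as the usual squared-deviation sample variance.

```python
import math

def overburden_rupture(velocity, lambda_hat=0.0):
    """Eqs. (1)-(3): rupture of overburden strata from the face velocity."""
    lambda_1 = -125.26 * math.exp(velocity / 24.8) + 1022.8   # sag of the main roof
    lambda_2 = -55.61 * math.exp(velocity / 20.8) + 673.97    # unconsolidated formation
    # lambda_hat (other rock stratum characteristics) is a user-supplied assumption.
    return lambda_1 + lambda_2 + lambda_hat

def deformation_levels(step_means, all_pressures):
    """Eqs. (4)-(7): map each step's mean pressure to a level in 1..10."""
    n = len(all_pressures)
    mean = sum(all_pressures) / n                                  # Eq. (4)
    var = sum((p - mean) ** 2 for p in all_pressures) / (n - 1)    # Eq. (5)
    std = math.sqrt(var)
    levels = []
    for value in step_means:
        # Normal CDF with the sample mean and standard deviation, as in Eq. (7).
        cdf = 0.5 * (1.0 + math.erf((value - mean) / (std * math.sqrt(2.0))))
        level = math.ceil(10.0 * cdf)                              # Eq. (6)
        levels.append(min(max(level, 1), 10))  # assumption: clamp to the stated range
    return levels
```

Because the level is the ceiling of ten times a cumulative probability, steps whose mean pressure is far above the overall mean are assigned the highest deformation levels.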
figure a

In this work, there are no edges between nodes of the same type, and nodes of different types have unidirectional relationships (mining activities, then underground environment, then underground pressure units). Hence, to learn the original graph structure, we propose the following edge generation rule:

$$\begin{aligned}&{e_{ij}} = 0.25, \quad i \in \left\{ 1 \right\} ;\ j \in \left\{ {2,3,4,5} \right\} , \end{aligned}$$
(8)
$$\begin{aligned}&{e_{ij}} = \frac{{{P_{ij}}}}{{\sum \nolimits _{k = 2}^5 {{P_{kj}}} }}, \quad i \in \left\{ {2,3,4,5} \right\} ;\ j \in \left\{ {6, \dots ,6 + m} \right\} , \end{aligned}$$
(9)

where \(P_{ij}\) represents the maximum underground pressure of the j-th underground pressure unit under the impact of the i-th factor, and is obtained from prior knowledge. The empirical equations are as follows:

$$\begin{aligned}&{P_{1j}}\mathrm{{ }} = \mathrm{{ }}{g_1}{\gamma ^{{\beta _1}}} + {g_2}, \end{aligned}$$
(10)
$$\begin{aligned}&{P_{2j}}\mathrm{{ }} = \mathrm{{ }}{g_3}\ln \left( {{\gamma ^{{\beta _2}}} + {g_4}} \right) + {g_5}, \end{aligned}$$
(11)
$$\begin{aligned}&{P_{3j}} = {g_6}{e^{t_1^{0.5}}} \approx {g_6}\left[ {1 + t_1^{0.5} + \frac{{{t_1}}}{2} + \frac{{t_1^{1.5}}}{6} + o\left( {t_1^2} \right) } \right] , \end{aligned}$$
(12)
$$\begin{aligned}&{P_{4j}} = \left( {1 - {g_6}} \right) {e^{t_2^{0.5}}} \approx \left( {1 - {g_6}} \right) \left[ {1 + t_2^{0.5} + \frac{{{t_2}}}{2} + \frac{{t_2^{1.5}}}{6} + o\left( {t_2^2} \right) } \right] , \\&{t_1} = \frac{1}{s}\sum \limits _{i = 1}^s {\frac{{\gamma _i^\chi - \gamma _{\min }^\chi }}{{\gamma _i^{{\beta _3}} - \gamma _{\min }^{{\beta _3}}}}}, \quad {t_2} = \frac{1}{s}\sum \limits _{i = 1}^s {\frac{{\gamma _i^\chi - \gamma _{\min }^\chi }}{{\gamma _i^{{\beta _4}}}}}, \end{aligned}$$
(13)

where \({g_i},\mathrm{{ }}i\mathrm{{ }} = \mathrm{{ }}1, \cdots ,6\) represent the parameters of the underground pressure inducement. To sum up, we propose a graph structure learning method with embedded prior knowledge, in which the edge weights of the causal graph change automatically as the underground environment changes.
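The edge generation rule of Eqs. (8)-(13) can be sketched with NumPy as below. Several choices here are assumptions rather than the authors' procedure: the function name `causal_adjacency`, zero-based node indexing (node 0 for the mining activity, nodes 1 to 4 for the environment factors, the remaining nodes for the pressure units), scalar mining height and burial depth, and the decision to set a term of \(t_1\) to zero when its denominator vanishes. The parameter values passed as `g` are placeholders, since the paper does not list them in this section.

```python
import numpy as np

def causal_adjacency(beta1, beta2, beta3, beta4, pressure, g):
    """Initial weighted adjacency matrix following Eqs. (8)-(13).

    beta1, beta2: scalar mining height and burial depth (assumed constant).
    beta3, beta4: arrays of shape (s,); pressure: array of shape (s, m).
    """
    g1, g2, g3, g4, g5, g6 = g
    pressure = np.asarray(pressure, dtype=float)
    s, m = pressure.shape
    rise = pressure - pressure.min(axis=0)            # gamma_i - gamma_min per unit
    d3 = (beta3 - beta3.min())[:, None]
    # Assumption: terms with a zero denominator contribute zero to the average.
    ratio3 = np.divide(rise, d3, out=np.zeros_like(rise), where=d3 > 0)
    t1 = ratio3.mean(axis=0)
    t2 = (rise / beta4[:, None]).mean(axis=0)
    P = np.vstack([
        np.full(m, g1 * beta1 + g2),                  # Eq. (10)
        np.full(m, g3 * np.log(beta2 + g4) + g5),     # Eq. (11)
        g6 * np.exp(np.sqrt(t1)),                     # Eq. (12)
        (1.0 - g6) * np.exp(np.sqrt(t2)),             # Eq. (13)
    ])
    A = np.zeros((5 + m, 5 + m))
    A[0, 1:5] = 0.25                                  # Eq. (8)
    A[1:5, 5:] = P / P.sum(axis=0, keepdims=True)     # Eq. (9)
    return A
```

Normalizing each column of `P` means that the weights of the four edges entering a pressure unit sum to one, which matches the reading of edge weights as the probability that a cause node affects an effect node.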

Fig. 3

Causality graph modeling

3.4 Prediction network of underground pressure

In this section, we introduce the underground pressure prediction framework, which is shown in Fig. 3. We extract spatio-temporal features from the previously acquired causal graph to fulfil the multi-step underground pressure prediction task. We use the Graph Convolution Module to extract the spatial feature sequence of the underground pressure causal graph. We then feed this spatial feature sequence into the LSTM, which extracts its temporal features. Finally, we obtain the predicted multi-step underground pressure sequence. Analogous concepts and techniques have been widely used in various domains that require sequence prediction [28, 29]. The technical details are introduced below.

Graph Convolutional Neural Networks have a wide range of applications and are suitable for nodes and graphs of any topology. To better learn the mixed relationships of neighborhood nodes, we employ the MixHop Graph Convolution Layer, inspired by [30].

$$\begin{aligned}&{\mathbf{{H}}^s} = \sum \limits _{i = 0} {\sigma _i}\left( {\mathbf{{D}}^{ - \frac{1}{2}}}\left( {{\mathbf{{A}}^s}} \right) ^i{\mathbf{{D}}^{ - \frac{1}{2}}}{\mathbf{{X}}^s}{\mathbf{{W}}_i} \right) , \end{aligned}$$
(14)

where \({\mathbf{{A}}^s},{\mathbf{{X}}^s}\) represent the adjacency matrix of the causal graph and the feature matrix of the nodes, respectively, \(\mathbf{{D}}\) is the degree matrix of \({\mathbf{{A}}^s}\), \({\mathbf{{W}}_i}\) is the parameter matrix, and \({\sigma _i}\) represents the activation function, for which ReLU is chosen in this paper. Our motivation is to enhance the ability of the Graph Convolution Layer to learn mixing relationships between heterogeneous nodes. Once the spatial feature sequence has been obtained, we still need to extract the temporal features that are used to achieve multi-step prediction of the underground pressure. Specifically, the future change of underground pressure is inferred by learning how the spatial features evolve over a period of time. LSTM is a classic recurrent neural network that has been widely used to process time series data in various domains. Its performance may not match that of state-of-the-art techniques in some tasks, but the framework still chooses LSTM because it performs stably across various time series forecasting tasks. In this paper, the mathematical expression of the LSTM is given as Eq. (15).

$$\begin{aligned} \left( \begin{array}{c} {\mathbf{{i}}_t}\\ {\mathbf{{f}}_t}\\ {\mathbf{{o}}_t}\\ {{\mathbf{{{\tilde{c}}}}}_t} \end{array} \right)&= \left( \begin{array}{c} \mathrm{LeakyReLU}\\ \mathrm{LeakyReLU}\\ \mathrm{LeakyReLU}\\ \tanh \end{array} \right) \left( \mathbf {W}\left[ \begin{array}{c} {\mathbf{{x}}_t}\\ {\mathbf{{h}}_{t - 1}} \end{array} \right] + \mathbf{{b}} \right) ,\\ {\mathbf{{c}}_t}&= {\mathbf{{f}}_t} \odot {\mathbf{{c}}_{t - 1}} + {\mathbf{{i}}_t} \odot {{\mathbf{{{\tilde{c}}}}}_t},\\ {\mathbf{{h}}_t}&= {\mathbf{{o}}_t} \odot \tanh \left( {{\mathbf{{c}}_t}} \right) , \end{aligned}$$
(15)

where \({\mathbf{{i}}_t}, {\mathbf{{f}}_t}, {\mathbf{{o}}_t}, {\mathbf{{c}}_t}\) represent the input gate, forget gate, output gate and memory cell, respectively, and \({\mathbf{{x}}_t}, {\mathbf{{h}}_t}\) represent the state variable of the system and the hidden state. The mean squared error is used as the loss function in training.
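The two building blocks of the prediction network, Eq. (14) and Eq. (15), can be written out in plain NumPy as a forward-pass sketch. This is not the authors' PyTorch model: the function names, the use of row sums as node degrees, the zero scaling for nodes without outgoing edges, and the LeakyReLU slope of 0.01 are all assumptions made so the example is self-contained.

```python
import numpy as np

def mixhop_layer(A, X, weights):
    """Eq. (14): sum over hop orders i of ReLU(D^-1/2 A^i D^-1/2 X W_i)."""
    deg = A.sum(axis=1)
    # Assumption: nodes with zero degree get a zero scaling factor.
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.where(deg > 0, deg, 1.0)), 0.0)
    D = np.diag(d_inv_sqrt)
    H = 0.0
    for i, W in enumerate(weights):
        propagated = D @ np.linalg.matrix_power(A, i) @ D @ X @ W
        H = H + np.maximum(propagated, 0.0)           # ReLU, then sum over hops
    return H

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def lstm_step(x_t, h_prev, c_prev, W, b):
    """Eq. (15): one step of the LSTM variant with LeakyReLU gates."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b         # shape (4 * hidden,)
    i_t = leaky_relu(z[:hidden])                      # input gate
    f_t = leaky_relu(z[hidden:2 * hidden])            # forget gate
    o_t = leaky_relu(z[2 * hidden:3 * hidden])        # output gate
    c_tilde = np.tanh(z[3 * hidden:])                 # candidate memory
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```

In use, `mixhop_layer` would be applied to the causal graph at every mining step, and the resulting feature vectors would be passed one step at a time through `lstm_step`, carrying `h_t` and `c_t` forward.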

Remark 1

It should be emphasized again that, in many practical industrial scenarios, the underground pressure sequence acquired by sensors does not satisfy the assumption of equal time-interval sampling. Hence, we transform the time sequence into a step sequence for modeling in this paper. In essence, the obtained prediction sequence reflects the change in the underground pressure during the next s-th mining action.

3.5 Causality graph structure optimization

In this section, we first explain the motivation for optimizing the underground pressure causal graph. We then introduce the specific method used to optimize the graph structure. At the end of the section, the restricting conditions of the algorithm are discussed. In general, mining industry professionals rely on their own experience combined with real-world data to establish equations. These equations describe strata behavior, that is, the relationships between the maximum underground pressure and its influencing factors. To ensure safety in the mining industry, the calculated maximum underground pressure of the workface generally far exceeds the real underground pressure of the workface during production. Hence, the adjacency matrix of the causal graph built on the basic equations of strata behavior is inaccurate to a certain degree, which may reduce the accuracy of underground pressure prediction. Of course, the parameters can also be tuned manually to adjust the adjacency matrix continuously and obtain a better causal graph structure. However, manual tuning is not suitable for the parameters of the strata behavior equations, which have multiple value ranges, and it is not transferable (Fig. 4). Reinforcement learning can overcome this problem well [31].

Fig. 4

Decision making mechanism based on reinforcement learning

We propose a graph structure optimization technique based on reinforcement learning. To obtain a better causal graph structure, a machine intelligence expert is designed to regulate the edge weights according to a prescribed strategy. The components of the RL-based graph structure optimization algorithm are as follows: the machine intelligence expert acts as the agent, and the causal graph structure serves as the environment. We design the state observation space \({O_t} = \left\{ {o_t^1,o_t^2,o_d^1,o_d^2,{\tilde{o}}_d^1,\tilde{o}_d^2} \right\}\) to create the reward function R, where \(o_d^1,o_d^2\) represent the RMSE and MSE of the underground pressure prediction error based on the original causal graph, \(o_t^1,o_t^2\) represent the RMSE and MSE of the underground pressure prediction error obtained with the t-th generated causal graph, and \(\tilde{o}_d^1,{\tilde{o}}_d^2\) represent the desired values of RMSE and MSE.

State space: The state \({S_t} = \left\{ {{e_{ij}}} \right\}\) represents the values of the adjacency matrix of the causal graph at the t-th optimization. The state \({S_t}\) is determined by the parameter vector \({g_t} = \left\{ {g_j^t} \right\}\). If the action has \(type = 0\), one of the parameters in vector \(g_t\) is changed to generate vector \({g_{t + 1}} = \left\{ {g_j^{t + 1}} \right\}\), and the new state \({S_{t + 1}}\) is calculated from \({g_{t + 1}}\). If the action has \(type=1\), one of the parameters in vector \(g_{t-1}\) is changed to generate vector \({g_{t + 1}} = \left\{ {g_j^{t + 1}} \right\}\), and the new state \(S_{t+1}\) is calculated from \(g_{t+1}\). It is difficult to obtain an optimal combination of parameters that reflects the relationship between real-world underground pressure and its influencing factors. Hence, we take as the final state any state that satisfies the requirements on RMSE and MSE.

Action space: To reduce the number of iterations of graph structure optimization, we design the action \({a_t} = \left\{ {{\delta _1},{\delta _2},type} \right\}\), where \(\Delta {g_j} = \left( {{\delta _1},{\delta _2}} \right)\), the values of \({\delta _1} = \left\{ {\beta , - \beta } \right\}\) and \({\delta _2} = \left\{ {1, \cdots ,6} \right\}\) are equally likely in the first optimization, and \(\beta\) is a hyper-parameter. As shown in Eq. (16), the value of type depends on the observed state. The purpose of this strategy is to retain only the actions that have been beneficial to the prediction accuracy.

$$\begin{aligned} type = \left\{ {\begin{array}{ll} {1,}&{}{R_{accuracy}^t \le 0}\\ {0,}&{}{otherwise} \end{array}} \right. \end{aligned}$$
(16)
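The action and state update can be sketched as follows. This is a hypothetical illustration: the helper names `sample_action` and `next_parameters` are not from the paper, and uniform sampling is used throughout, which the text states only for the first optimization round.

```python
import random

def sample_action(beta, reward_prev, rng=random):
    """Draw a_t = (delta_1, delta_2, type) with uniform choices for delta_1 and delta_2."""
    delta_1 = rng.choice([beta, -beta])            # signed step size
    delta_2 = rng.randint(1, 6)                    # which parameter g_1..g_6 to change
    action_type = 1 if reward_prev <= 0 else 0     # Eq. (16)
    return delta_1, delta_2, action_type

def next_parameters(g_t, g_prev, action):
    """Apply the action to g_t (type 0) or to the earlier vector g_{t-1} (type 1)."""
    delta_1, delta_2, action_type = action
    base = list(g_prev if action_type == 1 else g_t)
    base[delta_2 - 1] += delta_1                   # delta_2 is one-based in the text
    return base
```

With `type = 1` the change is applied to the earlier vector, so the most recent adjustment is discarded, which corresponds to keeping only the actions that helped the prediction accuracy.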

Reward: We design the reward function based on the state observation space.

$$\begin{aligned}&R = \sum \limits _{t = 1}^{Iter} {R_{accuracy}^t}, \end{aligned}$$
(17)
$$\begin{aligned}&R_{accuracy}^t = {\mathop {\mathrm{sgn}}} \left( {o_t^1 - o_d^1} \right) + {\mathop {\mathrm{sgn}}} \left( {o_t^2 - o_d^2} \right) , \end{aligned}$$
(18)

where Iter represents the number of optimization iterations. Taking inspiration from work on parameter tuning, we then design a method to guarantee that the reward function can guide the agent to achieve the graph structure optimization task. The motivation is that the agent would be caught in endless parameter tuning if the desired RMSE and MSE were set unreasonably. Let \({\hat{o}}_d^i: = \frac{{g\left( {o_d^i} \right) + \tilde{o}_d^i}}{2},\mathrm{{ }}i = 1,2\), where \(g\left( {o_d^i} \right) = \left( {1 - \gamma \varepsilon } \right) o_d^i \ge {\tilde{o}}_d^i\), \(\varepsilon \in \left( {0,0.1} \right]\), \({\hat{o}}_d^i,\mathrm{{ }}i = 1,2\) are the boundary parameters, and \(\gamma\) represents the cycle index. When \(R>0\) and \(o_t^i < {\hat{o}}_d^i,\mathrm{{ }}i = 1,2\) hold at the same time, the parameter vector \(\mathbf{{g}}\) is called a set of efficient graph structure optimization parameters. The cycle index is then increased by one, the boundary parameters are updated, and a new cycle of graph structure optimization is performed. The purpose of setting \(R > 0\) as a condition is to enable the agent to learn the law of parameter optimization, as shown in the following equation.

$$\begin{aligned}&\Delta g_k^m = \frac{{\sum \nolimits _{i = 1}^n {\exp \left[ {{a_i}\left( {{\delta _1} = m,{\delta _2} = k,type = 0} \right) } \right] } }}{{\sum \limits _{l = 1}^{Iter} {\exp \left( {{a_l}} \right) } }}. \end{aligned}$$
(19)

We also design the truncation condition R \(> {\Omega ^\gamma }\), where \(\Omega\) is a positive integer. When the optimization process is truncated, the last set of efficient graph structure optimization parameters is taken as the final parameters, and the boundary parameters are the suboptimal target that the task can satisfy. The purpose of this design is to bring the parameter tuning behavior of the agent more in line with expert experience and so avoid wasted effort.
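The reward and stopping logic described in this section can be sketched as below. The helper names are hypothetical, and the reward follows the sign convention of Eq. (18) exactly as printed; if a lower error is meant to yield a positive reward, the two arguments of each difference would need to be swapped.

```python
def sign(x):
    return (x > 0) - (x < 0)

def accuracy_reward(o_t, o_d):
    """Eq. (18) as printed: sum of sgn(o_t^i - o_d^i) over the two error metrics."""
    return sign(o_t[0] - o_d[0]) + sign(o_t[1] - o_d[1])

def boundary(o_d, o_desired, cycle, eps=0.05):
    """Boundary parameters: midpoint of (1 - cycle * eps) * o_d and the desired value."""
    return [((1.0 - cycle * eps) * od + target) / 2.0
            for od, target in zip(o_d, o_desired)]

def is_efficient(total_reward, o_t, bounds):
    """Parameters are efficient when R > 0 and both errors are below the bounds."""
    return total_reward > 0 and all(e < b for e, b in zip(o_t, bounds))

def is_truncated(total_reward, omega, cycle):
    """Truncation condition R > Omega ** cycle."""
    return total_reward > omega ** cycle
```

Because the boundary is a midpoint between a slowly shrinking fraction of the original error and the desired error, each cycle tightens the target gradually instead of demanding the desired value at once.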

4 Experiment

In this section, we introduce the datasets, the experimental design, and how raw underground pressure data are used to train knowledge-based and data-driven forecasting models. To make full use of expert knowledge, prior knowledge such as the geological characteristics of the mining area and the roof characteristics of the workface is needed. This knowledge can reduce the time required for graph structure optimization and thus improve computational efficiency. The framework still works effectively without this information.

Table 1 Environmental mining conditions
Table 2 The characteristics of coal seam roof

4.1 Datasets

The data used in the experiment come from the real working conditions of the 22104 fully mechanized coal mining workface at the Shangwan coal mine in the Shendong mining area (Fig. 5). The topographic features include loess remnant tableland, loess hills, and some ravines with bedrock outcrops [32]. The geological mining conditions of this area have been studied intensively, and details can be found in the literature [33]. We give the conditions of the fully mechanized mining workface (Table 1) and the characteristics of the coal seam roof (Table 2). The dataset contains the mine pressure on all 166 hydraulic supports of the workface during 299 feeding operations in one month. It reflects the relationship between rock pressure variation and workface advancement in a real coal mining project. In actual production, the workface advances according to the set mining step. Because of factors such as geological conditions and production requirements, the duration of each advance is not fixed. The real mining pressure time series therefore does not satisfy the equal-interval assumption, which is an important reason for the poor performance of various machine learning algorithms in practical multi-step mining pressure prediction. The dataset is divided into a training set, a validation set, and a test set in a ratio of 6:2:2.
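As a rough illustration of the 6:2:2 division, the sketch below splits the 299 feeding operations into three consecutive blocks and computes the gaps between unequally spaced sampling times. The text states only the ratio, so the chronological ordering, the choice to split along operations rather than supports, the function names, and the example timestamps are all assumptions.

```python
def split_indices(n_steps, ratios=(6, 2, 2)):
    """Split range(n_steps) into consecutive train/validation/test index ranges."""
    total = sum(ratios)
    n_train = n_steps * ratios[0] // total
    n_val = n_steps * ratios[1] // total
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n_steps)  # the remainder goes to the test set
    return train, val, test


def time_gaps(timestamps):
    """Gaps between successive samples; they vary when sampling is unequally spaced."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


train, val, test = split_indices(299)
print(len(train), len(val), len(test))  # 179 59 61

# Hypothetical sampling times (arbitrary units) with unequal spacing.
print(time_gaps([0, 3, 4, 9]))  # [3, 1, 5]
```

Keeping the three blocks consecutive avoids leaking future pressure readings into training, which is one common reason to prefer a chronological split for time series.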

Fig. 5
figure 5

The mining location

Fig. 6
figure 6

RMSE of 100 underground pressure predictions

Fig. 7
figure 7

MAE of 100 underground pressure predictions

4.2 Evaluation index and experimental settings

To evaluate the proposed model, two indicators, RMSE and MAE, are used to measure the prediction results. The smaller these two indicators are, the more accurate the prediction. We compare the proposed mine pressure prediction model, which is based on reinforcement learning and knowledge embedding, with five baselines: BP, RNN, LSTM, GRU, and stacked LSTM. BP is a machine learning algorithm that is often used for mine pressure prediction, and previous studies have demonstrated its effectiveness. The other algorithms have been research hotspots in recent years and are often used for various mine pressure prediction tasks. Our model is built on PyTorch with the built-in Adam optimization algorithm. The equipment used in the experiment was sponsored by the AT-BDSC laboratory. The experiment is divided into the following parts:

a):

To verify the effectiveness of the proposed algorithm, we selected a group of consecutive hydraulic supports (\(\#\)155 - \(\#\)164) as representatives and performed the ten-step prediction task one hundred times. We then calculated the RMSE and MAE of each prediction result and compared them with the baselines.

b):

To study the influence of the prediction step size on the results of the multi-step rock pressure prediction task, we take the ten consecutive hydraulic supports at the end of the working face as a group and perform five-, ten-, and fifteen-step prediction tasks. We calculate the RMSE and MAE of the prediction results and compare them with the baselines.

c):

To study the influence of the size of a support group on the results of the multi-step rock pressure prediction task, we take one, five, and ten hydraulic supports at the end of the working face as a group and perform the ten-step mine pressure prediction task for each group. The results are compared with the baselines.

d):

To verify that the proposed prediction framework can still complete the prediction task when prior knowledge is missing, inaccurate, or completely unknown, we artificially delete some prior knowledge and randomly change prior knowledge and settings. We then take ten consecutive hydraulic supports at the end of the working face as a group and perform a ten-step prediction task. We calculate the RMSE and MAE of the prediction results and record the GPU runtime for comparison.
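For reference, the two evaluation indicators used in all four parts can be computed as in the following minimal sketch. The standard definitions of RMSE and MAE are used; how the paper aggregates errors across supports and prediction steps is not stated, so flattening all values into one list is an assumption, and the pressure values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over paired observations."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error over paired observations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical pressure readings and predictions.
observed = [30.0, 32.0, 35.0, 31.0]
predicted = [31.0, 32.0, 33.0, 31.0]
print(round(rmse(observed, predicted), 4))  # 1.118
print(mae(observed, predicted))             # 0.75
```

RMSE penalizes the single large error (2.0) more heavily than MAE does, which is why the two indicators are reported together.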

4.3 Main results

In this subsection, we present and analyze the experimental results in detail. First, we evaluate the prediction performance of different rock pressure prediction frameworks on typical tasks. Based on research on the genesis of mine pressure, we set the mine pressure inducement parameters to \({g_1} = 1.5\), \({g_2} = 30.5\), \({g_3} = -3.5\), \({g_4} = 80\), \({g_5} = 0.5\). To rule out chance in the experimental results, we conducted one hundred prediction experiments with these parameters fixed. After eliminating obvious outliers, the best RMSE and MAE among all prediction results are shown in Table 3. Figures 6 and 7 show the frequency distribution curves of the evaluation indicators over the one hundred rock pressure prediction tasks. This evaluation strategy follows [34], which explains its significance for prediction tasks. The results show that classical machine learning methods are difficult to apply to the multi-step multivariate rock pressure prediction task with sampling at unequal time intervals. The proposed prediction framework outperforms classical machine learning methods, but it still has difficulty predicting large changes in mine pressure. The RMSE and MAE of the multi-step prediction results of a single hydraulic support in RC-GNN are presented in Table 4, and they are considerably better than those reported in some current studies. It is worth noting that existing studies usually use data sampled at equal time intervals for univariate multi-step forecasting tasks. In contrast, the proposed prediction framework considers the inducing factors of rock pressure, so it achieves better results in multivariate, multi-step rock pressure prediction tasks with sampling at unequal time intervals.

Table 3 RMSE and MAE of one hundred underground pressure predictions
Fig. 8
figure 8

Experimental results of different predicted step sizes

Fig. 9
figure 9

Experimental results of different numbers of hydraulic supports

Table 4 RMSE and MAE of single-scaffold prediction results under ten-step ten-scaffold condition
Table 5 Experimental results of different predicted step sizes

Next, we discuss the influence of the prediction step size on the prediction results in the multivariate multi-step prediction task. We display the RMSE and MAE of the RC-GNN, GRU, and RNN prediction results as histograms. According to Fig. 8, the more steps that need to be predicted, the worse the rock pressure prediction becomes. This is because traditional machine learning models consider only superficial temporal dependencies and do not try to mine the causal relationships in the data. The results in Table 5 show that the proposed framework alleviates this problem to a certain extent. In the five-step prediction task, compared with the GRU model, the RMSE is reduced by 48.1% and the MAE by 46.4%; in a longer-horizon prediction task, compared with the GRU model, the RMSE is reduced by 44.4% and the MAE by 60.3%. When the proposed framework performs a fifteen-step prediction task, its results are 32.4% lower in RMSE and 18.1% lower in MAE than when it performs a five-step prediction task. This result shows that the causal graph model of mine pressure established in this paper can better represent the causal relationship between mine pressure and its inducements. This causal relationship better reflects the internal mechanism by which the pressure develops, which helps to improve the accuracy of multi-step rock pressure prediction (Tables 4, 5).
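The percentage reductions quoted above are presumably relative reductions with respect to the baseline error; the paper does not state the formula, so this reading is an assumption. A minimal sketch with hypothetical error values (not taken from the tables):

```python
def relative_reduction(baseline_error, proposed_error):
    """Percentage by which the proposed error falls below the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical example: a baseline RMSE of 2.0 against a proposed RMSE of 1.0.
print(relative_reduction(2.0, 1.0))  # 50.0
```

Under this definition a negative value would mean that the proposed model has a larger error than the baseline.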

We now discuss the influence of the number of variables on the prediction results in the multivariate multi-step forecasting task. We again display the RMSE and MAE of the RC-GNN, GRU, and RNN prediction results as histograms. According to Fig. 9, the more variables (mine pressure units) that need to be predicted, the worse the rock pressure prediction becomes. This is also caused by the lack of in-depth exploration of the causal relationships between variables (mine pressure units) in traditional machine learning algorithms. The results in Table 6 show that, in the single-support multi-step prediction task, the proposed framework reduces the RMSE by 17.9% and the MAE by 31% compared with the LSTM method; in a multi-support prediction task, compared with the LSTM method, the RMSE is reduced by 27.1% and the MAE by 34.8%. These results show that the prediction framework based on the causal graph of rock pressure established in this paper better captures the causal relationships between variables and improves the accuracy of multivariate rock pressure prediction tasks.

Table 6 Experimental results of different numbers of hydraulic supports
Fig. 10
figure 10

Diagram of multi-step underground pressure prediction error

Finally, we report the RMSE and MAE of the multivariate, multi-step rock pressure prediction task when prior knowledge is missing, inaccurate, or completely unknown. This part of the experiment is designed to verify that the reinforcement learning module in the proposed framework can effectively replace the parameter tuning performed by human engineers. In the real world, because environmental information can never be collected completely, it is difficult to obtain a very accurate set of mining pressure inducement parameters. Therefore, to obtain a better graph network structure, engineers do not wait for exact values of these parameters; instead, they iterate, observing the prediction results and adjusting the parameters based on this feedback. The results in Table 7 show that the reinforcement learning module in the proposed framework can assist engineers in this operation. The prediction framework automatically adjusts the mine pressure inducement parameters once the desired RMSE and MAE are set, although this takes some time. Figure 10 presents the error curves of the rock pressure prediction results for multiple supports when prior knowledge is lacking. The results show that even if the edge weights cannot be obtained accurately, the proposed prediction framework can still perform the rock pressure prediction task.
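The feedback adjustment described above, in which parameters are changed until the desired RMSE and MAE are met, can be pictured with the toy loop below. It is not the reinforcement learning agent of this paper: a simple random-perturbation hill climb is swapped in as a stand-in, `toy_evaluate` replaces training and scoring the predictor, and all names, step sizes, and starting values are hypothetical. The toy target reuses the inducement parameters \(g_1\) to \(g_5\) given earlier purely as a convenient reference point.

```python
import random


def tune(evaluate, params, desired, max_iter=200, step=0.5, seed=0):
    """Perturb parameters until both metrics fall below the desired values.

    `evaluate(params)` returns (rmse, mae). A candidate is kept only if it
    worsens neither metric, so the returned metrics never exceed the initial ones.
    """
    rng = random.Random(seed)
    best = list(params)
    best_metrics = evaluate(best)
    for _ in range(max_iter):
        if all(m < d for m, d in zip(best_metrics, desired)):
            break  # desired RMSE and MAE reached
        candidate = [p + rng.uniform(-step, step) for p in best]
        metrics = evaluate(candidate)
        if all(m <= b for m, b in zip(metrics, best_metrics)):
            best, best_metrics = candidate, metrics
    return best, best_metrics


def toy_evaluate(params, target=(1.5, 30.5, -3.5, 80.0, 0.5)):
    """Toy stand-in for the predictor: errors grow with distance from `target`."""
    diffs = [abs(p - t) for p, t in zip(params, target)]
    rmse = (sum(d * d for d in diffs) / len(diffs)) ** 0.5
    mae = sum(diffs) / len(diffs)
    return rmse, mae


start = [1.0, 30.0, -3.0, 79.0, 1.0]  # hypothetical inaccurate prior parameters
best, (final_rmse, final_mae) = tune(toy_evaluate, start, desired=(0.3, 0.3))
print(final_rmse <= toy_evaluate(start)[0])  # True
```

The loop makes the trade-off in the text concrete: no accurate prior is required, but every candidate costs one evaluation of the predictor, which is where the extra runtime comes from.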

Table 7 The difference between the predicted results of known prior parameters and unknown prior parameters

5 Conclusion

In this paper, we propose a framework for the multi-step multivariable underground pressure prediction task. We build a causal graph that reflects the connection between the inducements and the representation of underground pressure. In industrial production, inaccurate prior knowledge and missing environment parameters give rise to erroneous edge weights in the initial causal graph. To overcome this, we propose a reinforcement learning algorithm that uses the prediction accuracy as feedback to help obtain a better causal graph structure. A prediction network composed of GCN and LSTM modules executes the UGPP task and provides the prediction performance indices to the reinforcement learning module.

The prediction framework is tested on a real dataset from the Shangwan Coal Mine of Shendong Group. This dataset is characterized by strong noise and sampling at unequal intervals. The experimental results indicate that the proposed framework effectively improves the precision of underground pressure prediction and remains effective when prior knowledge is deficient. We therefore consider RC-GNN better suited to guiding industrial production than existing UGPP frameworks based on deep learning. One important factor restricting the application of machine learning techniques, such as deep learning and GNN, to the mining industry is their inherent uncertainty. Even the output of a well-designed prediction framework can be expected to contain a certain degree of non-Gaussian noise. Hence, a more comprehensive underground pressure causal graph needs to be constructed. To guide industrial safety more effectively, more research on modeling the underground pressure causal graph is needed; in particular, construction operations and the properties of construction equipment should be considered in the model.