4.1. TRIDEC Drilling Support Components
The TRIDEC drilling support system (TDS) monitors the drilling operations performed at drilling rigs. It is designed to detect critical situations and show their trends in real time [1]. The system is intended for use both onsite at a rig and offsite at a real-time operating center (RTOC); see Figure 7. It also guides the drillers during routine operations and presents counter-actions and recommendations for abnormal situations. The RTOC is used by a dedicated stakeholder, the so-called RTOC engineer, who monitors multiple rigs at the same time and is therefore typically located at a remote operating center.
Such an operation center is connected to multiple rigs. The RTOC engineer is provided with an overview of all monitored rigs, enabling the user to get a clear view of the overall situation. It is possible to switch to a more detailed view in order to monitor a single rig on demand. With this system, the RTOC engineer analyzes long-running trends, giving learning feedback to the system and responding to any provided counter-actions and recommendations.
The following subsections briefly introduce three of the drilling support components relevant to the scope of this paper.
4.1.1. System Training Component
In a self-learning computer system, a training component is of utmost importance. Basically, system training is the process of learning. This component enables users to replay historical rig data, to annotate undetected events, and to provide feedback for any existing events detected. Based on such feedback, the system can build up the learning models for subsequent use in real-world installations.
4.1.2. Data Analysis Component
The purpose of the data analysis component is to provide and prepare information and know-how on historical items such as data, inferences, and relations between them. It features the analysis of these items in a fashion similar to conventional online analytic processing techniques. Unlike such systems, it additionally provides analysis methods for generating new knowledge.
4.1.3. Knowledge Editor Component
The knowledge editor is a component that enables a user to create, update, and delete detected events as well as any proposed counter-actions and recommendations. Moreover, the component provides convenient time-oriented views of certain critical events. In certain cases, the particular know-how of human experts is required to directly edit model parameters such as thresholds, weights, data type prototypes, or cluster centers.
Data transmission is a major challenge because of the volume of data that has to be transmitted in real time. A standard encoding such as WITSML [38] is used for the transmission; such standards are based on XML. For instance, one rig currently provides more than 700 real-time channels sampled at a frequency of 1 Hz. Figure 8 shows a portion of such data together with the associated information on some states of the drilling rig. Such an amount of data from a single rig can be transmitted over DSL or satellite communication channels. Continuously receiving such data streams from numerous rigs concurrently, however, may exceed the capacity of such channels.
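As a rough back-of-the-envelope sketch of the data volume, the snippet below multiplies the channel count, the sampling rate, and an encoded size per sample. The 700 channels and the 1 Hz rate come from the text; the 100 bytes per XML-encoded sample and the number of rigs are purely illustrative assumptions, not values taken from the WITSML standard or from the TDS.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(channels, sample_rate_hz):
    """Number of individual channel samples produced by one rig per day."""
    return channels * sample_rate_hz * SECONDS_PER_DAY


def required_bandwidth_bps(channels, sample_rate_hz, bytes_per_sample, rigs=1):
    """Sustained bandwidth in bits per second for the given number of rigs."""
    return channels * sample_rate_hz * bytes_per_sample * 8 * rigs


# 700 channels at 1 Hz (from the text); 100 bytes per sample is an assumption.
one_rig = required_bandwidth_bps(700, 1, 100)
ten_rigs = required_bandwidth_bps(700, 1, 100, rigs=10)

print(f"samples per rig and day: {samples_per_day(700, 1):,}")
print(f"one rig:  {one_rig / 1e6:.2f} Mbit/s")
print(f"ten rigs: {ten_rigs / 1e6:.2f} Mbit/s")
```

Under these assumptions the load grows linearly with the number of rigs, which illustrates why a link that is sufficient for one rig may become a bottleneck at an operating center that receives many streams.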
4.2. Feature Construction
Some real-time data channels are shown in Figure 8. Taking the stuck pipe scenario as an example, it can be presumed a priori that the block position and the hook load play some role. If the pipe is stuck, the block cannot be moved with the drill string connected. Attempting to move the block up will drastically increase the hook load without the block actually moving; more precisely, the block speed remains nearly zero. The block speed, which is the first derivative of the block position with respect to time, might therefore be a feature that simplifies the problem of stuck pipe detection. The same applies to drill string rotation and torque.
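The reasoning above can be illustrated with a minimal sketch that derives the block speed from the block position by finite differences and flags samples where the hook load rises while the block barely moves. The function names, the thresholds, and the synthetic data are assumptions made for illustration only; they are not the detection logic of the system described here.

```python
def first_derivative(values, dt):
    """Finite-difference derivative: central inside, one-sided at the ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        deriv.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return deriv


def stuck_pipe_flags(block_pos, hook_load, dt, speed_eps, load_rate_min):
    """Flag samples where the hook load rises quickly at near-zero block speed."""
    speed = first_derivative(block_pos, dt)
    load_rate = first_derivative(hook_load, dt)
    return [abs(v) < speed_eps and r > load_rate_min
            for v, r in zip(speed, load_rate)]


# Synthetic data: the block moves up, then stalls while the hook load climbs.
block_pos = [0.0, 0.5, 1.0, 1.5, 1.5, 1.5, 1.5]                # m
hook_load = [100.0, 100.0, 100.0, 100.0, 130.0, 160.0, 190.0]  # kN

flags = stuck_pipe_flags(block_pos, hook_load, dt=1.0,
                         speed_eps=0.05, load_rate_min=10.0)
print(flags)
```

The derived speed turns a pattern that is spread over two raw channels into a simple per-sample condition, which is the sense in which such a feature simplifies the detection problem.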
In terms of real-world tasks, stuck pipe detection is a relevant but rather simple problem. A more complex task is stuck pipe prediction and, as a consequence, stuck pipe prevention by means of precautionary counter-actions. The assumption that hook load and torque contain information about a possibly emerging stuck pipe is still valid. How that information is encoded in these channels, however, is unknown, which is a challenge whose importance should not be underestimated.
Figure 9 sketches a typical borehole that is drilled nearly vertically at its beginning and then changes direction to nearly horizontal. The main components of the hook load are the acceleration force; the component of the total weight of the whole drill string that is aligned with the drilling direction (according to the principles of mechanics, the component perpendicular to the drilling direction is to be ignored); the friction forces (which, according to the principles of mechanics, could depend on that perpendicular component); and some other non-quantifiable forces denoted as ε. To create a deterministic model, the mass influx of the drill string, the borehole trajectory (especially the inclination), and the friction factors, amongst others, need to be known.
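To make the decomposition concrete, the following sketch evaluates the force components for a single straight drill string segment. It is a textbook simplification under stated assumptions: Coulomb friction proportional to the perpendicular weight component, inclination measured from the vertical, and no buoyancy, curvature, or other unmodelled forces (the ε term). All names and numbers are illustrative and not taken from the system described here.

```python
import math


def hook_load_components(mass_kg, inclination_deg, accel_ms2,
                         friction_coeff, g=9.81):
    """Force components (in N) for one straight drill string segment.

    inclination_deg is measured from the vertical
    (0 = vertical borehole, 90 = horizontal borehole).
    """
    theta = math.radians(inclination_deg)
    weight = mass_kg * g
    axial = weight * math.cos(theta)    # weight along the drilling direction
    normal = weight * math.sin(theta)   # weight perpendicular to it
    friction = friction_coeff * normal  # assumed Coulomb friction
    inertia = mass_kg * accel_ms2       # acceleration force
    return {"axial": axial, "normal": normal,
            "friction": friction, "inertia": inertia}


def hook_load_pulling(mass_kg, inclination_deg, accel_ms2,
                      friction_coeff, g=9.81):
    """Hook load while pulling the segment upward; epsilon is omitted."""
    c = hook_load_components(mass_kg, inclination_deg, accel_ms2,
                             friction_coeff, g)
    return c["inertia"] + c["axial"] + c["friction"]


for inclination in (0.0, 45.0, 90.0):
    load = hook_load_pulling(1000.0, inclination, 0.0, 0.3)
    print(f"inclination {inclination:5.1f} deg: hook load {load:8.1f} N")
```

Even this idealized model needs the mass, the inclination, and a friction coefficient for every segment, which indicates how many inputs a fully deterministic model of a real well would require.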
Since it appears unpromising to identify and estimate all input factors with a reasonable certainty and accuracy to predict and thus prevent a stuck pipe (as well as other crises), a heuristic approach incorporating deterministic know-how seems to be the most feasible solution. To incorporate as much deterministic know-how as possible, a systematic approach to generate features based on specific laws of physics appears to be appropriate.
For feature creation, the physical rules of kinematics and dynamics were applied to the drill string in a first approach, yielding heuristics for features derived from the available data channels. For instance, the simple rule for the acceleration force $F_a$ based on the mass $m$ and the acceleration $a$,

$$F_a = m\,a \qquad (6)$$

leads to the heuristic that $F_a$ can be expressed as

$$F_a = (c_0\,l_0 + c_1\,l_1)\,a \qquad (7)$$

In Equation (7), $l_0$ denotes the length of the drill string above the rotary table and thus out of the borehole, $l_1$ is the length of the drill string in the borehole, and $a$ is the acceleration applied to the total drill string. The constants $c_0$ and $c_1$ need to be evaluated if a deterministic model is built. With a heuristic model builder, they are presumably estimated automatically and can thus be ignored, as long as the same features are derived. In fact, the features from Equation (7) are $l_0\,a$, $l_1\,a$, and the acceleration of the drill string itself.
Combining all the kinematic and dynamic rules heuristically and normalizing the results by the length of the drill string yielded a set of exactly 100 feature rules. Those rules were applied to the 10 base channels shown in Table 4 as well as to the extended base channels shown in Table 5, resulting in a total of 1600 features.
The denotation of a feature is based on the channel from which it originates (e.g., C0108) and the feature index (one out of 100; H00 denotes the first and H99 the last of the features). Thus, the denotation C0108:H01 stands for the feature with index 1 based on the bit depth, which is in fact the drill string velocity, i.e., the first time derivative of the bit depth.
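The naming scheme can be sketched in a few lines. The helper names below are hypothetical, and the channel list is a placeholder: only C0108 is taken from the text, while the remaining identifiers are invented stand-ins. The count of 16 channels is an inference from 1600 features divided by 100 rules, not a figure stated explicitly in the text.

```python
def feature_names(channels, n_rules=100):
    """Build names of the form '<channel>:H<two-digit rule index>'."""
    return [f"{ch}:H{idx:02d}" for ch in channels for idx in range(n_rules)]


def parse_feature_name(name):
    """Split a feature name into its channel and its integer rule index."""
    channel, rule = name.split(":")
    if not rule.startswith("H"):
        raise ValueError(f"unexpected rule part: {rule!r}")
    return channel, int(rule[1:])


# Placeholder channel list: C0108 plus 15 invented identifiers.
channels = ["C0108"] + [f"X{i:04d}" for i in range(1, 16)]
names = feature_names(channels)

print(len(names))                       # number of generated feature names
print(names[0], names[1], names[-1])
print(parse_feature_name("C0108:H01"))
```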
Figure 10 shows a portion of such features generated within a 42-h window. The base channels in that case were sampled with a sampling frequency of 0.1 Hz, equivalent to a sampling interval of 10 s. In all charts, the bit depth is assigned to the right axis and drawn as a black line. In Figure 10a, the bit velocity is drawn as a yellow line within a range of about ±0.8 m/s. The bulk of the positive bit velocities on the left (3:20 to 6:10) is due to the trip-in of the drill string into the borehole. The middle part of the chart indicates the actual drilling operation, starting at 6:10 and ending the next day at 9:20. Over this period, the average bit velocity is about 16 m/h (equivalent to about 4.4 mm/s) and thus not really perceptible in the chart. The large spikes in that range (e.g., at 14:00) indicate so-called ream and wash operations applied for cleaning the borehole. The right part of the chart again shows a period of large bit velocities, due to a trip-out operation in which the drill string is removed from the borehole.
In Figure 10b, the bit acceleration is plotted against time. The chart is similar to the one above: periods of large acceleration values occur during the trip-in and trip-out, and some acceleration spikes during drilling are caused by the ream and wash operations.
Figure 10c shows the pump power (E0201, cf. Table 5), which is the product of the pump pressure (C0121) and the flow rate (C0130). Evidently, most of the pump power is required for drilling; during the trip-in and trip-out it is almost zero.
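A derived channel of this kind is a simple element-wise product. The sketch below assumes SI units (pressure in Pa, flow rate in m³/s, power in W); the units of the actual channels are not stated in the text, and the function names and sample values are illustrative.

```python
def pump_power_watts(pressure_pa, flow_m3_per_s):
    """Hydraulic pump power as the product of pressure and flow rate."""
    return pressure_pa * flow_m3_per_s


def pump_power_series(pressures_pa, flows_m3_per_s):
    """Element-wise pump power for two equally long channel series."""
    if len(pressures_pa) != len(flows_m3_per_s):
        raise ValueError("channels must have the same number of samples")
    return [pump_power_watts(p, q)
            for p, q in zip(pressures_pa, flows_m3_per_s)]


# Illustrative samples: pumps off (tripping), pumps on (drilling), pumps off.
pressure = [0.0, 2.0e7, 0.0]  # Pa (2.0e7 Pa corresponds to 200 bar)
flow = [0.0, 0.05, 0.0]       # m^3/s
print(pump_power_series(pressure, flow))
```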
4.6. Experiments with a Random Forest Classifier
Experiments were undertaken to assess how well the resulting Sensitivity Indices correlated with the results of other state-of-the-art classification methods: first the Random Forest Classifier (Section 4.6), and then Neural Networks (Section 4.7). Random Forest Classifiers (RFC) [39,40] are well-known classification methods in the machine learning community. They have shown better, or at least comparable, performance than other state-of-the-art classification algorithms such as SVMs [41] or boosting [42], and offer a series of advantages such as efficiency on large databases and robustness to missing data.
For this experiment, we used an adapted version of the online algorithm proposed by Saffari et al. [43]. All our experiments were performed in MATLAB (R2011, 64-bit) running on an Intel Xeon 3.2 GHz machine with Windows 7 (Graz, Austria). Due to the randomized nature of the algorithm, we performed 10-fold cross-validation for the datasets. For each validation run, a random selection of 90% of the data was used for learning, while the remaining 10% was used for testing. The main performance criterion is the correct classification rate (CCR), which is the ratio between the number of correctly classified datasets and the number of valid test datasets. Because the classes are unbalanced, individual correct classification rates were also calculated per class to emphasize any effects on smaller classes.
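The evaluation criterion can be sketched as follows. The code computes the overall CCR, the per-class CCRs that expose weak performance on small classes, and a random 90/10 split of sample indices. It is a minimal illustration in Python rather than the MATLAB code used for the experiments, and the class labels are hypothetical.

```python
import random
from collections import Counter


def correct_classification_rate(y_true, y_pred):
    """Share of test samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_ccr(y_true, y_pred):
    """CCR computed separately for every class present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def random_split(n_samples, train_fraction=0.9, seed=None):
    """Randomly split sample indices into learning and test indices."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(round(train_fraction * n_samples))
    return indices[:cut], indices[cut:]


# Unbalanced toy example with hypothetical state labels.
y_true = ["drill"] * 8 + ["trip"] * 2
y_pred = ["drill"] * 8 + ["drill", "trip"]
print(correct_classification_rate(y_true, y_pred))
print(per_class_ccr(y_true, y_pred))
```

In the toy example the overall rate looks high although half of the minority class is misclassified, which is the effect the individual rates are meant to reveal.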
At first, we used the 32 most important features suggested by the EventTracker SA algorithm. The results obtained can be seen in the second column of Table 8.
As one can see, the classification rates vary between about 55% and 70%, which indicates the feasibility of the proposed EventTracker SA method. The worst performance, for S055, may be explained both by the complexity of the task (indicated by the fact that the best possible correct classification rate is also the lowest for this dataset) and by the fact that the SA algorithm did not see the specific well data during the selection procedure.
In order to check the performance of the RFC with added expert knowledge, an offline experiment incorporating an expert’s knowledge in the selection of channels was performed. In particular, we added the first and second derivatives (H01 and H02) as well as feature H12 of the base channels C0108 (mdBit), C0110 (mdHole), and C0112 (posBlock) as additional features and obtained the results shown in the third column of Table 8. The results show a correlation between the performance of the EventTracker SA method and that obtained by incorporating an expert’s knowledge.
4.7. Experiments with a Neural Network Classifier
In addition to the Random Forest Classifiers, Neural Networks were applied to classify the operational states shown in Table 6. In combination with the well-known Sequential Forward Selection (SFS) [6], the channels and features relevant for the classification task were estimated in order to compare them with the features recommended by the SA method. In addition, the performance of the classifier is of interest for comparison with the Random Forest Classifiers.
Neural Networks generally impose a substantial computational load, which feature selection increases further. To keep the computation time acceptable, a subset of the data was extracted from each of the four wells.
A drilling process is usually separated into so-called runs. For each such run, a rough description of what happened in the well or at the rig is provided. A drilling run includes all the operations applied during the actual drilling of a well; it typically consists of tripping the drill string in, drilling the well, and finally tripping the drill string out (see Figure 8).
For the experiments with the Neural Networks, the data were extracted as shown in Table 9. From each well, one single drilling run was selected, and its data were separated into three subsets for learning (60%), validation (20%), and testing (20%).
For the classification, a special network architecture, the improved completely connected perceptron (iCCP) shown in Figure 12a, was used. Designing the network layout, i.e., choosing the number of hidden layers and the number of neurons in each hidden layer, is one of the major challenges in designing a multi-layer perceptron network. The advantage of the iCCP architecture over the multi-layer perceptron is that all neurons in the hidden block are completely connected, so the search for the optimal network complexity is straightforward. In oil and gas exploration tasks, this architecture has been applied successfully to simulate drilling hydraulics [44].
For all our experiments, identical configurations were used. A total of 10 networks were trained in parallel to reduce the risk of being trapped in local error minima. Network growth was started from scratch, with no neurons in the hidden block, which is equivalent to multi-linear regression. Then, hidden neurons were added one by one until a maximum of five hidden neurons was reached.
The number of inputs to all networks was at most the number of channels shown in Table 9; through the application of SFS, however, the inputs actually presented to the networks were determined by forward selection.
In the first experiment, the extracted data from all four wells were used as input to the model. Using all of the 57 channels as inputs to the model, a correct classification rate of about 93% was obtained, but the outcomes were improved by reducing the number of inputs according to the results of the feature selection.
The results of the feature selection are shown in Figure 13, with 10 out of the 57 channels/features selected (4 of those inputs are recommendations from the EventTracker SA (ETSA) method).
The most important input to the model was the difference between bit depth and borehole depth (D0101). The second most important input is the hook load (C0114), followed by the rotary speed of the bit (C0120). The CCRs for the learning, validation, and test subsets, using only those three channels as input, were all about 87%. Taking the mud flow rate as a fourth input actually provides no improvement to the classification rates, but no other channel provides better results at that position. Adding the bit velocity improves the CCRs by about 4 percentage points, to an average of about 92%. Adding more channels according to the SFS rules raises the CCR to nearly 95%, which is slightly better than the results obtained using all 57 channels as input.
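The selection procedure described above follows the general pattern of sequential forward selection, which can be sketched generically as below. The scoring function stands in for training and validating a classifier on a given input subset; here it is a toy additive function with made-up gains, so the feature names and numbers are purely illustrative and unrelated to the reported results.

```python
def sequential_forward_selection(candidates, score, max_features):
    """Greedy forward selection.

    In every cycle, the candidate whose addition yields the best score is
    added, even if it brings no improvement over the previous cycle.
    """
    selected = []
    history = []
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        best = max(remaining, key=lambda c: score(selected + [c]))
        selected.append(best)
        remaining.remove(best)
        history.append((best, score(selected)))
    return selected, history


# Toy scoring function with invented per-feature gains (an assumption).
toy_gains = {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.0}


def toy_score(features):
    return sum(toy_gains[f] for f in features)


selected, history = sequential_forward_selection(
    list(toy_gains), toy_score, max_features=3)
print(selected)
```

Because the best remaining candidate is always added, a cycle can select an input that does not raise the score, which matches the observation above that the fourth input brought no improvement although no other channel performed better at that position.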
Figure 14 shows the forward selection results of the first SFS cycle. The data are sorted by the model’s validation error in increasing order. Three channels/features at the left side of the chart provide correct classification rates above 70%: the difference between bit depth and borehole depth (D0101), the borehole depth normalized by the drill string length (C0110:H30), and the bit velocity (C0108:H01). The insert in Figure 14 shows the correct classification rates obtained for all 57 channels/features.