Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods

Pınar Tüfekci

Electrical Power and Energy Systems 60 (2014) 126–140
Article history: Received 7 April 2013; Received in revised form 15 February 2014; Accepted 25 February 2014

Keywords: Prediction of electrical power output; Combined cycle power plants; Machine learning methods

Abstract

Predicting full load electrical power output of a base load power plant is important in order to maximize the profit from the available megawatt hours. This paper examines and compares some machine learning regression methods to develop a predictive model that can predict hourly full load electrical power output of a combined cycle power plant. The base load operation of a power plant is influenced by four main parameters, which are used as input variables in the dataset: ambient temperature, atmospheric pressure, relative humidity, and exhaust steam pressure. These parameters affect electrical power output, which is considered as the target variable. The dataset, which consists of these input and target variables, was collected over a six-year period. First, based on these variables, the best subset of the dataset is explored among all feature subsets in the experiments. Then, the most successful machine learning regression method is sought for predicting full load electrical power output. The best performance is observed for the best subset, which contains the complete set of input variables, using the most successful method, the Bagging algorithm with REPTree, with a mean absolute error of 2.818 and a root mean squared error of 3.787.

© 2014 Elsevier Ltd. All rights reserved.
1. Introduction

In order for accurate system analysis with thermodynamical approaches, a high number of assumptions is necessary, such that these assumptions account for the unpredictability in the solution. Without these assumptions, a thermodynamical analysis of a real application compels thousands of nonlinear equations, whose solution is either almost impossible or takes too much computational time and effort. To eliminate this barrier, machine learning approaches are mostly used as an alternative to thermodynamical approaches, in particular to analyze systems for arbitrary input and output patterns [1].

Predicting a real value, which is known as regression, is the most common problem researched in machine learning. For this reason, machine learning algorithms are used to model the response of a system for predicting a numeric or real-valued target feature. Many real-life problems can be solved as regression problems and evaluated using machine learning approaches to develop predictive models [2].

This paper deals with several machine learning regression methods for a prediction analysis of a thermodynamic system, which is a combined cycle power plant (CCPP) with two gas turbines, one steam turbine, and two heat recovery systems. Predicting the electrical power output of a power plant has been considered a critical real-life problem for constructing a model using machine learning techniques. Predicting the full load electrical power output of a base load power plant correctly is important for the efficient and economic operation of the plant, and it is useful for maximizing the income from the available megawatt hours (MW h). The reliability and sustainability of a gas turbine depend highly on the prediction of its power generation, particularly when it is subject to constraints of high profitability and contractual liabilities.

Gas turbine power output primarily depends on the ambient parameters, which are ambient temperature, atmospheric pressure, and relative humidity. Steam turbine power output has a direct relationship with the vacuum at exhaust. In the literature, the effects of ambient conditions have been studied with intelligent system tools such as Artificial Neural Networks (ANNs) for prediction of electrical power (PE) [1,3,4]. In [1], the effects of ambient pressure and temperature, relative humidity, and wind velocity and direction on the plant power are investigated using an ANN model, which is based on the measured data from the plant. In [4], an ANN model is used to predict the operational and performance parameters of a gas turbine for varying local ambient conditions.
Intelligent systems are also used for modeling a stationary gas turbine. For instance, ANN identification techniques are developed in [5], and the results show that ANN system identification is perfectly applicable for estimating gas turbine behavior in a wide range of operating points, from full speed no load to full load conditions. In [6], Multi Layer Perceptron (MLP) and Radial Basis Function (RBF) Networks are used for identification of a stationary gas turbine in the startup stage. In [7], dynamic linear models and Feed Forward Neural Networks are compared for gas turbine identification, and the neural network is found to be a predictor model with better performance than the dynamic linear models. In [8,9], ANN models are also used for performance analysis, anomaly detection, fault detection and isolation of gas turbine engines. Furthermore, although several studies in the literature, e.g., [10–16], have been undertaken to predict electricity energy consumption using machine learning tools, only a few studies, such as [1], which is related to prediction of the total electric power of a cogeneration power plant with three gas turbines, one steam turbine and a district heating system, are similar to this paper.

Pertaining to power plants, a reliable predictive model is needed for the following day's net energy yield (full load electrical power output) per hour, using a real-valued target feature. For this task, there are two main purposes of this study. The first is to determine the best subset of the dataset, which gives the highest predictive accuracy with a combination of the input parameters defined for gas and steam turbines: ambient temperature, vacuum, atmospheric pressure, and relative humidity. For this purpose, the effects of different combinations of the parameters on predicting full load electrical power output were investigated and analyzed using 15 machine learning regression methods in the WEKA [17] toolbox. Afterwards, the results of the predictive accuracies for the different combinations of the parameters were compared and evaluated to find the best subset of the dataset. As the second purpose, this paper compared the predictive accuracies of the regression methods to find the most successful regression method for predicting the full load electrical power output of a base load operated CCPP.

The remainder of this paper is organized as follows. In Section 2, materials and methods are elaborated, whereas the experimental work is given in Section 3. In Section 4 we provide a discussion of the study, and then we conclude in Section 5.

2. Materials and methods

2.1. System description

A combined cycle power plant is composed of gas turbines (GT), steam turbines (ST) and heat recovery steam generators (HRSG). In a CCPP, the electricity is generated by gas and steam turbines, which are combined in one cycle, and is transferred from one turbine to another [18]. A gas turbine in a combined cycle system not only generates electrical power but also generates fairly hot exhaust. Routing these gases through a water-cooled heat exchanger produces steam, which can be turned into electric power with a coupled steam turbine and generator. Hence, a gas turbine generator generates electricity, and the waste heat of the exhaust gases is used to produce steam that generates additional electricity via a steam turbine. This type of power plant is being installed in increasing numbers around the world where there is access to substantial quantities of natural gas [19].

The CCPP that supplied the dataset for this study (the name of the donor power plant is kept confidential) is designed with a nominal generating capacity of 480 MW, made up of 2 × 160 MW ABB 13E2 Gas Turbines, 2 dual pressure Heat Recovery Steam Generators (HRSG) and 1 × 160 MW ABB Steam Turbine, as illustrated in Fig. 1.

Gas turbine load is sensitive to the ambient conditions, mainly ambient temperature (AT), atmospheric pressure (AP), and relative humidity (RH), whereas steam turbine load is sensitive to the exhaust steam pressure (or vacuum, V). These parameters of both gas and steam turbines, which are related to the ambient conditions and the exhaust steam pressure, are used as input variables in the dataset of this study. The electrical power generated by both gas and steam turbines is used as the target variable. All the input variables and the target variable, which are defined below, correspond to average hourly data received from the measurement points by the sensors denoted in Fig. 1.

(1) Ambient Temperature (AT): This input variable is measured in degrees Celsius and varies between 1.81 °C and 37.11 °C.
(2) Atmospheric Pressure (AP): This input variable is measured in millibars, with the range 992.89–1033.30 mbar.
(3) Relative Humidity (RH): This variable is measured as a percentage, from 25.56% to 100.16%.
(4) Vacuum (Exhaust Steam Pressure, V): This variable is measured in cm Hg, with the range 25.36–81.56 cm Hg.
(5) Full Load Electrical Power Output (PE): PE is used as the target variable in the dataset. It is measured in megawatts, with the range 420.26–495.76 MW.

Fig. 1. Layout of the CCPP, with the measurement points of AT, AP and RH at the gas turbines, V at the steam turbine, and PE (electrical power, net yield) at the plant output.

2.2. Feature subset selection

Data pre-processing is a significant process that comprises the cleaning, integration, transformation, and reduction of data, so that quality data are used in machine learning (ML) algorithms. Datasets may vary in dimension from two to thousands of features, and many of these features may be irrelevant or redundant. Feature subset selection decreases the dataset dimension by removing irrelevant and redundant features from an original feature set. The objective of feature subset selection is to procure a minimum set of original features. Using the reduced set of original features enables ML algorithms to operate faster and more effectively; it therefore helps them predict more accurately by increasing learning accuracy and improving result comprehensibility [20].

The feature selection process begins with an original feature set that includes n features or input variables. At the first stage of feature selection, which is called subset generation, a search strategy is used to produce possible feature subsets of the original feature set for evaluation. In principle, the best subset of the original feature set can be found by evaluating all 2^n possible feature subsets. This search is known as exhaustive search, which is too costly and impracticable if the original feature set contains a large number of features [21]. There are also several search procedures for finding the optimal subset of the original feature set that are more realistic, easier and more practical. However, in this study, exhaustive search is used as the search procedure: every feature combination is tried and marked with a score by the ML regression methods, which equals a value of the prediction accuracy. Then the results of each ML regression method are compared to find the feature subset with the best prediction accuracy, which is called the best subset.
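To make the exhaustive search concrete, the sketch below enumerates all 2^n − 1 non-empty feature subsets and scores each one with a regression method, as done in the experiments of Section 3. This is a minimal illustration in Python with scikit-learn rather than the WEKA toolbox used in this study; the column names follow Section 2.1, and the data frame df is assumed to hold the integrated dataset.

    from itertools import combinations

    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeRegressor

    FEATURES = ["AT", "V", "AP", "RH"]   # input variables of Section 2.1
    TARGET = "PE"                        # full load electrical power output

    def best_subset(df: pd.DataFrame):
        """Exhaustively score every non-empty feature subset (2^n - 1 of them)."""
        scores = {}
        for k in range(1, len(FEATURES) + 1):
            for subset in combinations(FEATURES, k):
                model = DecisionTreeRegressor(random_state=0)
                # cross_val_score returns negated errors; negate back to RMSE.
                rmse = -cross_val_score(
                    model, df[list(subset)], df[TARGET],
                    scoring="neg_root_mean_squared_error", cv=2,
                ).mean()
                scores[subset] = rmse
        return min(scores.items(), key=lambda item: item[1])

The subset with the lowest mean RMSE corresponds to what is called the best subset here; with four input variables this loop visits the fifteen candidates later listed in Table 5.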
2.3. Machine learning regression methods

A machine learning (ML) algorithm estimates an unknown dependency between the inputs, which are independent variables, and the output, which is a dependent variable, from a dataset. In this study, regression methods are generated as learning algorithms to predict the full load electrical power output of combined gas and steam turbines, and each element of the dataset is considered as a pair (X_i, Y_i), which is known as an instance. A machine learning regression method, which builds a mapping function Ŷ = f(X) by using these pairs, behaves as shown in Fig. 2. The purpose of a machine learning regression method is to select the best function, which minimizes the error between the actual output (Y) of a system and the predicted output (Ŷ), based on the instances of the dataset, which are called the training dataset [22].

Fig. 2. A machine learning regression method takes input data from the power plant system and minimizes the error between the actual and the predicted output.

Table 1 shows the list of the ML regression methods used in this study. Most of these regression methods have been widely used for modeling many real-life regression problems. These methods are divided into five categories, namely Functions, Lazy-learning Algorithms, Meta-learning Algorithms, Rule-based Algorithms, and Tree-based Learning Algorithms, as stated by the WEKA statistical analysis package. Functions contain algorithms that are based on mathematical models. Lazy-learning algorithms delay dealing with the training data until a query is answered; they store the training data in memory and find relevant data in the database to answer a particular query. Meta-learning algorithms integrate different kinds of learning algorithms to enhance the achievement of the current learning algorithms. Rule-based algorithms use decision rules for the regression model. Tree-based learning algorithms are used for making predictions via a tree structure, where the leaves of the tree structure illustrate classifications and the branches denote conjunctions of features.

Table 1
The ML regression methods used in this study.

Category                          Method                                      Abbreviation
Functions                         Simple Linear Regression                    SLR
                                  Linear Regression                           LR
                                  Least Median Squared Linear Regression      LMS
                                  Multi Layer Perceptron                      MLP
                                  Radial Basis Function Neural Network        RBF
                                  Pace Regression                             PR
                                  Support Vector Poly Kernel Regression       SMOReg
Lazy-learning algorithms          IBk Linear NN Search                        IBk
                                  KStar                                       K*
                                  Locally Weighted Learning                   LWL
Meta-learning algorithms          Additive Regression                         AR
                                  Bagging REP Tree                            BREP
Rule-based algorithm              Model Trees Rules                           M5R
Tree-based learning algorithms    Model Trees Regression                      M5P
                                  REP Trees                                   REP

Here we present a brief summary of the methods in Table 1.

(1) Simple Linear Regression (SLR): The SLR generates a regression model that has the lowest squared error. This model fits a straight line between the input attributes (a_0 and a_1) and the output (x) as in Eq. (1), in which the values of w_0 and w, which are the weights of a_0 and a_1, are estimated by the method of least squares. In Eq. (1), a_0 is assumed to be the constant 1.

x = w_0 + w a_1   (1)

The predictive model is chosen by minimizing the squared error, which is the difference between the observed values and the predicted values, as illustrated in Fig. 2 [23].
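As a minimal numerical illustration of Eq. (1), the sketch below estimates w_0 and w by ordinary least squares for a single attribute. The data arrays are placeholders for illustration, not values from the CCPP dataset.

    import numpy as np

    # Placeholder data: one input attribute a1 and the observed output x.
    a1 = np.array([2.0, 10.0, 18.0, 26.0, 34.0])
    x = np.array([490.0, 473.0, 455.0, 441.0, 429.0])

    # Least-squares fit of x = w0 + w * a1 (Eq. (1)).
    A = np.column_stack([np.ones_like(a1), a1])
    (w0, w), *_ = np.linalg.lstsq(A, x, rcond=None)
    print(f"x = {w0:.2f} + {w:.3f} * a1")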
(2) Linear Regression (LR): The LR models the relationship between a dependent variable and one or more independent variables. The data are modeled linearly by this algorithm. The LR method deals with weighted instances to create a prediction model, and least squares regression is performed to specify the linear relations in the training data. If there is some linear dependency among the data, LR may create a best predictive model, which is a linear regression equation to predict the output value (x) for a set of input attributes a_1, a_2, ..., a_k. In Eq. (2), w_0, w_1, ..., w_k are the weights of the respective input attributes, where w_1 is the weight of a_1, and a_0 is always considered as the constant 1.

x = w_0 + w_1 a_1 + ... + w_k a_k   (2)

The weights must be selected to minimize the difference between the actual output value and the predicted output value. In Eq. (3), the predicted output value for the first instance of a training dataset is calculated as

w_0 + w_1 a_1^(1) + ... + w_k a_k^(1) = Σ_{j=0}^{k} w_j a_j^(1)   (3)

After the predicted outputs for all instances are calculated, the weights are updated to minimize the sum of squared differences between the actual and predicted outcomes, as in Eq. (4) [24,25].

Σ_{i=1}^{n} ( x^(i) − Σ_{j=0}^{k} w_j a_j^(i) )^2   (4)

(3) Least Median Squared Linear Regression (LMS): Instead of minimizing the sum of the squared differences, the LMS method selects the weights that minimize the median of the squared differences between the actual and predicted outcomes, as in Eq. (5) [26].

median_i ( x^(i) − Σ_{j=0}^{k} w_j a_j^(i) )^2   (5)

(4) Multi Layer Perceptron (MLP): The MLP is a feed-forward artificial neural network model that consists of neurons with massively weighted interconnections, where signals always travel in the direction of the output layer. These neurons map sets of input data onto a set of appropriate outputs, as shown in Fig. 3. Eq. (6) expresses summing the products of the inputs (X_i) and the weight vectors (a_ij) plus a hidden layer's bias term (a_0j).

u_j = Σ_i X_i a_ij + a_0j   (6)

In Eq. (7), the outputs of the hidden layer (Z_j) are obtained by transforming this sum, which is defined in Eq. (6), using the transfer function (activation function) g.

Z_j = g(u_j)   (7)

The most widely used transfer function is the sigmoid function, which is defined in Eq. (8) for input x. The hidden and output layers are based on this sigmoid function.

g(x) = sigmoid(x) = 1 / (1 + e^(−x))   (8)

Eq. (9) defines summing the products of the hidden layer's outputs (Z_j) and the weight vectors (b_jk) plus the output layer's bias term (b_0k).

v_k = Σ_{j=1}^{N_hid} Z_j b_jk + b_0k   (9)

In Eq. (10), the outputs of the output layer (Y_k) are obtained by transforming this sum, which is obtained in Eq. (9), using the sigmoid function g, which is defined in Eq. (8) [22].

Y_k = g(v_k)   (10)

Fig. 3. Multi layer perceptron neural networks.

(5) Radial Basis Function Neural Network (RBF): The RBF Neural Network emerged as a variant of the neural network. It uses a normalized distance between the input points and the hidden nodes to define the activation of each node [27]. The RBF Network maps inputs from the input layer to each of the hidden units, which use radial functions for activation. A Gaussian function is useful in finding the activation at a hidden unit. The Gaussian function is defined as [28]:

h(x) = exp( −(x − c)^2 / r^2 )   (11)

where c is the center of the bell-shaped Gaussian and r is the width.
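Read together, Eqs. (6)–(10) amount to a single forward pass through a one-hidden-layer network. The following NumPy sketch implements that pass directly; the layer sizes and random weights are arbitrary stand-ins, not the WEKA MLP configuration used in this study.

    import numpy as np

    def sigmoid(x):
        # Transfer function of Eq. (8).
        return 1.0 / (1.0 + np.exp(-x))

    def mlp_forward(X, a, a0, b, b0):
        # X: inputs X_i; a, a0: hidden weights a_ij and biases a_0j;
        # b, b0: output weights b_jk and biases b_0k.
        u = X @ a + a0        # Eq. (6): weighted sum of the inputs
        Z = sigmoid(u)        # Eq. (7): hidden-layer outputs Z_j
        v = Z @ b + b0        # Eq. (9): weighted sum of the hidden outputs
        return sigmoid(v)     # Eq. (10): network outputs Y_k

    rng = np.random.default_rng(0)
    Y = mlp_forward(rng.normal(size=4),                       # 4 inputs
                    rng.normal(size=(4, 8)), rng.normal(size=8),
                    rng.normal(size=(8, 1)), rng.normal(size=1))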
(6) Pace Regression (PR): The PR method creates a predictive model by evaluating the effect of each variable. It uses a cluster analysis to enhance the statistical basis for estimating each variable's contribution to the overall regression [23].

(7) Support Vector Poly Kernel Regression (SMOReg): Support vector machines (SVM) are kernel-based learning algorithms for solving classification and regression problems. Support vector regression (SVR) maps the input data x into a higher dimensional feature space by a nonlinear mapping, in order to solve a linear regression problem in this feature space. This transformation can be done using a kernel function. The most common kernel functions are the linear kernel, the polynomial kernel, the Gaussian (RBF) kernel, and the sigmoid (MLP) kernel [28]. In this study, Support Vector Poly Kernel Regression is used as an implementation of the sequential minimal optimization algorithm for training a support vector regression model.

(8) IBk Linear NN Search (IBk): The IBk is an instance-based learner that works as a k-nearest-neighbor (KNN) classifier, which is the most commonly used instance-based or lazy method for both classification and regression problems; in this paper, it is used for a regression problem. The algorithm normalizes attributes by default and can do distance weighting. A variety of different search algorithms are used to speed up the task of finding the nearest neighbors [20].

The KNN algorithm first measures the distances between each instance in the training dataset and the test instance according to a distance metric, which is often the Euclidean distance; then the nearest instances to the test instance determine the target value. For this purpose, the KNN algorithm gives higher values to the weights of the nearer neighbors. Thus, Eq. (12) is used to predict the target value of the test instance (q_i) [2].

Sim(q_i, s_j) = 1 − sqrt( Σ_{f=1}^{N} d(q, s, f)^2 ) / sqrt(N)   (12)

(9) KStar (K*): The KStar method is also an instance-based classifier used for regression. It generates a predictive model by using a similarity function based on an entropy-based distance function [29].

(10) Locally Weighted Learning (LWL): The LWL uses an instance-based algorithm, which assigns instance weights. This algorithm can perform classification or regression [20].

(11) Additive Regression (AR): The AR is a meta-learning algorithm, which produces predictions by combining the contributions of an ensemble (collection) of different models. Each iteration fits a model to the residuals left by the classifier on the previous iteration. Prediction is accomplished by adding the predictions of each classifier. Reducing the shrinkage parameter helps prevent overfitting and has a smoothing effect, although it increases the learning time [30].

(12) Bagging REP Tree (BREP): Bagging, or bootstrap aggregating, is a general technique for improving prediction rules by creating various versions of the training set. The Bagging algorithm is applied to tree-based methods such as REP Trees to reduce the variance associated with prediction and, therefore, to increase the accuracy of the resulting predictions. Bagging can be formalized as

ŷ_BAG = (1/B) Σ_{b=1}^{B} φ(x; T_b)   (13)

where B is the number of bootstrap samples of the training set T, x is the input, and ŷ_BAG is the average of the B estimated trees φ. A bootstrap sample is randomly drawn from the training set, with replacement. The purpose is to create numerous similar training sets by sampling and to train a new function for each of these sets. The functions learned from these sets are then used collectively to predict the test set [31,32]. In this study, we use a Bag Size Percentage of 100, with 10 iterations and a REP Tree as the predictor; these are the default settings provided by the WEKA tool.

(13) Model Trees Rules (M5R): The M5R is a rule-based algorithm, based on the M5 algorithm, which builds a tree to predict numeric values for given instances. For a given instance, the tree is traversed from top to bottom until a leaf node is reached, and the best leaf is made into a rule using a decision list [23].

(14) Model Trees Regression (M5P): The M5P is a regression-based decision tree algorithm, which builds a model tree using the M5 algorithm. For a given instance, the tree is traversed from top to bottom until a leaf node is reached. At each node in the tree, a decision is made to follow a particular branch based on a test condition on the attribute associated with that node. As the leaf nodes contain a linear regression model to obtain the predicted output, the tree is called a model tree. The M5P algorithm builds the model tree using the divide-and-conquer method [33].

(15) Reduced Error Pruning (REP) Trees: The REP Trees algorithm creates a regression tree using node statistics, such as information gain or variance reduction, measured in the top-down phase, and prunes it using reduced-error pruning [34].
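To make the winning configuration concrete, the sketch below reproduces the Bagging setup described in item (12) with scikit-learn instead of WEKA. scikit-learn ships no REP Tree, so a CART regression tree stands in as the base predictor; this is an approximation of Eq. (13), not the exact WEKA BREP learner.

    from sklearn.ensemble import BaggingRegressor
    from sklearn.tree import DecisionTreeRegressor

    # Eq. (13): average the predictions of B trees, each trained on a
    # bootstrap sample drawn from the training set with replacement.
    bagged_tree = BaggingRegressor(
        DecisionTreeRegressor(),   # stand-in for WEKA's REP Tree
        n_estimators=10,           # 10 iterations, as in the WEKA defaults
        max_samples=1.0,           # bag size of 100% of the training set
        bootstrap=True,            # sample with replacement
        random_state=0,
    )
    # Usage: bagged_tree.fit(X_train, y_train); bagged_tree.predict(X_test)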
3. Comparative analysis

This section discusses the comparative analysis that was conducted to evaluate and compare some of the machine learning regression methods available for predicting the hourly full load electrical power output of a CCPP, which combines gas and steam turbines. First, the best subset of the dataset is explored among all feature subsets in the experiments. Then, the most successful machine learning regression method is sought for predicting full load electrical power output. The flow diagram of the prediction process is illustrated in Fig. 4.

3.1. Dataset description

The dataset used in this study, which consists of four input variables and a target variable, was collected over a six-year period (2006–2011). It is composed of 9568 data points collected when the CCPP was set to work at full load over 674 different days. The input variables correspond to average hourly data received from the measurement points by the sensors denoted in Fig. 1. The input variables affect the full load electrical power output (PE), which is considered as the target variable in the dataset and corresponds to the average hourly full load electrical power output data received from the control system when the power plant was at base load.

At the beginning of the pre-processing stage shown in Fig. 4, the dataset consisted of 674 datasheets (formatted in .xls) for 674 different days, with some noisy and incompatible data. The data were cleaned by filtering out the incompatible data, which are out of range (meaning that the power plant was operating below 420.26 MW), and the noisy data, which occur when the signal is interfered with by an electrical disturbance.

Then the datasheets were merged to eliminate any duplicated data and to obtain an integrated dataset: the 674 files formatted as .xls were integrated into a single .xls file. This file was then randomly shuffled five times and transformed into files formatted as .arff, which is necessary for processing in the WEKA tool.
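A compact rendering of this cleaning and integration pipeline in Python with pandas might look as follows. The directory name, file pattern and column name are assumptions for illustration; 420.26 MW is the lower bound quoted above, and the electrical-disturbance filter is omitted because it depends on plant-specific signal checks.

    from pathlib import Path

    import pandas as pd

    # Integration: read the 674 per-day .xls sheets and merge them.
    frames = [pd.read_excel(p) for p in sorted(Path("daily_sheets").glob("*.xls"))]
    data = pd.concat(frames, ignore_index=True)

    # Cleaning: drop out-of-range records (plant operating below full load)
    # and duplicated hourly rows left over from the merge.
    data = data[data["PE"] >= 420.26].drop_duplicates()

    # Transformation: shuffle the integrated dataset five times.
    shuffles = [data.sample(frac=1.0, random_state=s) for s in range(5)]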
Fig. 4. The flow diagram of the prediction process: the dataset (674 .xls files for 674 days) undergoes pre-processing (data cleaning; data integration of the 674 .xls files into 1 .xls file; data transformation, in which the dataset is randomly shuffled 5 times and transformed from .xls to .arff files); each selected subset is then applied to the 15 machine learning regression methods in WEKA with 2-fold CV (Functions: SLR, LR, LMS, MLP, RBF, PR, SMOReg; Lazy: IBk, K*, LWL; Meta: AR, BREP; Rules: M5R; Trees: M5P, REP), yielding 15 × 5 × 2 = 150 measurements, followed by statistical tests (150 × 2 = 300 tests in ANOVA).
At the end of the pre-processing, the original feature set is composed of five files formatted as .arff, and each file consists of 9568 data points with five parameters in total, as the integrated dataset. Table 2 denotes simple statistics of the dataset. Table 3 is the covariance matrix, which indicates that the parameters are not independent. Table 4 illustrates the parameters' cross-correlations, and the scatter plot of the data used is given in Fig. 5. When Table 4 and Fig. 5 are examined together, the highest correlation among the input features is observed between AT and V (0.84). Moreover, the highest correlations with the target variable (PE) are also observed for AT (0.95) and V (0.87).

According to Fig. 5, atmospheric pressure (AP) and relative humidity (RH) do not have a correlation with the target variable (PE) strong enough for an individual prediction. When the other factors remain constant, it has been shown that PE increases with increasing AP and RH individually. The effects of each input variable individually on the target variable are presented below:
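The cross-correlations of Table 4 can be checked directly from the integrated dataset; a short pandas sketch, assuming the data frame from the pre-processing sketch above.

    # Pairwise Pearson correlations of AT, V, AP, RH and PE (cf. Table 4).
    corr = data[["AT", "V", "AP", "RH", "PE"]].corr()
    print(corr.round(2))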
Fig. 5. Scatter plots of all pairs of the dataset variables AT, V, AP, RH, and PE.
Table 5
All possible subsets of the dataset.

Subsets        AT  V  AP  RH
With 1 parameter
AT              1  0   0   0
V               0  1   0   0
AP              0  0   1   0
RH              0  0   0   1
With 2 parameters
AT–V            1  1   0   0
AT–AP           1  0   1   0
AT–RH           1  0   0   1
V–AP            0  1   1   0
V–RH            0  1   0   1
AP–RH           0  0   1   1
With 3 and 4 parameters
AT–V–AP         1  1   1   0
AT–V–RH         1  1   0   1
AT–AP–RH        1  0   1   1
V–AP–RH         0  1   1   1
AT–V–AP–RH      1  1   1   1

Fig. 8. Scatter diagram of RH (x axis) vs. PE (MW, y axis).
Fig. 9. Scatter diagram of V (cm Hg, x axis) vs. PE (MW, y axis).

A 2-fold CV is a special case where, in the first experiment, one half of the data is used for training and validation is done on the second half, and then the roles of the initial subsets are swapped [44]. For the generalization of the results, each cell in the following tables (Tables 6–10) corresponds to an average of 10 (5 × 2 CV) [30] runs in the WEKA toolbox.

The resulting validation-set performances of size 10 for each learning algorithm are used for a statistical significance test. Thus, one-way Analysis of Variance (ANOVA) has been used for the statistical tests, which are applied to the average results obtained from each 2-fold CV. ANOVA is a parametric test in which the means across different groups are compared, and one-way ANOVA is chosen here to analyze the significance of differences with respect to a single factor [45]. It is also used to compare the results of machine learning experiments [44]. The idea is to decompose the total variability into within-group and between-group variability, which provides estimators of the dataset variance. Then the ratio of the mean between-group and the mean within-group variance provides the test statistic. One-way ANOVA tests the null hypothesis that the means of all treatments are equal, against the alternative hypothesis that at least one pair is significantly different. When the null hypothesis is rejected, post hoc tests are issued to evaluate the pairwise differences [44,45]. In our machine learning experiments, we run one-way ANOVA for comparing algorithms and for comparing feature subsets independently. Therefore, 300 measurements were obtained over 15 × 5 × 2 × 2, which correspond to the number of methods, the 5 × 2 CV runs, and the two comparisons (algorithms and features), respectively.
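The 5 × 2 CV protocol and the subsequent one-way ANOVA can be sketched as below. scipy's f_oneway implements the test; X and y are assumed to be NumPy arrays of the inputs and PE, and `models` a dict of scikit-learn regressors standing in for the WEKA methods.

    from scipy.stats import f_oneway
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import KFold

    def five_by_two_rmse(model, X, y):
        # Ten RMSE values: 5 random shuffles x 2 folds with swapped roles.
        rmses = []
        for seed in range(5):
            kf = KFold(n_splits=2, shuffle=True, random_state=seed)
            for train, test in kf.split(X):
                model.fit(X[train], y[train])
                pred = model.predict(X[test])
                rmses.append(mean_squared_error(y[test], pred) ** 0.5)
        return rmses

    # One-way ANOVA over the size-10 RMSE samples of each algorithm;
    # a small p-value indicates at least one pair of methods differs.
    # F, p = f_oneway(*(five_by_two_rmse(m, X, y) for m in models.values()))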
Table 6. Regression errors with RMSE performances on the subsets with one parameter.
Table 7. Regression errors with RMSE performances on the subsets with two parameters.
Table 8. Regression errors with RMSE performances on the subsets with three parameters.
The controlled experimental results, which belong to the following four experiments, show with which subset the response can be predicted with the highest prediction accuracy by the regression methods. The RMSE measure is used in the tables to represent the performance of the regression methods in determining the best subset.

In the first experiment, each input variable of the dataset was applied individually to the regression methods. Thus, the performance of the methods in predicting PE for the subsets that consist of only one parameter is presented in Table 6. According to the mean performance of each one-parameter subset over all regression methods in this table, the best subset of this experiment is found to be the subset with the AT parameter, with the highest prediction accuracy, i.e. the lowest error, in terms of an RMSE of 5.829.

In addition to that, a one-way ANOVA was used to test the statistical significance of the differences in predictive performance among the machine learning algorithms using only one feature. The test result indicated no significant difference, F(14, 285) = 0.48, p = 0.940. A second one-way ANOVA was used to test for performance differences among the features over all algorithms. The test result indicated a significant difference between the features, F(3, 296) = 1461.38, p < 0.05. A multiple comparison test based on Tukey-HSD indicated that all groups differ significantly from each other and that the performance ordering is AT < V < AP < RH, which implies that AT yields the minimal error; this validates the ANOVA result (see Fig. 10), as do the mean performance results shown in Table 6.

In the second experiment, the performance of the methods in predicting PE for the subsets that consist of only two parameters is compared in Table 7. This table indicates that the highest prediction accuracy is found as a mean RMSE value of 5.395 for the subset with the AT and V parameters, which is the best subset among the subsets with two parameters.

On the other hand, the statistical tests indicated that there is a significant difference between the performances of the algorithms when tested under one-way ANOVA, F(14, 435) = 4.17, p < 0.05. However, the post hoc tests showed that the performance of the best algorithm with two-parameter models was significantly better than only LWL and RBF. A one-way ANOVA was also used to test the performances of the subsets with two parameters. The results indicated a significant difference among the groups, F(5, 444) = 485.76, p < 0.05. Multiple comparison tests indicated no significant difference among the subsets containing AT (i.e. AT–V, AT–AP, and AT–RH). Moreover, all subsets including AT were found to be significantly better than the subsets without AT. The subset with the AP and RH parameters was found to be significantly the worst, as can be observed in Table 7.

In the third experiment, the subsets with three parameters were applied to all regression methods. Table 8 illustrates that the BREP method gives the highest prediction accuracy for each subset with three parameters in the RMSE measure.
Table 9. Regression errors with RMSE performances on the subset with four parameters, together with the best RMSE performances of each of the previous experiments applied to the subsets with one, two, and three parameters.
Table 10. Results of the best regression methods of each category for the best subsets of the experiments.
According to the mean results of Table 8, the best three-parameter subset of this experiment is the subset comprising the AT, V, and RH parameters, with an RMSE of 5.183 as the mean value, which is the average of the results of all regression methods for the selected subset. Similarly, a one-way ANOVA indicated a significant difference between the groups, F(14, 285) = 31.13, p < 0.05. Post-hoc tests indicated that BREP is significantly better than 6 out of 14 methods, namely AR, IBk, LWL, MLP, RBF, and SLR. MLP was found to be significantly better than the bottom 2 and significantly worse than the top 5 ordered by means. Post-hoc tests after issuing the ANOVA revealed a significant difference between all subsets containing AT and V–AP–RH. However, no significant difference was observed among the performances of the three-parameter subsets containing AT (see Fig. 11).

In the last experiment, there is only one subset with four parameters to apply to the regression methods. Table 9 illustrates the best results of the other three experiments and the results of the subset with four parameters. According to the mean RMSE values of Table 9, the best subset is found to be the subset with the AT, V, AP, and RH parameters, which can be used for predicting PE with the highest prediction accuracy, with a mean RMSE value of 5.071.

Table 9 is a summary table, which indicates the best subset and the best regression method with the highest prediction accuracy for PE. According to this table, the subset with four parameters has been found to be the best subset. Moreover, the comparison of the mean performances indicates that the BREP method, which is a meta classifier used as a learner combination, is the most successful regression method, with the highest mean prediction accuracy, a mean RMSE value of 4.234, over all the best subsets of the previous experiments in this table. BREP is followed respectively by the M5P, M5R, REP, K*, LR, PR, SMOReg, LMS, SLR, IBk, AR, MLP, LWL, and RBF regression methods over all evaluations. The RBF method is found to be the poorest performing predictive model.

In addition to that, the M5R method is the best method, with an RMSE of 5.085, for the subset AT, which is the best subset of the subsets with one parameter.

Fig. 10. The ANOVA plot of RMSE performances of models using a single parameter. Models enumerated correspond to AT, V, AP, and RH, respectively.
However, the BREP method is the best method, with an RMSE of 4.026, for the subset AT and V, which is the best subset of the subsets with two parameters; with an RMSE of 3.922 for the subset AT, V, and RH, which is the best subset of the subsets with three parameters; and with an RMSE of 3.779 for the subset AT, V, AP, and RH, which is the best subset of all the produced subsets.

According to the performances in Table 9, the scatter plots of the most successful algorithms for each best subset with one, two, three, and four parameters are presented for the actual and predicted PE in Fig. 12. This figure illustrates that the BREP performance for the best subset with four parameters fits best to the ideal line (i.e., the diagonal line), followed by BREP for the subsets with three and two parameters, and M5R for the subset with one parameter. Most of the predicted values of the M5R method are above the ideal prediction line. This implies that the M5R method overestimates the overall predictions.

The best performing algorithms in each category for the best subsets of the subsets with one, two, three, and four parameters are denoted in Table 10. As can be seen in this table, the SMOReg method has the best performance among the functions for the subset AT and for the subset AT, V, AP, and RH; the LMS method has the best performance among the functions for the subset AT and V and for the subset AT, V, and RH. The K* is the best method of the lazy-learning algorithms for all the subsets with one, two, three, and four parameters. The BREP is the most successful algorithm of the meta-learning algorithms used in this study for all the subsets with one, two, three, and four parameters. The M5R is the only rule-based algorithm used in this study. Though the M5P is the best method of the tree-based algorithms for the subset AT, the BREP is also the best method of the tree-based learning algorithms for the subsets AT and V; AT, V, and RH; and AT, V, AP, and RH.

Fig. 11. The ANOVA plot of models (over all algorithms) with 3-parameter subsets. As can be seen, the three subsets containing AT do not significantly differ among themselves; however, all are significantly better than the one without AT (i.e. the subset with V–AP–RH).

Lastly, we carried out further learner combination experiments with the best performing N (N = 2, ..., 6) predictors, namely the best performing algorithms of each category for the best subsets with one, two, three, and four parameters shown in Table 10. While the advantages of learner combination are manifold (avoiding local minima of learners, providing different views of the data, reducing estimation variance, etc.), the main criticism is the increased model complexity. As one of the core values of science is simplicity, usually referred to as Ockham's razor, we wish to seek a learner combination method that is as simple and accurate as possible. We see that the learner combination, namely voting over the best N learners, did not improve on the performance of Bagging REPTree except with the AT feature alone. The best overall performance, obtained with the full set of features (also using Bagging REPTree), did not improve with any combination of the best N predictors, as denoted in Table 11. Even if the results had improved slightly due to learner combination, we would still recommend using the simpler model due to the aforementioned rule of simplicity.

Fig. 12. Scatter plots of the actual and predicted PE for the subsets with one, two, three, and four parameters.
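The "voting over the best N learners" combination amounts to averaging the members' predictions. A hedged sketch with scikit-learn's VotingRegressor follows, with stand-ins for the per-category winners of Table 10: scikit-learn has no SMOReg, K* or REP Tree, so a polynomial-kernel SVR, a k-nearest-neighbor regressor and a bagged CART tree approximate them.

    from sklearn.ensemble import BaggingRegressor, VotingRegressor
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.svm import SVR
    from sklearn.tree import DecisionTreeRegressor

    # Voting over the best N learners: the ensemble prediction is the
    # unweighted average of the member predictions.
    combiner = VotingRegressor([
        ("functions", SVR(kernel="poly")),                 # stand-in for SMOReg
        ("lazy", KNeighborsRegressor(n_neighbors=5)),      # stand-in for K*
        ("meta", BaggingRegressor(DecisionTreeRegressor(),
                                  n_estimators=10)),       # stand-in for BREP
    ])
    # Usage: combiner.fit(X_train, y_train); combiner.predict(X_test)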
Table 11. Comparing the results of the best single regression method with the results of voting over the methods.
Fig. 13. Scatter plot of the actual PE and the predicted PE with the BREP method for the best subset with the AT, V, AP, and RH parameters.
The scatter plot of the best predictive model of this study, which predicts the full load electrical power output of a CCPP, is denoted in Fig. 13. This is the scatter plot of the actual PE and the predicted PE with the best regression method, found to be the BREP method, for the best subset, found to be the subset with the four parameters AT, V, AP, and RH, with an MAE of 2.818 and an RMSE of 3.787.

4. Discussion

The ultimate aim of a machine learning study is to provide a generalizable algorithm to predict future, unseen data. From the machine learning perspective, the answer to generalization capability is twofold: the generalization ability of the algorithm and the generalization ability of the trained model. First, we need to distinguish the model from the method or the algorithm: the methods consist of the algorithms as well as the abstractions learned by these algorithms from data. The methods optimize an objective function and learn the abstractions, which collectively and compactly form the model. An example of a method is a feed-forward neural network that utilizes error back-propagation with gradient descent to update the network parameters, namely the weights; the model here is the set of weights connecting the input, hidden and output layers. In the case of decision trees, a hierarchical ordering of features with respective thresholds is obtained while the algorithm tries to reduce the error (in regression) or increase the information gain (in classification). Therefore the validity of the algorithm and the validity of the learned model are different.

4.1. Validity of learned model

The generalization ability of the learned model can be estimated in a lab environment by the performance on unseen test data. In our simulations, we carry out experiments in a 5 × 2 cross validation scheme. In this scheme the data, which were collected over a 6-year period, are split into a training set and a testing set. The training data are used by the algorithm to train a machine learning model (i.e. to learn the model parameters), and the independent testing set is used to evaluate the model's performance. The model's hyper-parameters can also be fine-tuned using the validation-set performance. However, in order to avoid over-fitting to the data, which worsens the generalization power of the learner, we do not carry out fine tuning using a validation set. Moreover, we carry out statistical tests to show the relative performance of the learned models, as well as of the relevant algorithms, on unseen test sets.

To assert the validity/generalization power of the model on new data, we first need to make sure that the data are drawn from the same underlying distribution. This means that the data should be generated by the same or a similar physical data generation process. The proposed methods are applied in the power plant (whose identity is kept confidential) to predict the next day's hourly power output with high accuracy (less than 2% mean relative error). This is an "on-site, in-the-wild" confirmation of the proposed machine learning methods. If the model is to be used in a different power plant, the collected data should first match in terms of their statistics. In the case of a totally different set of ambient conditions (such as a co-generation power plant in Sweden, where the ambient conditions differ dramatically with respect to Turkey), the model should be trained from the data collected at that plant. The validity/suitability of the algorithms will be discussed in the next subsection.
As mentioned before, the data were collected over a long period and are therefore highly representative of the population. Similarly, the number of samples, as opposed to the number of features, is sufficient to learn a regression model that can be subjected to statistical significance analysis. Without fine tuning the methods' hyper-parameters (such as the number of hidden nodes in a neural network), the statistical tests provide a confident estimate of relative performance. We use 5 × 2 CV to obtain 10 simulations to measure the statistical significance. Based on the findings, we can argue that in a similar study, even with different ambient conditions, the four features collectively are most likely to provide the best results. Similarly, the best performing algorithms are encouraged to be tested first.

5. Conclusion

This study presented an alternative solution model for the prediction of the electrical power output of a base load operated CCPP at full load. Instead of thermodynamical approaches, which involve assumptions and intractably many nonlinear equations for a real application of a system, machine learning approaches were preferred for accurate prediction. The analysis of a system using thermodynamical approaches takes too much computational time and effort, and sometimes the result of this analysis might be unsatisfactory and unreliable due to the many assumptions taken into account and the nonlinear equations. In order to overcome this obstacle, the analysis of several machine learning regression methods for predicting the output of a thermodynamic system, which is a CCPP with two gas turbines, one steam turbine and two heat recovery systems, was presented as an alternative analysis.

There were two main purposes of this study. The first was to discover the best subset of our dataset among all other subset configurations in the experiments. For this purpose, we investigated which parameter or combinations of parameters were the most influential on the prediction of the target parameter. Secondly, we aimed to find out which machine learning regression method was the most successful in the prediction of full load electrical power output.

In order to find the most influential individual variables or combinations of the variables, all possible subsets of the dataset, which comprise 15 different combinations of the four variables AT, V, AP and RH, were applied to 15 different machine learning regression methods. As a result of the experiments, the subset that consists of the complete set of parameters was found to be the best subset of the dataset among all possible subsets, yielding an MAE of 2.818 and an RMSE of 3.787 in the prediction of electrical power output. Moreover, this best accuracy was obtained by applying the subset with four parameters using the Bagging method with a REPTree predictor. Similarly, according to the average results of the comparative experiments, the most successful method, which can predict the full load electrical power output of a base load operated CCPP with the highest prediction accuracy, was found to be the Bagging REP Tree method, resulting in an MAE of 3.220 and an RMSE of 4.239.

The CCPP that supplied the dataset for this study has started to use this developed predictive model for the next day's hourly energy output; as input, the CCPP uses the next day's temperature forecast given by the state's meteorology institute. In future work, we plan to perfect the input to this predictive model by first predicting the next day's ambient variables more precisely, and also to investigate the prediction of electrical power output for different types of power plants.

Finally, I would like to thank Dr. Erdinç Uzun and Heysem Kaya, who is pursuing his PhD degree at Boğaziçi University, Department of Computer Engineering, in Istanbul, for their sincere support and help.

References

[1] Kesgin U, Heperkan H. Simulation of thermodynamic systems using soft computing techniques. Int J Energy Res 2005;29:581–611.
[2] Güvenir HA. Regression on feature projections. Knowl-Based Syst 2000;13:207–14.
[3] Kaya H, Tüfekci P, Gürgen FS. Local and global learning methods for predicting power of a combined gas & steam turbine. In: International conference on emerging trends in computer and electronics engineering (ICETCEE'2012), Dubai, March 24–25, 2012.
[4] Fast M, Assadi M, Deb S. Development and multi-utility of an ANN model for an industrial gas turbine. Appl Energy 2009;86(1):9–17.
[5] Rahnama M, Ghorbani H, Montazeri A. Nonlinear identification of a gas turbine system in transient operation mode using neural network. In: 4th Conference on thermal power plants (CTPP), IEEE Xplore; 2012.
[6] Refan MH, Taghavi SH, Afshar A. Identification of heavy duty gas turbine startup mode by neural networks. In: 4th Conference on thermal power plants (CTPP), IEEE Xplore; 2012.
[7] Yari M, Shoorehdeli MA. V94.2 gas turbine identification using neural network. In: First RSI/ISM international conference on robotics and mechatronics (ICRoM), IEEE Xplore; 2013.
[8] Kumar A, Srivastava A, Banerjee A, Goel A. Performance based anomaly detection analysis of a gas turbine engine by artificial neural network approach. In: Proc. annual conference of the prognostics and health management society; 2012.
[9] Tayarani-Bathaie SS, Sadough Vanini ZN, Khorasan K. Dynamic neural network-based fault diagnosis of gas turbine engines. Neurocomputing 2014;125(11):153–65.
[10] Tso GKF, Yau KKW. Predicting electricity energy consumption: a comparison of regression analysis, decision tree and neural networks. Energy 2007;32(9):1761–8.
[11] Azadeh A, Saberi M, Seraj O. An integrated fuzzy regression algorithm for energy consumption estimation with non-stationary data: a case study of Iran. Energy 2010;35(6):2351–66.
[12] Ekonomou L. Greek long-term energy consumption prediction using artificial neural networks. Energy 2010;35(2):512–7.
[13] Che J, Wang J, Wang G. An adaptive fuzzy combination model based on self-organizing map and support vector regression for electric load forecasting. Energy 2012;37(1):657–64.
[14] Kavaklioglu K. Modeling and prediction of Turkey's electricity consumption using support vector regression. Appl Energy 2011;88:368–75.
[15] Leung PCM, Lee EWM. Estimation of electrical power consumption in subway station design by intelligent approach. Appl Energy 2013;101:634–43.
[16] Alrashidi MR, El-Naggar KM. Long term electric load forecasting based on particle swarm optimization. Appl Energy 2010;87:320–6.
[17] Machine Learning Group at University of Waikato. <www.cs.waikato.ac.nz/ml/weka/> [accessed: 15.10.12].
[18] Niu LX. Multivariable generalized predictive scheme for gas turbine control in combined cycle power plant. In: IEEE conference on cybernetics and intelligent systems; 2009.
[19] Ramireddy V. An overview of combined cycle power plant. <http://electrical-engineering-portal.com/an-overview-of-combined-cycle-power-plant> [accessed: 02.03.13].
[20] Han J, Kamber M. Data mining: concepts and techniques. San Francisco: Morgan Kauffmann Publishers; 2001.
[21] Dash M, Liu H. Consistency-based search in feature selection. Artif Intell 2003;151:155–76.
[22] Betrie GD, Tesfamariam S, Morin KA, Sadiq R. Predicting copper concentrations in acid mine drainage: a comparative analysis of five machine learning techniques. Environ Monit Assess 2012. http://dx.doi.org/10.1007/s10661-012-2859-7.
[23] Ekinci S, Celebi UB, Bal M, Amasyalı MF, Boyacı UK. Predictions of oil/chemical tanker main design parameters using computational intelligence techniques. Appl Soft Comput 2011;11:2356–66.
[24] Witten IH, Frank E. Data mining: practical machine learning tools and techniques with Java implementations. San Francisco: Morgan Kaufmann; 2005.
[25] Crouch C, Crouch D, Maclin R, Polumetla A. Automatic detection of RWIS sensor malfunctions (Phase I). Final report, Intelligent Transportation Systems Institute, Center for Transportation Studies, University of Minnesota, CTS 09-10, March 2009.
[26] Rousseeuw PJ. Least median of squares regression. J Am Stat Assoc 1984;79(388) [Theory and Methods Section].
[27] Haykin S. Neural networks, a comprehensive foundation. 2nd ed. New Jersey: Prentice Hall; 1999.
[28] Elish MO. A comparative study of fault density prediction in aspect-oriented systems using MLP, RBF, KNN, RT, DENFIS and SVR models. Artif Intell Rev 2012. http://dx.doi.org/10.1007/s10462-012-9348-9.
[29] Cleary JG, Trigg LE. K*: an instance-based learner using an entropic distance measure. In: 12th International conference on machine learning; 1995. p. 108–14.
[30] Friedman JH. Stochastic gradient boosting (tech. rep.). Palo Alto (CA): Stanford University, Statistics Department; 1999.
[31] Breiman L. Bagging predictors. Mach Learn 1996;24(2):123–40.
[32] D'Haen J, Van den Poel D. Temporary staffing services: a data mining perspective. In: 2012 IEEE 12th international conference on data mining workshops; 2012.
[33] Wang Y, Witten I. Inducing model trees for continuous classes. In: Ninth European conference on machine learning, Prague, Czech Republic; 1997. p. 128–37.
[34] Portnoy S, Koenker R. The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Stat Sci 1997;12(4):279–300.
[35] Arrieta FRP, Lora EES. Influence of ambient temperature on combined-cycle power-plant performance. Appl Energy 2005;80(3):261–72.
[36] De Sa A, Al Zubaidy S. Gas turbine performance at varying ambient temperature. Appl Therm Eng 2011;31(14–15):2735–9.
[37] Erdem HH, Sevilgen SH. Case study: effect of ambient temperature on the electricity production and fuel consumption of a simple cycle gas turbine in Turkey. Appl Therm Eng 2006;26(2–3):320–6.
[38] Lee JJ, Kang DW, Kim TS. Development of a gas turbine performance analysis program and its application. Energy 2011;36(8):5274–85.
[39] Improve steam turbine efficiency. <http://www.iffco.nic.in/applications/brihaspat.nsf/0/fddd5567e90ccfbde52569160021d1c8/$FILE/turbine.pdf> [accessed: January 2012].
[40] Challagulla VUB, Bastani FB, Yen IL, Paul RA. Empirical assessment of machine learning based software defect prediction techniques. In: 10th IEEE international workshop on object-oriented real-time dependable systems, Sedona, 2–4 February, 2005. p. 263–70. http://dx.doi.org/10.1109/WORDS.2005.32.
[41] Liu H, Gopalkrishnan V, Quynh KTN, Ng W. Regression models for estimating product life cycle cost. J Intell Manuf 2009;20:401–8. http://dx.doi.org/10.1007/s10845-008-0114-4.
[42] Wallet BC, Marchette DJM, Solka JL, Wegman EJ. A genetic algorithm for best subset selection in linear regression. Comput Sci Stat 1997;28:545–50.
[43] Dietterich TG. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 1998;10:1895–923.
[44] Alpaydın E. Introduction to machine learning. 2nd ed. MIT Press; 2010.
[45] Montgomery DC. Design and analysis of experiments. 6th ed. John Wiley & Sons; 2007.