Zhang 2015

This article has been accepted for inclusion in a future issue of this journal.
Content is final as presented, with the exception of pagination.
IEEE TRANSACTIONS ON SMART GRID 1
Day-Ahead Power Output Forecasting for

Small-Scale Solar Photovoltaic
Electricity Generators
Yue Zhang, Marc Beaudin, Student Member, IEEE, Raouf Taheri,
Hamidreza Zareipour, Senior Member, IEEE, and David Wood
Abstract—Because of the rapid growth of small-scale solar electricity generation over the past few years, forecasting solar power output is becoming more important. However, changes in weather conditions cause solar power generation to be highly volatile. This paper analyses the challenges of solar power forecasting and then presents a similar day-based tool for 24-h-ahead forecasting of small-scale solar power output.

Index Terms—Forecasting, prediction, solar power, similar day.

Manuscript received June 12, 2014; revised October 19, 2014 and December 4, 2014; accepted January 15, 2015. This work was supported in part by the program of Renewable Energy Research funded by the Natural Sciences and Engineering Research Council of Canada, and in part by the ENMAX Corporation under the Industrial Research Chairs Program. Paper no. TSG-00569-2014.
Y. Zhang, M. Beaudin, R. Taheri, and H. Zareipour are with the Department of Electrical and Computer Engineering, Institute for Sustainable Energy, Environment, and Economy, University of Calgary, Calgary, AB T2L0C9, Canada (e-mail: h.zareipour@ucalgary.ca).
D. Wood is with the Department of Mechanical and Manufacturing Engineering, University of Calgary, Calgary, AB T2L0C9, Canada.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSG.2015.2397003
1949-3053 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

THE PRICE of photovoltaic (PV) solar panels has dropped significantly over the past years. For individual PV panels, the price has dropped from 4 $/W in 2006 to less than 1 $/W in 2012. The cost of installing a complete PV system is expected to drop from 3.35 $/W in 2012 to 1.50–2.19 $/W by 2020 [1]. Furthermore, the efficiency of these panels has improved over the years to 17.5% for multi-Si cells in 2013 [2]. Lower prices and higher efficiencies have contributed to a growing solar power market, with an average 40% growth over the past decade. It is expected that annual installed solar PV panel capacity will reach 73.4 GW in 2020 [3]. Grid-connected solar power generation, whether at the roof top level or at the bulk solar farm level, accounted for over 90% of the PV market in 2010, and is expected to dominate the PV market for the foreseeable future [3].

Due to the rapid integration of solar power into electricity systems and the inherent variability of this energy source, forecasting short-term variations of solar power generation is necessary for power systems operation. Short-term forecasts, i.e., a few hours to days-ahead [4], are used for unit commitment analysis, determining reserve requirements and contingency analysis by system operators [5]. Solar farm owners may also use short-term forecasts to plan their bidding strategies in electricity markets and minimize the penalties that are sometimes imposed on variable resources [6]. Short-term solar power forecasts are also useful in dealing with voltage issues that arise from integration of solar PV systems in distribution networks [7].

Using forecasting tools to improve operation efficiency of power systems has a long history. In particular, electric load forecasting solutions have been around for several decades, and load forecast error has gradually dropped to a very low level of 1%–3% [8]. Compared to electricity load, wind and solar power generation forecasting errors are significantly higher, sometimes reaching 15%–20% [9], [10]. A recent review of energy forecasting technologies in power systems can be found in [11]. What improves the predictability of electric load is the repeating load consumption patterns driven by human/industrial behavior. However, wind and solar power are driven by weather conditions, where climate patterns and fluctuations are less predictable. In particular, because of changing weather, solar power generation data at the array level is highly nonstationary. Moving from sunny days to cloudy days breaks the continuity and production patterns in the data and changes the daily mean and variance of power production time series.

Several methodologies for solar PV power forecasting have been proposed in the literature. In [12], historical power output and forecast irradiance were fed into an autoregressive with exogenous input (ARX) model to generate 6-h-ahead power output forecasts with a normalized root mean square error (nRMSE) of 7.2%. In [13], historical power output and forecast temperature were fed into a recurrent neural network to predict 24-h-ahead power with a mean absolute percentage error (MAPE) of 16.83%. In [14], the historical power output is classified by forecast irradiance, total cloud and low cloud cover, and radial basis function neural networks (RBFNNs) were used to generate forecasts, with MAPE ranging from 8.29% to 54.44%. In [15], forecast high, medium, and low temperatures are used to classify historical power output, and three feedforward neural networks were employed to generate 24-h-ahead forecasts with MAPE ranging from 10.06% to 18.89%. In [9], the training set was classified into four groups according to the weather type, and different support
vector machines (SVMs) were used to generate forecasts with an nRMSE of 10.5%. In [16], a combination of similar-day techniques and weighted SVMs was employed and 1-h-ahead forecasts with an average nRMSE of 4.36% were generated. In [17], a recurrent neural network with structural elements was used to predict 24-h-ahead PV power output for a 4.08 kW PV system in Denmark. The inputs included clear sky irradiance and forecast weather type for the forecast days. For a half-year testing period, the MAPEs for the recurrent neural network model and feed-forward neural network were 16.47% and 30.72%, respectively. In [18], wavelet transform and RBFNN were combined to generate a 1-h-ahead PV power forecast for a 15 kW PV plant in Ashland, USA. The neural network inputs included past PV power output, irradiance, and temperature. The MAPE varied from 4.24% to 13.81% depending on the season. The most recent Global Energy Forecasting Competition (www.gefcom.org) included a solar power forecasting track in 2014.

From the literature, it can be observed that solar PV power forecasting accuracy is highly influenced by the type of weather on a specific day. In other words, forecast accuracy may be significantly different for sunny days versus cloudy days. For example, Chen et al. [14] report that the MAPE of forecasts for sunny, cloudy, and rainy days ranged from 8.29% to 10.8%, from 6.36% to 15.08%, and from 24.16% to 54.44%, respectively. Shi et al. [9] report that the nRMSE of one-day-ahead forecasts with 15-min intervals was 9.12% for cloudy days, 12.6% for foggy days, 12.4% for rainy days, and 7.85% for sunny days. In addition, various forecasting models have been used in the literature, but none tends to outperform the others consistently. For example, Sfetsos and Coonick [19] compared the forecasts generated by auto-regressive integrated moving average (ARIMA) and different types of neural network models, and the results show that the neural network models outperformed the former. However, in [20], the opposite was reported. Such inconsistencies are rooted in the nature of the solar power time series studied and the overall forecasting mechanism.

In this paper, we explore the characteristics of solar power time series and provide a comparison with electric load as a benchmark forecasting task in electric power systems. Based on those characteristics, we propose a forecasting method that mainly relies on mining data and finding days that are “similar” to the forecast day according to certain measures. We demonstrate that if the data mining component is performed properly and proper similar days are found, solar power forecasts with relatively high accuracy can be generated using simple and computationally efficient models. We apply the proposed method to three different solar power sites, with very different weather regimes, and provide comparative numerical results. The main contributions of this paper are: 1) to provide an analysis of the challenges of solar power forecasting; and 2) to design a similar-day selection technique that helps generate relatively accurate forecasts when combined with simple forecasting engines. The proposed technique is able to deal with some of the identified challenges of solar power forecasting.

According to the principle of parsimony [21], simple forecasting methods that perform at least equally well are preferred compared to more complex models. Most winners in the Global Energy Forecasting Competition 2012 (GEFCom2012) load forecasting track used regression analysis [22] instead of artificial neural networks or ARIMA methods, which are complex compared to regression analysis. While there are many forecasting algorithms in the literature, regression analysis has been well integrated as a forecasting method in electric utilities, such as the linear regression model in [23], and the semi-parametric regression models in [24] and [25]. The second place winner of GEFCom2012 used a new implementation of a k-nearest neighbor (kNN) algorithm [22], which is a relatively simple and old clustering method. In addition, considering that roof-top solar power is gaining significant popularity, simple forecasting methods that do not require advanced computing resources would be preferred at the retail consumer level.

The rest of this paper is organized as follows. Section II provides the background, describes the employed data sets, and discusses the challenges of solar power forecasting compared to other forecasting tasks in power systems. Section III presents the proposed forecasting method that is composed of a similar-day detection (SDD) engine and a forecasting engine. The numerical results and discussions are provided in Section IV, followed by some concluding remarks in Section V.

II. BACKGROUND

In this section, we provide a discussion on the challenges of solar power forecasting compared to electricity load forecasting. The discussions are based on the real-life data that are later used for numerical simulations.

A. Data Description

We have used the data from solar PV systems in three different locations around the world; see Fig. 1 for the locations. The first site, Site 1, is located in San Diego, USA (32.86° N, 117.25° W), the second site, Site 2, is located at Braedstrup, Denmark (55.97° N, 9.61° E), and the third one, Site 3, is located at Catania, Italy (37.41° N, 15.04° E). Information on these sites is summarized in Table I.

Fig. 1. Location of three PV sites: San Diego, Braedstrup, and Catania.

Site 1 is a 49.2 kW PV system. This system has 240 Kyocera KD 205 GXLP PV modules and was installed at a tilt of 10° and azimuth of 180°. For this site, the recorded values of ambient temperature, cell temperature and wind speed were available for the period of July 1, 2010 to December 31, 2011,
with a resolution of 15-min [26]. Also for this site, the global horizontal irradiance (GHI) values were measured at Hubbs Hall, which is 300 meters away from the PV site, using a LICOR Li-200SZ silicon-186 pyranometer sampled at one second intervals [27]. In addition, forecast GHI at 1-h intervals, provided by the National Oceanic and Atmospheric Administration (NOAA) using the Weather Research and Forecasting North American Mesoscale model [28], were obtained.

Site 2 has 21 PV systems built as Project Sol 300 in Denmark [12]. Those PV systems are built from BP 585 modules and BP GCI 1200 inverters. The rated power varies from 1020 to 4080 W, the azimuth angle varies from 100° to 230°, and the tilt angle varies from 15° to 45°. The data employed here is the average output of the 21 systems, and the capacity of the site is assumed to be the mean peak capacity of these 21 systems. For Site 2, we used data that cover the period from January 1, 2006 to December 31, 2006.

The aggregated produced power was measured from these 21 PV systems at 15-min intervals. Climate forecast data, including GHI, high cloud cover, medium cloud cover, low cloud cover, total cloud cover, fog, ambient temperature, and wind speed were provided by the Danish Meteorological Institute [29]. The forecast GHI is provided at 3-h intervals, and other forecast data are provided at 4-h intervals.

Site 3 is a 5.21 kW PV system. Available recorded data included solar altitude, GHI, direct normal irradiance, ambient temperature, and power output. Forecast data, including solar altitude, GHI, direct normal irradiance, total cloud cover, and ambient temperature were provided by the regional atmospheric modeling system. These data are available at 1-h intervals. For Site 3, we used data that cover the period from January 1, 2011 to December 31, 2011.

TABLE I
SUMMARY OF TEST SITES

Aside from having different latitudes, which impacts the number of daylight hours, the three selected sites have different climate conditions. Fig. 2 shows the number of days for each weather type for a one-year period for the three sites. Since San Diego and Catania are located closer to the equator, the sites remain snowless for the entire year. Braedstrup has more rain and fog compared to the other two locations. Moreover, these three locations have limited full sunny days. San Diego and Catania have a significant number of partly cloudy days. In this paper, the parameters for weather-type classifications (e.g., sunny, cloudy, and foggy) are employed based on the definitions provided by Weather Underground (www.wunderground.com).

Fig. 2. Daily weather distribution over a year for the three PV sites: San Diego, Braedstrup, and Catania.

B. Challenges of PV Output Forecasting

As mentioned in the introduction, the literature suggests that solar power forecasting is significantly less accurate than electric load forecasting. From a time-series modeling point of view, the lower predictability of solar power data stems from the weak stationarity and noncontinuity of power production patterns. Nonstationarity refers to varying mean or variance, or both, of the data and imposes limitations on modeling time series [21]. From a forecasting point of view, solar power is driven by weather conditions, which are hard to predict, and thus, solar power forecasting has relatively high errors.

Fig. 3 depicts two consecutive days of power output at Site 1. As can be seen, the power output values are very close on these two days. However, days with this much similarity in weather condition are not very common, and thus, the power output would vary accordingly. Fig. 4(a) depicts the power output for four consecutive days at Site 3. Observe that the change in weather condition has resulted in different power output values on each day. Such dramatic changes in the values break the continuity of the data patterns and lead to highly nonstationary time series. Observe that changes in weather conditions resulted in unique daily PV generation profiles. Lack of continuity of patterns in solar power data makes it difficult for forecasting models to capture any existing pattern. For comparison purposes, observe the continuity of electricity load for a typical week in Fig. 4(b). Compared to

Fig. 3. Daily power pattern for two consecutive sunny days at Site 1.
solar power data, electricity load has repetitive patterns, and thus, is easier to predict.

Fig. 4. (a) Power output of Site 3 for a period of four consecutive days in 2011. (b) Electricity load for a typical week in 2011 in California.

The other factor that contributes to noncontinuity and nonstationarity of solar power data is the change in daylight hours throughout the year. For example, Fig. 5(a) illustrates a full year of daily energy production at Site 2. Observe that energy production (in dashed black) is strongly dependent on daylight hours (in solid red). In Fig. 5(b), we present the daily load in Denmark (i.e., same location as Site 2). Observe that while there is a seasonal dependence of load on daylight hours, the dependence is much smaller than for PV generation. The variations of daylight hours and seasonality of the power production data impact the data patterns. For example, a sunny day in January and a sunny day in May have significantly different power patterns, as illustrated in Fig. 6. Such variations create limitations on training forecasting models, which normally require a long range of consistent data to be able to extract patterns.

Fig. 5. (a) Daily energy production for Site 2 for a full year along with the variations in daylight hours. (b) Daily energy consumption in Denmark along with daylight hours.

Fig. 6. Daily power pattern on two sunny days in January and May in California.

In order to quantify the level of variability and fluctuations in solar power data, we use the concept of historical volatility. Volatility is a measure of variability that is mainly used in financial market data analysis. It basically measures the average variance of fluctuations in a time series over certain intervals (e.g., from one hour to another, or from a day to another) [30]. Let $P_t$ denote the hourly PV power output at time $t$. A logarithmic “return” for $P_t$ is defined as [30]

$$ r_{t,h} = \ln\left(\frac{P_t}{P_{t-h}}\right) = \ln(P_t) - \ln(P_{t-h}) \tag{1} $$

where $h$ is the time period for the return. If $r_{t,h}$ is an independent and identically distributed (i.i.d.) series over a time window $T$, it can be expressed as

$$ r_{t,h} = \mu_{h,T} + \sigma_{h,T}\,\varepsilon_t \tag{2} $$

where $\mu_{h,T}$ is the conditional mean return, $\sigma_{h,T}$ is the conditional return variance, and $\varepsilon_t$ is an i.i.d. process with zero mean and unit variance. $\sigma_{h,T}$ is defined as the historical volatility over the time window $T$. In this paper, $T$ is selected as the daylight hours. $\sigma_{1,T}(d)$ refers to the average intrahour volatility for day $d$, and $\sigma_{24,T}(d)$ refers to the average trans-day volatility for day $d$, with $d \in \{1, 2, 3, \ldots, 365\}$. In summary, $\sigma_{1,T}(d)$ quantifies the variations in hour-to-hour power output over a day and $\sigma_{24,T}(d)$ quantifies the power output fluctuations at the same hour on subsequent days.

Fig. 7 presents the intrahour volatility $\sigma_{1,T}(d)$ and trans-day volatility values for Catania over a one year period. Observe from Fig. 7(a) that the intrahour volatilities mostly range from 100% to 200%, with the exception of a few days with higher values. A day over which there are frequent cloud movements would have a high intrahour volatility. Similarly, observe from Fig. 7(b) that the trans-day volatilities range between 4% and 400%. The trans-day volatilities are high when one of two consecutive days is sunny and the other is rainy.

To give context to the volatility values, Table II provides the average intrahour and trans-day volatilities for the three PV sites, along with those of the electric load in the corresponding region. Observe that solar power data suffer from extremely high volatilities compared to electric load data.

To summarize, solar power data lack continuity and stationarity and suffer from extreme intrahour and trans-day volatilities. Such characteristics make it difficult to apply conventional forecasting tools (e.g., neural networks and regression models) to solar power data and achieve forecast accuracies similar to those for electric load.
Fig. 7. Intrahour volatility and trans-day volatility for Catania.

TABLE II
HISTORICAL VOLATILITIES FOR THREE ARRAY LEVEL PV OUTPUT AND CALIFORNIA LOAD OVER A YEAR

III. METHODOLOGY

Considering the discussions on characteristics of solar power data in the previous section, we propose a forecasting method that uses the data available at midnight to forecast the power output of a solar power system for the next 24 h. Briefly, the proposed method has two components, namely, the SDD engine and the forecasting engine. The SDD engine mines the historical data to find days that are similar to the target forecasting day, according to certain similarity measures. The power output data of the selected similar days are then fed into the forecasting engine to generate power output forecasts for the target day.

Fig. 8. Proposed forecasting methodology.

A. Proposed SDD Engine

The goal of the SDD engine is to detect a number of days that are similar to the target forecast day. To do so, the historical data is used to build a weighted Euclidean distance (ED) [31] measure of similarity.

The power output of a PV system is influenced by several external variables, such as solar irradiance, ambient temperature, daylight hours, etc. In practice, a few of these variables are deterministic (e.g., number of daylight hours for the target day) and can be easily calculated. However, most of them are uncertain and must be forecast (e.g., GHI, temperature, and wind speed). The available global or local numerical weather prediction models (e.g., models run by NOAA) provide such forecasts. Consider a set of historical data for N days preceding the target forecast day. The historical data set includes recorded power output values at each measuring interval $t$ (e.g., every hour) of each historical day $d$, denoted here by $p_t^{(d)}$, as well as $J$ other external variables, i.e., the values of the deterministic variables, and the historical forecast values of the uncertain variables. Denote the set of external variables by $V = \{ v_{j,t}^{(d)} \mid j = 1, \ldots, J;\ d = 1, \ldots, N;\ t = 1, \ldots, T \}$, where $t$ indicates the measurement interval and $T$ is the number of intervals in a day (e.g., $T = 24$ if hourly values are measured).

The process of selecting similar days by the SDD engine has six steps, as follows.

Step 1: The values of EDs based on recorded power values are calculated for each and every pair of historical days, say day $d$ and day $d'$, as follows:

$$ \mathrm{ED}(p, d, d') = \sqrt{\sum_{t=1}^{T} \left( p_t^{(d)} - p_t^{(d')} \right)^2} \tag{3} $$

where $\mathrm{ED}(p, d, d')$ is the ED between days $d$ and $d'$ according to the values of $p$. To make the values of ED comparable for the rest of the calculations in this paper, we use normalized values of each variable under study. For N historical days, there will be $N(N-1)/2$ day pairs and thus $N(N-1)/2$ values of ED. Note that since these values of ED are calculated using the actual recorded values of power outputs, days with similar power patterns would have very small values of ED, whereas days with significantly different power patterns, say a sunny day versus a snowy day, would have high values of ED.

Step 2: The values of EDs are calculated for each and every pair of historical days based on each external variable $v_{j,t}^{(\cdot)}$, as follows:

$$ \mathrm{ED}(v_j, d, d') = \sqrt{\sum_{t=1}^{T} \left( v_{j,t}^{(d)} - v_{j,t}^{(d')} \right)^2} \tag{4} $$

where $\mathrm{ED}(v_j, d, d')$ is the ED between day $d$ and day $d'$ based on the normalized values of variable $v_j$. Now, let us consider two days, day $d$ and day $d'$, with very similar power patterns. These days would have a very small value of $\mathrm{ED}(p, d, d')$. The question is: which of the external variables could also capture the similarity between the two days and result in small values of $\mathrm{ED}(v_j, d, d')$? The next step involves selecting the external variables $v_j$ that should be used to properly capture the similarities between days $d$ and $d'$ for use in the forecasting engine.

Step 3: We define the root mean square ED difference (RMSEDD) for each variable $v_j$ as follows:

$$ \mathrm{RMSEDD}(p, v_j) = \sqrt{\frac{\sum_{d'=2}^{N} \sum_{d=1}^{d'-1} \left[ \mathrm{ED}(p, d, d') - \mathrm{ED}(v_j, d, d') \right]^2}{\frac{1}{2} N(N-1)}} \tag{5} $$

A low value of RMSEDD indicates that the power pattern captured by variable $v_j$ for days $d$ and $d'$ is similar to that captured by actual recorded power values. We calculate the values of RMSEDD for all external variables, and we call the variable with the lowest value of RMSEDD the prime external variable (PEV).

Step 4: While the PEV has the highest information content in terms of similarity of power patterns, a mechanism to include the other external variables in the similar-day selection process and benefit from any complementary information embedded in them is desirable and could potentially improve forecasting accuracy. To do so, we define the weighted hybrid distance (WHD) for days $d$ and $d'$ for variables PEV and $v_j$, where $v_j \neq \mathrm{PEV}$, as follows:

$$ \mathrm{WHD}(v_j, d, d') = (1 - \lambda_{v_j})\, \mathrm{ED}(\mathrm{PEV}, d, d') + \lambda_{v_j}\, \mathrm{ED}(v_j, d, d') \tag{6} $$

where $0 \le \lambda_{v_j} \le 1$. By varying the value of $\lambda_{v_j}$, the one that results in the lowest RMSEDD is determined. Note that in this case, $\mathrm{ED}(v_j, d, d')$ in (5) is replaced with $\mathrm{WHD}(v_j, d, d')$. At the end of this step, the pair of external variables that best captures the similarity among the studied days according to RMSEDD is determined. We denote the second most informative variable found in this step as the secondary external variable (SEV). Observe that one may extend this process to include further variables (e.g., the best three variables which capture the similarity). However, our extensive simulations found that adding other available variables did not result in any meaningful improvements in final forecasts.
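
Steps 1 to 4 can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names are hypothetical, min-max scaling stands in for the unspecified normalization, each variable is a (days × intervals) array, and the search over λ uses a coarse grid.

```python
import numpy as np

def normalize(x):
    """Min-max scaling to [0, 1] (assumed; the normalization is not specified)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def pairwise_ed(x):
    """ED for every day pair (d, d') with d < d', as in (3) and (4)."""
    x = normalize(x)
    i, j = np.triu_indices(x.shape[0], k=1)
    return np.sqrt(((x[i] - x[j]) ** 2).sum(axis=1))

def rmsedd(ed_power, ed_other):
    """RMSEDD of (5): the mean runs over the N(N-1)/2 day pairs."""
    return float(np.sqrt(np.mean((ed_power - ed_other) ** 2)))

def select_pev_and_sev(power, variables, lambdas=np.linspace(0.0, 1.0, 11)):
    """Return the PEV name and the best (SEV name, lambda, RMSEDD) triple.

    The SEV name is None when no weighted combination improves on the PEV alone.
    """
    ed_p = pairwise_ed(power)
    ed_v = {name: pairwise_ed(v) for name, v in variables.items()}
    pev = min(ed_v, key=lambda name: rmsedd(ed_p, ed_v[name]))
    best = (None, 0.0, rmsedd(ed_p, ed_v[pev]))
    for name, ed in ed_v.items():
        if name == pev:
            continue
        for lam in lambdas:
            whd = (1.0 - lam) * ed_v[pev] + lam * ed  # WHD of (6)
            score = rmsedd(ed_p, whd)
            if score < best[2]:
                best = (name, float(lam), score)
    return pev, best

# Synthetic data: "ghi" tracks power exactly, "temp" is unrelated noise.
rng = np.random.default_rng(0)
power = rng.random((6, 4))
variables = {"ghi": 2.0 * power + 1.0, "temp": rng.random((6, 4))}
pev, (sev, lam, score) = select_pev_and_sev(power, variables)
```

In this synthetic case the variable that is a rescaled copy of the power series is picked as the PEV with an RMSEDD of essentially zero, so no second variable can improve on it.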
Step 5: In this step, we rank the similarity of the target forecast day to the previous days according to their WHD values. To do so, denote the target forecast day by $d^{(f)}$. We calculate the values of WHD for all days $d$ in the set of historical data as follows:

$$ \mathrm{WHD}(\mathrm{SEV}, d, d^{(f)}) = (1 - \lambda_{\mathrm{SEV}})\, \mathrm{ED}(\mathrm{PEV}, d, d^{(f)}) + \lambda_{\mathrm{SEV}}\, \mathrm{ED}(\mathrm{SEV}, d, d^{(f)}) \tag{7} $$

Observe that the values of SEV and PEV for each forecasting interval of the target day as well as all historical days are known and so is the value of $\lambda_{\mathrm{SEV}}$. The lower the value of WHD for a given day, the more similar that day is to the target day. This step basically provides us with a similarity ranking for the target day with respect to all historical days.

Step 6: The actual power production values of the historical similar days are passed to the forecasting engine.

To summarize the SDD engine, in Step 1, we look at the similarity of historical days in terms of actual power production values. In Steps 2 and 3, we decide which external variable provides the most consistent similarity among historical days compared to the similarities found in Step 1. In Step 4, we decide which external variable is the second most informative variable to determine which days are similar, consistent with the similarities of Step 1. Note that in Steps 1 to 4, we work with all available historical data, and no specific target day is of interest. It is in Step 5 that we turn our focus to a specific target day. In this step, we determine how similar the target day is to the historical days based on PEV and SEV. In Step 6, the actual hourly power values of detected similar days are passed to the forecasting engine. Fig. 8 summarizes the forecasting process.

Observe that the SDD engine helps us build a new power time series by manipulating the original power time series. For each target day, while the integrity of intraday variations is preserved, the order of days in the historical time series is essentially reshuffled based on their similarity rank. This is in fact to deal with the problem of noncontinuity in the historical data discussed in Section II-B. It is in the forecasting stage, and by means of trial and error, that we find an “optimal” number of similar days to be included for producing forecasts.

Some works in the literature preprocess data before feeding them to the forecasting engine. For example, [9] classifies the data according to the type of weather (e.g., sunny or cloudy), while [14] classifies the data according to irradiance, total cloud and low cloud cover. These methods use a predetermined set of available external variables (e.g., precipitation and temperature forecasts) to build a fixed similar-day selection algorithm. Note that the availability and accuracy of these external variables are highly dependent on geographical location and the source of the external variables (e.g., the weather bureau). Thus, a flexible similar-day selection algorithm is required because each site may require a customized set of external variables. In this paper, we present a flexible similar-day selection algorithm that detects production patterns based on a diverse range of external variables. More specifically, the proposed algorithm determines the most informative external variables from a pool of candidates by comparing their information content to that of the actual power patterns in the training stage.

B. Forecasting Engines

The actual power production values of the similar days, detected by the SDD engine, are used to train the employed forecasting models. We have employed four well-known classic forecasting models as the engine, namely, RBFNNs [14], least square SVM (LS-SVM) [9], kNNs, and weighted kNNs (WkNNs) [32]. RBFNN is a nonlinear model which can adjust its model form from data; however, RBFNN does not lead to a convex optimization, so it may get trapped in a local optimal point [33]. LS-SVM is a nonlinear model with a convex optimization process. LS-SVM could produce an accurate forecast using fewer training sets compared to neural network-based forecasting [34]. The common feature of RBFNN and LS-SVM is that they require a training and model building step. This step is not straightforward because it requires model identification, estimation, and testing stages that are sometimes computationally burdensome. For example, identifying the optimal number of neurons in RBFNN could take extensive trial and error.

TABLE III
RMSEDD(p, v_j) FOR EACH VARIABLE AT THE THREE TESTING LOCATIONS. IN THIS TABLE, “N/A” DENOTES UNAVAILABILITY OF DATA
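
As a rough sketch of how the Step 5 ranking can feed the two neighbor-based engines of this section (kNN and WkNN), the example below ranks historical days by the WHD of (7) and then averages the power profiles of the K most similar days. Several details are assumptions made for illustration: the function names are hypothetical, the PEV and SEV inputs are taken to be already normalized, and the WkNN weights are inverse-distance weights, since the text only states that higher similarity should receive a higher weight.

```python
import numpy as np

def rank_similar_days(pev_hist, sev_hist, pev_target, sev_target, lam):
    """WHD of each historical day to the target day, per (7), and the ranking."""
    pev_diff = np.asarray(pev_hist, dtype=float) - np.asarray(pev_target, dtype=float)
    sev_diff = np.asarray(sev_hist, dtype=float) - np.asarray(sev_target, dtype=float)
    ed_pev = np.sqrt((pev_diff ** 2).sum(axis=1))
    ed_sev = np.sqrt((sev_diff ** 2).sum(axis=1))
    whd = (1.0 - lam) * ed_pev + lam * ed_sev
    return whd, np.argsort(whd)  # most similar day first

def knn_forecast(power_hist, order, k):
    """Simple average of the k most similar days' power profiles."""
    return np.asarray(power_hist, dtype=float)[order[:k]].mean(axis=0)

def wknn_forecast(power_hist, whd, order, k, eps=1e-9):
    """Weighted average with inverse-WHD weights (an assumed weighting rule)."""
    nearest = order[:k]
    weights = 1.0 / (whd[nearest] + eps)
    weights /= weights.sum()
    return weights @ np.asarray(power_hist, dtype=float)[nearest]

# Three toy historical days with two intervals each (made-up numbers).
pev_hist = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.1]]
sev_hist = np.zeros((3, 2))
power_hist = [[10.0, 20.0], [0.0, 0.0], [20.0, 40.0]]
whd, order = rank_similar_days(pev_hist, sev_hist, [0.0, 0.0], [0.0, 0.0], lam=0.5)
knn = knn_forecast(power_hist, order, k=2)
wknn = wknn_forecast(power_hist, whd, order, k=2)
```

Here the first and third days are the two nearest neighbors. The plain average treats them equally, while the weighted variant is dominated by the day whose WHD is zero.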
The kNN and WkNN are simple and computationally effec-
tive forecasting methods [32]. When used for regression
purposes, these models use observed values in the past as
the basis for the forecast of future values. After detecting the
“nearest neighbors” based on predefined distance measures,
the kNN model uses the average of the observed values for K
nearest neighbors days as the forecast value, as follows:
K
1 (k)
p̂t = pt (8)
K
k=1
In (8), $\hat{p}_t$ is the forecast value of the output variable at hour t on forecasting day d(f), and $p_t^{(k)}$ is the observed output variable at hour t of historical day k. We refer to a kNN model that uses K historical days as $\mathrm{kNN}_K$. Note that the nearest neighbors in this paper are determined by the proposed SDD engine discussed before. The WkNN model uses a weighted average of the selected similar days, instead of a simple average. The weights are defined according to the measure of similarity, i.e., the higher the similarity, the higher the weight. In this paper, the weights are derived from the values of WHD from Step 5. The main advantage of these two models compared to the other two, i.e., RBFNN and LS-SVM, is that there is no training involved beyond finding the nearest neighbors. In other words, the model building process narrows down to selecting the optimal number of days to be used, i.e., the value of K.

We use trial and error to determine the number of similar days to be included in the process of training the forecast models, i.e., K. The index of similarity is WHD from Step 5. After extensively training models for different values of K, we found that the unique characteristics of each site and forecasting engine influence the modeling parameters that yield the best forecasting result. For example, each site has unique weather and climate characteristics. To accommodate the differences between the sites, a model may require a different number of most similar days for each site in order to minimize the impact of daylight hours and seasonality in the modeling process. Moreover, the modeling parameters, such as the minimum training data required [33], are unique to each forecasting engine. Hence, we recognize that it is important to train each forecast engine separately based on the characteristics of each site. Thus, to build our forecasting models, we select modeling parameters that yield the minimum nRMSE for each site by trial and error for all forecasting engines. For example, the kNN model is most accurate in San Diego (i.e., Site 1) when the model is trained with the three most similar days.

As a benchmark, we also use a naive persistence model, i.e., using the observed values of day d as the forecasts for day d + 1. This is the simplest form of forecast.

Note that we do not use the external variables in this stage any more, mainly because our numerical experiments showed that using the values of external variables in addition to the power values in the forecasting stage did not improve the forecast accuracy.

IV. NUMERICAL RESULTS

We apply the proposed method to the data from the three sites for which the data was described in Section II-A. Table III presents the values of RMSEDD(p, vj) for each available variable at the three testing locations. For all three locations, GHI, as expected, shows the lowest value of RMSEDD, and thus is selected as the PEV.

Two error measurements are used in this paper to measure daily forecasting accuracy: normalized mean absolute error (nMAE) and normalized root-mean-square error (nRMSE). They are defined as

$$\mathrm{nMAE} = \frac{100}{DL} \sum_{t} \frac{\left|\hat{p}_t - p_t\right|}{P_C} \tag{9}$$

$$\mathrm{nRMSE} = \frac{\sqrt{\frac{1}{DL} \sum_{t} \left(\hat{p}_t - p_t\right)^2}}{P_C} \tag{10}$$

where DL is the number of daylight hours of the target forecast day, and t includes only the daylight hours, as the night time output is assumed to be zero. $\hat{p}_t$ is the forecast power output at hour t, $p_t$ is the actual power output at hour t, and $P_C$ is the capacity of the PV site. For all testing days, the average forecasting errors and the standard deviation of the forecasting errors are reported.

A. Results for the Three Sample Sites

The proposed SDD engine was applied to the data sets and the outcomes of this stage were used to generate forecasts using the five selected models. The forecasts were generated for the period from April 1 to June 30. Table IV presents the error measures for the forecasts. The results of the naive forecasts are also presented. The smallest nMAE and nRMSE errors are in bold. From Table IV, note that the kNN and WkNN models consistently outperform the RBFNN and LS-SVM models, despite being less complex.

Note the large difference between the minimum and maximum average forecast error in Braedstrup, relative to the other two sites in Table IV. This can be explained by the high average forecast error in the naive model, and the low average forecast error in the nearest neighbor models. From Table IV, the kNN and WkNN models outperformed the naive model for all three sites, notably for Braedstrup. This is because, for
this site, the day-to-day production volatility is significantly higher than at the other two sites, as shown by the high trans-day volatility $\sigma_{24,24}$ in Table II. The naive model uses the observed values of production for the current day to forecast production on the following day. For Braedstrup in particular, the interday power production fluctuates more significantly, and thus the naive model results in larger errors. Conversely, the forecast engines perform better at the Braedstrup site because the GHI forecasts are more accurate for this site relative to the other two sites, as shown in Table III, which leads to better SDD recognition.

TABLE IV
AVERAGE AND STANDARD DEVIATION (STD) OF FORECASTING ERROR FOR TEST LOCATIONS IN TERMS OF nMAE AND nRMSE

TABLE V
AVERAGE FORECASTING ERROR FOR TEST LOCATIONS UNDER DIFFERENT WEATHER IN TERMS OF nMAE

TABLE VI
AVERAGE FORECASTING ERROR FOR TEST LOCATIONS IN DIFFERENT MONTHS IN TERMS OF nMAE

When the SDD engine can successfully find previous days with a high similarity to the target day, the forecasting stage is straightforward and the patterns can be captured by simple forecasting engines, such as nearest neighbor methods. Due to the simple implementation of kNN and its competitive forecasting accuracy, kNN will be used for the remainder of this paper unless otherwise stated. The reason is that if two given days are similar in terms of the day length and sunshine, the power outputs would be very similar as well (e.g., see Fig. 3). If the selection of similar days to the target day based on the forecasts of the external variables is successful, it can be expected that the target day would have a power pattern very similar to those historical days. Observe that a lower standard deviation of errors indicates that forecast errors are less scattered (i.e., fewer large forecast errors).

We present the impact of weather and seasonal variations on the forecasting accuracy in Tables V and VI. The nMAE values were produced by using the kNN forecasting engine with the proposed SDD for the period from April 1 to June 30, 2011.

As seen in Table V, cloudy and rainy days produce a higher forecasting error. However, forecast errors are reduced by supplementing the kNN forecasting engine with the proposed SDD model. A similar trend can be seen in Table VI, where forecast errors are reduced when the kNN forecasting engine uses the proposed SDD model.

As mentioned before, the most effective external variable, i.e., the PEV, was the irradiance forecast for all three sites. As for the next most effective external variable, i.e., the SEV, Table VII reports the values of RMSEDD and nRMSE for the three sites where only the PEV is used versus when both PEV and SEV are employed. While for San Diego the daylight hours were found to be helpful, at the other two sites the total cloud coverage forecasts were detected as the SEV. Also note that inclusion of the SEV in the forecasting process is more effective in San Diego than at the other two sites. Thus, for a given site, one needs to examine whether including other external variables is worth the effort or not.

B. Comparison to Other Existing Methods

We compared the proposed method to the existing literature where the data is available and a fair comparison is possible. In this comparison, we use a common kNN forecasting engine and a common data set (i.e., April 1 to June 30, 2011) to compare two algorithms that use SDD to our proposed method.

To compare our SDD method to a baseline, we found two other similar-day selection techniques where comparison was possible: we refer to them as similar-day method 1 (SDM1) [15] and similar-day method 2 (SDM2) [16]. SDM1 searches the historical days and finds one similar day to the target day based on the ED of temperature. SDM2, however, finds five similar days to the target day based on similarity of season, solar radiation, maximum temperature and minimum temperature. Both SDM1 and SDM2 are fixed to these specific external variables, and use different detection equations. In contrast to SDM1 and SDM2, our proposed SDD provides flexibility in how the similar days are selected. More specifically, our approach searches the historical actual production values to find similarity patterns, and matches those with the similarity patterns found in a set of forecasted external variables. Once the SDD engine knows this information, it rates the similarity of historical days to the current day, as outlined in Section III-A, Steps 4 and 5, such that the most similar days are given priority to be selected.

We note that we cannot compare the SDD, SDM1, and SDM2 by using the RBFNN and LS-SVM forecasting engines, because these forecast engines require more instances to be trained properly for a fair comparison. Thus, we present
the comparison for the kNN model only. We applied SDM1 and SDM2 to the data for the three sites. The simulation results showed that the proposed SDD resulted in considerably lower forecast errors for all three sites. For example, for San Diego, the proposed SDD reduced average errors by about 8% compared to SDM2 and by about 23% compared to SDM1.

TABLE VII
EFFECTIVENESS OF ADDING OTHER EXTERNAL VARIABLES TO THE PROCESS IN TERMS OF RMSEDD, nRMSE

The data for Site 2, Braedstrup, was also used in [12] to generate power forecasts using an autoregressive with exogenous input (ARX) model. The reported error measures in [12] are for hours 13:00 to 18:00 only, i.e., for up to 6-h-ahead forecasts. The forecasting horizon in this paper is not the same as in [12], i.e., we generate forecasts at midnight for the upcoming day, whereas [12] generates forecasts at noon for the next 36 h. Yet, we compared the reported average error measures in [12] with the ones generated by kNN in the present research for the same hours. On average, the errors of the proposed kNN were 7% lower than those of [12].

Fig. 9 plots the recorded and forecast power and the GHI for target day May 23 and two typical similar days selected for this particular day for San Diego. Note that the proposed model mainly relies on GHI forecasts to select the similar days. The forecast GHI values have led to selecting days with a similar forecast GHI pattern. However, in reality the observed GHI values turned out to be much lower than the forecasted ones, thus leading to a high error in forecasting power values; the nMAE for the kNN model forecasts for this day was 17.05%. On the other hand, Fig. 10 plots the same quantities for one sunny target day, i.e., May 3, again for San Diego. These two chosen similar days have a GHI forecast pattern very similar to that of the target day. Thus, the proposed kNN model forecasts power output with a very small error, i.e., an nMAE of 0.96%. In summary, if the similar days can be chosen properly, there is no need for a more complicated model. Simple models such as kNN can produce relatively accurate forecasts.

Fig. 9. Two chosen similar days for forecast day (May 23) at San Diego.

Fig. 10. Two chosen similar days for forecast day (May 3) at San Diego.

V. CONCLUSION

This paper discussed some of the unique challenges of power output forecasting for small-scale solar electricity generators. Because the output is highly influenced by cloud movements, solar irradiance, and the number of daylight hours, the solar power time series shows weak stationarity and lacks continuity of patterns. This makes it difficult for forecasting models to make sense of the historical behavior of the time series and predict its future fluctuations. Thus, compared to other time series studied for forecasting in the power systems literature (e.g., electric load), the weak stationarity and the lack of continuity result in higher volatilities in solar data, and thus lower predictability is expected. In this paper, we propose a method for day-ahead solar power forecasting. The method is composed of an SDD engine and a forecasting engine. The SDD engine identifies historical days that are similar to the target day. The recorded power values of those days are then employed to forecast the power output for the target forecast day. Four forecast engines, namely, RBFNN, LS-SVM, kNN, and WkNN, were employed as the forecasting engine. The proposed method was applied to three sites with different climates. The numerical results showed that the proposed method was effective in generating next day forecasts. The kNN and WkNN forecasting engines coupled with the SDD engine yielded the most competitive forecasting results with the lowest overall errors. A further advantage of the kNN and WkNN forecasting engines compared to the other two is their computational simplicity. It was also discussed that since weather forecasts are used as model inputs, inaccuracy in those forecasts leads to inevitable solar power forecast errors.

ACKNOWLEDGMENT

The authors would like to thank J. Kleissl, F. Mejia, and P. Mathiesen at the University of California, San Diego, CA, USA, and P. Bacher of the Technical University of Denmark, Lyngby, Denmark, for sharing their solar data.

REFERENCES

[1] D. Gauntlett, “Solar PV market forecasts,” Navigant Consulting, Inc., Chicago, IL, USA, Tech. Rep., 2013. [Online]. Available: http://www.navigantresearch.com/research/solar-pv-market-forecasts
[2] (2013). Energytrend. [Online]. Available: http://pv.energytrend.com/
[3] International Energy Agency. (2010). Technology Roadmap: Solar Photovoltaic Energy. [Online]. Available: http://www.iea.org/publications/freepublications/publication/pv_roadmap.pdf
[4] C. Potter and M. Negnevitsky, “Very short-term wind forecasting for Tasmanian power generation,” IEEE Trans. Power Syst., vol. 21, no. 2, pp. 965–972, May 2006.
[5] Cal-ISO. (2014). Building a Sustainable Energy Future 2014–2016 Strategic Plan. [Online]. Available: http://www.caiso.com/Documents/2014-2016StrategicPlan-ReaderFriendly.pdf
[6] B. Kraas, M. Schroedter-Homscheidt, and R. Madlener, “Economic merits of a state-of-the-art concentrating solar power forecasting system for participation in the Spanish electricity market,” Solar Energy, vol. 93, pp. 244–255, Jul. 2013.
[7] A. Woyte, V. Van Thong, R. Belmans, and J. Nijs, “Voltage fluctuations on distribution level introduced by photovoltaic systems,” IEEE Trans. Energy Convers., vol. 21, no. 1, pp. 202–209, Mar. 2006.
[8] K. Lee, Y. T. Cha, and J. Park, “Short-term load forecasting using an artificial neural network,” IEEE Trans. Power Syst., vol. 7, no. 1, pp. 124–132, Feb. 1992.
[9] J. Shi, W.-J. Lee, Y. Liu, Y. Yang, and P. Wang, “Forecasting power output of photovoltaic systems based on weather classification and support vector machines,” IEEE Trans. Ind. Appl., vol. 48, no. 3, pp. 1064–1069, May/Jun. 2012.
[10] G. Sideratos and N. Hatziargyriou, “An advanced statistical method for wind power forecasting,” IEEE Trans. Power Syst., vol. 22, no. 1, pp. 258–265, Feb. 2007.
[11] T. Hong, “Energy forecasting: Past, present, and future,” Foresight Int. J. Appl. Forecasting, no. 32, pp. 43–48, 2014.
[12] P. Bacher, H. Madsen, and H. A. Nielsen, “Online short-term solar power forecasting,” Solar Energy, vol. 83, no. 10, pp. 1772–1783, 2009.
[13] C. Chupong and B. Plangklang, “Forecasting power output of PV grid connected system in Thailand without using solar radiation measurement,” Energy Procedia, vol. 9, pp. 230–237, 2011. [Online]. Available: http://www.sciencedirect.com/science/journal/18766102/9/supp/C
[14] C. Chen, S. Duan, T. Cai, and B. Liu, “Online 24-h solar power forecasting based on weather type classification using artificial neural network,” Solar Energy, vol. 85, no. 11, pp. 2856–2870, 2011.
[15] M. Ding, L. Wang, and R. Bi, “An ANN-based approach for forecasting the power output of photovoltaic system,” Procedia Environ. Sci., vol. 11, Part C, pp. 1308–1315, 2011. [Online]. Available: http://www.sciencedirect.com/science/journal/18780296/11/supp/PC
[16] R. Xu, H. Chen, and X. Sun, “Short-term photovoltaic power forecasting with weighted support vector machine,” in Proc. 2012 IEEE Int. Conf. Autom. Logist. (ICAL), Zhengzhou, China, pp. 248–253.
[17] T. Cai, S. Duan, and C. Chen, “Forecasting power output for grid-connected photovoltaic power system without using solar radiation measurement,” in Proc. 2nd IEEE Int. Symp. Power Electron. Distrib. Gener. Syst. (PEDG), Hefei, China, Jun. 2010, pp. 773–777.
[18] P. Mandal, S. T. S. Madhira, A. U. Haque, J. Meng, and R. L. Pineda, “Forecasting power output of solar photovoltaic system using wavelet transform and artificial intelligence techniques,” Procedia Comput. Sci., vol. 12, pp. 332–337, 2012. [Online]. Available: http://www.sciencedirect.com/science/journal/18770509/12/supp/C
[19] A. Sfetsos and A. Coonick, “Univariate and multivariate forecasting of hourly solar radiation with artificial intelligence techniques,” Solar Energy, vol. 68, no. 2, pp. 169–178, 2000.
[20] C. Voyant, M. Muselli, C. Paoli, and M.-L. Nivet, “Hybrid methodology for hourly global radiation forecasting in Mediterranean area,” Renew. Energy, vol. 53, pp. 1–11, May 2013.
[21] G. E. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Analysis: Forecasting and Control. Hoboken, NJ, USA: Wiley, 2011.
[22] T. Hong, P. Pinson, and S. Fan, “Global energy forecasting competition 2012,” Int. J. Forecasting, vol. 30, no. 2, pp. 357–363, 2014.
[23] T. Hong, P. Wang, and H. Willis, “A naïve multiple linear regression benchmark for short term load forecasting,” in Proc. IEEE Power Energy Soc. Gen. Meeting, San Diego, CA, USA, Jul. 2011, pp. 1–6.
[24] R. Hyndman and S. Fan, “Density forecasting for long-term peak electricity demand,” IEEE Trans. Power Syst., vol. 25, no. 2, pp. 1142–1153, May 2010.
[25] S. Fan and R. Hyndman, “Short-term load forecasting based on a semi-parametric additive model,” IEEE Trans. Power Syst., vol. 27, no. 1, pp. 134–141, Feb. 2012.
[26] J. Kleissl. (May 7, 2013). UC San Diego Solar Resource Assessment and Forecasting Laboratory. [Online]. Available: http://maeresearch.ucsd.edu/kleissl/
[27] M. Lave, J. Kleissl, and E. Arias-Castro, “High-frequency irradiance fluctuations and geographic smoothing,” Solar Energy, vol. 86, no. 8, pp. 2190–2199, 2012.
[28] P. Mathiesen and J. Kleissl, “Evaluation of numerical weather prediction for intra-day solar forecasting in the continental United States,” Solar Energy, vol. 85, no. 5, pp. 967–977, 2011.
[29] P. Bacher, “Short-term solar power forecasting,” Master’s thesis, Dept. Appl. Math. Comput. Sci., Tech. Univ. Denmark, Lyngby, Denmark, 2008.
[30] H. Zareipour, K. Bhattacharya, and C. A. Cañizares, “Electricity market price volatility: The case of Ontario,” Energy Policy, vol. 35, no. 9, pp. 4739–4748, 2007.
[31] P. Mandal, T. Senjyu, N. Urasaki, T. Funabashi, and A. Srivastava, “A novel approach to forecast electricity price for PJM using neural network and similar days method,” IEEE Trans. Power Syst., vol. 22, no. 4, pp. 2058–2065, Nov. 2007.
[32] J. Han and M. Kamber, Data Mining, Concepts and Techniques. San Francisco, CA, USA: Morgan Kaufmann, 2006.
[33] N. Sapankevych and R. Sankar, “Time series prediction using support vector machines: A survey,” IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 24–38, May 2009.
[34] H. Zareipour, A. Janjani, H. Leung, A. Motamedi, and A. Schellenberg, “Classification of future electricity market prices,” IEEE Trans. Power Syst., vol. 26, no. 1, pp. 165–173, Feb. 2011.

Yue Zhang received the M.Sc. degree in electrical and computer engineering from the University of Calgary, Calgary, AB, Canada, in 2013.
His current research interests include power system operation and planning, forecasting technologies applied to power systems, and solar power integration into the grid.

Marc Beaudin (S’08) received the B.Sc. and Ph.D. degrees in electrical engineering from the University of Calgary, Calgary, AB, Canada, in 2008 and 2014, respectively.
His current research interests include managing residential energy consumption and production to improve environmental and economic efficiency, power system planning and policy, and integration of renewables into the grid.

Raouf Taheri received the B.Sc. degree in electrical engineering from the K.N. Toosi University of Technology, Tehran, Iran, and the M.Sc. degree in computer sciences from the University of Isfahan, Isfahan, Iran, in 1994 and 2010, respectively.
He is currently a Visiting Scholar with the University of Calgary, Calgary, AB, Canada. His current research interests include data-mining applications in power system operation and planning.

Hamidreza Zareipour (SM’09) received the Ph.D. degree in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, in 2006.
He is currently an Associate Professor with the Department of Electrical and Computer Engineering, University of Calgary, Calgary, AB, Canada. His current research interests include economics, operation, and planning of electric energy systems.

David Wood received the Ph.D. degree in mechanical engineering from London University, London, U.K., in 1980.
He is currently a Professor and the ENMAX/Schulich Chair of Renewable Energy with the University of Calgary, Calgary, AB, Canada. His current research interests include renewable energy systems, and wind and solar power energy resources.
