Optimal Battery Management Strategies in Mobile Networks Powered by A Smart Grid
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TGCN.2018.2806299, IEEE
Transactions on Green Communications and Networking
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1
Abstract—We focus in this paper on energy management strategies for a mobile network equipped with battery storage capacity as well as local energy production capability, and powered by a smart grid. At each time instant, the mobile network operator has to decide whether to operate its network based on its own energy resources or the smart grid ones, with a possibility to sell energy to the smart grid as well. We formulate our problem using Markov Decision Process (MDP) and derive an optimal, offline policy which minimizes the operator energy bill, using dynamic programming algorithm. We show the optimality of our solution by numerical comparison with the solution obtained through linear programming. Our numerical applications allow to further understand when the operator has an incentive to buy energy, whether it is beneficial for it to act as an energy seller, the size of the battery to deploy, as well as the robustness of our offline deterministic policy to estimation errors.

Index Terms—Energy expenditure minimization, Optimal offline policy, Markov decision process, Dynamic programming, Linear programming

I. INTRODUCTION

Smart grid networks have gained tremendous attention in the energy-oriented research community over the years. Many works tend to exploit the data collected from the different sensors in the network in order to optimize the energy production and distribution, to increase the use of the renewable power sources and, more generally, to save energy [1].

In the telecommunications field, resource-hungry applications have been continuously gaining popularity among the users, driven by the development of smart devices, such as smartphones and tablets. These users ask for intensive computing resources from the network operator forcing it to ask for high amounts of energy from the electrical grid which presents, in turn, a challenge for the electricity provider.

One of the solutions to overcome these issues is to equip the network operator, and especially the eNodeB (eNB)'s, with batteries that can be charged directly from the smart grid or from its own renewable energy production on site and which are able to provide the operator with the energy required for autonomous operation which may last few hours for instance [2].

These batteries will allow the operator to set an energy storage strategy that takes advantage of changing electricity prices and communication traffic load to reduce its energy acquisition costs and to lighten its demand from the energy provider in the critical hours of the day characterized by a high energy users demand.

Energy storage techniques have been widely studied with the aim of improving the autonomy of the systems and optimizing their power consumption.

The authors in [3] proposed an analytical model for optimizing battery storage management. They simplified the solution by setting thresholds on the energy price allowing the operator to decide online if it sells or buys energy. The achieved control is independent of the state of charge of the battery, the solution is not optimal and could lead to high energy losses.

In [4], the authors used linear programming to obtain a battery management strategy in order to maximize the profit of an energy provider in the micro-grid¹ under the uncertainty of the wind turbine production.

The authors in [5] used dynamic programming at the energy provider side to determine the optimal battery storage management to satisfy a deterministic user load in order to reduce as much as possible the energy losses assuming a Markovian distribution for the photovoltaic energy production. The paper also studied the optimal battery size allowing to avoid energy losses.

In our work, we suppose that the serving eNB is equipped with a battery that can be charged using its own renewable energy sources or by buying energy from the energy provider and discharged to serve its users traffic requests or when it sells energy back to the smart grid, as illustrated in Fig. 1.

Our goal in this work is to reduce the daily energy expenditure of the network operator while serving the network subscribers traffic load.

We formulate this problem as a Markov Decision Process (MDP). Then, using dynamic programming tools, we investigate offline solutions to find the optimal battery management policy based on prior knowledge of the environment: network subscribers traffic load and electricity unitary price.

¹ Micro-grids are modern, small-scale versions of the centralized electrical system. They have their own control capability, which means that they can disconnect from the traditional grid and operate autonomously.

Our contributions are:
• Devising an optimal offline deterministic policy for the operator to decide whether to buy or sell energy, to operate on its own battery or on energy of the smart grid.
• Comparing the above policy with a randomized one obtained via linear programming and showing that both yield the same performance and hence prove the optimality of the deterministic one.
2473-2400 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
• Evaluating the impact of the battery size, the energy seller capability and the renewable energy on the average operator daily energy expenditure.
• Testing the robustness of our offline policy to errors in the estimation of user traffic, energy prices and battery state of charge (SOC).

The remainder of the paper is organized as follows. In Section II, we describe our system and model from the energy and the network providers' points of view and explain the interactions between them. In Section III, we formulate the optimization problem which reduces the average operator energy cost and show how to solve it using both dynamic and linear programming. In Section IV, we show and discuss our numerical results and analyze the robustness of the offline policy. Finally, conclusions are given in Section V.

II. SYSTEM AND MODEL

Our system is composed of two players: a network operator and an energy provider, as shown in Fig. 1. These two players are in constant interaction, as the network operator needs to be constantly powered to be able to operate its network. This power comes from the energy provider, who is the principal actor in the smart grid.

Fig. 1: Interactions between the system players

The energy provider produces electricity to respond to the demands of its customers (houses, factories, etc.). This energy is sold at a price fixed by the grid provider, which changes from one hour to another. We consider that electricity spot prices are announced by the smart grid one day in advance. The pricing system in this so-called day-ahead market is, in principle, determined by matching offers from generators to bids from consumers at each node in order to produce a classic supply and demand equilibrium price, typically on an hourly basis. This is the case, for instance, for EPEX SPOT SE, a European trading market for electricity which indicates an estimated price of energy one day ahead, based on demand and supply, in order to allow participants to make their bids on electricity.

The network operator, as one of the biggest energy consumers in the grid, is informed in advance of the expected electricity prices so that it can set its energy purchase strategy. The real-time price, which we denote by $F_t$, changes during the day depending on the users' demand for electricity. Its value can differ from the day-ahead price. We assume a simple model in which the real price is equal to the day-ahead price with a certain probability and differs from it with the remaining probabilities.

The network provider has an infrastructure which allows it to satisfy its users' traffic demands. This infrastructure (eNBs, backhaul, etc.) needs to be continuously powered to ensure its operation. The operator's eNB consumes a fixed amount of energy for some basic features independent of the user traffic in the cell (cooling, energy storage, etc.) and a variable energy component that depends on the users' activity (baseband processing, radio frequency components, etc.).

User traffic changes during the day. It depends on the user density in the covered area and the requested services. Let $U_t$ denote the ratio in time slot $t$ between the actual traffic demand and the maximal traffic the cell can handle in the peak hours. We suppose that the operator uses all its available capacity to serve its users when $U_t = 1$. $U_t$ is a random variable whose distribution is taken from real statistics in [6]. The network operator has to ensure that all its user requests are satisfied; no request can be delayed or blocked.

The operator sites are, in addition, equipped with photovoltaic (PV) panels that produce a deterministic amount of renewable power $P_t$ depending on the hour of the day. This energy is stored in the battery to be used later.

The operator equipment can be run using two types of power sources:
• It can be powered directly by the smart grid. The network operator pays for its consumption at the price fixed by the energy provider. We assume that the grid can satisfy all the operator's demands for electricity.
• It can be powered by its own battery. The amount of energy that can be consumed at each time slot is limited by the battery capacity $B_{\max}$. We assume that the use of the battery has no financial cost for the operator.

The operator can also store in its own battery the energy bought from the electricity provider. We assume that when the battery is charging, the network operator cannot use it to run its components. Thus, its infrastructure has to be powered in this case directly by the smart grid.

We also assume that the battery energy efficiencies during the charge and discharge processes are equal to 1. Thus, all the energy that is bought from the smart grid can be stored without any losses. We further assume that the network operator can sell energy back to the grid, that the selling and purchase prices are equal, and that the energy provider never declines the offers of the network operator.

Time is divided into equal epochs of one hour each. We suppose that this duration is sufficient to fully charge or discharge the battery.

The battery level is discretized into $L$ values in $\{0, \frac{B_{\max}}{L-1}, \ldots, B_{\max}\}$. The battery state of charge (SOC) evolution over time is expressed as follows:

$$B_{t+1} = B_t + c_t B_{\max} + T P_t - d_t\, n_{TX}\, T\, (P_0 + \Delta_p P_{out} U_t) \tag{1}$$

where:
• $c_t B_{\max}$ is the amount of energy that the operator buys from (resp. sells to) the grid. $c_t$ takes its values from a finite set in $[-1, \ldots, 1]$. $c_t$ is bounded by -1 and
1 because the operator cannot sell more than its battery reserves or charge its battery beyond its capacity. If $c_t$ is negative, the operator sells energy. If it is positive, the operator charges its battery and is at the same time powered by the smart grid to satisfy its users' demands.
• $T$ is the observation interval duration.
• $P_t$ is the renewable energy produced by the PV panels.
• $d_t$ is the action of switching between the battery and the smart grid. If $d_t = 0$, the operator is powered by the smart grid. If $d_t = 1$, the network infrastructure runs on its own battery.
• $n_{TX}$ is the number of sectors in the eNB.
• $P_0$ is the power consumed by the eNB for traffic-independent features.
• $\Delta_p$ is the slope of the load-dependent power consumption.
• $P_{out}$ is the power radiated by the eNB's antennas.

III. PROBLEM FORMULATION AND ANALYSIS

In this section, we formulate the problem of minimizing the operator's energy bill while satisfying the users' traffic requests. An unconstrained Markov Decision Process (MDP) is proposed and solved using dynamic and linear programming to devise, respectively, offline deterministic and randomized policies which optimize the operator's daily expenditure for energy.

We treat both cases: when the operator has PV sources and when it does not. Moreover, the network operator can be:
• An energy seller, which can sell its battery surplus to the smart grid to make a profit, especially when the energy prices are high. We suppose that the energy provider never declines the offers of the operator.
• A pure consumer, with no possibility to sell energy. The quantities bought or produced are exclusively intended for its own consumption.

A. MDP Formulation

An MDP is a discrete-time state transition system. It is used for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. It is composed of 4 components: states, actions, transitions and outcomes. This process has to verify the Markov property: the effects of an action taken in a state depend only on that state and not on the prior history.

Our system can be seen as an ergodic Markov chain leading to an MDP and can be solved using dynamic programming tools.

We define the following components for this MDP:
• State space: It denotes the possible states $s_t$ the system can be in at any time $t$. $s_t = (B_t, U_t, F_t)$, where $B_t$ is the battery state of charge, $U_t$ is the ratio between the actual user traffic load and the maximal one, and $F_t$ is the unit price of electricity. All these values are discrete and finite, which makes the whole system state space finite.
• Action space: It represents the actions that the operator can perform at the beginning of each time slot $t$. The operator can choose from a finite space of possible actions $a_t \in \Omega_t(s_t)$ depending on the state $s_t$. $a_t = (c_t, d_t)$, with $c_t$ the amount of energy to buy from (resp. sell to) the smart grid and $d_t$ the decision of switching the power source between the battery and the smart grid. If the operator has the capability to be an energy seller, $c_t$ can take negative values; otherwise it can only be positive.
• Transition probabilities: Each corresponds to the probability of reaching state $s_{t+1}$ knowing that the system was in state $s_t$ in the previous slot and that the operator performed action $a_t$.

$$P(s_{t+1} \mid s_t, a_t) = P(B_{t+1} \mid s_t, a_t)\, P(U_{t+1} \mid s_t, a_t)\, P(F_{t+1} \mid s_t, a_t) \tag{2}$$

The battery state of charge at the beginning of slot $t+1$, denoted by $B_{t+1}$, is completely known if we know the state $s_t$ and the action $a_t$ taken at time slot $t$. Thus, the battery transition probability is deterministic and can be expressed as follows:

$$P(B_{t+1} \mid s_t, a_t) = \begin{cases} 1 & \text{if } B_{t+1} = B_t + c_t B_{\max} + T P_t - n_{TX}\, T\, d_t\, (P_0 + \Delta_p P_{out} U_t) \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

The energy prices and user traffic load are assumed to be independent and uncorrelated in time. Thus:

$$P(F_{t+1} \mid F_t, a_t) = P(F_{t+1}) \quad \text{and} \quad P(U_{t+1} \mid U_t, a_t) = P(U_{t+1})$$

• Instantaneous reward (outcome): It refers to the money that the network operator spends or earns with respect to the current state $s_t$ and the action $a_t$ it performs at time slot $t$.

$$r_t(s_t, a_t) = F_t \left[ c_t B_{\max} + T\, (1 - d_t)\, n_{TX}\, (P_0 + \Delta_p P_{out} U_t) \right] \tag{4}$$

The algorithm applied to this MDP aims to find an optimal, deterministic, offline policy, denoted by $\bar{\mu}^*$, which at each time slot $t$ and each state $s$ defines a unique action $a$ to be performed by the operator among all possible actions $\Omega_t(s)$ in that state.

To devise this policy, the operator needs perfect knowledge of the distribution of the random parameters in its environment: the energy prices and the user traffic requests in our case. This could be achieved through a learning phase from previous experiments.

The policy is applied at the beginning of each hour over a whole day. We define $N = 24$ to be the width of the optimization window.

Let $\bar{\mu}$ denote the mapping between the state and action spaces. It can be expressed as $\bar{\mu} = (\mu_0, \mu_1, \ldots, \mu_{N-1})$ with $\mu_t : S \to A$ such that $a = \mu_t(s)$.

The optimal policy $\bar{\mu}^*$ solves the following finite horizon average cost problem:

$$\bar{\mu}^* = \underset{\bar{\mu}}{\operatorname{argmin}}\ \frac{1}{N} \sum_{t=0}^{N-1} r_t(s_t, \mu_t(s_t)) \quad \text{s.t.}\quad 0 \le B_t \le B_{\max}\ \ \forall t \in [1, .., N] \tag{5}$$

The objective is, again, to decide at each time slot the energy source (grid or battery) and the amount of energy to buy or sell in order to minimize the operator average daily
expenditure for energy while satisfying the user traffic demand and the battery level limitations.

B. Offline Dynamic Programming Algorithm

The dynamic programming technique uses stochastic optimization algorithms to restrict the search for the optimal policy that solves the MDP. The aim is to compute a policy which describes how to act optimally in the face of uncertainty. It consists of breaking a complex problem into sequential sub-problems, solving them individually and combining their solutions to obtain the global one [8]. In each stage, dynamic programming makes decisions based on all the decisions made in the previous stages, and may reconsider the previous stage's algorithmic path to the solution.

In our case, the goal of this algorithm is to find, for our ergodic finite horizon Markov chain, an optimal, deterministic, offline policy which solves the average cost problem given in equation (5) [8].

The optimization is based on Bellman equations, which compute the optimal path to follow starting from any initial state distribution at $t = 0$. The minimization problem of this MDP is handled by iteratively solving the dynamic programming equations.

Specifically, the objective is to find at each step $k \in [0, .., N-1]$ the optimal value function that defines the optimal cost to go from any state $s \in S$ at time step $k$ to the final state.

$$V^{(k)}(s) = \min_{\mu_k} E^{\bar{\mu}} \left( \frac{1}{N-k} \sum_{t=k}^{N-1} r_t(s, \mu_t(s)) \right) \tag{6}$$

For $s_0$ denoting the initial state at $t = 0$, $J^* = V^{(0)}(s_0)$ is the optimal cost to go from the initial state to the final one in $N$ steps. $J^*$ can be found by applying backward recursion assuming that $V^{(N)}(s) = r_N(s)$, where $r_N$ is the end cost obtained at step $N$. In the algorithm, we took it null for all the states.

The optimal policy $\bar{\mu}^*$ is obtained by computing, through backward recursion, the following two relationships:
• Policy improvement equation: It consists of finding, at each step $k$ and for each state $s \in S$, the optimal action $\mu_k(s)$ over all possible actions $a \in \Omega_k(s)$ which minimizes the cost to go for the remaining $N - k$ transitions:

$$\mu_k(s) = \underset{a \in \Omega_k(s)}{\operatorname{argmin}} \left[ r_k(s, a) + \sum_{s' \in S} p(s' \mid s, a)\, V^{(k+1)}(s') \right] \tag{7}$$

• Policy evaluation equation: It consists of evaluating the cost to go at each state $s \in S$ from step $k$ to $N$ after choosing the policy $\mu_k$ to perform.

$$V^{(k)}(s) = r_k(s, \mu_k(s)) + \sum_{s' \in S} P(s' \mid s, \mu_k(s))\, V^{(k+1)}(s') \tag{8}$$

The obtained optimal policy $\bar{\mu}^* = (\mu_0, \ldots, \mu_{N-1})$ is deterministic, as at each state $s$ there is exactly one action $a$ to be executed. It minimizes $V^{(0)}(s_0)$ regardless of the choice of the initial state $s_0$ [9].

C. Linear Programming Algorithm

In this subsection, we convert the problem contained in equation (5) to a constrained linear program which can be solved by classical linear optimization tools.

The linear programming algorithm aims at finding a randomized policy that solves the MDP (given by equation (5)). In order to do this, the average cost problem is converted to a linear expression for which we have to find the occupation measures of all the state/action couples $(s, a)$ at each time step $t$ [9]. The occupation measures at time slot $t$, denoted by $\beta_t(s, a)$, refer to the probabilities that the system reaches state $s$ and performs action $a$ in time slot $t$.

Having an initial state occupancy denoted by $\gamma$ at time slot $t = 0$, we have to find at each time slot $t \in [1, .., N]$ the occupation measures of all the states and actions that minimize the objective function. The solution given by linear programming is optimal [9] but depends closely on the initial state distribution, contrary to the solution given by dynamic programming.

The optimization problem can be expressed as follows:

$$\begin{aligned}
\min_{\beta}\ & \frac{1}{N} \sum_{t=0}^{N-1} \sum_{(s,a)} \beta_t(s, a)\, r_t(s, a) \\
\text{s.t.}\ & \sum_{(s,a)} \beta_t(s, a) = 1 && \forall t \in [0, .., N-1] \\
& \sum_{a'} \beta_{t+1}(s', a') = \sum_{(s,a)} p_t(s' \mid s, a)\, \beta_t(s, a) && \forall s' \in S,\ \forall t \in [0, .., N-1] \\
& \beta_t(s, a) = 0 \quad \text{if } a \notin \Omega_t^s \\
& \sum_{a} \beta_0(s, a) = \gamma(s) && \forall s \in S
\end{aligned} \tag{9}$$

The first line above represents the objective of minimizing the daily expenses of the operator for energy. The first constraint ensures that the sum of all the occupation measures of all couples $(s, a)$ is equal to 1 at all time slots. The second constraint sets a relationship between the occupation measures and the transition probabilities of the Markov chain. In the third constraint, we eliminate the actions that cannot be performed at state $s$: charging the battery beyond its capacity, or running on battery when the energy consumption is higher than the actual battery state of charge. The last constraint ensures that the system is in state $s_0$ in time slot $t = 0$ with respect to the initial distribution $\gamma$.

Having at each $t \in [0, \ldots, N-1]$ the optimal occupation measures $\beta_t(s, a)$, we can devise a randomized optimal policy $\bar{\mu}^r$. It gives, $\forall t \in [0, .., N-1]$ and $\forall s \in S$, the probability $\rho_t^s(a)$ of doing an action $a$ when the system reaches state $s$ at time slot $t$.

This probability can be expressed as follows:

$$\rho_t^s(a) = \frac{\beta_t(s, a)}{\sum_{a'} \beta_t(s, a')} \tag{10}$$

Comparing the performances given by the deterministic and randomized policies can serve as a proof of the optimality of the deterministic offline policy [9]. We show this comparison next, in the numerical results section.
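As an illustration of equation (10), the following minimal sketch normalizes the occupation measures of a single time slot into a randomized decision rule. The function name `randomized_policy`, the dictionary layout and the numbers in the example are assumptions made for illustration; they are not taken from the paper. States with zero total occupation are skipped, since the ratio in (10) is undefined for them.

```python
def randomized_policy(beta):
    """Turn occupation measures of one time slot into action probabilities.

    beta maps (state, action) pairs to beta_t(s, a). The result maps each
    state s with non-zero occupation to a dict {a: rho_t^s(a)}, as in (10).
    """
    # Denominator of (10): total occupation of each state over its actions.
    totals = {}
    for (s, a), measure in beta.items():
        totals[s] = totals.get(s, 0.0) + measure

    policy = {}
    for (s, a), measure in beta.items():
        if totals[s] > 0:  # rho is undefined for states that are never visited
            policy.setdefault(s, {})[a] = measure / totals[s]
    return policy


# Hypothetical occupation measures for one slot (illustrative values only).
beta_example = {
    ("low", "buy"): 0.25,
    ("low", "idle"): 0.25,
    ("high", "sell"): 0.5,
    ("empty", "buy"): 0.0,
}
policy_example = randomized_policy(beta_example)
```

When the optimal occupation measures put all the mass of a state on a single action, the resulting rule is deterministic for that state, which is the situation the comparison with the dynamic programming policy relies on.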
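The backward recursion of equations (7) and (8) can be sketched on a generic finite MDP as follows. This is an assumption-laden toy, not the authors' implementation: the helper `backward_induction`, its callback signatures and the two-slot battery example (prices, action names, one unit of load per slot) are all hypothetical. The sketch accumulates the total cost with a null terminal cost; dividing the value at step 0 by N gives the average cost of equation (5).

```python
def backward_induction(states, actions, reward, trans, horizon):
    """Finite-horizon dynamic programming with a null terminal cost.

    actions(t, s)   -> iterable of feasible actions in state s at step t
    reward(t, s, a) -> instantaneous cost r_t(s, a)
    trans(t, s, a)  -> dict {next_state: probability}
    Returns (value, policy), indexed as value[t][s] and policy[t][s].
    """
    value = [dict() for _ in range(horizon + 1)]
    policy = [dict() for _ in range(horizon)]
    for s in states:
        value[horizon][s] = 0.0  # end cost taken null for all states
    for t in range(horizon - 1, -1, -1):
        for s in states:
            best_action, best_cost = None, float("inf")
            for a in actions(t, s):
                # Bracketed term of (7): immediate cost plus expected cost to go.
                cost = reward(t, s, a) + sum(
                    p * value[t + 1][s2] for s2, p in trans(t, s, a).items()
                )
                if cost < best_cost:
                    best_action, best_cost = a, cost
            policy[t][s] = best_action  # policy improvement, equation (7)
            value[t][s] = best_cost     # policy evaluation, equation (8)
    return value, policy


# Toy example (hypothetical): battery level 0 or 1, one unit of load per slot.
PRICES = [1.0, 3.0]  # cheap first slot, expensive second slot

def toy_actions(t, s):
    # A full battery can serve the load; an empty one can be charged.
    return ["grid", "battery"] if s == 1 else ["grid", "charge"]

def toy_reward(t, s, a):
    # "charge" buys the battery content and the load from the grid.
    return {"grid": PRICES[t], "battery": 0.0, "charge": 2.0 * PRICES[t]}[a]

def toy_trans(t, s, a):
    return {{"grid": s, "battery": 0, "charge": 1}[a]: 1.0}

value, policy = backward_induction([0, 1], toy_actions, toy_reward, toy_trans, 2)
```

In this toy, an operator starting with an empty battery charges during the cheap slot and runs on battery during the expensive one, which mirrors the buy-low behaviour the policy is designed to capture.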
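The building blocks of the MDP, namely the battery recursion of equation (1), the feasibility of an action and the instantaneous reward of equation (4), can be written as a few small functions. The sketch below is only an illustration: the class `SiteParams`, the function names and every numeric parameter value are hypothetical and chosen so that the arithmetic is easy to follow, not to match the paper's setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteParams:
    b_max: float    # battery capacity B_max (kWh)
    t_slot: float   # slot duration T (h)
    n_tx: int       # number of sectors n_TX
    p0: float       # traffic-independent power P_0 (kW)
    delta_p: float  # slope of load-dependent consumption Delta_p
    p_out: float    # radiated power P_out (kW)


def load_energy(p, u):
    """Energy needed by the eNB during one slot at traffic ratio u."""
    return p.n_tx * p.t_slot * (p.p0 + p.delta_p * p.p_out * u)


def next_soc(p, b, c, d, pv, u):
    """Equation (1): SOC after buying/selling c*B_max, PV input and discharge."""
    return b + c * p.b_max + p.t_slot * pv - d * load_energy(p, u)


def is_feasible(p, b, c, d, pv, u):
    """Action (c, d) is allowed if the battery is not used while charging
    and the next SOC stays within [0, B_max]."""
    if c > 0 and d == 1:
        return False
    return 0.0 <= next_soc(p, b, c, d, pv, u) <= p.b_max


def reward(p, price, c, d, u):
    """Equation (4): money spent (positive) or earned (negative) in the slot."""
    return price * (c * p.b_max + (1 - d) * load_energy(p, u))


# Hypothetical site: 8 kWh battery, 1 h slots, load of 4 kWh at full traffic.
site = SiteParams(b_max=8.0, t_slot=1.0, n_tx=2, p0=1.0, delta_p=2.0, p_out=0.5)
```

With these illustrative values, running on a half-full battery for one hour at full traffic empties it at no cost, while selling half the battery content at a unit price of 10 while on battery yields a reward of -40, that is, a gain.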
[Plot not recovered; panels: "Average user traffic load (%)" and "Average Battery SOC (kWh)" versus day hours; legend: Bmax = 4 kWh, Bmax = 8 kWh, Bmax = 12 kWh]

In Fig. 3, when the operator is able to sell energy to the grid, we note that the shape of the SOC evolution is the same for all battery sizes: irrespective of the size of the battery, the operator takes the same decisions on buying and selling, but with larger energy portions when its battery is larger. We also notice that a 12 kWh battery gives the operator the opportunity to sell more energy in high-price periods.

We notice that the achieved policy is, as expected, to operate on the battery and sell energy when the prices are at local maxima, and to charge the battery and stay powered by the grid when the prices are at local minima. The 4 kWh battery allows the same operation, but in smaller proportions.

[Plot not recovered; panels: "Average user traffic load (%)" and "Average Battery SOC (kWh)" versus day hours; legend: Bmax = 4 kWh, Bmax = 8 kWh, Bmax = 12 kWh]
Fig. 4: SOC evolution for an operator not able to sell energy

In Fig. 4, when the operator is not able to sell energy to the grid, we notice a small difference in the shape of the SOC curve depending on the battery size at 7 pm: the operator with a 4 kWh battery was forced to charge it to be able to sustain its demand during the next hours, while for the other two battery sizes the operator had enough reserves in its battery to use it when the prices become higher between [...] grid, the amounts of energy collected are intended only for its own consumption. Thus, a larger battery allows the operator to rely more on its battery and avoid being powered by the grid. Conversely, small battery sizes do not allow long periods of self autonomy.

C. Impact of Energy Seller Capability and Renewable Energy

In this subsection, we run our optimal policy under all cases: with and without PV, and with and without energy selling capability. The results are shown in Fig. 5.

[Plot not recovered]
Fig. 5: Average money spent per day for different battery sizes under different scenarios

We notice that, for all battery sizes, the operator able to act as an energy seller significantly reduces its energy expenses, especially for large battery sizes, compared to the case when it is powered exclusively by the grid. When the battery size is small, the operator cannot afford long periods of self autonomy, and so the capability of selling energy back to the grid does not have a significant impact on its daily energy expenditure.

Having renewable energy resources allows the operator to reduce its energy costs, especially at small battery sizes, as this additional energy source reduces the amount of energy purchased from the grid.

Fig. 6 details the purchase and, where applicable, selling actions taken by the operator for the cases when it can and cannot sell energy to the grid. We show the average amount of money spent by the operator at each hour of the day. The battery is set to 8 kWh, which gives the operator 6 hours of self autonomy at high user traffic load. We treat the case when the operator is not equipped with PV panels.

[Plot not recovered; panel: "Average user traffic load (%)" versus day hours; legend: Operator able to sell, Bmax = 8 kWh; Operator not able to sell, Bmax = 8 kWh]
Fig. 6: The energy seller capability effect on the optimal dynamic programming policy
[Plot not recovered; x-axis: Day hours; legend: No PV, Bmax = 8 KWh; With PV, Bmax = 8 KWh]
Fig. 7: The renewable energy production effect on the optimal dynamic programming policy

We notice that except for the case when the renewable energy production is high, that is between 10 am and 6 pm, the actions taken in the two cases are quite similar. Between 10 am and 6 pm, when the energy prices are quite high, the operator having photovoltaic panels exploits its production to sell it to the grid while staying powered by the battery, and hence [...]

We notice that the biggest gaps in energy expenses are obtained when the operator buys large amounts of energy (at 5 am and 2 pm) which also corresponds to local minima of the electricity prices during the day. These gaps are due to the errors in the estimated price.

In Fig. 9, we vary the error in estimating the user traffic. If the operator has to serve more traffic than it expects and the energy stored in the battery does not allow it to do so, it has to buy the remaining energy from the grid.

[Plot not recovered; legend: no traffic estimation error; with 10 % traffic estimation error; with 20 % traffic estimation error; with 30 % traffic estimation error]
[Plot not recovered; x-axis: Number of battery states; legend: Bmax = 8 KWh, Bmax = 12 KWh]
Fig. 12: Effect of the battery number of states on the variance of the daily energy expenditure

We notice that the efficiency of our policy gets better when increasing the number of the battery SOC. We note also that as the battery size increases, our policy needs to consider more [...]

[3] J. Qin, R. Sevlian, D. Varodayan, and R. Rajagopal, "Optimal electric energy storage operation", IEEE Power and Energy Society General Meeting, vol. 7, pp. 6-8, 2012.
[4] P. Mahat, J. E. Jimenez, E. R. Moldes, S. I. Haug, I. G. Szczesny, K. E. Pollestad, and L. C. Totu, "A micro-grid battery storage management", IEEE Power and Energy Society General Meeting, pp. 1-5, 2013.
[5] S. Grillo, A. Pievatolo, and E. Tironi, "Optimal storage scheduling using Markov decision processes", IEEE Transactions on Sustainable Energy, vol. 7, pp. 755-764, 2016.
[6] G. Auer, V. Giannini, C. Desset, I. Godor, P. Skillermark, M. Olsson, M. A. Imran, D. Sabella, M. J. Gonzalez, O. Blume, and A. Fehske, "How much energy is needed to run a wireless network?", IEEE Transactions on Sustainable Energy, vol. 18, pp. 40-49, 2011.
[7] J. Munkhammar and J. Widel, "A flexible Markov-chain model for simulating demand side management strategies - applications to distributed photovoltaics", conference proceedings of World Renewable Energy Forum (WREF), 2012.
[8] D. P. Bertsekas, "Dynamic programming and optimal control", Chapman and Hall, vol. 4th edition, 2005.
[9] E. Altman, "Constrained Markov decision processes", Chapman and Hall, 1999.