
Introduction to the Bellman Equation

January 2009

This note shows informally, using a specific example, how a sequential problem
can be represented recursively as a dynamic programming problem. It should help you
build intuition for the Bellman equation.

In the lectures on Ch. 8, the deterministic endowment economy with sequential markets,
we have seen the following infinite-horizon sequential optimization problem:


\max_{\{c_t, a_{t+1}\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t u(c_t)    (1)

s.t. c_t + a_{t+1} Q_t = y_t + a_t
     a_{t+1} \ge -A_{t+1}

given a_0 and \{y_t\}_{t=0}^{\infty}

Rewrite problem (1) in terms of assets as control variables and consider a finite-horizon
case t = 0, 1, 2, \ldots, T:

\max_{\{a_{t+1}\}_{t=0}^{T}} \sum_{t=0}^{T} \beta^t u(y_t + a_t - a_{t+1} Q_t)    (2)

given a_0 and \{y_t\}_{t=0}^{T}

Also, to simplify the further discussion, assume the borrowing constraint does not bind.
Solving problem (2) requires finding the sequence \{a_1, a_2, \ldots, a_{T-1}, a_T, a_{T+1}\}.

Suppose \{a_1, a_2, \ldots, a_T\} are chosen and a_{T+1} needs to be decided. Then the problem is a
one-period optimization with a single control variable a_{T+1}:

\max_{a_{T+1}} u(y_T + a_T - a_{T+1} Q_T)    (3)

The solution to this problem is obvious: a_{T+1} = 0, because the return on the asset cannot
be enjoyed after death. Note that to solve the problem we only need to know \{a_T, y_T\} and not
the whole sequence of assets and endowment. That is, \{a_T, y_T\} are the individual state variables
in problem (3). Denote by V^T(a_T, y_T), called a value function, the maximum attained by
the objective function given (a_T, y_T):

V^T(a_T, y_T) \equiv \max_{a_{T+1}} u(y_T + a_T - a_{T+1} Q_T)    (4)
              = u(y_T + a_T)

and by g^T(a_T, y_T) the optimal decision rule for a_{T+1} given (a_T, y_T):

a_{T+1} = g^T(a_T, y_T)
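
To make this concrete, here is a minimal Python sketch of the terminal-period objects (3)-(4). It assumes log utility u(c) = log(c); the parameter values and the names u, g_T, V_T are illustrative choices, not taken from the note.

    import numpy as np

    def u(c):
        """Period utility; log utility is assumed for illustration."""
        return np.log(c)

    def g_T(a_T, y_T):
        """Terminal decision rule: assets carried past T are worthless,
        so the optimal choice is a_{T+1} = 0."""
        return 0.0

    def V_T(a_T, y_T, Q_T=0.9):
        """Terminal value function V^T(a_T, y_T) = u(y_T + a_T)."""
        a_next = g_T(a_T, y_T)
        return u(y_T + a_T - a_next * Q_T)  # collapses to u(y_T + a_T)

    print(V_T(a_T=0.5, y_T=1.0))  # log(1.5), about 0.405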

Next, suppose \{a_1, a_2, \ldots, a_{T-1}\} are chosen, and \{a_T, a_{T+1}\} need to be decided. Then the
problem becomes a two-period optimization with two control variables \{a_T, a_{T+1}\} and state
variables \{a_{T-1}, y_{T-1}, y_T\}:

\max_{\{a_T, a_{T+1}\}} \left\{ u(y_{T-1} + a_{T-1} - a_T Q_{T-1}) + \beta u(y_T + a_T - a_{T+1} Q_T) \right\}    (5)

Note that the problem can be solved in two steps. First we can find the optimal decision
rule for a_{T+1} given a_T, and then find the optimal decision rule for a_T given a_{T-1}. That is, we
can rewrite problem (5) as follows:
 
V^{T-1}(a_{T-1}, \hat{y}^{T-1}) \equiv \max_{a_T} \left\{ u(y_{T-1} + a_{T-1} - a_T Q_{T-1}) + \beta \max_{a_{T+1}} u(y_T + a_T - a_{T+1} Q_T) \right\}    (6)

where V^{T-1}(a_{T-1}, \hat{y}^{T-1}) denotes the maximum welfare attained over two periods given a_{T-1}
and the two-period history of income \hat{y}^{T-1} \equiv \{y_{T-1}, y_T\}. Using the notation introduced in
the previous step (the one-period problem) we get

V^{T-1}(a_{T-1}, \hat{y}^{T-1}) = \max_{a_T} \left\{ u(y_{T-1} + a_{T-1} - a_T Q_{T-1}) + \beta V^T(a_T, y_T) \right\}

Likewise, denote by g^{T-1}(a_{T-1}, \hat{y}^{T-1}) the optimal decision rule for a_T given a_{T-1}:

a_T = g^{T-1}(a_{T-1}, \hat{y}^{T-1})

The optimal decision rule g^{T-1}(a_{T-1}, \hat{y}^{T-1}) is obtained from the following F.O.C.:

u'(y_{T-1} + a_{T-1} - a_T Q_{T-1}) Q_{T-1} = \beta \frac{\partial V^T(a_T, y_T)}{\partial a_T}    (7)

Since the optimal a_{T+1} = 0, V^T(a_T, y_T) = u(y_T + a_T), hence the F.O.C. becomes

u'(y_{T-1} + a_{T-1} - a_T Q_{T-1}) Q_{T-1} = \beta u'(y_T + a_T)    (8)
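
As a numerical illustration, the F.O.C. (8) can be solved for a_T with a root finder. The sketch below again assumes log utility, so u'(c) = 1/c; the values of beta, Q, and the endowments are illustrative.

    from scipy.optimize import brentq

    beta, Q = 0.96, 0.9  # illustrative discount factor and asset price

    def euler_residual(a_T, a_Tm1, y_Tm1, y_T):
        # u'(y_{T-1} + a_{T-1} - a_T Q) Q - beta u'(y_T + a_T), with u'(c) = 1/c
        c_Tm1 = y_Tm1 + a_Tm1 - a_T * Q
        c_T = y_T + a_T
        return Q / c_Tm1 - beta / c_T

    # g^{T-1}(a_{T-1}, yhat^{T-1}) is the root in a_T; the bracket keeps
    # both consumption levels strictly positive.
    a_T = brentq(euler_residual, -0.99, 1.10, args=(0.0, 1.0, 1.0))
    print(a_T)  # about 0.034 for these values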

Now continue solving problem (2) in a similar fashion. That is, suppose \{a_1, a_2, \ldots, a_{T-2}\}
are chosen, and \{a_{T-1}, a_T, a_{T+1}\} need to be decided. You should get a value function
corresponding to the three-period problem:

V^{T-2}(a_{T-2}, \hat{y}^{T-2}) = \max_{a_{T-1}} \left\{ u(y_{T-2} + a_{T-2} - a_{T-1} Q_{T-2}) + \beta V^{T-1}(a_{T-1}, \hat{y}^{T-1}) \right\}

and a decision rule a_{T-1} = g^{T-2}(a_{T-2}, \hat{y}^{T-2}) obtained from the F.O.C.

u'(y_{T-2} + a_{T-2} - a_{T-1} Q_{T-2}) Q_{T-2} = \beta u'(y_{T-1} + a_{T-1} - a_T Q_{T-1})    (9)

Iterating all the way to period 0, we get a value function that corresponds to the complete
problem (2):

V^0(a_0, \hat{y}^0) = \max_{a_1} \left\{ u(y_0 + a_0 - a_1 Q_0) + \beta V^1(a_1, \hat{y}^1) \right\}

with the optimum attained at a_1 = g^0(a_0, \hat{y}^0).
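
This backward recursion is easy to implement on a grid. Below is a sketch of backward induction for the finite-horizon problem (2), assuming log utility, a constant price Q, and a constant endowment y; all parameter values are illustrative, and the terminal restriction a_{T+1} >= 0 implements the solution a_{T+1} = 0 from problem (3).

    import numpy as np

    beta, Q, y, T = 0.96, 0.9, 1.0, 10
    grid = np.linspace(-0.5, 2.0, 301)           # grid for a_t and a_{t+1}

    V = np.zeros((T + 2, grid.size))             # V[t, i] approximates V^t at grid[i]
    g = np.zeros((T + 1, grid.size), dtype=int)  # index of the optimal a_{t+1}

    V[T + 1] = 0.0                               # nothing is valued after death
    for t in range(T, -1, -1):
        # consumption c = y + a_t - Q a_{t+1} for every grid pair (a_t, a_{t+1})
        c = y + grid[:, None] - Q * grid[None, :]
        util = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -np.inf)
        total = util + beta * V[t + 1][None, :]
        if t == T:
            total[:, grid < 0] = -np.inf         # terminal constraint a_{T+1} >= 0
        g[t] = total.argmax(axis=1)
        V[t] = total.max(axis=1)

    # The terminal rule picks a_{T+1} = 0, as argued above.
    print(grid[g[T]][:5])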

Now consider a stationary environment: let the asset price Q_t be constant, and impose
some restrictions on the individual endowment process \{y_t\}_{t=0}^{\infty} such that a law of motion
for the endowment can be described by y' = \Gamma(y). Then the sequence of value functions
\{V^t(a_t, \hat{y}^t)\}_{t=0}^{T} converges to a time-invariant function V(a, y) as T \to \infty (infinite horizon),
and the sequential problem (1) admits the following recursive representation:

V(a, y) = \max_{a'} \left\{ u(y + a - a' Q) + \beta V(a', y') \right\}    (10)

given a_0 and y' = \Gamma(y).


If the sequence of decision rules \{g^t(a_t, \hat{y}^t)\}_{t=0}^{T} converges to a time-invariant function
g(a, y) as T \to \infty, then a' = g(a, y) attains V(a, y).
Equation (10) is called the Bellman equation, or functional equation. It states the sequential
problem (1) in the language of dynamic programming. The solutions to problems (1) and (10)
are identical in a stationary environment.
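
Because the Bellman equation is a fixed-point condition in the function V, it also suggests an algorithm: apply the right-hand side of (10) repeatedly until the value function stops changing (value function iteration). The sketch below assumes log utility, a degenerate endowment law of motion \Gamma(y) = y, and a bounded asset grid; all parameters are illustrative.

    import numpy as np

    beta, Q, y = 0.96, 0.97, 1.0
    grid = np.linspace(0.0, 2.0, 201)                  # grid for a and a'

    # consumption and utility for every (a, a') pair, with u(c) = log(c)
    c = y + grid[:, None] - Q * grid[None, :]
    util = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -np.inf)

    V = np.zeros(grid.size)                            # initial guess V = 0
    for _ in range(2000):
        V_new = (util + beta * V[None, :]).max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-8:           # sup-norm convergence
            break
        V = V_new

    policy = grid[(util + beta * V[None, :]).argmax(axis=1)]  # a' = g(a, y)
    print(policy[:5])

The loop converges because the Bellman update is a contraction of modulus \beta in the sup norm, so it approaches the unique fixed point V on the grid regardless of the initial guess.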
The solution to the Bellman equation (10) satisfies the F.O.C.

u'(y + a - a' Q) Q = \beta \frac{\partial V(a', y')}{\partial a'}    (11)

Using the Envelope theorem (differentiate V(a, y) = u(y + a - g(a, y) Q) + \beta V(g(a, y), y')
with respect to a; the terms involving \partial g / \partial a vanish by the F.O.C. (11)), we obtain

\frac{\partial V(a, y)}{\partial a} = u'(y + a - a' Q)
so that the F.O.C. can be written as

u'(y + a - a' Q) Q = \beta u'(y' + a' - g(a', y') Q)    (12)

The solution to this equation gives a' = g(a, y).
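
As a concrete illustration (not part of the original note): with log utility, u'(c) = 1/c, and the Euler equation (12) pins down consumption growth directly:

\frac{Q}{c} = \frac{\beta}{c'} \quad \Longrightarrow \quad c' = \frac{\beta}{Q}\, c

Consumption grows over time if \beta > Q (the asset return 1/Q outweighs the agent's impatience) and shrinks if \beta < Q; the decision rule g(a, y) then follows from the budget constraint c = y + a - g(a, y) Q.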
