dp-intro dynamic programming
January 2009
This note shows informally, using a specific example, how a sequential problem can be represented recursively as a dynamic programming problem. It should help you build intuition for the Bellman equation.
\[
\max_{\{c_t,\, a_{t+1}\}_{t=0}^{\infty}} \; \sum_{t=0}^{\infty} \beta^{t} u(c_t) \tag{1}
\]
\[
\text{s.t.} \quad c_t + a_{t+1} Q_t = y_t + a_t, \qquad a_{t+1} \geq -A_{t+1},
\]
\[
\text{given } a_0 \text{ and } \{y_t\}_{t=0}^{\infty}.
\]
Rewrite problem (1) in terms of assets as control variables and consider a finite-horizon case $t = 0, 1, 2, \ldots, T$:
\[
\max_{\{a_{t+1}\}_{t=0}^{T}} \; \sum_{t=0}^{T} \beta^{t} u(y_t + a_t - a_{t+1} Q_t) \tag{2}
\]
Also, to simplify further discussion, assume the borrowing constraint does not bind.
Solving problem (2) requires finding the sequence $\{a_1, a_2, \ldots, a_{T-1}, a_T, a_{T+1}\}$.
Suppose $\{a_1, a_2, \ldots, a_T\}$ are chosen and $a_{T+1}$ needs to be decided. Then the problem is a one-period optimization with a single control variable $a_{T+1}$:
\[
\max_{a_{T+1}} \; u(y_T + a_T - a_{T+1} Q_T) \tag{3}
\]
The solution to this problem is obvious: $a_{T+1} = 0$, because the return on the asset cannot be enjoyed after death. Note that to solve the problem we only need to know $\{a_T, y_T\}$ and not the whole sequence of assets and endowment. That is, $\{a_T, y_T\}$ are individual state variables in problem (3). Denote by $V^T(a_T, y_T)$, called a value function, the maximum attained by the objective function given $a_T$:
\[
V^T(a_T, y_T) \equiv \max_{a_{T+1}} u(y_T + a_T - a_{T+1} Q_T) = u(y_T + a_T) \tag{4}
\]
and denote the associated optimal decision rule by
\[
a_{T+1} = g^T(a_T, y_T).
\]
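To fix ideas, the following minimal numerical sketch tabulates $V^T$ and $g^T$ on a grid. The log utility and the grid ranges are illustrative assumptions, not part of the note.
\begin{verbatim}
import numpy as np

# Illustrative utility function (log); any increasing, concave u works the same way.
def u(c):
    return np.log(c)

# Illustrative grids for last-period assets a_T and endowment y_T.
a_grid = np.linspace(0.0, 5.0, 101)
y_grid = np.linspace(0.5, 1.5, 11)

# In the last period it is optimal to save nothing, a_{T+1} = g^T(a_T, y_T) = 0,
# so V^T(a_T, y_T) = u(y_T + a_T).
V_T = u(a_grid[:, None] + y_grid[None, :])   # V_T[i, j] = V^T(a_grid[i], y_grid[j])
g_T = np.zeros_like(V_T)                     # terminal decision rule: a_{T+1} = 0
\end{verbatim}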
Next, suppose $\{a_1, a_2, \ldots, a_{T-1}\}$ are chosen, and $\{a_T, a_{T+1}\}$ need to be decided. Then the problem becomes a two-period optimization with two control variables $\{a_T, a_{T+1}\}$ and state variables $\{a_{T-1}, y_{T-1}, y_T\}$:
\[
\max_{\{a_T,\, a_{T+1}\}} \; u(y_{T-1} + a_{T-1} - a_T Q_{T-1}) + \beta\, u(y_T + a_T - a_{T+1} Q_T) \tag{5}
\]
Note that the problem can be solved in two steps. First we can find the optimal decision rule for $a_{T+1}$ given $a_T$, and then find the optimal decision rule for $a_T$ given $a_{T-1}$. That is, we can rewrite problem (5) as follows:
\[
V^{T-1}(a_{T-1}, \hat{y}^{T-1}) \equiv \max_{a_T} \left\{ u(y_{T-1} + a_{T-1} - a_T Q_{T-1}) + \beta \max_{a_{T+1}} u(y_T + a_T - a_{T+1} Q_T) \right\} \tag{6}
\]
where $V^{T-1}(a_{T-1}, \hat{y}^{T-1})$ denotes the maximum welfare attained over two periods given $a_{T-1}$ and the two-period history of income $\hat{y}^{T-1} \equiv \{y_{T-1}, y_T\}$. Using the notation introduced in the previous step (one-period problem) we get
\[
V^{T-1}(a_{T-1}, \hat{y}^{T-1}) = \max_{a_T} \left\{ u(y_{T-1} + a_{T-1} - a_T Q_{T-1}) + \beta V^T(a_T, y_T) \right\}
\]
with the associated optimal decision rule
\[
a_T = g^{T-1}(a_{T-1}, \hat{y}^{T-1}).
\]
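A minimal grid-search sketch of this two-step construction, under illustrative assumptions (log utility, a given income pair $\{y_{T-1}, y_T\}$, a constant price, and an evenly spaced asset grid), might look like this:
\begin{verbatim}
import numpy as np

def u(c):
    return np.log(c)   # illustrative utility

Q_Tm1, beta = 0.96, 0.95              # illustrative price Q_{T-1} and discount factor
y_Tm1, y_T = 1.0, 1.0                 # a given two-period income history {y_{T-1}, y_T}
a_grid = np.linspace(0.0, 5.0, 201)   # grid used for both a_{T-1} and a_T

# Step 1 (inner problem): V^T(a_T, y_T) = u(y_T + a_T), with g^T = 0.
V_T = u(y_T + a_grid)

# Step 2 (outer problem):
# V^{T-1}(a_{T-1}) = max_{a_T} u(y_{T-1} + a_{T-1} - a_T Q_{T-1}) + beta V^T(a_T).
V_Tm1 = np.empty_like(a_grid)
g_Tm1 = np.empty_like(a_grid)         # optimal a_T as a function of a_{T-1}
for i, a_prev in enumerate(a_grid):
    c = y_Tm1 + a_prev - a_grid * Q_Tm1                  # consumption at each candidate a_T
    val = np.where(c > 0, u(np.maximum(c, 1e-12)) + beta * V_T, -np.inf)
    j = int(np.argmax(val))
    V_Tm1[i], g_Tm1[i] = val[j], a_grid[j]
\end{verbatim}
Infeasible choices (those implying $c \le 0$) are simply ruled out by assigning them a value of $-\infty$ before taking the maximum.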
The optimal decision rule $g^{T-1}(a_{T-1})$ is obtained from the following F.O.C.:
\[
u'(y_{T-1} + a_{T-1} - a_T Q_{T-1})\, Q_{T-1} = \beta\, \frac{\partial V^T(a_T)}{\partial a_T} \tag{7}
\]
Since optimal $a_{T+1} = 0$, $V^T(a_T, y_T) = u(y_T + a_T)$, hence the F.O.C. becomes
\[
u'(y_{T-1} + a_{T-1} - a_T Q_{T-1})\, Q_{T-1} = \beta\, u'(y_T + a_T).
\]
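As a quick sanity check, this condition can be reproduced symbolically by differentiating the two-period objective; the sketch below uses sympy with log utility, both purely illustrative choices.
\begin{verbatim}
import sympy as sp

# Symbols for the states, the choice a_T, the price, and the discount factor.
a_prev, a_T, y_prev, y_T, Q, beta = sp.symbols('a_prev a_T y_prev y_T Q beta', positive=True)

# Two-period objective with the terminal choice a_{T+1} = 0 already imposed,
# using u = log purely for illustration:
objective = sp.log(y_prev + a_prev - a_T * Q) + beta * sp.log(y_T + a_T)

# Setting d(objective)/da_T = 0 reproduces
# u'(y_{T-1} + a_{T-1} - a_T Q_{T-1}) Q_{T-1} = beta u'(y_T + a_T) for u = log.
foc = sp.Eq(sp.diff(objective, a_T), 0)
print(foc)
\end{verbatim}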
Now continue solving problem (2) in a similar fashion. That is, suppose $\{a_1, a_2, \ldots, a_{T-2}\}$ are chosen, and $\{a_{T-1}, a_T, a_{T+1}\}$ need to be decided. You should get a value function corresponding to the three-period problem:
\[
V^{T-2}(a_{T-2}, \hat{y}^{T-2}) = \max_{a_{T-1}} \left\{ u(y_{T-2} + a_{T-2} - a_{T-1} Q_{T-2}) + \beta V^{T-1}(a_{T-1}, \hat{y}^{T-1}) \right\}
\]
where $\hat{y}^{T-2} \equiv \{y_{T-2}, y_{T-1}, y_T\}$.
Iterating all the way to period 0, we get a value function that corresponds to the complete problem (2):
\[
V^0(a_0, \hat{y}^0) = \max_{a_1} \left\{ u(y_0 + a_0 - a_1 Q_0) + \beta V^1(a_1, \hat{y}^1) \right\}
\]
where $\hat{y}^0 \equiv \{y_0, y_1, \ldots, y_T\}$.
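The whole backward pass can be sketched in a few lines. To keep the example short, the sketch assumes constant income $y_t = y$ and a constant price $Q_t = Q$ (the note allows arbitrary given sequences); log utility and the grid are also illustrative.
\begin{verbatim}
import numpy as np

def u(c):
    return np.log(c)   # illustrative utility

T, Q, beta, y = 10, 0.96, 0.95, 1.0     # illustrative horizon, constant price and income
a_grid = np.linspace(0.0, 5.0, 201)

V = u(y + a_grid)                       # start from V^T(a_T) = u(y_T + a_T)
policy = [np.zeros_like(a_grid)]        # g^T: a_{T+1} = 0
for t in range(T - 1, -1, -1):          # backward: t = T-1, ..., 0
    c = y + a_grid[:, None] - Q * a_grid[None, :]            # c_t for each (a_t, a_{t+1}) pair
    val = np.where(c > 0, u(np.maximum(c, 1e-12)) + beta * V[None, :], -np.inf)
    V = val.max(axis=1)                                      # V^t(a_t)
    policy.append(a_grid[val.argmax(axis=1)])                # g^t(a_t): optimal a_{t+1}
policy.reverse()                        # policy[t] now gives a_{t+1} as a function of a_t
\end{verbatim}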
Now consider a stationary environment: let the asset price $Q_t = Q$ be constant, and impose some restrictions on the individual endowment process $\{y_t\}_{t=0}^{\infty}$ such that a law of motion for the endowment can be described by $y' = \Gamma(y)$. Then the sequence of value functions $\{V^t(a_t, \hat{y}^t)\}_{t=0}^{T}$ converges to a time-invariant function $V(a, y)$ as $T \longrightarrow \infty$ (infinite horizon), and the sequential problem (1) admits the following recursive representation:
\[
V(a, y) = \max_{a'} \left\{ u(y + a - a' Q) + \beta V(a', y') \right\} \tag{10}
\]
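In practice, the fixed point $V$ can be approximated by iterating on the right-hand side of (10) (value function iteration). The sketch below assumes log utility, a constant price $Q$, and a deterministic law of motion $\Gamma$ that maps a small income grid into itself; all of these are illustrative choices, not part of the note.
\begin{verbatim}
import numpy as np

def u(c):
    return np.log(c)   # illustrative utility

Q, beta = 0.96, 0.95                         # illustrative constant price and discount factor
a_grid = np.linspace(0.0, 5.0, 201)
y_grid = np.array([0.9, 1.0, 1.1])           # small income grid
Gamma = np.array([1, 2, 0])                  # illustrative deterministic law of motion y' = Γ(y)

V = np.zeros((a_grid.size, y_grid.size))     # initial guess V_0 = 0
g = np.zeros_like(V)                         # decision rule a' = g(a, y)
for _ in range(2000):
    V_new = np.empty_like(V)
    for j, y in enumerate(y_grid):
        c = y + a_grid[:, None] - Q * a_grid[None, :]       # c(a, a')
        cont = V[:, Gamma[j]]                               # V(a', y') with y' = Γ(y)
        val = np.where(c > 0, u(np.maximum(c, 1e-12)) + beta * cont[None, :], -np.inf)
        V_new[:, j] = val.max(axis=1)
        g[:, j] = a_grid[val.argmax(axis=1)]
    if np.max(np.abs(V_new - V)) < 1e-8:     # sup-norm stopping rule
        V = V_new
        break
    V = V_new
\end{verbatim}
Each pass applies the Bellman operator on the grid, and iteration stops once successive value functions agree to the stated tolerance in the sup norm.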
The F.O.C. with respect to $a'$ is
\[
u'(y + a - a' Q)\, Q = \beta\, \frac{\partial V(a', y')}{\partial a'} \tag{11}
\]
Using the Envelope theorem, obtain
\[
\frac{\partial V(a, y)}{\partial a} = u'(y + a - a' Q)
\]
so that the F.O.C. can be written as
\[
u'(y + a - a' Q)\, Q = \beta\, u'(y' + a' - a'' Q),
\]
which is the standard consumption Euler equation.
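For completeness, here is a symbolic check of the envelope step in the two-period case, again with log utility as an illustrative assumption: differentiating the maximized two-period value with respect to current assets indeed returns $u'$ of current consumption.
\begin{verbatim}
import sympy as sp

a_prev, y_prev, y_T, Q, beta = sp.symbols('a_prev y_prev y_T Q beta', positive=True)
a_T = sp.symbols('a_T')

# Two-period objective with log utility (illustrative) and a_{T+1} = 0 already imposed.
obj = sp.log(y_prev + a_prev - a_T * Q) + beta * sp.log(y_T + a_T)

g = sp.solve(sp.diff(obj, a_T), a_T)[0]      # optimal a_T from the F.O.C.
V = obj.subs(a_T, g)                         # maximized value V^{T-1} as a function of the states
envelope_gap = sp.simplify(sp.diff(V, a_prev) - 1 / (y_prev + a_prev - g * Q))
print(envelope_gap)                          # 0: dV/da_{T-1} equals u'(c_{T-1}) at the optimum
\end{verbatim}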