Probabilistic Dynamic Programming (Stochastic Dynamic Programming)
Many problems in the field of Operations Research deal with future planning, and many
future events are hard to predict with certainty, so it is not hard to imagine the
importance of SDP and related techniques. According to Bellman and Dreyfus [5], the
stochastic case is always the actual situation.
Problem:
An enterprising young statistician believes that she has
developed a system for winning a popular Las Vegas
game. Her colleagues do not believe that her system
works, so they have made a large bet with her that if she
starts with three chips, she will not have at least five chips
after three plays of the game. Each play of the game
involves betting any desired number of available chips
and then either winning or losing this number of chips. The
statistician believes that her system will give her a
probability of 2/3 of winning a given play of the game.
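If she wins, the state at the next stage will be Sn+1 = Sn + Xn.
probability of winning = 2/3
If she loses, the state at the next stage will be Sn+1 = Sn – Xn.
probability of losing = 1 – 2/3 = 1/3
Here Sn is the number of chips she holds at play n, Xn is the number of chips she bets, and
f*n(Sn) is the maximum probability of finishing the three plays with at least five chips.
fn(Sn,Xn) = (1/3) f*n+1(Sn – Xn) + (2/3) f*n+1(Sn + Xn)
f*n(Sn) = max of fn(Sn,Xn) over Xn = 0, 1, ..., Sn, with f*4(S4) = 1 if S4 ≥ 5 and 0 otherwise.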
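The recursion can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the names best_prob, P_WIN, PLAYS and TARGET are illustrative assumptions, and exact fractions are used so the results can be compared directly with the tables.

```python
from fractions import Fraction
from functools import lru_cache

P_WIN = Fraction(2, 3)   # probability of winning a play (from the problem statement)
PLAYS = 3                # number of plays allowed
TARGET = 5               # chips needed after the last play

@lru_cache(maxsize=None)
def best_prob(n, s):
    """Maximum probability of ending with at least TARGET chips,
    holding s chips just before play n (this plays the role of f*n(s))."""
    if n > PLAYS:
        # Boundary condition after the last play: success or failure is known.
        return Fraction(1) if s >= TARGET else Fraction(0)
    # Try every feasible bet x = 0..s and keep the best expected value.
    return max(
        (1 - P_WIN) * best_prob(n + 1, s - x) + P_WIN * best_prob(n + 1, s + x)
        for x in range(s + 1)
    )

print(best_prob(1, 3))   # 20/27
```

Memoising on (n, s) keeps the sketch short; a bottom-up table, as in the hand calculation, would work equally well.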
n=3

S3    f*3(S3)   X*3
0     0         -
1     0         -
2     0         -
3     2/3       2 (or more)
4     2/3       1 (or more)
≥5    1         0 (or ≤ S3 – 5)
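As a rough check of the last-stage table, the sketch below evaluates every feasible bet for a given S3 and reports which bets attain the maximum. The helper names terminal and stage3 are assumptions made for illustration. For S3 ≤ 2 every bet ties at probability 0, which is why the table shows "-" there, while the code simply lists all of them.

```python
from fractions import Fraction

P_WIN = Fraction(2, 3)   # probability of winning a play
TARGET = 5               # chips needed after the third play

def terminal(s):
    """Value after the last play: 1 if the target is reached, else 0."""
    return Fraction(1) if s >= TARGET else Fraction(0)

def stage3(s):
    """Return (f*3(s), list of bets x that attain it) for s chips before play 3."""
    values = {
        x: (1 - P_WIN) * terminal(s - x) + P_WIN * terminal(s + x)
        for x in range(s + 1)
    }
    best = max(values.values())
    return best, [x for x, v in values.items() if v == best]

for s in range(7):
    value, bets = stage3(s)
    print(s, value, bets)
```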
n=2
f2(S2,X2) = (1/3) f*3(S2 – X2) + (2/3) f*3(S2 + X2)

S2    X2=0   X2=1   X2=2   X2=3   X2=4   f*2(S2)   X*2
0     0                                  0         -
1     0      0                           0         -
f2(0,0) = (1/3) f*3(0-0) + (2/3) f*3(0+0) = (1/3) f*3(0) + (2/3) f*3(0) = (1/3)(0) + (2/3)(0) = 0
f2(1,0) = (1/3) f*3(1-0) + (2/3) f*3(1+0) = (1/3) f*3(1) + (2/3) f*3(1) = (1/3)(0) + (2/3)(0) = 0
f2(1,1) = (1/3) f*3(1-1) + (2/3) f*3(1+1) = (1/3) f*3(0) + (2/3) f*3(2) = (1/3)(0) + (2/3)(0) = 0
f2(2,0) = (1/3) f*3(2-0) + (2/3) f*3(2+0) = (1/3) f*3(2) + (2/3) f*3(2) = (1/3)(0) + (2/3)(0) = 0
f2(2,1) = (1/3) f*3(2-1) + (2/3) f*3(2+1) = (1/3) f*3(1) + (2/3) f*3(3) = (1/3)(0) + (2/3)(2/3) = 4/9
f2(2,2) = (1/3) f*3(2-2) + (2/3) f*3(2+2) = (1/3) f*3(0) + (2/3) f*3(4) = (1/3)(0) + (2/3)(2/3) = 4/9
f2(3,0) = (1/3) f*3(3-0) + (2/3) f*3(3+0) = (1/3) f*3(3) + (2/3) f*3(3) = (1/3)(2/3) + (2/3)(2/3) = 2/3
f2(3,1) = (1/3) f*3(3-1) + (2/3) f*3(3+1) = (1/3) f*3(2) + (2/3) f*3(4) = (1/3)(0) + (2/3)(2/3) = 4/9
f2(3,2) = (1/3) f*3(3-2) + (2/3) f*3(3+2) = (1/3) f*3(1) + (2/3) f*3(5) = (1/3)(0) + (2/3)(1) = 2/3
f2(3,3) = (1/3) f*3(3-3) + (2/3) f*3(3+3) = (1/3) f*3(0) + (2/3) f*3(6) = (1/3)(0) + (2/3)(1) = 2/3
f2(4,0) = (1/3) f*3(4-0) + (2/3) f*3(4+0) = (1/3) f*3(4) + (2/3) f*3(4) = (1/3)(2/3) + (2/3)(2/3) = 2/3
f2(4,1) = (1/3) f*3(4-1) + (2/3) f*3(4+1) = (1/3) f*3(3) + (2/3) f*3(5) = (1/3)(2/3) + (2/3)(1) = 8/9
f2(4,2) = (1/3) f*3(4-2) + (2/3) f*3(4+2) = (1/3) f*3(2) + (2/3) f*3(6) = (1/3)(0) + (2/3)(1) = 2/3
f2(4,3) = (1/3) f*3(4-3) + (2/3) f*3(4+3) = (1/3) f*3(1) + (2/3) f*3(7) = (1/3)(0) + (2/3)(1) = 2/3
f2(4,4) = (1/3) f*3(4-4) + (2/3) f*3(4+4) = (1/3) f*3(0) + (2/3) f*3(8) = (1/3)(0) + (2/3)(1) = 2/3

n=2
f2(S2,X2) = (1/3) f*3(S2 – X2) + (2/3) f*3(S2 + X2)

S2    X2=0   X2=1   X2=2   X2=3   X2=4   f*2(S2)   X*2
0     0                                  0         -
1     0      0                           0         -
2     0      4/9    4/9                  4/9       1 or 2
3     2/3    4/9    2/3    2/3           2/3       0, 2, or 3
4     2/3    8/9    2/3    2/3    2/3    8/9       1
≥5    1                                  1         0 (or ≤ S2 – 5)
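The stage-2 calculations can be reproduced mechanically. The sketch below hard-codes the f*3 values from the n=3 table and evaluates f2(S2,X2) for every feasible bet. The names F3, f3_star and stage2_row are illustrative assumptions, not notation from the slides.

```python
from fractions import Fraction

P_LOSE, P_WIN = Fraction(1, 3), Fraction(2, 3)

# f*3(S3) for S3 = 0..4, copied from the n=3 table; any S3 >= 5 has value 1.
F3 = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0),
      3: Fraction(2, 3), 4: Fraction(2, 3)}

def f3_star(s):
    """Look up f*3(s), treating every state with 5 or more chips as a sure win."""
    return F3.get(s, Fraction(1))

def stage2_row(s):
    """Return [f2(s, 0), f2(s, 1), ..., f2(s, s)] for s chips before play 2."""
    return [P_LOSE * f3_star(s - x) + P_WIN * f3_star(s + x) for x in range(s + 1)]

for s in range(5):
    row = stage2_row(s)
    best = max(row)
    best_bets = [x for x, v in enumerate(row) if v == best]
    print(s, [str(v) for v in row], "f*2 =", best, "X*2 in", best_bets)
```

Listing every bet that attains the maximum makes the ties visible, for example at S2 = 3, where several bets are equally good.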
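n=1
f1(S1,X1) = (1/3) f*2(S1 – X1) + (2/3) f*2(S1 + X1)

f1(3,0) = (1/3) f*2(3-0) + (2/3) f*2(3+0) = (1/3) f*2(3) + (2/3) f*2(3) = (1/3)(2/3) + (2/3)(2/3) = 2/3
f1(3,1) = (1/3) f*2(3-1) + (2/3) f*2(3+1) = (1/3) f*2(2) + (2/3) f*2(4) = (1/3)(4/9) + (2/3)(8/9) = 20/27
f1(3,2) = (1/3) f*2(3-2) + (2/3) f*2(3+2) = (1/3) f*2(1) + (2/3) f*2(5) = (1/3)(0) + (2/3)(1) = 2/3
f1(3,3) = (1/3) f*2(3-3) + (2/3) f*2(3+3) = (1/3) f*2(0) + (2/3) f*2(6) = (1/3)(0) + (2/3)(1) = 2/3

S1    X1=0   X1=1    X1=2   X1=3   f*1(S1)   X*1
3     2/3    20/27   2/3    2/3    20/27     1

Betting one chip on the first play therefore gives her a probability of 20/27 of winning
the bet with her colleagues.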
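One way to sanity-check the result is to fix a single optimal policy and enumerate all eight win/lose sequences. The sketch below does this under stated assumptions: POLICY is a hypothetical encoding of one optimal choice per reachable state (ties are broken arbitrarily, and the bet of 0 at S3 = 1 is an arbitrary pick because no bet can succeed there), and success_probability is an illustrative name.

```python
from fractions import Fraction
from itertools import product

P_WIN = Fraction(2, 3)

# Hypothetical encoding of one optimal policy: POLICY[n][s] = chips to bet
# at play n when holding s chips. Only reachable states are listed.
POLICY = {
    1: {3: 1},
    2: {2: 1, 4: 1},
    3: {1: 0, 3: 2, 5: 0},
}

def success_probability(start=3, target=5):
    """Sum the probabilities of all win/lose sequences that end at or above target."""
    total = Fraction(0)
    for outcome in product([True, False], repeat=3):
        chips, prob = start, Fraction(1)
        for n, won in enumerate(outcome, start=1):
            bet = POLICY[n][chips]
            chips += bet if won else -bet
            prob *= P_WIN if won else 1 - P_WIN
        if chips >= target:
            total += prob
    return total

print(success_probability())   # 20/27
```

The enumerated probability agrees with the value obtained from the backward recursion.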