Dynamic Programming 2

This document discusses deterministic and probabilistic dynamic programming. Deterministic dynamic programming models decision processes where the next state is fully determined by the current state and action. Probabilistic dynamic programming accounts for uncertainty, where the next state depends on the current state and action according to known probabilities. Examples are provided to illustrate how to formulate problems, define value and policy functions recursively, and solve for the optimal policy using backward induction.


Dynamic Programming

(Part 2)
TI 2102
Optimization Mathematics
Deterministic Dynamic Programming
 Deterministic dynamic programming can be described diagrammatically as shown below.
 Making policy decision xn then moves the process to some state sn+1 at stage n + 1.
 The contribution thereafter to the objective function under an optimal policy has been previously calculated to be f*n+1(sn+1).
 The policy decision xn also makes some contribution to the objective function.
 Combining these two quantities in an appropriate way provides fn(sn, xn), the contribution of stages n onward to the objective function.
 Optimizing with respect to xn then gives fn*(sn) = fn(sn, xn*).
 After xn* and fn*(sn) are found for each possible value of sn, the solution procedure is ready to move back one stage.
 One way of categorizing deterministic dynamic programming problems is by the form of the objective function.
 Another categorization is in terms of the nature of the set of states for the respective stages.
 In particular, states sn might be representable by a discrete state variable (as for the stagecoach problem) or by a continuous state variable, or perhaps a state vector (more than one variable) is required.
 Several examples are presented to illustrate these various possibilities.
 More importantly, they illustrate that these apparently major differences are actually quite inconsequential (except in terms of computational difficulty) because the underlying basic structure always remains the same.
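The backward pass described in the bullets above can be sketched as a small Python routine. This is a generic sketch, not from the slides: it assumes the common additive case, where fn(sn, xn) = stage cost + f*n+1(next state), and the toy states, transition, and costs in the example are hypothetical values chosen only to make the sketch runnable.

```python
def backward_induction(stages, states, decisions, transition, cost):
    """Return (f_star, x_star): optimal value and decision for each (stage, state)."""
    f_star = {(stages, s): 0.0 for s in states}   # boundary: nothing after the last stage
    x_star = {}
    for n in range(stages - 1, -1, -1):           # move back one stage at a time
        for s in states:
            best_val, best_x = float("inf"), None
            for x in decisions(n, s):
                # fn(sn, xn) = immediate cost + previously computed f*n+1(sn+1)
                val = cost(n, s, x) + f_star[(n + 1, transition(n, s, x))]
                if val < best_val:
                    best_val, best_x = val, x
            f_star[(n, s)] = best_val             # fn*(sn) = fn(sn, xn*)
            x_star[(n, s)] = best_x
    return f_star, x_star

# Toy instance (hypothetical data): 3 stages, states 0..3, decisions {0, 1},
# transition s -> (s + x) % 4, stage cost x + 1.
f_star, x_star = backward_induction(
    stages=3,
    states=range(4),
    decisions=lambda n, s: (0, 1),
    transition=lambda n, s, x: (s + x) % 4,
    cost=lambda n, s, x: x + 1,
)
```

Because the callbacks carry all problem-specific detail, the same routine covers discrete states, continuous-state discretizations, or state vectors, which is the point made above about the underlying structure staying the same.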
Deterministic Dynamic Programming: An Example
Distributing Scientists to Research Teams
A government space project is conducting research on a certain engineering problem that must be solved before people can fly safely to Mars. Three research teams are currently trying three different approaches for solving this problem. The estimate has been made that, under present circumstances, the probability that the respective teams—call them 1, 2, and 3—will not succeed is 0.40, 0.60, and 0.80, respectively. Thus, the current probability that all three teams will fail is (0.40)(0.60)(0.80) = 0.192. Because the objective is to minimize the probability of failure, two more top scientists have been assigned to the project.
Only integer numbers of scientists are considered because each new scientist will need to devote full attention to one team. The problem is to determine how to allocate the two additional scientists to minimize the probability that all three teams will fail.
Problem Formulation
 In this case, stage n (n = 1, 2, 3) corresponds to research team n, and the state sn is the number of new scientists still available for allocation to the remaining teams.
 The decision variables xn (n = 1, 2, 3) are the number of additional scientists allocated to team n.
 Let pi(xi) denote the probability of failure for team i if it is assigned xi additional scientists, as given in the previous table.
 If we let ∏ denote multiplication, the government’s objective is to choose x1, x2, x3 so as to

Minimize p1(x1) p2(x2) p3(x3),

subject to:

x1 + x2 + x3 = 2,

where the xi are nonnegative integers.

 Consequently, fn(sn, xn) for this problem is

fn(sn, xn) = pn(xn) · min [pn+1(xn+1) · · · p3(x3)],

where the minimum is taken over xn+1, . . . , x3 such that

xn+1 + · · · + x3 = sn - xn.
 For n = 1, 2, 3. Thus,

fn*(sn) = min over xn = 0, 1, . . . , sn of fn(sn, xn),

where

fn(sn, xn) = pn(xn) · f*n+1(sn - xn),

with f4* defined to be 1.

 The recursive relationship relating the f1*, f2*, and f3* functions in this case is

fn*(sn) = min over xn = 0, 1, . . . , sn of pn(xn) · f*n+1(sn - xn),   for n = 1, 2;

when n = 3,

f3*(s3) = p3(s3), with x3* = s3.
Solution Procedure
Stage 3 (n = 3)

Stage 2 (n = 2)
Stage 1 (n = 1)

 The optimal solution must have x1* = 1, which makes s2 = 2 - 1 = 1, so that x2* = 0, which makes s3 = 1 - 0 = 1, so that x3* = 1.
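The stage-by-stage tables can be reproduced with a short recursion. One caveat: only the zero-allocation failure probabilities (0.40, 0.60, 0.80) are stated in the text, so the other two columns of the `p` table below are assumed values used purely to make the sketch runnable; the recursion itself is the one formulated above.

```python
# Failure probabilities p_i(x_i) for x_i = 0, 1, 2 additional scientists.
# Only the x_i = 0 column (0.40, 0.60, 0.80) appears in the text; the other
# columns are hypothetical values assumed for illustration.
p = {
    1: [0.40, 0.20, 0.15],
    2: [0.60, 0.40, 0.20],
    3: [0.80, 0.50, 0.30],
}

def f_star(n, s):
    """Minimum probability that teams n..3 all fail, with s scientists left."""
    if n == 4:
        return 1.0, ()                       # boundary condition: f4* = 1
    best_val, best_plan = float("inf"), ()
    for x in range(s + 1):                   # try every allocation to team n
        tail_val, tail_plan = f_star(n + 1, s - x)
        val = p[n][x] * tail_val             # fn(sn, xn) = pn(xn) * f*n+1(sn - xn)
        if val < best_val:
            best_val, best_plan = val, (x,) + tail_plan
    return best_val, best_plan

value, plan = f_star(1, 2)                   # two extra scientists, three teams
```

Under the assumed table this recovers the allocation x1* = 1, x2* = 0, x3* = 1 stated above, with a minimum failure probability of 0.060; with a different completion of the table the numbers would of course differ.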
Mid Exercise Example
(The model and the answers for Stages 3, 2, and 1 appeared here as worked tables.)
Another Exercise Example
(The formulation and the answers for Stages 4 through 1 appeared here as worked tables.)
Probabilistic Dynamic Programming
 Probabilistic dynamic programming differs from deterministic dynamic programming in that the state at the next stage is not completely determined by the state and policy decision at the current stage.
 There is a probability distribution for what the next state will be.
 However, this probability distribution still is completely determined by the state and policy decision at the current stage.
 The resulting basic structure for probabilistic dynamic programming is described diagrammatically in the next figure.
 Let S denote the number of possible states at stage n + 1 and label these states on the right side as 1, 2, . . . , S.
 The system goes to state i with probability pi (i = 1, 2, . . . , S) given state sn and decision xn at stage n.
 If the system goes to state i, Ci is the contribution of stage n to the objective function.
 Because of the probabilistic structure, the relationship between fn(sn, xn) and the f*n+1(sn+1) necessarily is somewhat more complicated than that for deterministic dynamic programming.
 To illustrate, suppose that the objective is to minimize the expected sum of the contributions from the individual stages.
 In this case, fn(sn, xn) represents the minimum expected sum from stage n onward, given that the state and policy decision at stage n are sn and xn, respectively.
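A minimal sketch of this expected-value recursion, fn(sn, xn) = Σi pi · (Ci + f*n+1(i)), might look as follows. The `outcomes` callback plays the role of the probabilities pi and contributions Ci; all the concrete data in the toy instance are hypothetical illustration values.

```python
# Backward induction for a probabilistic DP minimizing an expected sum.
# Only the recursion fn(sn, xn) = sum_i p_i * (C_i + f*n+1(i)) follows the
# structure described in the text; all data below are hypothetical.

def expected_sum_dp(stages, states, decisions, outcomes):
    """outcomes(n, s, x) returns (probability, contribution, next_state) triples."""
    f_star = {s: 0.0 for s in states}          # terminal values: nothing after stage N
    for n in range(stages, 0, -1):             # stages N, N-1, ..., 1
        f_star = {
            s: min(
                sum(p * (c + f_star[nxt]) for p, c, nxt in outcomes(n, s, x))
                for x in decisions(n, s)
            )
            for s in states
        }
    return f_star                              # f1*(s) for every starting state s

# Toy instance: decision 0 moves to state 0 or 1 with probability 0.5 each
# (contributions 2 and 0); decision 1 moves to state 0 for sure (contribution 0.9).
def outcomes(n, s, x):
    return [(0.5, 2.0, 0), (0.5, 0.0, 1)] if x == 0 else [(1.0, 0.9, 0)]

f1 = expected_sum_dp(
    stages=2,
    states=(0, 1),
    decisions=lambda n, s: (0, 1),
    outcomes=outcomes,
)
```

The only change from the deterministic case is that the candidate value is an expectation over next states rather than a single term, exactly as the bullets above describe.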
Probabilistic Dynamic Programming: An Example
An enterprising young statistician believes that she has developed a system for winning a popular Las Vegas game. Her colleagues do not believe that her system works, so they have made a large bet with her that if she starts with three chips, she will not have at least five chips after three plays of the game. Each play of the game involves betting any desired number of available chips and then either winning or losing this number of chips. The statistician believes that her system will give her a probability of 2/3 of winning a given play of the game. Assuming the statistician is correct, we now use dynamic programming to determine her optimal policy regarding how many chips to bet (if any) at each of the three plays of the game. The decision at each play should take into account the results of earlier plays. The objective is to maximize the probability of winning her bet with her colleagues.
Problem Formulation
The dynamic programming formulation for this problem is
Stage n = nth play of game (n = 1, 2, 3),
xn = number of chips to bet at stage n,
State sn = number of chips in hand to begin stage n.
Because the objective is to maximize the probability that the statistician will win her bet, the objective function to be maximized at each stage must be the probability of finishing the three plays with at least five chips.
Then we can describe these functions as:
 fn(sn, xn) = probability of finishing three plays with at least five chips, given that the statistician starts stage n in state sn, makes immediate decision xn, and makes optimal decisions thereafter.
 fn*(sn) = max over xn = 0, 1, . . . , sn of fn(sn, xn).
 If she loses, the state at the next stage will be sn - xn, and the probability of finishing with at least five chips will then be f*n+1(sn - xn).
 If she wins the next play instead, the state will become sn + xn, and the corresponding probability will be f*n+1(sn + xn).
 Because the assumed probability of winning a given play is 2/3, it now follows that

fn(sn, xn) = (1/3) f*n+1(sn - xn) + (2/3) f*n+1(sn + xn).
 Therefore, the recursive relationship for this problem is

fn*(sn) = max over xn = 0, 1, . . . , sn of [(1/3) f*n+1(sn - xn) + (2/3) f*n+1(sn + xn)],

for n = 1, 2, 3, with f4*(s4) as just defined: 0 if s4 < 5 and 1 if s4 ≥ 5.
 Solution Procedure
Stage 3 (n=3)

Stage 2 (n=2)
Stage 1 (n=1)

The optimal policy prescribes the bet at each play as a function of the outcomes of the earlier plays; it gives a probability of 20/27 of winning her bet with her colleagues.
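The 20/27 figure can be verified with a direct implementation of the recursive relationship above, using exact rational arithmetic. The win probability, boundary condition, and state space are exactly those given in the text.

```python
from fractions import Fraction
from functools import lru_cache

WIN = Fraction(2, 3)        # probability of winning a single play (from the problem)
PLAYS = 3                   # number of plays of the game
GOAL = 5                    # she wins the bet by finishing with at least 5 chips

@lru_cache(maxsize=None)
def f_star(n, s):
    """Maximum probability of finishing with at least GOAL chips,
    starting play n with s chips, under optimal betting."""
    if n == PLAYS + 1:                      # boundary: f4*(s4) = 1 if s4 >= 5, else 0
        return Fraction(int(s >= GOAL))
    return max(
        (1 - WIN) * f_star(n + 1, s - x) + WIN * f_star(n + 1, s + x)
        for x in range(s + 1)               # bet any number of available chips
    )

prob = f_star(1, 3)                         # start play 1 with three chips
```

Using `Fraction` rather than floats means the result comes out as exactly 20/27 rather than a rounded decimal.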


Another Example
(The formulation and the stage-by-stage solution for Stages 3, 2, and 1 appeared here as worked tables.)
Conclusion
 Dynamic programming is a very useful technique for making a sequence of interrelated decisions.
 It requires formulating an appropriate recursive relationship for each individual problem.
 However, it provides great computational savings over exhaustive enumeration when finding the best combination of decisions, especially for large problems.
 Dynamic programming also lays the groundwork for further problems that can only be solved with Markov chains.
End of Topic
TI 2102
Optimization Mathematics
