Dynamic Programming 7707
University of Delhi
Introduction: Dynamic Programming
• Dynamic Programming Overview
• Backwards Recursion
• States: A set of states must be identified at each stage. A state describes the
status of the system being analyzed and contains all the information needed to
make decisions.
• Decisions: Typically, the set of available decisions will depend on the state.
• The optimal value function, $f_n(x_n)$, is the cumulative return starting at stage
$n$ in state $x_n$ and proceeding to stage 1 under an optimal strategy.
Backwards Recursion
• Generally, a dynamic programming problem is solved by starting at
the final stage and working backwards to the initial stage, a process
called backwards recursion.
Bellman's principle of optimality: the optimal policy must be one such that, regardless of how a
particular state is reached, all later decisions (choices) proceeding from that state are
optimal.
In the forward recursion, a problem is solved by first solving the initial stage of the
problem and working towards the last stage, making an optimal decision at each stage of
the problem.
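The exact recursion relationship depends on the nature of the problem to be solved by
dynamic programming. For an additive return, let $r_n(x_n, d_n)$ be the return from decision $d_n$
taken in state $x_n$ at stage $n$, and let $x_{n-1} = t_n(x_n, d_n)$ be the resulting state at the next
stage. The one-stage return is given by:
$$f_1(x_1) = \operatorname*{opt}_{d_1} \; r_1(x_1, d_1)$$
By continuing the above logic recursively for a general $n$-stage problem, we have
$$f_n(x_n) = \operatorname*{opt}_{d_n} \left\{ r_n(x_n, d_n) + f_{n-1}\big(t_n(x_n, d_n)\big) \right\}, \quad n = 2, 3, \ldots$$
where opt denotes max or min, depending on the objective.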
The General Procedure
The procedure for solving a problem by using the dynamic programming approach can be
summarized in the following steps:
Step 1: Identify the problem decision variables and specify the objective function to be
optimized under certain limitations, if any.
Step 2: Decompose (or divide) the given problem into a number of smaller sub-problems (or
stages). Identify the state variables at each stage and write down the transformation function,
which gives the state variable at the next stage as a function of the current state variable and
decision variable.
Step 3: Write down a general recursive relationship for computing the optimal policy. Decide
whether to follow the forward or the backward method for solving the problem.
Step 4: Construct appropriate tables to show the required values of the return function at
each stage, as shown in the Decision Table.
Step 5: Determine the overall optimal policy (the optimal decision at each stage) and the
optimal value at each stage. There may be more than one such optimal policy.
Decision Table
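The procedure above (Steps 3 to 5) can be sketched in Python as a small backward-recursion routine. This is a minimal sketch under stated assumptions, not the document's own method: the function `backward_recursion` and its parameters `states`, `decisions`, `ret` (the stage return, written $r_n(x, d)$ here) and `transform` (the transformation function $t_n(x, d)$) are hypothetical names, returns are assumed additive, and "stage n" means n stages remain, matching $f_n(x_n)$ above. The toy data in `gains` is invented.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

State = Hashable
Decision = Hashable


def backward_recursion(
    n_stages: int,
    states: Callable[[int], Iterable[State]],
    decisions: Callable[[int, State], Iterable[Decision]],
    ret: Callable[[int, State, Decision], float],
    transform: Callable[[int, State, Decision], State],
    maximize: bool = True,
) -> Tuple[List[Dict[State, float]], List[Dict[State, List[Decision]]]]:
    """Build the tables f[n][x] and best[n][x] for stages n = 1 .. n_stages.

    Stage n means n stages remain (stage 1 is the last one), so
    f_1(x) = opt_d r_1(x, d) and f_n(x) = opt_d {r_n(x, d) + f_{n-1}(t_n(x, d))}.
    """
    opt = max if maximize else min
    f: List[Dict[State, float]] = [{}]  # index 0 unused, so f[n] is stage n
    best: List[Dict[State, List[Decision]]] = [{}]
    for n in range(1, n_stages + 1):
        f_n: Dict[State, float] = {}
        best_n: Dict[State, List[Decision]] = {}
        for x in states(n):
            # One row of the decision table (Step 4): total return of each decision d.
            values = {
                d: ret(n, x, d) + (f[n - 1][transform(n, x, d)] if n > 1 else 0.0)
                for d in decisions(n, x)
            }
            f_n[x] = opt(values.values())
            # Keep every optimal decision: ties mean several optimal policies (Step 5).
            best_n[x] = [d for d, v in values.items() if v == f_n[x]]
        f.append(f_n)
        best.append(best_n)
    return f, best


# Hypothetical toy data: gains[n][d] = return from using d units at stage n.
gains = {2: [0, 5, 6], 1: [0, 4, 9]}
f, best = backward_recursion(
    n_stages=2,
    states=lambda n: range(3),            # state x = units still available (0..2)
    decisions=lambda n, x: range(x + 1),  # decision d = units used at this stage
    ret=lambda n, x, d: gains[n][d],
    transform=lambda n, x, d: x - d,      # t_n(x, d): units left for stage n - 1
)
print(f[2][2], best[2][2])  # 9.0 [0, 1]
```

With this toy data both d = 0 and d = 1 are optimal at stage 2, so the routine reports two optimal policies, which is the situation Step 5 anticipates.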
CASE 1: The Stagecoach Problem
The STAGECOACH PROBLEM is a problem specially constructed to illustrate the features and to
introduce the terminology of dynamic programming. It concerns a mythical fortune seeker in Missouri
who decided to go west to join the gold rush in California during the mid-19th century. The journey would
require traveling by stagecoach through unsettled country where there was serious danger of attack by
marauders. Although his starting point and destination were fixed, he had considerable choice as to
which states (or territories that subsequently became states) to travel through en route. The possible
routes are shown in the following Figure, where each state is represented by a circled letter and the
direction of travel is always from left to right in the diagram.
Thus, four stages (stagecoach runs) were required to travel from his point of embarkation in
state A (Missouri) to his destination in state J (California). This fortune seeker was a prudent
man who was quite concerned about his safety. After some thought, he came up with a
rather clever way of determining the safest route. Life insurance policies were offered to
stagecoach passengers. Because the cost of the policy for taking any given stagecoach run
was based on a careful evaluation of the safety of that run, the safest route should be the
one with the cheapest total life insurance policy.
• Trial-and-error solution: enumerate every possible route and compare total costs
• Very time-consuming for large problems
• Problem formulation
• Decision variables $x_n$ ($n = 1, 2, 3, 4$): the immediate destination on stage $n$
• Route: A → $x_1$ → $x_2$ → $x_3$ → $x_4$, where $x_4$ = J
Solution:
• Let $f_n(s, x_n)$ be the total cost of the best overall policy for the remaining stages, given
that the fortune-seeker is in state $s$, ready to start stage $n$, and selects $x_n$ as the
immediate destination
• When n = 3:
• The n = 2 problem
• When n = 1:
• Construct the optimal solution using the four tables
• The results for the n = 1 problem show that the fortune-seeker should choose state C or D
• Suppose C is chosen
• For n = 2, the result for s = C is $x_2^* = E$ …
• One optimal solution: A → C → E → H → J
• A recursive relationship can be defined that identifies the optimal policy for
stage n, given the optimal policy for stage n + 1
• Using the recursive relationship, the solution procedure starts at the end and
works backward
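The backward procedure above can be sketched in Python. The figure with the actual insurance costs is not reproduced in this text, so the network and the costs in `COST` are an assumption for illustration only; `solve_stagecoach`, `one_route` and `all_routes` are hypothetical helper names. The sketch applies $f_n^*(s) = \min_{x_n} [c_{s,x_n} + f_{n+1}^*(x_n)]$ with $f_5^*(J) = 0$, and `all_routes` is the trial-and-error alternative that enumerates every route.

```python
# ASSUMED illustrative network and costs: the figure with the real insurance
# costs is not reproduced in the text. COST[s][x] = cost of the run from s to x.
COST = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}
# States s from which stage n starts.
STAGES = {1: ["A"], 2: ["B", "C", "D"], 3: ["E", "F", "G"], 4: ["H", "I"]}


def solve_stagecoach():
    """Backward recursion f_n*(s) = min over x_n of [COST[s][x_n] + f_{n+1}*(x_n)]."""
    f_star = {"J": 0}  # f_5*(J) = 0: the journey is over
    x_star = {}        # x_star[s]: every optimal immediate destination from s
    for n in range(4, 0, -1):  # n = 4, 3, 2, 1
        for s in STAGES[n]:
            totals = {x: c + f_star[x] for x, c in COST[s].items()}  # f_n(s, x_n)
            f_star[s] = min(totals.values())
            x_star[s] = [x for x, v in totals.items() if v == f_star[s]]
    return f_star, x_star


def one_route(x_star, start="A"):
    """Follow the first optimal decision at each stage to get one optimal route."""
    route = [start]
    while route[-1] != "J":
        route.append(x_star[route[-1]][0])
    return route


def all_routes(s="A"):
    """Trial-and-error alternative: list every route from s to J with its total cost."""
    if s == "J":
        return [(["J"], 0)]
    return [([s] + rest, c + rest_cost)
            for x, c in COST[s].items()
            for rest, rest_cost in all_routes(x)]


f_star, x_star = solve_stagecoach()
print(f_star["A"], x_star["A"], " -> ".join(one_route(x_star)))  # 11 ['C', 'D'] A -> C -> E -> H -> J
```

With these assumed costs the sketch reproduces the choices listed above: C or D from A, E after C, and the route A → C → E → H → J. The recursion evaluates each state once per stage, whereas `all_routes` has to build all 18 routes of this small network, which is why trial and error scales poorly.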
Problem 1
The measure of performance being used is additional person-years of life. (For a particular
country, this measure equals the increased life expectancy in years times the country’s
population.) The following Table gives the estimated additional person-years of life (in
multiples of 1,000) for each country for each possible allocation of medical teams. Which
allocation maximizes the measure of performance?
Solution
Beginning with the last stage (n = 3), we get the following table.
STAGE 3
Solution for n = 2, i.e., STAGE 2
Solution for n = 1, i.e., STAGE 1
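A minimal Python sketch of the stage-by-stage computation for this kind of additive allocation problem. The table from Problem 1 is not reproduced in this text, so the numbers in `gain` are hypothetical, and `allocate` is a hypothetical helper. It fills the stage tables in the same order as above (last stage first, then stage 2, then stage 1) and then reads the optimal decisions forward. The same additive structure also fits Problem 2 (salesmen across zones).

```python
def allocate(returns, total):
    """Additive allocation by backward recursion (hypothetical helper).

    returns[i][k] = gain from giving k units to stage i (stages 0..N-1).
    f_i(s) = max over 0 <= k <= s of returns[i][k] + f_{i+1}(s - k), f_N(s) = 0.
    """
    n = len(returns)
    f = [[0] * (total + 1) for _ in range(n + 1)]   # f[n][s] = 0: no stages left
    choice = [[0] * (total + 1) for _ in range(n)]
    for i in range(n - 1, -1, -1):                    # last stage first
        for s in range(total + 1):                    # s = units still available
            ks = range(min(s, len(returns[i]) - 1) + 1)
            k = max(ks, key=lambda j: returns[i][j] + f[i + 1][s - j])
            choice[i][s] = k
            f[i][s] = returns[i][k] + f[i + 1][s - k]
    plan, s = [], total                               # read decisions forward
    for i in range(n):
        plan.append(choice[i][s])
        s -= choice[i][s]
    return f[0][total], plan


# HYPOTHETICAL data (not the Problem 1 table): gain in thousands of
# person-years for 0..3 medical teams in each of three countries.
gain = [[0, 20, 35, 45], [0, 15, 30, 50], [0, 25, 45, 50]]
print(allocate(gain, 3))  # (65, [1, 0, 2])
```

The result reads as: with the hypothetical data, give one team to country 1, none to country 2 and two to country 3, for 65 thousand additional person-years.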
PROBLEM 2
A company has five salesmen who have to be allocated to four marketing zones. The return
(or profit) from each zone depends upon the number of salesmen working in that zone. The
expected returns for different number of salesmen in different zones, as estimated from the
past records, are given in the following table. Determine the optimal allocation policy.
Do it yourself.
Model II: Multiplicative Separable Return Function and Single Additive Constraint
PROBLEM 3
A company has decided to introduce a product in three phases. Phase 1 will feature making a special
offer at a greatly reduced rate to attract the first-time buyers. Phase 2 will involve intensive advertising
to persuade the buyers to continue purchasing at a regular price. Phase 3 will involve a follow-up
advertising and promotional campaign.
A total of Rs 5 million has been budgeted for this marketing campaign. Let m be the market share
captured in Phase 1; a fraction $f_2$ of m is retained in Phase 2, and a fraction $f_3$ of the Phase 2
market share is retained in Phase 3, so the final share is $m \cdot f_2 \cdot f_3$. The expected values of
m, $f_2$ and $f_3$ at different levels of money expended are given below. How should the money be
allocated to the three phases in order to maximize the final share?
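For Model II the stage returns are multiplied instead of added, so the recursion becomes $f_i(s) = \max_k g_i(k) \cdot f_{i+1}(s - k)$ with $f = 1$ after the last phase, where $g_i(k)$ is the factor obtained by spending $k$ units on phase $i$; this works because every factor is non-negative. A minimal sketch, assuming the budget is spent in whole units of Rs 1 million (at most 5 in total) and using hypothetical values for m, $f_2$ and $f_3$, since the table in Problem 3 is not reproduced here; `max_product` is a hypothetical name.

```python
def max_product(tables, budget):
    """Model II sketch: maximize tables[0][x0] * tables[1][x1] * ...
    subject to x0 + x1 + ... <= budget (whole units). Hypothetical helper.

    f_i(s) = max over k of tables[i][k] * f_{i+1}(s - k), with f_N(s) = 1.
    Valid because every factor is non-negative.
    """
    n = len(tables)
    f = [[1.0] * (budget + 1) for _ in range(n + 1)]  # empty product after the last phase
    choice = [[0] * (budget + 1) for _ in range(n)]
    for i in range(n - 1, -1, -1):                    # Phase 3 first, then 2, then 1
        for s in range(budget + 1):                   # s = budget units still unspent
            ks = range(min(s, len(tables[i]) - 1) + 1)
            k = max(ks, key=lambda j: tables[i][j] * f[i + 1][s - j])
            choice[i][s] = k
            f[i][s] = tables[i][k] * f[i + 1][s - k]
    plan, s = [], budget
    for i in range(n):
        plan.append(choice[i][s])
        s -= choice[i][s]
    return f[0][budget], plan


# HYPOTHETICAL values for spending 0..3 units of Rs 1 million on each phase.
m = [0.0, 0.20, 0.30, 0.40]   # market share captured in Phase 1
f2 = [0.0, 0.50, 0.70, 0.85]  # fraction retained in Phase 2
f3 = [0.0, 0.30, 0.50, 0.60]  # fraction retained in Phase 3
share, plan = max_product([m, f2, f3], budget=5)
print(round(share, 4), plan)  # 0.075 [2, 1, 2]
```

The only change from the additive case is the combining operator (product instead of sum) and the neutral value after the last stage (1 instead of 0).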
Determine how many power cells should be assigned to each system in order to maximize the overall
system reliability. (See the PDF for the solution.)
Case 3: Distributing Scientists to Research Teams
A government space project is conducting research on a certain engineering problem that
must be solved before people can fly safely to Mars. Three research teams are currently
trying three different approaches for solving this problem. The estimate has been made that,
under present circumstances, the probability that the respective teams (call them 1, 2, and 3)
will not succeed is 0.40, 0.60, and 0.80, respectively. Thus, the current probability that all
three teams will fail is (0.40)(0.60)(0.80) = 0.192. Because the objective is to minimize the
probability of failure, two more top scientists have been assigned to the project.
The following Table gives the estimated probability that the respective teams will fail when 0,
1, or 2 additional scientists are added to that team. Only integer numbers of scientists are
considered because each new scientist will need to devote full attention to one team. The
problem is to determine how to allocate the two additional scientists to minimize the
probability that all three teams will fail.
Solution.
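A minimal sketch of the backward recursion for this case, $f_n^*(s) = \min_{x_n} p_n(x_n) \cdot f_{n+1}^*(s - x_n)$ with $f_4^*(s) = 1$, where $p_n(x_n)$ is the failure probability of team $n$ with $x_n$ additional scientists and $s$ is the number of scientists still unassigned. Only the 0-scientist probabilities (0.40, 0.60, 0.80) come from the text; the 1- and 2-scientist values in `P` are assumed for illustration because the table is not reproduced here, and `min_failure` is a hypothetical name. The function prints each stage table as it works backwards from team 3.

```python
# Failure probability P[team][k] when k extra scientists join that team.
# k = 0 column: from the text. k = 1 and k = 2 columns: ASSUMED for illustration.
P = [
    [0.40, 0.20, 0.15],  # team 1
    [0.60, 0.40, 0.20],  # team 2
    [0.80, 0.50, 0.30],  # team 3
]


def min_failure(p, scientists):
    """Backward recursion f_n*(s) = min over x_n of p_n(x_n) * f_{n+1}*(s - x_n),
    with f*(s) = 1 once all teams are handled (hypothetical helper)."""
    n = len(p)
    f = {n: {s: 1.0 for s in range(scientists + 1)}}  # no teams left: empty product
    x = {}
    for team in range(n - 1, -1, -1):                 # team 3 first, then 2, then 1
        f[team], x[team] = {}, {}
        for s in range(scientists + 1):               # s = scientists still unassigned
            options = {k: p[team][k] * f[team + 1][s - k]
                       for k in range(min(s, len(p[team]) - 1) + 1)}
            x[team][s] = min(options, key=options.get)
            f[team][s] = options[x[team][s]]
        print(f"stage {team + 1}:", {s: round(v, 4) for s, v in f[team].items()})
    plan, s = [], scientists
    for team in range(n):
        plan.append(x[team][s])
        s -= plan[-1]
    return f[0][scientists], plan


prob, plan = min_failure(P, scientists=2)
print(plan, round(prob, 3))  # [1, 0, 1] 0.06
```

With no extra scientists, `min_failure(P, 0)` returns 0.192, the current probability quoted in the problem; with the assumed table the sketch assigns one scientist each to teams 1 and 3.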