
UNIT 5 – Dynamic Programming

Dr. Santosh Kumar, Assistant Professor

Faculty of Management Studies

University of Delhi
Introduction: Dynamic Programming
• Dynamic Programming Overview

• Dynamic Programming Notation

• Backwards Recursion

• Applications of Dynamic Programming

• A Production and Inventory Control Problem


Dynamic Programming
• Dynamic programming (DP) is an approach to problem solving that
decomposes the original problem into a series of smaller sub-problems.

• To successfully apply DP, the original problem must be viewed as a
multistage decision problem.

• Defining the stages of a DP problem is sometimes obvious, but at
other times this requires subtle reasoning.
Dynamic Programming
• The power of DP is that one needs to solve only a small portion of all
sub-problems, due to Bellman's principle of optimality.

• Bellman’s principle states that regardless of what decisions were
made at previous stages, if the decision to be made at stage n is to be
part of an overall optimal solution, then the decision made at stage n
must be optimal for all remaining stages.
Dynamic Programming: Terminology
 Stages: We first identify the stages of the decision process. Stages may
correspond to geometric stages, time periods, or some other criterion.

 States: A set of states must be identified at each stage. A state describes the
status of the system being analyzed and contains all the information needed to
make decisions.

 Decisions: Typically, the set of available decisions will depend on the state.

 Return Functions: The measure of effectiveness will be denoted by a function,
which may be cost, profit, distance, or some other measure.
Dynamic Programming
• At each stage, n, of the dynamic program, there is:
• a state variable, xn
• an optimal decision variable, dn

• For each value of xn and dn at stage n, there is:
• a return function value, rn(xn,dn)

• The output of the process at stage n is:
• the state variable for stage n-1, xn-1
• xn-1 is calculated by a stage transformation function, tn(xn,dn)

• The optimal value function, fn(xn), is the cumulative return starting at stage
n in state xn and proceeding to stage 1 under an optimal strategy.
Backwards Recursion
• Generally, a dynamic programming problem is solved by starting at
the final stage and working backwards to the initial stage, a process
called backwards recursion.

• The following recursion relation can be used to operationalize the
principle of optimality:

fn(xn) = max over dn { rn(xn,dn) + fn-1(tn(xn,dn)) }

• A problem is solved beginning at stage 0 with the boundary condition
f0(x0) = 0, and working backwards to the last stage, N.
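The recursion above can be sketched directly in code. The following is a minimal illustration on a hypothetical three-stage resource-allocation toy, where the state xn is the number of units still unspent, the transformation is tn(xn,dn) = xn - dn, and the reward table is an assumption made purely for illustration (it is not from the slides):

```python
from functools import lru_cache

REWARD = [0, 3, 5, 6]   # hypothetical return rn for spending d = 0..3 units at any stage
N = 3                   # number of stages
BUDGET = 4              # total units available; the state is the units still unspent

@lru_cache(maxsize=None)
def f(n, x):
    """Optimal value function f_n(x_n): best cumulative return from stage n
    down to stage 1, with boundary condition f_0(x_0) = 0."""
    if n == 0:
        return 0
    # f_n(x) = max over d of { r_n(x,d) + f_{n-1}(t_n(x,d)) }, with t_n(x,d) = x - d
    return max(REWARD[d] + f(n - 1, x - d) for d in range(min(x, 3) + 1))

print(f(N, BUDGET))   # -> 11, e.g. by spending 1, 1 and 2 units
```

Memoization (`lru_cache`) is what makes the backward recursion efficient: each state (n, x) is solved once, even though many decision sequences pass through it.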
For a multistage decision process, the functional relationship
among state, stage and decision may be described as shown
in Fig.
Further, suppose that there are n stages at which a decision
is to be made. These n stages are all interconnected by a
relationship (called the transition function):

𝑆𝑛−1 = 𝑆𝑛 * 𝑑𝑛

that is, Output at stage n = (Input to stage n) * (Decision at stage n),

where * represents any mathematical operation, namely addition,
subtraction, division or multiplication. The units of 𝑆𝑛 , 𝑑𝑛 and 𝑆𝑛−1 must be
homogeneous.
It can be seen that at each stage of the problem, there are two input
variables: the state variable, 𝑆𝑛 , and the decision variable, 𝑑𝑛 . The state
variable (state input) relates the present stage back to the previous stage. For
example, the current state 𝑆𝑛 provides complete information about the various
possible conditions in which the problem is to be solved when there are n
stages to go. The decision 𝑑𝑛 is made at stage n for optimizing the total
return over the remaining n – 1 stages. The decision 𝑑𝑛 , which optimizes the
output at stage n, produces two outputs: (i) the return function 𝑟𝑛 (𝑆𝑛 , 𝑑𝑛 ),
which is expressed as a function of the state variable 𝑆𝑛 and the decision 𝑑𝑛 ,
and (ii) the new state at the beginning of the next stage (stage n – 1), which is
given by the transition function (state transformation):

𝑆𝑛−1 = 𝑡𝑛 (𝑆𝑛 , 𝑑𝑛 )

where 𝑡𝑛 represents a state transformation function whose form depends on
the particular problem to be solved. This formula allows the transition from
one stage to another.
DEVELOPING OPTIMAL DECISION POLICY
A particular sequence of alternatives (courses of action) adopted by the
decision-maker in a multistage decision problem is called a policy. The
optimal policy, therefore, is the sequence of alternatives that achieves the
decision-maker’s objective. The solution of a dynamic programming problem
is based upon Bellman’s principle of optimality (recursive optimization
technique), which states:

The optimal policy must be one such that, regardless of how a particular
state is reached, all later decisions (choices) proceeding from that state
must be optimal.

Based on this principle of optimality, an optimal policy is derived by solving
one stage at a time, and then sequentially adding a series of one-stage
problems that are solved until the optimal solution of the initial problem is
obtained. The solution procedure is based on a backward induction process.
In the backward induction process, the problem is solved by first solving the last stage and
working backwards towards the first stage, making an optimal decision at each stage of the
problem.

The forward induction process solves the problem by first solving the initial stage of the
problem and working towards the last stage, making an optimal decision at each stage of
the problem.
The exact recursion relationship depends on the nature of the problem to be solved by
dynamic programming. The one stage return is given by:

f1(𝑆1) = opt over 𝑑1 { r1(𝑆1, 𝑑1) }

By continuing the above logic recursively for a general n stage problem, we have

fn(𝑆𝑛) = opt over 𝑑𝑛 { rn(𝑆𝑛, 𝑑𝑛) + fn−1(𝑆𝑛−1) }

The optimal policy must be one such that, regardless of how a particular state is reached, all later
decisions proceeding from that state must be optimal.
The General Procedure
The procedure for solving a problem by using the dynamic programming approach can be
summarized in the following steps:
Step 1: Identify the problem decision variables and specify the objective function to be
optimized under certain limitations, if any.

Step 2: Decompose (or divide) the given problem into a number of smaller sub-problems (or
stages). Identify the state variables at each stage and write down the transformation function
as a function of the state variable and decision variable at the next stage.

Step 3: Write down a general recursive relationship for computing the optimal policy. Decide
whether to follow the forward or the backward method for solving the problem.

Step 4: Construct appropriate tables to show the required values of the return function at
each stage as shown in Table.
Step 5: Determine the overall optimal policy or decisions and its value at each stage. There
may be more than one such optimal policy.
Decision Table
CASE 1: The Stagecoach Problem
The STAGECOACH PROBLEM is a problem specially constructed to illustrate the features and to
introduce the terminology of dynamic programming. It concerns a mythical fortune seeker in Missouri
who decided to go west to join the gold rush in California during the mid-19th century. The journey would
require traveling by stagecoach through unsettled country where there was serious danger of attack by
marauders. Although his starting point and destination were fixed, he had considerable choice as to
which states (or territories that subsequently became states) to travel through en route. The possible
routes are shown in the following Figure, where each state is represented by a circled letter and the
direction of travel is always from left to right in the diagram.
Continued….
Thus, four stages (stagecoach runs) were required to travel from his point of embarkation in
state A (Missouri) to his destination in state J (California). This fortune seeker was a prudent
man who was quite concerned about his safety. After some thought, he came up with a
rather clever way of determining the safest route. Life insurance policies were offered to
stagecoach passengers. Because the cost of the policy for taking any given stagecoach run
was based on a careful evaluation of the safety of that run, the safest route should be the
one with the cheapest total life insurance policy.

Question for the discussion


Find the optimal route which minimizes the total cost of the policy
Solution:
• The stagecoach problem
• Mythical fortune-seeker travels West by stagecoach to join the gold rush in the
mid-1800s
• The origin and destination are fixed
• Many options in choice of route

• Insurance policies on stagecoach riders


• Cost depended on perceived route safety

• Choose the safest route by minimizing policy cost


Solution:
The cost for the standard policy on the stagecoach run from state i to state j, which will be
denoted by 𝐶𝑖𝑗 , is
Solution:
• Incorrect solution: choose the cheapest run offered by each
successive stage
• Gives A→B → F → I → J for a total cost of 13
• There are less expensive options

• Trial-and-error solution
• Very time-consuming for large problems

• Dynamic programming solution


• Starts with a small portion of the original problem
• Finds optimal solution for this smaller problem

• Gradually enlarges the problem


• Finds the current optimal solution from the preceding one
Solution:
• Stagecoach problem approach
• Start when fortune-seeker is only one stagecoach ride away from the
destination
• Increase by one the number of stages remaining to complete the journey

• Problem formulation
• Decision variables x1, x2, x3, x4
• Route begins at A, proceeds through x1, x2, x3, x4, and ends at J
Solution:
• Let fn(𝑠𝑛 , xn) be the total cost of the overall policy for the remaining
stages
• Fortune-seeker is in state 𝑠𝑛 , ready to start stage n
• Selects xn as the immediate destination

• The cost of that run, 𝐶(𝑠𝑛 , 𝑥𝑛 ), is obtained from the cost table by setting i = 𝑠𝑛 and j = xn
Solution:
• Immediate solution to the n = 4 problem

• When n = 3:
Solution:
• The n = 2 problem

• When n = 1:
Solution:
• Construct optimal solution using the four tables
• Results for n = 1 problem show that fortune-seeker should choose state C or D

• Suppose C is chosen
• For n = 2, the result for s = C is x2*=E …
• One optimal solution: A→ C → E → H → J

• Suppose D is chosen instead
• Two optimal solutions: A → D → E → H → J and A → D → F → I → J
Solution:
All three optimal solutions have a total cost of 11
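The backward recursion for this case can be sketched in a few lines of code. The run costs below are assumed from the standard textbook version of this example (the cost table in the slides is an image); they reproduce the totals quoted in the text, 13 for the greedy route A→B→F→I→J and 11 for the optimal routes:

```python
# Assumed stagecoach run costs c(i, j), per the standard version of this example.
COSTS = {
    'A': {'B': 2, 'C': 4, 'D': 3},
    'B': {'E': 7, 'F': 4, 'G': 6},
    'C': {'E': 3, 'F': 2, 'G': 4},
    'D': {'E': 4, 'F': 1, 'G': 5},
    'E': {'H': 1, 'I': 4},
    'F': {'H': 6, 'I': 3},
    'G': {'H': 3, 'I': 3},
    'H': {'J': 3},
    'I': {'J': 4},
}
# States grouped by stages remaining, processed from the destination backwards.
STAGES = [['H', 'I'], ['E', 'F', 'G'], ['B', 'C', 'D'], ['A']]

f = {'J': 0}     # optimal value function: cheapest cost-to-go from each state
best = {}        # optimal immediate destination x* from each state
for stage in STAGES:
    for s in stage:
        # f(s) = min over next states j of { c(s, j) + f(j) }
        j_star = min(COSTS[s], key=lambda j: COSTS[s][j] + f[j])
        best[s] = j_star
        f[s] = COSTS[s][j_star] + f[j_star]

# Recover one optimal route by following the stored decisions forward.
route = ['A']
while route[-1] != 'J':
    route.append(best[route[-1]])
print(f['A'], '->'.join(route))   # 11 A->C->E->H->J
```

Ties are broken here by dictionary order, so the code reports A→C→E→H→J; the two routes through D achieve the same total cost of 11, as noted above.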
Characteristics of Dynamic Programming
Problems
• The stagecoach problem is a literal prototype
• Provides a physical interpretation of an abstract structure
• Features of dynamic programming problems
• Problem can be divided into stages with a policy decision required at each
stage
• Each stage has a number of states associated with the beginning of the stage
• The policy decision at each stage transforms the current state into a state
associated with the beginning of the next stage
• Solution procedure designed to find an optimal policy for the overall problem
• Given the current state, an optimal policy for the remaining stages is
independent of the policy decisions of previous stages
Continued…..
• Solution procedure begins by finding the optimal policy for the last stage

• A recursive relationship can be defined that identifies the optimal policy for
stage n, given the optimal policy for stage n + 1

• Using the recursive relationship, the solution procedure starts at the end and
works backward
Problem 1

A salesman located in city A decided to travel to city B. He knew the distances of
alternative routes from city A to city B. He then drew a highway network map as shown in
the following Figure. The city of origin, A, is city 1. The destination city, B, is city 10. Other
cities through which the salesman will have to pass are numbered 2 to 9. The arrows
represent routes between cities, and the distance in kilometres is indicated on each route.
The salesman’s problem is to find the shortest route from city A to city B.
Problem 1
Deterministic Dynamic Programming
This section further elaborates upon the dynamic programming approach to deterministic problems,
where the state at the next stage is completely determined by the state and policy decision at
the current stage.
Deterministic dynamic programming can be described diagrammatically as shown in Fig.

The basic structure for deterministic dynamic programming


Continued…
Thus, at stage n the process will be in some state 𝑠𝑛 . Making policy decision 𝑥𝑛 then moves the process to some state 𝑠𝑛−1 at the beginning of stage n – 1.
Model I : Additive Separable Return Function and
Single Additive Constraint
Continued…
Case 2: Distributing Medical Teams to
Countries
The WORLD HEALTH COUNCIL is devoted to improving health care in the underdeveloped
countries of the world. It now has five medical teams available to allocate among three such
countries to improve their medical care, health education, and training programs. Therefore,
the council needs to determine how many teams (if any) to allocate to each of these
countries to maximize the total effectiveness of the five teams. The teams must be kept
intact, so the number allocated to each country must be an integer. The measure of
performance being used is additional person-years of life.

(For a particular country, this measure equals the increased life expectancy in years times the
country’s population.) The following Table gives the estimated additional person-years of life (in
multiples of 1,000) for each country for each possible allocation of medical teams. Which
allocation maximizes the measure of performance?
Continued…
Solution…
Beginning with the last stage (n =3 ), we get the following Table

STAGE 3
Solution for n=2 i.e. STAGE 2
Solution for n=1 i.e. STAGE 1
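The three stage tables above can be generated programmatically. The person-years figures below (in thousands) are assumed from the standard textbook version of this case, since the slide table is an image; under those figures the sketch recovers the optimal allocation:

```python
# Allocation DP for the medical-teams case (Model I: additive separable returns,
# single additive constraint). P[n][x] = assumed added person-years (thousands)
# for allocating x teams to country n.
P = {
    1: [0, 45, 70, 90, 105, 120],
    2: [0, 20, 45, 75, 110, 150],
    3: [0, 50, 70, 80, 100, 130],
}
TEAMS = 5

# f[n][s] = best value achievable over countries n..3 with s teams unallocated.
f = {4: [0] * (TEAMS + 1)}          # boundary: no countries left, value 0
decision = {}
for n in (3, 2, 1):                 # backward recursion, stage 3 first
    f[n] = []
    for s in range(TEAMS + 1):
        # f_n(s) = max over x of { P_n(x) + f_{n+1}(s - x) }
        values = {x: P[n][x] + f[n + 1][s - x] for x in range(s + 1)}
        x_star = max(values, key=values.get)
        decision[n, s] = x_star
        f[n].append(values[x_star])

# Walk forward through the stored decisions to recover the optimal allocation.
alloc, s = [], TEAMS
for n in (1, 2, 3):
    alloc.append(decision[n, s])
    s -= alloc[-1]
print(f[1][TEAMS], alloc)   # 170 [1, 3, 1]
```

That is, one team to country 1, three to country 2, and one to country 3, for 170 thousand additional person-years of life.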
PROBLEM 2
A company has five salesmen who have to be allocated to four marketing zones. The return
(or profit) from each zone depends upon the number of salesmen working in that zone. The
expected returns for different number of salesmen in different zones, as estimated from the
past records, are given in the following table. Determine the optimal allocation policy.

Do it yourself
Model II: Multiplicative Separable Return Function
and Single Additive Constraint
Continued…
PROBLEM 3
A company has decided to introduce a product in three phases. Phase 1 will feature making a special
offer at a greatly reduced rate to attract the first-time buyers. Phase 2 will involve intensive advertising
to persuade the buyers to continue purchasing at a regular price. Phase 3 will involve a follow up
advertising and promotional campaign.
A total of Rs 5 million has been budgeted for this marketing campaign. If m is the market share
captured in Phase 1, fraction 𝑓2 of m is retained in Phase 2, and fraction 𝑓3 of market share in Phase 2
is retained in Phase 3. The expected values of m, 𝑓2 and 𝑓3 at different levels of money expended are
given below. How should the money be allocated to the three phases in order to maximize the final
share?

SEE PDF FOR THE SOLUTION


Continued..
PROBLEM 4
Consider the problem of designing electronic devices to carry five power cells, each of which must be
located within three electronic systems. If one system’s power fails, then it will be powered on an
auxiliary basis by the cells of the remaining systems. The probability that any particular system will
experience a power failure depends on the number of cells originally assigned to it. The estimated
power failure probabilities for a particular system are given below:

Determine how many power cells should be assigned to each system in order to maximize the overall
system reliability. SEE PDF FOR THE SOLUTION
Case 3: Distributing Scientists to Research
Teams
A government space project is conducting research on a certain engineering problem that
must be solved before people can fly safely to Mars. Three research teams are currently
trying three different approaches for solving this problem. The estimate has been made that,
under present circumstances, the probability that the respective teams (call them 1, 2, and 3)
will not succeed is 0.40, 0.60, and 0.80, respectively. Thus, the current probability that all
three teams will fail is (0.40)(0.60)(0.80) = 0.192. Because the objective is to minimize the
probability of failure, two more top scientists have been assigned to the project.

The following Table gives the estimated probability that the respective teams will fail when 0,
1, or 2 additional scientists are added to that team. Only integer numbers of scientists are
considered because each new scientist will need to devote full attention to one team. The
problem is to determine how to allocate the two additional scientists to minimize the
probability that all three teams will fail.
Cont…
Solution.
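Because the teams fail independently, the overall failure probability is the product of the stage returns, so this is a Model II (multiplicative separable return) problem minimized by backward recursion. The per-team failure probabilities below are assumed from the standard textbook version of this case (the slide table is an image):

```python
# Q[n][x] = assumed probability that team n fails when x extra scientists
# are added to it (x = 0, 1, 2).
Q = {
    1: [0.40, 0.20, 0.15],
    2: [0.60, 0.40, 0.20],
    3: [0.80, 0.50, 0.30],
}
EXTRA = 2

# f[n][s] = minimum product of failure probabilities over teams n..3
# with s extra scientists still unassigned.
f = {4: [1.0] * (EXTRA + 1)}        # multiplicative identity at the boundary
decision = {}
for n in (3, 2, 1):                 # backward recursion, stage returns multiply
    f[n] = []
    for s in range(EXTRA + 1):
        # f_n(s) = min over x of { Q_n(x) * f_{n+1}(s - x) }
        values = {x: Q[n][x] * f[n + 1][s - x] for x in range(s + 1)}
        x_star = min(values, key=values.get)
        decision[n, s] = x_star
        f[n].append(values[x_star])

# Recover the optimal assignment of the two extra scientists.
alloc, s = [], EXTRA
for n in (1, 2, 3):
    alloc.append(decision[n, s])
    s -= alloc[-1]
print(round(f[1][EXTRA], 3), alloc)   # 0.06 [1, 0, 1]
```

Under the assumed figures, one new scientist each to teams 1 and 3 cuts the probability that all three teams fail from 0.192 to 0.060.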
