Dynamic Programming 7707

Uploaded by

cgpt9733

UNIT 5 – Dynamic Programming

Dr. Santosh Kumar, Assistant Professor

Faculty of Management Studies

University of Delhi
Introduction: Dynamic Programming
• Dynamic Programming Overview

• Dynamic Programming Notation

• Backwards Recursion

• Applications of Dynamic Programming

• A Production and Inventory Control Problem

Dynamic Programming
• Dynamic programming (DP) is an approach to problem solving that
decomposes the original problem into a series of smaller sub-problems.

• To apply DP successfully, the original problem must be viewed as a
multistage decision problem.

• Defining the stages of a DP problem is sometimes obvious, but at
other times it requires subtle reasoning.
Dynamic Programming
• The power of DP is that one need solve only a small portion of all
sub-problems, due to Bellman’s principle of optimality.

• Bellman’s principle states that, regardless of what decisions were
made at previous stages, if the decision to be made at stage n is to be
part of an overall optimal solution, then the decision made at stage n
must be optimal for all remaining stages.
Dynamic Programming: Terminology
• Stages: We first identify the stages of the decision process. Stages may correspond
to geometric stages, time periods, or some other criterion.

• States: A set of states must be identified at each stage. A state describes the
status of the system being analyzed and contains all the information needed to
make decisions.

• Decisions: Typically, the set of available decisions will depend on the state.

• Return functions: The measure of effectiveness will be denoted by a function,
which may be cost, profit, distance, or some other measure.
Dynamic Programming
• At each stage, n, of the dynamic program, there is:
• a state variable, $x_n$
• an optimal decision variable, $d_n$

• For each value of $x_n$ and $d_n$ at stage n, there is:
• a return function value, $r_n(x_n, d_n)$

• The output of the process at stage n is:
• the state variable for stage n-1, $x_{n-1}$
• $x_{n-1}$ is calculated by a stage transformation function, $t_n(x_n, d_n)$

• The optimal value function, $f_n(x_n)$, is the cumulative return starting at stage
n in state $x_n$ and proceeding to stage 1 under an optimal strategy.
Backwards Recursion
• Generally, a dynamic programming problem is solved by starting at
the final stage and working backwards to the initial stage, a process
called backwards recursion.

• The following recursion relation can be used to operationalize the
principle of optimality:

$f_n(x_n) = \max_{d_n} \{ r_n(x_n, d_n) + f_{n-1}(t_n(x_n, d_n)) \}$

• The recursion starts from the boundary condition $f_0(x_0) = 0$ and is
evaluated for n = 1, 2, ..., N. Because n counts the stages that remain,
stage 1 is the final stage of the process and stage N is the initial one.
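The recursion above can be tabulated directly. The following is a minimal sketch, not the lecture's own implementation: the function name `backward_recursion`, its callback parameters, and the two-stage `RETURNS` table are all hypothetical and chosen only for illustration.

```python
def backward_recursion(num_stages, states, decisions, ret, transition):
    """Tabulate f_n(x_n) = max over d_n of r_n(x_n, d_n) + f_{n-1}(t_n(x_n, d_n))."""
    f = {}       # f[n][x]: optimal cumulative return from stage n down to stage 1
    policy = {}  # policy[n][x]: one decision d_n that attains f[n][x]
    for n in range(1, num_stages + 1):
        f[n], policy[n] = {}, {}
        for x in states(n):
            best_value, best_d = None, None
            for d in decisions(n, x):
                # Boundary condition: f_0(x_0) = 0 for every x_0.
                future = 0 if n == 1 else f[n - 1][transition(n, x, d)]
                value = ret(n, x, d) + future
                if best_value is None or value > best_value:
                    best_value, best_d = value, d
            f[n][x], policy[n][x] = best_value, best_d
    return f, policy


# Hypothetical data: return earned at stage n when d units are used there.
RETURNS = {1: [0, 3, 5, 6], 2: [0, 4, 5, 7]}

f, policy = backward_recursion(
    num_stages=2,
    states=lambda n: range(4),               # 0..3 units may remain
    decisions=lambda n, x: range(x + 1),     # use at most what remains
    ret=lambda n, x, d: RETURNS[n][d],
    transition=lambda n, x, d: x - d,        # units left for stage n-1
)
print(f[2][3], policy[2][3], policy[1][2])   # prints: 9 1 2
```

With three units available at stage 2, the sketch uses one unit there and the remaining two at stage 1, for a total return of 9.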
For a multistage decision process, functional relationship
among state, stage and decision may be described as shown
in Fig.
Further, suppose that there are n stages at which a decision
is to be made. These n stages are all interconnected by a
relationship (called the transition function):

Output at stage n = (Input to stage n) * (Decision at stage n), that is, $S_{n-1} = S_n * d_n$,

where * represents any mathematical operation, namely addition,
subtraction, division or multiplication. The units of $S_n$, $d_n$ and $S_{n-1}$ must be
homogeneous.
It can be seen that at each stage of the problem, there are two input
variables: the state variable, $S_n$, and the decision variable, $d_n$. The state variable
(state input) relates the present stage back to the previous stage. For
example, the current state $S_n$ provides complete information about the various
possible conditions in which the problem is to be solved when there are n
stages to go. The decision $d_n$ is made at stage n for optimizing the total
return over the remaining n – 1 stages. The decision $d_n$, which optimizes the
output at stage n, produces two outputs: (i) the return function $r_n(S_n, d_n)$
and (ii) the new state $S_{n-1}$.

The return function is expressed as a function of the state variable, $S_n$, and the
decision variable, $d_n$.

The decision variable, $d_n$, determines the state of the process at the
beginning of the next stage (stage n – 1) through the transition
function (state transformation)

$S_{n-1} = t_n(S_n, d_n)$

where $t_n$ represents a state transformation function whose form depends on
the particular problem to be solved. This formula allows the transition from
one stage to another.
DEVELOPING OPTIMAL DECISION POLICY
A particular sequence of alternatives (courses of action) adopted by the
decision-maker in a multistage decision problem is called a policy. The
optimal policy, therefore, is the sequence of alternatives that achieves the
decision-maker’s objective. The solution of a dynamic programming problem
is based upon Bellman’s principle of optimality (recursive optimization
technique), which states:

The optimal policy must be one such that, regardless of how a particular
state is reached, all later decisions (choices) proceeding from that state
must be optimal.

Based on this principle of optimality, an optimal policy is derived by solving
one stage at a time, and then sequentially adding a series of one-stage
problems that are solved until the optimal solution of the initial problem is
obtained. The solution procedure is based on either a backward induction process
or a forward induction process.

In the backward process, the problem is solved by first solving the last stage and
working backward towards the first stage, making optimal decisions at each stage of the
problem.

In the forward process, the problem is solved by first solving the initial stage of the
problem and working towards the last stage, making an optimal decision at each stage of
the problem.
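The exact recursion relationship depends on the nature of the problem to be solved by
dynamic programming. For an additive return that is to be maximized, the one-stage return is given by:

$f_1(S_1) = \max_{d_1} \{ r_1(S_1, d_1) \}$

By continuing the above logic recursively for a general n stage problem, we have

$f_n(S_n) = \max_{d_n} \{ r_n(S_n, d_n) + f_{n-1}(S_{n-1}) \}$, where $S_{n-1} = t_n(S_n, d_n)$.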

The General Procedure
The procedure for solving a problem by using the dynamic programming approach can be
summarized in the following steps:
Step 1: Identify the problem decision variables and specify the objective function to be
optimized under certain limitations, if any.

Step 2: Decompose (or divide) the given problem into a number of smaller sub-problems (or
stages). Identify the state variables at each stage and write down the transformation function
as a function of the state variable and decision variable at the next stage.

Step 3: Write down a general recursive relationship for computing the optimal policy. Decide
whether to follow the forward or the backward method for solving the problem.

Step 4: Construct appropriate tables to show the required values of the return function at
each stage as shown in Table.
Step 5: Determine the overall optimal policy or decisions and its value at each stage. There
may be more than one such optimal policy.
Decision Table
CASE 1: The Stagecoach Problem
The STAGECOACH PROBLEM is a problem specially constructed to illustrate the features and to
introduce the terminology of dynamic programming. It concerns a mythical fortune seeker in Missouri
who decided to go west to join the gold rush in California during the mid-19th century. The journey would
require traveling by stagecoach through unsettled country where there was serious danger of attack by
marauders. Although his starting point and destination were fixed, he had considerable choice as to
which states (or territories that subsequently became states) to travel through en route. The possible
routes are shown in the following Figure, where each state is represented by a circled letter and the
direction of travel is always from left to right in the diagram.
Continued….
Thus, four stages (stagecoach runs) were required to travel from his point of embarkation in
state A (Missouri) to his destination in state J (California). This fortune seeker was a prudent
man who was quite concerned about his safety. After some thought, he came up with a
rather clever way of determining the safest route. Life insurance policies were offered to
stagecoach passengers. Because the cost of the policy for taking any given stagecoach run
was based on a careful evaluation of the safety of that run, the safest route should be the
one with the cheapest total life insurance policy.

Question for the discussion

Find the optimal route which minimizes the total cost of the policy
Solution:
• The stagecoach problem
• Mythical fortune-seeker travels West by stagecoach to join the gold rush in the
mid-1800s
• The origin and destination are fixed
• Many options in choice of route

• Insurance policies on stagecoach riders

• Cost depended on perceived route safety

• Choose the safest route by minimizing policy cost

Solution:
The cost for the standard policy on the stagecoach run from state i to state j, which will be
denoted by 𝐶𝑖𝑗 , is
Solution:
• Incorrect solution: choose the cheapest run offered by each
successive stage
• Gives A→B → F → I → J for a total cost of 13
• There are less expensive options

• Trial-and-error solution
• Very time-consuming for large problems

• Dynamic programming solution

• Starts with a small portion of the original problem
• Finds optimal solution for this smaller problem

• Gradually enlarges the problem

• Finds the current optimal solution from the preceding one
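The myopic rule in the first bullet can be checked with a few lines of code. This is only a sketch: the cost table is not reproduced in this text, so the `COSTS` values below are an assumption, taken from the standard textbook version of the stagecoach problem (Hillier and Lieberman). They reproduce the route and the total of 13 quoted above. The function name `greedy_route` is hypothetical.

```python
# Assumed run costs C_ij (the cost table is not reproduced in this text).
COSTS = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}


def greedy_route(costs, start, goal):
    """Always take the cheapest next run, ignoring what it costs later."""
    route, total, here = [start], 0, start
    while here != goal:
        nxt = min(costs[here], key=costs[here].get)
        total += costs[here][nxt]
        route.append(nxt)
        here = nxt
    return route, total


route, total = greedy_route(COSTS, "A", "J")
print(" -> ".join(route), total)   # prints: A -> B -> F -> I -> J 13
```

The cheap first run to B commits the traveler to expensive later runs, which is why the stage-by-stage cheapest choice is not optimal overall.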
Solution:
• Stagecoach problem approach
• Start when fortune-seeker is only one stagecoach ride away from the
destination
• Increase by one the number of stages remaining to complete the journey

• Problem formulation
• Decision variables x1, x2, x3, x4
• Route begins at A, proceeds through x1, x2, x3, x4, and ends at J
Solution:
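• Let $f_n(s_n, x_n)$ be the total cost of the best overall policy for the remaining
stages, given that:
• Fortune-seeker is in state $s_n$, ready to start stage n
• Selects $x_n$ as the immediate destination

• The immediate cost $C(s_n, x_n)$ is obtained from $C_{ij}$ by setting i = $s_n$ and j = $x_n$, so that
$f_n(s_n, x_n) = C(s_n, x_n) + f^*_{n+1}(x_n)$, where $f^*_{n+1}(x_n)$ is the minimum cost of the stages after stage n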

Solution:
• Immediate solution to the n = 4 problem

• When n = 3:
Solution:
• The n = 2 problem

• When n = 1:
Solution:
• Construct optimal solution using the four tables
• Results for n = 1 problem show that fortune-seeker should choose state C or D

• Suppose C is chosen
• For n = 2, the result for s = C is x2*=E …
• One optimal solution: A→ C → E → H → J

• Suppose D is chosen instead

A → D → E → H → J and A → D → F → I → J
Solution:
All three optimal solutions have a total cost of 11
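The four-table procedure can be reproduced as a short backward pass. As before, this is a sketch under stated assumptions: the `COSTS` values are assumed from the standard textbook version of the problem because the cost table is not reproduced in this text, and the names `solve` and `all_routes` are hypothetical. With these costs the result agrees with the total of 11 and the three routes listed above.

```python
# Assumed run costs C_ij (the cost table is not reproduced in this text).
COSTS = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}


def solve(costs, goal):
    """Backward pass: f[s] is the minimum cost from state s to the goal."""
    f = {goal: 0}
    best_next = {}
    # The keys of COSTS are listed stage by stage, so walking them in
    # reverse visits every state after all of its successors.
    for s in reversed(list(costs)):
        totals = {x: c + f[x] for x, c in costs[s].items()}
        f[s] = min(totals.values())
        best_next[s] = [x for x, t in totals.items() if t == f[s]]  # keep ties
    return f, best_next


def all_routes(best_next, start, goal):
    """Expand every optimal route by following the tied best decisions."""
    if start == goal:
        return [[goal]]
    return [[start] + rest
            for x in best_next[start]
            for rest in all_routes(best_next, x, goal)]


f, best_next = solve(COSTS, "J")
routes = all_routes(best_next, "A", "J")
print(f["A"])                      # prints: 11
for r in routes:
    print(" -> ".join(r))
```

Keeping every tied decision in `best_next`, rather than a single one, is what allows all optimal routes to be recovered in the final pass.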
Characteristics of Dynamic Programming Problems
• The stagecoach problem is a literal prototype
• Provides a physical interpretation of an abstract structure
• Features of dynamic programming problems
• Problem can be divided into stages with a policy decision required at each
stage
• Each stage has a number of states associated with the beginning of the stage
• The policy decision at each stage transforms the current state into a state
associated with the beginning of the next stage
• Solution procedure designed to find an optimal policy for the overall problem
• Given the current state, an optimal policy for the remaining stages is
independent of the policy decisions of previous stages
Continued…..
• Solution procedure begins by finding the optimal policy for the last stage

• A recursive relationship can be defined that identifies the optimal policy for
stage n, given the optimal policy for stage n + 1

• Using the recursive relationship, the solution procedure starts at the end and
works backward
Problem 1

A salesman located in city A decided to travel to city B. He knew the distances of the
alternative routes from city A to city B. He then drew a highway network map as shown in
the following Figure. The city of origin, A, is city 1. The destination city, B, is city 10. The other
cities through which the salesman will have to pass are numbered 2 to 9. The arrows
represent routes between cities, and the distance in kilometers is indicated on each route.
The salesman’s problem is to find the shortest route from A to B through the selected cities.
Deterministic Dynamic Programming
This section further elaborates upon the dynamic programming approach to deterministic problems,
where the state at the next stage is completely determined by the state and policy decision at
the current stage.
Deterministic dynamic programming can be described diagrammatically as shown in Fig.

The basic structure for deterministic dynamic programming

Continued…
Thus, at stage n the process will be in some state $s_n$. Making policy decision $x_n$ then moves the process to some state $s_{n+1}$ at stage n + 1.
Model I: Additive Separable Return Function and Single Additive Constraint
Continued…
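The model statement for this heading is not reproduced in this text. One common way to write such a model, using the notation introduced earlier, is sketched below. The constraint coefficients $a_n$, the resource limit $b$, and the "at most" form of the constraint are assumptions made for illustration.

```latex
% Assumed general form: additive return, one additive resource constraint.
\max \; Z = \sum_{n=1}^{N} r_n(d_n)
\quad \text{subject to} \quad
\sum_{n=1}^{N} a_n d_n \le b, \qquad d_n \ge 0 .

% With S_n the amount of resource still available at stage n,
% the transition is S_{n-1} = S_n - a_n d_n and the recursion is
f_n(S_n) = \max_{0 \le a_n d_n \le S_n}
  \bigl\{ r_n(d_n) + f_{n-1}(S_n - a_n d_n) \bigr\},
\qquad f_0(S_0) = 0 .
```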
Case 2: Distributing Medical Teams to Countries
The WORLD HEALTH COUNCIL is devoted to improving health care in the underdeveloped
countries of the world. It now has five medical teams available to allocate among three such
countries to improve their medical care, health education, and training programs. Therefore,
the council needs to determine how many teams (if any) to allocate to each of these
countries to maximize the total effectiveness of the five teams. The teams must be kept
intact, so the number allocated to each country must be an integer.

The measure of performance being used is additional person-years of life. (For a particular
country, this measure equals the increased life expectancy in years times the country’s
population.) The following Table gives the estimated additional person-years of life (in
multiples of 1,000) for each country for each possible allocation of medical teams. Which
allocation maximizes the measure of performance?
Continued…
Solution…
Beginning with the last stage (n =3 ), we get the following Table

STAGE 3
Solution for n=2 i.e. STAGE 2
Solution for n=1 i.e. STAGE 1
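The three stage tables can be generated with a short backward pass over the countries. This is a sketch under stated assumptions: the data table is not reproduced in this text, so the `BENEFIT` values are an assumption, taken from the standard textbook version of this problem (Hillier and Lieberman), and the function name `allocate` is hypothetical.

```python
# Assumed data: BENEFIT[country][x] is the additional person-years of life
# (in thousands) when x teams go to that country.
BENEFIT = {
    1: [0, 45, 70, 90, 105, 120],
    2: [0, 20, 45, 75, 110, 150],
    3: [0, 50, 70, 80, 100, 130],
}
TEAMS = 5


def allocate(benefit, total):
    """Solve stage 3 first, then stage 2, then stage 1."""
    countries = sorted(benefit)
    f = {s: 0 for s in range(total + 1)}   # nothing is earned after the last stage
    choice = {}                            # choice[n, s]: best x_n with s teams left
    for n in reversed(countries):
        new_f = {}
        for s in range(total + 1):
            best_x = max(range(s + 1), key=lambda x: benefit[n][x] + f[s - x])
            new_f[s] = benefit[n][best_x] + f[s - best_x]
            choice[n, s] = best_x
        f = new_f
    # Forward pass: read the optimal decisions off the stored tables.
    plan, s = {}, total
    for n in countries:
        plan[n] = choice[n, s]
        s -= plan[n]
    return f[total], plan


value, plan = allocate(BENEFIT, TEAMS)
print(value, plan)   # prints: 170 {1: 1, 2: 3, 3: 1}
```

Under the assumed data, the best allocation sends one team to country 1, three to country 2 and one to country 3, for 170 thousand additional person-years.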
PROBLEM 2
A company has five salesmen who have to be allocated to four marketing zones. The return
(or profit) from each zone depends upon the number of salesmen working in that zone. The
expected returns for different number of salesmen in different zones, as estimated from the
past records, are given in the following table. Determine the optimal allocation policy.

Do it yourself
Model II: Multiplicative Separable Return Function and Single Additive Constraint
Continued…
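The model statement for this heading is not reproduced in this text. A common way to write it, again in the notation introduced earlier, is sketched below. The coefficients $a_n$, the limit $b$, and the "at most" form of the constraint are assumptions made for illustration. The only changes from the additive case are the product in the objective and the boundary value of 1.

```latex
% Assumed general form: multiplicative return, one additive resource constraint.
\max \; Z = \prod_{n=1}^{N} r_n(d_n)
\quad \text{subject to} \quad
\sum_{n=1}^{N} a_n d_n \le b, \qquad d_n \ge 0 .

% Recursion with S_{n-1} = S_n - a_n d_n:
f_n(S_n) = \max_{0 \le a_n d_n \le S_n}
  \bigl\{ r_n(d_n) \cdot f_{n-1}(S_n - a_n d_n) \bigr\},
\qquad f_0(S_0) = 1 .
```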
PROBLEM 3
A company has decided to introduce a product in three phases. Phase 1 will feature making a special
offer at a greatly reduced rate to attract the first-time buyers. Phase 2 will involve intensive advertising
to persuade the buyers to continue purchasing at a regular price. Phase 3 will involve a follow up
advertising and promotional campaign.
A total of Rs 5 million has been budgeted for this marketing campaign. If m is the market share
captured in Phase 1, fraction 𝑓2 of m is retained in Phase 2, and fraction 𝑓3 of market share in Phase 2
is retained in Phase 3. The expected values of m, 𝑓2 and 𝑓3 at different levels of money expended are
given below. How should the money be allocated to the three phases in order to maximize the final
share?

SEE PDF FOR THE SOLUTION

Continued..
PROBLEM 4
Consider the problem of designing electronic devices to carry five power cells, each of which must be
located within three electronic systems. If one system’s power fails, then it will be powered on an
auxiliary basis by the cells of the remaining systems. The probability that any particular system will
experience a power failure depends on the number of cells originally assigned to it. The estimated
power failure probabilities for a particular system are given below:

Determine how many power cells should be assigned to each system in order to maximize the overall
system reliability.

SEE PDF FOR THE SOLUTION
Case 3: Distributing Scientists to Research Teams
A government space project is conducting research on a certain engineering problem that
must be solved before people can fly safely to Mars. Three research teams are currently
trying three different approaches for solving this problem. The estimate has been made that,
under present circumstances, the probability that the respective teams (call them 1, 2, and 3)
will not succeed is 0.40, 0.60, and 0.80, respectively. Thus, the current probability that all
three teams will fail is (0.40)(0.60)(0.80) = 0.192. Because the objective is to minimize the
probability of failure, two more top scientists have been assigned to the project.

The following Table gives the estimated probability that the respective teams will fail when 0,
1, or 2 additional scientists are added to that team. Only integer numbers of scientists are
considered because each new scientist will need to devote full attention to one team. The
problem is to determine how to allocate the two additional scientists to minimize the
probability that all three teams will fail.
Cont…
Solution.
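Because the team failures multiply, this case calls for a multiplicative recursion with a boundary value of 1. The sketch below is an illustration rather than the lecture's own solution. The probability table is not reproduced in this text: the first column of `FAIL_PROB` (0.40, 0.60, 0.80) comes from the problem statement, while the remaining entries are an assumption taken from the standard textbook version of this problem (Hillier and Lieberman). The function name `min_joint_failure` is hypothetical.

```python
# FAIL_PROB[team][x]: probability that the team fails with x extra scientists.
# Column x = 0 is from the problem statement; the other values are assumed.
FAIL_PROB = {
    1: [0.40, 0.20, 0.15],
    2: [0.60, 0.40, 0.20],
    3: [0.80, 0.50, 0.30],
}
EXTRA = 2


def min_joint_failure(fail_prob, extra):
    """Minimize the product of the teams' failure probabilities."""
    teams = sorted(fail_prob)
    f = {s: 1.0 for s in range(extra + 1)}   # multiplicative boundary value
    choice = {}                              # choice[n, s]: best x_n with s scientists left
    for n in reversed(teams):
        new_f = {}
        for s in range(extra + 1):
            x_best = min(range(s + 1), key=lambda x: fail_prob[n][x] * f[s - x])
            new_f[s] = fail_prob[n][x_best] * f[s - x_best]
            choice[n, s] = x_best
        f = new_f
    # Forward pass: recover the allocation from the stored decisions.
    plan, s = {}, extra
    for n in teams:
        plan[n] = choice[n, s]
        s -= plan[n]
    return f[extra], plan


prob, plan = min_joint_failure(FAIL_PROB, EXTRA)
print(round(prob, 3), plan)   # prints: 0.06 {1: 1, 2: 0, 3: 1}
```

Under the assumed table, giving one scientist each to teams 1 and 3 lowers the probability that all three teams fail from 0.192 to 0.060.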
