CH 22
22.1 A GAME OF CHANCE

A variation of the Russian roulette game calls for spinning a wheel marked along the perimeter with n consecutive numbers: 1 to n. The probability that the wheel will stop at number i after one spin is \(p_i\). A player pays $x for the privilege of spinning the wheel a maximum of m times. The payoff to the player is twice the number produced in the last spin. Assuming that the game (of up to m spins each) is repeated a reasonably large number of times, devise an optimal strategy for the player.

We can construct the problem as a DP model using the following definitions:

1. Stage i is represented by spin i, i = 1, 2, ..., m.
2. The alternatives at each stage include either spinning the wheel once more or ending the game.
3. The state j of the system at stage i is one of the numbers 1 to n obtained in the last spin.
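Let

\(f_i(j)\) = Maximum expected return given that the game is at stage (spin) i and that j is the outcome of the last spin

Thus,

\[
\begin{pmatrix} \text{Expected payoff at stage } i \\ \text{given last spin's outcome } j \end{pmatrix}
=
\begin{cases}
2j, & \text{if the game ends} \\
\sum_{k=1}^{n} p_k f_{i+1}(k), & \text{if the game continues}
\end{cases}
\]

The recursive equation may then be written as

\[
\begin{aligned}
f_{m+1}(j) &= 2j \\
f_i(j) &= \max \left\{ \text{End: } 2j,\ \ \text{Spin: } \sum_{k=1}^{n} p_k f_{i+1}(k) \right\}, \quad i = 2, 3, \ldots, m \\
f_1(0) &= \sum_{k=1}^{n} p_k f_2(k)
\end{aligned}
\]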
The rationale for the recursive equation is that at the first spin (i = 1), the state of the system is j = 0, because the game has just started. Hence, \(f_1(0) = p_1 f_2(1) + p_2 f_2(2) + \cdots + p_n f_2(n)\). After the last spin (i = m), the game must end regardless of the outcome j of the mth spin. Thus, \(f_{m+1}(j) = 2j\). The recursive calculations start with \(f_{m+1}\) and terminate with \(f_1(0)\), thus producing m + 1 computational stages. Because \(f_1(0)\) is the expected return from all m spins, and given that the game costs $x, the net return is \(f_1(0) - x\).

Example 22.1-1
Suppose that the perimeter of the Russian roulette wheel is marked with the numbers 1 to 5. The probability \(p_i\) of stopping at the number i is given by \(p_1 = .3\), \(p_2 = .25\), \(p_3 = .2\), \(p_4 = .15\), and \(p_5 = .1\). The player pays $5 for a maximum of four spins. Determine the optimal strategy for each of the four spins and the expected net return.

Stage 5

\[
f_5(j) = 2j, \quad j = 1, 2, 3, 4, \text{ or } 5
\]

Decision: End if j = 1, 2, 3, 4, or 5, \(f_5(j) = 2j\).

Stage 4

\[
\begin{aligned}
f_4(j) &= \max\{2j,\ p_1 f_5(1) + p_2 f_5(2) + p_3 f_5(3) + p_4 f_5(4) + p_5 f_5(5)\} \\
&= \max\{2j,\ .3 \times 2 + .25 \times 4 + .2 \times 6 + .15 \times 8 + .1 \times 10\} \\
&= \max\{2j,\ 5\} \\
&= \begin{cases} 5, & \text{if } j = 1 \text{ or } 2 \\ 2j, & \text{if } j = 3, 4, \text{ or } 5 \end{cases}
\end{aligned}
\]

Decision: Spin if j = 1 or 2, \(f_4(j) = 5\). End if j = 3, 4, or 5, \(f_4(j) = 2j\).

Stage 3

\[
\begin{aligned}
f_3(j) &= \max\{2j,\ p_1 f_4(1) + p_2 f_4(2) + p_3 f_4(3) + p_4 f_4(4) + p_5 f_4(5)\} \\
&= \max\{2j,\ .3 \times 5 + .25 \times 5 + .2 \times 6 + .15 \times 8 + .1 \times 10\} \\
&= \max\{2j,\ 6.15\} \\
&= \begin{cases} 6.15, & \text{if } j = 1, 2, \text{ or } 3 \\ 2j, & \text{if } j = 4 \text{ or } 5 \end{cases}
\end{aligned}
\]

Decision: Spin if j = 1, 2, or 3, \(f_3(j) = 6.15\). End if j = 4 or 5, \(f_3(j) = 2j\).

Stage 2

\[
\begin{aligned}
f_2(j) &= \max\{2j,\ p_1 f_3(1) + p_2 f_3(2) + p_3 f_3(3) + p_4 f_3(4) + p_5 f_3(5)\} \\
&= \max\{2j,\ .3 \times 6.15 + .25 \times 6.15 + .2 \times 6.15 + .15 \times 8 + .1 \times 10\} \\
&= \max\{2j,\ 6.8125\} \\
&= \begin{cases} 6.8125, & \text{if } j = 1, 2, \text{ or } 3 \\ 2j, & \text{if } j = 4 \text{ or } 5 \end{cases}
\end{aligned}
\]

Decision: Spin if j = 1, 2, or 3, \(f_2(j) = 6.8125\). End if j = 4 or 5, \(f_2(j) = 2j\).

Stage 1

\[
\begin{aligned}
f_1(0) &= p_1 f_2(1) + p_2 f_2(2) + p_3 f_2(3) + p_4 f_2(4) + p_5 f_2(5) \\
&= .3 \times 6.8125 + .25 \times 6.8125 + .2 \times 6.8125 + .15 \times 8 + .1 \times 10 \\
&= 7.31
\end{aligned}
\]

Decision: Spin, \(f_1(0) = \$7.31\).

From the preceding calculations, the optimal solution is

Spin no.   Optimal strategy
1          Game starts, spin
2          Continue if spin 1 produces 1, 2, or 3; else, end game
3          Continue if spin 2 produces 1, 2, or 3; else, end game
4          Continue if spin 3 produces 1 or 2; else, end game

Expected net return = $7.31 - $5.00 = $2.31
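The backward recursion of Example 22.1-1 can be checked with a short program. The following Python sketch is only illustrative: the function name `solve_spin_game` and its return format are assumptions made here, not part of the model. Ties between spinning and ending are resolved in favor of ending.

```python
from typing import List, Tuple


def solve_spin_game(p: List[float], m: int) -> Tuple[float, List[List[int]]]:
    """Backward recursion for the wheel game (hypothetical helper).

    p[k-1] is the probability that the wheel stops at number k, and m is the
    maximum number of spins. Returns f_1(0) together with, for each stage
    i = 2, ..., m, the outcomes j of the previous spin for which spinning
    again is better than ending the game.
    """
    n = len(p)
    # f_{m+1}(j) = 2j: after the last spin the game must end.
    f_next = [2.0 * j for j in range(1, n + 1)]
    spin_sets = []  # collected from stage m down to stage 2
    for _stage in range(m, 1, -1):
        # Expected return from spinning once more: sum_k p_k * f_{i+1}(k).
        spin_value = sum(pk * fk for pk, fk in zip(p, f_next))
        spin_sets.append([j for j in range(1, n + 1) if spin_value > 2 * j])
        # f_i(j) = max{End: 2j, Spin: spin_value}.
        f_next = [max(2.0 * j, spin_value) for j in range(1, n + 1)]
    spin_sets.reverse()  # reorder as stage 2, 3, ..., m
    # Stage 1 has no alternative: the first spin is always made.
    f1 = sum(pk * fk for pk, fk in zip(p, f_next))
    return f1, spin_sets


f1, spin_sets = solve_spin_game([0.3, 0.25, 0.2, 0.15, 0.1], m=4)
net_return = f1 - 5.0  # the game costs $5
print(round(f1, 2), round(net_return, 2), spin_sets)
```

For the data of the example, the program gives an expected return of about 7.31 and the same continue sets as the strategy table: 1, 2, or 3 before spins 2 and 3, and 1 or 2 before spin 4.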
to 3 consecutive days. At the end of each day, I will decide whether or not to accept the best offer made that day. What should be my optimal strategy regarding the acceptance of an offer?
22.2 INVESTMENT PROBLEM

An individual wishes to invest up to $C thousand in the stock market over the next n years. The investment plan calls for buying the stock at the start of the year and selling it at the end of the same year. Accumulated money may then be reinvested (in whole or in part) at the start of the following year. The degree of risk is represented by expressing the return probabilistically. A study of the market shows that the return on investment is affected by m market conditions and that condition k yields a return \(r_k\) (positive, zero, or negative) with probability \(p_k\), k = 1, 2, ..., m. How should the amount C be invested to realize the highest accumulation at the end of n years?

Define

\(x_i\) = Available funds at the start of year i, i = 1, 2, ..., n (\(x_1 = C\))
\(y_i\) = Invested funds at the start of year i (\(y_i \le x_i\))

The elements of the DP model can be described as

1. Stage i is represented by year i.
2. The alternatives at stage i are given by \(y_i\).
3. The state at stage i is given by \(x_i\).

Let

\(f_i(x_i)\) = Maximum expected funds for years i, i + 1, ..., and n, given \(x_i\) at the start of year i

For market condition k, we have

\[
x_{i+1} = (1 + r_k) y_i + (x_i - y_i) = x_i + r_k y_i, \quad k = 1, 2, \ldots, m
\]

Given that market condition k occurs with probability \(p_k\), the DP recursive equation is written as

\[
f_i(x_i) = \max_{0 \le y_i \le x_i} \left\{ \sum_{k=1}^{m} p_k f_{i+1}(x_i + r_k y_i) \right\}, \quad i = 1, 2, \ldots, n
\]

where \(f_{n+1}(x_{n+1}) = x_{n+1}\) because no investment occurs after year n. For year n, we have

\[
\begin{aligned}
f_n(x_n) &= \max_{0 \le y_n \le x_n} \left\{ \sum_{k=1}^{m} p_k (x_n + r_k y_n) \right\} \\
&= x_n + \max_{0 \le y_n \le x_n} \left\{ \left( \sum_{k=1}^{m} p_k r_k \right) y_n \right\}
\end{aligned}
\]

Letting

\[
\bar{r} = \sum_{k=1}^{m} p_k r_k
\]

we get

\[
y_n = \begin{cases} 0, & \text{if } \bar{r} \le 0 \\ x_n, & \text{if } \bar{r} > 0 \end{cases}
\qquad
f_n(x_n) = \begin{cases} x_n, & \text{if } \bar{r} \le 0 \\ (1 + \bar{r}) x_n, & \text{if } \bar{r} > 0 \end{cases}
\]
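Example 22.2-1

In the investment model, suppose that you want to invest $10,000 over the next 4 years. There is a 50% chance that you will double your money, a 20% chance that you will break even, and a 30% chance that you will lose the invested amount. Devise an optimal investment strategy.

Using the notation of the model, we have

C = $10,000, n = 4, m = 3
\(p_1 = .5\), \(p_2 = .2\), \(p_3 = .3\)
\(r_1 = 1\), \(r_2 = 0\), \(r_3 = -1\)

Stage 4

\[
\bar{r} = .5 \times 1 + .2 \times 0 + .3 \times (-1) = .2
\]

Thus, \(f_4(x_4) = 1.2 x_4\), with \(y_4 = x_4\).

Stage 3

\[
\begin{aligned}
f_3(x_3) &= \max_{0 \le y_3 \le x_3} \{ p_1 f_4(x_3 + r_1 y_3) + p_2 f_4(x_3 + r_2 y_3) + p_3 f_4(x_3 + r_3 y_3) \} \\
&= \max_{0 \le y_3 \le x_3} \{ .5 \times 1.2 (x_3 + y_3) + .2 \times 1.2 x_3 + .3 \times 1.2 (x_3 - y_3) \} \\
&= \max_{0 \le y_3 \le x_3} \{ 1.2 x_3 + .24 y_3 \} \\
&= 1.44 x_3
\end{aligned}
\]

with \(y_3 = x_3\).

Stages 2 and 1

Repeating the same steps gives \(f_2(x_2) = 1.728 x_2\) with \(y_2 = x_2\), and \(f_1(x_1) = 2.0736 x_1\) with \(y_1 = x_1\).

The optimal investment policy can thus be summarized as follows: Because \(y_i = x_i\) for i = 1 to 4, the optimal solution calls for investing all available funds at the start of each year. The accumulated funds at the end of 4 years are \(2.0736 x_1\) = 2.0736($10,000) = $20,736.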
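Actually, it can be shown by induction that the problem has the following general solution at stage i, i = 1, 2, ..., n.

\[
f_i(x_i) = \begin{cases} x_i, & \text{if } \bar{r} \le 0 \\ (1 + \bar{r})^{n-i+1} x_i, & \text{if } \bar{r} > 0 \end{cases}
\qquad
y_i = \begin{cases} 0, & \text{if } \bar{r} \le 0 \\ x_i, & \text{if } \bar{r} > 0 \end{cases}
\]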
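The general solution can be verified numerically. Because the return function of the following year is linear in the available funds, the return function of each year is linear as well, so it suffices to carry a single multiplier backward from year n to year 1. The Python sketch below does this under that assumption; the name `investment_multipliers` is hypothetical and introduced only for illustration.

```python
from typing import List, Sequence, Tuple


def investment_multipliers(
    p: Sequence[float], r: Sequence[float], n: int
) -> Tuple[List[float], bool]:
    """Return (c, invest_all) such that f_i(x_i) = c[i-1] * x_i (hypothetical helper).

    p[k] and r[k] are the probability and the return of market condition k,
    and n is the number of years. invest_all is True when the expected
    return r_bar is positive, in which case y_i = x_i in every year.
    """
    r_bar = sum(pk * rk for pk, rk in zip(p, r))
    invest_all = r_bar > 0
    # c[n + 1] = 1 encodes f_{n+1}(x) = x; index 0 is unused.
    c = [1.0] * (n + 2)
    for i in range(n, 0, -1):
        # f_i(x) = max over 0 <= y <= x of c[i+1] * (x + r_bar * y):
        # the maximum is at y = x when r_bar > 0 and at y = 0 otherwise.
        c[i] = c[i + 1] * (1.0 + max(r_bar, 0.0))
    return c[1 : n + 1], invest_all


multipliers, invest_all = investment_multipliers([0.5, 0.2, 0.3], [1, 0, -1], n=4)
final_funds = multipliers[0] * 10_000
print([round(c, 4) for c in multipliers], invest_all, round(final_funds, 2))
```

With the data of Example 22.2-1, the multipliers for years 1 to 4 are approximately 2.0736, 1.728, 1.44, and 1.2, in agreement with \((1 + \bar{r})^{n-i+1}\) for \(\bar{r} = .2\).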
2. A 10-m³ compartment is available for storing three items. The volumes needed to store 1 unit of items 1, 2, and 3 are 2, 1, and 3 m³, respectively. The probabilistic demand for the items is described as follows:

                Probability of demand
No. of units    Item 1    Item 2    Item 3
1               .5        .3        .3
2               .5        .4        .2
3               .0        .2        .5
4               .0        .1        .0

The shortage costs per unit for items 1, 2, and 3 are $8, $10, and $15, respectively. How many units of each item should be held in the compartment?

3. HiTec has just started to produce supercomputers for a limited period of 4 years. The annual demand, D, for the new computer is described by the following distribution:

p(D = 1) = .5, p(D = 2) = .3, p(D = 3) = .2

The production capacity of the plant is three computers annually at the cost of $5 million each. The actual number of computers produced per year may not equal the demand exactly. An unsold computer at the end of a year incurs $1 million in storage and maintenance costs. A loss of $2 million occurs if the delivery of a computer is delayed by 1 year. HiTec will not accept new orders beyond year 4 but will continue production in year 5 to satisfy any unfilled demand at the end of year 4. Determine the optimal annual production schedule for HiTec.
*4. The PackRat Outdoors Company owns three sports centers in downtown Little Rock. On Easter Day, bicycle riding is a desirable outdoors activity. The company owns a total of eight rental bikes to be allocated to the three centers with the objective of maximizing revenues. The demand for the bikes and the hourly rental cost to customers vary by location and are described by the following distributions:
                      Probability of demand
No. of bikes          Center 1    Center 2    Center 3
0                     .1          .02         0
1                     .2          .03         .15
2                     .3          .10         .25
3                     .2          .25         .30
4                     .1          .30         .15
5                     .1          .15         .10
6                     0           .05         .025
7                     0           .05         .025
8                     0           .05         0
Rental cost/hr ($)    6           7           5
How should PackRat allocate the eight bikes to the three centers?
22.3 MAXIMIZATION OF THE EVENT OF ACHIEVING A GOAL

Section 22.2 deals with maximizing the expected return. Another useful objective is the maximization of the probability of achieving a certain level of return. We use the investment situation in Section 22.2 to illustrate the application of the new criterion.

As in Section 22.2, the definitions of the stage, i, alternative, \(y_i\), and state, \(x_i\), remain the same. The new criterion maximizes the probability of realizing a sum of money, S, at the end of n years. Define

\(f_i(x_i)\) = Probability of realizing the amount S given that \(x_i\) is the amount of funds available at the start of year i and that an optimal policy is implemented for years i, i + 1, ..., and n

The DP recursive equation is thus given as

\[
\begin{aligned}
f_n(x_n) &= \max_{0 \le y_n \le x_n} \left\{ \sum_{k=1}^{m} p_k P\{x_n + r_k y_n \ge S\} \right\} \\
f_i(x_i) &= \max_{0 \le y_i \le x_i} \left\{ \sum_{k=1}^{m} p_k f_{i+1}(x_i + r_k y_i) \right\}, \quad i = 1, 2, \ldots, n - 1
\end{aligned}
\]

The recursive formula is based on the conditional probability law

\[
P\{A\} = \sum_{k=1}^{m} P\{A \mid B_k\} P\{B_k\}
\]

In this case, \(p_k\) plays the role of \(P\{B_k\}\) and \(f_{i+1}(x_i + r_k y_i)\) plays the role of \(P\{A \mid B_k\}\).
Example 22.3-1
An individual wants to invest $2000. Available options include doubling the amount invested with probability .3 or losing all of it with probability .7. Investments are sold at the end of the year, and reinvestment, in whole or part, starts again at the beginning of the following year. The process is repeated for three consecutive years. The objective is to maximize the probability of realizing $4000 at the end of the third year. For simplicity, assume that all investments are in multiples of $1000.

Using the notation of the model, we say that \(r_1 = 1\) with probability .3, and \(r_2 = -1\) with probability .7.

Stage 3. At stage 3, the state \(x_3\) can be as small as $0 and as large as $8000. The minimum value is realized when the entire investment is lost, and the maximum value occurs when the investment is doubled at the end of each of the first 2 years. The recursive equation for stage 3 is thus written as

\[
f_3(x_3) = \max_{y_3 = 0, 1, \ldots, x_3} \{ .3 P\{x_3 + y_3 \ge 4\} + .7 P\{x_3 - y_3 \ge 4\} \}
\]

where \(x_3 = 0, 1, \ldots, 8\) (in thousands of dollars). Table 22.1 details the computations for stage 3. All the shaded entries (those with \(y_3 > x_3\)) are infeasible because they do not satisfy the condition \(y_3 \le x_3\). Also, in carrying out the computations, we notice that

\[
P\{x_3 + y_3 \ge 4\} = \begin{cases} 0, & \text{if } x_3 + y_3 < 4 \\ 1, & \text{otherwise} \end{cases}
\qquad
P\{x_3 - y_3 \ge 4\} = \begin{cases} 0, & \text{if } x_3 - y_3 < 4 \\ 1, & \text{otherwise} \end{cases}
\]
Although Table 22.1 shows that alternative optima exist for \(x_3\) = 1, 3, 5, 6, 7, and 8, the optimum (last) column provides only the smallest optimum \(y_3\). The assumption here is that the investor is not going to invest more than what is absolutely necessary to achieve the desired goal.

Stage 2

\[
f_2(x_2) = \max_{y_2 = 0, 1, \ldots, x_2} \{ .3 f_3(x_2 + y_2) + .7 f_3(x_2 - y_2) \}
\]

The associated computations are given in Table 22.2.

Stage 1

\[
f_1(x_1) = \max_{y_1 = 0, 1, 2} \{ .3 f_2(x_1 + y_1) + .7 f_2(x_1 - y_1) \}
\]

Table 22.3 provides the associated computations.

The optimum strategy is determined in the following manner: Given the initial investment \(x_1\) = $2000, stage 1 (Table 22.3) yields \(y_1 = 0\), which means that no investment should be made in year 1. The decision not to invest in year 1 leaves the investor with $2000 at the start of year 2. From stage 2 (Table 22.2), \(x_2 = 2\) yields \(y_2 = 0\), indicating once again that no investment should occur in year 2. Next, using \(x_3 = 2\), stage 3 (Table 22.1) shows that \(y_3 = 2\), which calls for investing the entire amount in year 3. The associated maximum probability for realizing the goal S = 4 is \(f_1(2) = .3\).
TABLE 22.1  Stage 3 computations, \(.3 P\{x_3 + y_3 \ge 4\} + .7 P\{x_3 - y_3 \ge 4\}\)

                            y3                                     Optimum
x3    0     1     2     3     4     5     6     7     8        f3    y3
0     0     -     -     -     -     -     -     -     -        0     0
1     0     0     -     -     -     -     -     -     -        0     0
2     0     0     .3    -     -     -     -     -     -        .3    2
3     0     .3    .3    .3    -     -     -     -     -        .3    1
4     1     .3    .3    .3    .3    -     -     -     -        1     0
5     1     1     .3    .3    .3    .3    -     -     -        1     0
6     1     1     1     .3    .3    .3    .3    -     -        1     0
7     1     1     1     1     .3    .3    .3    .3    -        1     0
8     1     1     1     1     1     .3    .3    .3    .3       1     0

Each entry is one of .3 × 0 + .7 × 0 = 0, .3 × 1 + .7 × 0 = .3, or .3 × 1 + .7 × 1 = 1. A dash marks an infeasible (shaded) entry with y3 > x3.
TABLE 22.2  Stage 2 computations, \(.3 f_3(x_2 + y_2) + .7 f_3(x_2 - y_2)\)

                  y2                       Optimum
x2    0     1      2      3     4        f2     y2
0     0     -      -      -     -        0      0
1     0     .09    -      -     -        .09    1
2     .3    .09    .3     -     -        .3     0
3     .3    .51    .3     .3    -        .51    1
4     1     .51    .51    .3    .3       1      0

Each entry uses the values of f3 from Table 22.1. For example, x2 = 1 and y2 = 1 give .3 × .3 + .7 × 0 = .09, and x2 = 3 and y2 = 1 give .3 × 1 + .7 × .3 = .51. A dash marks an infeasible entry with y2 > x2.
TABLE 22.3  Stage 1 computations, \(.3 f_2(x_1 + y_1) + .7 f_2(x_1 - y_1)\)

                                                                                     Optimum
x1    y1 = 0                    y1 = 1                        y1 = 2                  f1    y1
2     .3 × .3 + .7 × .3 = .3    .3 × .51 + .7 × .09 = .216    .3 × 1 + .7 × 0 = .3    .3    0
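The three tables of Example 22.3-1 can be regenerated with a short program. The Python sketch below is illustrative only: the name `goal_dp`, its arguments, and the layout of the returned dictionary are assumptions made here. Amounts are in thousands of dollars, and ties are resolved in favor of the smallest investment, as in the tables.

```python
from typing import Dict, List, Tuple


def goal_dp(
    x1: int, goal: int, years: int, p_up: float
) -> Dict[int, Tuple[List[float], List[int]]]:
    """Maximize the probability of reaching `goal` (hypothetical helper).

    An invested amount doubles with probability p_up and is lost otherwise.
    Returns {year: (f, y)}, where f[x] is the maximum probability and y[x]
    the smallest optimal investment when x is available at the start of
    that year.
    """
    top_last = x1 * 2 ** (years - 1)  # largest state at the start of the last year
    # Terminal values: the goal is either reached (1) or not (0).
    f_next = [1.0 if x >= goal else 0.0 for x in range(2 * top_last + 1)]
    tables: Dict[int, Tuple[List[float], List[int]]] = {}
    for year in range(years, 0, -1):
        top = x1 * 2 ** (year - 1)  # largest reachable state this year
        f_cur: List[float] = []
        y_cur: List[int] = []
        for x in range(top + 1):
            best_val, best_y = -1.0, 0
            for y in range(x + 1):  # feasibility: 0 <= y <= x
                val = p_up * f_next[x + y] + (1.0 - p_up) * f_next[x - y]
                if val > best_val + 1e-12:  # strict improvement keeps the smallest y
                    best_val, best_y = val, y
            f_cur.append(best_val)
            y_cur.append(best_y)
        tables[year] = (f_cur, y_cur)
        f_next = f_cur
    return tables


tables = goal_dp(x1=2, goal=4, years=3, p_up=0.3)
for year in (3, 2, 1):
    f, y = tables[year]
    print(year, [round(v, 3) for v in f], y)
```

For year 1 and \(x_1 = 2\), the program gives a probability of about .3 with \(y_1 = 0\); the entries for years 2 and 3 match the optimum columns of Tables 22.2 and 22.1.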