Friendly Introduction To Numerical Analysis 1st Edition Bradie Solutions Manual PDF
1. Provide the floating point equivalent for each of the following numbers from the
floating point number system F(10, 4, 0, 4). Consider both chopping and round-
ing. Compute the absolute and relative error in each floating point equivalent.
(a) π            (b) e
(c) √2           (d) 1/7
(e) cos 22°      (f) ln 10
(g) ∛9
In the following table, δ denotes the absolute error and ε the relative error.

                        Chopping                                    Rounding
y          fl(y)    δ               ε                fl(y)    δ               ε
π          3.141    5.927 × 10^-4   1.886 × 10^-4    3.142    4.073 × 10^-4   1.297 × 10^-4
e          2.718    2.818 × 10^-4   1.037 × 10^-4    2.718    2.818 × 10^-4   1.037 × 10^-4
√2         1.414    2.136 × 10^-4   1.510 × 10^-4    1.414    2.136 × 10^-4   1.510 × 10^-4
1/7        0.1428   5.714 × 10^-5   4.000 × 10^-4    0.1429   4.286 × 10^-5   3.000 × 10^-4
cos 22°    0.9271   8.385 × 10^-5   9.044 × 10^-5    0.9272   1.615 × 10^-5   1.741 × 10^-5
ln 10      2.302    5.851 × 10^-4   2.541 × 10^-4    2.303    4.149 × 10^-4   1.802 × 10^-4
∛9         2.080    8.382 × 10^-5   4.030 × 10^-5    2.080    8.382 × 10^-5   4.030 × 10^-5
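The table entries can be reproduced with a short script. The following is a minimal sketch, assuming a decimal system with k = 4 significant digits and ignoring the exponent limits of F(10, 4, 0, 4), which none of the seven numbers exceed. The helper names `fl` and `errors` are hypothetical.

```python
import math
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def fl(y, k=4, chop=False):
    """Return y chopped or rounded to k significant decimal digits."""
    d = Decimal(y)
    # Place value of the k-th significant digit, e.g. 1E-3 for 3.14159...
    quantum = Decimal(1).scaleb(d.adjusted() - k + 1)
    mode = ROUND_DOWN if chop else ROUND_HALF_UP
    return float(d.quantize(quantum, rounding=mode))


def errors(y, approx):
    """Return (absolute error, relative error) of approx relative to y."""
    delta = abs(approx - y)
    return delta, delta / abs(y)


values = {
    "pi": math.pi,
    "e": math.e,
    "sqrt(2)": math.sqrt(2),
    "1/7": 1 / 7,
    "cos 22 deg": math.cos(math.radians(22)),
    "ln 10": math.log(10),
    "cbrt(9)": 9 ** (1 / 3),
}

for name, y in values.items():
    for label, chop in (("chop", True), ("round", False)):
        approx = fl(y, chop=chop)
        delta, eps = errors(y, approx)
        print(f"{name:10s} {label:5s} fl = {approx:<7} delta = {delta:.3e} eps = {eps:.3e}")
```

Using `Decimal` for the digit selection avoids the binary rounding surprises that can occur when scaling a float by a power of ten before truncating.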
2. Prove the bounds on the absolute and relative roundoff error associated with
rounding:
\[ |fl_{\text{round}}(y) - y| \le \frac{1}{2}\beta^{e-k} \quad\text{and}\quad \frac{|fl_{\text{round}}(y) - y|}{|y|} \le \frac{1}{2}\beta^{1-k}. \]
Consider the floating point system $F(\beta, k, m, M)$ with rounding. Let $y$ be a real
number whose expansion is given by
\[ y = \pm(0.d_1 d_2 d_3 \cdots d_k d_{k+1} \cdots)_\beta \times \beta^e \]
Section 1.3: Floating Point Number Systems
3. Show that machine precision is the smallest floating point number, $v$, such that
$fl(1 + v) > 1$.
First consider the floating point number system $F(\beta, k, m, M)$ with chopping. The
number one is represented by the expansion
\[ (0.1\underbrace{00\cdots00}_{k-1\text{ zeros}})_\beta \times \beta^1. \]
If we let
\[ v = u = \beta^{1-k} = (0.1\underbrace{00\cdots00}_{k-1\text{ zeros}})_\beta \times \beta^{2-k} = (0.\underbrace{00\cdots00}_{k-1\text{ zeros}}1\underbrace{00\cdots00}_{k-1\text{ zeros}})_\beta \times \beta^1, \]
then
\[ 1 + v = (0.1\underbrace{00\cdots00}_{k-2\text{ zeros}}1\underbrace{00\cdots00}_{k-1\text{ zeros}})_\beta \times \beta^1 \]
and
\[ fl_{\text{chop}}(1 + v) = 1.00\cdots001 > 1. \]
If we assign to $v$ any value smaller than $u$, then the $k$th digit in the mantissa of
$1 + v$ is zero and $fl_{\text{chop}}(1 + v) = 1$. Thus, with chopping, machine precision is
the smallest floating point number, $v$, such that $fl(1 + v) > 1$.
Now, consider the floating point number system $F(\beta, k, m, M)$ with rounding. For
notational convenience, let $d$ denote $\beta/2$. If we take
\[ v = u = \frac{1}{2}\beta^{1-k} = (0.d\underbrace{00\cdots00}_{k-1\text{ zeros}})_\beta \times \beta^{1-k} = (0.\underbrace{00\cdots00}_{k\text{ zeros}}d\underbrace{00\cdots00}_{k-1\text{ zeros}})_\beta \times \beta^1, \]
then
\[ 1 + v = (0.1\underbrace{00\cdots00}_{k-1\text{ zeros}}d\underbrace{00\cdots00}_{k-1\text{ zeros}})_\beta \times \beta^1 \]
and
\[ fl_{\text{round}}(1 + v) = 1.00\cdots001 > 1. \]
If we assign to $v$ any value smaller than $u$, then the $(k + 1)$st digit in the mantissa
of $1 + v$ is smaller than $\beta/2$ and $fl_{\text{round}}(1 + v) = 1$. Thus, with rounding, machine
precision is the smallest floating point number, $v$, such that $fl(1 + v) > 1$.
(a) Assuming the floating point system uses rounding, here is an algorithm to
determine machine precision. Multiplication by β is performed in the output
step because the while loop terminates when one too many divisions by β
have been carried out.
GIVEN: base β
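The steps of the algorithm are cut off in this copy. As a minimal sketch of the procedure the paragraph describes (divide by β until 1 + v is no longer greater than 1, then multiply by β once in the output step), the following assumes Python floats, which are IEEE double precision on common platforms, and β = 2 by default. The function name `machine_precision` is hypothetical and this is not necessarily the manual's own listing.

```python
def machine_precision(beta=2.0):
    """Estimate machine precision by repeated division by beta."""
    v = 1.0
    # The loop stops only after one division too many, when 1 + v == 1.
    while 1.0 + v > 1.0:
        v /= beta
    # Undo the extra division in the output step.
    return v * beta


print(machine_precision())  # 2**-52 on IEEE double hardware
```

On IEEE double hardware the returned value is 2^-52, about 2.22045 × 10^-16, which is the figure used in part (c).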
(c) In general, machine precision with rounding is $\frac{1}{2}\beta^{1-k}$ and the smallest positive
number is $(0.1)_\beta \times \beta^m = \beta^{m-1}$. Assuming $\beta = 2$, we solve $2^{-k} = 2.22045 \times 10^{-16}$
to find $k = 52$ in both single and double precision on the SunBlade 100.
In single precision, we solve $2^{m-1} = 1.4013 \times 10^{-45}$ to find $m = -148$; in
double precision, the equation $2^{m-1} = 4.94066 \times 10^{-324}$ yields $m = -1073$.
5. Determine machine precision, the smallest positive number and the largest pos-
itive number for the floating point number system used by your calculator.
Assuming the calculator uses β = 10, determine the values for k, m and M.
6. Determine the number of significant decimal digits and the number of significant
binary digits to which each of the following pairs of numbers agree.
(a) 355/113 and π
(b) 685/252 and e
(c) $\sqrt{10002}$ and $\sqrt{10001}$
(d) 103/280 and 1/e
(a) Because
\[ \frac{|355/113 - \pi|}{|\pi|} = 8.491 \times 10^{-8} \]
and
\[ 10^{-8} < 8.491 \times 10^{-8} \le 10^{-7}, \]
it follows that $355/113$ and $\pi$ agree to at least 7 and at most 8 decimal digits.
Since
\[ 2^{-24} = 5.960 \times 10^{-8} < 8.491 \times 10^{-8} < 1.192 \times 10^{-7} = 2^{-23}, \]
we see that $355/113$ and $\pi$ agree to at least 23 and at most 24 binary digits.
(b) Because
\[ \frac{|685/252 - e|}{|e|} = 1.025 \times 10^{-5} \]
and
\[ 10^{-5} < 1.025 \times 10^{-5} \le 10^{-4}, \]
it follows that $685/252$ and $e$ agree to at least 4 and at most 5 decimal digits.
Since
\[ 2^{-17} = 7.629 \times 10^{-6} < 1.025 \times 10^{-5} < 1.526 \times 10^{-5} = 2^{-16}, \]
we see that $685/252$ and $e$ agree to at least 16 and at most 17 binary digits.
(c) Because
\[ \frac{|\sqrt{10002} - \sqrt{10001}|}{|\sqrt{10001}|} = 4.999 \times 10^{-5} \]
and
\[ 10^{-5} < 4.999 \times 10^{-5} \le 10^{-4}, \]
it follows that $\sqrt{10002}$ and $\sqrt{10001}$ agree to at least 4 and at most 5 decimal
digits. Since
\[ 2^{-15} = 3.052 \times 10^{-5} < 4.999 \times 10^{-5} < 6.103 \times 10^{-5} = 2^{-14}, \]
we see that $\sqrt{10002}$ and $\sqrt{10001}$ agree to at least 14 and at most 15 binary
digits.
(d) Because
\[ \frac{|103/280 - 1/e|}{|1/e|} = 6.061 \times 10^{-5} \]
and
\[ 10^{-5} < 6.061 \times 10^{-5} \le 10^{-4}, \]
it follows that $103/280$ and $1/e$ agree to at least 4 and at most 5 decimal digits.
Since
\[ 2^{-15} = 3.052 \times 10^{-5} < 6.061 \times 10^{-5} < 6.103 \times 10^{-5} = 2^{-14}, \]
we see that $103/280$ and $1/e$ agree to at least 14 and at most 15 binary digits.
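The bracketing used in parts (a) through (d) can be checked numerically. The sketch below computes the relative error and the exponents t for which 10^-(t+1) < error ≤ 10^-t and 2^-(t+1) < error ≤ 2^-t, assuming double precision arithmetic is accurate enough for these four pairs. The function name `agreement` is hypothetical.

```python
import math


def agreement(approx, exact):
    """Return (relative error, decimal digits, binary digits) of agreement.

    The digit counts are the "at least" values; the "at most" values
    are one larger in each case.
    """
    rel = abs(approx - exact) / abs(exact)
    decimal_digits = math.floor(-math.log10(rel))
    binary_digits = math.floor(-math.log2(rel))
    return rel, decimal_digits, binary_digits


pairs = {
    "355/113 vs pi": (355 / 113, math.pi),
    "685/252 vs e": (685 / 252, math.e),
    "sqrt(10002) vs sqrt(10001)": (math.sqrt(10002), math.sqrt(10001)),
    "103/280 vs 1/e": (103 / 280, 1 / math.e),
}

for name, (approx, exact) in pairs.items():
    rel, dec, binary = agreement(approx, exact)
    print(f"{name}: rel = {rel:.3e}, decimal {dec}-{dec + 1}, binary {binary}-{binary + 1}")
```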
7. The ideal gas law states that $PV = nRT$, where $P$ is the pressure of the gas, $V$
is the volume, $n$ is the number of moles, $T$ is the temperature and $R = 0.08206$
atm·m^3/(moles·K) is the universal gas constant.
(a) Experimentally, it has been determined that $P = 0.750$ atm, $V = 1.15$ m^3
and $T = 294.1$ K. Assuming that all values have been rounded to the digits
shown, in what range of values does $n$ fall?
(b) Experimentally, it has been determined that $V = 0.331$ m^3, $n = 0.00712$
moles and $T = 264.7$ K. Assuming that all values have been rounded to the
digits shown, in what range of values does $P$ fall?
(a) With
\begin{align*}
0.7495\ \text{atm} &< P < 0.7505\ \text{atm} \\
1.145\ \text{m}^3 &< V < 1.155\ \text{m}^3 \\
294.05\ \text{K} &< T < 294.15\ \text{K}
\end{align*}
it follows from the ideal gas law that
\[ \frac{(0.7495)(1.145)}{(0.08206)(294.15)} < n < \frac{(0.7505)(1.155)}{(0.08206)(294.05)} \]
or 0.03555 moles < $n$ < 0.03592 moles.
(b) With
\begin{align*}
0.3305\ \text{m}^3 &< V < 0.3315\ \text{m}^3 \\
0.007115\ \text{moles} &< n < 0.007125\ \text{moles} \\
264.65\ \text{K} &< T < 264.75\ \text{K}
\end{align*}
it follows from the ideal gas law that
\[ \frac{(0.007115)(0.08206)(264.65)}{0.3315} < P < \frac{(0.007125)(0.08206)(264.75)}{0.3305} \]
or 0.46612 atm < $P$ < 0.46836 atm.
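The two ranges above come from pairing the smallest numerator with the largest denominator and vice versa. A minimal sketch of that bookkeeping follows, with each measured quantity passed as a (low, high) pair; the function names `n_range` and `p_range` are hypothetical.

```python
R = 0.08206  # gas constant as given in the problem statement


def n_range(P, V, T):
    """Bounds on n = PV/(RT); each argument is a (low, high) pair."""
    low = P[0] * V[0] / (R * T[1])
    high = P[1] * V[1] / (R * T[0])
    return low, high


def p_range(n, V, T):
    """Bounds on P = nRT/V; each argument is a (low, high) pair."""
    low = n[0] * R * T[0] / V[1]
    high = n[1] * R * T[1] / V[0]
    return low, high


print(n_range((0.7495, 0.7505), (1.145, 1.155), (294.05, 294.15)))
print(p_range((0.007115, 0.007125), (0.3305, 0.3315), (264.65, 264.75)))
```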
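(a) With
\begin{align*}
7.75\ \text{cm} &< \text{length} < 7.85\ \text{cm} \\
3.05\ \text{cm} &< \text{width} < 3.15\ \text{cm} \\
4.15\ \text{cm} &< \text{depth} < 4.25\ \text{cm}
\end{align*}
it follows that
\[ 98.095625\ \text{cm}^3 < \text{volume} < 105.091875\ \text{cm}^3. \]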
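and we determined in part (a) that $98.095625\ \text{cm}^3 < \text{volume} < 105.091875\ \text{cm}^3$,
so
\[ 2.32\ \frac{\text{grams}}{\text{cm}^3} = \frac{243.625}{105.091875} < \text{density} < \frac{243.275}{98.095625} = 2.48\ \frac{\text{grams}}{\text{cm}^3}. \]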
it follows that
\[ 4\pi^2 \frac{1.145}{2.25^2} < g < 4\pi^2 \frac{1.155}{2.15^2}, \]
or $8.929\ \text{m/s}^2 < g < 9.864\ \text{m/s}^2$.
10. Determine machine precision, the smallest positive number and the largest posi-
tive number in the IEEE standard double precision system. Approximately how
many significant decimal digits does the double precision standard supply?
11. In addition to the standard single and double precision floating point systems,
Intel microprocessors also have an extended precision system F(2, 64, −16381, 16384).
Determine machine precision, the smallest positive number and the largest pos-
itive number for this extended precision system.
12. IBM System/390 mainframes provide three floating point number systems: short
precision F(16, 6, −64, 63), long precision F(16, 14, −64, 63) and extended preci-
sion F(16, 28, −64, 63). Compare machine precision, the smallest positive num-
ber and the largest positive number for each of these number systems.
In the short precision system $F(16, 6, -64, 63)$, machine precision with rounding is
\[ u = \frac{1}{2}16^{1-6} = 2^{-21} \approx 4.77 \times 10^{-7}, \]
while machine precision with rounding in the long precision system $F(16, 14, -64, 63)$
is
\[ u = \frac{1}{2}16^{1-14} = 2^{-53} \approx 1.11 \times 10^{-16}. \]
In the extended precision system $F(16, 28, -64, 63)$, machine precision with rounding
is
\[ u = \frac{1}{2}16^{1-28} = 2^{-109} \approx 1.54 \times 10^{-33}. \]
Accordingly, the short precision system provides between 6 and 7 significant decimal
digits, the long precision system provides between 15 and 16 significant decimal
digits and the extended precision system provides between 32 and 33 significant
decimal digits. In all three systems, the smallest positive number is
$\beta^{m-1} = 16^{-65}$. The largest positive numbers are
\[ (1 - 16^{-6}) \cdot 16^{63}, \quad (1 - 16^{-14}) \cdot 16^{63}, \quad\text{and}\quad (1 - 16^{-28}) \cdot 16^{63}, \]
respectively.
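The same three quantities can be computed for any system F(β, k, m, M). The sketch below uses exact rational arithmetic and assumes the formulas that appear in this section: machine precision with rounding (1/2)β^(1-k), smallest positive number β^(m-1), and largest positive number (1 - β^(-k))β^M. The function name `system_limits` is hypothetical.

```python
from fractions import Fraction


def system_limits(beta, k, m, M):
    """Return (u, smallest, largest) for F(beta, k, m, M) with rounding."""
    b = Fraction(beta)
    u = Fraction(1, 2) * b ** (1 - k)   # machine precision with rounding
    smallest = b ** (m - 1)             # (0.1)_beta x beta^m
    largest = (1 - b ** (-k)) * b ** M  # (0.dd...d)_beta x beta^M, d = beta - 1
    return u, smallest, largest


for name, k in (("short", 6), ("long", 14), ("extended", 28)):
    u, smallest, largest = system_limits(16, k, -64, 63)
    print(f"{name:9s} u = {float(u):.3e}  smallest = {float(smallest):.3e}  "
          f"largest = {float(largest):.3e}")
```

Exact fractions are used because a factor such as 1 - 16^(-28) is indistinguishable from 1 in double precision; the conversion to float happens only for display.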
\[ u = \frac{1}{2}10^{1-10} = 5 \times 10^{-10}. \]
14. (a) Show that the number of elements in the set $F(\beta, k, m, M)$ is given by
\[ 1 + 2(\beta - 1)\beta^{k-1}(M - m + 1). \]
(b) How many elements are in the IEEE standard single precision number
system?
(c) How many elements are in the IEEE standard double precision number
system?
(b) IEEE standard single precision is the system $F(2, 24, -125, 128)$. Therefore,
the IEEE standard single precision number system has
\[ 1 + 2(2 - 1)2^{24-1}(128 - (-125) + 1) = 4{,}261{,}412{,}865 \approx 4.26 \times 10^{9} \]
elements.
(c) IEEE standard double precision is the system $F(2, 53, -1021, 1024)$. Therefore,
the IEEE standard double precision number system has
\[ 1 + 2(2 - 1)2^{53-1}(1024 - (-1021) + 1) = 18{,}428{,}729{,}675{,}200{,}069{,}633 \approx 1.84 \times 10^{19} \]
elements.
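Because Python integers have arbitrary precision, the counting formula from part (a) can be evaluated exactly. This is a small sketch; the function name `count_elements` is hypothetical.

```python
def count_elements(beta, k, m, M):
    """Number of elements in F(beta, k, m, M), including zero."""
    return 1 + 2 * (beta - 1) * beta ** (k - 1) * (M - m + 1)


print(f"single: {count_elements(2, 24, -125, 128):,}")
print(f"double: {count_elements(2, 53, -1021, 1024):,}")
```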
(b) Suppose we were to change the constant term to $4 - 10^{-8}$. What are
the zeros of this new function? Relative to the size of the change in the
constant term, how big is the change in the zeros of the function?
(c) Now, suppose we were to change the constant term to $4 + 10^{-8}$. What are
the zeros of this new function? Relative to the size of the change in the
constant term, how big is the change in the zeros of the function?
Observe that the change in the zeros (±0.0001) is 10,000 times larger than
the change in the constant term in the function.
(c) Finally, consider the function $f(x) = x^2 - 4x + (4 + 10^{-8})$. By the quadratic
formula, the zeros of this new function are
\[ x = \frac{4 \pm \sqrt{16 - 4(4 + 10^{-8})}}{2} = 2 \pm 0.0001 \cdot i. \]
Observe that the change in the zeros ($\pm 0.0001 \cdot i$) is 10,000 times larger than
the change in the constant term in the function.
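The sensitivity of the zeros can be seen numerically. The sketch below applies the quadratic formula to x^2 - 4x + c for the two perturbed constant terms, and also for c = 4, which is assumed here to be the unperturbed constant term since the original function is not shown in this excerpt. The function name `zeros` is hypothetical.

```python
import cmath


def zeros(c):
    """Zeros of x^2 - 4x + c by the quadratic formula (complex-safe)."""
    root = cmath.sqrt(16 - 4 * c)
    return (4 + root) / 2, (4 - root) / 2


for c in (4.0, 4.0 - 1e-8, 4.0 + 1e-8):
    print(f"c = {c!r}: zeros = {zeros(c)}")
```

A change of 10^-8 in the constant term moves each zero by about 10^-4, along the real axis in one case and along the imaginary axis in the other.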
(a) Multiplying
\[ \frac{dx}{dt} + \frac{1}{t}x = \frac{\sin t}{t} \]
by $t$ yields
\[ t\frac{dx}{dt} + x = \sin t. \]
Note that the terms on the left-hand side of this latter equation are equal
to the derivative of the product $tx$. Integrating both sides of this equation
therefore produces
\[ tx = -\cos t + C \quad\text{or}\quad x(t) = \frac{C - \cos t}{t}, \]
where $C$ is a constant of integration. Using the initial condition $x(\pi/2) = x_0$,
we determine
\[ x_0 = \frac{C - 0}{\pi/2} \quad\text{or}\quad C = \frac{\pi x_0}{2}. \]
Hence, the solution of the initial value problem is
\[ x(t) = \frac{\pi x_0}{2t} - \frac{\cos t}{t}. \]
(b) The general solution to the differential equation remains
\[ x(t) = \frac{C - \cos t}{t}, \]
where $C$ is a constant of integration. Using the initial condition
$x(\pi/2) = x_0 + \epsilon$, we determine
\[ x_0 + \epsilon = \frac{C - 0}{\pi/2} \quad\text{or}\quad C = \frac{\pi(x_0 + \epsilon)}{2}. \]
Hence, the solution to the perturbed initial value problem is
\[ x(t) = \frac{\pi(x_0 + \epsilon)}{2t} - \frac{\cos t}{t}. \]
(c) The difference between the solutions obtained in parts (a) and (b) is
\[ \frac{\pi\epsilon}{2t}. \]
Because this difference decays to zero as $t \to \infty$, this problem is not
ill conditioned.
(a) Multiplying
\[ \frac{dx}{dt} - \frac{1}{t}x = t \sin t \]
by $t^{-1}$ yields
\[ \frac{1}{t}\frac{dx}{dt} - \frac{1}{t^2}x = \sin t. \]
Note that the terms on the left-hand side of this latter equation are equal to
the derivative of the product $t^{-1}x$. Integrating both sides of this equation
therefore produces
\[ \frac{x}{t} = -\cos t + C \quad\text{or}\quad x(t) = t(C - \cos t), \]
where $C$ is a constant of integration. Using the initial condition $x(\pi/2) = x_0$,
we determine
\[ x_0 = \frac{\pi}{2}(C - 0) \quad\text{or}\quad C = \frac{2x_0}{\pi}. \]
Hence, the solution of the initial value problem is
\[ x(t) = t\left(\frac{2x_0}{\pi} - \cos t\right). \]
(c) The difference between the solutions obtained in parts (a) and (b) is
\[ \frac{2\epsilon t}{\pi}. \]
Because this difference tends toward infinity as $t \to \infty$, meaning that a small
change in input data can result in a large change in the output, this problem
is ill conditioned.
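The contrast between the two initial value problems can be illustrated by evaluating the closed-form solutions derived above for an initial value and a slightly perturbed one. This is a sketch only: the values x0 = 1 and ε = 10^-6 are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math


def x_decaying(t, x0):
    """Solution x(t) = pi*x0/(2t) - cos(t)/t of the first problem."""
    return (math.pi * x0 / 2 - math.cos(t)) / t


def x_growing(t, x0):
    """Solution x(t) = t*(2*x0/pi - cos(t)) of the second problem."""
    return t * (2 * x0 / math.pi - math.cos(t))


x0, eps = 1.0, 1e-6  # illustrative values, not from the text

for t in (10.0, 1e3, 1e5):
    d1 = x_decaying(t, x0 + eps) - x_decaying(t, x0)
    d2 = x_growing(t, x0 + eps) - x_growing(t, x0)
    print(f"t = {t:8.0f}: decaying difference = {d1:.3e}, growing difference = {d2:.3e}")
```

The first difference shrinks like πε/(2t) while the second grows like 2εt/π, which matches the two conditioning conclusions.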
(a) Solve the system for the right-hand side vector $b = [\,3.2 \;\; 5.8\,]^T$.
(b) Solve the system for the right-hand side vector $b = [\,3.21 \;\; 5.79\,]^T$.
(c) Solve the system for the right-hand side vector $b = [\,3.1 \;\; 5.7\,]^T$.
(d) By considering the difference between the solutions obtained in parts (a),
(b), and (c), comment on the conditioning of this problem.
(a) If we multiply the first equation by 2 and the second equation by $-1.1$ and
then add, we obtain $0.02y = 0.02$. Thus, $y = 1$. Back substituting into either
of the original equations yields $x = 1$.
(b) Working as we did in part (a), we find the solution corresponding to the
right-hand side vector $b = [\,3.21 \;\; 5.79\,]^T$ is $x = -1.95$ and $y = 2.55$.
(c) Working as we did in part (a), we find the solution corresponding to the
right-hand side vector $b = [\,3.1 \;\; 5.7\,]^T$ is $x = 9.5$ and $y = -3.5$.
(d) Given that small changes to the right-hand side vector resulted in relatively
large changes to the solution vector, it appears that this problem is ill
conditioned.
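The three solutions can be reproduced with Cramer's rule. The coefficient matrix is not shown in this excerpt, so the sketch below assumes the system 1.1x + 2.1y = b1, 2x + 3.8y = b2. That system is inferred from the elimination step and the three stated solutions, and should be treated as an assumption rather than a quotation of the problem. The function name `solve2` is hypothetical.

```python
# Assumed coefficients, inferred from the worked solutions (not from the text).
A = ((1.1, 2.1),
     (2.0, 3.8))


def solve2(a, b):
    """Solve a 2x2 linear system a*[x, y] = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    y = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return x, y


for b in ((3.2, 5.8), (3.21, 5.79), (3.1, 5.7)):
    x, y = solve2(A, b)
    print(f"b = {b}: x = {x:.4f}, y = {y:.4f}")
```

The determinant of the assumed matrix is about -0.02, which is small relative to the entries and is consistent with the large swings in the solution.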