The Conversion Procedure (Decimal To Floating Point)
The rules for converting a decimal number into floating point are as follows:
A. Convert the absolute value of the number to binary, perhaps with a fractional part after
the binary point. This can be done by converting the integral and fractional parts
separately. The integral part is converted with the techniques examined previously. The
fractional part can be converted by multiplication, which is basically the inverse of the
division method: we repeatedly multiply the fraction by 2, harvest the bit (0 or 1) that
appears left of the decimal point, and continue with the remaining fractional part until it
reaches zero or we have enough bits.
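The multiplication method in step A can be sketched in Python as follows. This is a minimal sketch, assuming the fraction is supplied as a decimal string so that the standard `fractions.Fraction` type stores it exactly; the function name `fraction_to_binary` and its `max_bits` cutoff are illustrative choices, not part of the procedure.

```python
from fractions import Fraction

def fraction_to_binary(frac, max_bits=12):
    """Binary digits of a fraction 0 <= frac < 1, found by repeated doubling.

    Stops when the fraction reaches zero or after max_bits digits,
    whichever comes first.
    """
    frac = Fraction(frac)
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)          # the digit now left of the point: 0 or 1
        bits.append(str(bit))
        frac -= bit              # keep only the fractional part
    return "".join(bits)

print(fraction_to_binary("0.625"))   # 101
print(fraction_to_binary("0.7"))     # 101100110011 (cut off; it never ends)
```

The cutoff matters because some decimal fractions, such as 0.7, never terminate in binary.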
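B. Append $\times 2^{0}$ to the end of the binary number (which does not change its value).
For example, $111001.1_2$ becomes $111001.1_2 \times 2^{0}$.
C. Normalize the number. Move the binary point so that exactly one bit, a 1, lies to its left.
Adjust the exponent of two so that the value does not change: each place the point moves
left adds one to the exponent, and each place it moves right subtracts one. For example,
$111001.1_2 \times 2^{0} = 1.110011_2 \times 2^{5}$.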
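Steps B and C just move the binary point and count how far it moved. A minimal Python sketch, assuming the number is already written as a positive binary string containing at least one 1 bit; `normalize` is a hypothetical helper name.

```python
def normalize(binary):
    """Normalize a positive binary string such as '111001.1'.

    Returns (significand, exponent) with exactly one 1 left of the point,
    so that value(binary) == value(significand) * 2**exponent.
    """
    if "." not in binary:
        binary += "."
    int_part, frac_part = binary.split(".")
    digits = int_part + frac_part        # all bits, with the point removed
    first_one = digits.index("1")        # raises ValueError for zero
    # The point sat after len(int_part) digits; it now sits just after
    # the first 1, so the exponent is the distance it moved left.
    exponent = len(int_part) - 1 - first_one
    rest = digits[first_one + 1:].rstrip("0")
    return "1." + (rest or "0"), exponent

print(normalize("111001.1"))   # ('1.110011', 5)
print(normalize("0.01101"))    # ('1.101', -2)
```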
D. Place the mantissa into the mantissa field of the number. Omit the leading one (a
normalized mantissa always begins with 1, so it need not be stored), and fill with zeros on
the right.
E. Add the bias to the exponent of two, and place it in the exponent field. The bias is
$2^{k-1} - 1$, where $k$ is the number of bits in the exponent field. For the eight-bit format,
$k = 3$, so the bias is $2^{3-1} - 1 = 3$. For IEEE 32-bit, $k = 8$, so the bias is $2^{8-1} - 1 = 127$.
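The bias formula and the resulting exponent field can be computed directly. In this sketch `bias` and `exponent_field` are hypothetical helper names, and no check is made that the biased exponent actually fits in $k$ bits.

```python
def bias(k):
    """Bias for a k-bit exponent field: 2**(k-1) - 1."""
    return 2 ** (k - 1) - 1

def exponent_field(exp, k):
    """Biased exponent written as a k-bit binary string (no range checking)."""
    return format(exp + bias(k), f"0{k}b")

print(bias(3), bias(8))        # 3 127
print(exponent_field(2, 3))    # 101
print(exponent_field(-4, 8))   # 01111011
```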
F. Set the sign bit, 1 for negative, 0 for positive, according to the sign of the original
number.
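The whole procedure can be condensed into a short Python sketch for the eight-bit format (1 sign bit, 3 exponent bits, 4 mantissa bits, bias 3). It is a minimal sketch under stated assumptions: the input is a nonzero decimal string, surplus mantissa bits are simply truncated rather than rounded, and exponent overflow or underflow is not checked. `encode8` is a hypothetical name.

```python
from fractions import Fraction

def encode8(value, exp_bits=3, man_bits=4):
    """Encode a nonzero decimal string in the 8-bit format
    (1 sign bit, 3 exponent bits, 4 mantissa bits, bias 3).

    Assumptions: surplus mantissa bits are truncated, not rounded,
    and exponent overflow/underflow is not checked.
    """
    x = Fraction(value)
    if x == 0:
        raise ValueError("zero has no normalized form")
    sign = 1 if x < 0 else 0                    # step F
    x = abs(x)
    # Steps A-C: choose exp so that 1 <= x / 2**exp < 2.
    exp = 0
    while x >= Fraction(2) ** (exp + 1):
        exp += 1
    while x < Fraction(2) ** exp:
        exp -= 1
    fraction = x / Fraction(2) ** exp - 1        # step D: drop the leading 1
    mantissa = int(fraction * 2 ** man_bits)     # keep man_bits bits (truncate)
    biased = exp + 2 ** (exp_bits - 1) - 1       # step E: add the bias
    return (sign << (exp_bits + man_bits)) | (biased << man_bits) | mantissa

print(f"{encode8('2.625'):08b}")   # 01000101
print(hex(encode8("1.7")))        # 0x3b
```

Finding the exponent by comparing against powers of two yields the same normalized form as writing out the binary digits and moving the point, while avoiding an endless digit loop for values such as 1.7.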
Using The Conversion Procedure
Convert 2.625 to our 8-bit floating point format.
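A. The integral part is easy: $2_{10} = 10_2$. For the fractional part, multiply by 2 repeatedly,
harvesting the bit left of the decimal point each time: $0.625 \times 2 = 1.25$ (bit 1),
$0.25 \times 2 = 0.5$ (bit 0), $0.5 \times 2 = 1.0$ (bit 1). The fraction is now zero, so
$0.625_{10} = 0.101_2$.
B. So $2.625_{10} = 10.101_2$.
C. Normalize: $10.101_2 = 1.0101_2 \times 2^{1}$.
D. Mantissa is 0101, exponent is $1 + 3 = 4 = 100_2$, sign bit is 0.
The result is 01000101 = $45_{16}$.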
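Convert -4.75 to our 8-bit floating point format.
A. The integral part is $4_{10} = 100_2$. For the fractional part: $0.75 \times 2 = 1.5$ (bit 1),
$0.5 \times 2 = 1.0$ (bit 1).
B. So $4.75_{10} = 100.11_2$.
C. Normalize: $100.11_2 = 1.0011_2 \times 2^{2}$.
D. Mantissa is 0011, exponent is $2 + 3 = 5 = 101_2$, sign bit is 1.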
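Convert 0.40625 to our 8-bit floating point format.
A. The integral part is 0. For the fractional part: $0.40625 \times 2 = 0.8125$ (bit 0),
$0.8125 \times 2 = 1.625$ (bit 1), $0.625 \times 2 = 1.25$ (bit 1), $0.25 \times 2 = 0.5$ (bit 0),
$0.5 \times 2 = 1.0$ (bit 1).
B. So $0.40625_{10} = 0.01101_2$.
C. Normalize: $0.01101_2 = 1.101_2 \times 2^{-2}$.
D. Mantissa is 1010, exponent is $-2 + 3 = 1 = 001_2$, sign bit is 0.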
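Convert 1.7 to our 8-bit floating point format.
A. The integral part is $1_{10} = 1_2$. For the fractional part: $0.7 \times 2 = 1.4$ (bit 1),
$0.4 \times 2 = 0.8$ (bit 0), $0.8 \times 2 = 1.6$ (bit 1), $0.6 \times 2 = 1.2$ (bit 1),
$0.2 \times 2 = 0.4$ (bit 0), $0.4 \times 2 = 0.8$ (bit 0), and so on.
B. The reason the process seems to continue endlessly is that it does: once the fraction
returns to 0.4, the bits 0110 repeat forever, so $0.7_{10} = 0.1\overline{0110}_2$. The
number 7/10, which makes a perfectly reasonable decimal fraction, is a repeating
fraction in binary, just as the fraction 1/3 is a repeating fraction in decimal. (1/3
repeats in binary as well.) We cannot represent this exactly as a floating point
number. The closest we can come in four bits is .1011. Since we already have a
leading 1, the best eight-bit number we can make is 1.1011.
C. Already normalized: $1.1011_2 = 1.1011_2 \times 2^{0}$.
D. Mantissa is 1011, exponent is $0 + 3 = 3 = 011_2$, sign bit is 0.
The result is 00111011 = $3b_{16}$. This is not exact, of course. If you convert it back to
decimal, you get 1.6875.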
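The size of the error in this example can be checked with exact rational arithmetic. This is a small sketch using Python's standard `fractions` module; the last line is an aside showing that Python's own 64-bit floats cannot store 7/10 exactly either, only with a much smaller error.

```python
from fractions import Fraction

# The stored pattern 1.1011 (binary) is 1 + 1/2 + 1/8 + 1/16.
stored = Fraction(1) + Fraction(1, 2) + Fraction(1, 8) + Fraction(1, 16)
error = Fraction("1.7") - stored

print(stored, float(stored))   # 27/16 1.6875
print(error, float(error))     # 1/80 0.0125

# Python's own 64-bit floats cannot hold 7/10 exactly either:
print(Fraction(0.7) == Fraction(7, 10))   # False
```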
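Convert 0.1015625 to the IEEE 32-bit floating point format.
A. The integral part is 0. For the fractional part: $0.1015625 \times 2 = 0.203125$ (bit 0),
$0.203125 \times 2 = 0.40625$ (bit 0), $0.40625 \times 2 = 0.8125$ (bit 0),
$0.8125 \times 2 = 1.625$ (bit 1), $0.625 \times 2 = 1.25$ (bit 1), $0.25 \times 2 = 0.5$ (bit 0),
$0.5 \times 2 = 1.0$ (bit 1).
B. So $0.1015625_{10} = 0.0001101_2$.
C. Normalize: $0.0001101_2 = 1.101_2 \times 2^{-4}$.
D. Mantissa is 10100000000000000000000, exponent is $-4 + 127 = 123 =
01111011_2$, sign bit is 0.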
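Convert 39887.5625 to the IEEE 32-bit floating point format.
A. The integral part is $39887_{10} = 1001101111001111_2$. For the fractional part:
$0.5625 \times 2 = 1.125$ (bit 1), $0.125 \times 2 = 0.25$ (bit 0), $0.25 \times 2 = 0.5$ (bit 0),
$0.5 \times 2 = 1.0$ (bit 1).
B. So $39887.5625_{10} = 1001101111001111.1001_2$.
C. Normalize: $1001101111001111.1001_2 = 1.0011011110011111001_2 \times 2^{15}$.
D. Mantissa is 00110111100111110010000, exponent is $15 + 127 = 142 =
10001110_2$, sign bit is 0.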
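For the IEEE 32-bit examples, the hand results can be cross-checked with Python's standard `struct` module, whose `'>f'` format packs a value as a big-endian single-precision float. The helper names `float32_hex` and `fields` are illustrative. `struct` rounds to the nearest single when a value is not exactly representable; both values here are exact, so no rounding occurs.

```python
import struct

def float32_hex(x):
    """IEEE 754 single-precision encoding of x as 8 hex digits (big-endian)."""
    return struct.pack(">f", x).hex()

def fields(hex_word):
    """Split 8 hex digits into (sign, exponent, mantissa) bit strings."""
    bits = format(int(hex_word, 16), "032b")
    return bits[0], bits[1:9], bits[9:]

for x in (0.1015625, 39887.5625):
    h = float32_hex(x)
    print(x, h, fields(h))
```

The exponent and mantissa strings printed should match step D of the two examples.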
If the binary exponent is very large or very small, you can convert the mantissa directly to
decimal without denormalizing. Then use a calculator to raise two to the exponent and
perform the multiplication. This gives an approximate answer, which is sufficient in most cases.
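A program can take the same shortcut: keep the mantissa as a number between 1 and 2 and scale it by a power of two. Below is a minimal Python sketch, assuming normalized IEEE 32-bit values given as hex strings; `decode_single` is a hypothetical helper. The standard `math.ldexp(m, e)` computes $m \times 2^{e}$, and `struct.unpack` with format `'>f'` is used only as an independent cross-check, here applied to two of the 32-bit words converted in this section.

```python
import math
import struct

def decode_single(hex_word):
    """Decode 8 hex digits as an IEEE 32-bit float without denormalizing.

    value = (-1)**sign * (1 + fraction) * 2**(exponent field - 127)
    Assumes a normalized number (exponent field neither all 0s nor all 1s).
    """
    word = int(hex_word, 16)
    sign = word >> 31
    exponent = ((word >> 23) & 0xFF) - 127
    mantissa = 1 + (word & 0x7FFFFF) / 2 ** 23   # restore the implied leading 1
    return (-1) ** sign * math.ldexp(mantissa, exponent)

for h in ("a3358000", "76650000"):
    reference = struct.unpack(">f", bytes.fromhex(h))[0]
    print(h, decode_single(h), decode_single(h) == reference)
```

Unlike a calculator, `math.ldexp` is exact whenever the result is representable as a double, so both comparisons print True.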
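$$
\begin{array}{l|ccccc}
\text{Exponents} & 2^{3} & 2^{2} & 2^{1} & 2^{0} & 2^{-1} \\
\text{Place values} & 8 & 4 & 2 & 1 & 0.5 \\
\text{Bits} & 1 & 0 & 1 & 1 & .1
\end{array}
$$
Value: $8 + 2 + 1 + 0.5 = 11.5$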
E. Sign: negative.
Result: e7 is -11.5
E. Sign: positive
Result: 26 is 0.6875.
E. Sign: negative
Result: d3 is -4.75.
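The three results above can be checked mechanically by reversing the encoding: separate the fields, subtract the bias of 3, restore the implied leading 1, scale by a power of two, and apply the sign. This is a minimal Python sketch, assuming the 1-3-4 bit layout used throughout this section; `decode8` is a hypothetical name.

```python
def decode8(byte, exp_bits=3, man_bits=4):
    """Decode one byte of the 8-bit format (1 sign, 3 exponent, 4 mantissa bits).

    Assumes a normalized value: zero and any reserved bit patterns are
    not treated specially.
    """
    sign = (byte >> (exp_bits + man_bits)) & 1
    bias = 2 ** (exp_bits - 1) - 1
    exponent = ((byte >> man_bits) & (2 ** exp_bits - 1)) - bias
    mantissa = 1 + (byte & (2 ** man_bits - 1)) / 2 ** man_bits  # restore leading 1
    value = mantissa * 2.0 ** exponent
    return -value if sign else value

for b in (0xE7, 0x26, 0xD3):
    print(f"{b:02x} is {decode8(b)}")   # e7 is -11.5, 26 is 0.6875, d3 is -4.75
```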
c. Convert the 32-bit floating point number 44361000 (in hex) to decimal.
A. Convert and separate: $44361000_{16}$ = 0 10001000 01101100001000000000000 in binary
(sign, exponent, mantissa).
B. Exponent: $10001000_2 = 136_{10}$; $136 - 127 = 9$.
C. Denormalize: $1.01101100001_2 \times 2^{9} = 1011011000.01_2$.
D. Convert: $1011011000.01_2 = 512 + 128 + 64 + 16 + 8 + 0.25 = 728.25$.
E. Sign: positive
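Steps C and D of this example can be sketched in Python: shift the point of 1.mantissa right by the exponent, then add up the place values. This is a minimal sketch, assuming a non-negative exponent no larger than the number of mantissa bits supplied; `denormalize` and `binary_to_decimal` are hypothetical helper names.

```python
from fractions import Fraction

def denormalize(mantissa_bits, exponent):
    """Write 1.<mantissa_bits> x 2**exponent as a plain binary string.

    Assumes 0 <= exponent <= len(mantissa_bits).
    """
    digits = "1" + mantissa_bits
    return digits[: exponent + 1] + "." + digits[exponent + 1:]

def binary_to_decimal(binary):
    """Sum place values: bits left of the point weigh 2**0, 2**1, ...;
    bits right of it weigh 2**-1, 2**-2, ..."""
    int_part, frac_part = binary.split(".")
    value = Fraction(int(int_part, 2))
    for i, bit in enumerate(frac_part, start=1):
        if bit == "1":
            value += Fraction(1, 2 ** i)
    return value

b = denormalize("01101100001", 9)
print(b, float(binary_to_decimal(b)))   # 1011011000.01 728.25
```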
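$$
\begin{array}{l|cccccccc}
\text{Exponents} & 2^{0} & 2^{-1} & 2^{-2} & 2^{-3} & 2^{-4} & 2^{-5} & 2^{-6} & 2^{-7} \\
\text{Place values} & 1 & 0.5 & 0.25 & 0.125 & 0.0625 & 0.03125 & 0.015625 & 0.0078125 \\
\text{Bits} & 0 & .0 & 0 & 1 & 1 & 0 & 1 & 1
\end{array}
$$
Value: $0.125 + 0.0625 + 0.015625 + 0.0078125 = 0.2109375$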
E. Sign: negative
e. Convert the 32-bit floating point number a3358000 (in hex) to decimal.
A. Convert and separate: $a3358000_{16}$ = 1 01000110 01101011000000000000000 in binary
(sign, exponent, mantissa).
B. Exponent: $01000110_2 = 70_{10}$; $70 - 127 = -57$.
C. Since the exponent is far from zero, convert the original (normalized) mantissa:
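$$
\begin{array}{l|ccccccccc}
\text{Exponents} & 2^{0} & 2^{-1} & 2^{-2} & 2^{-3} & 2^{-4} & 2^{-5} & 2^{-6} & 2^{-7} & 2^{-8} \\
\text{Place values} & 1 & 0.5 & 0.25 & 0.125 & 0.0625 & 0.03125 & 0.015625 & 0.0078125 & 0.00390625 \\
\text{Bits} & 1 & .0 & 1 & 1 & 0 & 1 & 0 & 1 & 1
\end{array}
$$
Value: $1 + 0.25 + 0.125 + 0.03125 + 0.0078125 + 0.00390625 = 1.41796875$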
D. Use a calculator to find $1.41796875 \times 2^{-57}$. You should get something like
$9.83913471531 \times 10^{-18}$.
E. Sign: negative
f. Convert the 32-bit floating point number 76650000 (in hex) to decimal.
A. Convert and separate: $76650000_{16}$ = 0 11101100 11001010000000000000000 in binary
(sign, exponent, mantissa).
B. Exponent: $11101100_2 = 236_{10}$; $236 - 127 = 109$.
C. Since the exponent is far from zero, convert the original (normalized) mantissa:
$1.1100101_2 = 1 + 0.5 + 0.25 + 0.03125 + 0.0078125 = 1.7890625$.
D. Use a calculator to find $1.7890625 \times 2^{109}$. You should get something like
$1.16116794981 \times 10^{33}$.
E. Sign: positive