
CPE 431/531 Homework #6 Solution Fall 2016
The University of Alabama in Huntsville

ECE Department
CPE 431 01, CPE 531 01/91
Fall 2016
Homework #6 Solution
5.5.1(5), 5.5.2(5), 5.6.3(5), 5.7.1(10), 5.7.2(10), 5.7.3(15), 5.7.4(10), 5.7.6(20), 5.8.1(5), 5.8.2(5),
5.8.3(5), 5.9.2(5)
5.5 Media applications that play audio or video files are part of a class of workloads called
streaming workloads; i.e., they bring in large amounts of data but do not reuse much of it.
Consider a video streaming workload that accesses a 512 KiB working set sequentially with the
following address stream:
0, 2, 4, 6, 8, 10, 12, 14, 16, ...
5.5.1 Assume a 64-KiB direct-mapped cache with a 32-byte block. What is the miss rate for the
address stream above? How is this miss rate sensitive to the size of the cache or the working
set? How would you categorize the misses this workload is experiencing, based on the 3C
model?
If the stream represents byte addresses, the first address (0) will miss and bytes 0:31 will be
brought in, making the accesses to 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 hits. Then
32 will miss and 32:63 will be brought in. So the miss rate will be 1/16. The misses are
compulsory and depend only on the access pattern and the block size, not on the size of the cache
or the working set.
5.5.2 Re-compute the miss rate when the cache block size is 16 bytes, 64 bytes, and 128 bytes. What kind of
locality is this workload exploiting?

Miss rate for a 16-byte block is 1/8. By the same reasoning, a 64-byte block gives 1/32 and a 128-byte block gives 1/64.

This workload exploits spatial locality.
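
The miss rates in 5.5.1 and 5.5.2 can be checked with a small simulation. This is a minimal sketch, not part of the original solution. It assumes the stream keeps its stride of 2 bytes across the entire 512 KiB working set, and the function name and parameters are illustrative only.

```python
def stream_miss_rate(block_bytes, cache_bytes=64 * 1024,
                     working_set_bytes=512 * 1024, stride=2):
    """Simulate a direct-mapped cache on the byte stream 0, 2, 4, ...

    Assumption: the stream continues with the same stride through the
    whole working set (the exercise only lists its first few addresses).
    """
    num_lines = cache_bytes // block_bytes
    resident = [None] * num_lines      # block number held by each cache line
    misses = accesses = 0
    for addr in range(0, working_set_bytes, stride):
        block = addr // block_bytes    # memory block containing this byte
        line = block % num_lines       # direct-mapped placement
        if resident[line] != block:
            misses += 1
            resident[line] = block
        accesses += 1
    return misses / accesses


for block_bytes in (16, 32, 64, 128):
    print(block_bytes, stream_miss_rate(block_bytes))
```

Changing `cache_bytes` leaves the result unchanged under this assumption, which matches the observation that the misses are compulsory.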
5.6 In this exercise, we will look at the different ways capacity affects overall performance. In
general, cache access time is proportional to capacity. Assume that main memory accesses take
70 ns and that memory accesses are 36% of all instructions. The following table shows data for
L1 caches attached to each of two processors, P1 and P2.
      L1 Size   L1 Miss Rate   L1 Hit Time
P1    2 KiB     8.0%           0.66 ns
P2    4 KiB     6.0%           0.90 ns
5.6.3 Assuming a base CPI of 1.0 without any memory stalls, what is the total CPI for P1 and P2?
Which processor is faster?
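Hit time is included in base CPI. Each instruction makes 1.36 memory accesses (1 instruction fetch plus
0.36 data accesses), and the miss penalty in cycles is the 70 ns memory access divided by the cycle
time, taken here to be the L1 hit time.
CPI_P1 = 1.0 + 1.36 * (0.08 * 70 ns) / 0.66 ns = 12.54
CPI_P2 = 1.0 + 1.36 * (0.06 * 70 ns) / 0.90 ns = 7.35
P2 is faster: 7.35 * 0.90 ns = 6.6 ns per instruction, versus 12.54 * 0.66 ns = 8.3 ns for P1.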
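
The arithmetic for 5.6.3 can be reproduced with a short script. This is a sketch under the same assumption the solution makes, namely that each processor's clock cycle equals its L1 hit time. The names `total_cpi` and `results` are illustrative.

```python
MEM_ACCESS_NS = 70.0        # main memory access time from the exercise
ACCESSES_PER_INSTR = 1.36   # 1 instruction fetch + 0.36 data accesses


def total_cpi(miss_rate, hit_time_ns, base_cpi=1.0):
    # Assumption: cycle time equals the L1 hit time, so the miss
    # penalty in cycles is the memory access time over the hit time.
    miss_penalty_cycles = MEM_ACCESS_NS / hit_time_ns
    return base_cpi + ACCESSES_PER_INSTR * miss_rate * miss_penalty_cycles


processors = {"P1": (0.08, 0.66), "P2": (0.06, 0.90)}
results = {}
for name, (miss_rate, hit_time_ns) in processors.items():
    cpi = total_cpi(miss_rate, hit_time_ns)
    results[name] = (cpi, cpi * hit_time_ns)   # (CPI, ns per instruction)
    print(f"{name}: CPI = {cpi:.2f}, {cpi * hit_time_ns:.2f} ns per instruction")
```

Comparing time per instruction rather than CPI alone matters here, because the two processors run at different clock rates.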
5.7 This exercise examines the impact of different cache designs, specifically comparing associative
caches to the direct-mapped caches from Section 5.4. For this exercise, use the address stream
shown in Exercise 5.2.
5.7.1 Using the sequence of addresses given, show the final cache contents for a three-way set
associative cache with two-word blocks and a total size of 24 words. Use LRU replacement. For
each reference identify the index bits, the tag bits, the block offset bits, and if it is a hit or miss.
24 words x (1 block / 2 words) x (1 set / 3 blocks) = 4 sets, so Index = 2 bits, Block Offset = 1 bit.
Address   Tag      Index   Block Offset   Hit/Miss
3         0000 0   01      1              miss
180       1011 0   10      0              miss
43        0010 1   01      1              miss
2         0000 0   01      0              hit
191       1011 1   11      1              miss
88        0101 1   00      0              miss
190       1011 1   11      0              hit
14        0000 1   11      0              miss
181       1011 0   10      1              hit
44        0010 1   10      0              miss
186       1011 1   01      0              miss
253       1111 1   10      1              miss
Final cache contents:
*The last digit of each tag is a single bit, not a hex digit.

Set   Tag*           Data          Tag*           Data          Tag*           Data
0     0x0000 005 1   M[88..89]
1     0x0000 002 1   M[42..43]     0x0000 000 0   M[2..3]       0x0000 00B 1   M[186..187]
2     0x0000 002 1   M[44..45]     0x0000 00F 1   M[252..253]   0x0000 00B 0   M[180..181]
3     0x0000 00B 1   M[190..191]   0x0000 000 1   M[14..15]
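
The trace for 5.7.1 can be reproduced with a small set-associative LRU simulator. This is an illustrative sketch rather than the method used to produce the solution. The address list is the one shown in the trace, and `simulate_lru` is a hypothetical helper name.

```python
ADDRESSES = [3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253]


def simulate_lru(addresses, num_sets, ways, words_per_block):
    """Return (per-reference outcomes, final sets) for an LRU cache."""
    sets = [[] for _ in range(num_sets)]   # each list is ordered LRU -> MRU
    outcomes = []
    for addr in addresses:
        block = addr // words_per_block    # drop the block offset bits
        index = block % num_sets           # low bits of the block number
        tag = block // num_sets            # remaining high bits
        lines = sets[index]
        if tag in lines:
            lines.remove(tag)              # re-appended below as MRU
            outcomes.append("hit")
        else:
            if len(lines) == ways:
                lines.pop(0)               # evict the least recently used tag
            outcomes.append("miss")
        lines.append(tag)
    return outcomes, sets


outcomes, final_sets = simulate_lru(ADDRESSES, num_sets=4, ways=3,
                                    words_per_block=2)
for addr, outcome in zip(ADDRESSES, outcomes):
    print(f"{addr:3d}  {addr:08b}  {outcome}")
for index, tags in enumerate(final_sets):
    print(index, [f"{tag:05b}" for tag in tags])
```

Each set is kept as a list ordered from least to most recently used, so eviction is simply removal of the first element.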
5.7.2 Using the references given, show the final cache contents for a fully associative cache with one-
word blocks and a total size of 8 words. Use LRU replacement. For each reference identify the
index bits, the tag bits, and if it is a hit or miss.
A fully associative cache has 0 bits of index, so the whole address is the tag.
Address   Tag         Hit/Miss
3         0000 0011   miss
180       1011 0100   miss
43        0010 1011   miss
2         0000 0010   miss
191       1011 1111   miss
88        0101 1000   miss
190       1011 1110   miss
14        0000 1110   miss
181       1011 0101   miss
44        0010 1100   miss
186       1011 1010   miss
253       1111 1101   miss

Final cache contents (where an entry was replaced, the earlier value is shown first):
Tag                          Data
0x0000 0002 -> 0x0000 00FD   M[2] -> M[253]
0x0000 000E                  M[14]
0x0000 0003 -> 0x0000 00B5   M[3] -> M[181]
0x0000 0058                  M[88]
0x0000 002B -> 0x0000 00BA   M[43] -> M[186]
0x0000 00BF                  M[191]
0x0000 00B4 -> 0x0000 002C   M[180] -> M[44]
0x0000 00BE                  M[190]
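
For the fully associative case, an ordered dictionary is enough to track recency. The sketch below is illustrative only and uses Python's `collections.OrderedDict`. It records which address displaces which, so the replacements in the final-contents table can be cross-checked.

```python
from collections import OrderedDict

ADDRESSES = [3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253]
CAPACITY = 8   # eight one-word blocks, fully associative

cache = OrderedDict()   # keys are word addresses, ordered LRU -> MRU
replacements = []       # (evicted address, incoming address) pairs
hits = 0
for addr in ADDRESSES:
    if addr in cache:
        cache.move_to_end(addr)                 # refresh recency on a hit
        hits += 1
        continue
    if len(cache) == CAPACITY:
        victim, _ = cache.popitem(last=False)   # drop the LRU entry
        replacements.append((victim, addr))
    cache[addr] = True

print("hits:", hits)
print("replacements:", replacements)
print("final contents:", sorted(cache))
```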
5.7.3 Using the references given, what is the miss rate for a fully associative cache with two-word
blocks and a total size of 8 words, using LRU replacement? What is the miss rate for MRU (most
recently used) replacement? Finally, what is the best possible miss rate for this cache, given any
replacement policy?
8 words x 1 block/2 words = 4 blocks in one set
LRU
Address   Tag*       Block Offset   Hit/Miss
3         0000 001   1              miss
180       1011 010   0              miss
43        0010 101   1              miss
2         0000 001   0              hit
191       1011 111   1              miss
88        0101 100   0              miss
190       1011 111   0              hit
14        0000 111   0              miss
181       1011 010   1              miss
44        0010 110   0              miss
186       1011 101   0              miss
253       1111 110   1              miss
*The last digit of the tag is three bits.

Cache contents over time (the last entry in each row is the final content):
Block   Tag*                                        Data
1       0x0000 00B2 -> 0x0000 0054 -> 0x0000 0026   M[180..181] -> M[88..89] -> M[44..45]
2       0x0000 00B7 -> 0x0000 00B5                  M[190..191] -> M[186..187]
3       0x0000 0001 -> 0x0000 00B2                  M[2..3] -> M[180..181]
4       0x0000 0025 -> 0x0000 0007 -> 0x0000 00F6   M[42..43] -> M[14..15] -> M[252..253]
Miss rate = 10/12 = 83.3%
MRU
Address   Tag*       Block Offset   Hit/Miss
3         0000 001   1              miss
180       1011 010   0              miss
43        0010 101   1              miss
2         0000 001   0              hit
191       1011 111   1              miss
88        0101 100   0              miss
190       1011 111   0              miss
14        0000 111   0              miss
181       1011 010   1              hit
44        0010 110   0              miss
186       1011 101   0              miss
253       1111 110   1              miss
*The last digit of the tag is three bits.

Cache contents over time (the last entry in each row is the final content):
Block   Tag*                                                       Data
1       0x0000 00B2 -> 0x0000 0026 -> 0x0000 00B5 -> 0x0000 00F6   M[180..181] -> M[44..45] -> M[186..187] -> M[252..253]
2       0x0000 00B7 -> 0x0000 0054 -> 0x0000 00B7 -> 0x0000 0007   M[190..191] -> M[88..89] -> M[190..191] -> M[14..15]
3       0x0000 0001                                                M[2..3]
4       0x0000 0025                                                M[42..43]
Miss rate = 10/12 = 83.3%
Best miss rate possible = 9/12 = 75%. Only three references (2, 190, and 181) touch a block that was accessed earlier, so at most three can hit.
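
The three miss rates in 5.7.3 can be compared with one routine that differs only in how it picks a victim. This is a hedged sketch: the "OPT" branch uses the farthest-future-use rule (Belady's policy) as a stand-in for "best possible", which is a technique swapped in here and not one named in the solution. The function name `count_misses` is illustrative.

```python
ADDRESSES = [3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253]
BLOCKS = [addr // 2 for addr in ADDRESSES]   # two-word blocks
CAPACITY = 4                                 # 8 words / 2 words per block


def count_misses(blocks, capacity, policy):
    cache = []    # resident blocks, ordered least -> most recently used
    misses = 0
    for i, block in enumerate(blocks):
        if block in cache:
            cache.remove(block)              # re-appended below as MRU
        else:
            misses += 1
            if len(cache) == capacity:
                if policy == "LRU":
                    cache.pop(0)             # least recently used
                elif policy == "MRU":
                    cache.pop()              # most recently used
                else:                        # "OPT": farthest next use
                    future = blocks[i + 1:]

                    def next_use(b):
                        return future.index(b) if b in future else len(future)

                    cache.remove(max(cache, key=next_use))
        cache.append(block)
    return misses


for policy in ("LRU", "MRU", "OPT"):
    print(policy, count_misses(BLOCKS, CAPACITY, policy), "misses of", len(BLOCKS))
```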
Multilevel caching is an important technique to overcome the limited amount of space that a first level cache
can provide while still maintaining its speed. Consider a processor with the following parameters.

Base CPI, no memory stalls                                             1.5
Processor speed                                                        2 GHz
Main memory access time                                                100 ns
First-level cache miss rate per instruction                            7%
Second-level cache, direct-mapped speed                                12 cycles
Global miss rate with second-level cache, direct-mapped                3.5%
Second-level cache, eight-way set associative speed                    28 cycles
Global miss rate with second-level cache, eight-way set associative    1.5%
5.7.4 Calculate the CPI for the processor in the table using: 1) only a first level cache, 2) a second level
direct-mapped cache, and 3) a second level eight-way set associative cache. How do these numbers
change if main memory access time is doubled? If it is cut in half?
1) First level cache only
CPItotal = CPIbase + CPImemory-stalls = CPIbase + L1miss * Main memory access
= 1.5 + 0.07*100ns*2GHz = 1.5 + 0.07*200 = 1.5 + 14 = 15.5
Main Memory Doubled: CPItotal = 1.5 + 0.07*200ns*2GHz = 1.5 + 28 = 29.5
Main Memory Halved: CPItotal = 1.5 + 0.07*50ns*2GHz = 1.5 + 7 = 8.5
2) First level and second level direct-mapped cache
CPItotal = CPIbase + CPImemory-stalls = CPIbase + L1miss * L2 access + L2miss * Main memory access
= 1.5 + 0.07*12 + 0.035*100ns*2GHz = 1.5 + 0.84 + 7 = 9.34
Main Memory Doubled: CPItotal = 1.5 + 0.07*12 + 0.035*200ns*2GHz = 1.5 + 0.84 + 14 = 16.34
Main Memory Halved: CPItotal = 1.5 + 0.07*12 + 0.035*50ns*2GHz = 1.5 + 0.84 + 3.5 = 5.84
3) First level and second level eight-way set associative
CPItotal = CPIbase + CPImemory-stalls = CPIbase + L1miss * L2 access + L2miss * Main memory access
= 1.5 + 0.07*28 + 0.015*100ns*2GHz = 1.5 + 1.96 + 3 = 6.46
Main Memory Doubled: CPItotal = 1.5 + 0.07*28 + 0.015*200ns*2GHz = 1.5 + 1.96 + 6 = 9.46
Main Memory Halved: CPItotal = 1.5 + 0.07*28 + 0.015*50ns*2GHz = 1.5 + 1.96 + 1.5 = 4.96
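
The nine CPI values in 5.7.4 all come from one formula, so they can be tabulated in a few lines. This is a minimal sketch using the parameters from the table. The function `cpi` and the `configs` dictionary are illustrative names, not part of the original solution.

```python
BASE_CPI = 1.5
CLOCK_GHZ = 2.0        # cycles per nanosecond
L1_MISS_RATE = 0.07    # first-level misses per instruction


def cpi(mem_ns, l2_hit_cycles=None, global_miss_rate=None):
    mem_cycles = mem_ns * CLOCK_GHZ
    if l2_hit_cycles is None:
        # L1 only: every L1 miss goes to main memory.
        return BASE_CPI + L1_MISS_RATE * mem_cycles
    # With an L2: L1 misses pay the L2 access, global misses pay memory.
    return (BASE_CPI + L1_MISS_RATE * l2_hit_cycles
            + global_miss_rate * mem_cycles)


configs = {
    "L1 only": {},
    "L2 direct-mapped": {"l2_hit_cycles": 12, "global_miss_rate": 0.035},
    "L2 eight-way": {"l2_hit_cycles": 28, "global_miss_rate": 0.015},
}
# Columns: nominal (100 ns), doubled (200 ns), halved (50 ns) memory latency.
table = {name: [round(cpi(mem_ns, **kwargs), 2) for mem_ns in (100, 200, 50)]
         for name, kwargs in configs.items()}
for name, row in table.items():
    print(name, row)
```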
5.7.6 In older processors such as the Intel Pentium or Alpha 21264, the second level of cache was external
(located on a different chip) from the main processor and the first-level cache. While this allowed for
large second-level caches, the latency to access the cache was much higher, and the bandwidth was
typically lower because the second-level cache ran at a lower frequency. Assume a 512 KiB off-chip
second-level cache has a global miss rate of 4 %. If each additional 512 KiB of cache lowered the global
miss rate by 0.7 %, and the cache had a total access time of 50 cycles, how big would the cache have to
be to match the performance of the second-level direct-mapped cache listed in the table? Of the
eight-way set-associative cache?
Direct-mapped on chip cache
CPItotal = 1.5 + 0.07*12 + 0.035*200 = 1.5 + 0.84 + 7 = 9.34
External
Let n be the number of additional 512 KiB increments, so the external global miss rate is 0.04 - 0.007n.
CPItotal = CPIbase + L1miss per instruction * L2 hit + L2 miss rate * Main memory access cycles
9.34 = 1.5 + 0.07 * 50 + (0.04 - 0.007n) * 100 ns * 2 GHz
7.84 = 3.5 + 200(0.04 - 0.007n)
4.34 = 200(0.04 - 0.007n)
4.34/200 = 0.04 - 0.007n
n = (0.04 - 4.34/200)/0.007 = (0.04 - 0.0217)/0.007 = 2.61, so need 3 more 512 KiB plus the original 512 KiB, 2 MiB
Set-associative on chip cache
CPItotal = 1.5 + 0.07*28 + 0.015*200 = 6.46
External
CPItotal = CPIbase + L1miss per instruction * L2 hit + L2 miss rate * Main memory access cycles
6.46 = 1.5 + 0.07 * 50 + (0.04 - 0.007n) * 100 ns * 2 GHz
4.96 = 3.5 + 200(0.04 - 0.007n)
1.46 = 200(0.04 - 0.007n)
1.46/200 = 0.04 - 0.007n
n = (0.04 - 1.46/200)/0.007 = (0.04 - 0.0073)/0.007 = 4.67, so need 5 more 512 KiB plus the original 512 KiB, 3 MiB
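
Solving the 5.7.6 equation for n and rounding up to whole increments can be sketched as follows. The constants come from the exercise. The helper name `external_cache_size_kib` is hypothetical, and rounding up with `math.ceil` reflects the assumption that capacity can only grow in whole 512 KiB steps.

```python
import math

BASE_CPI, L1_MISS_RATE = 1.5, 0.07
MEM_CYCLES = 100 * 2                 # 100 ns at 2 GHz
EXT_HIT_CYCLES = 50                  # external L2 access time
EXT_BASE_MISS, MISS_DROP_PER_STEP = 0.04, 0.007
STEP_KIB = 512


def external_cache_size_kib(target_cpi):
    # Rearranged from:
    #   target = base + L1miss * hit + (0.04 - 0.007 n) * mem_cycles
    needed_miss_rate = (target_cpi - BASE_CPI
                        - L1_MISS_RATE * EXT_HIT_CYCLES) / MEM_CYCLES
    n = (EXT_BASE_MISS - needed_miss_rate) / MISS_DROP_PER_STEP
    steps = math.ceil(n)             # whole 512 KiB increments only
    return n, (steps + 1) * STEP_KIB # +1 for the original 512 KiB


sizes = {}
for label, target in (("direct-mapped", 9.34), ("eight-way", 6.46)):
    n, size_kib = external_cache_size_kib(target)
    sizes[label] = (n, size_kib)
    print(f"{label}: n = {n:.2f}, external cache = {size_kib} KiB")
```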
5.8 Mean Time Between Failures (MTBF), Mean Time To Replacement (MTTR), and Mean Time To Failure
(MTTF) are useful metrics for evaluating the reliability and availability of a storage resource. Explore
these basic concepts by answering the questions about devices with the following metrics.
MTTF      MTTR
3 years   1 day
5.8.1 Calculate the MTBF for the device given.

MTBF = MTTF + MTTR = 3*365 + 1 = 1096 days
5.8.2 Calculate the availability for the device given.

Availability = MTTF/(MTTF + MTTR) = 1095/(1096) = 99.9%
5.8.3 What happens to availability as the MTTR approaches 0? Is this a realistic situation?
As MTTR approaches 0, availability approaches 1. With the emergence of inexpensive drives, having a nearly
0 replacement time for hardware is quite feasible. However, replacing file systems and other data
can take significant time. Although a drive manufacturer will not include this time in their statistics,
it is certainly a part of replacing a disk.
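
The three answers for 5.8 reduce to two formulas, shown here as a short sketch. It follows the solution in treating a year as 365 days. The variable names are illustrative.

```python
MTTF_DAYS = 3 * 365   # 3 years, ignoring leap days as the solution does
MTTR_DAYS = 1

mtbf_days = MTTF_DAYS + MTTR_DAYS
availability = MTTF_DAYS / mtbf_days
print(f"MTBF = {mtbf_days} days, availability = {availability:.2%}")

# Availability climbs toward 1 as MTTR shrinks (5.8.3).
trend = [MTTF_DAYS / (MTTF_DAYS + mttr) for mttr in (1, 0.1, 0.01, 0)]
for value in trend:
    print(f"{value:.6f}")
```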
5.9 This exercise examines the single-error-correcting, double-error-detecting (SEC/DED) Hamming code.
5.9.2 Section 5.5 states that modern server memory modules (DIMMs) employ SEC/DED ECC to protect
each 64 bits with 8 parity bits. Compute the cost/performance ratio of this code to the code from
5.9.1. In this case, cost is the relative number of parity bits needed while performance is the relative number of
errors that can be corrected. Which is better?
5.9.1 asks: what is the minimum number of parity bits required to protect a 128-bit word using the
SEC/DED code? Find the minimum p such that 2^p >= p + d + 1, then add one. For d = 128, p = 8
(2^8 = 256 >= 137). Thus 9 total bits are needed for SEC/DED.
5.9.2 The (72,64) code described in the chapter requires an overhead of
8/64 = 12.5% additional bits to tolerate the loss of any single bit within 72 bits,
providing a protection rate of 1.4%. The (137,128) code from 5.9.1 requires an
overhead of 9/128 = 7.0% additional bits to tolerate the loss of any single bit within
137 bits, providing a protection rate of 0.73%. The cost/performance of both codes
is as follows:
(72,64) code => 12.5/1.4 = 8.9
(137,128) code => 7.0/0.73 = 9.6
The (72,64) code has a better cost/performance ratio.
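
The parity-bit count and the two ratios can be recomputed directly from the inequality 2^p >= p + d + 1. This is a hedged sketch: `secded_parity_bits` is an illustrative helper, and because it does not round the intermediate percentages, its ratios differ slightly from the hand-rounded figures above while giving the same ordering.

```python
def secded_parity_bits(data_bits):
    """Smallest p with 2**p >= p + d + 1, plus one bit for double-error detection."""
    p = 1
    while 2 ** p < p + data_bits + 1:
        p += 1
    return p + 1


ratios = {}
for d in (64, 128):
    parity = secded_parity_bits(d)
    total = d + parity
    cost = 100 * parity / d        # parity overhead, in percent
    performance = 100 / total      # one correctable bit per codeword, in percent
    ratios[d] = cost / performance
    print(f"({total},{d}): cost {cost:.1f}%, performance {performance:.2f}%, "
          f"ratio {ratios[d]:.2f}")
```

A lower ratio means less parity overhead per unit of protection, which is the sense in which the (72,64) code comes out ahead.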