5.5.1(5), 5.5.2(5), 5.6.3(5), 5.7.1(10), 5.7.2(10), 5.7.3(15), 5.7.4(10), 5.7.6(20), 5.8.1(5), 5.8.2(5),
5.8.3(5), 5.9.2(5)
5.5 Media applications that play audio or video files are part of a class of workloads called
streaming workloads; i.e., they bring in large amounts of data but do not reuse much of it.
Consider a video streaming workload that accesses a 512 KiB working set sequentially with the
following address stream:
0, 2, 4, 6, 8, 10, 12, 14, 16,
5.5.1 Assume a 64-KiB direct-mapped cache with a 32-byte block. What is the miss rate for the
address stream above? How is this miss rate sensitive to the size of the cache or the working
set? How would you categorize the misses this workload is experiencing, based on the 3C
model?
If the stream represents byte addresses, the first address (0) will miss and bytes 0:31 will be
brought in, making the accesses to 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 hits. Then
32 will miss and bytes 32:63 will be brought in, and so on. Each 32-byte block is therefore
accessed 16 times with exactly one miss, so the miss rate is 1/16. The misses are compulsory and
depend only on the access pattern and the block size, not on the size of the cache or of the
working set.
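The 1/16 figure can be checked with a small simulation. The sketch below is only an illustration: the function name and parameters are hypothetical, and it assumes the stream is byte addresses with a stride of 2 covering the whole 512 KiB working set once.

```python
def direct_mapped_miss_rate(addresses, cache_bytes=64 * 1024, block_bytes=32):
    """Simulate a direct-mapped cache and return the fraction of accesses that miss."""
    num_blocks = cache_bytes // block_bytes
    tags = [None] * num_blocks           # one stored tag per cache line, None = invalid
    misses = 0
    total = 0
    for addr in addresses:
        block = addr // block_bytes      # block address of this byte
        index = block % num_blocks       # cache line the block maps to
        tag = block // num_blocks        # remaining upper bits
        if tags[index] != tag:           # miss: fetch the block into the line
            misses += 1
            tags[index] = tag
        total += 1
    return misses / total

# Assumed stream: sequential byte addresses 0, 2, 4, ... over 512 KiB.
stream = range(0, 512 * 1024, 2)
miss_rate = direct_mapped_miss_rate(stream)
small_cache_rate = direct_mapped_miss_rate(stream, cache_bytes=16 * 1024)
print(miss_rate)         # 0.0625, i.e. 1/16
print(small_cache_rate)  # 0.0625 as well: cache size does not matter here
```

Changing `block_bytes` is the only parameter that moves the result in this sketch, which agrees with classifying the misses as compulsory.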
5.5.2 Re-compute the miss rate when the cache size is 16 bytes, 64 bytes, and 128 bytes. What kind of
5.6 In this exercise, we will look at the different ways capacity affects overall performance. In
general, cache access time is proportional to capacity. Assume that main memory accesses take
70 ns and that memory accesses are 36% of all instructions. The following table shows data for
L1 caches attached to each of two processors, P1 and P2.
5.6.3 Assuming a base CPI of 1.0 without any memory stalls, what is the total CPI for P1 and P2?
Which processor is faster?
Hit time is included in base CPI.
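CPI_P1 = 1.0 + 1.36 * (0.08 * 70 ns) / 0.66 ns = 12.54
CPI_P2 = 1.0 + 1.36 * (0.06 * 70 ns) / 0.90 ns = 7.35
The factor 1.36 is the number of memory accesses per instruction (one instruction fetch plus 0.36
data accesses). Time per instruction is CPI times cycle time: about 12.54 * 0.66 ns = 8.3 ns for P1
and 7.35 * 0.90 ns = 6.6 ns for P2, so P2 is faster.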
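The arithmetic above can be reproduced with a short script. This is a sketch under assumptions: the miss rates (8% and 6%) and cycle times (0.66 ns and 0.90 ns) are read off the solution lines, since the table itself is not reproduced here, and `total_cpi` is a hypothetical helper name. Like the solution, it does not round the miss penalty up to a whole number of cycles.

```python
def total_cpi(miss_rate, cycle_ns, base_cpi=1.0, accesses_per_instr=1.36, mem_ns=70.0):
    """CPI including memory stalls; 1.36 = 1 instruction fetch + 0.36 data accesses."""
    miss_penalty_cycles = mem_ns / cycle_ns   # main memory latency in processor cycles
    return base_cpi + accesses_per_instr * miss_rate * miss_penalty_cycles

cpi_p1 = total_cpi(miss_rate=0.08, cycle_ns=0.66)
cpi_p2 = total_cpi(miss_rate=0.06, cycle_ns=0.90)

# Execution time per instruction = CPI * cycle time.
time_p1 = cpi_p1 * 0.66
time_p2 = cpi_p2 * 0.90

print(f"P1: CPI = {cpi_p1:.2f}, {time_p1:.2f} ns per instruction")
print(f"P2: CPI = {cpi_p2:.2f}, {time_p2:.2f} ns per instruction")
```

Comparing time per instruction rather than CPI alone matters because the two processors have different cycle times.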
https://www.coursehero.com/file/19238016/16f-cpe431-hw6-solution/
CPE 431/531 Homework #6 Solution Fall 2016
5.7 This exercise examines the impact of different cache designs, specifically comparing associative
caches to the direct-mapped caches from section 5.4. For this exercise, use the address stream
shown in Exercise 5.2.
5.7.1 Using the sequence of addresses given, show the final cache contents for a three-way set
associative cache with two-word blocks and a total size of 24 words. Use LRU replacement. For
each reference identify the index bits, the tag bits, the block offset bits, and if it is a hit or miss.
24 words x (1 block / 2 words) x (1 set / 3 blocks) = 4 sets, so Index = 2 bits, Block Offset = 1 bit.
Address  Tag     Index  Offset  Hit/Miss
2        0000 0  01     0       hit
191      1011 1  11     1       miss
88       0101 1  00     0       miss
190      1011 1  11     0       hit
14       0000 1  11     0       miss
181      1011 0  10     1       hit
44       0010 1  10     0       miss
186      1011 1  01     0       miss
253      1111 1  10     1       miss
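The field split and the hit/miss column can be cross-checked with a minimal simulator. This is a sketch, not the author's method: the function names are hypothetical, and the full 12-reference word-address stream is assumed to be the one listed in the 5.7.3 table (3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253).

```python
def split_address(addr, offset_bits=1, index_bits=2):
    """Split a word address into (tag, index, block offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

def simulate(addresses, ways=3):
    """Set-associative cache with LRU replacement; returns one row per reference."""
    sets = {}      # index -> list of tags, least recently used first
    rows = []
    for addr in addresses:
        tag, index, _ = split_address(addr)
        lines = sets.setdefault(index, [])
        if tag in lines:
            lines.remove(tag)          # re-appended below as most recently used
            outcome = "hit"
        else:
            if len(lines) == ways:     # set full: evict the least recently used tag
                lines.pop(0)
            outcome = "miss"
        lines.append(tag)
        rows.append((addr, tag, index, outcome))
    return rows

# Assumed address stream (taken from the 5.7.3 table).
stream = [3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253]
results = simulate(stream)
for addr, tag, index, outcome in results:
    print(f"{addr:3d}  tag={tag:05b}  index={index:02b}  {outcome}")
```

With this stream no set ever receives more than three distinct blocks, so the replacement policy is never actually exercised in this sketch.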
5.7.2 Using the references given, show the final cache contents for a fully associative cache with one-
word blocks and a total size of 8 words. Use LRU replacement. For each reference identify the
index bits, the tag bits, and if it is a hit or miss.
A fully associative cache has 0 index bits.
5.7.3 Using the references given, what is the miss rate for a fully associative cache with two-word
blocks and a total size of 8 words, using LRU replacement? What is the miss rate for MRU (most
recently used) replacement? Finally, what is the best possible miss rate for this cache, given any
replacement policy?
8 words x 1 block/2 words = 4 blocks in one set
LRU
Address  Tag       Offset  Hit/Miss
3        0000 001  1       miss
180      1011 010  0       miss
43       0010 101  1       miss
2        0000 001  0       hit
191      1011 111  1       miss
88       0101 100  0       miss
190      1011 111  0       hit
14       0000 111  0       miss
181      1011 010  1       miss
44       0010 110  0       miss
186      1011 101  0       miss
253      1111 110  1       miss
*The last hex digit of each tag represents only three bits.
Each column is one of the four blocks; successive rows show the contents that replaced the entry above.
Tag*         Data          Tag*         Data          Tag*         Data          Tag*         Data
0x0000 00B2  M[180..181]   0x0000 00B7  M[190..191]   0x0000 0001  M[2..3]       0x0000 0025  M[42..43]
0x0000 0054  M[88..89]     0x0000 00B5  M[186..187]   0x0000 00B2  M[180..181]   0x0000 0007  M[14..15]
0x0000 0026  M[44..45]                                                           0x0000 00F6  M[252..253]
Miss rate = 10/12 = 83.3%
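The LRU count can be reproduced with a few lines of code. This is a minimal sketch with a hypothetical `miss_count` helper. The MRU option assumes that MRU evicts the block touched most recently whenever the cache is full.

```python
def miss_count(word_addresses, num_blocks=4, words_per_block=2, policy="LRU"):
    """Count misses in a fully associative cache under LRU or MRU replacement."""
    recency = []   # resident block numbers, least recently used first
    misses = 0
    for addr in word_addresses:
        block = addr // words_per_block
        if block in recency:
            recency.remove(block)              # hit: re-appended below as most recent
        else:
            misses += 1
            if len(recency) == num_blocks:     # cache full: choose a victim
                victim = 0 if policy == "LRU" else -1
                recency.pop(victim)
        recency.append(block)
    return misses

stream = [3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253]
lru_misses = miss_count(stream, policy="LRU")
mru_misses = miss_count(stream, policy="MRU")
print(f"LRU: {lru_misses}/{len(stream)} misses")   # LRU: 10/12 misses
print(f"MRU: {mru_misses}/{len(stream)} misses")
```

Keeping the recency list ordered makes both policies a one-line difference: LRU removes the front of the list and MRU removes the back.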
MRU
Multilevel caching is an important technique to overcome the limited amount of space that a first level cache
can provide while still maintaining its speed. Consider a processor with the following parameters.
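Base CPI, no memory stalls                                             1.5
Processor speed                                                        2 GHz
Main memory access time                                                100 ns
First-level cache miss rate per instruction                            7%
Second-level cache, direct-mapped speed                                12 cycles
Global miss rate with second-level cache, direct-mapped                3.5%
Second-level cache, eight-way set associative speed                    28 cycles
Global miss rate with second-level cache, eight-way set associative    1.5%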
5.7.4 Calculate the CPI for the processor in the table using: 1) only a first level cache, 2) a second level
direct-mapped cache, and 3) a second level eight-way set associative cache. How do these numbers
change if main memory access time is doubled? If it is cut in half?
1) First level cache only
CPItotal = CPIbase + CPImemory-stalls = CPIbase + L1miss*Main Memory access
= 1.5 + 0.07*100ns*2GHz = 1.5 + 0.07*200 = 1.5 + 14 = 15.5
Main Memory Doubled: CPItotal = 1.5 + 0.07*200ns*2GHz = 1.5 + 28 = 29.5
Main Memory Halved: CPItotal = 1.5 + 0.07*50ns*2GHz = 1.5 + 7 = 8.5
With a second-level cache (the values below use the 28-cycle access time and 1.5% global miss rate from the table):
CPItotal = CPIbase + CPImemory-stalls = CPIbase + L1miss*L2 access + L2miss*Main memory access
= 1.5 + 0.07*28 + 0.015*100ns*2GHz = 1.5 + 1.96 + 3 = 6.46
Main Memory Doubled: CPItotal = 1.5 + 0.07*28 + 0.015*200ns*2GHz = 1.5 + 1.96 + 6 = 9.46
Main Memory Halved: CPItotal = 1.5 + 0.07*28 + 0.015*50ns*2GHz = 1.5 + 1.96 + 1.5 = 4.96
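The same formulas can be evaluated for every configuration and memory latency at once. The sketch below uses hypothetical helper names and takes its parameters from the table (2 GHz clock, base CPI 1.5, 7% first-level miss rate, 12 cycles and 3.5% for the direct-mapped second level, 28 cycles and 1.5% for the eight-way one).

```python
CLOCK_GHZ = 2.0     # cycles per nanosecond
BASE_CPI = 1.5
L1_MISS = 0.07      # first-level misses per instruction

def cpi_l1_only(mem_ns):
    """CPI when every first-level miss goes to main memory."""
    return BASE_CPI + L1_MISS * mem_ns * CLOCK_GHZ

def cpi_with_l2(mem_ns, l2_cycles, global_miss):
    """CPI when first-level misses access L2 and global misses go to main memory."""
    return BASE_CPI + L1_MISS * l2_cycles + global_miss * mem_ns * CLOCK_GHZ

for mem_ns in (100, 200, 50):   # nominal, doubled, halved main memory access time
    print(
        f"memory {mem_ns:3d} ns:",
        f"L1 only = {cpi_l1_only(mem_ns):.2f},",
        f"L2 direct-mapped = {cpi_with_l2(mem_ns, 12, 0.035):.2f},",
        f"L2 eight-way = {cpi_with_l2(mem_ns, 28, 0.015):.2f}",
    )
```

The lower the global miss rate, the smaller the share of CPI that scales with main memory latency, which is why the second-level configurations react less strongly to doubling or halving it.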
5.7.6 In older processors such as the Intel Pentium or Alpha 21264, the second level of cache was external
(located on a different chip) from the main processor and the first-level cache. While this allowed for
large second-level caches, the latency to access the cache was much higher, and the bandwidth was
typically lower because the second-level cache ran at a lower frequency. Assume a 512 KiB off-chip
second-level cache has a global miss rate of 4 %. If each additional 512 KiB of cache lowered the global
miss rate by 0.7 %, and the cache had a total access time of 50 cycles, how big would the cache have to
be to match the performance of the second-level direct-mapped cache listed in the table? Of the
eight-way set-associative cache?
CPItotal = 1.5 + 0.07*28 + 0.015*200 = 6.46
External
CPItotal = CPIbase + L1 miss per instruction * L2 hit + L2 miss rate * Main memory access cycles
6.46 = 1.5 + 0.07 * 50 + (0.04 - 0.007n) * 100 ns * 2 GHz
4.96 = 3.5 + 200(0.04 - 0.007n)
1.46 = 200(0.04 - 0.007n)
1.46/200 = 0.04 - 0.007n
n = 0.04/0.007 - (1.46/200)/0.007 = 4.67
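The equation can also be solved for n numerically. The sketch below uses a hypothetical helper name. It treats n as the number of additional 512 KiB increments and assumes the cache can only grow in whole increments, so n is rounded up. That rounding is an assumption, not something stated in the solution.

```python
import math

def extra_increments(target_cpi, base_cpi=1.5, l1_miss=0.07, l2_cycles=50,
                     base_global_miss=0.04, miss_drop_per_step=0.007, mem_cycles=200):
    """Solve target = base + l1_miss*l2_cycles + (base_miss - drop*n)*mem_cycles for n."""
    allowed_miss = (target_cpi - base_cpi - l1_miss * l2_cycles) / mem_cycles
    return (base_global_miss - allowed_miss) / miss_drop_per_step

n = extra_increments(6.46)          # target CPI taken from the on-chip L2 result above
steps = math.ceil(n)                # assumption: only whole 512 KiB increments
total_kib = 512 * (1 + steps)       # original 512 KiB plus the added increments

print(f"n = {n:.2f}, rounded up to {steps} increments, total {total_kib} KiB")
```

Passing a different `target_cpi` gives the size needed to match any other second-level configuration under the same assumptions.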
5.8 Mean Time Between Failures (MTBF), Mean Time To Replacement (MTTR), and Mean Time To Failure
(MTTF) are useful metrics for evaluating the reliability and availability of a storage resource. Explore
these basic concepts by answering the questions about devices with the following metrics.
MTTF      MTTR
3 Years   1 Day
5.8.3 What happens to availability as the MTTR approaches 0? Is this a realistic situation?
As MTTR approaches 0, availability approaches 1. With the emergence of inexpensive drives, having a nearly
0 replacement time for hardware is quite feasible. However, replacing file systems and other data
can take significant time. Although a drive manufacturer will not include this time in their statistics,
it is certainly a part of replacing a disk.
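The limit can be illustrated numerically with the usual definition availability = MTTF / (MTTF + MTTR). The sketch below assumes a 365-day year for the 3-year MTTF, and the function name is hypothetical.

```python
def availability(mttf_days, mttr_days):
    """Fraction of time the device is in service: MTTF / (MTTF + MTTR)."""
    return mttf_days / (mttf_days + mttr_days)

mttf = 3 * 365   # 3 years, assuming 365-day years
values = [availability(mttf, mttr) for mttr in (1, 0.1, 0.01, 0)]
for mttr, a in zip((1, 0.1, 0.01, 0), values):
    print(f"MTTR = {mttr:>4} days -> availability = {a:.6f}")
```

With the 1-day MTTR from the table the device is already available more than 99.9% of the time, so shrinking MTTR further yields only small gains.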
5.9 This exercise examines the single error correcting, double error detecting (SEC/DED) Hamming code.
5.9.2 Section 5.5 states that modern server memory modules (DIMMs) employ SEC/DED ECC to protect
each 64 bits with 8 parity bits. Compute the cost/performance ratio of this code to the code from
5.9.1. In this case, cost is the relative number of parity bits needed while performance is the relative number of
errors that can be corrected. Which is better?
5.9.1 asks: what is the minimum number of parity bits required to protect a 128-bit word using the
SEC/DED code? We need to find the minimum p such that 2^p >= p + d + 1 and then add one. With d = 128,
p = 8 satisfies the inequality (256 >= 137) while p = 7 does not (128 < 136).
Thus 9 total bits are needed for SEC/DED.
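The search for the minimum p is easy to automate. This is a minimal sketch with a hypothetical function name, following the rule stated above: find the smallest p with 2^p >= p + d + 1, then add one bit for double error detection.

```python
def sec_ded_parity_bits(data_bits):
    """Minimum parity bits for SEC/DED over data_bits bits of data."""
    p = 1
    while 2 ** p < p + data_bits + 1:   # Hamming bound for single error correction
        p += 1
    return p + 1                        # one extra overall parity bit for DED

bits_128 = sec_ded_parity_bits(128)
bits_64 = sec_ded_parity_bits(64)
print(bits_128)  # 9
print(bits_64)   # 8
```

The 64-bit case reproduces the 8 parity bits per 64 data bits quoted for server DIMMs in the 5.9.2 statement.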
5.9.2 The (72,64) code described in the chapter requires an overhead of
8/64 = 12.5% additional bits to tolerate the loss of any single bit within 72 bits,
providing a protection rate of 1.4%. The (137,128) code from 5.9.1 requires an
overhead of 9/128 = 7.0% additional bits to tolerate the loss of any single bit within
137 bits, providing a protection rate of 0.73%. The cost/performance of both codes
is as follows:
(72,64) code = 12.5/1.4 = 8.9
(137,128) code = 7.0/0.73 = 9.6
The (72,64) code has a better cost/performance ratio.
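The two ratios can be recomputed directly from the code parameters. This sketch uses a hypothetical helper name, takes cost as the parity overhead relative to the data bits, and takes performance as one correctable bit out of the total code word. It skips the intermediate rounding, so the values come out slightly different from 8.9 and 9.6 while the ordering stays the same.

```python
def cost_performance(total_bits, data_bits):
    """Parity overhead (%) divided by protection rate (%) for a single-error-correcting code."""
    overhead_pct = 100 * (total_bits - data_bits) / data_bits
    protection_pct = 100 * 1 / total_bits     # one correctable bit per code word
    return overhead_pct / protection_pct

ratio_72 = cost_performance(72, 64)
ratio_137 = cost_performance(137, 128)
print(f"(72,64):   {ratio_72:.2f}")
print(f"(137,128): {ratio_137:.2f}")
```

A lower ratio means less overhead per unit of protection, which is the sense in which the (72,64) code comes out ahead.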