
Chip Multicore Processors - Tutorial 8: Task 8.1: Performance of Snooping-Based Cache Coherency

This document contains a tutorial on cache coherency for a multicore processor system. It provides sample operations and asks the student to trace the state of the caches using the MSI protocol and an extended MOSI protocol. It also asks the student to summarize a research paper on cache coherency effects on the Intel Nehalem multicore architecture.

Uploaded by

Bobby Beaman
Copyright
© Attribution Non-Commercial (BY-NC)

TECHNISCHE UNIVERSITÄT MÜNCHEN
Lehrstuhl für Integrierte Systeme

Chip Multicore Processors Tutorial 8


June 19, 2013

Task 8.1: Performance of Snooping-based Cache Coherency


In this task, the performance of snooping-based cache coherency is evaluated. The starting state of a system with three processor cores and their caches is depicted. Each cache entry consists of the coherency-protocol state, the tag and two data words. All addresses are hexadecimal, and for simplicity the tag is shown as the base address of the cache line. The data is simplified as well. The starting state of the memory is also given. As discussed in the previous tutorial, the timing behavior and the performance depend on the coherency implementation. The given implementation for mobile systems is optimized for power efficiency, so accesses to the external memory should be minimized.

The system has the following properties:
In case of a hit, no extra stall cycles are required.
In case of a miss, N_mem = 40 cycles are required when the block is loaded from memory.
If another cache currently holds the block, it can provide the data to the requesting cache within N_cache = 16 cycles.
An invalidation delays the execution by N_inv = 4 cycles.
A write back delays the execution by N_wb = 40 cycles.

The following operation sequences are given:
sequence 1: (P1) read 410; (P2) read 410; (P0) read 430
sequence 2: (P0) write 420, 42; (P2) read 424; (P2) write 424, 23
sequence 3: (P0) write 408, 7; (P2) read 408; (P0) write 408, 9

Nomenclature: (CPU) read address and (CPU) write address, value.

a) Give the changes to the cache entries for each sequence (treated separately) according to the MSI protocol. Use the following tables to record the changes after each operation. Furthermore, give the total delay of each sequence.
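To total the delay asked for in part a), the timing properties listed above can be treated as a small cost model. The sketch below is only an illustration under assumptions: the event names ("hit", "mem", "cache", "inv", "wb") and the function `sequence_delay` are hypothetical, and deciding which events each operation triggers is exactly what the exercise asks you to derive from the protocol and the start state.

```python
# Stall-cycle parameters taken from the task statement.
N_MEM = 40    # miss served by external memory
N_CACHE = 16  # miss served by another cache
N_INV = 4     # invalidation of copies in other caches
N_WB = 40     # write back of a modified block to memory

# Hypothetical event names used only in this sketch.
COST = {"hit": 0, "mem": N_MEM, "cache": N_CACHE, "inv": N_INV, "wb": N_WB}


def sequence_delay(ops):
    """Return the total stall cycles of a sequence.

    `ops` holds one list of events per operation; for example, a write hit
    on a shared block that must invalidate the other copies is ["inv"].
    """
    return sum(COST[event] for events in ops for event in events)


if __name__ == "__main__":
    # Made-up example, not the solution of any sequence in the task:
    # read miss from memory, read miss served by a cache, write with invalidation.
    print(sequence_delay([["mem"], ["cache"], ["inv"]]))  # 60
```

Keeping the events explicit makes the MSI and MOSI totals easy to compare: the owner state is meant to avoid write backs and memory loads wherever another cache can supply the block.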

sequence 1
Op | CPU | Index | State | Tag | Data

sequence 2
Op | CPU | Index | State | Tag | Data

sequence 3
Op | CPU | Index | State | Tag | Data
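As an aid for tracing part a), the textbook snooping MSI transitions for a single block can be written as two lookup tables, one for processor accesses and one for snooped bus transactions. This is a sketch under assumptions: the transaction names BusRd, BusRdX and BusUpgr follow common textbook usage rather than the lecture, and the code tracks one block only, so it ignores the eviction of a different block that maps to the same index (which also costs a write back if the victim is Modified).

```python
# MSI state machine for a single cache block (textbook snooping MSI, a sketch).
M, S, I = "M", "S", "I"

# (current state, processor access) -> (next state, bus transaction or None)
PROC = {
    (I, "read"): (S, "BusRd"),
    (I, "write"): (M, "BusRdX"),
    (S, "read"): (S, None),
    (S, "write"): (M, "BusUpgr"),  # invalidate the other copies
    (M, "read"): (M, None),
    (M, "write"): (M, None),
}

# (current state, snooped transaction) -> (next state, write back dirty data?)
# M never sees BusUpgr: it is issued by a sharer, so no cache can hold M.
SNOOP = {
    (M, "BusRd"): (S, True),
    (M, "BusRdX"): (I, True),
    (S, "BusRd"): (S, False),
    (S, "BusRdX"): (I, False),
    (S, "BusUpgr"): (I, False),
    (I, "BusRd"): (I, False),
    (I, "BusRdX"): (I, False),
    (I, "BusUpgr"): (I, False),
}


def msi_access(states, cpu, op):
    """Apply one access by cache `cpu` to a block held by several caches.

    `states` lists the MSI state of this block in every cache.
    Returns (new_states, bus_transaction, written_back).
    """
    new = list(states)
    new[cpu], bus = PROC[(states[cpu], op)]
    written_back = False
    if bus is not None:
        for other, state in enumerate(states):
            if other == cpu:
                continue
            new[other], flush = SNOOP[(state, bus)]
            written_back = written_back or flush
    return new, bus, written_back


if __name__ == "__main__":
    # P0 holds the block Modified, P2 reads it.
    print(msi_access([M, I, I], 2, "read"))  # (['S', 'I', 'S'], 'BusRd', True)
```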

b) To reduce the number of external memory accesses, an owner state (O) is added to the cache coherency protocol. On a write, all other copies of the block are invalidated (write-invalidate). When another cache issues a read, the current owner supplies the data instead of the memory. Sketch the modified state diagram of the resulting MOSI protocol.

[State diagram template with the states Invalid, Shared, Modified and Owner]
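One way to write down the transitions asked for in part b) is sketched below, again for a single block and again assuming the textbook transaction names BusRd, BusRdX and BusUpgr. Compared with MSI, a Modified block that snoops a BusRd moves to Owner and supplies the data itself instead of writing it back; the Owner keeps supplying later readers, and a write by any cache still invalidates all other copies. The function name `mosi_access` and the "P<n>" supplier labels are hypothetical.

```python
# MOSI state machine for a single cache block (textbook snooping MOSI, a sketch).
M, O, S, I = "M", "O", "S", "I"

# (current state, processor access) -> (next state, bus transaction or None)
PROC = {
    (I, "read"): (S, "BusRd"),
    (I, "write"): (M, "BusRdX"),
    (S, "read"): (S, None),
    (S, "write"): (M, "BusUpgr"),  # invalidate the other copies
    (O, "read"): (O, None),
    (O, "write"): (M, "BusUpgr"),  # owner already has the data
    (M, "read"): (M, None),
    (M, "write"): (M, None),
}

# (current state, snooped transaction) -> next state.
# M never sees BusUpgr: it is issued by a sharer, so no cache can hold M.
SNOOP = {
    (M, "BusRd"): O, (M, "BusRdX"): I,
    (O, "BusRd"): O, (O, "BusRdX"): I, (O, "BusUpgr"): I,
    (S, "BusRd"): S, (S, "BusRdX"): I, (S, "BusUpgr"): I,
    (I, "BusRd"): I, (I, "BusRdX"): I, (I, "BusUpgr"): I,
}


def mosi_access(states, cpu, op):
    """Apply one access by cache `cpu` to a block held by several caches.

    Returns (new_states, bus_transaction, supplier), where supplier is
    "P<n>" if cache n (in M or O) provides the block, "memory" if it must be
    loaded from memory, or None if no data is transferred.
    """
    new = list(states)
    new[cpu], bus = PROC[(states[cpu], op)]
    supplier = None
    if bus is not None:
        for other, state in enumerate(states):
            if other == cpu:
                continue
            if bus != "BusUpgr" and state in (M, O):
                supplier = f"P{other}"  # owner answers instead of memory
            new[other] = SNOOP[(state, bus)]
        if bus != "BusUpgr" and supplier is None:
            supplier = "memory"
    return new, bus, supplier


if __name__ == "__main__":
    # P0 holds the block Modified, P2 reads it: P0 becomes Owner and supplies it.
    print(mosi_access([M, I, I], 2, "read"))  # (['O', 'I', 'S'], 'BusRd', 'P0')
```

Because an M or O block only goes back to memory when it is evicted (not modeled here), repeated reads by other cores are served by cache-to-cache transfers, which matches the task's goal of minimizing external memory accesses.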

c) Perform the same procedure as in part a) for the MOSI protocol, using the following tables.

sequence 1
Op | CPU | Index | State | Tag | Data

sequence 2
Op | CPU | Index | State | Tag | Data

sequence 3
Op | CPU | Index | State | Tag | Data

Task 8.2: Cache Coherency Example: Intel Nehalem


Read the article "Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System" by Daniel Molka et al., PACT 2009. Briefly describe the investigated architecture. What does the term ccNUMA describe? How does the information in the level 3 cache relate to the other cache levels, and how precise is it? Briefly describe the executed benchmarks and the central findings of the article.
