Fetch Caches
Technical Report, February 1993
Publisher:
  • Stanford University, 408 Panama Mall, Suite 217, Stanford, CA, United States
Published: 01 February 1993
Abstract

For high performance, data caches must have a low miss rate and provide high bandwidth while maintaining low latency. Larger and more complex set-associative caches provide lower miss rates, but at the cost of increased latency. Interleaved data caches can improve the available bandwidth, but the improvement is limited by bank conflicts and by the added latency of the switching networks required to distribute cache addresses and route the data. We propose using a small buffer to reduce the data read latency or improve the read bandwidth of an on-chip data cache. We call this small read-only buffer a fetch cache. The fetch cache attempts to capture the immediate spatial locality of the data read reference stream by exploiting the large number of bits that can be fetched in a single access of an on-chip cache. A processor can issue multiple instructions per cache access in two ways: the cache access can require multiple cycles (i.e., superpipelined), or multiple instructions can be issued per cycle (i.e., superscalar). In the first part, we show the use of fetch caches with data caches that require multiple cycles per access. When there is a read hit in the fetch cache, the read request can be serviced in one cycle; otherwise the latency is that of the primary data cache. For a four-line, 16-byte-wide fetch cache, the hit rate ranged from 40 to 60 percent depending on the application. In the second part, we show the use of fetch caches when multiple accesses per cycle are requested. When there is a read hit in the fetch cache, a read can be satisfied by the fetch cache while the primary cache performs another read or write request. For a four-line, 16-byte-wide fetch cache, the cache bandwidth increased by 20 to 30 percent depending on the application.
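To make the lookup behavior concrete, below is a minimal sketch in C of a fetch-cache read, assuming a fully associative, four-line, 16-byte-wide read-only buffer with LRU replacement. The report specifies the four-line, 16-byte geometry; the fully associative organization, LRU policy, and all identifiers here are illustrative assumptions, not the report's actual design.

```c
/* Sketch of a fetch-cache read lookup (assumptions noted above). */
#include <stdbool.h>
#include <stdint.h>

#define FETCH_LINES 4          /* fetch-cache lines (from the report)      */
#define LINE_BYTES  16         /* bytes fetched per primary-cache access   */

typedef struct {
    bool     valid;
    uint32_t tag;              /* address of the aligned 16-byte block     */
    uint32_t lru;              /* higher value = more recently used        */
} fetch_line;

static fetch_line fcache[FETCH_LINES];
static uint32_t   lru_clock;

/* Returns true on a fetch-cache hit (read served in one cycle); on a miss
 * the read pays the primary data-cache latency and the whole 16-byte block
 * is copied into the least recently used line, capturing spatial locality
 * for subsequent reads to the same block. */
bool fetch_cache_read(uint32_t addr)
{
    uint32_t tag = addr / LINE_BYTES;
    int victim = 0;

    for (int i = 0; i < FETCH_LINES; i++) {
        if (fcache[i].valid && fcache[i].tag == tag) {
            fcache[i].lru = ++lru_clock;   /* hit: update recency */
            return true;
        }
        if (fcache[i].lru < fcache[victim].lru)
            victim = i;                    /* track least recently used */
    }

    /* Miss: fill the victim line from the primary data cache. */
    fcache[victim].valid = true;
    fcache[victim].tag   = tag;
    fcache[victim].lru   = ++lru_clock;
    return false;
}
```

In the multi-cycle-access setting, a true return would correspond to a one-cycle read; in the multiple-accesses-per-cycle setting, a hit frees the primary cache to service a different read or write in the same cycle.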

Contributors
  • Stanford University
  • Imperial College London
