In this thesis, we present a cache-based mediaprocessor memory system that requires minimal programmer input yet transfers data between the processor and main memory as efficiently as a DMA controller. We have tested this new memory system with a set of multimedia functions. The data transfer is efficient for three reasons. First, it uses a hardware prefetcher to tolerate main memory latency. Second, it transfers both input and output data in blocks to minimize page misses in the off-chip memory and sustain high main memory throughput. Third, it uses a no-write-allocate write-miss policy to make efficient use of main memory bandwidth. Simulation results show that our cache-based memory system reduces execution time by 53.9% on average compared to a baseline cache-based architecture; in comparison, a DMA controller reduces execution time over the baseline by 56.0% on average. Our thesis is that cache-based mediaprocessors are desirable for their simple programming paradigm despite somewhat lower performance, and that the efficient main memory data transfer of a DMA-based mediaprocessor can be effectively modeled and incorporated into cache-based mediaprocessors. This efficiency can be achieved by complementing a cache-based memory system with a prefetching cache, a blocking write buffer, and a no-write-allocate write-miss policy to tolerate memory latency and better utilize memory bandwidth. The hardware cost of such a cache-based memory system is expected to be comparable to that of a DMA-based memory system.
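To make the third point concrete, the sketch below is a toy model (not taken from the thesis) of why a no-write-allocate policy conserves main memory bandwidth for streaming output. It assumes a write-back cache with hypothetical 32-byte lines and a workload that writes fresh output data it never reads back, such as a processed video frame; under write-allocate, each write miss first fetches the line from memory and the dirty line is later written back, doubling the traffic.

```python
LINE = 32  # assumed bytes per cache line (illustrative)

def bus_traffic(n_bytes, policy):
    """Main-memory bytes moved when a program streams n_bytes of
    write-only output data (e.g. a filtered video frame)."""
    lines = n_bytes // LINE
    if policy == "write-allocate":
        # Write miss fetches the line, and the dirty line is
        # eventually written back: two line transfers per line.
        return lines * LINE * 2
    if policy == "no-write-allocate":
        # Misses bypass the fetch (e.g. coalesced into full lines
        # by a write buffer): one line transfer per line.
        return lines * LINE
    raise ValueError(policy)

frame = 640 * 480  # one greyscale frame, in bytes
print(bus_traffic(frame, "write-allocate"))     # 614400
print(bus_traffic(frame, "no-write-allocate"))  # 307200
```

For this write-only access pattern, no-write-allocate halves the bus traffic, which is one reason the policy helps a cache-based system approach DMA-level efficiency.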