Architectural support for address translation on GPUs: Designing memory management units for CPU/GPUs with unified address spaces

B Pichai, L Hsu, A Bhattacharjee - ACM SIGARCH Computer Architecture News, 2014 - dl.acm.org

The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent
example, necessitates a manageable programming model to ensure widespread adoption.
A key component of this is a shared unified address space between the heterogeneous units
to obtain the programmability benefits of virtual memory.

To this end, we are the first to explore GPU Memory Management Units (MMUs) consisting of Translation Lookaside Buffers (TLBs) and page table walkers (PTWs) for address translation in unified heterogeneous systems. We show the performance challenges posed by GPU warp schedulers on TLBs accessed in parallel with L1 caches, which provide many well-known programmability benefits. In response, we propose modest TLB and PTW augmentations that recover most of the performance lost by introducing L1 parallel TLB access. We also show that a little TLB-awareness can make other GPU performance enhancements (e.g., cache-conscious warp scheduling and dynamic warp formation on branch divergence) feasible in the face of cache-parallel address translation, bringing overheads in the range deemed acceptable for CPUs (10-15% of runtime). We presume this initial design leaves room for improvement but anticipate that our bigger insight, that a little TLB-awareness goes a long way in GPUs, will spur further work in this fruitful area.
