Architectural support for address translation on gpus: Designing memory management units for cpu/gpus with unified address spaces

B Pichai, L Hsu, A Bhattacharjee - ACM SIGARCH Computer Architecture …, 2014 - dl.acm.org
B Pichai, L Hsu, A Bhattacharjee
ACM SIGARCH Computer Architecture News, 2014dl.acm.org
The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent
example, necessitates a manageable programming model to ensure widespread adoption.
A key component of this is a shared unified address space between the heterogeneous units
to obtain the programmability benefits of virtual memory. To this end, we are the first to
explore GPU Memory Management Units (MMUs) consisting of Translation Lookaside
Buffers (TLBs) and page table walkers (PTWs) for address translation in unified …
The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent example, necessitates a manageable programming model to ensure widespread adoption. A key component of this is a shared unified address space between the heterogeneous units to obtain the programmability benefits of virtual memory.
To this end, we are the first to explore GPU Memory Management Units(MMUs) consisting of Translation Lookaside Buffers (TLBs) and page table walkers (PTWs) for address translation in unified heterogeneous systems. We show the performance challenges posed by GPU warp schedulers on TLBs accessed in parallel with L1 caches, which provide many well-known programmability benefits. In response, we propose modest TLB and PTW augmentations that recover most of the performance lost by introducing L1 parallel TLB access. We also show that a little TLB-awareness can make other GPU performance enhancements (e.g., cache-conscious warp scheduling and dynamic warp formation on branch divergence) feasible in the face of cache-parallel address translation, bringing overheads in the range deemed acceptable for CPUs (10-15\% of runtime). We presume this initial design leaves room for improvement but anticipate that our bigger insight, that a little TLB-awareness goes a long way in GPUs, will spur further work in this fruitful area.
ACM Digital Library