Gpu1 - GPU Introduction
Sang-Woo Jun
Spring 2019
Graphic Processing – Some History
1990s: Real-time 3D rendering for video games was becoming common
o Doom, Quake, Descent, … (Nostalgia!)
3D graphics processing is immensely computation-intensive
System Architecture Snapshot With a GPU (2019)
o GPU memory (GDDR5, HBM2, …): GDDR5: 100s of GB/s, 10s of GB; HBM2: ~1 TB/s, 10s of GB
o Host memory (DDR4, …): DDR4 2666 MHz: 128 GB/s, 100s of GB
o CPU ↔ I/O Hub (IOH): QPI: 12.8 GB/s, UPI: 20.8 GB/s
o I/O Hub ↔ NVMe storage, network interface
o GPU ↔ I/O Hub: 16-lane PCIe Gen3: 16 GB/s
Lots of moving parts!
High-Performance Graphics Memory
Modern GPUs even employ 3D-stacked memory via a silicon interposer
o Very wide bus, very high bandwidth
o e.g., HBM2 in Volta
Graphics Card Hub, “GDDR5 vs GDDR5X vs HBM vs HBM2 vs GDDR6 Memory Comparison,” 2019
Massively Parallel Architecture For
Massively Parallel Workloads!
NVIDIA CUDA (Compute Unified Device Architecture) – 2007
o A way to run custom programs on the massively parallel architecture!
OpenCL specification released – 2008
Both platforms expose synchronous execution of a massive number of GPU threads
CUDA Execution Abstraction
Block: Multi-dimensional array of threads
o 1D, 2D, or 3D
o Threads in a block can synchronize among themselves
o Threads in a block can access shared memory
o CUDA (Thread, Block) ~= OpenCL (Work item, Work group)
Grid: Multi-dimensional array of blocks
o 1D, 2D, or 3D (3D grids supported since compute capability 2.0)
o Blocks in a grid can run in parallel, or sequentially
Kernel execution issued in grid units
Limited recursion (depth limit of 24 as of now)
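The grid/block hierarchy above can be made concrete with a short sketch (hedged: the kernel name and sizes here are illustrative, not taken from the slides):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: each thread derives a unique global index
// from its position in the (grid, block) hierarchy.
__global__ void indexDemo(int *out, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) out[idx] = idx;
}

int main() {
    const int n = 1 << 10;
    int *out;
    cudaMallocManaged(&out, n * sizeof(int));

    // 1D grid of 1D blocks; dim3 also expresses multi-dimensional
    // shapes, e.g., dim3 block(16, 16) for a 16x16 2D block.
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);
    indexDemo<<<grid, block>>>(out, n);
    cudaDeviceSynchronize();

    printf("out[5] = %d\n", out[5]);
    cudaFree(out);
    return 0;
}
```

Threads within one block can additionally coordinate through `__shared__` memory and `__syncthreads()`, which is what distinguishes the block level from the grid level.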
Simple CUDA Example
o Kernel launch from the CPU side is an asynchronous call; the GPU side runs the kernel
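The example code itself was a figure in the original slides; below is a minimal sketch of what such a CPU-side/GPU-side pair typically looks like (a vector add; all names are illustrative):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// GPU side: each thread adds one element pair.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// CPU side: allocate, initialize, launch, synchronize.
int main() {
    const int n = 1024;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2.0f * i; }

    // The <<<...>>> launch returns immediately (asynchronous call);
    // the CPU is free to do other work until it synchronizes.
    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[10] = %f\n", c[10]);  // 10 + 20 = 30
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```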