GPU Architecture

GPU programming allows highly parallel general-purpose computations to be run on GPU accelerators. There are several programming frameworks for GPU programming, including CUDA, OpenCL, and OpenACC. CUDA provides direct access to the GPU's instruction set for executing compute kernels. OpenCL is an open standard for parallel programming across heterogeneous platforms including CPUs and GPUs. OpenACC uses directives to help port codes to heterogeneous HPC hardware.

GPU and Programming

• Questions you should ask yourself before starting to code or optimize:
• Will my code run faster on the GPU?
• Is my existing code running as fast as it should?
• Is performance limited by computation or by memory bandwidth?
• Pencil-and-paper calculations can (often) answer such questions; a sketch follows.
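• For example, a quick arithmetic-intensity estimate tells you whether a kernel will be limited by computation or by memory bandwidth. A minimal C++ sketch follows, assuming illustrative peak numbers (the 10 TFLOP/s and 900 GB/s figures are placeholders, not measurements of any particular GPU):

#include <cstdio>

// Pencil-and-paper roofline estimate: compare a kernel's arithmetic
// intensity (FLOPs per byte moved) against the machine balance
// (peak FLOP/s divided by peak memory bandwidth).
int main() {
    // Hypothetical GPU peaks -- substitute your device's datasheet values.
    const double peak_gflops = 10000.0;  // 10 TFLOP/s (placeholder)
    const double peak_gbps   = 900.0;    // 900 GB/s memory bandwidth (placeholder)
    const double machine_balance = peak_gflops / peak_gbps;  // FLOPs per byte

    // Example kernel: y[i] = a * x[i] + y[i] (AXPY).
    // Per element: 2 FLOPs; 2 reads + 1 write of 8-byte doubles = 24 bytes.
    const double flops_per_elem = 2.0;
    const double bytes_per_elem = 24.0;
    const double intensity = flops_per_elem / bytes_per_elem;

    printf("machine balance:  %.1f FLOPs/byte\n", machine_balance);
    printf("kernel intensity: %.3f FLOPs/byte\n", intensity);
    if (intensity < machine_balance)
        printf("kernel is memory-bandwidth-bound\n");
    else
        printf("kernel is compute-bound\n");
    return 0;
}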
GPU vs CPU
• A Central Processing Unit (CPU) is a latency-optimized general
purpose processor that is designed to handle a wide range of distinct
tasks sequentially, while a Graphics Processing Unit (GPU) is a
throughput-optimized specialized processor designed for high-end
parallel computing.
What is a GPU?
• A Graphics Processing Unit (GPU) is a specialized processor whose job
is to rapidly manipulate memory and to accelerate a number of specific
tasks that require a high degree of parallelism.
GPU Architecture
• Because the GPU uses thousands of lightweight cores whose instruction
sets are optimized for multi-dimensional matrix arithmetic and floating-
point calculations, it is extremely fast at linear algebra and similar
tasks that require a high degree of parallelism.
• As a rule of thumb, if your algorithm accepts vectorized data, the job
is probably well-suited for GPU computing.
Task
• How do the limitations of GPUs and CPUs compare?
• What is GPU computing, and how is it applied today?
The basic architecture of a GPU
• A GPU uses many lightweight processing cores, leverages data
parallelism, and has high memory throughput. While the specific
components vary by model, fundamentally most modern GPUs use a
single instruction, multiple data (SIMD) stream architecture.
What is Flynn’s Taxonomy?
• Flynn’s Taxonomy is a categorization of computer architectures by
Stanford University’s Michael J. Flynn. The basic idea behind Flynn’s
Taxonomy is simple: computations consist of two streams (a data
stream and an instruction stream) that can be processed in sequence
(one stream at a time) or in parallel (multiple streams at once). Two
stream types with two possible processing methods lead to the four
categories in Flynn’s Taxonomy. Let’s take a look at each.
Single Instruction Single Data (SISD)
• A SISD stream architecture is one where a single instruction stream
(e.g. a program) executes on one data stream. This architecture is
used in older computers with a single-core processor, as well as in
many simple compute devices.
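• A minimal sketch of SISD-style execution (the array and its contents are illustrative): one instruction stream walks one data stream on a single core:

#include <cstdio>

// SISD: a single instruction stream processes a single data stream,
// one element at a time, on one core.
int main() {
    const int n = 8;
    float x[n] = {1, 2, 3, 4, 5, 6, 7, 8};
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)  // one instruction stream, sequential
        sum += x[i];             // one data element per step
    printf("sum = %.1f\n", sum);
    return 0;
}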
Single Instruction Multiple Data (SIMD)
• A SIMD stream architecture has a single control processor and
instruction memory, so only one instruction can be run at any given
point in time. That single instruction is copied and run across each
core at the same time. This is possible because each processor has its
own dedicated memory, which allows for parallelism at the data level
(a.k.a. “data parallelism”).
• The fundamental advantage of SIMD is that data parallelism allows it
to execute computations quickly (multiple processors doing the same
thing) and efficiently (only one instruction unit).
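• To make this concrete, here is a minimal CUDA kernel sketch of data parallelism: every thread executes the same instruction sequence, each on its own array element (the name vecAdd and its signature are illustrative; a full host-side launch of this kernel appears in the CUDA section below):

// SIMD-style data parallelism: one kernel (instruction stream),
// many threads, each operating on a different data element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
    if (i < n) c[i] = a[i] + b[i];  // same instruction, different data
}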
Multiple Instruction Single Data (MISD)
• MISD stream architecture is effectively the reverse of SIMD
architecture. With MISD multiple instructions are performed on the
same data stream. The use cases for MISD are very limited today.
Most practical applications are better addressed by one of the other
architectures.
Multiple Instruction Multiple Data (MIMD)
• MIMD stream architecture offers parallelism for both data and
instruction streams. With MIMD, multiple processors execute
instruction streams independently against different data streams.
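• A host-side C++ sketch of MIMD-style execution, assuming two CPU threads (the functions and data are illustrative): each thread runs a different instruction stream on a different data stream:

#include <algorithm>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

// MIMD: two independent instruction streams (different code) execute
// concurrently on different data streams.
int main() {
    std::vector<int> a(1000), b(1000);
    std::iota(a.begin(), a.end(), 1);   // a = 1..1000
    std::iota(b.begin(), b.end(), 5);   // b = 5..1004

    long sum = 0;
    int maxv = 0;
    std::thread t1([&] { sum = std::accumulate(a.begin(), a.end(), 0L); });
    std::thread t2([&] { maxv = *std::max_element(b.begin(), b.end()); });
    t1.join();
    t2.join();
    printf("sum(a) = %ld, max(b) = %d\n", sum, maxv);
    return 0;
}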
• What makes SIMD best for GPUs?
• What about SIMT?
CUDA compute hierarchy
• The processing resources in CUDA are designed to help optimize
performance for GPU use cases. Three of the fundamental
components of the hierarchy are threads, thread blocks, and kernel
grids.
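• A minimal sketch of how the three levels fit together (the kernel name and launch sizes are illustrative): each thread locates itself using its index within its block and its block's index within the kernel grid:

#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes a unique global index from the hierarchy:
// threadIdx (position in block), blockIdx (position in grid),
// blockDim (threads per block).
__global__ void whoAmI() {
    int global_id = blockIdx.x * blockDim.x + threadIdx.x;
    printf("block %d, thread %d -> global id %d\n",
           blockIdx.x, threadIdx.x, global_id);
}

int main() {
    whoAmI<<<2, 4>>>();       // kernel grid: 2 blocks of 4 threads each
    cudaDeviceSynchronize();  // wait for device printf to complete
    return 0;
}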
CUDA memory hierarchy
• Like compute resources, memory allocation follows a specific
hierarchy in CUDA. While the CUDA toolchain handles much of the
memory placement automatically, CUDA developers can and do program
to optimize memory usage directly. Here are the key levels of the
CUDA memory hierarchy, illustrated by the sketch after this list.
• Registers
• Read-only memory
• Cache/shared memory
• L2 Cache
• Global memory
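• As an illustration of the register, shared-memory, and global-memory levels, here is a minimal block-wide sum sketch (the kernel name and block size are illustrative):

#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 128

// Sums BLOCK values: each thread loads from global memory into fast
// on-chip shared memory; per-thread scalars live in registers.
__global__ void blockSum(const float *in, float *out) {
    __shared__ float buf[BLOCK];  // shared memory (one copy per block)
    int tid = threadIdx.x;        // held in a register
    buf[tid] = in[tid];           // global -> shared
    __syncthreads();
    for (int s = BLOCK / 2; s > 0; s >>= 1) {  // tree reduction in shared memory
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();
    }
    if (tid == 0) *out = buf[0];  // result back to global memory
}

int main() {
    float h_in[BLOCK], h_out = 0.0f;
    for (int i = 0; i < BLOCK; ++i) h_in[i] = 1.0f;  // sum should be 128
    float *d_in, *d_out;
    cudaMalloc(&d_in, BLOCK * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, BLOCK * sizeof(float), cudaMemcpyHostToDevice);
    blockSum<<<1, BLOCK>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("block sum = %.1f\n", h_out);
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}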
• Task: Briefly explain the history of Nvidia GPU architectures.
What is GPU Programming?
• GPU Programming is a method of running highly parallel general-
purpose computations on GPU accelerators.

• While past GPUs were designed exclusively for computer graphics,
today they are used extensively for general-purpose computing
(GPGPU computing) as well. In addition to graphics rendering, GPU-
driven parallel computing is used today for scientific modelling,
machine learning, and other highly parallelizable jobs.
GPU Programming APIs
• A GPU understands computational problems in terms of graphical
primitives. Today there are several programming frameworks available
that handle these primitives for you under the hood, so you can
focus on higher-level computing concepts.
CUDA
• Compute Unified Device Architecture (CUDA) is a parallel computing
platform and application programming interface (API) created by
Nvidia in 2006. It gives direct access to the GPU’s virtual instruction
set for the execution of compute kernels.
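• A minimal end-to-end CUDA sketch (array sizes and names are illustrative): allocate device memory, copy inputs over, launch the vecAdd compute kernel from the SIMD section, and copy the result back:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h_a = (float*)malloc(bytes), *h_b = (float*)malloc(bytes),
          *h_c = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;  // device (global) memory
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int block = 256, grid = (n + block - 1) / block;
    vecAdd<<<grid, block>>>(d_a, d_b, d_c, n);  // launch compute kernel
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %.1f (expect 3.0)\n", h_c[0]);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}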
OpenCL
• While CUDA is a proprietary framework, OpenCL is an open standard
for parallel programming across heterogeneous platforms created by
the Khronos Group. OpenCL works with central processing units
(CPU), graphics processing units (GPU), digital signal processors, field-
programmable gate arrays (FPGA) and other processors or hardware
accelerators.
OpenACC
• OpenACC is a user-driven directive-based parallel programming
standard designed for scientists and engineers interested in porting
their codes to a wide variety of heterogeneous high-performance
computing (HPC) hardware platforms. The standard is designed for
the users by the users.
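• To illustrate the directive-based approach, here is a minimal sketch: an ordinary C++ loop annotated with an OpenACC pragma, which an OpenACC-capable compiler can offload to an accelerator (a compiler without OpenACC support simply ignores the pragma):

#include <cstdio>

// The pragma asks the compiler to parallelize the loop on an
// accelerator; the data clauses describe host<->device movement.
int main() {
    const int n = 1 << 20;
    static float x[1 << 20], y[1 << 20];
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    #pragma acc parallel loop copyin(x) copy(y)
    for (int i = 0; i < n; ++i)
        y[i] = 2.0f * x[i] + y[i];  // simple AXPY-style update

    printf("y[0] = %.1f (expect 4.0)\n", y[0]);
    return 0;
}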
