Fundamentals of Accelerated Computing With CUDA Python
This workshop teaches you the fundamental tools and techniques for running GPU-accelerated Python applications using
CUDA® and the Numba compiler. You’ll work through dozens of hands-on coding exercises and, at the end of the training,
implement a new workflow to accelerate a fully functional linear algebra program originally designed for CPUs, observing
impressive performance gains. After the workshop ends, you’ll have additional resources to help you create new GPU-
accelerated applications on your own.
Learning Objectives
At the conclusion of the workshop, you’ll understand the fundamental tools and techniques for GPU-accelerating
Python applications with CUDA and Numba and be able to:
> GPU-accelerate NumPy ufuncs with a few lines of code.
> Configure code parallelization using the CUDA thread hierarchy.
> Write custom CUDA device kernels for maximum performance and flexibility.
> Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth.
Duration: 8 hours
Price: Contact us for pricing. During the workshop, each participant will have dedicated access to a
fully configured, GPU-accelerated workstation in the cloud.
Prerequisites: Basic Python competency, including familiarity with variable types, loops, conditional
statements, functions, and array manipulations. NumPy competency, including the use of
ndarrays and ufuncs. No previous knowledge of CUDA programming is required.
Certificate: Upon successful completion of the assessment, participants will receive an NVIDIA
DLI certificate to recognize their subject matter competency and support professional
career growth.
Hardware/software Desktop or laptop computer capable of running the latest version of Chrome or Firefox.
requirements: Each participant will be provided with dedicated access to a fully configured, GPU-accelerated
workstation in the cloud.
Languages: English
Custom CUDA Kernels in Python with Numba (120 mins)
> Learn CUDA’s parallel thread hierarchy and how to extend parallel program possibilities.
> Launch massively parallel, custom CUDA kernels on the GPU.
> Utilize CUDA atomic operations to avoid race conditions during parallel execution.
Break (15 mins)
RNG, Multidimensional Grids, and Shared Memory for CUDA Python with Numba (120 mins)
> Use xoroshiro128+ RNG to support GPU-accelerated Monte Carlo methods.
> Learn multidimensional grid creation and how to work in parallel on 2D matrices.
> Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices.
Final Review (15 mins)
> Review key learnings and wrap up questions.
> Complete the assessment to earn a certificate.
> Take the workshop survey.
© 2021 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, and CUDA are trademarks and/or registered
trademarks of NVIDIA Corporation in the U.S. and other countries. All other trademarks and copyrights are the property of
their respective owners. Jul21