
4/16/09

L17: Lessons from Particle System Implementations

CS6963

Administrative
•  Still missing some design reviews
   -  Please email to me slides from presentation
   -  And updates to reports
   -  By Thursday, Apr 16, 5PM
•  Grading
   -  Lab2 problem 1 graded, problem 2 under construction
   -  Return exams by Friday AM
•  Upcoming cross-cutting systems seminar, Monday, April 20,
   12:15-1:30PM, LCR: “Technology Drivers for Multicore Architectures,”
   Rajeev Balasubramonian, Mary Hall, Ganesh Gopalakrishnan, John Regehr
•  Final Reports on projects
   -  Poster session April 29 with dry run April 27
   -  Also, submit written document and software by May 6
   -  Invite your friends! I’ll invite faculty, NVIDIA, graduate
      students, application owners, ..

Particle Systems
•  MPM/GIMP
•  Particle animation and other special effects
•  Monte-Carlo transport simulation
•  Fluid dynamics
•  Plasma simulations
•  What are the performance/implementation challenges?
   -  Global synchronization
   -  Global memory access costs (how to reduce)
   -  Copy to/from host overlapped with computation
•  Many of these issues arise in other projects
   -  E.g., overlapping host copies with computation in image mosaicing

Sources for Today’s Lecture
•  A particle system simulation in the CUDA Software Developer Kit, called particles
•  Implementation description in /Developer/CUDA/projects/particles/doc/particles.pdf
•  Possibly related presentation in
   http://www.nvidia.com/content/cudazone/download/Advanced_CUDA_Training_NVISION08.pdf
   (this presentation also talks about finite differencing and molecular dynamics)
•  Asynchronous copies in the CUDA Software Developer Kit example called asyncAPI


Relevant Lessons from Particle Simulation
1.  Global synchronization using atomic operations
2.  Asynchronous copy from host to GPU
3.  Use of shared memory to cache particle data
4.  Use of texture cache to accelerate particle lookup
5.  OpenGL rendering

1. Global Synchronization
•  Concept:
   -  We need to perform some computations on particles, and others on grid cells
   -  Existing MPM/GIMP provides a mapping from particles to the grid nodes to which they contribute
   -  We would like an inverse mapping, from grid cells to the particles that contribute to their result
•  Strategy:
   -  Decompose the threads so that each computes results at a particle
   -  Use global synchronization to construct an inverse mapping from grid cells to particles (see the example code on the next slide)
   -  Primitive: atomicAdd

Example Code to Build Inverse Mapping

// gridPos represents a grid cell in 3-d space; index is the index of
// the particle; gridCells is the data structure in global memory for
// the inverse mapping.
__device__ void addParticleToCell(int3 gridPos, uint index,
                                  uint* gridCounters, uint* gridCells)
{
    // calculate grid hash
    uint gridHash = calcGridHash(gridPos);

    // increment cell counter using atomics; atomicAdd returns how many
    // particles had already been added to this cell
    int counter = atomicAdd(&gridCounters[gridHash], 1);
    counter = min(counter, params.maxParticlesPerCell-1);

    // write particle index into this cell (very uncoalesced!)
    gridCells[gridHash*params.maxParticlesPerCell + counter] = index;
}

What this does: builds up gridCells as an array limited by the maximum
number of particles per cell. A sketch of a calling kernel follows below.
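For context, a minimal sketch of how this routine might be driven (not from the SDK): one thread per particle, gridCounters zeroed (e.g., with cudaMemset) before each rebuild, and a hypothetical helper calcParticleGridPos that maps a particle position to its 3-d cell:

__global__ void buildInverseMapping(float4* pos, int numParticles,
                                    uint* gridCounters, uint* gridCells)
{
    uint index = blockIdx.x*blockDim.x + threadIdx.x;
    if (index >= numParticles) return;

    // calcParticleGridPos (hypothetical) maps a position to its grid cell
    int3 gridPos = calcParticleGridPos(pos[index]);
    addParticleToCell(gridPos, index, gridCounters, gridCells);
}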

2. Asynchronous Copy To/From Host
•  Warning: I have not tried this, and could not find a lot of information on it.
•  Concept:
   -  Memory bandwidth can be a limiting factor on GPUs
   -  Sometimes computation cost is dominated by copy cost
   -  But for some computations, data can be “tiled” and computation on tiles can proceed in parallel (some of our projects)
   -  Can we be computing on one tile while copying another?
•  Strategy:
   -  Use page-locked memory on the host, and asynchronous copies
   -  Primitive: cudaMemcpyAsync
   -  Synchronize with cudaThreadSynchronize()


Copying from Host to Device
•  cudaMemcpy(dst, src, nBytes, direction)
   -  Can only go as fast as the PCI-e bus, and is not eligible for asynchronous data transfer
•  cudaMallocHost(…): page-locked host memory
   -  Use this in place of standard malloc(…) on the host
   -  Prevents the OS from paging host memory
   -  Allows PCI-e DMA to run at full speed
•  Asynchronous data transfer
   -  Requires page-locked host memory

Example of Asynchronous Data Transfer

cudaStreamCreate(&stream1);
cudaStreamCreate(&stream2);
cudaMemcpyAsync(dst1, src1, size, dir, stream1);
kernel<<<grid, block, 0, stream1>>>(…);
cudaMemcpyAsync(dst2, src2, size, dir, stream2);
kernel<<<grid, block, 0, stream2>>>(…);

src1 and src2 must have been allocated using cudaMallocHost.
stream1 and stream2 identify the streams associated with each asynchronous
call (note the 4th “parameter” to the kernel invocations).
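Filling in what the fragment above leaves out, a minimal end-to-end sketch, assuming float buffers of size bytes and a kernel named kernel; the allocations and the final synchronization are the added pieces:

float *src1, *src2, *dst1, *dst2;
cudaMallocHost((void**)&src1, size);   // page-locked host memory
cudaMallocHost((void**)&src2, size);
cudaMalloc((void**)&dst1, size);       // device destinations
cudaMalloc((void**)&dst2, size);

cudaStream_t stream1, stream2;
cudaStreamCreate(&stream1);
cudaStreamCreate(&stream2);

// the copy in stream2 can overlap with the kernel running in stream1
cudaMemcpyAsync(dst1, src1, size, cudaMemcpyHostToDevice, stream1);
kernel<<<grid, block, 0, stream1>>>(dst1);
cudaMemcpyAsync(dst2, src2, size, cudaMemcpyHostToDevice, stream2);
kernel<<<grid, block, 0, stream2>>>(dst2);

cudaThreadSynchronize();               // wait for both streams to complete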

Particle Data has some Reuse
•  Two ideas:
   -  Cache particle data in shared memory (3.)
   -  Cache particle data in texture cache (4.)

Code from Oster presentation
•  Newtonian mechanics on point masses:

struct particleStruct{
    float3 pos;
    float3 vel;
    float3 force;
};

pos = pos + vel*dt
vel = vel + force/mass*dt
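As a concrete illustration (not from the original slides), this Euler update might look like the following as a kernel, one thread per particle; the component-wise arithmetic avoids depending on the SDK’s float3 operator overloads:

__global__ void integrate(particleStruct* P, float dt, float mass, int n)
{
    int idx = threadIdx.x + blockIdx.x*blockDim.x;
    if (idx >= n) return;

    particleStruct p = P[idx];
    // pos = pos + vel*dt
    p.pos.x += p.vel.x*dt;  p.pos.y += p.vel.y*dt;  p.pos.z += p.vel.z*dt;
    // vel = vel + force/mass*dt
    p.vel.x += p.force.x/mass*dt;  p.vel.y += p.force.y/mass*dt;  p.vel.z += p.force.z/mass*dt;
    P[idx] = p;
}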


3. Cache Particle Data in Shared Memory

// One thread per particle. The float3 arithmetic assumes the vector
// operators from the SDK’s cutil_math.h (or equivalent).
__shared__ float3 s_pos[N_THREADS];
__shared__ float3 s_vel[N_THREADS];
__shared__ float3 s_force[N_THREADS];

int tx = threadIdx.x;
int idx = threadIdx.x + blockIdx.x*blockDim.x;
s_pos[tx] = P[idx].pos;
s_vel[tx] = P[idx].vel;
s_force[tx] = P[idx].force;
__syncthreads();
s_pos[tx] = s_pos[tx] + s_vel[tx]*dt;
s_vel[tx] = s_vel[tx] + s_force[tx]/mass*dt;
P[idx].pos = s_pos[tx];
P[idx].vel = s_vel[tx];

4. Use Texture Cache for Read-Only Data
•  Texture memory is a special section of device global memory
   -  Read only
   -  Cached by spatial locality (1D, 2D, 3D)
•  Can achieve high performance
   -  If there is reuse within the thread block, so accesses hit the cache
   -  Useful to eliminate the cost of uncoalesced global memory access
•  Requires special mechanisms for defining a texture, and for accessing it

Using Textures: from Finite Difference Example
•  Declare a texture ref
   texture<float, 2, …> fTex;
•  Bind f to the texture ref via an array
   cudaMallocArray(&fArray, …);
   cudaMemcpy2DToArray(fArray, f, …);
   cudaBindTextureToArray(fTex, fArray, …);
•  Access with texture fetch functions
   float v = tex2D(fTex, x, y);   // in place of f[x][y]
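A fuller version of that fragment, as a sketch with the elided arguments filled in under assumed names (a host array f of width x height floats); note the width argument of cudaMemcpy2DToArray is in bytes:

texture<float, 2, cudaReadModeElementType> fTex;

cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
cudaArray* fArray;
cudaMallocArray(&fArray, &desc, width, height);
cudaMemcpy2DToArray(fArray, 0, 0, f, width*sizeof(float),
                    width*sizeof(float), height, cudaMemcpyHostToDevice);
cudaBindTextureToArray(fTex, fArray, desc);

// inside a kernel, in place of f[x][y]:
float v = tex2D(fTex, x, y);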
Use of Textures in Particle Simulation
•  A macro determines whether textures are used
a. Declaration of texture references in particles_kernel.cu

#if USE_TEX
// textures for particle position and velocity
texture<float4, 1, cudaReadModeElementType> oldPosTex;
texture<float4, 1, cudaReadModeElementType> oldVelTex;
texture<uint2, 1, cudaReadModeElementType> particleHashTex;
texture<uint, 1, cudaReadModeElementType> cellStartTex;
texture<uint, 1, cudaReadModeElementType> gridCountersTex;
texture<uint, 1, cudaReadModeElementType> gridCellsTex;
#endif


Use of Textures in Particle Simulation
b. Bind/unbind textures right before kernel invocation

#if USE_TEX
CUDA_SAFE_CALL(cudaBindTexture(0, oldPosTex, oldPos,
    numBodies*sizeof(float4)));
CUDA_SAFE_CALL(cudaBindTexture(0, oldVelTex, oldVel,
    numBodies*sizeof(float4)));
#endif

reorderDataAndFindCellStartD<<< numBlocks, numThreads >>>(
    (uint2 *) particleHash, (float4 *) oldPos, (float4 *) oldVel,
    (float4 *) sortedPos, (float4 *) sortedVel, (uint *) cellStart);

#if USE_TEX
CUDA_SAFE_CALL(cudaUnbindTexture(oldPosTex));
CUDA_SAFE_CALL(cudaUnbindTexture(oldVelTex));
#endif

Use of Textures in Particle Simulation
c. Texture fetch (hidden in a macro)

#if USE_TEX
#define FETCH(t, i) tex1Dfetch(t##Tex, i)
#else
#define FETCH(t, i) t[i]
#endif

•  Here’s an access in particles_kernel.cu:

float4 pos = FETCH(oldPos, index);

5. OpenGL Rendering
•  OpenGL buffer objects can be mapped into the CUDA address space and then used as global memory
   -  Vertex buffer objects
   -  Pixel buffer objects
•  Allows direct visualization of data from computation
   -  No device-to-host transfer
   -  Data stays in device memory: very fast compute / viz
   -  Automatic DMA from Tesla to Quadro (via host for now)
•  Data can be accessed from the kernel like any other global data (in device memory)

OpenGL Interoperability
•  Register a buffer object with CUDA
   -  cudaGLRegisterBufferObject(GLuint buffObj);
   -  OpenGL can use a registered buffer only as a source
   -  Unregister the buffer prior to rendering to it by OpenGL
•  Map the buffer object to CUDA memory
   -  cudaGLMapBufferObject(void** devPtr, GLuint buffObj);
   -  Returns an address in global memory; the buffer must be registered prior to mapping
•  Launch a CUDA kernel to process the buffer
•  Unmap the buffer object prior to use by OpenGL
   -  cudaGLUnmapBufferObject(GLuint buffObj);
•  Use the buffer object in OpenGL code
•  Unregister the buffer object
   -  cudaGLUnregisterBufferObject(GLuint buffObj);
   -  Optional: needed if the buffer is a render target
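Put together, the sequence might look like the following sketch (not taken from the SDK), using the CUDA 2.x-era interop API above; vbo is assumed to be an existing OpenGL buffer object holding n float4 positions, and advectParticles is a hypothetical kernel:

cudaGLRegisterBufferObject(vbo);             // register once, up front

float4* dPos;
cudaGLMapBufferObject((void**)&dPos, vbo);   // map into CUDA address space
advectParticles<<<numBlocks, numThreads>>>(dPos, n);
cudaGLUnmapBufferObject(vbo);                // unmap before OpenGL uses it

// ... render from vbo with OpenGL ...

cudaGLUnregisterBufferObject(vbo);           // unregister when done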


Final Project Presentation
•  Dry run on April 27
   -  Easels, tape and poster board provided
   -  Tape a set of Powerpoint slides to a standard 2’x3’ poster, or bring your own poster.
•  Final Report on Projects due May 6
   -  Submit code
   -  And a written document, roughly 10 pages, based on the earlier submission.
   -  In addition to the original proposal, include:
      -  Project Plan and How Decomposed (from DR)
      -  Description of CUDA implementation
      -  Performance Measurement
      -  Related Work (from DR)

Final Remaining Lectures
•  This one: Particle Systems
•  April 20: Sorting
•  April 22:
   -  ?
   -  Would like to talk about dynamic scheduling?
   -  If nothing else, the following paper:
      “Efficient Computation of Sum-products on GPUs Through Software-Managed
      Cache,” M. Silberstein, A. Schuster, D. Geiger, A. Patney, J. Owens, ICS 2008.
      http://www.cs.technion.ac.il/~marks/docs/SumProductPaper.pdf
