Hardware-Based Simulation and Collision Detection for Large Particle Systems
Abstract
Particle systems have long been recognized as an essential building block for detail-rich and lively visual environ-
ments. Current implementations can handle up to 10,000 particles in real-time simulations and are mostly limited
by the transfer of particle data from the main processor to the graphics hardware (GPU) for rendering.
This paper introduces a full GPU implementation of both the simulation and rendering of a dynamically-growing particle system, based on fragment shaders. Such an implementation can render up to 1 million particles in real-time on recent hardware. The massively parallel simulation handles collision detection and reaction of particles with objects of arbitrary shape. The collision detection is based on depth maps that represent the outer shape of an object. The depth maps store distance values and normal vectors for collision reaction. Using a special texture-based indexing technique to represent normal vectors, standard 8-bit textures suffice to describe the complete depth map data. Alternatively, several depth maps can be stored in one floating-point texture.
In addition, a GPU-based parallel sorting algorithm is introduced that can be used to perform a depth sorting of
the particles for correct alpha blending.
Categories and Subject Descriptors (according to ACM CCS): I.3.1 [Computer Graphics]: Graphics processors I.3.5
[Computer Graphics]: Boundary representations I.3.7 [Computer Graphics]: Animation
1. Introduction

Physically correct particle systems (PS) are designed to add essential properties to the virtual world. Over the last decades they have been established as a valuable technique for a variety of applications, e.g. deformable objects like cloth [VSC01] and volumetric effects [Har03].

Dynamic PS were introduced by [Ree83] in the context of the motion picture Star Trek II. Reeves describes basic motion operations and the basic data representing a particle - both have not been altered much since. An implementation on the parallel processors of a supercomputer has been done by [Sim90]. [Sim90] and [McA00] describe many of the velocity and position operations of the motion simulation also used in our PS.

Real-time PS are often limited by the fill rate or the CPU-to-graphics-hardware (GPU) communication. The fill rate is often a limiting factor when there is high overdraw due to relatively large particle geometries. Using a large number of smaller particles decreases the overdraw, and the fill rate limitation loses importance. The bandwidth limitation then dominates the system. Sharing the graphics bus with many other rendering tasks allows CPU-based PS to achieve only up to 10,000 particles per frame in typical real-time applications. A much larger number of particles can be used by minimizing the communication of particle data, i.e. by integrating simulation and rendering on the GPU.

Stateless PS, i.e. PS whose particle data can be computed by closed-form functions from a set of start values and the current time, have been implemented using vertex shaders (cf. [NVI03]). However, state-preserving PS can utilize numerical, iterative integration methods to compute the particle data from previous values and a dynamically changing environment. They can be used in a much wider range of applications.

While collision reaction for particles is a fairly simple application of Newtonian physics, collision detection can be a rather complex task w.r.t. the geometric representation of
the collider object. [LG98] gives a good overview of collision detection techniques for models represented as CSG, polygonal, parametric or implicit surfaces. There are three basic image-based, hardware-accelerated approaches to collision detection, based on depth buffers, stencil buffers or occlusion culling. However, all these techniques use the GPU to generate spatial information which has to be read back from the GPU to the CPU for further collision processing.

The technique presented in this paper uses the “stream processing” paradigm, e.g. [PBMH02], to implement PS simulation, collision detection and rendering completely on the fragment processor. Thus a large number of particles can be simulated using the state-preserving approach. The collision detection is based on an implicit, image-based object boundary representation using a sequence of depth maps similar to [KJ01]. Several approaches are presented to store one, two or six depth maps in a single texture. Storing the normal vectors for collision reaction is realized using a texture-based normal indexing technique.

The remainder of this paper is organized as follows. Section 2 gives an overview of work related to this paper. The GPU-based simulation of large particle systems is described in section 3. Collision detection on the GPU is discussed in section 4. Results and conclusions are given in sections 5 and 6 respectively.

2. Prior work

This section describes prior work related to particle systems (PS) and their implementation using graphics hardware (section 2.1). Additionally, we briefly discuss collision detection approaches (section 2.2), techniques to generate implicit representations of polygonal objects (section 2.3) and the compression of normal vectors (section 2.4).

2.1. Stateless particle systems

Some PS have been implemented with vertex shaders on programmable GPUs [NVI03]. However, these PS are stateless, i.e. they do not store the current positions of the particles. To determine a particle's position, a closed-form function computing the current position only from initial values and the current time is needed. As a consequence, such PS can hardly react to a dynamically changing environment.

Particle attributes besides velocity and position, e.g. the particle's orientation, size and texture coordinates, generally have much simpler computation rules, e.g. they might be calculated from a start value and a constant factor of change over time.

So far there have been no state-preserving particle systems fully implemented on the GPU.

2.2. Collision detection techniques

The field of collision detection has been one of the most active in recent years. Lin and Gottschalk [LG98] give a good overview of various collision detection techniques and a wide range of applications, e.g. game development, virtual environments, robotics and engineering simulation.

There are three basic hardware-accelerated approaches, based on depth buffers, stencil buffers and occlusion culling. All approaches are image based, and thus their accuracy is limited due to the discrete geometry representation.

Stencil buffer and depth buffer based approaches like [BW03, HZLM02, KOLM02] use the graphics hardware to generate proximity, collision or penetration information. This data has to be read back to the CPU to perform collision detection and reaction. Usually, these techniques use the graphics hardware to detect pairs of objects which are potentially colliding. This process may be organized hierarchically, either to get more detailed information or to reduce the set of potentially colliding objects on a coarser scale.

Govindaraju et al. [GRLM03] utilize hardware-accelerated occlusion queries. This minimizes the bandwidth needed for the read-back from the GPU. Again, the collision reaction is computed on the CPU.

2.3. Implicit representation of polygonal objects

Implicit representations of polygonal objects have advantages in the context of collision detection, since the distance of any 3D point is directly given by the value of the implicit model representation, the so-called distance map.

Nooruddin and Turk [NT99, NT03] introduced a technique to convert a polygonal model into an implicit one using a scanline conversion algorithm. They use the implicit representation to modify the object with 3D morphological operators.

Kolb and John [KJ01] build upon Nooruddin and Turk's algorithm using graphics hardware. They remove mesh artifacts like holes and gaps, or visually unimportant portions of objects like nested or overlapping parts. This technique generates an approximate distance map of a polygonal model, which is exact on the object's surface w.r.t. the visibility of object points and the resolution of the depth buffer.

2.4. Compression of normal vectors

For collision reaction, an efficient way to store an object's normal, i.e. the collision normal, at a particular point on the object's surface is needed. Deering [Dee95] notes that angular differences below 0.01 radians, yielding some 100k normal vectors, are not visually recognizable in rendering. Deering introduces a normal encoding technique which requires several trigonometric function calls per normal.

In our context we need a normal representation technique
which is space and time efficient. Decoding of the vector components must be as cheap as possible, while the encoded data must be efficiently stored in textures.

Normal maps store normal vectors explicitly and are not space efficient. Applying an optional DXT compression results in severe quantization artifacts (cf. [ATI03]).

Sphere maps, cube maps and parabolic maps, commonly used to represent environmental information, may be used to store normal vectors. Sphere maps heavily depend on a specific viewing direction. Cube maps build upon a 3D index, i.e. a point position in 3-space. Parabolic maps need two textures to represent a whole sphere. Additionally, they only utilize the inscribed circle of the texture.

2.5. Other related work

Green [Gre03] describes a cloth simulation using simple grid-aligned particle physics, but does not discuss generic particle system problems like allocation, rendering and sorting of PS. The photon mapping algorithm described by Purcell et al. [PDC∗03] uses a sorting algorithm similar to the odd-even merge sort presented in section 3.3.3. However, their algorithm does not have the properties necessary to exploit the high frame-to-frame coherence of the particle system simulation.

3. Particle simulation on Graphics Hardware

The following sections describe the algorithm of a state-preserving particle system on a GPU in detail. After a brief overview (section 3.1), the storage (section 3.2) and then the processing of particles (section 3.3) are described.

Optionally, in step 4 the particle positions can be sorted depending on the viewer distance to avoid rendering artifacts. The sorting performs several additional rendering passes on textures that contain the particle-viewer distance and a reference to the particle itself.

Then the particle positions are transferred from the position texture to a vertex buffer and the particles are rendered as point sprites, triangles or quads.

3.2. Particle data storage

The positions and velocities of all active particles are stored in floating-point textures, using the three color components as x, y and z coordinates. The texture itself is also a render target, so it can be updated with the computed positions and velocities. Since a texture cannot be used as input and output at the same time, we use a pair of these textures and a double buffering technique (cf. figure 1). Depending on the integration algorithm, the explicit storage of the velocity texture can be omitted (cf. section 3.3.2).

Other particle attributes like mass, orientation, size, color, and opacity are typically static or can be computed using a simple stateless approach (cf. section 2.1). To minimize the upload of static attribute parameters we introduce particle types. The simulation of these attributes uses one further texture storing, for each particle, the time of birth and a reference to the particle type (cf. figure 1). To model more complex attribute behavior, simple key-frame interpolation over the age of the particle can be applied.
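Outside the shader world, the ping-pong scheme of section 3.2 can be mimicked in plain Python, with lists of (x, y, z) triples standing in for the position and velocity textures. This is only an illustrative sketch, not the paper's Cg code; the names and the minimal Euler step are ours:

```python
def euler_step(pos_in, vel_in, force, mass, dt):
    """One simulated 'rendering pass': read the input buffers,
    write fresh output buffers via simple Euler integration."""
    vel_out = [tuple(v + (f / mass) * dt for v, f in zip(vel, force))
               for vel in vel_in]
    pos_out = [tuple(p + v * dt for p, v in zip(pos, vel))
               for pos, vel in zip(pos_in, vel_out)]
    return pos_out, vel_out

# Each "texture" is a flat list of RGB triples = (x, y, z) per particle.
# A texture cannot be read and written in the same pass, hence the
# double-buffer pair: the buffer written this frame is read next frame.
n = 4
pos = [[(0.0, 0.0, 0.0)] * n, None]
vel = [[(0.0, 0.0, 0.0)] * n, None]
gravity = (0.0, -9.81, 0.0)

read, write = 0, 1
for frame in range(3):
    pos[write], vel[write] = euler_step(pos[read], vel[read],
                                        gravity, 1.0, 0.01)
    read, write = write, read  # ping-pong: swap roles each frame
```

Each pass reads one buffer of the pair and writes the other, mirroring the restriction that a texture cannot simultaneously be input and output.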
the smallest available index. The heap data structure guarantees that particles in use remain packed in the first portion of the textures. The following simulation and rendering steps then only need to update that portion of the data. The initial particle data is determined on the CPU and can use complex algorithms, e.g. probability distributions (cf. McAllister [McA00]).

A particle's death is processed independently on the CPU and GPU. The CPU registers the death of a particle and adds the freed index to the allocator. The GPU does an extra pass to determine the death of a particle from the time of birth and the computed age. The dead particle's position is simply moved to invisible areas. As particles at the end of their lifetime usually fade out or fall out of visible areas anyway, the extra clean-up pass rarely needs to be done.

3.3.2. Update velocities and positions

Updating a particle's velocity and position is based on the Newtonian laws of motion. The actual simulation is implemented in a fragment shader. The shader is executed for each pixel of the render target by rendering a screen-sized quad. The double-buffer textures are alternately used as render target and as input data stream, containing the velocities and positions from the previous time step.

There are several operations that can be used to manipulate the velocity (cf. [Sim90] and [McA00] for more details): global forces (e.g. gravity, wind), local forces (attraction, repulsion), velocity dampening, and collision responses. For our GPU-based particle system these operations need to be parameterized via fragment shader constants.

A complex local force can be applied by mapping the particle position into a 2D or 3D texture containing flow velocity vectors v_flow. Stokes' law is used to derive a dragging force:

    F_flow = 6 π η r (v_{i−1} − v_flow)

where η is the flow viscosity, r the particle radius and v_{i−1} the particle's previous velocity.

The new velocity v_i and position P_i are derived from the accumulated global and local forces F using simple Euler integration.

Alternatively, Verlet integration (cf. [Ver67]) can be used to avoid the explicit storage of the velocity by utilizing the position information P_{i−2}. The great advantage is that this technique reduces memory consumption and removes the velocity-update rendering pass. Verlet integration uses a position update rule based only on the acceleration:

    P_i = 2 P_{i−1} − P_{i−2} + a Δt_i²

Using Euler integration, collision reaction is based on a change of the particle's velocity. Splitting the velocity vector v_i into a normal component v⊥_i and a tangential component v∥_i, the velocity after the collision can be computed applying friction ν and resilience ε:

    v_i = (1 − ν) v∥_i − ε v⊥_i

To avoid velocities slowing down close to zero, the friction slow-down should not be applied if the overall velocity is smaller than a given threshold.

With colliders having sharp edges, e.g. a height field, or two colliders close to each other, the collision reaction might push particles into a collider. In such cases a caught particle ought to be pushed out in the next simulation step.

To handle this situation, the collision detection is done twice, once with the previous and once with the expected position P*_i based on velocity v*_i. This allows differentiating between particles that are about to collide and those having already penetrated (cf. figure 2). The latter must be pushed in the direction of the shortest way out of the collider. This direction can be guessed from the normal component of the velocity:

    v_i = v*_i            if v*_i · n̂ ≥ 0  (heading outside)
    v_i = v⊥_i − v*_i     if v*_i · n̂ < 0  (heading inside)

Figure 2: Particle collision: a) Reaction before penetration; b) Double collision with caught particle and push-back.

Verlet integration cannot directly handle collision reactions in the way discussed above. Here position manipulations are required to implicitly change the velocity in the following frames.

3.3.3. Sorting

If particles are blended using a non-commutative blending mode, a depth-based sorting should be applied to avoid artifacts.

A particle system on the GPU can be sorted quite efficiently with the parallel sorting algorithm “odd-even merge sort” (cf. Batcher [Bat68]). Its runtime complexity is independent of the data's sortedness. Thus, a check whether all data is already in sequence does not need to be performed on the GPU, which would be rather inefficient. Additionally, with each iteration the sortedness never decreases. Thus, using the high frame-to-frame coherence of the particle order,
the whole sorting sequence can be distributed over 20 - 50 frames. This, of course, leads to an approximate depth sortedness, which, in our examples, does not yield any visual artifacts.

The basic principle of odd-even merge sort is to divide the data into two halves, to sort these and then to merge the two halves. The algorithm is commonly written recursively, but a closer look at the resulting sorting network reveals its parallel nature. Figure 3 shows the sorting network for eight values. Several consecutive comparisons are independent of each other and can be grouped for parallel execution (vertical lines in figure 3).

Figure 3: Odd-even sorting network for eight values; arrows mark comparison pairs.

The sorting requires ½ log₂²(n) + ½ log₂(n) passes, where n is the number of elements to sort. For a 1024 × 1024 texture this leads to 210 rendering passes. Running all 210 passes each frame is far too expensive, but spreading the whole sorting sequence over 50 frames, i.e. 1 - 2 seconds, reduces the workload to 4 - 5 passes each frame.

The sorting algorithm requires an additional texture containing the particle-viewer distance. The distance in this texture is updated after the position simulation. After sorting, the rendering step looks up the particle attributes via the index in this texture.

3.3.4. Render particles

The copying of position data from a texture to vertex data is an upcoming hardware feature in PC GPUs. DirectX and OpenGL offer the vertex texture technique (vertex shader 3.0 resp. the ARB_vertex_shader extension). Unfortunately there is no hardware supporting this feature at the moment.

Alternatively, “über-buffers” (also called super buffers; cf. [Per03]) can be used. This functionality is already available in current GPUs, but up to now it is only supported by the OpenGL API. The current implementation uses the vendor-specific NV_pixel_data_range extension (cf. [NVI04]).

The transferred vertex positions are used to render primitives to the frame buffer. In order to reduce the workload of the vertex unit, particles are currently rendered as point sprites instead of as triangles or quads. The disadvantage though is that particles are always axis-aligned. To allow a 2D rotation, texture coordinates are transformed in the fragment shader.

4. Collision detection

In this section we describe the implicit object representation used for collision detection (section 4.1). Furthermore, the normal indexing technique (section 4.2) and various approaches to represent depth maps in textures are introduced (section 4.3).

4.1. Implicit model representation

We use an image-based technique similar to [KJ01] to represent an object's outer boundary by a set of depth maps. These depth maps contain the distance to the object's boundary and the surface normal at the relevant object point.

Each depth map DM_i, i = 1, ..., k stores the following information:
1. distance dist(x, y) to the collider object in projection direction for each pixel (x, y)
2. normal vector n̂(x, y) at the relevant object surface point
3. transformation matrix T_OC→DC mapping from collider object coordinates to depth map coordinates, i.e. the coordinate system in which the projection was performed
4. scaling value z_scale in z-direction to compensate for possible scaling performed by T_OC→DC

The object's interior is assigned negative distance values. Assuming we look at the object from outside and orthographic projection is used, the distance value f(P) for a point P is computed using the transformation T_OC→DC:

    f(P) = z_scale · dist(p′_x, p′_y) − p′_z,  where P′ = (p′_x, p′_y, p′_z)ᵀ = T_OC→DC P    (1)

T_OC→DC usually also contains the transformation to texture coordinates. Thus fetching the depth value dist(p′_x, p′_y) is a simple texture lookup at coordinates (p′_x, p′_y).

Taking several depth maps into account, the most appropriate depth for point P has to be determined. The following definition guarantees that P is outside of the collider if at least one depth map has recognized P to be exterior:

    f(P) = max{ f_i(P) }                  if f_i(P) < 0 ∀i
    f(P) = min{ f_i(P) : f_i(P) > 0 }     else

where f_i(P) denotes the signed distance value w.r.t. depth map DM_i.

Handling several depth maps DM_i, i = 1, ..., k, f(P) can be computed iteratively:

    ( f(P) < 0 ∧ f_i(P) > f(P) ) ∨ ( f_i(P) > 0 ∧ f_i(P) < f(P) )  ⇒  f(P) ← f_i(P)    (2)

where f(P) is initially set to a large negative value, i.e. P is placed “far inside” the collider.
If P′ = T_OC→DC P lies outside the view volume for the current depth map, or the texture lookup dist(p′_x, p′_y) results in the initial background value, i.e. no distance to the object can be computed, this value may be ignored as an invalid vote. Alternatively, if the view volume encloses the complete collider object, the invalid votes can be declared as “far outside”. To avoid erroneous data due to clamping of (p′_x, p′_y), we simply keep a one-pixel border in the depth map unchanged with background values to indicate an invalid vote.

A fragment shader computes the distance using rule (2) and applies a collision reaction when the final distance f(P) is negative.

Note that this approach may have problems with small object details in case of an insufficient buffer resolution. Problems can also be caused by local surface concavities. In many cases, these local concavities can be reconstructed with properly placed depth map view volumes (cf. section 5).

4.2. Quantization and indexing of normal vectors

Explicitly storing normal vectors using 8, 16 or 32 bit per component is sufficient within certain error bounds (cf. [ATI03]). Since we store unit vectors, most of the used 3-space remains unused, though. The depth map representation requires a technique which allows the retrieval of normal vectors using a space- and time-efficient indexing method. Indices must be encoded in depth maps and the reconstruction of the normal vector in the fragment shader must be efficient.

The normal representation, which is implemented by indexing into a normal index texture, should have the following properties:
1. the complete coordinate space [0, 1]² of a single texture is used
2. decoding and ideally encoding is time efficient
3. sampling of the directional space is as regular as possible

Figure 4: The eight octants of the L1-parametrization (left), the octahedron after applying l1 (middle) and the sampling of the unit sphere (right).

Cube maps cannot be used, since the index to look up the function value is already a vector with three components. Sphere maps, commonly used as reflection maps, heavily depend on a specific direction, e.g. the viewing direction for the reflection computation. On the other hand, parabolic maps show a very uniform parameterization of the hemisphere (cf. Heidrich and Seidel [HS98]), but two of these textures are needed to span the whole sphere.

We propose the following mapping, which is based on the L1-norm ‖v‖₁ = |v_x| + |v_y| + |v_z|:

    l1(s, t) = ( s, t, 1 − |s| − |t| )                                 if |s| + |t| ≤ 1
    l1(s, t) = ( sgn(s)(1 − |t|), sgn(t)(1 − |s|), 1 − |s| − |t| )     if |s| + |t| > 1    (3)

where s, t ∈ [−1, 1]. l1 maps (s, t) ∈ [−1, 1]² onto the L1 unit sphere, i.e. the unit octahedron (cf. figure 4). Applying an additional affine transformation, we get a continuous mapping of the standard texture space (s, t) ∈ [0, 1]² onto the octahedron. The resolution of the texture space naturally implies the resolution of the sphere, i.e. of the normal space.

It should be pointed out that the L1-parametrization proposed above can easily be used to represent any directional data, e.g. reflection maps.

4.3. Depth map representation

Ideally, we want to encode as many depth values and normal vectors as possible into a single texture, thus keeping the amount of data to be transferred to and kept in the graphics hardware memory as small as possible. Throughout our experiments, we have investigated the following depth map formats:

Floating point depth map
The simplest, but most storage-inefficient variant uses a floating point texture to store the surface normals uncompressed in the R, G, B components and the alpha channel to hold the distance value.

8-bit fixed point depth map
This variant uses a standard RGBA texture with 8 bit per component. Here the R, G components contain the index into the normal texture, whereas B, A store the depth value, thus having a depth resolution of 16-bit fixed point. The normal index texture with resolution 256 × 256 is built using the L1-parametrization technique described in section 4.2. The RGB components of this texture store the normal vectors, which are looked up using the index stored in the depth map.

16-bit floating point depth map (front-back)
Combining orthographic projection with depth compare function LESS generates a front depth map. Naturally, the depth map taking the inverse z-direction (depth compare
Figure 5: Sample applications: “bunny in the snow” (left) and “Venus-fountain” (middle and right)
function GREATER) is a useful back depth map counterpart. We can easily represent both of these depth maps in one 16-bit texture. Here the R, G components store the normal texture indices, where two 8-bit indices are packed into a single 16-bit float. The B, A components store the depth value for the front and back map respectively.

8-bit fixed point cubic depth maps (“depth cube”)
Another variant is to use cube maps to represent depth maps. In this case perspective projection w.r.t. the view volume center is applied. The depth map representation is analog to the 8-bit fixed point variant.

Generally, the different types of depth maps can be combined for collision detection within a single fragment shader, in order to utilize the advantages of the various types for the specific collider object (cf. section 5.2).

The depth cube variant uses perspective projection, whereas the other variants use orthographic projection only. Using perspective projection during the depth map generation distorts the z-values. To avoid this, the vertices of the collider object are passed w.r.t. the normalized view volume [−1, 1]³ to the fragment shader. The shader simply uses this information to compute the depth values dist(x, y) relative to the center of the view volume.

To compute the distance value for a point P ∈ R³ w.r.t. the depth cube, the transformation T_OC→DC into depth map coordinates (cf. section 4.1) does not contain the perspective projection. T_OC→DC transforms into the normalized view volume [−1, 1]³ only; thus picking the corresponding depth value is just a cube map texture lookup

    dist(p′_x, p′_y, p′_z),  P′ = T_OC→DC P

The distance value for a point P ∈ R³ is computed as

    f(P) = ( 1 − dist(p′_x, p′_y, p′_z) / ‖P′‖ ) · ‖ (s_x p′_x, s_y p′_y, s_z p′_z)ᵀ ‖,  P′ = T_OC→DC P

where s_x, s_y, s_z are the scaling factors from the normalized view volume [−1, 1]³ to the view volume's extents in collider object coordinates.

The placement of the depth cube w.r.t. the collider object specifies the depth compare function for the generation of the depth maps. In the default situation, where the view volume's center is outside the object, the depth compare function LESS is used. Otherwise, if the center is inside the collider object, GREATER is applied and the fragment shader which computes the distance has to negate the distance value (cf. equation 1).

5. Results

Several tests have been made with different setups for the particle system, e.g. number of particles, sorting, complexity of the collider, etc. We discuss the general particle system (PS) issues in section 5.1 and describe different aspects of collision detection in section 5.2. Section 5.3 discusses some hardware aspects.

The presented particle system was implemented in Cg and tested on NVIDIA GeForce FX 5900 XT graphics hardware, which represents the first generation of floating-point GPUs.

5.1. Particle simulation

Using a position texture of size 1024 × 1024, our PS is capable of simultaneously rendering and updating a maximum of 1 million particles at about 10 frames per second. This implementation uses Euler integration, point sprites for rendering, no depth sorting and no collision detection.

In a typical application a particle texture of size 512 × 512 can be rendered in real-time (about 15 fps) including depth sorting (5 passes per frame) and collision detection (one depth cube). Performance measurements for the fully-featured collision detection are given in the next section. Following the clear trend towards increasing parallelism, a significant performance enhancement is expected with the forthcoming second generation of floating-point GPUs.

Figure 5 shows two sample applications using a quarter million particles. In the example “bunny in the snow” each particle is rendered as a snow flake, i.e. the particle's velocity is set to zero after a collision has been detected. The collision
detection uses one depth cube and one 16-bit front-back texture. The simulation uses depth sorting and runs at 15 fps. The second example, the “Venus fountain”, also simulates 512² particles. The implicit object boundary is represented using three 16-bit front-back textures and one 8-bit fixed point texture. This example runs at 10 fps.

5.2. Depth map based collision detection

The main difference between the depth map formats presented above lies in the number of depth maps used and in the projection type. In situations where collisions occur only from one direction and the collider object has a rather simple shape, a single 8-bit fixed point depth map may result in a proper interior/exterior classification. If there is no restriction on the potential collision direction, the complete collider object has to be reconstructed. Here, either one depth cube or three orthogonal 16-bit front-back textures are used. Concave regions may have to be treated using additional depth maps.

Concerning distance values, depth cubes work well for sphere-like collider objects (cf. figure 6). If the model has many concavities, is strongly twisted or is partly nested, the reconstruction of the distance values based on depth maps leads only to coarse approximations.

In our experiments we use six to 15 depth maps to represent the collider object boundary without restriction on the potential collision direction. Testing a quarter million particles for collision takes 7, 9 and 12 ms using the 8-bit fixed, the depth cube or the 16-bit front-back format respectively.

We mainly made experiments with rigid objects, thus performing the depth map generation in a preprocessing step. Some tests have been made with deformable objects, forcing the depth map generation to be part of the simulation process. The generation of a complete depth map format with resolution 512² takes about 11, 17 and 26 ms using the 8-bit fixed, 16-bit front-back or the depth cube format respectively. Thus deformable objects should be used only in combination with a small number of depth maps or particles.

Figure 6 visualizes the depth maps for two models: two tori and the Stanford bunny. The tori are reconstructed using two depth cubes and two 16-bit front-back textures, giving 16 depth maps in total. The bunny is captured using one depth cube and one 16-bit front-back texture, giving 8 depth maps in total.

Figure 6: Visualization of the implicit tori model (top row) and the implicit bunny model (bottom row) along slices: interior/exterior classification (left) and approximate distance map plotting absolute distance values (right). Additionally, the wireframe model is rendered.

5.3. Hardware aspects

We use a standard normal index texture with resolution 256². Even though the L1-parametrization would allow any application-specific resolution, the handling of n-bit integers or floats in the graphics hardware is hardly possible. Currently we use NVIDIA's pack/unpack functionality, which allows, for example, the packing of four bytes into one 32-bit float. We would highly appreciate more functionality of this kind, e.g. to pack 5, 5, 6 bits into a 16-bit float.

Additionally, improved integer arithmetic and modulo operators would simplify the implementation of various shader functionality, e.g. the parallel sorting.

6. Conclusions and future work

A fully GPU-based approach to realize the simulation and collision detection for large particle systems (PS) has been
introduced. The simulation of PS is based on the "stream processing" paradigm, using textures to represent all data necessary to implement a state-preserving PS and collision detection using fragment shaders. The collision detection is based on depth maps. These are used to reconstruct an implicit model of the collider objects at the time of collision detection for particles. A novel technique to represent directional data was introduced and applied to store normal vectors using an indexing technique. When rendering the PS, a parallel sorting algorithm can be applied to keep a proper particle order for non-commutative blending modes.

The proposed L1-parametrization should be investigated further, especially its applicability to represent directional data, e.g. reflection maps. Additionally, investigations towards GPU based collision detection using polygons or more complex objects instead of particles should be made.