GPU Programming EE 4702-1 Final Examination: Name Solution
GPU Programming
EE 4702-1
Final Examination
Monday, 5 December 2016 17:30–19:30 CST
Good Luck!
Problem 1: [20 pts] The geometry shader code below is based on the
solution to Homework 6 Problem 2 in which a curved link is rendered by
having the vertex shader compute points and vectors related to a curve
and by having the geometry shader, whose input primitive type is a line
strip, emit a cylinder between the locations described by its two input
vertices. Input In[].t is the position along the curve, with 0 ≤ t ≤ 1.
Add code to the shader so that it draws rings around the cylinder. The inner radius should be the same as
the cylinder radius, and the outer radius should be twice the cylinder radius. Do not emit a ring at the link endpoints
(the parts that touch the spheres). See the illustration to the upper right.
Complete the code so that it draws the rings. You can use the suggested and other reasonable abbreviations.
Set normal_e properly, and don’t forget to set vertex_e and gl_Position.
The code should be reasonably efficient. Don’t overlap primitives.
Solution appears below. The solution code is in the repo in directory hw/gpup/2016/fe.
void gs_main_2() {
const float rad = tex_rad[In[1].iid].z, sides = sides_rad.x;
for ( int j=0; j<=sides; j++ ) {
const float theta = j * ( 2 * M_PI / sides );
vec3 vect0 = cos(theta) * In[0].norm_e + sin(theta) * In[0].binorm_e;
vec3 vect1 = cos(theta) * In[1].norm_e + sin(theta) * In[1].binorm_e;
Problem 2: [20 pts] Appearing below is a typical vertex shader.
(a) Suppose that the shader above was used in a rendering pass of v vertices. No buffer objects were used
and all shader inputs were sourced from client arrays. Compute the amount of data sent from the CPU to
the GPU for the rendering pass.
CPU to GPU data for a rendering pass of v vertices counting shader inputs is:
Short Answer: counting gl_Vertex, gl_Normal, gl_MultiTexCoord0, and gl_Color, the amount of data is
CPU to GPU data for a rendering pass of v vertices counting uniform variables is:
Short Answer: counting gl_ModelViewProjectionMatrix, gl_ModelViewMatrix, and gl_NormalMatrix, the amount
of data is $4(4^2 + 4^2 + 3^2)$ B = 164 B.
Explanation: Uniforms are sent once.
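The 164 B figure can be checked with a few lines of host-side arithmetic. The sketch below is plain C++ written for illustration, not code from the course repo. It assumes 4-byte floats and matrix sizes of 4 × 4, 4 × 4, and 3 × 3; the names kFloatBytes, matrix_bytes, and uniform_bytes are hypothetical.

```cpp
// Host-side check of the uniform data size (illustrative sketch).
// Assumption: every matrix element is a 4-byte float.
constexpr int kFloatBytes = 4;

// Bytes occupied by a square matrix with `dim` rows and `dim` columns.
constexpr int matrix_bytes(int dim) { return kFloatBytes * dim * dim; }

// gl_ModelViewProjectionMatrix (4x4), gl_ModelViewMatrix (4x4), and
// gl_NormalMatrix (3x3). Uniforms are sent once per rendering pass,
// so the total does not depend on the vertex count v.
constexpr int uniform_bytes() {
  return matrix_bytes(4) + matrix_bytes(4) + matrix_bytes(3);
}

static_assert(uniform_bytes() == 164, "4(4^2 + 4^2 + 3^2) B = 164 B");
```

Because the function takes no vertex count, the sketch also makes the point of the explanation visible: the uniform cost is a constant.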
Problem 2, continued: The vertex shader below is used for an instanced draw of spheres. The vertex
shader inputs are sourced from buffer objects and other buffer objects are accessed directly using the instance
ID.
(b) In the code above label the vertex shader inputs and uniform variables.
(c) Compute the amount of data sent from the CPU to the GPU for an instanced draw rendering pass
in which n instances (spheres) are rendered, each consisting of v vertices. Assume that initially all buffer
objects are on the CPU. Note that mat4 is a 4 × 4 matrix of floats.
Amount of CPU to GPU data for n instances of v vertices, accounting for uniforms, buffer objects,
and anything else that really needs to be moved.
Short Answer: 164 B for uniforms (see part a), $n(4^3 + 4^2)$ B = 80n B for bound buffer objects, and v(4 × 4 + 4 × 2) B = 24v B
for shader inputs.
Explanation: The vertex shader inputs are gl_InstanceID, gl_Vertex, and gl_MultiTexCoord0. The size of a gl_Vertex is
4 × 4 B and, assuming a 2-component texture coordinate, the size of gl_MultiTexCoord0 is 4 × 2 B. It is reasonable to assume
that the data for gl_Vertex and gl_MultiTexCoord0 were put in buffer objects and sent to the GPU. The size of the two
buffer objects would be v(4 × 4 + 4 × 2) B = 24v B. Input gl_InstanceID is just an ID number created by the GPU software;
it’s not something sent from the CPU to the GPU, and so zero data is sent for gl_InstanceID. (That is, it would be silly to send
an array of values 0, 1, ..., n − 1 from the CPU to the GPU because the GPU could generate such values with a simple loop. All that’s
needed is the number of instances, n.)
The shader accesses two bound buffer objects, sphere_transform and sphere_color. Based on the layout declaration, each
element of sphere_transform is 4 × 4 × 4 B and each element of sphere_color is 4 × 4 B. Since these arrays are indexed
using gl_InstanceID, it is reasonable to assume that they each have n elements. So the total amount of data for the bound buffer
objects is $n(4^3 + 4^2)$ B = 80n B.
The shader accesses three uniforms, the same uniforms as are accessed in part (a). So their total data is 164 B.
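The three contributions can be tallied in a short host-side sketch. This is illustrative C++ only, under the same assumptions as the explanation: 4-byte floats, a vec4 position, a vec2 texture coordinate, and one mat4 plus one vec4 per instance in the bound buffer objects. All function names are hypothetical.

```cpp
// Illustrative tally of CPU-to-GPU bytes for the instanced draw.
// Assumption: every component is a 4-byte float.
constexpr long kFloatBytes = 4;

// Three matrices (4x4, 4x4, 3x3), sent once: 164 B.
constexpr long uniform_bytes() { return kFloatBytes * (16 + 16 + 9); }

// One mat4 (16 floats) and one vec4 (4 floats) per instance: 80n B.
constexpr long instance_buffer_bytes(long n) { return n * kFloatBytes * (16 + 4); }

// One vec4 position and one vec2 texture coordinate per vertex: 24v B.
// gl_InstanceID is generated on the GPU, so it adds nothing here.
constexpr long vertex_input_bytes(long v) { return v * kFloatBytes * (4 + 2); }

constexpr long total_bytes(long n, long v) {
  return uniform_bytes() + instance_buffer_bytes(n) + vertex_input_bytes(v);
}

static_assert(total_bytes(1000, 500) == 164 + 80000 + 12000,
              "164 + 80n + 24v bytes");
```

Note that the vertex term depends on v but not on n: the sphere geometry is sent once and reused for every instance, which is the benefit of an instanced draw.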
Problem 3: [15 pts] Consider our well-written CUDA example (slightly simplified):
__global__ void kmain_efficient() {
const int tid = threadIdx.x + blockIdx.x * blockDim.x;
const int n_threads = blockDim.x * gridDim.x;
Let n denote the value of array_size, and let S denote the number of streaming multiprocessors (abbreviated
SMs and sometimes MPs). Assume that n is large so that there’s plenty of work to do for each thread.
(a) Determine a launch configuration in terms of n and S that uses the minimum number of blocks needed
to maximize warp (or thread) occupancy on the SMs. Do this for a device with a block size limit of 1024
threads, an SM limit of 2048 threads, and an SM limit of 16 blocks. Hint: There’s no need to use both n
and S. Use B for the block size and G for the number of blocks (grid size).
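One way to reason about part (a) is sketched below in host-side C++. It is not taken from the exam solution. With an SM limit of 2048 threads and a block limit of 1024 threads, the largest block size gives the fewest blocks per SM, namely 2048 / 1024 = 2, which is well under the 16-block limit. The sketch assumes that blocks are distributed evenly over the SMs and that registers and shared memory do not limit occupancy; the struct and function names are hypothetical.

```cpp
// Hypothetical helper: smallest grid that can fill every SM with threads.
// Assumptions: even block distribution across SMs; no register or
// shared-memory limit on occupancy.
struct LaunchConfig { int block_size; int grid_size; };

constexpr int kMaxThreadsPerBlock = 1024;  // block size limit
constexpr int kMaxThreadsPerSM    = 2048;  // SM thread limit
constexpr int kMaxBlocksPerSM     = 16;    // SM block limit

// Using the largest block size minimizes the number of blocks per SM.
constexpr int kBlocksPerSM = kMaxThreadsPerSM / kMaxThreadsPerBlock;
static_assert(kBlocksPerSM <= kMaxBlocksPerSM, "block limit is not binding");

constexpr LaunchConfig min_blocks_full_occupancy(int num_sms) {
  return LaunchConfig{ kMaxThreadsPerBlock, num_sms * kBlocksPerSM };
}

// Example: a device with S = 10 SMs gets B = 1024 and G = 20.
static_assert(min_blocks_full_occupancy(10).grid_size == 20, "G = 2S");
```

Under these assumptions the configuration is B = 1024 and G = 2S, which depends on S only, consistent with the hint.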
Problem 4: [20 pts] Consider the CUDA kernel below. Let n denote the value of array_size, B the block
size, and G the number of blocks.
(a) Compute how much data is read from global memory during the execution of the kernel. Note that each
element d_in[i] is accessed by two threads.
(b) The kernel below includes a declaration for shared memory intended to fix the inefficiency in the kernel
above. Complete the kernel.
// SOLUTION
__syncthreads();
sm[threadIdx.x] = p;
if ( threadIdx.x == blockDim.x - 1 ) sm[threadIdx.x+1] = d_in[h+1];
__syncthreads();
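The effect of staging data in shared memory can be illustrated with a CPU emulation in C++. The full kernel is not reproduced here, so the sketch makes its own assumptions: in one pass, thread t of block b works on element h = b*B + t and needs d_in[h] and d_in[h+1], and the input holds one extra element so that h+1 is always valid. The names count_reads and ReadCounts are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Global-memory element reads for one pass, with and without staging.
struct ReadCounts { long direct; long staged; };

// Emulates G blocks of B threads. Assumption: thread t of block b uses
// elements h and h+1, where h = b*B + t.
ReadCounts count_reads(int G, int B) {
  const int n = G * B;
  std::vector<float> d_in(n + 1);            // extra element keeps h+1 in range
  for (int i = 0; i <= n; i++) d_in[i] = float(i);

  long direct = 0, staged = 0;
  for (int b = 0; b < G; b++) {
    // Direct version: each thread reads both operands from global memory.
    for (int t = 0; t < B; t++) direct += 2;

    // Staged version: each thread copies one element into sm, and the
    // last thread of the block also copies the element past the end.
    std::vector<float> sm(B + 1);
    for (int t = 0; t < B; t++) {
      const int h = b * B + t;
      sm[t] = d_in[h];
      staged++;
      if (t == B - 1) { sm[t + 1] = d_in[h + 1]; staged++; }
    }
    // On a GPU, a __syncthreads() barrier belongs here, before sm is read.
    for (int t = 0; t < B; t++) {
      const int h = b * B + t;
      assert(sm[t] == d_in[h] && sm[t + 1] == d_in[h + 1]);
    }
  }
  return { direct, staged };
}
```

For G = 4 and B = 256 the emulation counts 2GB = 2048 direct reads and G(B + 1) = 1028 staged reads, so staging roughly halves global-memory traffic in this model.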
Problem 5: [25 pts] Answer each question below.
(a) Show a diagram illustrating the relationship between vertices and primitives in a triangle strip. Number
both the vertices and the triangles in the correct order. Show at least three triangles.
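[Diagram: a triangle strip with vertices 1, 3, 5 along the top and 2, 4, 6 along the bottom, forming triangles one, two, three, and four from left to right.]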
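The vertex-to-triangle relationship in a strip can also be written as a small indexing rule. The C++ sketch below uses the 1-based numbering of the diagram: triangle i is built from vertices i, i+1, and i+2. Swapping the first two vertices of every second triangle is one common convention for keeping a consistent winding; the struct and function names are hypothetical.

```cpp
// Vertices of one triangle, using the diagram's 1-based vertex numbers.
struct Triangle { int a, b, c; };

// Triangle i of a strip uses vertices i, i+1, i+2. Odd-numbered triangles
// keep strip order; even-numbered ones swap the first two vertices so
// that all triangles have the same winding.
constexpr Triangle strip_triangle(int i) {
  return (i % 2 == 1) ? Triangle{ i, i + 1, i + 2 }
                      : Triangle{ i + 1, i, i + 2 };
}

// A strip of n vertices yields n - 2 triangles (none if n < 3).
constexpr int strip_triangle_count(int n_vertices) {
  return n_vertices < 3 ? 0 : n_vertices - 2;
}

// The six vertices in the diagram produce four triangles.
static_assert(strip_triangle_count(6) == 4, "6 vertices, 4 triangles");
```

Each vertex after the first two adds one triangle, which is why a strip needs far fewer vertices than a list of individual triangles.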
(b) Describe what is tested in the stencil test and what the stencil test might be used for.
(c) Interpolation qualifiers, flat, noperspective, and smooth, are used for the inputs of which rendering
pipeline stage? What do they do?
(d) Between which coordinate spaces does the OpenGL projection matrix transform?
(e) What is the difference between a material property (including attributes emitted with glColor3f) and a
lighted color?