

Name:
Student ID:
Date: 12.01.2023

CENG443 Final Exam - Fall 2023
(90 minutes)

Q.1 (20 points) A kernel performs 36 floating-point operations and 7 32-bit word global memory accesses per thread. For each of the following device properties, indicate whether this kernel is compute- or memory-bound. Explain your answer.

a) Peak FLOPS= 200 GFLOPS, Peak Memory Bandwidth= 100 GB/s

Per thread: 36 FLOPs and 7 * 4 = 28 bytes of global memory traffic.

Compute can sustain 200 / 36 = ~5.6 billion threads per second.
Memory can sustain 100 / 28 = ~3.6 billion threads per second.

Memory serves fewer threads per second than compute can process, so the kernel is memory-bound.

b) Peak FLOPS= 300 GFLOPS, Peak Memory Bandwidth= 250 GB/s

Compute: 300 / 36 = ~8.3 billion threads per second.
Memory: 250 / 28 = ~8.9 billion threads per second.

Memory now serves more threads per second than compute, so the kernel is compute-bound.
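
As a sanity check, the comparison can be scripted. Below is a minimal C sketch (not part of the exam) of the arithmetic used in both parts; the peak numbers and per-thread counts are taken from the question:

#include <stdio.h>

/* Classify a kernel as compute- or memory-bound by comparing how many
   threads per second each resource can sustain (the approach used in
   the answers above). GFLOPS and GB/s share the 10^9 factor, so it
   cancels in the comparison. */
static void classify(double peak_gflops, double peak_gbps,
                     double flops_per_thread, double bytes_per_thread) {
    double by_compute = peak_gflops / flops_per_thread; /* x10^9 threads/s */
    double by_memory  = peak_gbps   / bytes_per_thread; /* x10^9 threads/s */
    printf("compute %.1f vs memory %.1f (x10^9 threads/s) -> %s-bound\n",
           by_compute, by_memory,
           by_memory < by_compute ? "memory" : "compute");
}

int main(void) {
    classify(200.0, 100.0, 36.0, 28.0); /* a) prints memory-bound  */
    classify(300.0, 250.0, 36.0, 28.0); /* b) prints compute-bound */
    return 0;
}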
Q.2 (30 Points) Assume we have a GPU device that can overlap kernel executions in different streams, and consider the following code with kernel functions k1, k2, k3, k4 and a CPU function f1, where each function execution takes a similar amount of time.

int n_streams = 4;
cudaStream_t streams[4];
// memory allocations and initializations
for (int i = 0; i < n_streams; i++) {
    k1<<<grid, block, 0, streams[i]>>>();
    k2<<<grid, block, 0, streams[i]>>>();
    k3<<<grid, block>>>();                // launched into the default stream
    k4<<<grid, block, 0, streams[i]>>>();
    f1();                                 // runs on the CPU
}

Show an example execution order of the functions in the different streams and on the CPU. Make sure your order includes at least one overlapped kernel execution. Present your execution order using a layout like the following (the ordering given here is just an example):

Time

default stream   k1  k2  k3
streams[0]       k1  k2  k3
streams[1]       k1  k2  k3
streams[2]       ..  ..  ..
streams[3]       ..  ..  ..
CPU              ..  ..  ..

Show how your execution order changes if f1 is a kernel function launched as follows:

f1<<<grid, block>>>();

Any valid sequence is accepted, as long as it respects the ordering constraints: within streams[i], k1 -> k2 -> k4 run in launch order; k3 is launched into the (legacy) default stream, which synchronizes with all other streams, so k3 starts only after all previously launched kernels have finished, and k4 starts only after k3. Because kernel launches are asynchronous, the CPU executes f1 while the GPU is busy, and k4 of iteration i can overlap with k1/k2 of iteration i+1 (they are in different non-default streams with no default-stream launch between them). One valid order:

Time

default stream           k3                  k3
streams[0]      k1  k2       k4
streams[1]                   k1  k2               k4
CPU             f1           f1

Here k4 in streams[0] overlaps with k1 in streams[1].

If f1 is launched as a kernel, f1<<<grid, block>>>() also runs in the default stream and becomes a second implicit barrier in every iteration: it waits for k4, and k1 of the next iteration waits for it. The overlap above disappears and the kernels execute fully serialized:

Time

default stream           k3      f1
streams[0]      k1  k2       k4
streams[1]                           k1  k2  ...
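
For reference, a minimal compilable harness around the question's launch pattern (a sketch only: the kernel bodies and f1 are empty placeholders, and the grid/block sizes are arbitrary assumptions not given in the exam):

#include <cuda_runtime.h>

__global__ void k1() {}
__global__ void k2() {}
__global__ void k3() {}
__global__ void k4() {}
void f1() { /* CPU work */ }

int main() {
    const int n_streams = 4;
    cudaStream_t streams[n_streams];
    for (int i = 0; i < n_streams; i++)
        cudaStreamCreate(&streams[i]);   // non-default, blocking streams

    dim3 grid(1), block(32);             // assumed sizes, not from the exam
    for (int i = 0; i < n_streams; i++) {
        k1<<<grid, block, 0, streams[i]>>>();
        k2<<<grid, block, 0, streams[i]>>>();
        k3<<<grid, block>>>();           // legacy default stream: implicit barrier
        k4<<<grid, block, 0, streams[i]>>>();
        f1();                            // overlaps with GPU work (launches are async)
    }
    cudaDeviceSynchronize();

    for (int i = 0; i < n_streams; i++)
        cudaStreamDestroy(streams[i]);
    return 0;
}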


Q.3 (20 Points) Consider the vector addition kernel below and assume that the size of A, B, and C is 20,000
elements each.

__global__
void vecAddKernel(float* A, float* B, float* C, int n) {
    int i = threadIdx.x + 2 * blockDim.x * blockIdx.x;
    if (i < n) C[i] = A[i] + B[i];
    i += blockDim.x;
    if (i < n) C[i] = A[i] + B[i];
}

int vectAdd(float* A, float* B, float* C, int n) {
    ...
    int size = n * sizeof(float);
    cudaMalloc((void **) &A_d, size);
    cudaMalloc((void **) &B_d, size);
    cudaMalloc((void **) &C_d, size);
    cudaMemcpy(A_d, A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(B_d, B, size, cudaMemcpyHostToDevice);

    vecAddKernel<<<ceil(n / (2 * 1024.0)), 1024>>>(A_d, B_d, C_d, n);

    cudaMemcpy(C, C_d, size, cudaMemcpyDeviceToHost);
    ...
}

a) How many thread blocks will be generated?

10. Each block covers 2 * 1024 = 2048 elements, so the launch creates ceil(20000 / 2048) = ceil(9.77) = 10 blocks.

b) How many warps are there in each block?

32. Each block has 1024 threads and a warp has 32 threads: 1024 / 32 = 32 warps.

c) How many threads will be created in the grid?

10240. 10 blocks * 1024 threads = 10240 threads.

d) Is there any control divergence during the execution of the kernel? If so, identify the block number that
causes the control divergence.

Block 9 is the only block where the condition i < n evaluates differently across threads: its first access covers i = 18432 ... 19455 (all pass), while its second access covers i = 19456 ... 20479, so only threads with threadIdx.x < 544 execute the addition. Note, however, that the cut-off 544 = 17 * 32 falls exactly on a warp boundary (warps 0-16 all take the branch, warps 17-31 all skip it), so under the warp-level definition of control divergence no warp actually diverges; this happens because 20000 is a multiple of the warp size.
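
A small host-side C sketch (not part of the exam) that reproduces the arithmetic behind a)-d), including the warp-alignment check for the divergence boundary:

#include <math.h>
#include <stdio.h>

int main(void) {
    const int n = 20000, block = 1024, warp = 32;
    int grid = (int)ceil(n / (2 * 1024.0));      /* a) 10 blocks          */
    printf("blocks: %d\n", grid);
    printf("warps per block: %d\n", block / warp); /* b) 32               */
    printf("threads: %d\n", grid * block);       /* c) 10240              */

    /* d) second access of the last block: i = 2*block*(grid-1) + block + tid,
       so the i < n check passes only for tid < n - base. */
    int base = 2 * block * (grid - 1) + block;   /* 19456                 */
    int cutoff = n - base;                       /* 544 threads pass      */
    printf("cut-off tid: %d (warp-aligned: %s)\n",
           cutoff, cutoff % warp == 0 ? "yes, no warp diverges" : "no");
    return 0;
}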


Q.4 (30 Points) Which kernel is better in terms of memory coalescing if we replace KERNEL in the main function with a) doubleKernelStride, b) doubleKernel? Explain your answer with example memory-access scenarios.

__global__
void doubleKernelStride(int *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = idx; i < N; i += stride)
        a[i] *= 2;
}

__global__
void doubleKernel(int *a, int N)
{
    int idx = (threadIdx.x + blockDim.x * blockIdx.x) * 2;
    a[idx] *= 2;
    int jdx = (threadIdx.x + blockDim.x * blockIdx.x) * 2 + 1;
    a[jdx] *= 2;
}

int main()
{
    int N = 16384; // 2^14
    int *a;

    ...

    KERNEL<<<32, 256>>>(a, N);

    ...
}

doubleKernelStride. With 32 * 256 = 8192 threads and N = 16384, the grid-stride kernel makes two passes, and in each pass the 32 threads of a warp access 32 consecutive ints (e.g., warp 0 touches a[0..31], then a[8192..8223]): one fully used 128-byte transaction per access. In doubleKernel, thread t accesses a[2*t] and a[2*t+1], so a warp's first access touches a[0], a[2], ..., a[62]: 32 ints spread over 256 bytes, requiring two 128-byte transactions with only half of each used. The grid-stride version therefore coalesces better.
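
A host-side sketch (not from the exam) that prints the element indices warp 0 touches on its first access in each kernel, to make the coalescing difference concrete:

#include <stdio.h>

int main(void) {
    /* First access of warp 0 (lanes 0..31) in each kernel. */
    printf("doubleKernelStride, 1st pass: ");
    for (int lane = 0; lane < 32; lane++)
        printf("%d ", lane);        /* a[0..31]: one 128 B segment, fully used */

    printf("\ndoubleKernel, a[idx] access:  ");
    for (int lane = 0; lane < 32; lane++)
        printf("%d ", lane * 2);    /* a[0,2,...,62]: two segments, half used  */

    printf("\n");
    return 0;
}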
