Lab 1 Parallel

Omar Obeid

Parallel Lab 1

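#include <stdio.h>
#include <stdlib.h>
#include <math.h>          // fabs, used when validating the results
#include <time.h>
#include <cuda_runtime.h>  // CUDA runtime API (cudaMalloc, cudaMemcpy, events)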

// CUDA kernel for matrix-vector multiplication: one thread per output row
__global__ void matrixVectorMulKernel(float *matrix, float *vector, float *result, int M, int N) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M) {  // guard: the last block may extend past row M - 1
        float sum = 0.0f;
        for (int col = 0; col < N; col++) {
            // size_t index so that row * N cannot overflow int for large matrices
            sum += matrix[(size_t)row * N + col] * vector[col];
        }
        result[row] = sum;
    }
}

// Function to perform matrix-vector multiplication on CPU for validation
void matrixVectorMulCPU(float *matrix, float *vector, float *result, int M, int N) {
    for (int row = 0; row < M; row++) {
        float sum = 0.0f;
        for (int col = 0; col < N; col++) {
            // size_t index so that row * N cannot overflow int for large matrices
            sum += matrix[(size_t)row * N + col] * vector[col];
        }
        result[row] = sum;
    }
}

// Helper function to initialize matrix and vector with random values in [0.0, 9.9]
void initializeMatrixAndVector(float *matrix, float *vector, int M, int N) {
    srand((unsigned int)time(NULL));
    size_t total = (size_t)M * N;  // computed in size_t to avoid int overflow
    for (size_t i = 0; i < total; i++) {
        matrix[i] = (float)(rand() % 100) / 10.0f;
    }
    for (int i = 0; i < N; i++) {
        vector[i] = (float)(rand() % 100) / 10.0f;
    }
}
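
// Function to validate GPU results by comparing with CPU results.
// The tolerance scales with the magnitude of the value, because float
// rounding error grows with the size of the accumulated sum.
bool validateResults(float *cpuResult, float *gpuResult, int M) {
    for (int i = 0; i < M; i++) {
        float tol = 1e-4f * fmaxf(1.0f, fabsf(cpuResult[i]));
        if (fabsf(cpuResult[i] - gpuResult[i]) > tol) {
            return false;
        }
    }
    return true;
}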

// Abort with a message if a CUDA runtime call fails
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",             \
                    __FILE__, __LINE__, cudaGetErrorString(err_));   \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

int main() {
    int M, N;
    printf("Enter matrix dimensions M (rows) and N (columns): ");
    if (scanf("%d %d", &M, &N) != 2 || M <= 0 || N <= 0) {
        fprintf(stderr, "Invalid dimensions.\n");
        return 1;
    }

    // Byte counts, computed in size_t so that M * N cannot overflow int
    size_t matrixBytes = (size_t)M * N * sizeof(float);
    size_t vectorBytes = (size_t)N * sizeof(float);
    size_t resultBytes = (size_t)M * sizeof(float);

    // Allocate memory for matrix and vectors on host (CPU)
    float *h_matrix = (float*)malloc(matrixBytes);
    float *h_vector = (float*)malloc(vectorBytes);
    float *h_result_cpu = (float*)malloc(resultBytes);
    float *h_result_gpu = (float*)malloc(resultBytes);
    if (!h_matrix || !h_vector || !h_result_cpu || !h_result_gpu) {
        fprintf(stderr, "Host memory allocation failed.\n");
        return 1;
    }

    // Initialize matrix and vector with random values
    initializeMatrixAndVector(h_matrix, h_vector, M, N);

    // Allocate memory on device (GPU)
    float *d_matrix, *d_vector, *d_result;
    CUDA_CHECK(cudaMalloc(&d_matrix, matrixBytes));
    CUDA_CHECK(cudaMalloc(&d_vector, vectorBytes));
    CUDA_CHECK(cudaMalloc(&d_result, resultBytes));

    // Transfer matrix and vector from host to device
    CUDA_CHECK(cudaMemcpy(d_matrix, h_matrix, matrixBytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_vector, h_vector, vectorBytes, cudaMemcpyHostToDevice));

    // Define thread and block configuration: one thread per matrix row
    int blockSize = 256;
    int gridSize = (M + blockSize - 1) / blockSize;

    // Measure GPU computation time (kernel only; transfers are not included)
    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));
    CUDA_CHECK(cudaEventRecord(start));

    // Launch kernel on GPU
    matrixVectorMulKernel<<<gridSize, blockSize>>>(d_matrix, d_vector, d_result, M, N);
    CUDA_CHECK(cudaGetLastError());  // reports an invalid launch configuration

    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaEventSynchronize(stop));
    float gpuTime = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&gpuTime, start, stop));

    // Copy result from device back to host
    CUDA_CHECK(cudaMemcpy(h_result_gpu, d_result, resultBytes, cudaMemcpyDeviceToHost));

    // Perform matrix-vector multiplication on CPU for validation
    clock_t cpuStart = clock();
    matrixVectorMulCPU(h_matrix, h_vector, h_result_cpu, M, N);
    clock_t cpuStop = clock();
    double cpuTime = 1000.0 * (double)(cpuStop - cpuStart) / CLOCKS_PER_SEC;

    // Validate GPU results against CPU results
    if (validateResults(h_result_cpu, h_result_gpu, M)) {
        printf("Validation: GPU and CPU results match!\n");
    } else {
        printf("Validation failed: Results do not match.\n");
    }

    // Print computation times
    printf("GPU computation time: %.2f milliseconds\n", gpuTime);
    printf("CPU computation time: %.2f milliseconds\n", cpuTime);

    // Free allocated memory and timing events
    free(h_matrix);
    free(h_vector);
    free(h_result_cpu);
    free(h_result_gpu);
    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d_matrix));
    CUDA_CHECK(cudaFree(d_vector));
    CUDA_CHECK(cudaFree(d_result));

    return 0;
}

Explanation of Code

1. CUDA Kernel (matrixVectorMulKernel):
   o Each GPU thread computes one element of the result vector: it walks along
     its row of the matrix, multiplies each entry by the matching vector entry,
     and accumulates the products.
2. CPU Function for Validation (matrixVectorMulCPU):
   o Performs the same multiplication on the CPU to provide a reference result.
     The outer loop runs over rows and the inner loop over columns.
3. Data Initialization (initializeMatrixAndVector):
   o Fills the matrix and vector with pseudo-random values between 0.0 and 9.9,
     seeded from the current time.
4. Memory Management:
   o Allocates memory on both the CPU (host) and GPU (device) for the matrix,
     the input vector, and the result vectors.
   o Copies the inputs to the device before the kernel runs and copies the
     result back to the host afterwards.
   o Frees all host and device allocations before exiting.
5. Kernel Launch:
   o Uses blocks of 256 threads and enough blocks to cover all M rows. Threads
     in the last block whose row index is M or larger do nothing.
6. Validation (validateResults):
   o Compares each element of the CPU and GPU results and reports a match only
     if every pair agrees within a small error tolerance.
7. Timing:
   o Times the kernel with CUDA events and the CPU function with clock(), then
     prints both in milliseconds so the two can be compared.
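
The launch configuration in item 5 comes down to a ceiling division. The host-only sketch below works through that arithmetic; `LaunchShape` and `launchShapeFor` are hypothetical names introduced here for illustration and are not part of the lab program.

```cpp
#include <cassert>

// Hypothetical helper (not in the lab code): describes the launch shape
// for a kernel that assigns one thread to each of M rows.
struct LaunchShape {
    int gridSize;      // number of blocks launched
    int totalThreads;  // gridSize * blockSize
    int idleThreads;   // threads that fail the `row < M` guard
};

LaunchShape launchShapeFor(int M, int blockSize) {
    assert(M > 0 && blockSize > 0);
    LaunchShape s;
    s.gridSize = (M + blockSize - 1) / blockSize;  // ceiling of M / blockSize
    s.totalThreads = s.gridSize * blockSize;
    s.idleThreads = s.totalThreads - M;
    return s;
}
```

For M = 1000 and a block size of 256, this gives 4 blocks and 1024 threads, of which 24 are idle. Those 24 threads are the reason the kernel needs its `row < M` test.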

Sample Output

Enter matrix dimensions M (rows) and N (columns): 1000 500
Validation: GPU and CPU results match!
GPU computation time: X.XX milliseconds
CPU computation time: Y.YY milliseconds
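
Whether the validation line reports a match depends on how the comparison tolerance is scaled. A fixed absolute threshold of 1e-4 is smaller than the spacing between adjacent float values once row sums reach the thousands, so two results that differ only in the last bit can be rejected. The sketch below contrasts an absolute check with a relative one; `absClose` and `relClose` are hypothetical helper names, and the relative form is only one possible choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Absolute check: passes only if the values differ by at most `tol`.
bool absClose(float a, float b, float tol) {
    assert(tol >= 0.0f);
    return std::fabs(a - b) <= tol;
}

// Relative check (hypothetical alternative): the allowed difference grows
// with the magnitude of the values and never drops below `tol` near zero.
bool relClose(float a, float b, float tol) {
    assert(tol >= 0.0f);
    float scale = std::max(1.0f, std::max(std::fabs(a), std::fabs(b)));
    return std::fabs(a - b) <= tol * scale;
}
```

Around 12000, adjacent float values are 2^-10 (about 0.00098) apart. `absClose(12000.0f, 12000.001f, 1e-4f)` is therefore false even though the two arguments are neighbouring floats, while `relClose` with the same tolerance accepts them.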
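
Discussion

For large matrices the GPU kernel is generally faster than the CPU loop, because the M row sums are independent and can be computed in parallel. For small matrices the advantage shrinks or disappears: the kernel launch has a fixed overhead, and the reported GPU time covers only the kernel, not the host-to-device and device-to-host copies. The grid and block configuration assigns one thread to each element of the output vector, so the available parallelism is M threads. With the matrix stored row by row, neighbouring threads read addresses N floats apart, so their matrix loads are not coalesced, which limits how much of the GPU's memory bandwidth this kernel can use.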
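
One way to put the measured times in context is to convert them into an effective memory bandwidth. The sketch below does this under the assumption that every matrix and vector element is read from memory once and every result element is written once. The function names are hypothetical, and no measured numbers are implied.

```cpp
#include <cassert>
#include <cstddef>

// Minimum bytes a float matrix-vector product has to move: read the M x N
// matrix and the N-element vector once, write the M-element result once.
std::size_t minBytesMoved(int M, int N) {
    assert(M > 0 && N > 0);
    std::size_t elems = (std::size_t)M * N + (std::size_t)N + (std::size_t)M;
    return elems * sizeof(float);
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) for a run of `ms` milliseconds.
double effectiveBandwidthGBps(std::size_t bytes, double ms) {
    assert(ms > 0.0);
    return (double)bytes / (ms * 1.0e6);
}

// Floating-point operations per byte moved: one multiply and one add
// for every matrix element.
double flopsPerByte(int M, int N) {
    return 2.0 * M * N / (double)minBytesMoved(M, N);
}
```

For the 1000 x 500 case, `minBytesMoved` returns 2,006,000 bytes and `flopsPerByte` is just under 0.5. A ratio that low usually means run time is set by memory traffic rather than arithmetic, so comparing `effectiveBandwidthGBps` for the measured GPU time against the device's peak bandwidth is more informative than comparing raw times.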
…
