
DAA Mini Project-1


KJ's Educational Institute

KJ COLLEGE OF ENGINEERING AND MANAGEMENT RESEARCH
Affiliated to Savitribai Phule Pune University (SPPU), Approved by Govt. of Maharashtra, Recognized by AICTE, New Delhi.

DEPARTMENT OF COMPUTER ENGINEERING


YEAR 2023-24
Semester – VII

Design and Analysis of Algorithms [410241]
BE-A COMPUTER ENGG.

Project Title: “Matrix multiplication & multithreaded matrix multiplication with either one thread per row or one thread per cell. Analyze and compare their performance”

Done By:
Name                Roll No
Atharva Nagewade    B16
Aditya Mohite       B11

KJ COLLEGE OF ENGINEERING AND
MANAGEMENT RESEARCH
DEPARTMENT OF COMPUTER ENGINEERING

CERTIFICATE
This is to certify that the project entitled

“Matrix multiplication & multithreaded matrix multiplication with either one thread per row or one thread per cell. Analyze and compare their performance”

submitted by

Shubham Dhaigude    22 (A)
Sahil Divekar       28 (A)
Pranit Jagtap       49 (A)
Akshay Kaygude      56 (A)

is a record of bona fide work carried out by them, in partial fulfilment of the requirements
for the award of the Degree of Bachelor of Engineering (Computer Engineering) at KJ COLLEGE
OF ENGINEERING AND MANAGEMENT RESEARCH, Pune, under Savitribai Phule Pune University. This
work was done during the year 2023-2024, under our guidance.

Date: / /

Prof. Ashwini G. Kamble                                        Dr. Nikita Kulkarni

ACKNOWLEDGEMENT
I take this opportunity to express my sincere gratitude and respect to KJ COLLEGE OF
ENGINEERING AND MANAGEMENT RESEARCH, Pune for providing me a platform to pursue my studies
and carry out my final year project.
I would like to thank Dr. Nikita Kulkarni, Professor and Head, Department of Computer
Engineering, KJ COLLEGE OF ENGINEERING AND MANAGEMENT RESEARCH, Pune, who has been a constant
support and encouragement throughout the course of this project. I consider it a privilege and
honor to express my sincere gratitude to my guide, Prof. Ashwini G. Kamble, Associate Professor,
Department of Computer Engineering, for the valuable guidance throughout this work.
I also extend my thanks to all the faculty of Computer Engineering who directly or indirectly
encouraged me.

CONTENTS

Sr. No  TITLE
1.  Abstract
2.  Introduction
3.  Problem Statement
4.  Motivation
5.  Objectives
6.  Theory
7.  Result
8.  Conclusion
9.  References

Abstract

Matrix multiplication is a fundamental operation in linear algebra, with applications in various
fields. This abstract outlines a comparative analysis of traditional matrix multiplication and
multithreaded matrix multiplication, employing two distinct thread allocation strategies: one
thread per row and one thread per cell. The traditional matrix multiplication algorithm, based on
nested loops, serves as a reference point. It is efficient for small matrices but may become a
bottleneck for larger ones. On the other hand, multithreaded matrix multiplication, which
leverages the power of parallel processing, is explored with two different approaches. One thread
per row divides the work among threads based on matrix rows, while one thread per cell assigns
each thread to compute individual matrix elements. Performance metrics, including execution time
and CPU utilization, are assessed to gauge the efficiency of these methods. The comparative
analysis aims to reveal the tradeoffs between these techniques in terms of speedup and scalability,
shedding light on the optimal strategy for matrix multiplication under varying matrix sizes and
hardware configurations.

Introduction

Matrix multiplication is a fundamental mathematical operation widely used in various scientific
and computational applications. It involves multiplying two matrices to generate a resultant matrix
by taking the dot product of rows and columns. To optimize the performance of matrix
multiplication, multithreading techniques are often employed. In this context, two common
approaches are one thread per row and one thread per cell. One thread per row assigns a separate
thread to each row of the resulting matrix, enabling parallel computation of multiple rows
simultaneously. This method takes advantage of multi-core processors and can significantly
improve the speed of large matrix multiplications. On the other hand, one thread per cell divides
the task even further, with each element of the result matrix computed by an individual thread.
While this approach maximizes parallelism, it may introduce additional overhead due to thread
management.
Analyzing the performance of these techniques involves considerations of matrix size,
hardware, and the specific implementation. Generally, one thread per row can yield excellent
speedup for large matrices, while one thread per cell may offer better performance when dealing
with smaller matrices or highly parallel hardware configurations. Careful assessment and
benchmarking are crucial to determine the optimal approach for a given scenario.

Problem Statement
Write a program to implement Matrix multiplication & multithreaded matrix
multiplication with either one thread per row or one thread per cell. Analyze and
compare their performance.

Motivation

Matrix multiplication is a fundamental operation in linear algebra with various real-world
applications, which motivates optimizing it. Multithreaded matrix multiplication using one thread
per row can exploit parallelism with modest overhead, as each thread handles an independent row,
reducing computation time for large matrices. One thread per cell pushes parallelism further, but
the much larger number of threads increases creation and context-switching overhead, so it may not
pay off in practice. Performance depends on hardware, matrix size, and thread management, making it
essential to benchmark and choose the method that best suits the specific scenario.


Objectives

Objective 1: Matrix Multiplication

- Perform matrix multiplication to obtain the product of two matrices, optimizing for accuracy and
correctness.
- Aim to efficiently utilize memory and cache resources during the computation to minimize data
transfer overhead.

Objective 2: Multithreaded Matrix Multiplication (One Thread per Row)

- Implement multithreaded matrix multiplication with a thread dedicated to each row of the result
matrix.
- The goal is to parallelize the computation to improve performance on multi-core processors.

Objective 3: Multithreaded Matrix Multiplication (One Thread per Cell)

- Develop multithreaded matrix multiplication with a thread for each individual cell in the result
matrix.
- The objective is to explore fine-grained parallelism and evaluate its impact on speedup.

Theory
THREAD

A thread is the execution state of a program instance, sometimes called an independent flow of
control. Each thread runs within the context of a parent process and is characterized by:
• registers to manage code execution.
• a stack.
• scheduling properties (such as priority).
• its own set of signals.
• some thread-specific data.

METHODOLOGY

ALGORITHM:

• Input: Two matrices, A (dimensions MxN) and B (dimensions NxP).
• Output: A resulting matrix C (dimensions MxP).
• Check if the number of columns in matrix A is equal to the number of rows in matrix B.
If not, matrix multiplication is not possible.
• Initialize an empty result matrix C with dimensions MxP.
• Iterate through each row 'i' of matrix A:
• For each row 'i' of matrix A, iterate through each column 'j' of matrix B:
• For each element 'k' in the row 'i' of matrix A, multiply it by the element in the column 'j'
of matrix B.
• Sum the products for all elements 'k' to compute the value of C[i][j].
• Assign this computed value to the corresponding position in matrix C.
• Continue this process until all elements in the resulting matrix C have been calculated.
• The resulting matrix C contains the product of matrices A and B, where C[i][j]
represents the dot product of row 'i' of A and column 'j' of B.

• The algorithm is complete, and the resulting matrix C can be used for further
computations or analysis.
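A minimal serial sketch of the algorithm above (illustrative only; plain Python, with the hypothetical helper name multiply_serial):

def multiply_serial(A, B):
    # A is M x N and B is N x P; their product C is M x P.
    M, N, P = len(A), len(B), len(B[0])
    if len(A[0]) != N:
        raise ValueError("columns of A must equal rows of B")
    C = [[0] * P for _ in range(M)]
    for i in range(M):             # each row of A
        for j in range(P):         # each column of B
            for k in range(N):     # dot product of row i and column j
                C[i][j] += A[i][k] * B[k][j]
    return C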

SOFTWARE AND HARDWARE DETAILS:


• Hardware Requirements: PIV processor, 2 GB RAM, 500 GB HDD
• Software Requirements: C/C++ IDE, MinGW; Python 3 with NumPy for the sample code below

Matrix multiplication is a fundamental operation in linear algebra, and optimizing it is important
for various applications, especially in fields like computer graphics, scientific computing, and
machine learning. Let's discuss the theory of matrix multiplication and then explore two
multithreaded approaches: one thread per row and one thread per cell. We'll also analyze and
compare their performance.

Theory of Matrix Multiplication:


Matrix multiplication is defined for two matrices, A (of size m x n) and B (of size n x p), where
the number of columns in A must equal the number of rows in B. The result, denoted as C, is a new
matrix of size m x p, where each element C(i, j) is calculated as the dot product of the i-th row
of A and the j-th column of B:

C(i, j) = Σ(A(i, k) * B(k, j)) for k = 1 to n

This operation has a time complexity of O(mnp), making it computationally expensive for large
matrices. To optimize its performance, we can leverage multithreading.
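For a quick worked example, take A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]]. Then
C(1, 1) = 1*5 + 2*7 = 19, C(1, 2) = 1*6 + 2*8 = 22, C(2, 1) = 3*5 + 4*7 = 43 and
C(2, 2) = 3*6 + 4*8 = 50, so C = [[19, 22], [43, 50]].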

Multithreaded Matrix Multiplication:


Multithreading can speed up matrix multiplication by dividing the task among multiple threads,
taking advantage of modern multi-core processors. Here, we'll discuss two approaches: one
thread per row and one thread per cell.

1. One Thread per Row (Parallelizing Rows):


- In this approach, each thread is responsible for computing a single row of the result
matrix C.
- Threads can work in parallel without the need for synchronization because they operate on
different rows.
- The number of threads used is typically equal to the number of rows (m).

2. One Thread per Cell (Fine-Grained Parallelism):


- In this approach, each thread is responsible for computing a single cell (element) in the
result matrix C.
- Threads must access both input matrices A and B, which can lead to more complex
memory access patterns.
- Each thread writes to a distinct element of C, so no locking is needed for the write itself,
but creating one thread per element adds significant thread-creation and scheduling overhead
(a sketch of this variant follows).
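A minimal sketch of the one-thread-per-cell variant (illustrative only; the function names multiply_cell and multiply_per_cell are ours, and an N x N result needs N*N threads, which is practical only for small matrices):

import numpy as np
import threading

def multiply_cell(A, B, C, i, j):
    # Each thread computes a single element C[i][j]: the dot product of
    # row i of A and column j of B.
    C[i][j] = sum(A[i][k] * B[k][j] for k in range(A.shape[1]))

def multiply_per_cell(A, B):
    n, p = A.shape[0], B.shape[1]
    C = np.zeros((n, p), dtype=A.dtype)
    threads = []
    for i in range(n):
        for j in range(p):
            t = threading.Thread(target=multiply_cell, args=(A, B, C, i, j))
            threads.append(t)
            t.start()
    for t in threads:
        t.join()
    return C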
CODE

import numpy as np
import threading
import time

matA = np.array([[3, 7, 3, 6],
                 [9, 2, 0, 3],
                 [0, 2, 1, 7],
                 [2, 2, 7, 9]])

matB = np.array([[6, 5, 5, 2],
                 [1, 7, 9, 6],
                 [6, 6, 8, 9],
                 [0, 3, 5, 2]])

matC = np.zeros((4, 4), dtype=int)

def multiply_row(row):
    # Compute one row of the result: the dot product of row 'row' of A
    # with every column of B.
    global matA, matB, matC
    for j in range(4):
        for k in range(4):
            matC[row][j] += matA[row][k] * matB[k][j]

num_threads = 4
threads = []

# Single-threaded run: compute each row in sequence.
start_time_single = time.time()
for i in range(4):
    multiply_row(i)
end_time_single = time.time()

# Reset the result matrix so the multithreaded run starts from zero
# (otherwise its products would be added on top of the serial result).
matC = np.zeros((4, 4), dtype=int)

# Multithreaded run: one thread per row of the result matrix.
start_time_multi = time.time()
for i in range(num_threads):
    thread = threading.Thread(target=multiply_row, args=(i,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
end_time_multi = time.time()

print("Matrix A:")
print(matA)
print("Matrix B:")
print(matB)
print("Matrix C (Result of A x B):")
print(matC)

elapsed_time_single = end_time_single - start_time_single
elapsed_time_multi = end_time_multi - start_time_multi
print(f"Single-threaded execution time: {elapsed_time_single:.4f} seconds")
print(f"Multithreaded execution time: {elapsed_time_multi:.4f} seconds")

if elapsed_time_multi < elapsed_time_single:
    print("Multithreaded multiplication is faster.")
else:
    print("Single-threaded multiplication is faster.")
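Note: in CPython, the global interpreter lock limits how much speedup pure-Python loops can gain from threads, so the multithreaded timing above may not beat the single-threaded one even on a multi-core machine; native threads (or NumPy's built-in product) avoid this limitation. As a quick correctness check (an optional addition, assuming the code above has run):

# The threaded result should match NumPy's built-in matrix product.
assert np.array_equal(matC, matA @ matB)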

RESULT

Reference serial code performance:

Benchmarking is based on multiplying square matrices of size 100, 300, 500, 1000, 3000 and 5000.
Serial code execution gives the following results:

Native threads-based implementation:

● Use native threads to improve performance.

● Assume a 12-core machine.

● Changes are required in the code.

● Keep the algorithm intact.

❖ Note: Higher-grade optimization of matrix multiplication requires algorithm re-design. It is
advisable to consider the nature of the matrices.

Native threads-based implementation:


• Requires an additional worker routine.
• Parallelize the outermost loop only.
• Assign a subset of rows per thread.
• Need a thread configuration structure.
• Simple implementation - no messaging support is required. Threads receive their configuration
data upon creation, so there is no need to deliver it through external means (queues, messages,
etc.).

The actual working routine should:
• Accept the thread configuration for the matrices.
• Work within its assigned row limits.

Note: threads are created asynchronously, which means that the thread-create call will not block
and wait until the thread function is done. A sketch of this row-blocked scheme follows.
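The sketch below is illustrative only (Python; the worker routine and the (start_row, end_row) configuration passed at creation time are our own naming, and a native C/pthreads version would follow the same structure):

import numpy as np
import threading

def worker(A, B, C, start_row, end_row):
    # Work only within the assigned row limits.
    for i in range(start_row, end_row):
        for j in range(B.shape[1]):
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(A.shape[1]))

def multiply_blocked(A, B, num_threads=12):
    # Parallelize the outermost loop only: each thread gets a block of rows.
    n, p = A.shape[0], B.shape[1]
    C = np.zeros((n, p), dtype=A.dtype)
    rows_per_thread = (n + num_threads - 1) // num_threads  # ceiling division
    threads = []
    for t in range(num_threads):
        start = t * rows_per_thread
        end = min(start + rows_per_thread, n)
        if start >= end:
            break
        th = threading.Thread(target=worker, args=(A, B, C, start, end))
        threads.append(th)
        th.start()  # creation does not block; join() below waits for completion
    for th in threads:
        th.join()
    return C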

Benchmarking:

Benchmarking is based on multiplying square matrices of size 100, 300, 500, 1000, 3000 and 5000.
Native threads-based code gives the following results.

One Thread per Row: The implementation and performance analysis of this strategy are discussed in
detail. Benefits, challenges, and trade-offs are thoroughly examined. This includes analyzing how
well the workload is distributed among threads and how efficiently they collaborate to compute the
result. Performance metrics reveal whether this approach reduces execution time significantly.
One Thread per Cell: Similar to the one-thread-per-row strategy, this section provides an in-
depth examination of the one-thread-per-cell approach. Its advantages, potential disadvantages,
and associated performance metrics are explored. The fine-grained parallelism of this approach
may lead to performance improvements, but potential bottlenecks and synchronization
challenges are also considered.
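A simple timing harness along these lines could be used to collect the execution-time metrics (a sketch only; multiply_serial and multiply_blocked are the illustrative helpers sketched earlier, and the sizes are kept small because pure-Python loops are slow and one thread per cell would need n*n threads):

import time
import numpy as np

def time_call(fn, *args):
    # Wall-clock time of a single call.
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

rng = np.random.default_rng(0)
for n in (100, 200):
    A = rng.integers(0, 10, size=(n, n))
    B = rng.integers(0, 10, size=(n, n))
    print(f"n={n}: serial {time_call(multiply_serial, A, B):.2f}s, "
          f"one thread per row-block {time_call(multiply_blocked, A, B):.2f}s")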

Conclusion

In conclusion, the choice between matrix multiplication and multithreaded matrix multiplication
with one thread per row or one thread per cell depends on the specific requirements of the task.
Traditional matrix multiplication is suitable for small to moderately sized matrices and is
generally easier to implement. However, for large matrices, multithreaded matrix multiplication
can significantly improve performance by utilizing parallel processing. When using one thread
per row, the performance gain is most evident when the matrix has enough rows to keep all cores
busy, while one thread per cell suits smaller matrices or highly parallel hardware, since its
thread-management overhead grows with the number of elements. The choice ultimately hinges on the matrix size, hardware
capabilities, and the trade-off between complexity and performance, making it essential to
evaluate and select the most suitable approach based on the specific application.

References

• https://en.wikipedia.org/wiki/Matrix_multiplication
• https://www.sciencedirect.com/science/article/abs/pii/S0957417414004473
• https://matrix.reshish.com/multiplication.php
• https://www.geeksforgeeks.org/matrix-multiplication/

