Multithreading in Python
13 Jul 2017
Note: This article has also been featured on geeksforgeeks.org. It covers the basics of multithreading in the Python
programming language.
Just like multiprocessing, multithreading is a way of achieving multitasking. In multithreading, the concept of
threads is used.
Thread
In computing, a process is an instance of a computer program that is being executed. Any process has 3 basic
components:
An executable program.
The associated data needed by the program (variables, work space, buffers, etc.)
The execution context of the program (State of process)
A thread is an entity within a process that can be scheduled for execution. It is also the smallest unit of processing
that an OS (Operating System) can schedule.
In simple words, a thread is a sequence of instructions within a program that can be executed independently
of other code. For simplicity, you can assume that a thread is simply a subset of a process!
Multiple threads can exist within one process:
Each thread contains its own register set and local variables (stored in the stack).
All threads of a process share global variables (stored in the heap) and the program code.
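The split between per-thread local variables and shared global variables can be sketched in a few lines. This is a minimal illustration, not code from the original article: the names shared_log, worker and run_workers are hypothetical.

```python
import threading

# Module-level (global) object: every thread in the process sees the same list.
shared_log = []

def worker(label):
    # 'message' is a local variable, so each thread has its own copy of it.
    message = "hello from {}".format(label)
    # Appending to the global list makes the result visible to all threads.
    shared_log.append(message)

def run_workers():
    shared_log.clear()
    threads = [threading.Thread(target=worker, args=(name,)) for name in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The order of the entries depends on scheduling, so sort before returning.
    return sorted(shared_log)

if __name__ == "__main__":
    print(run_workers())
```

Both threads write into the same shared_log, while each one builds its own message independently.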
Consider the diagram below to understand how multiple threads exist in memory:
On a simple, single-core CPU, multithreading is achieved by switching between threads frequently. This is termed context
switching. In context switching, the state of one thread is saved and the state of another thread is loaded whenever an
interrupt (due to I/O or manually set) takes place. Context switching takes place so frequently that all the threads
appear to be running in parallel (this is termed multitasking).
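One rough way to observe this overlap is to time two threads that each spend their time waiting. The sketch below is an illustration under stated assumptions: time.sleep stands in for a blocking I/O wait, and the names wait_task and timed_run are hypothetical.

```python
import threading
import time

def wait_task(seconds):
    # time.sleep stands in for a blocking wait such as I/O.
    time.sleep(seconds)

def timed_run():
    start = time.perf_counter()
    t1 = threading.Thread(target=wait_task, args=(0.2,))
    t2 = threading.Thread(target=wait_task, args=(0.2,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    # Total wall-clock time for both threads to finish.
    return time.perf_counter() - start

if __name__ == "__main__":
    print("Elapsed: {:.2f} s".format(timed_run()))
```

Run one after the other, the two waits would take about 0.4 seconds. Because the threads overlap, the elapsed time should be much closer to 0.2 seconds.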
Consider the diagram below in which a process contains two active threads:
Multithreading in Python
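In Python, the threading module provides a very simple and intuitive API for spawning multiple threads in a
program. To create a thread, we create an object of the Thread class, passing the function to run as target and its
arguments as args: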
t1 = threading.Thread(target=print_square, args=(10,))
t2 = threading.Thread(target=print_cube, args=(10,))
To start a thread, we use the start method of the Thread class:
t1.start()
t2.start()
Once the threads start, the current program (you can think of it as the main thread) also keeps on executing. To stop
execution of the current program until a thread is complete, we use the join method:
t1.join()
t2.join()
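Put together, the create, start and join steps might look like the sketch below. The bodies of print_square and print_cube are not shown in this article, so the versions here are assumptions, and the main wrapper is a hypothetical helper added for illustration.

```python
import threading

def print_square(num):
    # Assumed body: the article only names this function.
    print("Square: {}".format(num * num))

def print_cube(num):
    # Assumed body: the article only names this function.
    print("Cube: {}".format(num * num * num))

def main():
    # Create the threads; args must be a tuple, hence the trailing comma.
    t1 = threading.Thread(target=print_square, args=(10,))
    t2 = threading.Thread(target=print_cube, args=(10,))

    # Start both threads; the main thread keeps running meanwhile.
    t1.start()
    t2.start()

    # Block until each thread has finished.
    t1.join()
    t2.join()

    print("Done!")
    return t1, t2

if __name__ == "__main__":
    main()
```

"Done!" is printed only after both join calls return, so it always appears last.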
Now, consider the Python program given below, in which we print the thread name and the corresponding process ID for
each task:
import threading
import os

def task1():
    print("Task 1 assigned to thread: {}".format(threading.current_thread().name))
    print("ID of process running task 1: {}".format(os.getpid()))

def task2():
    print("Task 2 assigned to thread: {}".format(threading.current_thread().name))
    print("ID of process running task 2: {}".format(os.getpid()))

if __name__ == "__main__":

    # print ID of current process
    print("ID of process running main program: {}".format(os.getpid()))

    # print name of main thread
    print("Main thread name: {}".format(threading.main_thread().name))

    # creating threads
    t1 = threading.Thread(target=task1, name='t1')
    t2 = threading.Thread(target=task2, name='t2')

    # starting threads
    t1.start()
    t2.start()

    # wait until all threads finish
    t1.join()
    t2.join()
Output:
ID of process running main program: 11758
Main thread name: MainThread
Task 1 assigned to thread: t1
ID of process running task 1: 11758
Task 2 assigned to thread: t2
ID of process running task 2: 11758
As is clear from the output, the process ID remains the same for all threads.
We use the threading.main_thread() function to get the main thread object. Under normal conditions, the main
thread is the thread from which the Python interpreter was started. The name attribute of a thread object
gives the name of the thread.
print("Main thread name: {}".format(threading.main_thread().name))
We use the threading.current_thread() function to get the current thread object.
print("Task 1 assigned to thread: {}".format(threading.current_thread().name))
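The difference between the two functions shows up when they are called from inside a worker thread. The sketch below is illustrative only: the names report and inspect_threads, and the thread name "worker-1", are hypothetical.

```python
import threading

def report(results):
    # Inside the worker, current_thread() returns the worker's own Thread object.
    current = threading.current_thread()
    results["worker_name"] = current.name
    # The worker is a different object from the main thread.
    results["worker_is_main"] = current is threading.main_thread()

def inspect_threads():
    results = {}
    worker = threading.Thread(target=report, args=(results,), name="worker-1")
    worker.start()
    worker.join()
    return results

if __name__ == "__main__":
    print(inspect_threads())
```

Here current_thread() identifies whichever thread is executing the call, while main_thread() refers to the same thread object no matter where it is called from.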
So, this was a brief introduction to multithreading in Python. The next article in this series covers synchronization
between multiple threads.
A critical section refers to the parts of the program where a shared resource is accessed. For example, in the
diagram below, 3 threads try to access a shared resource (the critical section) at the same time.
Concurrent access to a shared resource can lead to a race condition.
A race condition occurs when two or more threads can access shared data and they try to change it at the same
time. As a result, the values of variables may be unpredictable and vary depending on the timing of context
switches between the threads.
Consider the program below, which demonstrates a race condition:
import threading

# global variable x
x = 0

def increment():
    """
    function to increment global variable x
    """
    global x
    x += 1

def thread_task():
    """
    task for thread
    calls increment function 100000 times.
    """
    for _ in range(100000):
        increment()

def main_task():
    global x
    # setting global variable x as 0
    x = 0

    # creating threads
    t1 = threading.Thread(target=thread_task)
    t2 = threading.Thread(target=thread_task)

    # start threads
    t1.start()
    t2.start()

    # wait until threads finish their job
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        main_task()
        print("Iteration {0}: x = {1}".format(i, x))
Output:
Iteration 0: x = 175005
Iteration 1: x = 200000
Iteration 2: x = 200000
Iteration 3: x = 169432
Iteration 4: x = 153316
Iteration 5: x = 200000
Iteration 6: x = 167322
Iteration 7: x = 200000
Iteration 8: x = 169917
Iteration 9: x = 153589
In the above program:
Two threads t1 and t2 are created in the main_task function, and the global variable x is set to 0.
Each thread has the target function thread_task, in which the increment function is called 100000 times.
The increment function increments the global variable x by 1 in each call.
The expected final value of x is 200000, but across the 10 iterations of the main_task function we get several
different values.
This happens because both threads access the shared variable x concurrently. This unpredictability in the value of x
is a race condition.
The diagram below shows how a race condition can occur in the above program:
Notice that the expected value of x in the diagram is 12, but due to the race condition, it turns out to be 11!
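The lost update can be replayed deterministically in a single thread by writing out the read and write steps of x += 1 by hand. This is only a sketch of one possible interleaving: the starting value of 10 is inferred from the 12 and 11 mentioned above, and the names lost_update and serial_update are hypothetical.

```python
def lost_update(start):
    """Replay an interleaving in which both threads read before either writes."""
    x = start
    t1_read = x        # thread 1 reads x
    t2_read = x        # thread 2 reads the same value before thread 1 writes
    x = t1_read + 1    # thread 1 writes start + 1
    x = t2_read + 1    # thread 2 overwrites it with start + 1 again
    return x

def serial_update(start):
    """Replay the two increments one after the other, with no overlap."""
    x = start
    x = x + 1          # thread 1 completes its increment
    x = x + 1          # thread 2 then increments the updated value
    return x

if __name__ == "__main__":
    print(lost_update(10))    # 11
    print(serial_update(10))  # 12
```

One of the two increments is lost because thread 2 computes its result from a value that thread 1 has since replaced.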
A semaphore is a synchronization object that controls access by multiple processes/threads to a common resource
in a parallel programming environment. It is simply a value in a designated place in operating system (or kernel)
storage that each process/thread can check and then change. Depending on the value that is found, the
process/thread can use the resource or will find that it is already in use and must wait for some period before
trying again. Semaphores can be binary (0 or 1) or can have additional values. Typically, a process/thread using
semaphores checks the value and then, if it is using the resource, changes the value to reflect this so that
subsequent semaphore users will know to wait.
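As a rough illustration of a semaphore with more than two values, the sketch below uses threading.Semaphore to let at most two threads use a resource at once. The limit of 2, the five workers, and the names use_resource and run_users are assumptions made for the example. Note that in Python a thread blocks inside the semaphore until a slot is free rather than checking again later.

```python
import threading
import time

MAX_CONCURRENT = 2  # hypothetical limit on simultaneous users
semaphore = threading.Semaphore(MAX_CONCURRENT)

# Bookkeeping used only to observe how many threads hold the semaphore at once.
state_lock = threading.Lock()
active = 0
peak = 0

def use_resource():
    global active, peak
    with semaphore:  # blocks while MAX_CONCURRENT threads are already inside
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # simulate using the resource
        with state_lock:
            active -= 1

def run_users(count=5):
    global active, peak
    active = 0
    peak = 0
    threads = [threading.Thread(target=use_resource) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak

if __name__ == "__main__":
    print("Peak concurrent users:", run_users())
```

The reported peak never exceeds MAX_CONCURRENT, even though five threads compete for the resource.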
The program below fixes the race condition by guarding the increment with a threading.Lock, which each thread
acquires before calling increment and releases afterwards:
import threading

# global variable x
x = 0

def increment():
    """
    function to increment global variable x
    """
    global x
    x += 1

def thread_task(lock):
    """
    task for thread
    calls increment function 100000 times.
    """
    for _ in range(100000):
        lock.acquire()
        increment()
        lock.release()

def main_task():
    global x
    # setting global variable x as 0
    x = 0

    # creating a lock
    lock = threading.Lock()

    # creating threads
    t1 = threading.Thread(target=thread_task, args=(lock,))
    t2 = threading.Thread(target=thread_task, args=(lock,))

    # start threads
    t1.start()
    t2.start()

    # wait until threads finish their job
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        main_task()
        print("Iteration {0}: x = {1}".format(i, x))
Output:
Iteration 0: x = 200000
Iteration 1: x = 200000
Iteration 2: x = 200000
Iteration 3: x = 200000
Iteration 4: x = 200000
Iteration 5: x = 200000
Iteration 6: x = 200000
Iteration 7: x = 200000
Iteration 8: x = 200000
Iteration 9: x = 200000
As you can see in the results, the final value of x is 200000 every time (which is the expected final
result). The diagram below depicts how locks work in the above program:
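The explicit acquire and release calls can also be written with a with statement, since threading.Lock supports the context manager protocol. The sketch below is an alternative phrasing of the same idea and not the article's own code: the names counter, counter_lock, add_many and run are hypothetical.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        # The lock is acquired on entering the block and released on leaving it,
        # even if the body raises an exception.
        with counter_lock:
            counter += 1

def run(times=100000):
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print("Final counter:", run())
```

With this form, the release can no longer be forgotten on an early return or an error inside the critical section.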
This brings us to the end of this tutorial series on Multithreading in Python.
Finally, here are a few advantages and disadvantages of multithreading:
Advantages:
It doesn't block the user, because threads run independently of each other.
Better use of system resources is possible since threads execute tasks in parallel.
Enhanced performance on multi-processor machines.
Multithreaded servers and interactive GUIs rely extensively on multithreading.
Disadvantages: