Python - Unit 3 Complete Notes
System tools
To begin our exploration of the systems domain, we will take a quick tour through the
standard library sys and os modules in this chapter, before moving on to larger system
programming concepts. As you can tell from the length of their attribute lists, both of these are
large modules—the following reflects Python 3.1 running on Windows 7 outside IDLE:
C:\...\PP4E\System> python
Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (...)] on win32
>>> import sys, os
>>> len(dir(sys))
65
>>> len(dir(os))
122
>>> len(dir(os.path))
52
The content of these two modules may vary per Python version and platform. For example, os is
much larger under Cygwin after building Python 3.1 from its source code there (Cygwin is a
system that provides Unix-like functionality on Windows; it is discussed further in More on
Cygwin Python for Windows):
$ ./python.exe
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
>>> import sys, os
>>> len(dir(sys))
64
>>> len(dir(os))
217
>>> len(dir(os.path))
51
>>> os.path.split('/usr/bin/python')
('/usr/bin', 'python')
1.2. Functions
The os module has lots of functions. We will not cover all of them thoroughly, but the following
is a good starting point for using the module.
1.2.1. Manipulating Directories
The getcwd() function returns the current directory (in Python 2, getcwdu() returns it as unicode).
The current directory can be changed using chdir():
os.chdir(path)
The listdir() function returns the content of a directory. Note, however, that it mixes directories
and files.
The mkdir() function creates a directory. It returns an error if the parent directory does not exist.
If you want to create the parent directory as well, you should rather use makedirs():
>>> os.mkdir('temp') # creates temp directory inside the current directory
>>> os.makedirs('/tmp/temp/temp')
os.F_OK: Value to pass as the mode parameter of access() to test the existence of path.
os.R_OK: Value to include in the mode parameter of access() to test the readability of path.
os.W_OK: Value to include in the mode parameter of access() to test the writability of path.
os.X_OK: Value to include in the mode parameter of access() to determine if path can be executed.
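As a quick sketch, these constants combine with os.access() like this (the file name is illustrative):

```python
import os

# Create a throwaway file to probe (name is illustrative)
path = 'demo_access.txt'
with open(path, 'w') as f:
    f.write('hello')

exists = os.access(path, os.F_OK)     # does the path exist?
readable = os.access(path, os.R_OK)   # can we read it?
writable = os.access(path, os.W_OK)   # can we write it?
runnable = os.access(path, os.X_OK)   # can we execute it?

print(exists, readable, writable, runnable)
os.remove(path)
```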
import os
pid = os.fork()
if pid == 0: # the child
    print("this is the child")
elif pid > 0:
    print("the child is pid %d" % pid)
else:
    print("An error occurred")
Here, the fork is within the executed script, but most of the time you would want the child to run another program.
One of the most common things to do after an os.fork call is to call os.execl immediately
afterward to run another program. os.execl is an instruction to replace the running program with
a new program, so the calling program goes away, and a new program appears in its place:
import os
pid = os.fork() # fork and exec together
print("second test")
if pid == 0: # This is the child
    print("this is the child")
    print("I'm going to exec another program now")
    os.execl('/bin/cat', 'cat', '/etc/motd')
else:
    print("the child is pid %d" % pid)
    os.wait()
The os.wait function instructs Python that you want the parent to do nothing until the child
process returns. Be aware that os.fork and os.wait work only under Unix and Unix-like
platforms such as Linux.
Windows also has a mechanism for starting up new processes. To make the common task of
starting a new program easier, Python offers a single family of functions that combines os.fork
and os.exec on Unix-like systems, and enables you to do something similar on Windows
platforms.
When you want to just start up a new program, you can use the os.spawn family of functions.
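A hedged sketch of the spawn family (sys.executable is used as the program only so the example is self-contained):

```python
import os
import sys

# os.P_WAIT blocks until the spawned program exits and returns its exit
# code; os.P_NOWAIT would instead return the new process's id immediately.
exit_code = os.spawnv(os.P_WAIT, sys.executable,
                      [sys.executable, '-c', 'print("spawned")'])
print('child exited with', exit_code)
```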
execl(path, args) or execle(path, args, env), where env is a dict with environment variables.
execlp(file, a1, a2, a3) or execlpe(file, a1, a2, a3, env), which search PATH for file.
The os module provides many more attributes; an alphabetical sampling (Python 2 era, some Unix-only):
os.abort, os.closerange, os.confstr, os.confstr_names, os.ctermid, os.defpath, os.devnull,
os.dup, os.dup2, os.errno, os.error, os.execl, os.execle, os.execlp, os.execlpe, os.execv,
os.execve, os.execvp, os.execvpe, os.extsep, os.fchdir, os.fdatasync, os.fdopen, os.forkpty,
os.fpathconf, os.fstatvfs, os.fsync, os.ftruncate, os.getenv, os.getgroups, os.getloadavg,
os.getlogin, os.getpgid, os.getpgrp, os.getresgid, os.getresuid, os.getsid, os.initgroups,
os.isatty, os.link, os.major, os.makedev, os.minor, os.mkfifo, os.mknod, os.open, os.openpty,
os.pardir, os.pathconf, os.pathconf_names, os.pipe, os.popen, os.popen2, os.popen3, os.popen4,
os.putenv, os.read, os.readlink, os.setegid, os.seteuid, os.setgid, os.setgroups, os.setpgid,
os.setpgrp, os.setregid, os.setresgid, os.setresuid, os.setreuid, os.setsid, os.setuid,
os.stat_float_times, os.statvfs, os.statvfs_result, os.strerror, os.symlink, os.sysconf,
os.sysconf_names, os.system, os.tcgetpgrp, os.tcsetpgrp, os.tempnam, os.times, os.tmpfile,
os.tmpnam, os.ttyname, os.unsetenv, os.urandom, os.UserDict, os.utime, os.wait, os.wait3,
os.wait4, os.waitpid
The os.walk() function allows you to recursively scan a directory; it yields tuples
(dirpath, dirnames, filenames), where dirnames is the list of directories found in
dirpath, and filenames the list of files found in dirpath.
Alternatively, os.path.walk (Python 2 only) can also be used but works in a different way (see below).
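A minimal sketch of the tuples os.walk() yields (the directory and file names are illustrative, created just for the demonstration):

```python
import os

# Build a tiny throwaway tree to scan
os.makedirs('demo_tree/sub', exist_ok=True)
open('demo_tree/a.txt', 'w').close()

# Each iteration yields (dirpath, dirnames, filenames)
for dirpath, dirnames, filenames in os.walk('demo_tree'):
    print(dirpath, dirnames, filenames)
```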
>>> os.uname()
('Linux',
'localhost.localdomain',
'3.3.4-5.fc17.x86_64',
'#1 SMP Mon May 7 17:29:34 UTC 2012',
'x86_64')
The attribute os.name gives the name of the OS-dependent module in use (e.g., posix, nt, dos, mac, ...).
The attribute os.pardir refers to the parent directory ('..' for Unix and Windows, '::' for classic
Mac OS).
The attribute os.pathsep (also found in os.path.pathsep) is the character that separates search
paths, as in the PATH environment variable (':' under Linux, ';' under Windows).
Finally, os.sep is the character that separates pathname components ('/' for Unix, '\' for
Windows, ':' for classic Mac OS). It is also available as os.path.sep:
>>> # under linux
>>> os.path.sep
'/'
Another function related to multi-platform situations is os.path.normcase(), which is
useful under Windows, where the OS ignores case. To compare two filenames reliably you will
need this function.
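A small sketch of that comparison (the helper name is ours, not from the original):

```python
import os.path

def same_file_name(p1, p2):
    """Compare two file names portably: case-insensitive on Windows,
    case-sensitive on POSIX (where normcase is a no-op)."""
    return os.path.normcase(p1) == os.path.normcase(p2)

print(same_file_name('report.TXT', 'report.TXT'))
```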
1.3.1. More about directories and files
os.path provides methods to extract information about path and file names:
>>> os.path.curdir # the constant string for the current directory ('.')
>>> os.path.isdir(dir) # returns True if dir exists and is a directory
>>> os.path.isfile(file) # returns True if file exists and is a regular file
>>> os.path.islink(link) # returns True if link exists and is a symbolic link
>>> os.path.exists(dir) # returns True if dir exists (full pathname or filename)
>>> os.path.getsize(filename) # returns the size of a file without opening it
You can access the time when a file was last modified with os.path.getmtime(). Nevertheless,
the output is not user-friendly: under Unix it is the number of seconds since Jan 1, 1970 (GMT),
and under classic Mac OS since Jan 1, 1904 (GMT). Use the time module to make it easier to read.
To get the base name of a path, use os.path.basename():
>>> import os
>>> os.path.basename("/home/user/temp.txt")
temp.txt
To get the directory name of a path, use os.path.dirname():
>>> import os
>>> os.path.dirname("/home/user/temp.txt")
/home/user
The os.path.abspath() returns the absolute path of a file:
>>> import os
>>> os.path.abspath('temp.txt')
In summary, consider a file temp.txt in /home/user:
function Output
basename 'temp.txt'
dirname '/home/user'
abspath '/home/user/temp.txt'
The path should not end with '/'; otherwise the basename is empty.
Conversely, the join method allows you to join several directory names to create a full path name:
>>> os.path.join('/home', 'user')
'/home/user'
os.path.walk() (Python 2 only; it was removed in Python 3) scans a directory recursively and
applies a function to each item found (see also os.walk() above):
def print_info(arg, dir, files):
    for file in files:
        print dir + ' ' + file
os.path.walk('.', print_info, 0)
The os.environ mapping gives access to the environment variables:
import os
os.environ.keys()
and if you know what you are doing, you can add or replace a variable:
os.environ[NAME] = VALUE
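For instance (MY_APP_MODE is a hypothetical variable name used for illustration):

```python
import os

# Read a variable with a fallback, then set one of our own
shell = os.environ.get('SHELL', '(not set)')
os.environ['MY_APP_MODE'] = 'debug'   # also visible to child processes

print(shell)
print(os.environ['MY_APP_MODE'])
```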
The sys module in Python provides various functions and variables that are used to manipulate
different parts of the Python runtime environment. It allows operating on the interpreter as it
provides access to the variables and functions that interact strongly with the interpreter. Let’s
consider the below example.
Example:
import sys
print(sys.version)
Output:
3.6.9 (default, Oct 8 2020, 12:12:24)
[GCC 8.4.0]
In the above example, sys.version is used which returns a string containing the version
of Python Interpreter with some additional information.
This shows how the sys module interacts with the interpreter. Let us dive into the
article to get more information about the sys module.
Input and Output using sys
The sys modules provide variables for better control over input or output. We can even redirect
the input and output to other devices. This can be done using three variables –
stdin
stdout
stderr
stdin: It can be used to get input from the command line directly. It is used for standard input.
The built-in input() function reads from it internally. Note that each line read from it keeps its
trailing '\n'.
Example:
import sys
for line in sys.stdin:
    if 'q' == line.rstrip():
        break
    print(f'Input : {line}')
print("Exit")
stdout: A built-in file object that is analogous to the interpreter’s standard output stream in
Python. stdout is used to display output directly to the screen console. Output can be of any
form, it can be output from a print statement, an expression statement, or even a direct
prompt for input. By default, streams are in text mode. In fact, wherever a print function is
called within the code, it is first written to sys.stdout and then finally on to the screen.
Example:
import sys
sys.stdout.write('Geeks')
Output:
Geeks
stderr: A built-in file object analogous to the interpreter's standard error stream. It is used
for error messages and diagnostics, which keeps them separate from normal output.
Example:
import sys
def print_to_stderr(*a):
# Here a is the array holding the objects
# passed as the arguement of the function
print(*a, file = sys.stderr)
print_to_stderr("Hello World")
Output:
Hello World
sys.exit() can be used to stop the interpreter, optionally with a message:
import sys
age = 17
if age < 18:
    sys.exit("Age less than 18")
Output:
An exception has occurred, use %tb to see the full traceback.
SystemExit: Age less than 18
Working with Modules
sys.path is a built-in variable within the sys module that returns the list of directories that the
interpreter will search for the required module.
When a module is imported within a Python file, the interpreter first searches for the specified
module among its built-in modules. If not found it looks through the list of directories defined
by sys.path.
Note: sys.path is an ordinary list and can be manipulated.
Example 1: Listing out all the paths
import sys
print(sys.path)
Output: the list of directories the interpreter will search (varies by installation).
If the directory containing a module is removed from sys.path, importing that module fails:
ModuleNotFoundError: No module named 'pandas'
sys.modules is a dictionary mapping the names of all modules that have already been imported
to the corresponding module objects:
import sys
print(sys.modules)
Output:
Reference Count
The sys.getrefcount() method is used to get the reference count of any given object. Python uses
this value internally: when it drops to 0, the memory for that object is deallocated.
Example:
import sys
a = 'Geeks'
print(sys.getrefcount(a))
Output: a small integer (temporary references made during the call are included in the count).
Function Description
sys.getrecursionlimit() Returns the recursion limit of the interpreter, i.e., the maximum depth
of the Python interpreter stack.
Parameters of os.walk():
top: Starting directory for os.walk().
topdown: If this optional argument is True then the directories are scanned from top-down
otherwise from bottom-up. This is True by default.
onerror: It is a function that handles errors that may occur.
followlinks: This visits directories pointed to by symlinks, if set to True.
Return Type: For each directory in the tree rooted at directory top (including top itself), it
yields a 3-tuple (dirpath, dirnames, filenames).
We want to list out all the subdirectories and files inside the directory Tree. Below is the
implementation.
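A sketch of such an implementation (a small stand-in Tree directory is created here so the example runs anywhere):

```python
import os

# Create a small stand-in 'Tree' directory
os.makedirs('Tree/branch', exist_ok=True)
open('Tree/leaf.txt', 'w').close()

# Walk the tree, printing each directory and the files it contains
for dirpath, dirnames, filenames in os.walk('Tree'):
    print('Directory:', dirpath)
    for name in filenames:
        print('  File:', os.path.join(dirpath, name))
```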
The above code can be shortened using a list comprehension, which is more Pythonic. Below is
the implementation.
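One possible list-comprehension version (again using a stand-in Tree directory; the exact original code is not shown here):

```python
import os

os.makedirs('Tree/branch', exist_ok=True)   # stand-in directory
open('Tree/leaf.txt', 'w').close()

# One comprehension flattens every file path found by os.walk
all_files = [os.path.join(dirpath, name)
             for dirpath, dirnames, filenames in os.walk('Tree')
             for name in filenames]
print(all_files)
```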
Forking Processes
Forked processes are a traditional way to structure parallel tasks, and they are a
fundamental part of the Unix tool set.
Forking is a straightforward way to start an independent program, whether it is different
from the calling program or not.
Forking is based on the notion of copying programs: when a program calls the fork
routine, the operating system makes a new copy of that program and its process in
memory and starts running that copy in parallel with the original.
Some systems don’t really copy the original program (it’s an expensive operation), but
the new copy works as if it were a literal copy.
After a fork operation, the original copy of the program is called the parent process, and
the copy created by os.fork is called the child process.
In general, parents can make any number of children, and children can create child
processes of their own; all forked processes run independently and in parallel under the
operating system’s control, and children may continue to run after their parent exits.
This is probably simpler in practice than in theory, though.
The Python script in Example 5-1 forks new child processes until you type the letter q at
the console.
Example 5-1. PP4E\System\Processes\fork1.py
"forks child processes until you type 'q'"
import os
def child():
print('Hello from child', os.getpid())
os._exit(0) # else goes back to parent loop
def parent():
while True:
newpid = os.fork()
if newpid == 0:
child()
else:
print('Hello from parent', os.getpid(), newpid)
if input() == 'q': break
parent()
Python’s process forking tools, available in the os module, are simply thin wrappers over
standard forking calls in the system library also used by C language programs.
To start a new, parallel process, call the os.fork built-in function.
Because this function generates a copy of the calling program, it returns a different value
in each copy: zero in the child process and the process ID of the new child in the parent.
Programs generally test this result to begin different processing in the child only; this
script, for instance, runs the child function in child processes only.
Because forking is ingrained in the Unix programming model, this script works well on
Unix, Linux, and modern Macs. Unfortunately, this script won’t work on the standard
version of Python for Windows today, because fork is too much at odds with the
Windows model.
Python scripts can always spawn threads on Windows, and the multiprocessing module
described later in this chapter provides an alternative for running processes portably,
which can obviate the need for process forks on Windows in contexts that conform to its
constraints (albeit at some potential cost in low-level control).
The script in Example 5-1 does work on Windows, however, if you use the Python
shipped with the Cygwin system (or build one of your own from source-code with
Cygwin’s libraries). Cygwin is a free, open source system that provides full Unix-like
functionality for Windows (and is described further in More on Cygwin Python for
Windows).
You can fork with Python on Windows under Cygwin, even though its behavior is not
exactly the same as true Unix forks. Because it’s close enough for this book’s examples,
though, let’s use it to run our script live:
[C:\...\PP4E\System\Processes]$ python fork1.py
Hello from parent 7296 7920
Hello from child 7920
import os, time

def counter(count): # run in new process
    for i in range(count):
        time.sleep(1) # simulate real work
        print('[%s] => %s' % (os.getpid(), i))

for i in range(5):
    pid = os.fork()
    if pid != 0:
        print('Process %d spawned' % pid) # in parent: continue
    else:
        counter(5) # else in child/new process
        os._exit(0) # run function and exit

print('Main process exiting.') # parent need not wait
When run, this script starts 5 processes immediately and exits. All 5 forked processes check in
with their first count display one second later and every second thereafter. Notice that child
processes continue to run, even if the parent process that created them terminates:
[C:\...\PP4E\System\Processes]$ python fork-count.py
Process 4556 spawned
Process 3724 spawned
Process 6360 spawned
Process 6476 spawned
Process 6684 spawned
Main process exiting.
[4556] => 0
[3724] => 0
[6360] => 0
[6476] => 0
[6684] => 0
[4556] => 1
[3724] => 1
[6360] => 1
[6476] => 1
[6684] => 1
[4556] => 2
[3724] => 2
[6360] => 2
[6476] => 2
[6684] => 2
The output of all of these processes shows up on the same screen, because all of them
share the standard output stream (and a system prompt may show up along the way, too).
Technically, a forked process gets a copy of the original process’s global memory,
including open file descriptors.
Because of that, global objects like files start out with the same values in a child process,
so all the processes here are tied to the same single stream.
But it’s important to remember that global memory is copied, not shared; if a child
process changes a global object, it changes only its own copy. (As we’ll see, this works
differently in threads, the topic of the next section.)
THE FORK/EXEC COMBINATION
In Examples 5-1 and 5-2, child processes simply ran a function within the Python
program and then exited. On Unix-like platforms, forks are often the basis of starting
independently running programs that are completely different from the program that
performed the fork call.
For instance, Example 5-3 forks new processes until we type q again, but child processes
run a brand-new program instead of calling a function in the same file.
Example 5-3. PP4E\System\Processes\fork-exec.py
"starts programs until you type 'q'"
import os
parm = 0
while True:
parm += 1
pid = os.fork()
if pid == 0: # copy process
os.execlp('python', 'python', 'child.py', str(parm)) # overlay program
assert False, 'error starting program' # shouldn't return
else:
print('Child is', pid)
if input() == 'q': break
If you’ve done much Unix development, the fork/exec combination will probably look
familiar. The main thing to notice is the os.execlp call in this code. In a nutshell, this call
replaces (overlays) the program running in the current process with a brand new program.
Because of that, the combination of os.fork and os.execlp means start a new process and
run a new program in that process—in other words, launch a new program in parallel
with the original program.
os.exec call formats
The arguments to os.execlp specify the program to be run by giving command-line
arguments used to start the program (i.e., what Python scripts know as sys.argv).
If successful, the new program begins running and the call to os.execlp itself never
returns (since the original program has been replaced, there’s really nothing to return to).
If the call does return, an error has occurred, so we code an assert after it that will always
raise an exception if reached.
There are a handful of os.exec variants in the Python standard library; some allow us to
configure environment variables for the new program, pass command-line arguments in
different forms, and so on.
All are available on both Unix and Windows, and they replace the calling program (i.e.,
the Python interpreter). exec comes in eight flavors, which can be a bit confusing unless
you generalize:
os.execv(program, commandlinesequence)
The basic “v” exec form is passed an executable program’s name, along with a list or
tuple of command-line argument strings used to run the executable (that is, the words you
would normally type in a shell to start a program).
os.execl(program, cmdarg1, cmdarg2,... cmdargN)
The basic “l” exec form is passed an executable’s name, followed by one or more
command-line arguments passed as individual function arguments. This is the same
as os.execv(program, (cmdarg1, cmdarg2,...)).
os.execlp
os.execvp
Adding the letter p to the execv and execl names means that Python will locate the executable’s
directory using your system search-path setting (i.e., PATH).
os.execle
os.execve
Adding a letter e to the execv and execl names means an extra, last argument is a dictionary
containing shell environment variables to send to the program.
os.execvpe
os.execlpe
Adding the letters p and e to the basic exec names means to use the search path and to accept a
shell environment settings dictionary.
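As a sketch of the basic "v" form (sys.executable is used so it runs anywhere, and os.fork makes it Unix-only):

```python
import os
import sys

pid = os.fork()                  # Unix-only
if pid == 0:
    # "v" form: program path plus a sequence of argument strings
    os.execv(sys.executable, [sys.executable, '-c', 'print("exec ok")'])
    assert False, 'error starting program'   # reached only if exec fails
else:
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status)            # child's exit code, 0 on success
    print('child exit code:', code)
```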
So when the script in Example 5-3 calls os.execlp, individually passed parameters specify
a command line for the program to be run on, and the word python maps to an executable
file according to the underlying system search-path setting environment variable (PATH).
It’s as if we were running a command of the form python child.py 1 in a shell, but with a
different command-line argument on the end each time.
Spawned child program
Just as when typed at a shell, the string of arguments passed to os.execlp by the fork-exec script
in Example 5-3 starts another Python program file, as shown in Example 5-4.
Example 5-4. PP4E\System\Processes\child.py
import os, sys
print('Hello from child', os.getpid(), sys.argv[1])
Here is this code in action on Linux. It doesn’t look much different from the
original fork1.py, but it’s really running a new program in each forked process.
More observant readers may notice that the child process ID displayed is the same in the
parent program and the launched child.py program; os.execlp simply overlays a program
in the same process:
[C:\...\PP4E\System\Processes]$ python fork-exec.py
Child is 4556
Hello from child 4556 1
Child is 5920
Hello from child 5920 2
Child is 316
Hello from child 316 3
q
What Is a Thread?
A thread is a separate flow of execution. This means that your program will have two things
happening at once. But for most Python 3 implementations the different threads do not actually
execute at the same time: they merely appear to.
Starting a Thread
Now that you’ve got an idea of what a thread is, let’s learn how to make one. The Python
standard library provides threading, which contains most of the primitives you’ll see in this
article. Thread, in this module, nicely encapsulates threads, providing a clean interface to work
with them.
To spawn another thread, you need to call the following method available in the thread module
(renamed _thread in Python 3) −
thread.start_new_thread ( function, args[, kwargs] )
This method call enables a fast and efficient way to create new threads on both Linux and
Windows.
The method call returns immediately and the child thread starts and calls function with the
passed list of args. When function returns, the thread terminates.
Here, args is a tuple of arguments; use an empty tuple to call function without passing any
arguments. kwargs is an optional dictionary of keyword arguments.
Example
#!/usr/bin/python
# Python 2 example (the thread module is _thread in Python 3)
import thread
import time

# Define a function for the threads
def print_time(threadName, delay):
    count = 0
    while count < 5:
        time.sleep(delay)
        count += 1
        print "%s: %s" % (threadName, time.ctime(time.time()))

# Create two threads
try:
    thread.start_new_thread(print_time, ("Thread-1", 2,))
    thread.start_new_thread(print_time, ("Thread-2", 4,))
except:
    print "Error: unable to start thread"

while 1:
    pass
When the above code is executed, it produces the following result −
Thread-1: Thu Jan 22 15:42:17 2009
Thread-1: Thu Jan 22 15:42:19 2009
Thread-2: Thu Jan 22 15:42:19 2009
Thread-1: Thu Jan 22 15:42:21 2009
Thread-2: Thu Jan 22 15:42:23 2009
Thread-1: Thu Jan 22 15:42:23 2009
Thread-1: Thu Jan 22 15:42:25 2009
Thread-2: Thu Jan 22 15:42:27 2009
Thread-2: Thu Jan 22 15:42:31 2009
Thread-2: Thu Jan 22 15:42:35 2009
Although it is very effective for low-level threading, the thread module is very limited
compared to the newer threading module.
The Threading Module
The newer threading module included with Python 2.4 provides much more powerful, high-
level support for threads than the thread module discussed in the previous section.
The threading module exposes all the methods of the thread module and provides some
additional methods −
threading.activeCount() − Returns the number of thread objects that are active.
threading.currentThread() − Returns the current Thread object, corresponding to the caller's
thread of control.
threading.enumerate() − Returns a list of all thread objects that are currently active.
In addition to the methods, the threading module has the Thread class that implements
threading. The methods provided by the Thread class are as follows −
run() − The run() method is the entry point for a thread.
start() − The start() method starts a thread by calling the run method.
join([time]) − The join() waits for threads to terminate.
isAlive() − The isAlive() method checks whether a thread is still executing.
getName() − The getName() method returns the name of a thread.
setName() − The setName() method sets the name of a thread.
Creating Thread Using Threading Module
To implement a new thread using the threading module, you have to do the following −
Define a new subclass of the Thread class.
Override the __init__(self [,args]) method to add additional arguments.
Then, override the run(self [,args]) method to implement what the thread should do when
started.
Once you have created the new Thread subclass, you can create an instance of it and then start a
new thread by invoking start(), which in turn calls the run() method.
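A minimal sketch of those steps (class and attribute names are illustrative):

```python
import threading
import time

class MyThread(threading.Thread):          # 1. subclass Thread
    def __init__(self, label, delay):
        threading.Thread.__init__(self)
        self.label = label                 # 2. extra arguments via __init__
        self.delay = delay

    def run(self):                         # 3. the thread's work
        time.sleep(self.delay)
        print(self.label, 'finished')

t = MyThread('Thread-1', 0.1)
t.start()   # start() arranges for run() to be called in the new thread
t.join()
```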
Thread Objects
The simplest way to use a Thread is to instantiate it with a target function and call start() to let
it begin working.
import threading

def worker():
    """thread worker function"""
    print('Worker')

threads = []
for i in range(5):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()
$ python threading_simple.py
Worker
Worker
Worker
Worker
Worker
It is useful to be able to spawn a thread and pass it arguments to tell it what work to do. This
example passes a number, which the thread then prints.
import threading

def worker(num):
    """thread worker function"""
    print('Worker: %s' % num)

threads = []
for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()
The integer argument is now included in the message printed by each thread:
$ python -u threading_simpleargs.py
Worker: 0
Worker: 1
Worker: 2
Worker: 3
Worker: 4
import threading
import time

def worker():
    print(threading.currentThread().getName(), 'Starting')
    time.sleep(2)
    print(threading.currentThread().getName(), 'Exiting')

def my_service():
    print(threading.currentThread().getName(), 'Starting')
    time.sleep(3)
    print(threading.currentThread().getName(), 'Exiting')
t = threading.Thread(name='my_service', target=my_service)
w = threading.Thread(name='worker', target=worker)
w2 = threading.Thread(target=worker) # use default name
w.start()
w2.start()
t.start()
The debug output includes the name of the current thread on each line. The lines with "Thread-
1" in the thread name column correspond to the unnamed thread w2.
$ python -u threading_names.py
The logging module defines a standard API for reporting errors and status information from
applications and libraries. The key benefit of having the logging API provided by a standard
library module is that all Python modules can participate in logging, so an application’s log can
include messages from third-party modules.
Logging in Applications
There are two perspectives for examining logging. Application developers set up
the logging module, directing the messages to appropriate output channels. It is possible to log
messages with different verbosity levels or to different destinations. Handlers for writing log
messages to files, HTTP GET/POST locations, email via SMTP, generic sockets, or OS-specific
logging mechanisms are all included, and it is possible to create custom log destination classes
for special requirements not handled by any of the built-in classes.
Logging to a File
Most applications are probably going to want to log to a file. Use the basicConfig() function to
set up the default handler so that debug messages are written to a file.
import logging
import threading
import time
logging.basicConfig(level=logging.DEBUG,
                    format='[%(levelname)s] (%(threadName)-10s) %(message)s',
                    )
def worker():
    logging.debug('Starting')
    time.sleep(2)
    logging.debug('Exiting')

def my_service():
    logging.debug('Starting')
    time.sleep(3)
    logging.debug('Exiting')
t = threading.Thread(name='my_service', target=my_service)
w = threading.Thread(name='worker', target=worker)
w2 = threading.Thread(target=worker) # use default name
w.start()
w2.start()
t.start()
logging is also thread-safe, so messages from different threads are kept distinct in the output.
$ python threading_names_log.py
Daemon Threads
In computer science, a daemon is a process that runs in the background.
Python threading has a more specific meaning for daemon. A daemon thread will shut down
immediately when the program exits. One way to think about these definitions is to consider
the daemon thread a thread that runs in the background without worrying about shutting it down.
If a program is running Threads that are not daemons, then the program will wait for those threads
to complete before it terminates. Threads that are daemons, however, are just killed wherever they
are when the program is exiting.
Consider a program whose main thread prints an "all done" message while a non-daemon thread
it started is still sleeping. When you run it, you'll notice a pause (of about 2 seconds) after
__main__ has printed its message and before the thread is finished.
This pause is Python waiting for the non-daemonic thread to complete. When your Python
program ends, part of the shutdown process is to clean up the threading routine.
If you look at the source for Python threading, you’ll see that threading._shutdown() walks through all
of the running threads and calls .join() on every one that does not have the daemon flag set.
So your program waits to exit because the thread itself is waiting in a sleep. As soon as it has
completed and printed the message, .join() will return and the program can exit.
Frequently, this behavior is what you want, but there are other options available to us. Let’s first
repeat the program with a daemon thread. You do that by changing how you construct the Thread,
adding the daemon=True flag
The default is for threads to not be daemons, so passing True turns the daemon mode on.
import threading
import time
import logging
logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)s) %(message)s',
                    )
def daemon():
    logging.debug('Starting')
    time.sleep(2)
    logging.debug('Exiting')

d = threading.Thread(name='daemon', target=daemon)
d.setDaemon(True)

def non_daemon():
    logging.debug('Starting')
    logging.debug('Exiting')
t = threading.Thread(name='non-daemon', target=non_daemon)
d.start()
t.start()
Notice that the output does not include the "Exiting" message from the daemon thread, since all
of the non-daemon threads (including the main thread) exit before the daemon thread wakes up
from its two second sleep.
$ python threading_daemon.py
(daemon ) Starting
(non-daemon) Starting
(non-daemon) Exiting
join() a Thread
Daemon threads are handy, but what about when you want to wait for a thread to stop? What
about when you want to do that and not exit your program?
To tell one thread to wait for another thread to finish, you call .join(): the calling thread
pauses until the joined thread has completed running. It does not matter whether the joined
thread is a daemon or a regular thread; .join() waits for either kind to finish.
To wait until a daemon thread has completed its work, use the join() method.
import threading
import time
import logging
logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)s) %(message)s',
                    )
def daemon():
    logging.debug('Starting')
    time.sleep(2)
    logging.debug('Exiting')

d = threading.Thread(name='daemon', target=daemon)
d.setDaemon(True)

def non_daemon():
    logging.debug('Starting')
    logging.debug('Exiting')
t = threading.Thread(name='non-daemon', target=non_daemon)
d.start()
t.start()
d.join()
t.join()
Waiting for the daemon thread to exit using join() means it has a chance to produce
its "Exiting" message.
$ python threading_daemon_join.py
(daemon ) Starting
(non-daemon) Starting
(non-daemon) Exiting
(daemon ) Exiting
Frequently, you’ll want to start a number of threads and have them do interesting work. Let’s
start by looking at the harder way of doing that, and then you’ll move on to an easier method.
Using a ThreadPoolExecutor
There’s an easier way to start up a group of threads than the one you saw above. It’s called
a ThreadPoolExecutor, and it’s part of the standard library in concurrent.futures (as of Python 3.2).
The easiest way to create it is as a context manager, using the with statement to manage the
creation and destruction of the pool.
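A hedged sketch of that pattern (function and worker count are illustrative):

```python
import concurrent.futures

def square(n):
    return n * n

# The with statement creates the pool and joins all workers on exit
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(square, range(5)))

print(results)
```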
Race Conditions
Before you move on to some of the other features tucked away in Python threading, let’s talk a bit
about one of the more difficult issues you’ll run into when writing threaded programs: race
conditions.
Once you’ve seen what a race condition is and looked at one happening, you’ll move on to some
of the primitives provided by the standard library to prevent race conditions from happening.
Race conditions can occur when two or more threads access a shared piece of data or resource. In
this example, you’re going to create a large race condition that happens every time, but be aware
that most race conditions are not this obvious. Frequently, they only occur rarely, and they can
produce confusing results. As you can imagine, this makes them quite difficult to debug.
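A sketch of such a race on an unsynchronized shared counter (the iteration counts are arbitrary; the lost updates are not guaranteed on every run, which is exactly what makes these bugs hard to debug):

```python
import concurrent.futures

counter = 0

def worker():
    global counter
    for _ in range(100000):
        # Read-modify-write is not atomic: two threads can read the same
        # value and both write back value + 1, losing one increment.
        local = counter
        local += 1
        counter = local

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    for _ in range(2):
        executor.submit(worker)

print(counter)  # frequently less than the expected 200000
```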
The queue module (named Queue in Python 2) provides a FIFO implementation suitable for multi-threaded programming. It can be used to pass messages or other data between producer and consumer threads safely. Locking is handled for the caller, so it is simple to have as many threads as you want working with the same Queue instance. A queue's size (number of elements) may be restricted to throttle memory usage or processing.
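The producer/consumer pattern described above can be sketched as follows (the sentinel value and queue size are illustrative choices):

```python
import queue
import threading

def producer(q):
    for i in range(5):
        q.put(i)        # blocks if the queue is full
    q.put(None)         # sentinel: tells the consumer to stop

def consumer(q, results):
    while True:
        item = q.get()  # blocks until an item is available
        if item is None:
            break
        results.append(item)

q = queue.Queue(maxsize=2)  # small size throttles the producer
results = []
p = threading.Thread(target=producer, args=(q,))
c = threading.Thread(target=consumer, args=(q, results))
p.start()
c.start()
p.join()
c.join()
print(results)  # [0, 1, 2, 3, 4]
```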
Basic FIFO Queue
The Queue class implements a basic first-in, first-out container. Elements are added to one “end”
of the sequence using put(), and removed from the other end using get().
import queue

q = queue.Queue()

for i in range(5):
    q.put(i)

while not q.empty():
    print(q.get())
This example uses a single thread to illustrate that elements are removed from the queue in the
same order they are inserted.
$ python Queue_fifo.py
0
1
2
3
4
LIFO Queue
In contrast to the standard FIFO implementation of Queue, the LifoQueue uses last-in, first-out
ordering (normally associated with a stack data structure).
import queue

q = queue.LifoQueue()

for i in range(5):
    q.put(i)

while not q.empty():
    print(q.get())
The item most recently put() into the queue is removed by get().
$ python Queue_lifo.py
4
3
2
1
0
Program Exits
The functions quit(), exit(), and sys.exit() have almost the same functionality: they raise the SystemExit exception, which makes the Python interpreter exit without printing a stack traceback. We can catch the exception to intercept early exits and perform cleanup activities; if uncaught, the interpreter exits as usual. os._exit(), by contrast, terminates the process immediately, without raising SystemExit or running cleanup handlers.
When we run a program in Python, we simply execute all the code in file, from top to bottom.
Scripts normally exit when the interpreter reaches the end of the file, but we may also call for the
program to exit explicitly with the built-in exit functions.
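That sys.exit() merely raises a catchable exception can be seen in a short sketch:

```python
import sys

try:
    sys.exit(2)        # raises SystemExit(2)
except SystemExit as e:
    code = e.code      # the exit status is carried on the exception
    print("cleaning up before exit, code:", code)
```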
1. The quit function is used to raise the SystemExit exception and it gives you a message:
>>> print(quit)
Use quit() or Ctrl-Z plus Return to exit
>>>
This is intended for beginners trying to learn Python, but remember that you must not use the quit function in production code.
2. The exit function is essentially a synonym for the quit function. Like quit, it exists to make Python more user-friendly in the interactive interpreter, and it too displays a message when printed:
os._exit():
Exits the process immediately, without calling cleanup handlers or flushing stdio buffers.
exit(0):
A clean exit without any errors or problems.
exit(1):
An exit indicating that some issue, error, or problem occurred.
sys.exit():
Raises SystemExit; the preferred way for a program (as opposed to the interactive interpreter) to request termination, and it can be caught to perform cleanup.
quit():
Raises SystemExit; a helper added for interactive use that should not appear in production code.
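The different exit codes can be observed by running child interpreters and inspecting their return codes (a sketch using the standard subprocess module, available since Python 3.5):

```python
import subprocess
import sys

# Child interpreter that exits cleanly
ok = subprocess.run([sys.executable, '-c', 'import sys; sys.exit(0)'])
# Child interpreter that signals an error
bad = subprocess.run([sys.executable, '-c', 'import sys; sys.exit(1)'])

print(ok.returncode)   # 0
print(bad.returncode)  # 1
```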
Interfaces in Python are handled differently than in most other languages, and they can vary in
their design complexity.
At a high level, an interface acts as a blueprint for designing classes. Like classes, interfaces
define methods. Unlike classes, these methods are abstract. An abstract method is one that the
interface simply defines. It doesn’t implement the methods. This is done by classes, which
then implement the interface and give concrete meaning to the interface’s abstract methods.
There are two ways in Python to create and implement an interface:
Informal Interfaces
Formal Interfaces
1. Informal Interfaces
An informal interface in Python is simply a class that defines methods which can be overridden, but without any enforcement.
Informal interfaces are also called protocols, and the practice is known as duck typing: we call a method that we expect an object to have, instead of checking the object's type first.
An informal interface is termed a protocol because it cannot be formally enforced; it is mostly described by convention and in documentation.
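A short sketch of duck typing (the classes and method name here are invented for illustration):

```python
class Duck:
    def speak(self):
        return "Quack"

class Person:
    def speak(self):
        return "Hello"

def greet(thing):
    # Duck typing: no type check, we simply call the method we expect
    return thing.speak()

print(greet(Duck()))    # Quack
print(greet(Person()))  # Hello
```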
2. Formal Interfaces
A formal interface is an interface that is formally enforced. In some situations protocols, or duck typing, create confusion: consider two classes FourWheelVehicle and TwoWheelVehicle that both have a method SpeedUp(). Objects of both classes can speed up, but they are not the same kind of object, even though both classes expose the same informal interface. To resolve this confusion we can use a formal interface, created with ABCs (Abstract Base Classes).
An ABC acts as an interface: a base class defined as abstract, containing some abstract methods. Any class that derives from such a base class is forced to implement all of those abstract methods.
Note that the interface cannot be instantiated, which means we cannot create an object of the interface itself. Instead we instantiate a concrete subclass, and we say that the object implements the interface. We can use issubclass() and isinstance() to confirm whether a class or object implements a particular interface.
Program: Interface having two abstract methods and one sub class
from abc import ABC, abstractmethod

class Bank(ABC):
    @abstractmethod
    def balance_check(self):
        pass

    @abstractmethod
    def interest(self):
        pass

class SBI(Bank):
    def balance_check(self):
        print("Balance is 100 rupees")

    def interest(self):
        print("SBI interest is 5 rupees")

s = SBI()
s.balance_check()
s.interest()

Output:
Balance is 100 rupees
SBI interest is 5 rupees
We can also register a class as a virtual subclass of an ABC. In that case, even if that class
doesn’t subclass our ABC, it will still be treated as a subclass of the ABC (and thus accepted to
have implemented the interface). An example demonstrates this better:
from abc import ABC, abstractmethod

class Bird(ABC):
    @abstractmethod
    def fly(self):
        pass

@Bird.register
class Robin:
    pass

r = Robin()
And then:
>>> issubclass(Robin, Bird)
True
>>> isinstance(r, Bird)
True
>>>
In this case, even if Robin does not subclass our ABC or define the abstract method, we
can register it as a Bird. issubclass and isinstance behavior can be overloaded by adding two
relevant magic methods.
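The two magic methods in question are __instancecheck__ and __subclasscheck__, defined on a metaclass; with an ABC, the simpler __subclasshook__ classmethod achieves the same effect. A sketch (the Speaker and Dog names are invented for illustration):

```python
from abc import ABC, abstractmethod

class Speaker(ABC):
    @abstractmethod
    def speak(self):
        pass

    @classmethod
    def __subclasshook__(cls, subclass):
        # Any class that provides a callable speak() counts as a Speaker
        return callable(getattr(subclass, 'speak', None))

class Dog:
    def speak(self):
        return "Woof"

print(issubclass(Dog, Speaker))    # True
print(isinstance(Dog(), Speaker))  # True
```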
Binary files, tree walkers:
Trees are non-linear data structures that represent nodes connected by edges. Each tree consists
of a root node as the Parent node, and the left node and right node as Child nodes.
Binary tree
A tree whose elements have at most two children is called a binary tree. The examples below use a binary search tree, a binary tree with an ordering rule: a node's left child must have a value less than its parent's value, and the node's right child must have a value greater than its parent's value.
(Diagram: a binary search tree with root 27, children 14 and 35, and further descendants.)
Implementation
Here we have created a node class and assigned a value to the node.
# node class
class Node:
def __init__(self, data):
# left child
self.left = None
# right child
self.right = None
# node's value
self.data = data
# print function
def PrintTree(self):
print(self.data)
root = Node(27)
root.PrintTree()
The above code creates node 27 as the root node.
Insertion
The insert method compares the value of the node to the parent node and decides whether to add
it as a left node or right node.
Remember: if the new value is greater than the parent node's value, it is inserted as a right child; otherwise, it is inserted as a left child.
Finally, the PrintTree method is used to print the tree.
class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    def insert(self, data):
        # Compare the new value with the parent node
        if self.data:
            if data < self.data:
                if self.left is None:
                    self.left = Node(data)
                else:
                    self.left.insert(data)
            elif data > self.data:
                if self.right is None:
                    self.right = Node(data)
                else:
                    self.right.insert(data)
        else:
            self.data = data

    # Print the tree (in-order traversal)
    def PrintTree(self):
        if self.left:
            self.left.PrintTree()
        print(self.data)
        if self.right:
            self.right.PrintTree()

root = Node(27)
root.insert(14)
root.insert(35)
root.PrintTree()
The above code creates the root node 27, with left child 14 and right child 35.
Searching
To search for a value in the tree, we compare it with each node's value and descend to the left or right child accordingly.
class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    # Insert method to create nodes
    def insert(self, data):
        if self.data:
            if data < self.data:
                if self.left is None:
                    self.left = Node(data)
                else:
                    self.left.insert(data)
            elif data > self.data:
                if self.right is None:
                    self.right = Node(data)
                else:
                    self.right.insert(data)
        else:
            self.data = data

    # findval method to compare the value with nodes
    def findval(self, lkpval):
        if lkpval < self.data:
            if self.left is None:
                return str(lkpval) + " is not Found"
            return self.left.findval(lkpval)
        elif lkpval > self.data:
            if self.right is None:
                return str(lkpval) + " is not Found"
            return self.right.findval(lkpval)
        else:
            return str(self.data) + " is Found"
Parallel processing increases the number of tasks your program can run at once, which reduces overall processing time and helps handle large-scale problems.
In this section we will cover the following topics:
Introduction to parallel processing
Multi Processing Python library for parallel processing
IPython parallel framework
Introduction to parallel processing
For parallelism, it is important to divide the problem into sub-units that do not depend on other sub-units (or depend on them as little as possible). A problem whose sub-units are totally independent of one another is called embarrassingly parallel.
For example, consider an element-wise operation on an array: each operation needs to be aware only of the particular element it is handling at the moment.
In another scenario, the sub-units of the problem have to share some data to perform their operations. This results in a performance penalty because of the communication cost.
Distributed memory
In distributed memory, each process is totally separated and has its own memory space. In this
scenario, communication is handled explicitly between the processes. Since the communication
happens through a network interface, it is costlier compared to shared memory.
Threads are one way to achieve parallelism with shared memory: they are independent sub-tasks that originate from a process and share its memory. However, due to the Global Interpreter Lock (GIL), a mechanism by which the Python interpreter allows only one thread to execute Python bytecode at a time, threads cannot be used to speed up CPU-bound Python code. The GIL limitation can be avoided entirely by using processes instead of threads. Processes have some disadvantages, such as inter-process communication being less efficient than shared memory, but the model is more flexible and explicit.
Multiprocessing for parallel processing
Using the standard multiprocessing module, we can efficiently parallelize simple tasks by
creating child processes. This module provides an easy-to-use interface and contains a set of
utilities to handle task submission and synchronization.
Process and Pool Class
Process
By subclassing multiprocessing.Process, you can create a process that runs independently. By extending the __init__ method you can initialize resources, and by overriding the Process.run() method you can write the code for the subprocess. The code below shows how to create a process which prints its assigned id:
To spawn the process, we need to initialize our Process object and invoke Process.start() method.
Here Process.start() will create a new process and will invoke the Process.run() method.
The code after p.start() executes immediately, before process p has completed its task. To wait for the task to complete, you can use Process.join().
import multiprocessing
import time

class Process(multiprocessing.Process):
    def __init__(self, id):
        super(Process, self).__init__()
        self.id = id

    def run(self):
        time.sleep(1)
        print("I'm the process with id: {}".format(self.id))

if __name__ == '__main__':
    p = Process(0)
    p.start()
    p.join()
    p = Process(1)
    p.start()
    p.join()
Output:
I'm the process with id: 0
I'm the process with id: 1
Pool class
The Pool class can be used for the parallel execution of a function on different input data. The multiprocessing.Pool() class spawns a set of processes called workers; tasks can be submitted with the methods apply/apply_async and map/map_async. For parallel mapping, first initialize a multiprocessing.Pool() object. Its first argument is the number of workers; if it is not given, that number defaults to the number of cores in the system.
Let us see an example. We will pass a function that computes the square of a number; using Pool.map() you map the function over the list, passing the function and the list of inputs as arguments:
import multiprocessing

def square(x):
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    inputs = [0, 1, 2, 3, 4]
    outputs = pool.map(square, inputs)
    print("Input: {}".format(inputs))
    print("Output: {}".format(outputs))
Output:
Input: [0, 1, 2, 3, 4]
Output: [0, 1, 4, 9, 16]
When we use the normal map method, execution of the program blocks until all the workers have completed the task. With map_async(), an AsyncResult object is returned immediately without blocking the main program, and the task is done in the background. The result can be retrieved with the AsyncResult.get() method at any time, as shown below:
import multiprocessing

def square(x):
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    inputs = [0, 1, 2, 3, 4]
    outputs_async = pool.map_async(square, inputs)
    outputs = outputs_async.get()
    print("Output: {}".format(outputs))
Output:
Output: [0, 1, 4, 9, 16]
Pool.apply_async assigns a task consisting of a single function to one of the workers. It takes the
function and its arguments and returns an AsyncResult object.
import multiprocessing

def square(x):
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    result_async = [pool.apply_async(square, args=(i,)) for i in range(10)]
    results = [r.get() for r in result_async]
    print("Output: {}".format(results))
Output:
Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
As a final step, you can execute commands by using the DirectView.execute method:
dview.execute('a = 1')
The above command will be executed individually by each engine. You can retrieve the data with the DirectView.pull method and send data with the DirectView.push method; pull returns an AsyncResult object whose value is obtained with get:
dview.pull('a').get()
dview.push({'a': 2})
Task-based interface
The task-based interface provides a smart way to handle computing tasks. From the user's point of view it is a less flexible interface, but it is efficient at load balancing across the engines and can resubmit failed jobs, thereby increasing performance.
The LoadBalancedView class provides the task-based interface and is obtained with the load_balanced_view method:
from IPython.parallel import Client
rc = Client()
tview = rc.load_balanced_view()
Using the map and apply methods we can run tasks. With a LoadBalancedView, task assignment depends on how much load is present on each engine at the time, which ensures that all engines work without downtime.