
Physical Chemistry 3 Spring 2024, SNU

2.5 Modules and libraries


Modules are files containing functions and classes that can be imported by other Python programs; a collection of modules is referred to as a library. You can import modules and libraries created by other users, or write your own for use in other programs. One of Python's strengths lies in its extensive library ecosystem, comprising both built-in and third-party resources, which offers a wide range of versatile functionality tailored to various needs. These libraries significantly enhance Python's capability for scientific calculations, providing ready-made solutions for common tasks such as numerical computation, data analysis, and visualization. Whether you're performing basic arithmetic or running complex scientific simulations, Python's rich library support ensures that you have the tools necessary to tackle diverse computational challenges effectively.

Modules and libraries can be imported using the import statement. In the upcoming sections, we will introduce several frequently used Python modules. It's important to note that we won't cover every function and class within each library; for detailed information, refer to the library documentation. Additionally, always check the version of the library you are using. Libraries often have dependencies, which can be quite cumbersome, and inconsistent library versions can lead to numerous problems.

Proficiency in Python coding often involves the ability to search for and utilize libraries that provide the
necessary functions and classes. Becoming adept at effectively navigating library documentation and
leveraging existing resources is a key skill for Python programmers.

In this section, we will maintain consistency by using the following versions of Python and libraries.

Code 2.77: Python and libraries’ versions.


1 import sys
2 # os, time, datetime modules are built-in modules
3 print('Current Python version:\n', sys.version)
4
5 # Un-comment this line if you haven't installed numpy
6 # Comment/Un-comment with ctrl + / (Windows) and cmd + / (Mac)
7 # !pip install numpy
8 import numpy as np
9 print('Current numpy version:', np.__version__)
10
11 # Un-comment this line if you haven't installed numba
12 # !pip install numba
13 import numba
14 print('Current numba version:', numba.__version__)
15
16 # Un-comment this line if you haven't installed joblib
17 # !pip install joblib
18 import joblib
19 print('Current joblib version:', joblib.__version__)

Output 2.77

Current Python version:


3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) [Clang 16.0.6 ]
Current numpy version: 1.26.4
Current numba version: 0.59.0
Current joblib version: 1.3.2


2.5.1 os and subprocess modules


❗ For the documentation of the os and subprocess modules in Python 3.11.8, see https://docs.python.org/3.11/library/os.html and https://docs.python.org/3.11/library/subprocess.html, respectively.

The os module provides access to operating-system-dependent functionality. The os.getcwd() function returns the current working directory as a string; this is equivalent to the pwd command in a bash shell.

Code 2.78: os.getcwd() function.


1 import os
2
3 os.getcwd()

Output 2.78

'/Users/(...)/Downloads/python_tutorial'

Directory management

os.mkdir() (equivalent to the mkdir command in bash ) creates a new directory, while os.listdir() lists all directories and files under the current directory. Note that the built-in sorted() function sorts the list given as its argument.

Code 2.79: os.mkdir() and os.listdir() functions.


1 os.mkdir('./dir') # Equiv. to 'mkdir dir'
2 sorted(os.listdir()) # sorted() sorts os.listdir()

Output 2.79

['README.md',
'dir',
'file.txt',
'tutorial1.ipynb',
'tutorial2.ipynb',
'tutorial3.ipynb',
'tutorial4.ipynb',
'tutorial5.ipynb',
'tutorial6.ipynb',
'tutorial7.ipynb']

Note that os.mkdir() raises FileExistsError if you try to create an already existing directory.

Code 2.80: FileExistsError


1 os.mkdir('./dir') # Error

Output 2.80
---------------------------------------------------------------------------
FileExistsError Traceback (most recent call last)
Cell In[4], line 1
----> 1 os.mkdir('./dir')

FileExistsError: [Errno 17] File exists: './dir'


Compared to os.mkdir() , os.makedirs() can avoid this error via the exist_ok option. In addition, os.makedirs() creates directories recursively: even if the parent directories do not exist, os.makedirs() automatically creates them until the destination directory is reached (see the sketch after Code 2.81). Conversely, the os.rmdir() function, which is equivalent to the rmdir command in bash , removes an (empty) directory.

Code 2.81: os.makedirs() and os.rmdir() functions.


1 os.makedirs('./dir', exist_ok = True) # No error
2 os.rmdir('./dir') # Equiv. to 'rmdir dir' (the directory must be empty)
3 sorted(os.listdir())

Output 2.81

['README.md',
'file.txt',
'tutorial1.ipynb',
'tutorial2.ipynb',
'tutorial3.ipynb',
'tutorial4.ipynb',
'tutorial5.ipynb',
'tutorial6.ipynb',
'tutorial7.ipynb']
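
As noted above, os.makedirs() also creates any missing parent directories. Here is a minimal sketch of this recursive behavior (the directory names are arbitrary, and os.removedirs() is used only for cleanup):

import os

# os.makedirs() creates the missing parent directories as well
os.makedirs('./parent/child/grandchild', exist_ok = True)
print(os.path.isdir('./parent/child/grandchild'))   # True

# Clean-up: os.removedirs() removes the leaf directory and then its (now empty) parents
os.removedirs('./parent/child/grandchild')
print(os.path.exists('./parent'))                   # False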

File management

The os.rename() function (equivalent to the mv command in bash ) renames (or moves) a file. To check whether a path exists, use the os.path.exists() function.

Code 2.82: os.rename() and os.path.exists() functions.


1 os.rename('./file.txt', './file2.txt') # Equiv. to 'mv file.txt file2.txt'
2 print(os.path.exists('./file.txt'))
3 print(os.path.exists('./file2.txt'))

Output 2.82

False
True

To remove a file, use os.remove() function (equivalent to the rm command in bash ).

Code 2.83: os.remove() function.


1 os.remove('./file2.txt') # Equiv. to 'rm file2.txt'
2 print(os.path.exists('./file2.txt'))

Output 2.83

False

os.path.isdir() and os.path.isfile() check whether the given directory or file exists. os.path.join() is a useful tool for managing paths in Python: it joins path components into a single path. The following Code 2.84 shows joining a base path and a file path inside that directory.

❗ In Linux, . means the current directory and .. means the parent directory. So ./dir means the directory (or file) dir inside the current directory.


Code 2.84: Methods of os.path module.


1 print(os.path.isdir('./dir'))
2 print(os.path.isfile('./file.txt'))
3
4 pwd = os.getcwd()
5 file = './file.txt'
6 PATH = os.path.join(pwd, file)
7 print(PATH)
8
9 print(os.path.split(PATH))
10 print(os.path.splitext(PATH))

Output 2.84

False
False
/Users/(...)/Downloads/python_tutorial/./file.txt
('/Users/(...)/Downloads/python_tutorial/.', 'file.txt')
('/Users/(...)/Downloads/python_tutorial/./file', '.txt')

os.path.split() and os.path.splitext() split a path string: os.path.split() separates the last component (the file name) from the rest of the path, while os.path.splitext() separates the file extension.

subprocess module

The subprocess module in Python spawns and manages processes and their standard input/output/error pipes. It replaces some older functions in the os module. Here we introduce only one class from the subprocess module: subprocess.Popen(). Code 2.85 executes the ls -la command in bash .

Code 2.85: subprocess.Popen() class usage.


1 import subprocess
2
3 # Valid for UNIX operating systems
4 proc = subprocess.Popen(['ls', '-la'], stdout = subprocess.PIPE, stderr = subprocess.PIPE)
5 out = proc.communicate()
6 print(out)

Output 2.85

(b'total 6400\ndrwxr-xr-x@ 11 kadryjh1724 staff 352 Feb 19 19:56 ...


(output truncated)

You can execute bash shell commands with the subprocess.Popen() class: create a process and "communicate" with it using the communicate() method. The standard output ( stdout ) and standard error ( stderr ) are returned as a tuple into the variable out in line 5.
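
As a small extension of Code 2.85 (a minimal sketch, valid on UNIX-like systems), the captured bytes can be decoded into ordinary strings and the exit status of the command can be inspected:

import subprocess

proc = subprocess.Popen(['ls', '-la'], stdout = subprocess.PIPE, stderr = subprocess.PIPE)
stdout, stderr = proc.communicate()     # both are bytes objects
print(stdout.decode())                  # decode the bytes into a readable string
print('Exit code:', proc.returncode)    # 0 means the command succeeded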


2.5.2 time and datetime modules


The time and datetime modules are standard Python libraries for dealing with time information and time calculations. For the official documentation, see https://docs.python.org/3.11/library/time.html and https://docs.python.org/3.11/library/datetime.html.

time.time() function

The time.time() function returns the current time as the number of seconds since the "epoch". On most systems, the epoch is set to UTC 1970/01/01 00:00:00.

Code 2.86: time.time() function usage.


1 import time
2
3 now = time.time()
4 print(now)
5 print(time.strftime('%Y/%m/%d %H:%M:%S', time.localtime(now)))

Output 2.86

1708780414.4803193
2024/02/24 22:13:34

The time.strftime() function converts time information into a readable format. It takes a format string and a time tuple as arguments. Note that the timestamp should first be converted into your local time (such as KST in our case) with time.localtime() before being passed to the function.

Measuring elapsed time

You can measure elapsed time by executing time.time() before and after your code. However, Python
provides more functionalities, such as time.perf_counter() and time.process_time() .

Code 2.87: Measuring execution time.


1 start_time = time.perf_counter()
2 start_proc_time = time.process_time()
3
4 sum = 0
5 for i in range(10000000):
6 sum += i
7 time.sleep(5)
8
9 end_time = time.perf_counter()
10 end_proc_time = time.process_time()
11
12 print(f'Elapsed time (real): {end_time - start_time}')
13 print(f'Elapsed time (cpu): {end_proc_time - start_proc_time}')

Output 2.87

Elapsed time (real): 6.273692643968388


Elapsed time (cpu): 1.275854185


The time.perf_counter() (performance counter) function measures the real (wall-clock) elapsed time, while time.process_time() returns CPU time. In Code 2.87, the time measured by time.perf_counter() includes the sleep introduced by the time.sleep() function, whereas time.process_time() does not.

datetime.datetime class

The datetime.datetime class provides convenient processing of time information. The current time can be retrieved with the datetime.now() function (note that the default string form of a datetime.datetime object is already quite readable), and it can be converted into any other format you want with the strftime() method.

Code 2.88: The datetime module.


1 from datetime import datetime # From the module datetime, import datetime.datetime
2
3 now = datetime.now()
4 print(f'now has {type(now)} and its value is {now}')
5 print(now.strftime('%Y/%m/%d %H:%M:%S'))

Output 2.88

now has <class 'datetime.datetime'> and its value is 2024-02-24 22:13:47.179442


2024/02/24 22:13:47

You can compute the time difference by subtracting two datetime.datetime objects (recall magic methods). This operation results in a datetime.timedelta object.

It's important to note that a datetime.timedelta object is distinct from a datetime.datetime object, as the two classes possess different attributes and methods (a small sketch illustrating this follows Code 2.89).

Code 2.89: The datetime.timedelta object.


1 future = datetime.now()
2 dt = future - now
3 print(f'dt has {type(dt)} and its value is {dt}')

Output 2.89

dt has <class 'datetime.timedelta'> and its value is 0:00:02.103660
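
To illustrate that a timedelta object has its own attributes and methods, here is a minimal sketch (the chosen duration is arbitrary):

from datetime import timedelta

dt = timedelta(days = 2, hours = 3, minutes = 30)
print(dt.days)              # 2
print(dt.seconds)           # 12600 (3 h 30 min, excluding the whole days)
print(dt.total_seconds())   # 185400.0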

The following Code 2.90 shows another example: adding a timedelta object to a datetime object.

Code 2.90: Adding a timedelta object to the datetime object.


1 from datetime import timedelta
2
3 now = datetime.now()
4 future = now + timedelta(days = 30, hours = 5, minutes = 17, seconds = 23)
5 print(now)
6 print(future)

Output 2.90

2024-02-24 22:13:55.997377
2024-03-26 03:31:18.997377


Parsing and formatting time strings

The datetime.strptime() (string parse time) method parses a time from a string, while the datetime.strftime() (string format time) method formats time data into a string.

Code 2.91: datetime.strptime() and datetime.strftime() functions.


1 date_str = '2024-03-05 15:00:00'
2 first_class = datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
3 print(first_class)
4 print(first_class.strftime('Year %Y, Month %m, Day %d |%H|:|%M|:|%S|'))

Output 2.91

2024-03-05 15:00:00
Year 2024, Month 03, Day 05 |15|:|00|:|00|

Decorators

A decorator is a function that decorates another function by taking it as an argument. By decorating the function, new functionality can be added to it. Decorators can be used in various situations, but at an introductory level, one of the easiest applications is measuring execution time. In Code 2.92, the sum() function computes the sum of the integers below the input number. The time_wrapper() function, which takes another function as an argument, wraps the input function with calls to datetime.now() and prints the elapsed time.

Code 2.92: The concept of decorators.


1 def time_wrapper(func):
2
3 def wrapper(*args, **kwargs):
4
5 start = datetime.now()
6 ret = func(*args, **kwargs)
7 end = datetime.now()
8 print(f'Time elapsed: {(end - start).total_seconds()} s')
9 return ret
10
11 return wrapper
12
13 def sum(N):
14
15 ret = 0
16 for i in range(N):
17 ret += i
18
19 return ret
20
21 fn = time_wrapper(sum)
22
23 print(fn(1000000))

Output 2.92

Time elapsed: 0.082543 s


499999500000


Python provides a simpler syntax for decorating a function: write @ followed by the decorator's name directly above the function definition. The following Code 2.93 is equivalent to Code 2.92.

Code 2.93: Decorator syntax.


1 @time_wrapper
2 def sum(N):
3
4 ret = 0
5 for i in range(N):
6 ret += i
7
8 return ret
9
10 print(sum(1000000))

Output 2.93

Time elapsed: 0.084964 s


499999500000

2.5.3 numpy module


numpy is a fundamental package for fast scientific computing in Python. It supports multi-dimensional
arrays, along with a collection of mathematical functions designed to operate efficiently on these arrays.

One of the major drawbacks of pure Python is its speed: despite its convenience, Python is known to be slow. The high performance of numpy can be attributed to its C and C++ backends. While basic Python can be slow due to its "interpreted" nature, numpy 's core functionality is implemented primarily in C and C++, allowing it to execute array operations much faster than equivalent Python code.
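
To get a feeling for this speed difference, here is a minimal sketch comparing a pure-Python loop with the vectorized np.sum(); the exact timings are machine-dependent:

import time
import numpy as np

data = list(range(10_000_000))
arr = np.arange(10_000_000)

start = time.perf_counter()
total = 0
for x in data:                # pure-Python loop over a list
    total += x
print('Python loop:', time.perf_counter() - start, 's')

start = time.perf_counter()
total = arr.sum()             # vectorized summation in compiled C code
print('numpy sum  :', time.perf_counter() - start, 's')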

However, although numpy is fast, its operations are executed on a single CPU, so many linear algebra operations that would be faster on GPUs can still be slow. To address this issue, packages like jax have emerged.

numpy provides plenty of functionality beyond what is introduced here. If you require additional features, please refer to numpy 's official documentation at https://numpy.org/doc/stable/.

numpy arrays

A Python list can be converted into a one-dimensional numpy array with np.array(). You can assign the data type with the dtype keyword argument.

Code 2.94: One-dimensional numpy array.


1 import numpy as np
2
3 a = np.array([1, 2, 3], dtype = np.int64)
4 print(a.shape)

Output 2.94

(3,)

numpy arrays have a shape attribute.


Multidimensional arrays can be indexed and sliced like lists.

Code 2.95: Two-dimensional numpy array.


1 b = np.array([[4., 5., 6.], [7., 8., 9.]], dtype = np.float32)
2 print(b[0], b[1])
3 print(b[1][2])
4 print(b.shape)

Output 2.95

[4. 5. 6.] [7. 8. 9.]


9.0
(2, 3)

In Code 2.95, an array with shape (2, 3) was introduced. It's important to note that the first dimension, 2, indicates that the outermost pair of brackets [] contains two elements. Similarly, the second dimension, 3, signifies that each inner pair of brackets contains three elements. This rule applies consistently to higher-dimensional arrays as well.

Code 2.96: Multi-dimensional numpy array.


1 c = np.array([[[1, -1, 1], [3, 4, 7]],
2 [[5, 1, 1], [-2, 0, 3]],
3 [[0, -2, -4], [3, 1, 3]]])
4 print(c.shape)
5
6 for i in range(2):
7
8 print(f'{i}:', '-' * 10)
9 print('c[i, :, 0]:', c[i, :, 0])
10 print('c[:, i:, 0]:', c[:, i:, 0])
11 print('c[0, i, :]:', c[0, i, :])
12 print('\n')

Output 2.96

(3, 2, 3)
0: ----------
c[i, :, 0]: [1 3]
c[:, i:, 0]: [[ 1 3]
[ 5 -2]
[ 0 3]]
c[0, i, :]: [ 1 -1 1]

1: ----------
c[i, :, 0]: [ 5 -2]
c[:, i:, 0]: [[ 3]
[-2]
[ 3]]
c[0, i, :]: [3 4 7]


Functions for creating certain shapes of arrays

Their usage is straightforward.

Code 2.97: np.zeros(), np.ones() and np.arange() functions.


1 a = np.zeros((3, 2))
2 b = np.ones(5)
3 c = np.arange(1, 11, 1)
4
5 print(a, b, c)

Output 2.97

[[0. 0.]
[0. 0.]
[0. 0.]] [1. 1. 1. 1. 1.] [ 1 2 3 4 5 6 7 8 9 10]

Reshaping arrays

numpy provides array reshaping functions, which involve rearranging the dimensions of an array. For ex-
ample, in the following Code 2.98, an array with shape (4, 2, 2) is created using the np.random.randn
function. The np.random module provides various random number generators, and the randn function
generates random numbers following a normal (or Gaussian) distribution with mean 0 and standard de-
viation 1.

Code 2.98: Reshaping of arrays.


1 x = np.random.randn(4, 2, 2)
2 print(x)
3 print(x.reshape(8, 2))
4 # print(x.reshape(-1, 2))

Output 2.98

[[[-0.06248479 -2.79681374]
[ 1.05793301 -0.29597319]]

[[-0.50820033 0.3952909 ]
[ 0.71850602 -1.03609737]]

[[ 0.18801837 0.59544032]
[ 0.85238323 -0.03165555]]

[[-0.69334143 -1.1885556 ]
[-2.82748787 -0.43549003]]]
[[-0.06248479 -2.79681374]
[ 1.05793301 -0.29597319]
[-0.50820033 0.3952909 ]
[ 0.71850602 -1.03609737]
[ 0.18801837 0.59544032]
[ 0.85238323 -0.03165555]
[-0.69334143 -1.1885556 ]
[-2.82748787 -0.43549003]]


Using the reshape function, one can rearrange the elements to match a new shape. It’s important to
ensure that the reshaped dimensions are compatible with the original shape. In some cases, you can use
a wildcard -1 as an input to the reshape function. If -1 is used, numpy automatically determines the
dimension corresponding to -1 based on the other dimensions.
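
For example, here is a minimal sketch of the -1 wildcard (using the same shape as in Code 2.98):

import numpy as np

x = np.random.randn(4, 2, 2)
print(x.reshape(-1, 2).shape)   # numpy infers the first dimension: (8, 2)
print(x.reshape(2, -1).shape)   # here the second dimension is inferred: (2, 8)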

Adding and removing additional dimensions are commonly referred to as unsqueezing and squeezing. In
Code 2.99, we add one additional dimension to the one-dimensional array.

Code 2.99: Expanding dimensions.


1 y = np.random.randn(10)
2 print(y)
3 print(np.expand_dims(y, axis = 0))
4 print(np.expand_dims(y, axis = 0).shape)
5
6 # import torch
7 # y = torch.randn(10)
8 # print(y.unsqueeze(0))

Output 2.99

[ 1.74403849 -0.05739269 (...) -1.66095744]


[[ 1.74403849 -0.05739269 (...) -1.66095744]]
(1, 10)

Note that the shape changed from (10,) to (1, 10) when the additional dimension was added. You can change where the new dimension is inserted using the axis argument: if you set axis=1 , the new dimension is added along the second axis, resulting in a shape of (10, 1).
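
For instance, a minimal sketch:

import numpy as np

y = np.random.randn(10)
print(np.expand_dims(y, axis = 1).shape)   # (10, 1)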

Code 2.100: np.squeeze() function.


1 z = np.random.randn(4, 2, 1)
2 print(z, '\n')
3 print(np.squeeze(z), np.squeeze(z).shape)

Output 2.100

[[[-2.41010067]
[ 0.75490811]]

[[-1.62621492]
[ 0.2833764 ]]

[[ 0.11003475]
[ 1.34817212]]

[[ 0.80623919]
[ 0.3242343 ]]]

[[-2.41010067 0.75490811]
[-1.62621492 0.2833764 ]
[ 0.11003475 1.34817212]
[ 0.80623919 0.3242343 ]] (4, 2)

In contrast, np.squeeze() removes dimensions of size 1.


Broadcasting

Broadcasting is a powerful tool for performing operations between numpy arrays with different shapes.
Let’s begin with a somewhat trivial example of broadcasting.

❗ Examples and figures from this section are retrieved from:

https://numpy.org/doc/stable/user/basics.broadcasting.html

Code 2.101: Multiplying a scalar and a 1D numpy array.


1 a = np.array([1.0, 2.0, 3.0])
2 b = 2.0
3 print(a * b)

Output 2.101

[2. 4. 6.]

The two arrays a and b have different shapes: (3,) and a scalar. Nevertheless, we naturally expect that arrays of different shapes like these can indeed be multiplied. If you look at this multiplication closely, the following process happens:

Code 2.102: Multiplying a scalar and a 1D numpy array, detailed process.


1 a = np.array([1.0, 2.0, 3.0])
2 b = np.array([2.0, 2.0, 2.0])
3 print(a * b)

Output 2.102

[2. 4. 6.]

In Code 2.102, the scalar b was extended (or repeated) to match the shape of the array a; the two arrays can then be multiplied elementwise. In a similar way, you can define operations between higher-dimensional arrays of different shapes when their shapes are compatible.

Figure 2.2: Broadcasting in Code 2.101 and Code 2.102.

numpy compares the dimensions of the two arrays elementwise. Two dimensions are compatible when:

» they are the same (trivial), or

» one of them is 1 (stretchable),

which also applies to the example in Code 2.101 and Code 2.102.


Examples of compatible and incompatible arrays

» A (shape (5, 4)) and B (shape (1,), or a scalar): A*B (shape (5, 4))

» A (shape (256, 256, 3), an RGB image) and B (shape (3,)): A*B (shape (256, 256, 3))

» A (shape (3,)) with B (shape (4,)), and A (shape (4, 3)) with B (shape (4,)): incompatible

One example of broadcasting

A (shape (4, 3)) is compatible with B (shape (3,)), but not with C (shape (4,)).

Figure 2.3: Compatible arrays.

Figure 2.4: Incompatible arrays.

If we unsqueeze (expand the dimensions of) C into np.expand_dims(C, axis=1) (shape (4, 1)), then A and np.expand_dims(C, axis=1) become compatible. Lastly, an array a (shape (4, 1)) and an array b (shape (3,)) are also compatible; they broadcast to shape (4, 3). A small sketch checking these cases follows Figure 2.5.

Figure 2.5: Compatible arrays, example 2.
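
The cases illustrated in Figures 2.3 to 2.5 can be checked directly; here is a minimal sketch (not part of the original listings):

import numpy as np

A = np.ones((4, 3))
B = np.arange(3)    # shape (3,): compatible with (4, 3)
C = np.arange(4)    # shape (4,): incompatible with (4, 3)

print((A + B).shape)                             # (4, 3)
print((A + np.expand_dims(C, axis = 1)).shape)   # C becomes (4, 1); the result is (4, 3)

try:
    A + C                                        # raises ValueError
except ValueError as e:
    print('Incompatible shapes:', e)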


Code 2.103: Incompatible arrays.


1 a = np.array([[1, 1, 1, 1], [2, 2, 2, 2]]) # (2, 4) array
2 b = np.array([[1, 1], [2, 2], [3, 3]]) # (3, 2) array
3
4 print(a + b)

Output 2.103
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[27], line 4
1 a = np.array([[1, 1, 1, 1], [2, 2, 2, 2]]) # (2, 4) array
2 b = np.array([[1, 1], [2, 2], [3, 3]]) # (3, 2) array
----> 4 print(a + b)

ValueError: operands could not be broadcast together with shapes (2,4) (3,2)

Code 2.104: Compatible arrays.


1 a = np.array([[1, 1, 1, 1], [2, 2, 2, 2]]) # (2, 4) array
2 b = np.array([[7], [7]]) # (2, 1) array
3
4 print(a + b)
5 print((a + b).shape)

Output 2.104

[[8 8 8 8]
[9 9 9 9]]
(2, 4)

Vectorizing with numpy

Typically, Python lists may contain data of different types. For instance, the built-in sum() function, which calculates the sum of all elements of a given list, has to check the type of each element as it is added.

Meanwhile, numpy arrays are homogeneous (all elements have the same data type); this allows compiled C code to be used for elementwise operations, which provides a definite speed-up. In Code 2.105, we first generate a numpy array with the np.linspace() function, which takes the starting point, the ending point, and the number of points as arguments.

Code 2.105: Vectorized np.sin() function.


1 a = np.linspace(0, np.pi / 2, 100)
2 b = np.sin(a)
3 print(b)
4 print(np.average(np.power(b, 2)))

Output 2.105

[0. 0.01586596 0.03172793 0.04758192 0.06342392 0.07924996


0.09505604 0.1108382 0.12659245 0.14231484 0.1580014 0.17364818
(...)
0.98982144 0.99195481 0.99383846 0.99547192 0.99685478 0.99798668
0.99886734 0.99949654 0.99987413 1. ]
0.5


The np.sin() function is a vectorized function, which gives much better performance than pure Python for loops. Note that line 4 in Code 2.105 numerically approximates

\[
\frac{2}{\pi} \int_0^{\pi/2} \sin^2 x \,\mathrm{d}x = \frac{1}{2} .
\]

The np.average() function provides averaging along a specified axis of the array.

Code 2.106: np.average() function.


1 x = np.random.randn(4, 3, 2)
2 print(np.average(x, axis = 0)) # (3, 2) array
3 print(np.average(x, axis = 1)) # (4, 2) array
4 print(np.average(x, axis = 2)) # (4, 3) array

Output 2.106

[[-0.50325378 0.73180525]
[ 0.55506121 -0.51120121]
[ 0.09804889 0.40828138]]
[[ 0.21118092 0.59686133]
[ 0.36192477 -0.25026971]
[-0.62335483 -0.48032833]
[ 0.25005756 0.9722506 ]]
[[ 0.10923951 0.54430973 0.55851413]
[ 0.04499412 -0.47082109 0.59330956]
[-0.18002283 -1.25462955 -0.22087235]
[ 0.48289215 1.26886091 0.08170919]]

Linear algebra with numpy

The numpy.linalg module provides various linear-algebra functions. Take a look at the documentation if you are interested. In this tutorial, we introduce just a few useful functions.

Matrix multiplication is done with the @ operator, not with the * operator (which performs elementwise multiplication).

Code 2.107: Matrix multiplication.


1 A = np.random.randn(4, 3)
2 x = np.random.randn(3)
3 b = np.random.randn(4)
4
5 print(A @ x + b)

Output 2.107

[ 4.56414939 -0.03387167 0.37951254 -0.56519003]
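
To see the contrast between the two operators, here is a small sketch (not part of the original listings):

import numpy as np

M = np.array([[1., 2.], [3., 4.]])
N = np.array([[0., 1.], [1., 0.]])

print(M * N)   # elementwise product:  [[0. 2.] [3. 0.]]
print(M @ N)   # matrix product:       [[2. 1.] [4. 3.]]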

Determinants, eigenvalues, singular value decompositions (SVD), and matrix inverses can be calculated easily!

Code 2.108: Linear algebra operations.


1 A = np.array([[4, 2], [3, 5]], dtype = np.int32)
2 print(np.linalg.det(A))
3 print(np.linalg.eigvals(A))
4 print(np.linalg.svd(A))
5 print(np.linalg.inv(A))


Output 2.108

14.000000000000004
[2. 7.]
SVDResult(U=array([[-0.59025263, -0.80721858],
[-0.80721858, 0.59025263]]), S=array([7.07720233, 1.97818281]), Vh=array([[-0.67578487,
-0.73709891], [-0.73709891, 0.67578487]]))
[[ 0.35714286 -0.14285714]
[-0.21428571 0.28571429]]

numpy has far more functionality than what is shown here. Do not stay inside this tutorial: explore the outside world, starting with the official documentation.

2.5.4 numba module


When used effectively, the numba module offers significantly faster computations. For more information,
refer to the official documentation of numba at https://numba.readthedocs.io/en/stable/index.html.

Code 2.109: Numba example.


1 # !pip install scipy
2 import time
3 from numba import jit
4
5 def f1():
6 sum = 0
7 for i in range(100000000):
8 sum += i
9 return sum
10
11 @jit(nopython = True) # Equivalent to @njit
12 def f2():
13 sum = 0
14 for i in range(100000000):
15 sum += i
16 return sum
17
18 start = time.time()
19 f1()
20 end = time.time()
21 print('Without numba jit:', end - start)
22
23 start = time.time()
24 f2()
25 end = time.time()
26 print('With numba jit (first compile):', end - start)
27
28 start = time.time()
29 f2()
30 end = time.time()
31 print('With numba jit:', end - start)


Output 2.109

Without numba jit: 4.764200687408447


With numba jit (first compile): 0.44391965866088867
With numba jit: 6.175041198730469e-05

numba is a compiler that generates machine-optimized code. It provides the @jit decorator, which stands for just-in-time compilation. When writing array-based or math-heavy code, decorating your function with @jit can significantly enhance performance (compare the execution times of f1() and f2() ). Upon the first function call, numba compiles the function (compare the execution times of the first and second calls of f2() ). The nopython=True argument of the @jit decorator indicates that the decorated function does not use Python objects such as lists and dictionaries, which may contain elements of different data types. numba can handle integers, floating-point numbers, strings, numpy arrays, and other fixed data types. If you include such Python objects in your decorated function, you will need to use object mode, which may not yield performance improvements.
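
As a further illustration, here is a minimal sketch (the function is only for demonstration) showing that @njit also works on explicit loops over numpy arrays:

import numpy as np
from numba import njit

@njit
def rms(x):
    # numba compiles this explicit loop over a numpy array to machine code
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * x[i]
    return (total / x.shape[0]) ** 0.5

x = np.random.randn(1_000_000)
print(rms(x))   # the first call triggers compilation; later calls are fast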

2.5.5 multiprocessing and joblib modules


This subsection is only useful if your machine has multiple processors (or CPU cores); if not, you can skip it. The multiprocessing module in Python provides a way to utilize multiple processes (not multiple threads), while the joblib package wraps the functionality of the multiprocessing module, making it more user-friendly. You can find the documentation here: https://docs.python.org/3.11/library/multiprocessing.html and https://joblib.readthedocs.io/en/stable/. We will focus on the joblib package here.

If you have more of a computer science background, reading about the Python GIL (Global Interpreter Lock) would be an interesting exercise.

The multiprocessing.cpu_count() function returns the number of available CPU cores (you can obtain similar and more detailed information about your machine with cat /proc/cpuinfo on a Linux machine).

Code 2.110: How many CPU cores does your machine have?
1 import time
2 import multiprocessing as mp
3 from joblib import Parallel, delayed
4
5 n_cores = mp.cpu_count()
6 print(n_cores)

Output 2.110

32

Assume that you have 32 independent tasks (each of which has nothing to do with the other 31) to perform. You do not need to execute them sequentially on a single CPU: if you can utilize 32 CPU cores simultaneously and assign each of them an individual task, a 32-fold speed-up can theoretically be achieved. In the following Code 2.111, we use 4 CPU cores for the parallelization.


Code 2.111: Example joblib usage.


1 import math
2
3 def parallel_function(i):
4 return math.factorial(int(math.sqrt(i ** 3)))
5
6 start = time.time()
7 for i in range(100, 1000):
8 parallel_function(i)
9 end = time.time()
10 print('Serial execution:', end - start)
11
12 start = time.time()
13 with Parallel(n_jobs = 4) as parallel:
14 parallel(delayed(parallel_function)(i) for i in range(100, 1000))
15 end = time.time()
16 print('(Embarrassingly) Parallel execution:', end - start)

Output 2.111

Serial execution: 5.801486968994141


(Embarrassingly) Parallel execution: 1.9781787395477295

If you try to parallelize a task that is too simple, distributing the tasks and copying the data takes more time than the sequential computation itself; in that case, using joblib actually worsens performance.

Code 2.112: Bad joblib usage.


1 def joblib_worsens_this(i):
2 return np.sqrt(i * i)
3
4 start = time.time()
5 for i in range(1000000):
6 joblib_worsens_this(i)
7 end = time.time()
8 print('Serial execution:', end - start)
9
10 start = time.time()
11 with Parallel(n_jobs = n_cores) as parallel:
12 parallel(delayed(joblib_worsens_this)(i) for i in range(1000000))
13 end = time.time()
14 print('(Embarrassingly) Parallel execution:', end - start)

Output 2.112

Serial execution: 1.4704248905181885


(Embarrassingly) Parallel execution: 7.743301868438721
