PChem3 Python Tutorial5
Modules and libraries can be imported using the import statement. In the upcoming sections, we will
introduce several frequently used Python modules. We won’t cover every function and class within each
library; for detailed information, refer to the library documentation. Always check the version of the
library you are using: libraries often depend on one another, and inconsistent versions can lead to
numerous problems.
Proficiency in Python coding often involves the ability to search for and utilize libraries that provide the
necessary functions and classes. Becoming adept at effectively navigating library documentation and
leveraging existing resources is a key skill for Python programmers.
In this section, we will maintain consistency by using the following versions of Python and libraries.
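A minimal sketch of how the interpreter and library versions might be checked. It assumes numpy is installed and relies on the conventional __version__ attribute that most libraries expose.

```python
import sys
import numpy as np

# Version of the running Python interpreter, e.g. "3.11.4"
python_version = sys.version.split()[0]
# Most libraries store their version in a __version__ string
numpy_version = np.__version__

print("Python:", python_version)
print("numpy :", numpy_version)
```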
Output 2.77
Physical Chemistry 3 Spring 2024, SNU
Output 2.78
'/Users/(...)/Downloads/python_tutorial'
Directory management
os.mkdir() (equivalent to mkdir command in bash ) generates a new directory, while os.listdir()
lists all directories and files under current directory. Note that built-in sorted() function sorts the list
given in the argument.
Output 2.79
['README.md',
'dir',
'file.txt',
'tutorial1.ipynb',
'tutorial2.ipynb',
'tutorial3.ipynb',
'tutorial4.ipynb',
'tutorial5.ipynb',
'tutorial6.ipynb',
'tutorial7.ipynb']
Note that os.mkdir() raises FileExistsError if you try to create an already existing directory.
Output 2.80
---------------------------------------------------------------------------
FileExistsError Traceback (most recent call last)
Cell In[4], line 1
----> 1 os.mkdir('./dir')
Compared to os.mkdir() , os.makedirs() can avoid this error through the exist_ok option. In addition,
os.makedirs() creates directories recursively: if parent directories are missing, it creates them one
by one until the destination directory is reached. Conversely, the os.rmdir() function, which is
equivalent to the rmdir command in bash , removes a directory, but only if it is empty. A non-empty
directory tree can be removed with shutil.rmtree() (like rm -r ).
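The directory functions above can be sketched as follows. The directory names are hypothetical, and the example works inside a temporary directory (via tempfile, which the text does not use) so that nothing on disk is disturbed.

```python
import os
import tempfile

# Work inside a throwaway directory that is deleted at the end
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "dir")
    os.mkdir(target)                      # create a single directory
    try:
        os.mkdir(target)                  # creating it again fails
        raised = False
    except FileExistsError:
        raised = True

    os.makedirs(target, exist_ok=True)    # no error although it exists
    nested = os.path.join(base, "a", "b", "c")
    os.makedirs(nested)                   # creates a, a/b and a/b/c

    listing = sorted(os.listdir(base))
    os.rmdir(target)                      # works because 'dir' is empty
    after_removal = sorted(os.listdir(base))

print(raised)          # True
print(listing)         # ['a', 'dir']
print(after_removal)   # ['a']
```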
Output 2.81
['README.md',
'file.txt',
'tutorial1.ipynb',
'tutorial2.ipynb',
'tutorial3.ipynb',
'tutorial4.ipynb',
'tutorial5.ipynb',
'tutorial6.ipynb',
'tutorial7.ipynb']
File management
The os.rename() function (equivalent to the mv command in bash ) renames a file. To check whether a
path exists, use the os.path.exists() function.
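A small sketch of renaming a file and checking for its existence. The file names are hypothetical, and a temporary directory is used so the example cleans up after itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    old_path = os.path.join(base, "file.txt")      # hypothetical names
    new_path = os.path.join(base, "renamed.txt")
    with open(old_path, "w") as f:
        f.write("hello")

    os.rename(old_path, new_path)                  # like `mv file.txt renamed.txt`
    old_exists = os.path.exists(old_path)          # the old name is gone
    new_exists = os.path.exists(new_path)          # the new name is present

print(old_exists, new_exists)   # False True
```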
Output 2.82
False
True
Output 2.83
False
❗ . means the current directory in Linux, and .. means the parent directory. So ./dir means the dir
directory (or file) in the current directory.
Output 2.84
False
False
/Users/(...)/Downloads/python_tutorial/./file.txt
('/Users/(...)/Downloads/python_tutorial/.', 'file.txt')
('/Users/(...)/Downloads/python_tutorial/./file', '.txt')
os.path.split() and os.path.splitext() split a path string. os.path.split() separates the final
component (the file name) from the directory part, while os.path.splitext() separates the file
extension from the rest of the path.
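As an illustration, the two splitting functions can be applied to a hypothetical path (the file does not need to exist, since these functions only manipulate strings):

```python
import os

path = os.path.join("/home/user/project", "file.txt")  # hypothetical path

head, tail = os.path.split(path)      # directory part, final component
root, ext = os.path.splitext(path)    # everything before the extension, extension

print(head, tail)
print(root, ext)
```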
subprocess module
The subprocess module in Python spawns new processes and manages their standard input/output/error
pipes. It replaces several older functions in the os module. Here we introduce only one class from the
module: subprocess.Popen . Code 2.85 executes the ls -la command in bash .
Output 2.85
You can execute bash shell commands with the subprocess.Popen class: create a process, then
“communicate” with it through the communicate() method. The standard output ( stdout ) and standard
error ( stderr ) are returned into the variable out in line 5.
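A minimal sketch of this pattern, assuming a Unix-like system where the ls command is available. The variable names and the text=True option (which decodes the output into strings) are choices made for this illustration, not necessarily those of the original listing.

```python
import subprocess

# Start `ls -la` as a child process and capture both output streams
process = subprocess.Popen(
    ["ls", "-la"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,                # decode bytes into str
)

# communicate() waits for the process to finish and returns (stdout, stderr)
stdout, stderr = process.communicate()

print(process.returncode)     # 0 on success
print(stdout)
```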
time.time() function
The time.time() function returns the number of seconds elapsed since the “epoch”. On most systems,
the epoch is 1970/01/01 00:00:00 UTC.
Output 2.86
1708780414.4803193
2024/02/24 22:13:34
The time.strftime() function converts time information into a readable format. It takes a format string
and a time structure as arguments. Note that the value returned by time.time() should first be converted
into your local time (such as KST in our case), for example with time.localtime() , before being passed
to the function.
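A short sketch of reading and formatting the current time. The format string is an assumption chosen to match the style of Output 2.86.

```python
import time

now = time.time()                      # seconds since the epoch, as a float
local = time.localtime(now)            # struct_time in the local time zone
formatted = time.strftime("%Y/%m/%d %H:%M:%S", local)

print(now)
print(formatted)
```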
You can measure elapsed time by calling time.time() before and after your code. For this purpose,
however, Python also provides dedicated clocks, such as time.perf_counter() and time.process_time() .
Output 2.87
The time.perf_counter() (performance counter) function measures elapsed wall-clock time, while
time.process_time() returns CPU time. In Code 2.87, the time measured by time.perf_counter() includes
any time spent sleeping in the time.sleep() function, whereas the time measured by
time.process_time() does not.
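The difference between the two clocks can be sketched as follows. The sleep duration and the small workload are arbitrary choices for this illustration.

```python
import time

wall_start = time.perf_counter()     # wall-clock timer
cpu_start = time.process_time()      # CPU-time timer

time.sleep(0.2)                              # sleeping uses almost no CPU
total = sum(i * i for i in range(100_000))   # a little real computation

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

# wall_elapsed includes the 0.2 s of sleep; cpu_elapsed does not
print(f"wall: {wall_elapsed:.3f} s, cpu: {cpu_elapsed:.3f} s")
```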
datetime.datetime class
The datetime.datetime class provides convenient processing of time information. The current time can be
retrieved with the datetime.now() method (note that a datetime.datetime object is already quite readable
when printed) and can be converted into any other format with the strftime() method.
Output 2.88
You can compute the time difference by subtracting two datetime.datetime objects (recall magic
methods). This operation results in a datetime.timedelta object.
It’s important to note that a datetime.timedelta object is distinct from a datetime.datetime object, as
they possess different attributes and methods.
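A sketch of datetime arithmetic with hypothetical, fixed timestamps (fixed so that the printed values are reproducible):

```python
from datetime import datetime, timedelta

start = datetime(2024, 3, 5, 15, 0, 0)       # hypothetical timestamps
end = datetime(2024, 3, 6, 16, 30, 0)

delta = end - start                          # datetime - datetime -> timedelta
print(type(delta).__name__, delta)           # timedelta 1 day, 1:30:00
print(delta.days, delta.seconds)             # 1 5400
print(delta.total_seconds())                 # 91800.0

later = start + timedelta(days=30, hours=5)  # datetime + timedelta -> datetime
print(later)                                 # 2024-04-04 20:00:00
```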
Output 2.89
Output 2.90
2024-02-24 22:13:55.997377
2024-03-26 03:31:18.997377
The datetime.strptime() (string parse time) method parses a time from a string, while the
strftime() (string format time) method formats time data into a string.
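A minimal sketch of the round trip between strings and datetime objects. The input string matches the first line of Output 2.91; the format strings are assumptions of this illustration.

```python
from datetime import datetime

text = "2024-03-05 15:00:00"

# str -> datetime: the format string must describe the input exactly
parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

# datetime -> str: any layout can be produced with format codes
formatted = parsed.strftime("Year %Y, Month %m, Day %d")

print(parsed)      # 2024-03-05 15:00:00
print(formatted)   # Year 2024, Month 03, Day 05
```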
Output 2.91
2024-03-05 15:00:00
Year 2024, Month 03, Day 05 |15|:|00|:|00|
Decorators
A decorator is a function that takes another function as an argument and returns a decorated version of
it. By decorating a function, new functionality can be added to it. Decorators can be used in various
situations, but at an introductory level, one of the easiest ways to use them is for measuring time.
In Code 2.92, the sum() function computes the sum of numbers up to the input number. The
time_wrapper() function, which takes another function as an argument, wraps the call to the input
function between two datetime.now() calls and returns the time difference.
Output 2.92
Python provides a simpler syntax for decorating a function: you can decorate a function by writing
@(decorator name) on the line before the function definition. The following Code 2.93 is equivalent to
Code 2.92.
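Both ways of decorating can be sketched as below. This is an illustration rather than the original listing: the name sum_up_to is hypothetical (chosen to avoid shadowing the built-in sum() ), and returning the result together with the elapsed time is a design choice of this sketch.

```python
from datetime import datetime

def time_wrapper(func):
    """Return a new function that also reports how long func took."""
    def wrapped(*args, **kwargs):
        start = datetime.now()
        result = func(*args, **kwargs)
        elapsed = datetime.now() - start    # a datetime.timedelta
        return result, elapsed
    return wrapped

def sum_up_to(n):
    return sum(range(n + 1))

# Manual decoration: pass the function to the decorator explicitly
timed_sum = time_wrapper(sum_up_to)

# Equivalent @ syntax: the decorator is applied at definition time
@time_wrapper
def sum_up_to_decorated(n):
    return sum(range(n + 1))

result_manual, elapsed_manual = timed_sum(100)
result_decorated, elapsed_decorated = sum_up_to_decorated(100)
print(result_manual, result_decorated)   # 5050 5050
print(elapsed_decorated)
```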
Output 2.93
One of the major drawbacks of pure Python is its speed: despite its convenience, Python is known to be
slow. The high performance of numpy can be attributed to its C and C++ backends. While basic Python
can be slower due to its “interpreted” nature, numpy ’s core functionality is primarily implemented in C
and C++, allowing it to execute array operations much faster than equivalent Python code.
However, although numpy is fast, its operations run on the CPU, which means that many linear algebra
operations that are faster on GPUs can still be slow. To address this issue, packages like jax have
emerged.
numpy provides plenty of functionality beyond what I have introduced here. If you require additional
features, please refer to numpy ’s official documentation at https://numpy.org/doc/stable/.
numpy arrays
A Python list can be converted into a one-dimensional numpy array with np.array() . You can specify
the data type with the dtype keyword argument.
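A minimal sketch of creating one-dimensional arrays, with and without an explicit dtype . The values are hypothetical.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.float64)   # explicit data type
b = np.array([1, 2, 3])                     # integer type inferred from the list

print(a, a.dtype, a.shape)   # [1. 2. 3.] float64 (3,)
print(b.dtype)               # a platform-dependent integer type
```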
Output 2.94
(3,)
Output 2.95
In Code 2.95, an array with shape (2, 3) was introduced. It’s important to note that the first dimension,
2, indicates that the outermost pair of brackets [] contains two elements. Similarly, the second
dimension, 3, signifies that each inner pair of brackets contains three elements. This rule applies
consistently to higher-dimensional arrays as well.
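The relation between bracket nesting and shape can be sketched with hypothetical values (these are not the arrays used in the original listings):

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6]])            # 2 rows, each with 3 elements -> (2, 3)

# Three (2, 3) blocks inside one more pair of brackets -> (3, 2, 3)
c = np.array([b, b + 10, b + 20])

print(b.shape, c.shape)   # (2, 3) (3, 2, 3)
print(c[0, 1, :])         # [4 5 6]
print(c[:, 0, 0])         # [ 1 11 21]
```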
Output 2.96
(3, 2, 3)
0: ----------
c[i, :, 0]: [1 3]
c[:, i, 0]: [[ 1 3]
[ 5 -2]
[ 0 3]]
c[0, i, :]: [ 1 -1 1]
1: ----------
c[i, :, 0]: [ 5 -2]
c[:, i, 0]: [[ 3]
[-2]
[ 3]]
c[0, i, :]: [3 4 7]
Output 2.97
[[0. 0.]
[0. 0.]
[0. 0.]] [1. 1. 1. 1. 1.] [ 1 2 3 4 5 6 7 8 9 10]
Reshaping arrays
numpy provides array reshaping functions, which rearrange the dimensions of an array. For example, in
the following Code 2.98, an array with shape (4, 2, 2) is created using the np.random.randn
function. The np.random module provides various random number generators, and the randn function
generates random numbers following a normal (or Gaussian) distribution with mean 0 and standard
deviation 1.
Output 2.98
[[[-0.06248479 -2.79681374]
[ 1.05793301 -0.29597319]]
[[-0.50820033 0.3952909 ]
[ 0.71850602 -1.03609737]]
[[ 0.18801837 0.59544032]
[ 0.85238323 -0.03165555]]
[[-0.69334143 -1.1885556 ]
[-2.82748787 -0.43549003]]]
[[-0.06248479 -2.79681374]
[ 1.05793301 -0.29597319]
[-0.50820033 0.3952909 ]
[ 0.71850602 -1.03609737]
[ 0.18801837 0.59544032]
[ 0.85238323 -0.03165555]
[-0.69334143 -1.1885556 ]
[-2.82748787 -0.43549003]]
Using the reshape function, one can rearrange the elements to match a new shape. It’s important to
ensure that the reshaped dimensions are compatible with the original shape. In some cases, you can use
a wildcard -1 as an input to the reshape function. If -1 is used, numpy automatically determines the
dimension corresponding to -1 based on the other dimensions.
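A short sketch of reshaping, including the -1 wildcard and an incompatible target shape. The shapes follow the (4, 2, 2) example in the text; the variable names are hypothetical.

```python
import numpy as np

x = np.random.randn(4, 2, 2)     # 16 elements in total

y = x.reshape(8, 2)              # explicit new shape
z = x.reshape(-1, 2)             # -1 is inferred as 16 / 2 = 8
flat = x.reshape(-1)             # a single -1 flattens the array

try:
    x.reshape(5, 3)              # 15 != 16, so the shapes are incompatible
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(y.shape, z.shape, flat.shape, reshape_failed)
```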
Adding and removing additional dimensions are commonly referred to as unsqueezing and squeezing. In
Code 2.99, we add one additional dimension to the one-dimensional array.
Output 2.99
Note that the shape changed from (10,) to (1, 10) when adding an additional dimension. You can
change the location of dimension addition using the axis argument. If you set axis=1 , the new
dimension will be added along the second axis, resulting in a shape of (10, 1).
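Unsqueezing and squeezing can be sketched with np.expand_dims() and np.squeeze() . The arrays here are hypothetical; the original listings may have used different values or an equivalent indexing syntax.

```python
import numpy as np

a = np.arange(10)                    # shape (10,)

row = np.expand_dims(a, axis=0)      # new leading axis  -> (1, 10)
col = np.expand_dims(a, axis=1)      # new trailing axis -> (10, 1)

x = np.random.randn(4, 2, 1)
squeezed = np.squeeze(x, axis=2)     # drop the length-1 axis -> (4, 2)

print(row.shape, col.shape, squeezed.shape)
```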
Output 2.100
[[[-2.41010067]
[ 0.75490811]]
[[-1.62621492]
[ 0.2833764 ]]
[[ 0.11003475]
[ 1.34817212]]
[[ 0.80623919]
[ 0.3242343 ]]]
[[-2.41010067 0.75490811]
[-1.62621492 0.2833764 ]
[ 0.11003475 1.34817212]
[ 0.80623919 0.3242343 ]] (4, 2)
Broadcasting
Broadcasting is a powerful tool for performing operations between numpy arrays with different shapes.
Let’s begin with a somewhat trivial example of broadcasting.
https://numpy.org/doc/stable/user/basics.broadcasting.html
Output 2.101
[2. 4. 6.]
The two arrays a and b have different shapes: (3,) and (1,) (or a scalar). Nevertheless, we naturally
expect that these two arrays can be multiplied. If you look at this multiplication closely, the
following process happens:
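A sketch of the implicit expansion, using hypothetical values chosen to be consistent with the printed result [2. 4. 6.] . np.repeat() is used here only to make the repetition explicit; broadcasting itself does not copy the data.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])    # shape (3,)
b = np.array([2.0])              # shape (1,)

implicit = a * b                 # broadcasting handles the shape mismatch

b_expanded = np.repeat(b, 3)     # explicitly repeated: [2. 2. 2.]
explicit = a * b_expanded        # ordinary elementwise product

print(implicit)   # [2. 4. 6.]
print(explicit)   # [2. 4. 6.]
```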
Output 2.102
[2. 4. 6.]
In Code 2.102, the array b with shape (1,) was extended (or repeated) to match the shape of the array
a . The two arrays can then be multiplied elementwise. In a similar way, operations between
higher-dimensional arrays with different shapes are defined whenever their shapes are compatible.
numpy compares the shapes dimension by dimension, starting from the trailing (rightmost) dimension.
Two dimensions are compatible when
» they are equal, or
» one of them is 1,
which also applies to the example in Code 2.101 and Code 2.102.
» A (shape (5, 4)) and B (shape (1,), or a scalar): A*B (shape (5, 4))
» A (shape (256, 256, 3), an RGB image) and B (shape (3,)): A*B (shape (256, 256, 3))
» A (shape (3,)) and B (shape (4,)), or A (shape (4, 3)) and B (shape (4,)): incompatible
A (shape (4, 3)) is compatible with B (shape (3,)), but not with C (shape (4,)).
If we unsqueeze (or expand the dimensions of) C into np.expand_dims(C, axis=1) (shape (4, 1)), then A
and np.expand_dims(C, axis=1) become compatible. Lastly, an array a (shape (4, 1)) and an array b
(shape (3,)) are compatible, and the result has shape (4, 3).
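These compatibility cases can be checked directly. The sketch below uses arrays of ones as placeholders, since only the shapes matter here.

```python
import numpy as np

A = np.ones((4, 3))
B = np.ones(3)        # shape (3,)
C = np.ones(4)        # shape (4,)

ab_shape = (A * B).shape              # trailing dimensions 3 and 3 match

try:
    A * C                             # trailing dimensions 3 and 4 clash
    ac_compatible = True
except ValueError:
    ac_compatible = False

C_col = np.expand_dims(C, axis=1)     # shape (4, 1)
ac_col_shape = (A * C_col).shape      # (4, 3) with (4, 1)
cb_shape = (C_col * B).shape          # (4, 1) with (3,)

print(ab_shape, ac_compatible, ac_col_shape, cb_shape)
# (4, 3) False (4, 3) (4, 3)
```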
Output 2.103
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[27], line 4
1 a = np.array([[1, 1, 1, 1], [2, 2, 2, 2]]) # (2, 4) array
2 b = np.array([[1, 1], [2, 2], [3, 3]]) # (3, 2) array
----> 4 print(a + b)
ValueError: operands could not be broadcast together with shapes (2,4) (3,2)
Output 2.104
[[8 8 8 8]
[9 9 9 9]]
(2, 4)
Typically, Python lists may contain data of different types. For instance, the sum() function, which
calculates the sum of all elements in the given list, has to check the type of each element every time
it adds the next one.
Meanwhile, numpy arrays are homogeneous (all array elements have the same data type); this allows
compiled C code to be used for operations on array elements, which provides a substantial speed-up. In
Code 2.105, we first generate a numpy array with the np.linspace() function, which takes the starting
point, the ending point, and the number of points to generate as arguments.
Output 2.105
The np.sin() function is a vectorized function, which gives better performance than pure Python
for loops. Note that line 4 in Code 2.105 means
$$\frac{2}{\pi}\int_{0}^{\pi/2} \sin^2 x \, dx = \frac{1}{2}$$
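A sketch of how this average might be evaluated numerically, assuming the original line takes the mean of sin² x over evenly spaced points on [0, π/2]. The number of points is an arbitrary choice, and the pure Python loop is included only for comparison.

```python
import math
import numpy as np

x = np.linspace(0, np.pi / 2, 1001)     # start, stop, number of points

# Vectorized: np.sin acts on the whole array at once
vectorized_mean = np.mean(np.sin(x) ** 2)

# Equivalent pure Python loop, element by element
loop_mean = sum(math.sin(v) ** 2 for v in x) / len(x)

print(vectorized_mean, loop_mean)       # both are very close to 0.5
```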
The np.average() function averages an array along a specified axis.
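A minimal sketch of averaging along different axes, with a small hypothetical array so that the results can be checked by hand:

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])           # shape (3, 2)

col_avg = np.average(a, axis=0)      # average over rows    -> shape (2,)
row_avg = np.average(a, axis=1)      # average over columns -> shape (3,)
total_avg = np.average(a)            # average over all elements

print(col_avg)     # [3. 4.]
print(row_avg)     # [1.5 3.5 5.5]
print(total_avg)   # 3.5
```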
Output 2.106
[[-0.50325378 0.73180525]
[ 0.55506121 -0.51120121]
[ 0.09804889 0.40828138]]
[[ 0.21118092 0.59686133]
[ 0.36192477 -0.25026971]
[-0.62335483 -0.48032833]
[ 0.25005756 0.9722506 ]]
[[ 0.10923951 0.54430973 0.55851413]
[ 0.04499412 -0.47082109 0.59330956]
[-0.18002283 -1.25462955 -0.22087235]
[ 0.48289215 1.26886091 0.08170919]]
The numpy.linalg module provides various linear algebra functions. Take a look at the documentation
if you are interested. In this tutorial, we introduce just a few useful functions.
Matrix multiplication is done with the @ operator, not with the * operator (which performs elementwise
multiplication).
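The difference between the two operators can be seen with a pair of small hypothetical matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

matmul = A @ B        # matrix product (rows times columns)
elementwise = A * B   # product of entries at the same position

print(matmul)         # [[2 1]
                      #  [4 3]]
print(elementwise)    # [[0 2]
                      #  [3 0]]
```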
Output 2.107
Determinants, eigenvalues, singular value decompositions (SVD), and inverse matrices can be calculated
easily!
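A sketch of these numpy.linalg calls. The matrix below is an assumption: it was chosen because its determinant (14), eigenvalues (2 and 7), and inverse agree with the printed values in Output 2.108, but the original listing may have been written differently.

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [3.0, 5.0]])          # assumed example matrix

det = np.linalg.det(A)                          # determinant
eigenvalues, eigenvectors = np.linalg.eig(A)    # eigen-decomposition
U, S, Vh = np.linalg.svd(A)                     # A = U @ diag(S) @ Vh
A_inv = np.linalg.inv(A)                        # inverse matrix

print(det)
print(eigenvalues)
print(S)
print(A_inv)
```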
Output 2.108
14.000000000000004
[2. 7.]
SVDResult(U=array([[-0.59025263, -0.80721858],
[-0.80721858, 0.59025263]]), S=array([7.07720233, 1.97818281]), Vh=array([[-0.67578487,
-0.73709891], [-0.73709891, 0.67578487]]))
[[ 0.35714286 -0.14285714]
[-0.21428571 0.28571429]]
numpy has many more functionalities. Don’t confine yourself to this tutorial: explore the outside
world, starting with the official documentation.
Output 2.109
numba is a compiler that generates machine-optimized code. It provides the @jit decorator, which
stands for just-in-time compilation. When writing array-based or math-heavy code, decorating your
function with @jit can significantly enhance performance (compare the execution time of f1() and
f2() ). numba compiles the function upon its first call, so the first call is slower than later ones
(compare the execution time of the first and second call of f2() ). The nopython=True argument of the
@jit decorator indicates that the decorated function does not use generic Python objects, such as lists
and dictionaries, which may contain elements of different data types. numba can handle integers,
floating-point numbers, strings, numpy arrays, and other fixed data types. If you include generic
Python objects in your decorated function, you’ll need to use object mode, which may not yield
performance enhancements.
If you have more of a computer science background, the Python GIL (Global Interpreter Lock) is an
interesting topic to look up.
The multiprocessing.cpu_count() function returns the number of CPU cores available. (On a Linux
machine, you can get the same and more information about your machine with cat /proc/cpuinfo .)
Code 2.110: How many CPU cores does your machine have?
import time
import multiprocessing as mp
from joblib import Parallel, delayed

n_cores = mp.cpu_count()
print(n_cores)
Output 2.110
32
Assume that you have 32 independent tasks (each of them has nothing to do with the other 31 tasks)
to do. Then you do not need to do them sequentially with only 1 CPU. If you can utilize 32 CPUs
simultaneously and assign them individual tasks, theoretically 32-fold speed-up can be achieved. In the
following Code 2.111, we utilized 4 CPU cores for the parallelization.
Output 2.111
If you try to parallelize a task that is too simple, assigning tasks and copying data can take more
time than the sequential computation itself. In that case, using joblib would worsen the performance.
Output 2.112