Table 6.2.1.1: Some of the Python packages commonly used in climate computing and visualisation.
Package           Purpose
PyTROLL¹¹         Processing of earth observation satellite data.
Scikit Learn¹²    Machine learning library.
SciPy¹³           Libraries for mathematics, science, and engineering.
The modular Python building blocks concept can be taken one step further by
combining packages to create even larger and more complex applications. This
creates a semi-layered structure of lower- to higher-level Python packages (Figure
6.2.1.1). The links between packages are further explored in the following section
(Section 6.2.2).
Figure 6.2.1.1: Schematic showing the semi-layered structure of Python packages from lower-level
(bottom) to higher-level (top). Examples of dependencies are indicated for Iris (red lines) and MetPy
(blue lines).
¹¹http://pytroll.github.io
¹²https://scikit-learn.org/stable
¹³https://www.scipy.org
Python - Concepts and Work Environment 90
The standard package manager for Python is called Pip¹⁵ (a recursive acronym for Pip
Installs Packages). By default, Pip installs packages from the Python Package Index
(PyPI¹⁶) repository.
A Python distribution specifically developed for scientific computing is Anaconda¹⁷.
It comes with its own package manager called Conda¹⁸ and by default installs
packages from its standard Anaconda Repository¹⁹.
The use of the package manager Conda is recommended for the purpose of climate
computations as it is a robust, well-supported and versatile application that combines
both package management and a manager for virtual environments (introduced in
Section 6.2.4). The use of Conda to install and manage packages will be discussed in
Section 6.3.3.
6.3 Conda
Conda integrates both a Python package manager and a manager for Python virtual
environments. Python virtual environments created with Conda are referred to as
Conda environments. The following sections provide a brief introduction to creating
and managing Conda environments as well as installing packages inside a Conda
environment. More details on the usage of Conda can be found in the Conda User
Guide²².
The above command will create a Conda environment that uses the default Python
version. To find out what the default Python version is, the command python --version
can be used. If a Python version different from the default version is required, then
it can be specified as part of the Conda environment creation command, as done in
the following example.
The installation of the Conda environment may take a few minutes. Executing the
above command will produce output in the terminal similar to the following.
²²https://conda.io/projects/conda/en/latest/user-guide/index.html
## Package Plan ##
package | build
---------------------------|-----------------
_libgcc_mutex-0.1 | conda_forge 3 KB conda-forge
_openmp_mutex-4.5 | 0_gnu 435 KB conda-forge
ca-certificates-2019.11.28 | hecc5488_0 145 KB conda-forge
certifi-2019.11.28 | py36h9f0ad1d_1 149 KB conda-forge
ld_impl_linux-64-2.33.1 | h53a641e_8 589 KB conda-forge
libgcc-ng-9.2.0 | h24d8f2e_2 8.2 MB conda-forge
libgomp-9.2.0 | h24d8f2e_2 816 KB conda-forge
libstdcxx-ng-9.2.0 | hdf63c60_2 4.5 MB conda-forge
pip-20.0.2 | py_2 1.0 MB conda-forge
python-3.6.10 |h9d8adfe_1009_cpython 34.1 MB conda-forge
python_abi-3.6 | 1_cp36m 4 KB conda-forge
setuptools-46.0.0 | py36h9f0ad1d_1 653 KB conda-forge
sqlite-3.30.1 | hcee41ef_0 2.0 MB conda-forge
tk-8.6.10 | hed695b0_0 3.2 MB conda-forge
_libgcc_mutex conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
_openmp_mutex conda-forge/linux-64::_openmp_mutex-4.5-0_gnu
ca-certificates conda-forge/linux-64::ca-certificates-2019.11.28-hecc5488_0
certifi conda-forge/linux-64::certifi-2019.11.28-py36h9f0ad1d_1
ld_impl_linux-64 conda-forge/linux-64::ld_impl_linux-64-2.33.1-h53a641e_8
libffi conda-forge/linux-64::libffi-3.2.1-he1b5a44_1006
libgcc-ng conda-forge/linux-64::libgcc-ng-9.2.0-h24d8f2e_2
libgomp conda-forge/linux-64::libgomp-9.2.0-h24d8f2e_2
libstdcxx-ng conda-forge/linux-64::libstdcxx-ng-9.2.0-hdf63c60_2
ncurses conda-forge/linux-64::ncurses-6.1-hf484d3e_1002
openssl conda-forge/linux-64::openssl-1.1.1d-h516909a_0
pip conda-forge/noarch::pip-20.0.2-py_2
python conda-forge/linux-64::python-3.6.10-h9d8adfe_1009_cpython
python_abi conda-forge/linux-64::python_abi-3.6-1_cp36m
readline conda-forge/linux-64::readline-8.0-hf8c457e_0
setuptools conda-forge/linux-64::setuptools-46.0.0-py36h9f0ad1d_1
sqlite conda-forge/linux-64::sqlite-3.30.1-hcee41ef_0
tk conda-forge/linux-64::tk-8.6.10-hed695b0_0
wheel conda-forge/noarch::wheel-0.34.2-py_1
xz conda-forge/linux-64::xz-5.2.4-h14c3975_1001
zlib conda-forge/linux-64::zlib-1.2.11-h516909a_1006
Proceed ([y]/n)? y
In lines 5 to 7 Conda informs the user that a newer Conda version is available and
lists the command that can be used to update Conda in line 11. It is unlikely that
Linux users will have administrator rights on the server to update Conda. Therefore,
no action is needed here (other than perhaps informing the system administrator).
Lines 23 to 44 provide a list of basic default packages that will be downloaded,
followed by a list of new packages that will be installed in lines 45 to 68. Note that the
list of packages to be installed is longer than the list of packages to be downloaded.
This is because packages that are already available in Conda's local package cache
do not need to be downloaded again.
Once the user confirms the installation in line 71, Conda starts downloading and
installing the packages. The progress for each package can be followed in the terminal
window.
The download and installation of packages may take some time. Be patient.
It is advisable to check the terminal output for any error messages. The Preparing
transaction, Verifying transaction and Executing transaction steps (lines 91 to 93)
should all show done if completed successfully.
Finally, the commands to activate and deactivate the Conda environment are shown
in lines 97 and 99, respectively (covered in Section 6.3.2).
Once the above command has been executed the Conda environment name (in this
example myenv) will appear in brackets at the beginning of the Unix command prompt
similar to the following.
(myenv)abcd1234@linux:~$
To deactivate a Conda environment, use the following command. The Conda
environment name does not need to be provided to deactivate the environment.
conda deactivate
Deactivating a Conda environment using the above command will remove the Conda
environment name (in this example myenv) from the beginning of the Unix command
prompt. The Unix command prompt will return to normal, similar to the following.
abcd1234@linux:~$
Activating a Conda environment will not change the behaviour of the Unix
command line. All Unix commands can be used as normal.
When installing Python packages, Conda will now by default try to download the
packages and their dependencies from the conda-forge channel first. The default
channel should be set before any packages are installed, and it only needs to be set
once.
The basic command for installing a Python package is conda install followed by the
package name. A list of Python packages commonly used in climate computations
can be found in Table 6.2.1.1. For instance, to install the Iris package (available from
the conda-forge channel) and its dependencies execute the following command on
the Unix command line inside the activated Conda environment.
Installing a higher-level package such as Iris will also install most of the
recommended packages required for climate computing as dependencies
including Cartopy, Matplotlib, NumPy, netCDF4 and SciPy.
To list all packages currently installed in an activated Conda environment use the
following command.
conda list
Executing the above command inside the Conda environment myenv will generate
output similar to the following.
²³https://conda-forge.org
The list includes package details such as package name, version number, build
string and the channel the package was sourced from. The build string is used to
differentiate builds of packages with otherwise identical names and version numbers.
# conda environments:
#
base /opt/miniconda
myenv /ouce-home/staff/worc1870/.conda/envs/myenv
ouce /ouce-home/staff/worc1870/.conda/envs/ouce
test /ouce-home/staff/worc1870/.conda/envs/test
The first Conda environment listed is named base. This is the default environment
that was created when Conda was installed on the system. It should not be used
for installing packages or for climate data analysis. Instead, create additional Conda
environments.
In the above example, three more Conda environments are listed named myenv, ouce
and test. All files associated with these environments are located in the directories
indicated by their respective paths.
To permanently delete the Conda environment named test and all its associated files
the following command can be used.
a clear distinction between the two terms is not usually made. Most Python code
editors will come with some basic features such as syntax highlighting and code
formatting. In more advanced code editors additional features can often be enabled
or installed via extensions or plug-ins.
To edit Python code files saved on the server, either a text editor installed on the local
machine or one installed on the remote server can be used. If a stable connection to
the server is available and the home directory on the remote server can be mapped
on the local machine (see Section 3.3.6) then a locally installed editor can be used
(Table 6.4.1.1). Python files can be created or edited on the server using the locally
installed editor by navigating to the file via the mapped network drive.
Table 6.4.1.1: Locally installed text editors commonly used for Python coding (available for all
platforms).
If the home directory on the server cannot be mapped on the local machine, then
a text editor installed on the server should be used. There are two ways in which a
server-side installed text editor may open. First, it may open a graphical user interface
(GUI), in which case the X Window Manager and X11 forwarding need to be set up
and configured correctly (see Section 3.2.3).
Second, a text editor may open inside the terminal window. Such editors are
sometimes referred to as screen-based or screen-orientated editors. Editors opening
inside the terminal window are recommended when the internet connection is slow,
as interactions with GUIs may be slow and choppy due to the graphical information
having to be transferred between the local display and the server (see Figure 3.2.1
and Section 3.2.3). A list of common server-side installed text editors can be found
in Table 6.4.1.2.
²⁴https://atom.io
²⁵https://www.gnu.org/software/emacs
²⁶https://www.sublimetext.com
²⁷https://code.visualstudio.com
Table 6.4.1.2: Server-side installed text editors commonly used for Python coding.
For code editors that open inside the terminal window, such as Emacs or
vi/vim, it is advisable to take the time to work through one of the many online
tutorials in order to learn the keyboard short-cuts required to use them.
Keyboard short-cut cheat sheets are also helpful for beginners.
Some IDEs support multiple programming languages, whereas others were developed
with a specific language in mind. A Python IDE provides an environment for the
development of Python code that is much more comprehensive than a simple code
editor. The following list summarises some of the features commonly found in
Python IDEs.
Python IDEs are usually installed on the local machine. They can be configured to
connect to the remote server. Care should be taken during installation as some IDEs
(e.g., Spyder) will install their own Python executables during the installation process.
Pointing the IDE towards the correct Python executable (e.g., on the server inside a
Conda environment) can be challenging for a beginner.
Due to their complexity and wide range of features, IDEs may be slower than simple
but feature-rich code editors. A list of some of the more popular Python IDEs is
provided in Table 6.4.2.1.
a = 7
b = 5.8
c = 'Climate science is cool!'
Multiple variables can be assigned the same value in a single line as follows.
a = b = c = 100
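To illustrate the difference between giving several variables the same value and giving each variable its own value in a single line, a short sketch follows (the variable names and values are arbitrary).

```python
# Chained assignment: all three names refer to the same value
a = b = c = 100
print(a, b, c)

# Tuple unpacking: one value per variable, matched by position
x, y, z = 7, 5.8, 'Climate science is cool!'
print(x, y, z)
```

Writing a, b, c = 100 instead raises a TypeError, because Python then tries to unpack the single integer 100 into three separate variables.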
Variables can be assigned new values at any point in a script, but the previous value
will be lost.
Python automatically determines the variable type from the value assigned to it;
unlike in some other programming languages, the type does not need to be declared
when the variable is created (see the following Section 7.1.2 for variable types).
print(type(<var>))
The print() function writes Python objects to the standard output stream, which
in most cases is the terminal window.
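As a sketch, applying type() to the three variables assigned at the start of this section shows the types Python has chosen automatically from the assigned values.

```python
a = 7
b = 5.8
c = 'Climate science is cool!'

# type() reports the type Python assigned based on each value
print(type(a))  # <class 'int'>
print(type(b))  # <class 'float'>
print(type(c))  # <class 'str'>
```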
7.1.2.1 Numbers
Numbers are a very common variable type especially in the field of climate sciences.
Python represents numbers as one of three number types: integer, float and
complex. In Python 3, integers have arbitrary precision: there is no fixed maximum
size, and the largest possible integer value is limited only by the available memory.
The value 2^63 - 1 = 9223372036854775807 that is often quoted for 64-bit platforms
is sys.maxsize, the largest size a Python container (such as a list) can have, not the
largest possible integer.
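The difference between arbitrary-precision integers and sys.maxsize can be checked with a short sketch (2**100 is simply an arbitrary large number chosen for illustration).

```python
import sys

# Integers have arbitrary precision: this value is far larger than 2**63 - 1
big = 2**100
print(big)        # 1267650600228229401496703205376
print(type(big))  # <class 'int'>

# sys.maxsize is the largest container size/index (2**63 - 1 on 64-bit builds)
print(sys.maxsize)
print(big > sys.maxsize)  # True
```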
Floats or floating point values are numbers with a decimal point. Several float
number notation formats are possible. For example, 0.0, 13.5, -273.15, 300., or
54.921e+10 (scientific notation for 54.921 × 10¹⁰).
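A minimal sketch confirming that each of the notations above produces a float.

```python
# Different notations that all create floating point values
values = [0.0, 13.5, -273.15, 300., 54.921e+10]

for v in values:
    # 300. is displayed as 300.0; 54.921e+10 is 54.921 x 10**10
    print(v, type(v))
```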
7.1.2.2 Strings
Strings can be joined by using the plus symbol (+). Also, numerical values can
be converted to a string using the str(x) function, where x is the number to be
converted. Both concepts are applied in the following example.
The above code will generate the following output. Note that the numerical value
14.9 (float) was converted to a string before being joined with the other strings.
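The original example code is not reproduced here. A minimal sketch of the same idea, with hypothetical variable names and text (only the value 14.9 is taken from the description above):

```python
location = 'Oxford'   # hypothetical location name
temperature = 14.9    # float

# str() converts the float so that it can be joined to the other strings with +
message = 'The mean temperature in ' + location + ' is ' + str(temperature) + ' degC.'
print(message)  # The mean temperature in Oxford is 14.9 degC.
```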
Python - Programming Basics 109
7.1.2.3 Lists
Lists can be created using square brackets ([]). The elements in a list are separated
by commas (,). The elements in a list are often, but do not need to be, of the same
variable type. Lists containing sequences of numbers or names of models to loop
over are quite common in climate computing. List elements can be referenced
using indices. The following are some examples of indexing and manipulating lists.
a = [1, 2, 3, 4, 5]
The methods append() and insert() in the code examples above are associated with
the list variable (Python object) a. Python object methods and attributes are discussed
in more detail in Section 7.x.x. The above code will generate the following output.
[1, 2, 3, 4, 5]
1
[3]
5
[1, 2, 3, 4, 5, 100]
[1, 2, 50, 3, 4, 5, 100]
Lists are mutable which means that they can be changed after they have been created
as shown in the above examples.
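Only the first line of the list example is shown above. One sequence of commands that would produce the output listed is sketched below; the exact indexing used in the original (for example a[-1] versus len(a) for the fourth output line) is an assumption.

```python
a = [1, 2, 3, 4, 5]
print(a)         # the whole list
print(a[0])      # first element (indexing starts at 0)
print(a[2:3])    # slice containing only the element at index 2
print(a[-1])     # last element

a.append(100)    # add an element at the end of the list
print(a)

a.insert(2, 50)  # insert 50 before index position 2
print(a)
```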
7.1.2.4 Dictionaries
A dictionary is in many ways very similar to a list (Section 7.1.2.3). Both store Python
objects (elements), are mutable and can have elements of different data types. The
main difference from lists is the way in which the elements are referenced. While list
elements are referenced by indices, the elements of a dictionary are associated with
keys, which are used to reference them.
Dictionaries can be defined in different ways, but most commonly comma-separated
key-value pairs within curly brackets are used. In the following example a dictionary
is created that contains country capitals.
The first element of each pair is the key (e.g., 'Germany'). The second element
is the associated value (e.g., 'Berlin'). A value can be accessed using its associated
key. In the following example the value associated with the key 'UK' is
printed.
print(capitals['UK'])
London
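The dictionary definition itself is not shown above. A sketch, assuming a dictionary named capitals in which the 'Germany' and 'UK' entries follow the text and the 'France' and 'Italy' entries are added purely for illustration:

```python
# Key-value pairs inside curly brackets (the 'France' entry is illustrative)
capitals = {'Germany': 'Berlin', 'UK': 'London', 'France': 'Paris'}

print(capitals['UK'])  # London

# Add a new key-value pair (an existing value can be changed the same way)
capitals['Italy'] = 'Rome'

# .get() returns a default value instead of raising a KeyError for a missing key
print(capitals.get('Spain', 'unknown'))  # unknown

# Loop over keys and values together
for country, capital in capitals.items():
    print(country, capital)
```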
While in some situations a key may be a more convenient way than an index to
access a specific value, dictionaries are generally less common in climate computing
than lists and, as such, are not explored further in this chapter.
7.1.2.5 Tuples
Tuples are very similar to lists. They store sequences of Python objects which can be
of different data types. The main difference is that tuples are immutable, meaning
once created they cannot be changed.
A tuple can be created in the same way as a list but using round brackets (()) instead
of square ones ([]). In the following example a tuple called tup is created with three
elements of different data types.
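The original tuple is not shown here. A sketch with illustrative values (an integer, a string and a float) that also demonstrates that a tuple cannot be modified after creation:

```python
# A tuple with three elements of different data types (values are illustrative)
tup = (2020, 'tas', 15.3)
print(tup[1])  # elements are indexed like list elements

# Tuples are immutable: item assignment raises a TypeError
try:
    tup[0] = 1900
except TypeError as err:
    print('Error:', err)
```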
7.1.2.6 Booleans
A variable of the data type Boolean can only have one of two values: False or True.
Note that these Boolean values are case-sensitive: the first letter must be capitalised
(true and false are not recognised as Boolean values).
Converting an integer or float number to the Boolean data type using the bool()
function will return True for all values different from 0 (including negative values)
and False for 0.
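A short sketch of the bool() conversion rules described above:

```python
print(bool(0))     # False
print(bool(0.0))   # False
print(bool(-3.2))  # True (any non-zero value, including negative ones)
print(bool(42))    # True
```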
Boolean values are also frequently used with keyword arguments, which control
how a function operates. For instance, in the following code example the keyword
argument sharex is set to True and sharey is set to False to control which plot
axes are shared when multiple subplots are created.
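The plotting code is not reproduced here. A minimal sketch, assuming Matplotlib's plt.subplots() function is used to create two subplots stacked in one column (the plotted data are arbitrary):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)

# Two subplots in one column: the x-axis is shared, the y-axes are not
fig, axes = plt.subplots(2, 1, sharex=True, sharey=False)
axes[0].plot(x, np.sin(x))
axes[1].plot(x, 100 * np.cos(x))  # very different y-range from the top panel
```

With sharex=True, both subplots use the same x-axis limits, while sharey=False lets each subplot keep its own y-axis range.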
Conversions between the variable types integer, float and string are common
practice in Python coding. Some examples and pitfalls are discussed here.
Consider the following sequence of Python commands executed at the Python
command prompt (>>>). Note that a print() statement is not required at the
interactive Python prompt: the value of an expression is displayed directly in the
terminal window.
1 >>> a = 5.6
2 >>> type(a)
3 <class 'float'>
4 >>>
5 >>> b = '10.0'
6 >>> type(b)
7 <class 'str'>
8 >>>
9 >>> c = a+b
10 Traceback (most recent call last):
11 File "<stdin>", line 1, in <module>
12 TypeError: unsupported operand type(s) for +: 'float' and 'str'
13 >>>
14 >>> b = float(b)
15 >>> type(b)
16 <class 'float'>
17 >>>
18 >>> c = a+b
19 >>> c
20 15.6
21 >>>
22 >>> d = int(c)
23 >>> type(d)
24 <class 'int'>
25 >>> d
26 15
27 >>>
In line 1 the variable a is assigned the value 5.6. The type(a) command in lines 2 and 3
shows that Python has automatically assigned the type float to a.
In line 5, the variable b is assigned the value '10.0' and Python correctly assigns the
variable type string (lines 6 and 7) because the number 10.0 is inside single quotes.
Trying to add a and b together, as attempted in line 9, fails because a float variable
cannot be added to a string variable, returning the error message in lines 10 to 12.
To fix this problem the string variable b is converted to float in line 14 by using the
float() function. The successful conversion is confirmed in lines 15 and 16.
Now that both a and b are variables of the type float, they can be added together as
done in line 18, returning the correct value of 15.6 for the variable c in line 20.
The float variable c is converted to the integer variable d using the int() function in
line 22, and the variable type is confirmed in lines 23 and 24. Displaying d returns 15
(lines 25 and 26).
Note that converting a variable of the type float to integer just cuts off the decimal
digits (truncation towards zero). It does not round the float number to the nearest
integer.
The conversion of a float value to integer using the int() function does not
round the float value. It just removes the decimal digits.
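The difference between truncation with int(), rounding with the built-in round() function and rounding down with math.floor() can be sketched as follows.

```python
import math

c = 15.6
print(int(c))             # 15  (decimal digits removed)
print(round(c))           # 16  (rounded to the nearest integer)
print(int(-15.6))         # -15 (truncation is towards zero)
print(math.floor(-15.6))  # -16 (rounded down)
```

Note that round() rounds exact halves to the nearest even integer, so round(2.5) returns 2.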
Similarly, variables of the type integer and float can be converted to strings as done
in the following sequence of Python commands.
The string variable a is assigned the value 'The air temperature is: ' in line 1 and the
float variable b is assigned the value 23.6 in line 5. As expected (see discussion above),
trying to add both variables together (line 9) fails (lines 10 to 12).
However, the float variable b can be converted to a string by using the str() function.
Joining the string variables a and str(b) creates a new string saved in the variable
c in line 14 that can be printed (lines 15 and 16).
Just to confirm, the variable b is still a float (lines 18 and 19), as it was not overwritten,
unlike b in line 14 of the previous command sequence.
If the variables are just displayed in the terminal window, then a conversion of b from
float to string is not necessary. The variables can be displayed, for instance, using a
print statement like print(a, b) as done in line 21. Note that print() inserts a space
between its arguments by default (line 22).
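The command sequence discussed above is not reproduced in this section. The following script is a sketch of the same steps, using the values given in the text.

```python
a = 'The air temperature is: '
b = 23.6

# Adding a string and a float fails with a TypeError
try:
    c = a + b
except TypeError as err:
    print('Error:', err)

# Converting b with str() makes the concatenation work; b itself stays a float
c = a + str(b)
print(c)        # The air temperature is: 23.6
print(type(b))  # <class 'float'>

# print() accepts several objects and separates them with a space by default
print(a, b)
```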
7.1.3 Functions
In general, functions are used for repetitive or common tasks. They can be reused
and often make the code layout clearer. Functions generally require some input
and return some output. There are two types of functions: built-in functions and
user-defined functions.
Built-in functions come with the Python installation. They do not need to be defined
or imported and can be used directly. A list of 60+ Python built-in functions can be
found in the Python documentation¹. Some commonly used built-in functions are
listed in Table 7.1.3.1.1. Some of them have been introduced already in the previous
sections (float(), int(), str() and print()).
¹https://docs.python.org/3.3/library/functions.html
Function       Description
dir()          Returns a list of object methods and attributes.
enumerate()    Returns an iterable enumerate object.
float()        Returns a floating point number.
int()          Returns an integer number.
len()          Returns the length of an object.
print()        Prints objects to the terminal window.
range()        Returns a sequence of integers (starting with 0 by default).
str()          Returns a string object.
type()         Returns the type of an object.
User-defined functions are (as the name says) functions that are defined by the user.
The following example shows a function that converts temperature values given in
Fahrenheit to Celsius.
1 def f2c(t):
2     return (t-32)*5/9
3
4 print(f2c(68))
The function definition starts with def followed by the name of the function, f2c (line
1). The parameter t given in brackets represents the value that will be passed to the
function when it is called. In line 2 (indented), return is followed by the equation which
converts t from Fahrenheit to Celsius.
The function is used in line 4 within a print() statement. The function is given an
input value of 68 degrees Fahrenheit. Running the code will return the following
temperature in degrees Celsius.
20.0
User-defined functions can be defined anywhere in a script, but they must be defined
before the point in the code where they are called. Alternatively, frequently used
functions can be saved in a separate file. Consider the following code saved in a file
named t_conversions.py.
In order to make use of the functions k2c() and k2f() defined in t_conversions.py they
can be imported in the following way in a script as long as the file t_conversions.py
is located in the same directory.
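The contents of t_conversions.py are not shown in this section. Assuming that k2c() converts Kelvin to degrees Celsius and k2f() converts Kelvin to degrees Fahrenheit, the module might look like the following sketch. A script in the same directory would then import the functions with from t_conversions import k2c, k2f.

```python
# t_conversions.py (hypothetical contents): temperature conversion functions

def k2c(t):
    """Convert temperature t from Kelvin to degrees Celsius."""
    return t - 273.15


def k2f(t):
    """Convert temperature t from Kelvin to degrees Fahrenheit."""
    return (t - 273.15) * 9 / 5 + 32


if __name__ == '__main__':
    # Only runs when the file is executed directly, not when it is imported
    print(k2c(273.15))  # 0.0
    print(k2f(273.15))  # 32.0
```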
If the file containing the functions is located in a different directory from the Python
script that uses them, then the path to that directory can be added to the system
path at the beginning of the script using the sys (system) module, as done in the
following example. In this example the file t_conversions.py is located in the directory
/home/rjones/python/functions.
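import sys
# Add the directory containing t_conversions.py to the module search path
sys.path.append('/home/rjones/python/functions')
from t_conversions import k2c, k2f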
A list called mylist is created in line 1 and printed to the display in lines 2 and 3.
The reverse() method is applied to the mylist object. When mylist is printed again
in lines 6 and 7, the order of the list elements has been reversed. Note that no new
variable has been created. The object mylist has been modified by the object's own
reverse() method.
In the same way, the sort() method applied to mylist in line 9 sorts the list elements
in ascending order (alphabetical order for strings).
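The mylist example code is not reproduced here. A sketch with a hypothetical list of model names, which also contrasts the sort() method with the built-in sorted() function:

```python
mylist = ['HadCM3', 'CCSM4', 'Miroc4']  # hypothetical list contents

# Methods modify the list in place (they return None)
mylist.reverse()
print(mylist)  # ['Miroc4', 'CCSM4', 'HadCM3']
mylist.sort()
print(mylist)  # ['CCSM4', 'HadCM3', 'Miroc4']

# The built-in function sorted() leaves its input unchanged and returns a new list
temps = [15.2, 3.8, 27.1]
temps_sorted = sorted(temps)
print(temps, temps_sorted)  # [15.2, 3.8, 27.1] [3.8, 15.2, 27.1]
```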
Methods can be accessed using dotted notation. A dot (.) is placed between
the object and the method.
To identify what methods are attached to an object the built-in function dir() can
be used. Executing the command dir(mylist) will return the following output.
Method names that start and end with double underscores (__) are called magic
methods. They are used mainly internally and can be ignored most of the time.
The methods for specific variable types may also be found in the documentation (e.g.,
for lists²).
While methods and functions have many similarities they are not the same.
The main difference between methods and functions is that methods are
called on an object and may change the object whereas functions stand on
their own and usually return variables or objects.
7.1.5.1 for-Loops
Loops are an essential part of any programming language because they allow the user
to iterate over sequences of numbers, list elements or files by allowing blocks of code
to be run repeatedly within a given set of constraints. Loops allow batch processing,
which makes them a powerful tool.
The most common loop is the for-loop which will be discussed here. The general
syntax of the for-loop is as follows.
The loop will iterate over the elements of a sequence (<sequence>). In most cases the
sequence will be a Python list but it can also be a tuple, dictionary, set or string. The
variable (<var>) changes with each iteration to the next element in the sequence. Any
code that is indented in the following lines (<do something>) is inside the loop and will
be executed with each loop iteration. Unlike some other programming languages, the
Python for-loop does not have a closing statement. The loop ends where the indentation
of the code ends.
The following are some common for-loop examples. In the first example the built-in
function range() is used to create a sequence of numbers from 0 to 4. The variable
i changes with each iteration. This variable can be given any valid variable name (it
cannot, for example, start with a digit or be a keyword such as True), but i is very
common for an index.
for i in range(5):
    print(i)
0
1
2
3
4
import numpy as np
for i in np.arange(3, 8, 2):
    print(i)
3
5
7
In the next example the loop iterates over a list of model names.
Processing: CCSM4
Processing: HadCM3
Processing: Miroc4
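The loop producing the output above is not shown. A minimal version, assuming the model names are held in a list named modellist as in the following example:

```python
modellist = ['CCSM4', 'HadCM3', 'Miroc4']

# The loop variable m takes the value of each list element in turn
for m in modellist:
    print('Processing:', m)
```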
The same loop as above may be executed differently by iterating through a sequence
of numbers and using them as an index as done in the following example. The
variable modellist is a list of model names. The built-in len() function is used to
return the number of elements in the list into the variable n (n=3). The variable n is
passed to the np.arange() function which generates a sequence of numbers from 0 to
2 which is used as an index to refer to element position in modellist inside the loop.
import numpy as np
modellist = ['CCSM4', 'HadCM3', 'Miroc4']
n = len(modellist)
for i in np.arange(n):
    print('Processing:', modellist[i])
Executing the above code will generate the same output as the previous loop.
Processing: CCSM4
Processing: HadCM3
Processing: Miroc4
Often it is useful to have both a variable such as a model name and the associated
index available inside the loop. The built-in function enumerate() comes in handy here.
In the following example the list of model names (modellist) is passed to the enumerate()
function. The for-loop now has two variables that change with each iteration: i is the
index and m is the name of the associated model.
Executing the above code will give the following output. The index and the associated
model name are printed with each loop iteration.
0 CCSM4
1 HadCM3
2 Miroc4
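The code for this example is not shown above. A minimal sketch using the same list of model names:

```python
modellist = ['CCSM4', 'HadCM3', 'Miroc4']

# enumerate() yields (index, element) pairs
for i, m in enumerate(modellist):
    print(i, m)
```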
Nested for-loops are two or more loops inside one another. For instance, if one wants
to loop through each grid box of a two-dimensional data field to perform an operation
on each grid box, then a nested for-loop can be used. The nested loop has to be
indented.
The following code, copied from Code 7.7.3.1 (lines 27 to 29), shows an example
of how to loop through each grid box of a global field, where lats holds the latitude
values and lons holds the longitude values. The line r, p = stats.pearsonr(sst,
pr[:,y,x]) then computes the Pearson correlation coefficient r and its p-value p
between the time series sst and the time series of each grid box of the
three-dimensional pr field (the last two dimensions are latitude and longitude).
for y in range(len(lats)):
    for x in range(len(lons)):
        r, p = stats.pearsonr(sst, pr[:,y,x])
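A self-contained sketch of the same nested loop, using synthetic random data in place of the SST index and the precipitation field (the array sizes and the output arrays rmap and pmap are assumptions for illustration):

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only: 30 time steps on a 4 x 5 lat-lon grid
rng = np.random.default_rng(seed=0)
ntime = 30
lats = np.linspace(-30, 30, 4)
lons = np.linspace(0, 360, 5, endpoint=False)
sst = rng.standard_normal(ntime)                         # 1-D index time series
pr = rng.standard_normal((ntime, len(lats), len(lons)))  # (time, lat, lon)

# Arrays to hold the correlation coefficient and p-value of each grid box
rmap = np.empty((len(lats), len(lons)))
pmap = np.empty((len(lats), len(lons)))

# Outer loop over latitudes, inner loop over longitudes
for y in range(len(lats)):
    for x in range(len(lons)):
        r, p = stats.pearsonr(sst, pr[:, y, x])
        rmap[y, x] = r
        pmap[y, x] = p
```

Storing r and p in pre-allocated arrays keeps the result of every grid box, so that the correlation map can be plotted after the loops have finished.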
The if-statement can be found in almost every programming language. It is useful when a
command or code block should only be executed when a certain condition is met.
The if-statement has the following general syntax. Note the indentation of the code
block to be executed if the condition returns True. As with the for-loop, the
if-statement does not need to be closed at the end.
if (condition_returns_True):
    (do_something)
If the condition returns False, then an alternative command or code block can be
offered as follows.
if (condition_returns_True):
    (do_something)
else:
    (do_something_else)
Operator    Description
==          equal to
!=          not equal to
>           greater than
<           less than
>=          greater than or equal to
<=          less than or equal to
1 a = 3
2
3 if a == 3:
4     print(a, 'is equal to 3.')
5
6 if a > 5:
7     print(a, 'is greater than 5.')
8 else:
9     print(a, 'is less or equal to 5.')
3 is equal to 3.
3 is less or equal to 5.
The variable a is set to 3 in line 1. In the first if-statement in line 3 the comparison
operator == is used to test whether the variable a is equal to 3. As the test returns
True, the indented print() command in line 4 is executed, resulting in the first output,
3 is equal to 3.
In line 6 the > comparison operator is used to test whether a is greater than 5. As this
test returns False, the print statement in line 7 is not executed. Instead, the print
statement in line 9 is executed, resulting in the second output, 3 is less or equal to 5.
In addition to single test conditions, multiple test conditions can be combined using
the Python logical operators and, or and not, as listed in Table 7.1.5.2.1. Note that a
condition evaluates to a Boolean value (see Section 7.1.2.6), which the if-statement
then tests.
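A short sketch of the logical operators and, or and not in if-statements (the temperature value and thresholds are illustrative):

```python
t = 23.6  # air temperature in degrees Celsius (illustrative value)

# and: the combined condition is True only if both conditions are True
if t > 20 and t < 30:
    print(t, 'is between 20 and 30.')

# or: the combined condition is True if at least one condition is True
if t < 0 or t > 35:
    print(t, 'is outside the range 0 to 35.')
else:
    print(t, 'is within the range 0 to 35.')

# not: inverts a Boolean value
print(not t > 20)  # False

# A condition on its own evaluates to a Boolean object
cond = t > 20 and t < 30
print(cond, type(cond))  # True <class 'bool'>
```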
1 import numpy as np
2
3 a = np.arange(8)
4 print(a)
5
6 print(a[10])
[0 1 2 3 4 5 6 7]
Traceback (most recent call last):
File "7_python_error.py", line 6, in <module>
print(a[10])
IndexError: index 10 is out of bounds for axis 0 with size 8
The first line of the output is a sequence of numbers from 0 to 7. It comes from the
print statement in line 4 of the code where the variable a holding a one-dimensional
NumPy array is printed.
After that print statement something went wrong when running the code and Python
returns a Traceback with some details. It tells the user that in line 6 of the code in File
"7_python_error.py" something went wrong. It even prints out line 6 in the following
line for reference (print(a[10])).
The last line usually presents some information about what exactly went wrong. In
this case there is an IndexError: index 10 is out of bounds for axis 0 with size 8. The
variable a is a NumPy array with 8 elements (indices 0 to 7) as defined in line 3 of the
code. Trying to print the element at index 10 (the 11th element, as indexing starts at
0) in line 6 causes the error.
When analysing error messages, first examine the first lines of the traceback
and identify the line in the code where the (first) error occurred. Second,
examine the last line of the error message, which indicates what went wrong.
There may be many lines between the first and last line, caused by subsequent
failures in dependent functions, which can be ignored most of the time.
Frequent print() statements in the code can help to identify problems associated with
variables. These could include the following for a variable named var.
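The list of checks is not reproduced here. As an assumption, typical checks for a NumPy array named var might look like the following (the array contents are illustrative):

```python
import numpy as np

var = np.arange(12).reshape(3, 4)  # illustrative data

print(type(var))             # <class 'numpy.ndarray'>
print(var.shape)             # (3, 4)
print(var.dtype)             # data type of the elements (platform-dependent integer type)
print(var.min(), var.max())  # 0 11
```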
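While developing code it is also sometimes useful to stop the code at a given point.
This can be done using the exit() function. In scripts, sys.exit() from the sys module
is the more robust choice, as exit() is mainly intended for the interactive prompt.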
output from different climate models (looping over models) or files organised by
certain time criteria such as years, months, days and hours.
The filenames can be created manually or programmatically and some solutions are
discussed in the following sub-sections.
If the number of files considered is small then the filenames may be constructed
manually as demonstrated in the following example.
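A minimal sketch consistent with the description and the output below; the directory layout is inferred from the printed paths, and the statements are placed so that their line positions match the line numbers referred to in the text:

```python
# root directory where the data are stored
datadir = '/home/data/model/cmip5/'
mlist = ['CanESM2', 'CCSM4', 'HadCM3', 'inmcm4', 'Miroc4']

# loop over models; c is the index, f the model name
for c, f in enumerate(mlist):
    # the model name appears as a directory name and in the filename
    ifile = datadir + f + '/rcp85/mon/Amon/r1i1p1/tas_' + f + '_rcp85.nc'

    # check that the full path was constructed correctly
    print(c, ifile)
```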
In line 2 the variable datadir is defined which holds the full path to the root of where
the data are stored. In line 3 a list named mlist is created which holds a number of
model names.
The loop initiated in line 6 loops over each element of mlist. With each loop iteration
the variable f holding the model name as well as the counter c will change. The
enumerate() built-in function is used here in order to have access to both the element
of the list (f) and its associated index (c).
In line 8 the full path to the file is constructed by joining different strings together.
Note that the model name stored in the variable f appears twice in the path. First, as
a directory name and second in the filename (see output below).
To check that the full path to the file was constructed correctly the counter c and the
full path saved in ifile are printed in line 11. The output from the code above is as
follows.
0 /home/data/model/cmip5/CanESM2/rcp85/mon/Amon/r1i1p1/tas_CanESM2_rcp85.nc
1 /home/data/model/cmip5/CCSM4/rcp85/mon/Amon/r1i1p1/tas_CCSM4_rcp85.nc
2 /home/data/model/cmip5/HadCM3/rcp85/mon/Amon/r1i1p1/tas_HadCM3_rcp85.nc
3 /home/data/model/cmip5/inmcm4/rcp85/mon/Amon/r1i1p1/tas_inmcm4_rcp85.nc
4 /home/data/model/cmip5/Miroc4/rcp85/mon/Amon/r1i1p1/tas_Miroc4_rcp85.nc
The Unix find command is an extremely versatile tool that allows the user to tailor search patterns in many ways (see Section 3.6.7).
The following example illustrates how the Unix find command can be used to create
a sorted list of input filenames which is subsequently used in a loop.
1 import subprocess
2
3 # create sorted list of input files
4 cmd = 'find ../data -iname "*.nc"'
5 process = subprocess.Popen([cmd], shell=True, stdout=subprocess.PIPE)
6 output = process.communicate()[0]
7 flist = output.split()
8 flist.sort()
9
10 # loop over input files
11 for counter, f in enumerate(flist):
12     print(counter, f.decode())
In line 1 the subprocess module is imported. The complete find command is saved in
a variable named cmd in line 4. In this example, files ending with .nc are searched for.
It is recommended to test the complete find command on the Unix command line
before running the Python code to make sure the command works as expected.
In line 5 the find command is executed using the subprocess.Popen() function as
described in Section 7.x.x.
The process.communicate() method used in line 6 waits for the command to finish and returns a tuple (stdout_data, stderr_data); its first element, the captured standard output, is saved in a variable named output.
The output variable contains all filenames as a single byte string, which is why its split() method is used in line 7 to create a list named flist where each list element corresponds to a single filename. Note that split() separates at whitespace, so filenames containing spaces would be broken up.
The find command returns the filenames in the order in which it traverses the directory tree, which is generally not alphabetical. The list is therefore sorted in line 8 using the list method flist.sort().
The loop in line 11 iterates over each element of the sorted list (flist) allowing
the processing of each file using, for instance, a CDO command as demonstrated
in Section 7.x.x.
In the above example the index counter and the corresponding list element are printed inside the loop (line 12). Note that the list elements are byte strings (type bytes), which are converted into normal strings (type str) using the decode() method.
The output from the above code may look similar to the following.
0 ../data/ERAI_sh_1997_P.nc
1 ../data/ERAI_sh_1997_potT.nc
2 ../data/ERAI_sh_1997_potVort.nc
3 ../data/ERAI_sh_1997_sigma.nc
4 ../data/HadISST_sst.nc
5 ../data/HadISST_sst_Nov1997.nc
6 ../data/HadISST_sst_Nov1997_anom.nc
7 ../data/HadISST_sst_Nov_ltm.nc
8 ../data/InSalah.SYNOP.wspd10m.nc
9 ../data/InSalah_wpsd10m_seasonal_cycle.nc
10 ../data/SYNOP_InSalah_wpsd10m_9utc_jul_1985_2019.nc
11 ../data/Sahel_JAS_pre.nc
12 ../data/Sahel_JAS_pre_anom.nc
13 ../data/cru_ts4.02.1979.2015.tmp.dat.nc
14 ../data/era5_u_3d_bodele_2018_12.nc
15 ../data/era5_v_3d_bodele_2018_12.nc
16 ../data/era5_z_bodele_20050301_1200.nc
17 ../data/foo.nc
18 ../data/test.nc
19 ../data/tmp_ltm.nc
20 ../data/tmp_timeseries.nc
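On Python 3.7 or later, a similar listing can be sketched with subprocess.run() instead of subprocess.Popen(). This is an alternative, not the approach used above; it assumes a Unix system with find on the PATH, and a temporary directory with a few empty files stands in for ../data. Passing the command as a list avoids shell=True, and text=True returns strings so that decode() is not needed:

```python
import pathlib
import subprocess
import tempfile

# hypothetical demo directory with a few empty files (stands in for ../data)
datadir = tempfile.mkdtemp()
for name in ['b.nc', 'a.nc', 'notes.txt']:
    pathlib.Path(datadir, name).touch()

# command passed as a list of arguments, so no shell is involved;
# check=True raises an exception if find returns a non-zero exit code
result = subprocess.run(['find', datadir, '-iname', '*.nc'],
                        capture_output=True, text=True, check=True)

# text=True gives str output; one filename per line
flist = sorted(result.stdout.splitlines())

for counter, f in enumerate(flist):
    print(counter, f)
```

Splitting with splitlines() rather than split() keeps filenames that contain spaces intact.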
The Python ‘glob’³ module provides Unix shell-style filename pattern matching, which covers simple uses of the find command, but it does not offer all of its functionality (for example, searching for files by file size).
³https://docs.python.org/2/library/glob.html
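For simple name patterns, a sorted file list can be sketched with glob alone. As before, a temporary directory (with a hypothetical subdirectory named sub) stands in for ../data; the '**' pattern combined with recursive=True also searches subdirectories, similar to find:

```python
import glob
import os
import pathlib
import tempfile

# hypothetical demo directory tree standing in for ../data
datadir = tempfile.mkdtemp()
os.makedirs(os.path.join(datadir, 'sub'))
for name in ['b.nc', 'a.nc', os.path.join('sub', 'c.nc'), 'notes.txt']:
    pathlib.Path(datadir, name).touch()

# '**' with recursive=True matches the top directory and all subdirectories
flist = sorted(glob.glob(os.path.join(datadir, '**', '*.nc'), recursive=True))

for counter, f in enumerate(flist):
    print(counter, f)
```

Unlike find -iname, glob matching is case-sensitive on Unix systems.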
Consider the following list of TRMM (version 3B42) precipitation data files covering the period 1 January 2005 to 31 December 2005, with one file every 3 hours. The directory structure is YYYY/MM/DD/, with each directory containing the data for one day (8 files), totalling 2920 files (365 days × 8 files).
2005/01/01/2005010100.trmm.3b42.nc
2005/01/01/2005010103.trmm.3b42.nc
2005/01/01/2005010106.trmm.3b42.nc
2005/01/01/2005010109.trmm.3b42.nc
2005/01/01/2005010112.trmm.3b42.nc
2005/01/01/2005010115.trmm.3b42.nc
2005/01/01/2005010118.trmm.3b42.nc
2005/01/01/2005010121.trmm.3b42.nc
...
2005/12/31/2005123100.trmm.3b42.nc
2005/12/31/2005123103.trmm.3b42.nc
2005/12/31/2005123106.trmm.3b42.nc
2005/12/31/2005123109.trmm.3b42.nc
2005/12/31/2005123112.trmm.3b42.nc
2005/12/31/2005123115.trmm.3b42.nc
2005/12/31/2005123118.trmm.3b42.nc
2005/12/31/2005123121.trmm.3b42.nc
The following things have to be considered when writing a loop that iterates over the files. The number of days differs between months (28, 30 or 31). With each loop iteration the month and day change in both the directory path and the filename, while the hour changes only in the filename. The month, day and hour details also have to be in the correct two-character format (MM, DD and hh).
One solution is to create a list of date objects covering the whole period in 3-hourly
timesteps, then iterate over this list and extract the month, day and hour information
in the correct format. The following code example does exactly that.
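A minimal sketch of such code, assuming daterange() is written as a generator (using yield) and that the paths are relative to the current directory; the line positions match the line numbers referred to in the explanation that follows:

```python
from datetime import datetime, timedelta

# hypothetical generator: yields 3-hourly datetimes from date_start to date_end
def daterange(date_start, date_end):
    while date_start <= date_end:
        yield date_start
        date_start = date_start + timedelta(hours=3)

# first and last timestep of 2005
date_start = datetime(2005, 1, 1, 0)
date_end = datetime(2005, 12, 31, 21)

for dt in daterange(date_start, date_end):
    # month, day and hour as two-character strings
    mm = dt.strftime('%m')
    dd = dt.strftime('%d')
    hh = dt.strftime('%H')

    # path and filename: YYYY/MM/DD/YYYYMMDDhh.trmm.3b42.nc (year fixed to 2005)
    ifile = '2005/' + mm + '/' + dd + '/2005' + mm + dd + hh + '.trmm.3b42.nc'

    # print the path and filename
    print(ifile)
```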
In the first line the datetime and timedelta classes are imported from the datetime module.
A function called daterange is defined in lines 4 to 7, which takes the two arguments date_start and date_end. Both arguments have to be datetime objects. While date_start is less than or equal to date_end (line 5), the variable date_start is updated with each iteration by adding 3 hours to it using timedelta() (line 7).
The daterange function is then used in a for-loop in line 13, with date_start and date_end as defined in lines 10 and 11, respectively. With each iteration of the loop the variable dt changes to the next date. The month, day and hour information is extracted from the date object dt in the correct two-character string format using the strftime() method in lines 15, 16 and 17, respectively.
The path and filename are constructed in line 20 and printed in line 23.
To read data from a netCDF file, the Dataset class from the netCDF4 module can be used. Line 1 in the code example below imports Dataset from the netCDF4 module. In line 3 the netCDF file erai_t2m.nc is opened in read-only mode ('r') using Dataset, creating a file handle f. In lines 4 to 6 the variables longitude, latitude and t2m are read in, respectively. Check the netCDF input file for the correct variable names using tools such as ncdump or CDO. In line 7 the units attribute of the variable t2m is read in by appending .units to the variable (see the netCDF file header for the available variable attributes). In general, it is good practice to close the file once all variables and attributes of interest have been read in (line 8).
The netCDF4 module is backward compatible with the classic netCDF3 format and can also be used with netCDF-4 files, which are stored using the HDF5 library.
After the netCDF file is closed the variables lons, lats, t2m and t2mu can be used in
the remaining part of the script. While the variables lons, lats and t2m are NumPy
arrays the variable t2mu is of the type string.
Sometimes climate data are made available as formatted ASCII files (see Section 2.5.1). The data values tend to be organised in rows and columns, sometimes with a few lines at the beginning of the file known as the file header. If the values in each row are separated by commas, the files are called comma-separated values (CSV) files and usually carry the file extension .csv (although this convention is not always followed). Other separators, such as tabs or white space, are also possible.
The following is an example of a CSV file listing date, time, wind speed and wind
direction information for every hour of the year 2011. The file includes two lines at
the beginning (the file header) providing the station ID and the column headers.
Station ID 65340
date [YYYY/MM/DD], time [hours], wind speed [m/s], wind direction [sector]
2011/01/01, 0, 1.5, N
2011/01/01, 1, 1.8, NE
2011/01/01, 2, 2.1, N
2011/01/01, 3, 2.6, N
2011/01/01, 4, 3.7, NW
2011/01/01, 5, 5.2, W
...
2011/12/31,22, 0.2, W
2011/12/31,23, 0.5, W
2011/12/31,24, 0.3, W
The following Python code reads in the CSV file, assuming that the data are saved in a file named wspd_2011.csv. The numpy module is imported in line 1 and given the alias np. Line 3 assigns the input filename to the variable ifile. In lines 4 to 7 the actual data values are read into the variables d, t, wspd and wdir, respectively. The np.loadtxt() function is given several arguments (in parentheses) that tell it how to read in the data: the input filename (ifile), the data type (dtype), the delimiter (delimiter), the number of rows to skip at the beginning of the file (skiprows) and the column to read in (usecols, counted from 0).
1 import numpy as np
2
3 ifile = 'long/path/to/file/wspd_2011.csv'
4 d = np.loadtxt(ifile, dtype=str, delimiter=',', skiprows=2, usecols=(0,))
5 t = np.loadtxt(ifile, dtype=int, delimiter=',', skiprows=2, usecols=(1,))
6 wspd = np.loadtxt(ifile, dtype=float, delimiter=',', skiprows=2, usecols=(2,))
7 wdir = np.loadtxt(ifile, dtype=str, delimiter=',', skiprows=2, usecols=(3,))
The data are now available for the remaining part of the code as NumPy arrays d, t,
wspd and wdir.
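Note that the code above reads the file four times. An alternative sketch reads the file only once, assuming all columns are first read as strings and converted afterwards; here an io.StringIO object holding the first lines of the example stands in for the real file (np.loadtxt() accepts file-like objects as well as filenames):

```python
import io
import numpy as np

# first lines of the example file; in practice use the filename instead,
# e.g. ifile = 'long/path/to/file/wspd_2011.csv'
ifile = io.StringIO(
    'Station ID 65340\n'
    'date [YYYY/MM/DD], time [hours], wind speed [m/s], wind direction [sector]\n'
    '2011/01/01, 0, 1.5, N\n'
    '2011/01/01, 1, 1.8, NE\n'
    '2011/01/01, 2, 2.1, N\n'
)

# read all four columns in one go, as strings, skipping the two header lines
data = np.loadtxt(ifile, dtype=str, delimiter=',', skiprows=2)

# remove the spaces after each comma, then convert columns to the required types
data = np.char.strip(data)
d = data[:, 0]
t = data[:, 1].astype(int)
wspd = data[:, 2].astype(float)
wdir = data[:, 3]
```

np.char.strip() removes the leading spaces before the numeric columns are converted with astype().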
Figure 7.2.3.3.1: The PIBAL data entry spreadsheet. Cells with a light green background colour can
be edited, while other cells are calculated automatically.
Reading data from an Excel spreadsheet into Python can be done using the openpyxl⁴
module. In the following code example the method for reading in data from the above
Excel spreadsheet is demonstrated.
⁴https://openpyxl.readthedocs.io/en/stable/
1 import numpy as np
2 from openpyxl import load_workbook
3
4 # open Excel file and select worksheet P01
5 wb = load_workbook('../data/pibal_data.xlsx', data_only=True)
6 ws = wb['P01']
7
8 # read in date, time and location
9 d = ws.cell(row=3, column=2).value
10 t = ws.cell(row=4, column=2).value
11 loc = ws.cell(row=5, column=2).value
12
13 # create empty numpy array variables
14 alt = np.array([], dtype='float64')
15 wspd = np.array([], dtype='float64')
16 wdir = np.array([], dtype='float64')
17
18 # iterate over rows 8 to 39; read altitude, wind speed and wind direction
19 for row in range(8, 40):
20     alt = np.append(alt, np.float64(ws.cell(row=row, column=2).value))
21     wspd = np.append(wspd, np.float64(ws.cell(row=row, column=8).value))
22     wdir = np.append(wdir, np.float64(ws.cell(row=row, column=12).value))
In line 1 numpy is imported and in line 2 the load_workbook function is imported from the openpyxl module. The function is used in line 5 to open the Excel spreadsheet pibal_data.xlsx and create the handle wb. Setting data_only to True ensures that the cell values (as last calculated and saved by Excel) are read in rather than the underlying formulas, which are returned by default.
An Excel file can have several worksheets. In line 6 a handle ws is created to the
worksheet named P01.
In lines 9 to 11 the values of three specific cells that hold date, time and location
information are read in. The ws.cell() function expects the row and column numbers
associated with the cell of interest to be specified (compare the specified row and
column numbers with Figure 7.2.3.3.1 for clarity). Note that the variables d, t and loc
are of the Python native variable type string.
The ws.cell() function reads a single cell only and does not allow a range of cells to be specified and read in (openpyxl does, however, provide the ws.iter_rows() method for iterating over a block of cells). In order to read in the altitude, wind speed and wind direction values the following approach may be applied. In lines 14 to 16 the empty NumPy arrays alt for altitude, wspd for wind speed and wdir for wind direction are created. They are of the NumPy data type float64.
In line 19 a for-loop is initiated which loops over rows 8 to 39 in the Excel spreadsheet
(Figure 7.2.3.3.1). With each iteration of the loop the altitude, wind speed and wind
direction values are read in and appended to the variables alt, wspd and wdir using
the np.append() function (lines 20 to 22).
Depending on how a cell is formatted, ws.cell().value may return a number, a string or None (for an empty cell). The values are therefore converted to NumPy float64 on the fly using the np.float64() function inside the np.append() function.
The NumPy variables alt, wspd and wdir now hold the data from the Excel spreadsheet
and are available for further analysis or plotting. Code examples for plotting the data
from the above spreadsheet example can be found in Section 7.5.1 and Section 7.5.2.
1 import subprocess
2
3 cmd = 'cdo -b F64 vertsum ../data/ERAI_sh_1997_P.nc ../data/test.nc'
4 process = subprocess.Popen([cmd], shell=True, stdout=subprocess.PIPE)
5 process.communicate()
6
7 # print return code (0 = success)
8 print('return code:', process.returncode)
In line 1 the subprocess module is imported. Line 3 stores the complete CDO command to be executed in a variable named cmd.
In line 4, the subprocess.Popen() function is used to execute the command, creating a handle named ‘process’. The first argument passed to subprocess.Popen() is the command to be executed (cmd). The shell keyword is set to True, meaning that the command is passed on to the shell (see Section 3.3.2) as is (check the security considerations in the documentation). To capture the command output, the stdout keyword is set to subprocess.PIPE. The process.communicate() method in line 5 waits for the command to finish, after which its return code is available as process.returncode and is printed in line 8 (0 indicates success).
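For comparison, the same pattern can be sketched with subprocess.run() (Python 3.7 or later). To keep the example runnable on systems without CDO, the Python interpreter itself is called as a stand-in command; for the CDO example the list would be ['cdo', '-b', 'F64', 'vertsum', '../data/ERAI_sh_1997_P.nc', '../data/test.nc']. Passing the arguments as a list with the default shell=False avoids the security issues associated with shell=True:

```python
import subprocess
import sys

# stand-in command (hypothetical): run the Python interpreter itself so the
# example works without CDO installed
cmd = [sys.executable, '-c', 'print("hello from the subprocess")']

# arguments as a list, no shell; run() waits for the command to finish
result = subprocess.run(cmd, capture_output=True, text=True)

print('return code:', result.returncode)   # 0 = success
print('output:', result.stdout.strip())
```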
import numpy as np
In the following subsections, the main features of NumPy are very briefly introduced, including how to create number arrays and how to index them. Many far more comprehensive introductions are available as video tutorials, webpages and books, and it is worth spending some time exploring them.
Table 7.3.1.1: Examples for functions frequently used to create NumPy arrays.
Function Description
np.array([1, 5, 87, 3]) Returns a one-dimensional array with the given values.
np.arange(5) Returns a sequence of numbers from 0 to 4.
np.zeros(5) Returns a 5-element array filled with 0.
np.empty([3, 2]) Returns a 3 by 2-element array whose values are not initialised (arbitrary).
np.full((2, 2), 5) Returns a 2 by 2-element array filled with the value 5.
Most of the time, however, this is not necessary as many Python packages have
integrated NumPy and generate NumPy arrays as output.
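The functions in Table 7.3.1.1 can be tried out directly. A short sketch that creates each array and prints its shape and data type:

```python
import numpy as np

a = np.array([1, 5, 87, 3])   # one-dimensional array with set values
b = np.arange(5)              # [0 1 2 3 4]
c = np.zeros(5)               # five float64 zeros
e = np.empty([3, 2])          # 3 x 2 array, values uninitialised (arbitrary)
g = np.full((2, 2), 5)        # 2 x 2 array filled with 5

for name, arr in [('a', a), ('b', b), ('c', c), ('e', e), ('g', g)]:
    print(name, arr.shape, arr.dtype)
```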