PYTHON IN EARTH SCIENCE

A BRIEF INTRODUCTION

by

Sujan Koirala and Jake Nelson

Version 1.0

February, 2017.

Department of Biogeochemical Integration,


Max Planck Institute for Biogeochemistry
Jena, Germany
FOREWORD

This document is a summary of our experiences in learning to use Python over the last
several years. It is not intended to be a standalone document that will help the user to
solve every problem. What we hope is to encourage new users to delve into a wonderful
programming language.

Sujan Koirala and Jake Nelson


skoirala@bgc-jena.mpg.de
jnelson@bgc-jena.mpg.de
Jena, Germany
February, 2017

CONTENTS

1 Installation and Package Management
    1.1 Introduction
        1.1.1 Python, a brief history
        1.1.2 Python 2, and Python 3
    1.2 Environments and packages
        1.2.1 Using other people’s code
        1.2.2 Which package manager to use?
        1.2.3 Versions, packages, environments, why so complicated?
    1.3 Installing Anaconda
        1.3.1 Windows installation notes
        1.3.2 OSX installation notes
        1.3.3 Linux installation notes
    1.4 Creating your first environment
        1.4.1 Installing a package

2 Python Data Types
    2.1 Basic Data Types
        2.1.1 Boolean Data
        2.1.2 Numbers
        2.1.3 Strings
        2.1.4 Bytes
    2.2 Combined Data Types
        2.2.1 Lists
        2.2.2 Tuples
        2.2.3 Sets
        2.2.4 Dictionaries
        2.2.5 Arrays

3 Input/Output of files
    3.1 Read Text File
        3.1.1 Plain Text
        3.1.2 Comma Separated Text
        3.1.3 Unstructured Text
    3.2 Save Text File
    3.3 Read Binary Data
    3.4 Write Binary Data
    3.5 Read NetCDF Data
    3.6 Write NetCDF Data
    3.7 Read MatLab Data
    3.8 Read Excel Data

4 Data Operations in Python
    4.1 Size and Shape
    4.2 Slicing and Dicing
    4.3 Built-in Mathematical Functions
    4.4 Matrix operations
    4.5 String Operations
    4.6 Other Useful Functions

5 Essential Python Scripting
    5.1 Control Flow Tools
        5.1.1 if Statement
        5.1.2 for Statement
        5.1.3 while Statement
        5.1.4 break and continue
        5.1.5 range
    5.2 Python Functions
    5.3 Python Modules
    5.4 Python Classes
    5.5 Additional Relevant Modules
        5.5.1 sys Module
        5.5.2 os Module
        5.5.3 Errors and Exceptions

6 Advanced Statistics and Machine Learning
    6.1 Quick overview
        6.1.1 required packages
        6.1.2 Overview of data
    6.2 Import and prepare the data
    6.3 Setting up the gapfillers
    6.4 Actually gapfilling
    6.5 And now the plots!
        6.5.1 scatter plots!
        6.5.2 Distributions with KDE
    6.6 Bonus points!

7 Data Visualization and Plotting
    7.1 Plotting a simple figure
    7.2 Multiple plots in a figure
    7.3 Plot with Dates
    7.4 Scatter Plots
    7.5 Playing with the Elements
    7.6 Map Map Map!
        7.6.1 Global Data
        7.6.2 Customizing a Colorbar

1. INSTALLATION OF PYTHON AND PACKAGE MANAGEMENT

This chapter provides information on the installation of core Python and the
addition of packages.

1.1. INTRODUCTION
If you are currently using a recent Mac or Linux operating system, open a terminal and
type,
:~ $ python
and you should see something like,
Python 2.7.12
Type "help", "copyright", "credits" or "license" for more information.
>>>
You have just entered the native installation of Python on your computer, no extra
steps needed. This is because, though it is a great tool for earth science and data
analytics, Python is a general purpose language that is used by all sorts of programs
and utilities. While it is nice that Python is such an open and widely used tool, one should
also take care that this native installation is not modified to the point that the other
useful and essential utilities that depend on it are disrupted. For instance, a package or
command may no longer be installed where the operating system originally put it.
For this reason, this chapter will outline how to install a modern version of Python, as
well as many packages useful for data science, in a tidy environment all its own.

1.1.1. PYTHON, A BRIEF HISTORY


As the story goes, in 1989 Guido van Rossum decided he needed something to do over
the Christmas holidays, and instead of reading a nice book or learning to brew his own
beer, he decided to develop a scripting language, which he named Python
after Monty Python’s Flying Circus. Since then, Python has come to be known for several
core principles, most notable of which are the focus on readability and requiring fewer
lines of code. Because of this, lines of code can almost be read as a sentence in plain
English. For example, if I would like to add one to every number in my list, I would write,
>>> [number+1 for number in MyList]
Though this may look daunting if you are new to coding, if you read it out loud you
can almost hear what it does. Along the same lines, the Python philosophy tends not to
be that there are many clever ways to do one thing, but that there is one very clear way.
Because of these ideals, Python can be a very useful and rewarding coding language to learn,
which is reflected in its popularity.
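
To make the one-line example above concrete, here is a minimal sketch that can be pasted into a Python 3 console. The values in MyList and the name plus_one are illustrative assumptions, not part of the original text.

```python
# An example list; the values are purely illustrative.
MyList = [1, 2, 3]

# Read aloud: "number plus one, for each number in MyList".
plus_one = [number + 1 for number in MyList]

print(plus_one)  # [2, 3, 4]
print(MyList)    # [1, 2, 3]  (the original list is left unchanged)
```

The expression builds a new list and does not modify MyList, which is why both lists can be printed afterwards.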

1.1.2. PYTHON 2, AND PYTHON 3


As you start in Python, you will quickly find yourself wondering why there are two
different versions being used. Python 2 was released in 2000 as the first major update,
and many programs have been written using this flavor. However, in 2010 with the
release of version 2.7, it was announced that Python 2 would be phased out in favor of
the new Python 3, so there is no plan for a version 2.8. This major update from 2 to
3 was made to change some small yet significant things in the language, such as how
it handles text data and iterates through lists and dictionaries. The idea is that it is
better to update a language to fix things than to keep dealing with small bugs because
of a refusal to change. As Python 2 is scheduled to be retired in the next few years, this
manual will focus on using Python 3. This does mean that your Python 3 code may
not work with your native Python 2 installation, but in the realm of data science, as
you will be using so many specialized packages of code, this would be the case anyway.
In the end, you will be using a self-contained Python environment that contains your
Python installation, as well as all the code you will be using, in one neat little box.

1.2. ENVIRONMENTS AND PACKAGES

1.2.1. USING OTHER PEOPLE’S CODE
As Python is a general purpose language, the basic functionality out of the box is also
very general: things such as basic math, file manipulation, and printing output. So
if you want to do anything beyond what is defined in the core language, you need to
write your own little bit of code to do it. However, as you are taking a Python course,
you can assume that the first time you need a bit of code that core Python doesn’t
have built in, something like calculating the standard deviation of a set of numbers,
someone else will probably have run into the same issue before you. Luckily, the Python
community is very active in writing these bits of code and sharing them so that you
don’t have to write every function from scratch. Not only that, many of these little bits
of code have been bundled into large collections of code called packages. For example,
the mean, median, standard deviation, percentile, and other statistical functions are
already built into a package called NumPy (Numerical Python), which gives you access
to a whole bunch of bits of code. On top of that, there are entire package managers that
will take care of downloading and installing a package, as well as making sure it
plays nicely with all the other packages you are using. All you have to do is tell the
manager which package you want!

1.2.2. WHICH PACKAGE MANAGER TO USE?


Probably the most common package manager is called pip. pip is a wonderfully useful
tool that is widely supported, which you will not use. Instead you will use Anaconda
for the following reasons:

• Anaconda is designed for data science.

• Anaconda will handle not only the Python packages, but also non-Python things such
as HDF5 (which allows you to read some data files) and the Math Kernel Library.
It will even manage an R installation.

• Anaconda also manages environments, which:

    - Keep your Python installations working together.

    - Keep separate collections of packages in case some don’t work well together.

    - Are duplicatable and exportable, so your work can be replicated.

1.2.3. VERSIONS, PACKAGES, ENVIRONMENTS, WHY SO COMPLICATED?


Though this all may seem a bit complicated to just make a plot or do some math, it
becomes necessary because of two main issues: the computer needs to know where to
look for things, and what to call them.
Just like when you go back to look at the wonderful photos you took on vacation
3 years ago only to find a giant mess of folders and sub-folders to go through, your
computer also has to look through all its memory to find where a bit of code might be
located. When properly managed, all the files are put in the appropriate place, where
the computer can easily find them. Similarly, if you have a file in the folder Photos/
called MyBestPicture.jpg, and a different file in the folder Photos2/ called
MyBestPicture.jpg, then when you tell your computer you want MyBestPicture.jpg, it has
no idea which one you mean. By using these tools, you keep everything nice and
tidy.

1.3. INSTALLING ANACONDA


Anaconda is a commercially maintained package manager designed for data science.
Its maintainers have made it quite easy to install on Windows, Mac, and Linux. Simply go
to https://www.continuum.io/downloads, find your operating system, and download
the appropriate Python 3.6 version installer. Again, you
want to use version 3.6, but if you end up mixing up versions or already have another
version installed, don’t panic: you can create a Python 3.6 environment later.

1.3.1. WINDOWS INSTALLATION NOTES

Installation on Microsoft Windows is fairly straightforward, but can take quite some
time. Simply follow the graphical installer; the only thing to change is to uncheck
the option to register Anaconda as the default Python installation. Though this is not
as vital as with Unix based systems, it is still a good idea. After the long installation
finishes, you can access an Anaconda command line via Anaconda Prompt in the Start
Menu.

1.3.2. OSX INSTALLATION NOTES

Installation on OSX should be quite straightforward: simply follow the installation
guide of the graphical installer.

1.3.3. LINUX INSTALLATION NOTES

Once the file has been downloaded, open a terminal and navigate to where the file was
saved. The installer is a bash script, which can be run by entering
:~ $ bash Anaconda3-FILE-NAME.sh
where Anaconda3-FILE-NAME.sh is the name of your file. The installer will ask
you to review the licence information and agree. You will then be asked if you would
like to install Anaconda in another location, and you can simply install into the default
location. The installer will then proceed to install Anaconda on your machine. Once
the installation is complete, the installer will say "You may wish to edit your .bashrc
or prepend the Anaconda3 install location:", followed by a suggested command that
looks something like,
export PATH=/YOUR/PATH/TO/anaconda3/bin:$PATH
In order to make Anaconda work, you need to add the file path to Anaconda to a
variable the operating system uses called $PATH. To do this, you can add a modified
version of this line to a file called .bashrc in your home folder. Simply go to your home
folder and open the file .bashrc with a text editor, and at the end of the file add the
line,
export PATH=$PATH:/YOUR/PATH/TO/anaconda3/bin
where /YOUR/PATH/TO/anaconda3/bin is the same path that Anaconda suggested
at the end of installation. If you forgot it, it should be something like
/home/YOURNAME/anaconda3/bin
You may notice that your path and $PATH have switched places. This is because
you want to add the Anaconda location to the end of $PATH, meaning that the operating
system looks in this folder last instead of first. This ensures that you don’t cause any
problems with the native Python installation.
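
The search order described above can be inspected from Python itself. The following is a minimal sketch using only the standard library; it reads the PATH variable and reports which python executable would be found first. The variable names are illustrative, and the printed directories depend entirely on your machine.

```python
import os
import shutil

# PATH is a single string of directories joined by os.pathsep
# (':' on Linux and Mac, ';' on Windows).
path_dirs = os.environ.get("PATH", "").split(os.pathsep)

# Directories are searched in order, so earlier entries take priority.
for position, directory in enumerate(path_dirs):
    print(position, directory)

# The first executable named "python" found along PATH, or None if there is none.
first_python = shutil.which("python")
print("python resolves to:", first_python)
```

If the Anaconda folder appears near the end of the printed list, the operating system will only look there after every earlier folder has been searched.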

1.4. CREATING YOUR FIRST ENVIRONMENT

First, you will verify that your Anaconda installation is working. To do so, open a
command line and simply type,
:~ $ conda
You should see a nice overview of how to use the conda command. If this is not
the case, either the installation didn’t work, or you might have a problem with your
PATH (where the computer looks for commands). But, if it worked, you can move on
to creating your first environment. You will name the environment CoursePy and you
will initially only require the numpy package. In the same command line, input:
:~ $ conda create --name CoursePy numpy
You will be asked if you would like to proceed in installing a bunch of new packages,
way more than numpy, and you can say yes. The reason so many new packages are
listed is the magic of a package manager. Basic Python 3 with the numpy package
actually depends on all these underlying packages, which Anaconda kindly figures
out for you. So now you have your nice new environment, and you can activate it by
entering
:~ $ source activate CoursePy
on Mac or Linux and
:~ $ activate CoursePy
on Windows.
Your command line should now tell you that you are in the CoursePy environment.
If you now open a Python console by typing python in the command line, the
version should now be 3.6.0. In this same manner, you can do things like duplicate and
export your environments, or make new environments with different packages or even
different Python versions.

1.4.1. INSTALLING A PACKAGE

Now that you are in your nice new environment, you can add any package you might
need. Open a command line and enter the CoursePy environment. To install the
Spyder package, for example, you simply enter,
:~ $ conda install spyder
Anaconda will list all the package changes it will make, and ask if you would like to
proceed. Confirm yes, then let the magic happen. Now you have the Spyder IDE, which
you can use to develop code (similar in concept to R Commander or the MATLAB IDE).
Anaconda has some nice documentation about how to use their software, including
how to search for packages not in their repositories, which we will not cover here. Now
that you have your installation and environment all sorted out, you can start to explore
Python itself a bit in the next chapters.
2. PYTHON DATA TYPES

This chapter provides information on the basic data types in Python. It also
introduces the basic operations used to access and manipulate the data.

In Python, there are various types of data. Every piece of data has a type and a value.
Every value has a fixed data type, but the type does not need to be declared beforehand.
The most basic data types in Python are:

1. Boolean: These are data which have only two values: True or False.

2. Numbers: These are numeric data.

3. Strings: These data are sequences of Unicode characters.

4. Bytes: An immutable sequence of numbers between 0 and 255.

Furthermore, these data types can be combined to produce the following types of
datasets:

1. Lists: Ordered sequences of values.

2. Tuples: Ordered but immutable, i.e. cannot be modified, sequences of values.

3. Sets: Unordered bags of values.

4. Dictionaries: Unordered bags of key-value pairs.

5. Arrays: Ordered sequences of data of the same type.

2.1. BASIC DATA TYPES

This section gives a brief description of the basic data types, their possible values, and
the operations that can be applied to them.

2.1.1. BOOLEAN DATA

These data are either True or False. If an expression can produce either a yes or a no
answer, booleans can be used to interpret the result. Such yes/no situations
are known as boolean contexts. Here is a simple example.

• Assign some variable (size) as 1.

In [1]: size = 1

• Check if size is less than 0.

In [2]: size < 0

Out[2]: False

✓ It is False because 1 is not less than 0.

• Check if size is greater than 0.

In [3]: size > 0

Out[3]: True

✓ It is True because 1 > 0.

True or False can also be treated as numbers: True=1 and False=0.
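
The numeric behaviour of booleans can be checked directly. The short sketch below assumes Python 3; the list sizes and the name n_positive are illustrative only.

```python
# bool is a subclass of int, so True and False take part in arithmetic.
print(True + True)   # 2
print(False * 10)    # 0

# This makes it easy to count how many values satisfy a condition.
sizes = [1, -2, 3, 0]
n_positive = sum(size > 0 for size in sizes)
print(n_positive)    # 2
```

Summing comparisons like this is a common way to count matching items without writing an explicit loop.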

2.1.2. NUMBERS
Python supports both integers and floating point numbers. There’s no type declaration
to distinguish them; Python tells them apart by the presence
or absence of a decimal point.

• You can use the type() function to check the type of any value or variable.
In [4]: type(1)

Out[4]: int

✓ As expected, 1 is an int.

In [5]: type(1.)

Out[5]: float

✓ The decimal point at the end makes 1. a float.

In [6]: 1+1

Out[6]: 2

✓ Adding an int to an int yields an int.

In [7]: 1+1.

Out[7]: 2.0

✓ Adding an int to a float yields a float. Python coerces the int into a float
to perform the addition, then returns a float as the result.

• An integer can be converted to a float using float() and a float can be converted to
an integer using int().
In [8]: float(2)

Out[8]: 2.0

In [9]: int(2.6)

Out[9]: 2

✓ Python truncates the float to an integer, so 2.6 becomes 2 instead of 3. To
round the float to the nearest integer, use round(). In Python 3, round() with a
single argument returns an int.
In [10]: round(2.6)

Out[10]: 3
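
The difference between truncating and rounding is easiest to see side by side. This is a small sketch for Python 3 using only built-in functions and the standard math module; the numbers are arbitrary examples.

```python
import math

# int() drops the fractional part, i.e. it truncates toward zero.
print(int(2.6))     # 2
print(int(-2.6))    # -2

# round() with one argument returns the nearest int in Python 3.
print(round(2.6))   # 3

# Exact halves are rounded to the nearest even integer.
print(round(2.5))   # 2
print(round(3.5))   # 4

# math.floor always rounds down and math.ceil always rounds up.
print(math.floor(2.6), math.ceil(2.2))   # 2 3
```

Rounding halves to the nearest even integer can be surprising at first, so it is worth keeping in mind when binning or counting data.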

NUMERICAL OPERATIONS
• The / operator performs division.
In [11]: 1/2

Out[11]: 0.5

In [12]: 1/2.

Out[12]: 0.5

✓ In Python 3, / always performs floating point division, even when both
operands are integers. (In Python 2, 1/2 returned 0.)

• The // operator performs floor division: the result is rounded down to the
nearest integer, i.e. toward negative infinity. For a positive result this is the same
as truncating, but a negative result moves away from zero. The result is a float if
either operand is a float, and an int otherwise.
In [13]: 1.//2

Out[13]: 0.0

In [14]: -1.//2

Out[14]: -1.0

• The ‘**’ operator means “raised to the power of”. 11 squared is 121.
In [15]: 11**2

Out[15]: 121

In [16]: 11**2.

Out[16]: 121.0

✓ The result is an int when both operands are ints, and a float when either
operand is a float.

• The ‘%’ operator gives the remainder after performing integer division.
In [17]: 11%2

Out[17]: 1

✓ 11 divided by 2 is 5 with a remainder of 1, so the result here is 1.
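
The division operators can be compared in one place. The sketch below assumes Python 3 and uses arbitrary example numbers; divmod() is a built-in function that returns the floor-division quotient and the remainder together.

```python
# True division always gives a float in Python 3.
print(7 / 2)     # 3.5

# Floor division rounds down, toward negative infinity.
print(7 // 2)    # 3
print(-7 // 2)   # -4

# The remainder takes the sign of the divisor.
print(7 % 2)     # 1
print(-7 % 2)    # 1

# divmod() returns the quotient and the remainder in one call.
quotient, remainder = divmod(11, 2)
print(quotient, remainder)   # 5 1
```

The quotient and remainder always satisfy quotient * divisor + remainder == dividend, which explains the results for negative numbers.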

FRACTIONS
To start using fractions, import the fractions module. To define a fraction, create a
Fraction object as
In [18]: import fractions
         fractions.Fraction(1, 2)

Out[18]: Fraction(1, 2)

You can perform all the usual mathematical operations with fractions as
In [19]: fractions.Fraction(1, 2)*2

Out[19]: Fraction(1, 1)
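
One reason to use fractions is that their arithmetic is exact, whereas floats carry small rounding errors. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction

# Fractions are reduced automatically and added exactly.
third = Fraction(1, 3)
sixth = Fraction(1, 6)
total = third + sixth
print(total)    # 1/2

# The same kind of sum with floats is only approximately right.
print(0.1 + 0.2 == 0.3)                                        # False
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))    # True

# A Fraction can be converted to a float when needed.
print(float(total))   # 0.5
```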

TRIGONOMETRY
You can also do basic trigonometry in Python using the math module.
In [20]: import math
         math.pi

Out[20]: 3.141592653589793

In [21]: math.sin(math.pi/2)

Out[21]: 1.0
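
The trigonometric functions in the math module expect angles in radians. The sketch below shows one way to work with degrees; the angle of 30 degrees is just an example, and math.isclose is used because floating point results are rarely exact.

```python
import math

# Convert an angle from degrees to radians before calling sin().
angle_deg = 30.0
angle_rad = math.radians(angle_deg)
sine = math.sin(angle_rad)

# The result is very close to 0.5, but may differ in the last digits.
print(sine)
print(math.isclose(sine, 0.5))   # True

# math.degrees() converts back from radians to degrees.
print(math.degrees(math.pi))     # 180.0
```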

2.1.3. STRINGS
In Python, all strings are sequences of Unicode characters. A string is an immutable
sequence and cannot be modified once it has been created.

• To create a string, enclose it in quotes. Python strings can be defined with either
single quotes (' ') or double quotes (" ").
In [22]: s = 'sujan'

In [23]: s = "sujan"

• The built-in len() function returns the length of the string, i.e. the number of
characters.
In [24]: len(s)

Out[24]: 5

• You can get individual characters out of a string using index notation. Indices
start at 0, so s[1] is the second character.
In [25]: s[1]

Out[25]: 'u'

• You can concatenate strings using the + operator.

In [26]: s + ' ' + 'koirala'

Out[26]: 'sujan koirala'

✓ Even the space has to be specified explicitly, as a string containing a single space.
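
The following sketch illustrates indexing, slicing, and the immutability of strings in Python 3. The names capitalized and full_name are illustrative.

```python
s = 'sujan'

# Indexing and slicing read characters without changing the string.
print(s[0])     # s
print(s[-1])    # n
print(s[1:4])   # uja

# Assigning to an index fails because strings are immutable.
try:
    s[0] = 'S'
except TypeError as error:
    print('Cannot modify a string:', error)

# String methods return a new string and leave the original untouched.
capitalized = s.capitalize()
print(capitalized)   # Sujan
print(s)             # sujan

# Concatenation also builds a new string.
full_name = s + ' ' + 'koirala'
print(full_name)     # sujan koirala
```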

2.1.4. BYTES
An immutable sequence of numbers between 0 and 255 is called a bytes object. Each
byte within the bytes object can be written as an ASCII character or as an escaped
hexadecimal number from \x00 to \xff (0–255).

• To define a bytes object, use the b' ' syntax. This is commonly known as “byte
literal” syntax.
In [27]: by = b'abcd\x65'
         by

Out[27]: b'abcde'

✓ \x65 is 'e'.

• Just like strings, you can use the len() function and the + operator to concatenate
bytes objects. But you cannot join strings and bytes.
In [28]: len(by)

Out[28]: 5

In [29]: by += b'\x66'
         by

Out[29]: b'abcdef'
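
Strings and bytes can be converted into each other with encode() and decode(). The sketch below assumes plain ASCII content and shows the error raised when the two types are mixed; the variable names are illustrative.

```python
by = b'abcd\x65'

# decode() turns bytes into a string, encode() does the reverse.
text = by.decode('ascii')
print(text)          # abcde
back = text.encode('ascii')
print(back == by)    # True

# Indexing a bytes object returns the number stored in that byte.
print(by[0])         # 97

# Joining bytes and a string directly raises a TypeError.
try:
    mixed = by + 'f'
except TypeError as error:
    print('Cannot join bytes and str:', error)
```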

2.2. COMBINED DATA TYPES

The basic data types explained in the previous section can be arranged in sequences
to create combined data types. Some combined data types, e.g. lists, can be modified,
while others, e.g. tuples, are immutable and cannot be modified. This section provides
a brief description of these data types and the common operations that can be used
with them.

2.2.1. LISTS
A list is an ordered sequence of data. It can hold different types
of data (strings, numbers, etc.) and it can be modified to add new data or remove old
data.

CREATING A LIST
To create a list, use square brackets “[ ]” to wrap a comma-separated list of values of
any data type.
In [30]: a_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2]
         a_list

Out[30]: ['a', 'b', 'mpilgrim', 'z', 'example', 2]

✓ All items except the last are strings. The last one is an integer.
In [31]: a_list[0]

Out[31]: 'a'

✓ List data can be accessed using an index.

In [32]: type(a_list[0])

Out[32]: str

In [33]: type(a_list[-1])

Out[33]: int

✓ The type of each item can be checked using type().

SLICING A LIST
Once a list has been created, a part of it can be taken as a new list. This is called
slicing the list. A slice can be extracted using indices. Let’s consider the same list as
above:
In [34]: a_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2]

• The length of the list can be obtained as:

In [35]: len(a_list)

Out[35]: 6

✓ The index can run from 0 to 5 if we count from left to right, or from -1 to -6 if
we count from right to left.

• A slice a_list[start:stop] contains the items from index start up to, but not
including, index stop:

In [36]: b_list = a_list[0:3]
         b_list

Out[36]: ['a', 'b', 'mpilgrim']
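
A few more slicing patterns are sketched below using the same a_list. The name copy_list is illustrative.

```python
a_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2]

# start:stop takes items from start up to, but not including, stop.
print(a_list[1:4])    # ['b', 'mpilgrim', 'z']

# Leaving out start or stop means "from the beginning" or "to the end".
print(a_list[:2])     # ['a', 'b']
print(a_list[-2:])    # ['example', 2]

# A third number sets the step: here, every second item.
print(a_list[::2])    # ['a', 'mpilgrim', 'example']

# A slice is a new list, so changing it leaves the original untouched.
copy_list = a_list[:]
copy_list[0] = 'changed'
print(a_list[0])      # a
```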

ADDING ITEM TO A LIST

There are 4 different ways to add an item or items to a list. Let’s consider the same
list as above:
In [37]: a_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2]

1. ‘+’ operator: The + operator concatenates lists to create a new list. A list
can contain any number of items; there is no size limit.
In [38]: b_list = a_list + ['Hydro', 'Aqua']
         b_list

Out[38]: ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua']

2. append(): The append() method adds a single item to the end of the list. Even
if the added item is a list, the whole list is added as a single item in the old list.
In [39]: b_list.append(True)
         b_list

Out[39]: ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', True]

✓ This list has strings, integer, and boolean data.

In [40]: len(b_list)

Out[40]: 9

In [41]: b_list.append(['d', 'e'])
         b_list

Out[41]: ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', True, ['d', 'e']]

In [42]: len(b_list)

Out[42]: 10

✓ The length of b_list has increased by only one even though the appended list,
['d', 'e'], contains two items.

3. extend(): Similar to append(), but each item is added separately. For example, let’s
consider the list
In [43]: b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', True]
         len(b_list)

Out[43]: 9

In [44]: b_list.extend(['d', 'e'])
         b_list

Out[44]: ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', True, 'd', 'e']

In [45]: len(b_list)

Out[45]: 11

✓ The length of b_list has increased by two, as the two items in the list ['d',
'e'] were added separately.

4. insert(): The insert() method inserts a single item into a list. The first argument
is the index of the first item in the list that will get bumped out of position.
In [46]: b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', True]
         b_list.insert(0, 'd')

✓ Insert 'd' in the first position, i.e., index 0.

In [47]: b_list

Out[47]: ['d', 'a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', True]

In [48]: b_list.insert(0, ['x', 'y'])

In [49]: b_list

Out[49]: [['x', 'y'], 'd', 'a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', True]

✓ The list ['x', 'y'] is added as one item, as in the case of append().
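
The four ways of adding items can be compared on one small list. In this sketch the list base and the other names are illustrative; list(base) makes a fresh copy so that each method starts from the same contents.

```python
base = ['a', 'b']

# append() adds its argument as ONE item, even if it is a list.
appended = list(base)
appended.append(['d', 'e'])
print(appended, len(appended))   # ['a', 'b', ['d', 'e']] 3

# extend() adds each item of its argument separately.
extended = list(base)
extended.extend(['d', 'e'])
print(extended, len(extended))   # ['a', 'b', 'd', 'e'] 4

# insert() places one item at the given index and shifts the rest.
inserted = list(base)
inserted.insert(1, 'x')
print(inserted)                  # ['a', 'x', 'b']

# + builds a new list and leaves both operands unchanged.
combined = base + ['d', 'e']
print(combined)                  # ['a', 'b', 'd', 'e']
print(base)                      # ['a', 'b']
```

Note that append(), extend(), and insert() change the list in place, while + returns a new list.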

SEARCH FOR ITEM IN A LIST

Consider the following list:
In [50]: b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

• count() returns the number of times a value occurs in the list, just as it does
for a string.

In [51]: b_list.count('b')

Out[51]: 2

• in can be used to check if a certain value exists in a list.

In [52]: 'b' in b_list

Out[52]: True

In [53]: 'c' in b_list

Out[53]: False

✓ The output is boolean data, i.e., True or False.

• index() can be used to find the index of the searched value.

In [54]: b_list.index('a')

Out[54]: 0

In [55]: b_list.index('b')

Out[55]: 1

✓ Even though there are 2 'b', the index of the first 'b' is returned.
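
Two details of index() are worth a short sketch: it accepts an optional start position, and it raises a ValueError when the value is not in the list. The helper find_index below is hypothetical, written only for this illustration.

```python
b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

# A second argument tells index() where to start searching.
first_b = b_list.index('b')
second_b = b_list.index('b', first_b + 1)
print(first_b, second_b)   # 1 8


def find_index(items, value):
    """Return the index of the first occurrence of value, or -1 if absent."""
    try:
        return items.index(value)
    except ValueError:
        # index() raises ValueError when the value is not in the list.
        return -1


print(find_index(b_list, 'z'))   # 3
print(find_index(b_list, 'c'))   # -1
```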
REMOVING ITEM FROM A LIST

There are many ways to remove an item from a list. The list automatically adjusts its
size after an element has been removed.

REMOVING ITEM BY INDEX

The del statement removes an item from a list, given the index of the element
to be removed.

• Consider the following list:

In [56]: b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

• Suppose we want to remove the element 'mpilgrim' from the list. Its index is 2.
In [57]: b_list[2]

Out[57]: 'mpilgrim'

In [58]: del b_list[2]
         b_list

Out[58]: ['a', 'b', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

✓ 'mpilgrim' is now removed.

The pop() method can also remove an item by specifying an index. But it is even
more versatile, as it can be used without any argument to remove the last item of a
list.

• Consider the following list:

In [59]: b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

• Suppose we want to remove the element 'mpilgrim' from the list. Its index is 2.
In [60]: b_list[2]

Out[60]: 'mpilgrim'

In [61]: b_list.pop(2)

Out[61]: 'mpilgrim'

✓ The removed item is returned and displayed.

• Now the b_list is as follows:

['a', 'b', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

• If pop() is used without an argument, the last item is removed and returned:

In [62]: b_list.pop()

Out[62]: 'b'

• Now the b_list is as follows:

['a', 'b', 'z', 'example', 2, 'Hydro', 'Aqua']

✓ The last 'b' is removed from the list.

• If pop() is used once again, the list will be as follows:

['a', 'b', 'z', 'example', 2, 'Hydro']
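
The main practical difference between del and pop() is that pop() hands back the removed item, while del can also remove a whole slice at once. A minimal sketch with illustrative variable names:

```python
b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

# pop() removes an item and returns it, so it can be stored.
removed = b_list.pop(2)
last = b_list.pop()
print(removed, last)   # mpilgrim b

# del removes by index or by slice and returns nothing.
del b_list[0:2]
print(b_list)          # ['z', 'example', 2, 'Hydro', 'Aqua']

# pop() on an empty list raises an IndexError.
try:
    [].pop()
except IndexError as error:
    print('Nothing to pop:', error)
```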

REMOVING ITEM BY VALUE


The remove() method removes an item from a list when the value of the item is
specified. If the value occurs more than once, only the first occurrence is removed.

• Consider the following list:


In [63]: 1 b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

• Suppose we want to remove the element 'b' from the list.


In [64]: 1 b_list.remove('b')

In [65]: 1 b_list

Out[65]: 1 ['a', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

XOnly the first 'b' is removed; the second 'b' at the end of the list remains.
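The three ways of removing items can be compared side by side. The following is a minimal sketch written for Python 3 (so print is a function); the helper names by_index, by_pop, by_value, and no_b are chosen for illustration only and do not appear in the guide.

```python
# The list contents mirror the examples in this section.
b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

by_index = list(b_list)      # work on copies so that b_list stays intact
del by_index[2]              # remove by position; nothing is returned

by_pop = list(b_list)
popped = by_pop.pop(2)       # remove by position and get the item back

by_value = list(b_list)
by_value.remove('b')         # removes only the first 'b'

# To drop every 'b', build a new list without it.
no_b = [item for item in b_list if item != 'b']

print(popped)
print(by_value)
print(no_b)
```

A list comprehension is used for the last case because remove() has to be called once per occurrence, which is easy to get wrong inside a loop.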



2.2.2. TUPLES
A tuple is an immutable list. A tuple can not be changed/modified in any way once it
is created.

• A tuple is defined in the same way as a list, except that the whole set of elements
is enclosed in parentheses instead of square brackets.

• The elements of a tuple have a defined order, just like a list. Tuple indices are
zero-based, just like a list, so the first element of a non-empty tuple is always
t[0].

• Negative indices count from the end of the tuple, just as with a list.

• Slicing works too, just like a list. Note that when you slice a list, you get a new
list; when you slice a tuple, you get a new tuple.

• A tuple is used when the items should not change: it protects the data from
accidental modification and is typically slightly faster to create and read than a list.
If you do not need to modify a set of items, a tuple can be used instead of a list.

CREATING TUPLES
A tuple can be created just like a list, but parentheses “( )” have to be used instead
of square brackets “[ ]”. For example,
In [66]: 1 a_tuple =( 'a ' , 'b ' , ' mpilgrim ' , 'z ' , ' example ' , 2 , ' Hydro ' , ' Aqua ' , 'b ')

TUPLE OPERATIONS
All the list operations except the ones that modify the list itself can be used for tuples
too. For example, you cannot use append(), extend(), insert(), del, remove(), and pop() for
tuples. For other operations, please follow the same steps as explained in the previous
section. Here are some examples of tuple operations.

• Consider the following tuple:


In [67]: 1 a_tuple = ('a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b')

In [68]: 1 a_tuple . index ( 'z ')

Out[68]: 1 3

Xitem 'z' is at the index 3, i.e., it is the fourth element of the tuple.
In [69]: 1 b_tuple = a_tuple [0:4]

In [70]: 1 b_tuple

Out[70]: 1 ( 'a ' , 'b ' , ' mpilgrim ' , 'z ')

XSlicing a tuple creates a new tuple; the original tuple does not
change.
In [71]: 1 a_tuple

Out[71]: 1 ( 'a ' , 'b ' , ' mpilgrim ' , 'z ' , ' example ' , 2 , ' Hydro ' , ' Aqua ' , 'b ')
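What "immutable" means in practice can be seen by trying to change an item. This is a small Python 3 sketch; the variable names changed, as_list, and new_tuple are illustrative assumptions.

```python
a_tuple = ('a', 'b', 'mpilgrim', 'z')

try:
    a_tuple[0] = 'x'          # item assignment is not allowed on a tuple
    changed = True
except TypeError:
    changed = False           # Python raises TypeError instead

# A modified copy can still be made by going through a list.
as_list = list(a_tuple)
as_list[0] = 'x'
new_tuple = tuple(as_list)

print(changed)
print(a_tuple)
print(new_tuple)
```

The original tuple is left untouched; only the copy carries the change.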

2.2.3. SETS
A set is an unordered collection of unique values. A single set can contain values of
different datatypes, as long as they are hashable (e.g., numbers, strings, and tuples).

CREATING SET
There are basically two ways of creating a set.

1. From scratch: Sets can be created like lists but curly brackets “{ }” have to be
used instead of square brackets “[ ]”. For example,
In [72]: 1 a_set ={ 'a ' , 'b ' , ' mpilgrim ' , 'z ' , ' example ' , 2 , ' Hydro ' , ' Aqua ' , 'b '}

In [73]: 1 type ( a_set )

Out[73]: 1 set

In [74]: 1 a_set

Out[74]: 1 {2 , ' Aqua ' , ' Hydro ' , 'a ' , 'b ' , ' example ' , ' mpilgrim ' , 'z '}

XThe order of the set differs from the order of the values given inside {} because a set
is unordered and the original order is ignored. Also, there is only one 'b' in the set
even though two 'b' were given, because a set is a collection of unique values.
Duplicate values are taken as one.

2. From list or tuple: A set can be created from a list or tuple as,
In [75]: 1 set ( a_list )
2 set ( a_tuple )

MODIFYING SET
A set can be modified by adding an item or another set to it. Also, items of a set can
be removed.

ADDING ELEMENTS
• Consider a set as follows,
In [76]: 1 a_set ={2 , ' Aqua ' , ' Hydro ' , 'a ' , 'b ' , ' example ' , ' mpilgrim ' , 'z '}

• Using add: To add single item to a set.


In [77]: 1 a_set . add ( 'c ')

In [78]: 1 a_set

Out[78]: 1 {2 , ' Aqua ' , ' Hydro ' , 'a ' , 'b ' , 'c ' , ' example ' , ' mpilgrim ' , 'z '}

X'c' is added. It is displayed after 'b' here, but a set has no guaranteed order.

• Using update: To add multiple items given as a set, list, or tuple.


In [79]: 1 a_set.update(['a', 'Sujan', 'Koirala'])

In [80]: 1 a_set

Out[80]: 1 {2 , ' Aqua ' , ' Hydro ' , ' Koirala ' , ' Sujan ' , 'a ' , 'b ' , 'c ' , ' example ' , '
mpilgrim ' , 'z '}

X'Koirala' and 'Sujan' are added, but 'a' is not added again because it is already in the set.

REMOVING ELEMENTS
• Consider a set as follows,
In [81]: 1 a_set = {2, 'Aqua', 'Hydro', 'Koirala', 'Sujan', 'a', 'b', 'c', 'example', 'mpilgrim', 'z'}

• Using remove() and discard(): These are used to remove an item from a set. If the item is not in the set, remove() raises a KeyError, whereas discard() does nothing.
In [82]: 1 a_set . remove ( 'b ')

In [83]: 1 a_set

Out[83]: 1 {2 , ' Aqua ' , ' Hydro ' , ' Koirala ' , ' Sujan ' , 'a ' , 'c ' , ' example ' , ' mpilgrim
' , 'z '}

X'b' has been removed.


In [84]: 1 a_set . discard ( ' Hydro ')

In [85]: 1 a_set

Out[85]: 1 {2 , ' Aqua ' , ' Koirala ' , ' Sujan ' , 'a ' , 'c ' , ' example ' , ' mpilgrim ' , 'z '}

• Using pop() and clear(): pop() works as for a list, except that it cannot remove the
last item because a set has no order. Instead, pop() removes and returns an arbitrary item.
clear() removes all items and leaves an empty set.
In [86]: 1 a_set . pop ()

In [87]: 1 a_set

Out[87]: 1 {2 , ' Koirala ' , ' Sujan ' , 'a ' , 'c ' , ' example ' , ' mpilgrim ' , 'z '}

SET OPERATIONS
Two sets can be combined into a new set (union), or the elements common to two sets can
be collected in a new set (intersection). These functions are also useful for combining
two or more lists after converting them to sets.

• Consider following two sets,


In [88]: 1 a_set ={2 ,4 ,5 ,9 ,12 ,21 ,30 ,51 ,76 ,127 ,195}
2 b_set ={1 ,2 ,3 ,5 ,6 ,8 ,9 ,12 ,15 ,17 ,18 ,21}

• Union: Can be used to combine two sets.


In [89]: 1 c_set = a_set . union ( b_set )

In [90]: 1 c_set

Out[90]: 1 {1 ,2 ,195 ,4 ,5 ,6 ,8 ,12 ,76 ,15 ,17 ,18 ,3 ,21 ,30 ,51 ,9 ,127}

• Intersection: Can be used to create a set with elements common to two sets.
In [91]: 1 d_set = a_set . intersection ( b_set )

In [92]: 1 d_set

Out[92]: 1 {9 ,2 ,12 ,5 ,21}
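As a sketch of how set operations help when combining lists, the following Python 3 example converts two lists to sets and uses the operator forms of union and intersection. The list contents and the names combined, common, only_in_a, and raised are assumptions made for illustration.

```python
a_list = [2, 4, 5, 9, 12, 21, 5, 2]     # contains duplicates
b_list = [1, 2, 3, 5, 6, 8, 9, 12]

a_set, b_set = set(a_list), set(b_list)

combined = sorted(a_set | b_set)        # same as a_set.union(b_set)
common = sorted(a_set & b_set)          # same as a_set.intersection(b_set)
only_in_a = sorted(a_set - b_set)       # same as a_set.difference(b_set)

# remove() raises KeyError for a missing item, discard() does not.
b_set.discard(100)
try:
    b_set.remove(100)
    raised = False
except KeyError:
    raised = True

print(combined)
print(common)
print(only_in_a)
```

sorted() is applied because a set has no defined order; it returns an ordinary sorted list.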

2.2.4. DICTIONARIES
A dictionary is a set of key-value pairs. In older Python versions (before 3.7) it is
unordered; from Python 3.7 onwards the insertion order is preserved. A value can be
retrieved directly for a known key, but a key cannot be looked up directly from its value.

CREATING DICTIONARY
Creating a dictionary is similar to creating a set in using curly brackets “{ }”, but key:value pairs
are used instead of values. The following is an example,


In [93]: 1 a_dict ={ ' Hydro ': ' 131.112.42.40 ' , ' Aqua ': ' 131.112.42.41 '}

In [94]: 1 a_dict

Out[94]: 1 {'Aqua': '131.112.42.41', 'Hydro': '131.112.42.40'}

XThe order may change automatically, as in a set (Python versions before 3.7 do not preserve the insertion order of a dictionary).


In [95]: 1 a_dict [ ' Hydro ']

Out[95]: 1 ' 131.112.42.40 '

XKey 'Hydro' can be used to access the value '131.112.42.40'.

MODIFYING DICTIONARY
Since the size of the dictionary is not fixed, new key:value pair can be freely added to
the dictionary. Also values for a key can be modified.

• Consider the following dictionary.


In [96]: 1 a_dict = {'Aqua': '131.112.42.41', 'Hydro': '131.112.42.40'}

• If you want to change the value of 'Aqua',


In [97]: 1 a_dict [ ' Aqua ' ]= ' 192.168.1.154 '

In [98]: 1 a_dict

Out[98]: 1 {'Aqua': '192.168.1.154', 'Hydro': '131.112.42.40'}

• If you want to add new item to the dictionary,


In [99]: 1 a_dict [ ' Lab ' ]= ' Kanae '

In [100]: 1 a_dict

Out[100]: 1 { ' Aqua ': ' 192.168.1.154 ' , ' Hydro ': ' 131.112.42.40 ' , ' Lab ': ' Kanae '}

• Dictionary values can also be lists instead of single values. For e.g.,
In [101]: 1 k_lab ={ ' Female ' :[ ' Yoshikawa ' , ' Imada ' , ' Yamada ' , ' Sasaki ' , '
Watanabe ' , ' Sato '] , ' Male ' :[ ' Sujan ' , ' Iseri ' , ' Hagiwara ' , '
Shiraha ' , ' Ishida ' , ' Kusuhara ' , ' Hirochi ' , ' Endo ' ]}

In [102]: 1 k_lab [ ' Female ']

Out[102]: 1 [ ' Yoshikawa ' , ' Imada ' , ' Yamada ' , ' Sasaki ' , ' Watanabe ' , ' Sato ']
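A few other common dictionary operations are sketched below in Python 3. The key 'Terra' and the names missing, wanted, hosts, and removed are hypothetical and used only for illustration; get(), items(), and pop() are standard dictionary methods.

```python
a_dict = {'Hydro': '131.112.42.40', 'Aqua': '131.112.42.41'}

# get() returns a fallback instead of raising KeyError for a missing key.
missing = a_dict.get('Terra', 'unknown')

# Loop over the key-value pairs.
for name, address in a_dict.items():
    print(name, address)

# Reverse lookup: collect every key whose value matches.
wanted = '131.112.42.41'
hosts = [name for name, address in a_dict.items() if address == wanted]

# Remove a key and get its value back.
removed = a_dict.pop('Hydro')

print(missing, hosts, removed)
```

The reverse lookup returns a list because several keys may share the same value.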

2.2.5. ARRAYS
Arrays are similar to lists, but they contain homogeneous data, i.e., data of the same type
only. Arrays are commonly used to store numbers and hence are used in mathematical
calculations. The arrays used in this guide are provided by the NumPy package.

CREATING ARRAYS
Python arrays can be created in many ways. They can also be read from a data file in
text or binary format, as explained in later chapters of this guide. Here, some
commonly used methods are explained. For a detailed tutorial on Python arrays, refer
here.

1. From list: Arrays can be created from a list or a tuple using:

Xsomearray=array(somelist). Consider the following examples.


In [103]: 1 b_list =[ 'a ' , 'b ' ,1 ,2]

XThe list has mixed datatypes. First two items are strings and last two are
numbers.
In [104]: 1 b_array = array ( b_list )

1 array ([ 'a ' , 'b ' , '1 ' , '2 '] , dtype = '| S8 ')

XSince first two elements are string, numbers are also converted to strings
when array is created.
In [105]: 1 b_list2 =[1 ,2 ,3 ,4]

XAll items are numbers.


In [106]: 1 b_array2 = array ( b_list2 )

In [107]: 1 b_array2

Out[107]: 1 array ([1 , 2 , 3 , 4])

XNumeric array is created. Mathematical operations like addition, subtraction,
division, etc. can be carried out on this array.

2. Using built-in functions:

(a) From direct values:


In [108]: 1 xx = array ([2 , 4 , -11])

Xxx is a one-dimensional array of length 3, i.e., its shape is (3,).

(b) From arange(number): Creates an array from a range of values. Examples
are provided below. For details of arange, see chapter 4.
In [109]: 1 yy = arange (2 ,5 ,1)

In [110]: 1 yy

Out[110]: 1 array ([2 ,3 ,4])

XCreates an array from lower value (2) to upper value (5) in specified
interval (1) excluding the last value (5).
In [111]: 1 yy = arange (5)

In [112]: 1 yy

Out[112]: 1 array ([0 ,1 ,2 ,3 ,4])

XIf the lower value and interval are not specified, they are taken as 0
and 1, respectively.
In [113]: 1 yy = arange (5 ,2 , -1)

In [114]: 1 yy

Out[114]: 1 array ([5 ,4 ,3])

XThe interval can be negative.

(c) Arrays of fixed shape: Sometimes it is necessary to create an array to
store the result of a calculation. The functions zeros(someshape) and ones(someshape)
can be used to create arrays with all values as zero or one, respectively.
In [115]: 1 zz = zeros (20)

Xwill create an array with 20 zeros.


In [116]: 1 zz = zeros((20, 20))

Xwill create an array with 20 rows and 20 columns (total 20*20=400
elements) with all elements as zero. For more than one dimension, the shape
has to be given as a tuple.
In [117]: 1 zz = zeros((20, 20, 20))

Xwill create an array with 20 blocks with each block having 20 rows
and 20 columns (total 20*20*20=8000 elements) with all elements as zero.
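The shape argument can be checked quickly with the shape and size attributes. This sketch assumes NumPy is imported as np (the guide calls zeros and ones directly, presumably after a star import); the variable names are illustrative.

```python
import numpy as np

z1 = np.zeros(20)                  # 1-D array with 20 elements
z2 = np.zeros((20, 20))            # 2-D: the shape is passed as one tuple
z3 = np.zeros((20, 20, 20))        # 3-D
o2 = np.ones((2, 3), dtype=int)    # integer ones instead of the default float

print(z1.shape, z2.shape, z3.shape)
print(z3.size)
print(o2)
```

Passing the dimensions as separate arguments, e.g. zeros(20, 20), does not work because the second positional argument is interpreted as the data type.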

ARRAY OPERATIONS
Arithmetic operators on arrays apply elementwise. A new array is created and filled
with the result.
In [118]: 1 a = array ([20 ,30 ,40 ,50])
2 b = arange (4)

In [119]: 1 b

Out[119]: 1 array ([0 ,1 ,2 ,3])



In [120]: 1 c = a-b

In [121]: 1 c

Out[121]: 1 array ([20 , 29 , 38 , 47])

XEach element in b is subtracted from respective element in a.


In [122]: 1 b **2

Out[122]: 1 array ([0 , 1 , 4 , 9])

XSquare of each element in b.


In [123]: 1 10* sin ( a )

Out[123]: 1 array ([ 9.12945251 , -9.88031624 , 7.4511316 , -2.62374854])

XTwo operations can be carried out at once.


In [124]: 1 a <35

Out[124]: 1 array ([ True , True , False , False ] , dtype = bool )

X'True' if a<35 otherwise 'False'.
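A boolean array such as the result of a<35 is often used as a mask to select or replace elements. The following sketch assumes NumPy imported as np; the names mask, small, and clipped are chosen for illustration.

```python
import numpy as np

a = np.array([20, 30, 40, 50])
b = np.arange(4)

diff = a - b                       # elementwise subtraction
mask = a < 35                      # boolean array, one value per element
small = a[mask]                    # keep only the elements where mask is True
clipped = np.where(mask, a, 35)    # replace all other elements with 35

print(diff)
print(small)
print(clipped)
```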


3
INPUT/OUTPUT OF FILES

Read and write data from/to files in commonly used data formats, such as
text (csv), binary, excel, netCDF, R data frame and Matlab.


This chapter explains the method to read and write data from/to commonly used
data formats, such as text (csv), binary, excel, netCDF, and Matlab.

3.1. READ TEXT FILE


Small datasets are often stored in a structured or unstructured text format. Python
libraries are able to read these data files in several ways.

3.1.1. PLAIN TEXT


First, we will load data from a free format structured text file (e.g., ASCII data).
In [125]: 1 a = loadtxt ( ' example_plain . txt ' , comments = '# ' , delimiter = None ,
converters = None , skiprows =0 , usecols = None )

Reads the data in the text file as an ’array’. Will raise an error if the data is
non-numeric, i.e., if a value cannot be converted to a float or an integer.

3.1.2. COMMA SEPARATED TEXT


Often the data values in a text file are separated by special characters like a tab, a line
break, or a comma. These separators can be specified, and thus excluded from the data,
by using the option ’delimiter’ of loadtxt.
In [126]: 1 a = loadtxt ( ' example_csv . csv ' , delimiter = ' , ' , converters ={0:
datestr2num })

A full list of options of loadtxt is available here.

3.1.3. UNSTRUCTURED TEXT


If the text data is unstructured, extra work is needed to read the file, and to process
it to be saved as an array.

• First, the file has to be opened as a file object, using either file() (available in Python 2 only) or open():


In [127]: 1 a = file ( ' filename ')

In [128]: 1 a = open ( ' filename ')

In [129]: 1 type ( a )

Out[129]: 1 file

• Extracting data from the file as a ’list’:


In [130]: 1 a_list = a . readlines ()

Xreadlines() reads the contents (each line) of the file object ’a’ and puts them in
the list a_list.

• Extracting data from the file as a ’string’:


In [131]: 1 a_str = a . read ()

Xread() reads contents of the file object ’a’ and stores it as a string.

Lines read from a text file end with special newline characters. These characters need
to be removed from each line/item of the data obtained using read() or readlines().

• Drop the ’\n’ or ’\r\n’ sign at the end of each line:

• strip() is used to remove these characters:

• To drop it from each element of a_list:


In [132]: 1 b = [s.strip() for s in a_list]

• Furthermore, to convert each element into float:


In [133]: 1 b = [float(s.strip()) for s in a_list]
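The steps above can be combined into one short routine. The sketch below is written for Python 3 and first writes a small made-up data file to a temporary folder so that it is self-contained; the file contents and the rule of skipping blank lines and lines starting with '#' are assumptions for illustration.

```python
import os
import tempfile

# Write a small hypothetical data file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'example_plain.txt')
with open(path, 'w') as f:
    f.write('# station data\n1.5\n2.25\n\n3.0\n')

values = []
with open(path) as f:                 # the file is closed automatically
    for line in f:
        text = line.strip()           # drop '\n' and surrounding blanks
        if not text or text.startswith('#'):
            continue                  # skip blank lines and comments
        values.append(float(text))

print(values)
```

Using the with statement avoids having to close the file by hand, and iterating over the file object reads one line at a time instead of loading the whole file.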

3.2. SAVE TEXT FILE


• To save an array ’a’,
In [134]: 1 savetxt ( filename , a , fmt = ' %.18 e ' , delimiter = ' ' , newline = '\ n ' ,
header = ' ' , footer = ' ' , comments = '# ')

A full list of options of savetxt is available here.



Table 3.1: Data type of the returned array

Type code   C Type           Python Type         Minimum size in bytes

'c'         char             character           1
'b'         signed char      int                 1
'B'         unsigned char    int                 1
'u'         Py_UNICODE       Unicode character   2
'h'         signed short     int                 2
'H'         unsigned short   int                 2
'i'         signed int       int                 2
'I'         unsigned int     long                2
'l'         signed long      int                 4
'L'         unsigned long    long                4
'f'         float            float               4
'd'         double           float               8

3.3. READ BINARY DATA


Binary data format is used because it needs fewer bytes to store each value,
so that it is efficient in terms of memory and storage. This section and the next explain
the procedure of reading and writing data in binary format. Data are read using the NumPy function fromfile.
In [135]: 1 dat = fromfile('filename', 'type code')

• filename is the name of the file.

• type code: can be defined as type code (e.g., 'f') or python type (e.g., 'float') as
shown in Table 3.1. It determines the size and byte-order of items in the binary
file.

In [136]: 1 dat = fromfile('example_binary.float32', 'f')
2 dat

3.4. WRITE BINARY DATA


• To write/save all items (as machine values) of an array "A" to a file:
In [137]: 1 A . tofile ( ' filename ')

• can also include the data type as,


In [138]: 1 A . astype ( 'f ') . tofile ( ' filename ')
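A write-then-read round trip makes the role of the type code visible. The following Python 3 sketch deliberately uses the standard library array module (whose type codes are those listed in Table 3.1) rather than the fromfile and tofile calls shown above, so that it runs without extra packages; the file is written to a temporary folder and its name is only illustrative.

```python
import os
import tempfile
from array import array

path = os.path.join(tempfile.mkdtemp(), 'example_binary.float32')

data = array('f', [1.0, 2.5, -3.25])      # 'f' = 4-byte float
with open(path, 'wb') as f:               # binary files need mode 'wb'/'rb'
    data.tofile(f)

# The file stores raw bytes only, so the item count follows from its size.
n_items = os.path.getsize(path) // data.itemsize
back = array('f')
with open(path, 'rb') as f:
    back.fromfile(f, n_items)             # read n_items values of type 'f'

print(n_items)
print(back.tolist())
```

Because a binary file carries no information about its own data type, reading it with a different type code than it was written with gives meaningless numbers without raising an error.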

3.5. READ NETCDF DATA


NetCDF data files can be read by several packages such as Scientific, Scipy, and
netCDF4. Below is an example of reading a netCDF file using the io module of Scipy.
In [139]: 1 from scipy.io import netcdf
2 ncf = netcdf.netcdf_file('example_netCDF.nc')
3 ncf.variables
4 dat = ncf.variables['wbal_clim_CUM'][:]

3.6. WRITE NETCDF DATA


A short example of how to create netCDF data is below. For details, refer to the
original Scipy help page.
In [140]: 1 import numpy as np
2 from scipy.io import netcdf
3 f = netcdf.netcdf_file('simple.nc', 'w')
4 f.history = 'Created for a test'
5 f.createDimension('time', 10)
6 time = f.createVariable('time', 'i', ('time',))
7 time[:] = np.arange(10)
8 time.units = 'days since 2008-01-01'
9 f.close()

3.7. READ MATLAB DATA


MatLab data files saved in the HDF5-based version 7.3 format can be read by using the
Python interface for HDF5 datasets. This requires installation of the h5py package.
In [141]: 1 import h5py
2 a = h5py.File('example_matlab.mat', 'r')

In [142]: 1 a . keys ()

Out[142]: 1 [ u '# refs # ' , u ' Results ']

In [143]: 1 a [ ' Results ' ]. keys ()

Out[143]: 1 [ u ' SimpBM ' , u ' SimpBM2L ' , u ' SimpBMtH ' , u ' SimpGWoneTfC ' , u ' SimpGWvD ']

In [144]: 1 dat = a [ ' Results / SimpGWvD / Default / ModelOutput / actET ' ][:]
2 dat = a [ ' Results ' ][ ' SimpGWvD ' ][ ' Default ' ][ ' ModelOutput ' ][ ' actET ' ][:]

3.8. READ EXCEL DATA


Excel workbooks created by MS Office 2010 or later (.xlsx files) can be read using the
openpyxl package.
In [145]: 1 from openpyxl import load_workbook
2 ex_f = load_workbook('example_xls.xlsx')
3 ex_f.sheetnames
4 a_sheet = ex_f['Belleville_96-pr']
4
DATA OPERATIONS IN PYTHON

Information on common mathematical and simple statistical operations on data.


4.1. SIZE AND SHAPE


All data are characterized by two things: how big they are (size), and how they are
arranged (shape). Here are some useful commands to play with the size and shape of
data.
We will use the following list as an example:
In [146]: 1 a =[1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10]

• Check the size (total number of elements) of the data:


In [147]: 1 size ( a )

Out[147]: 1 10

• Check the shape of the data:


In [148]: 1 shape ( a )

Out[148]: 1 (10 ,)

Xfor list.
In [149]: 1 array ( a ) . shape

Out[149]: 1 (10 ,)

Xfor array.

XNote that the order is number of rows (longitudinal direction↓), number
of columns (lateral direction→) for 2-dimensional arrays in python.

• Change the arrangement of the data:


In [150]: 1 b = array(a).reshape(2, 5)

Xcan be used in arrays only.


In [151]: 1 b = array ([[1 ,2 ,3 ,4 ,5] ,[6 ,7 ,8 ,9 ,10]])

In [152]: 1 b = reshape (a ,(2 ,5) )



Xcan be used for both array and list. List is converted to array by using this
function.
In [153]: 1 b = array(a).reshape(-1, 5)

XBy using the ‘-1’ flag, the first dimension is automatically set to match the
total size of the array. For e.g., if there are 10 elements in an array/list and 5
columns is specified during reshape, number of rows is automatically calculated
as 2. The shape will be (2,5).

• Convert the data type to array:


In [154]: 1 b = array ( a )

• Convert the data type to list:


In [155]: 1 b = array(a).tolist()

• Convert the data into float and integer:


In [156]: 1 float ( a [0])
2 int ( a [0])

Xthese functions can only be used for one element at a time.
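The size and shape commands of this section are collected in the sketch below. It assumes NumPy imported as np; astype() is used here as a way to convert all elements at once, in contrast to float() and int(), which handle a single value. The variable names are illustrative.

```python
import numpy as np

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

arr = np.array(a)                 # list -> array
b = arr.reshape(2, 5)             # method form, arrays only
c = arr.reshape(-1, 5)            # -1 lets NumPy work out the number of rows
d = np.reshape(a, (5, 2))         # function form, accepts a list as well

as_float = arr.astype(float)      # convert every element to float
back = b.tolist()                 # array -> (nested) list

print(np.size(a), np.shape(a))
print(b.shape, c.shape, d.shape)
print(back)
```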

4.2. SLICING AND DICING


This section explains how to extract data from an array or a list. The following process
can be used to take data for a region from global data, or for a limited period from
long time series data. The process is called ‘slicing’.
The same method can be used for arrays and lists. Let’s consider the following list,
In [157]: 1 a =[1 ,2 ,3 ,4 ,5]

XThere are five items in the list.

INDEX BASICS
Indexing is done in two ways:

1. Positive Index: The counting order is from left to right. The index for the first
element is 0 (not 1).

In [158]: 1 a [0]

Out[158]: 1 1

In [159]: 1 a [1]

Out[159]: 1 2

In [160]: 1 a [4]

Out[160]: 1 5

XThe fifth item (index=4) is 5.

2. Negative Index: The counting order is from right to left. The index for the last
item is -1. In some cases, the list is very long and it is much easier to count
from the end rather than the beginning.
In [161]: 1 a [ -1]

Out[161]: 1 5

XIt is same as a[4] as shown above.


In [162]: 1 a [ -2]

Out[162]: 1 4

DATA EXTRACTION
Data extraction is carried out by using indices. In this section, some examples of using
indices are provided. Details of array indexing and slicing can be found here.

1. Using two indices:


In [163]: 1 somelist [ first index : last index :( interval ) ]

In [164]: 1 a [0:2]

Out[164]: 1 [1 ,2]

Xa[0] and a[1] are included but a[2] is not included.



In [165]: 1 a [3:4]

Out[165]: 1 [4]

2. Using single index:


In [166]: 1 a [:2]

Out[166]: 1 [1 ,2]

Xsame as a[0:2].
In [167]: 1 a [2:]

Out[167]: 1 [3 ,4 ,5]

Xsame as a[2:5].

3. Consider a 2-D list and a 2-D array. Indexing works differently for lists and arrays,
so different methods are used in the two cases, as explained below.
In [168]: 1 a_list =[[1 ,2 ,3] ,[4 ,5 ,6]]
2 a_array = array ([[1 ,2 ,3] ,[4 ,5 ,6]])

In [169]: 1 shape ( a_list )

Out[169]: 1 (2 ,3)

In [170]: 1 a_array . shape

Out[170]: 1 (2 ,3)

In [171]: 1 a_list [0]

Out[171]: 1 [1 ,2 ,3]

Xwhich is a list.
In [172]: 1 a_array [0]

Out[172]: 1 array ([1 ,2 ,3])

Xwhich is an array.

4. To extract data from list,


In [173]: 1 a_list [0][1]

Out[173]: 1 2

In [174]: 1 a_list [1][:2]

Out[174]: 1 [4 ,5]

XThe index has to be provided in two different sets of square brackets “[ ]”.

5. To extract data from array,


In [175]: 1 a_array [0 ,1]

Out[175]: 1 2

In [176]: 1 a_array [1 ,:2]

Out[176]: 1 array([4, 5])

XThe index can be provided in one set of square brackets “[ ]”.

6. Consider a 3-D list and 3-D array,


In [177]: 1 a_list =[[[2 ,3] ,[4 ,5] ,[6 ,7] ,[8 ,9]] ,[[12 ,
13] ,[14 ,15] ,[16 ,17] ,[18 ,19]]]
2 a_array = array ([[[2 ,3] ,[4 ,5] ,[6 ,7] ,[8 ,9]] ,[[12 ,
13] ,[14 ,15] ,[16 ,17] ,[18 ,19]]])

XThe shape of both data is (2,4,2).

To extract from list,


In [178]: 1 a_list [0][2]

Out[178]: 1 [6, 7]

In [179]: 1 a_list [0][2][1]

Out[179]: 1 7

To extract from array,



In [180]: 1 a_array [0 ,2]

Out[180]: 1 array ([6 ,7])

In [181]: 1 a_array [0 ,2 ,1]

Out[181]: 1 7
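The slicing rules for lists can be verified with plain Python. This sketch (Python 3, no extra packages) repeats the examples of this section and adds the optional interval as the third slice value; the variable names are assumptions for illustration.

```python
a = [1, 2, 3, 4, 5]

first_two = a[0:2]        # items at index 0 and 1; index 2 is excluded
single = a[3:4]           # a slice always returns a list, here with one item
every_other = a[::2]      # interval of 2
reversed_a = a[::-1]      # negative interval walks from right to left
last_two = a[-2:]         # negative index counts from the end

a_list = [[[2, 3], [4, 5], [6, 7], [8, 9]],
          [[12, 13], [14, 15], [16, 17], [18, 19]]]
pair = a_list[0][2]                     # third pair of the first block
item = a_list[0][2][1]                  # second number of that pair
column = [p[1] for p in a_list[0]]      # second number of every pair in block 0

print(first_two, single, every_other, reversed_a, last_two)
print(pair, item, column)
```

For nested lists, a whole "column" cannot be sliced directly and needs a loop or list comprehension, which is one reason arrays are preferred for gridded data.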

4.3. BUILT-IN MATHEMATICAL FUNCTIONS


The Python interpreter has a number of functions built into it, and NumPy adds further
functions for arrays. This section documents these functions in easy-to-use order.
Firstly, consider the following 2-D arrays,
In [182]: 1 A = array ([[ -2 , 2] , [ -5 , 5]])
2 B = array ([[2 , 2] , [5 , 5]])
3 C = array ([[2.53 , 2.5556] , [5.3678 , 5.4568]])

1. max(iterable): Returns the maximum element if a single iterable is passed.
With two or more arguments, returns the largest of the arguments.
In [183]: 1 max ([0 ,10 ,15 ,30 ,100 , -5])

Out[183]: 1 100

In [184]: 1 A . max ()

Out[184]: 1 5

2. min(iterable): Returns the minimum element if a single iterable is passed.
With two or more arguments, returns the smallest of the arguments.
In [185]: 1 min ([0 ,10 ,15 ,30 ,100 , -5])

Out[185]: 1 -5

In [186]: 1 A . min ()

Out[186]: 1 -5

3. mean(iterable): Returns the average of the array elements. The average is taken
over the flattened array by default, otherwise over the specified axis. For details,
click here.
In [187]: 1 mean ([0 ,10 ,15 ,30 ,100 , -5])

Out[187]: 1 25.0

In [188]: 1 A . mean ()

Out[188]: 1 0.0

4. median(iterable): Returns the median of the array elements.


In [189]: 1 median ([0 ,10 ,15 ,30 ,100 , -5])

Out[189]: 1 12.5

In [190]: 1 median(A)

Out[190]: 1 0.0

5. sum(iterable): Returns the sum of the array elements. It returns sum of array
elements over an axis if axis is specified else sum of all elements. For details,
click here.
In [191]: 1 sum ([1 ,2 ,3 ,4])

Out[191]: 1 10

In [192]: 1 A . sum ()

Out[192]: 1 0

6. abs(A): Returns the absolute value of a number, which can be an integer or a
float, or of every element of an array.
In [193]: 1 abs ( A )

Out[193]: 1 array ([[2 ,2] ,[5 ,5]])



In [194]: 1 abs ( B )

Out[194]: 1 array([[2, 2], [5, 5]])

7. divmod(x,y): Returns the quotient and remainder resulting from dividing the
first argument (some number x or an array) by the second (some number y or
an array).
In [195]: 1 divmod (2 , 3)

Out[195]: 1 (0 , 2)

Xas 2 / 3 = 0 and remainder is 2.


In [196]: 1 divmod (4 , 2)

Out[196]: 1 (2 , 0)

Xas 4 / 2 = 2 and remainder is 0.

In case of two dimensional array data


In [197]: 1 divmod (A , B )

Out[197]: 1 ( array ([[ -1 , 1] , [ -1 , 1]]) , array ([[0 , 0] , [0 , 0]]) )

8. modulo (x%y): Returns the remainder of a division of x by y.


In [198]: 1 5%2

Out[198]: 1 1

9. pow(x,y[, z]): Returns x to the power y. But, if z is present, returns x to
the power y modulo z (more efficient than pow(x, y) % z). The pow(x, y) is
equivalent to x**y.
In [199]: 1 pow (A , B )

Out[199]: 1 array ([[4 , 4] , [ -3125 , 3125]])



10. round(x,n): Returns the floating point value of x rounded to n digits after the
decimal point.
In [200]: 1 round (2.675 ,2)

Out[200]: 1 2.67

11. around(A,n): Returns the floating point array A rounded to n digits after the
decimal point.
In [201]: 1 around (C ,2)

Out[201]: 1 array ([[ 2.53 , 2.56] , [ 5.37 , 5.46]])

12. range([x],y[,z]): This function creates lists of integers in an arithmetic
progression. It is primarily used in for loops. The arguments must be plain integers.

• If the step argument is omitted, it defaults to 1.


• If the start argument (x) is omitted, it defaults to 0.
• The full form returns a list of plain integers [x, x + z, x + 2*z, · · · ,y-z].
• If step (z) is positive, the last element is the ‘start (x) + i * step (z)’ just
less than ‘y’.

• If step (z) is negative, the last element is the ‘start (x) + i * step (z)’ just
greater than ‘y’.

• If step (z) is zero, ValueError is raised.


In [202]: 1 range (10)

Out[202]: 1 [0 ,1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9]

In [203]: 1 range (1 ,11)

Out[203]: 1 [1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10]

In [204]: 1 range (0 ,20 ,5)

Out[204]: 1 [0 ,5 ,10 ,15]



In [205]: 1 range (0 , -5 , -1)

Out[205]: 1 [0 , -1 , -2 , -3 , -4]

In [206]: 1 range (0)

Out[206]: 1 [ ]

13. arange(x,y[,z]): This function creates arrays of integers in an arithmetic
progression. The arguments are the same as in range().
In [207]: 1 arange (10)

Out[207]: 1 array ([0 ,1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9])

In [208]: 1 arange (1 ,11)

Out[208]: 1 array ([1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10])

In [209]: 1 arange (0 ,20 ,5)

Out[209]: 1 array ([0 ,5 ,10 ,15])

In [210]: 1 arange (0 , -5 , -1)

Out[210]: 1 array ([0 , -1 , -2 , -3 , -4])

In [211]: 1 arange (0)

Out[211]: 1 array ([ ] , dtype = int64 )

14. zip(A,B): Returns a list of tuples, where the i-th tuple contains the i-th element
of each argument sequence. The returned list is truncated to the length of the shortest
sequence. For a single sequence argument, it returns a list of 1-tuples. With
no arguments, it returns an empty list.
In [212]: 1 zip (A , B )

Out[212]: 1 [( array ([ -2 , 2]) , array ([2 , 2]) ) , ( array ([ -5 , 5]) , array ([5 ,
5]) ) ]

15. sort(): Sorts the array elements in smallest to largest order.


In [213]: 1 D = array ([10 ,2 ,3 ,10 ,100 ,54])

In [214]: 1 D . sort ()

In [215]: 1 D

Out[215]: 1 array ([2 , 3 , 10 , 10 , 54 , 100])

16. ravel(): Returns a flattened array. 2-D array is converted to 1-D array.
In [216]: 1 A . ravel ()

Out[216]: 1 array ([ -2 , 2 , -5 , 5])

17. transpose(): Returns the transpose of an array (matrix) by permuting the di-
mensions.
In [217]: 1 A . transpose ()

Out[217]: 1 array ([[ -2 , -5] , [ 2 , 5]])

18. diagonal(): Returns the specified diagonal of an array (the main diagonal by default).


In [218]: 1 A . diagonal ()

Out[218]: 1 array ([ -2 , 5])
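Some of the true built-in functions listed above behave slightly differently in Python 3, where range() and zip() no longer return lists. The sketch below is written for Python 3 and needs only the standard library; the Decimal part is an added illustration of why round(2.675, 2) gives 2.67 (the float 2.675 is stored as a value slightly below 2.675).

```python
from decimal import Decimal, ROUND_HALF_UP

quotient, remainder = divmod(17, 5)      # 17 = 3 * 5 + 2
power_mod = pow(3, 4, 5)                 # (3 ** 4) % 5

rounded = round(2.675, 2)                # binary floats cannot store 2.675 exactly
exact = Decimal('2.675').quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)

numbers = list(range(0, 20, 5))          # wrap in list() to see the values
pairs = list(zip([1, 2, 3], ['a', 'b'])) # truncated to the shorter sequence

print(quotient, remainder, power_mod)
print(rounded, exact)
print(numbers, pairs)
```

When exact decimal rounding matters, for example for reported statistics, the decimal module is one option; for most numerical work the small float error is acceptable.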

4.4. MATRIX OPERATIONS


NumPy and its linear algebra module provide a suite of matrix calculations.

1. Dot product:
In [219]: 1 a = rand (3 ,3)
2 b = rand (3 ,3)
3 dot_p = dot (a , b )

Xwhere a and b are two arrays.



2. Cross product:
In [220]: 1 a = rand (3 ,3)
2 b = rand (3 ,3)
3 cro_p = cross (a , b )

Xwhere a and b are two arrays.

3. Matrix multiplication:
In [221]: 1 a = rand (2 ,3)
2 b = rand (3 ,2)
3 mult_ab = matmul (a , b )

In [222]: 1 shape ( mult_ab )

Out[222]: 1 (2 ,2)
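Because rand() produces different numbers on every run, a version with fixed values is easier to check by hand. This sketch assumes NumPy imported as np; the matrices and vectors are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
b = np.array([[1, 0],
              [0, 1],
              [2, 2]])             # shape (3, 2)

# For 2-D arrays, matmul and dot give the same matrix product.
mult_ab = np.matmul(a, b)          # shape (2, 2)
same = np.array_equal(mult_ab, np.dot(a, b))

x = np.array([1, 0, 0])
y = np.array([0, 1, 0])
dot_p = np.dot(x, y)               # perpendicular vectors give 0
cro_p = np.cross(x, y)             # vector perpendicular to both x and y

print(mult_ab)
print(same, dot_p, cro_p)
```

The inner dimensions must match: a (2, 3) array can be multiplied with a (3, 2) array, and the result has shape (2, 2).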

4.5. STRING OPERATIONS


Let's assume a string s as,
In [223]: 1 s = 'sujan koirala'

1. split(): Splitting the strings. It takes an optional argument, a delimiter; if none is
given, the string is split at whitespace. The method splits a string into a list of
strings based on the delimiter.
In [224]: 1 s . split ()

Out[224]: 1 [ ' sujan ' , ' koirala ']

Xblank space as delimiter. Creates a list with elements separated at locations
of blank space.
In [225]: 1 s . split ( 'a ')

Out[225]: 1 [ ' suj ' , 'n koir ' , 'l ' , ' ']

X’a’ as delimiter. creates a list with elements separated at locations of ’a’

2. lower() and upper(): Changes the string to lower case and upper case respec-
tively.

In [226]: 1 s = ' Sujan Koirala '


2 s

Out[226]: 1 ' Sujan Koirala '

In [227]: 1 s . lower ()

Out[227]: 1 ' sujan koirala '

In [228]: 1 s . upper ()

Out[228]: 1 ' SUJAN KOIRALA '

3. count(): Counts the number of occurrences of a substring.


In [229]: 1 s . count ( 'a ')

Out[229]: 1 3

XThere are 3 a’s in string s.

4. Replace a substring:
In [230]: 1 s2 = s . replace ( " Su " , " Tsu " )

5. List to String:
In [231]: 1 a_list =[ 'a ' , 'b ' , 'c ']
2 a_str = " and " . join ( str ( x ) for x in a_list )
3 a_str

Out[231]: 1 'a and b and c '
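These string methods are often chained when parsing one line of a text file. The following Python 3 sketch uses a made-up, comma-separated record; the record contents and the variable names are hypothetical.

```python
# A hypothetical record with untidy spacing and a trailing newline.
line = '  station_a, 12.5 ,7.25,  forest \n'

fields = [part.strip() for part in line.split(',')]   # split, then clean
name = fields[0].upper()
first, second = float(fields[1]), float(fields[2])    # text -> numbers
label = ' / '.join(fields[:1] + fields[3:])           # list -> string
n_commas = line.count(',')

print(fields)
print(name, first, second)
print(label, n_commas)
```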

4.6. OTHER USEFUL FUNCTIONS


1. astype('type code'): Returns an array with the same elements coerced to the
type indicated by type code in Table 3.1. It is useful to save data as some type.
In [232]: 1 A . astype ( 'f ')

Out[232]: 1 array ([[ -2. , 2.] ,[ -5. , 5.]])



2. tolist(): Converts the array to an ordinary list with the same items.
In [233]: 1 A . tolist ()

Out[233]: 1 [[ -2 , 2] , [ -5 , 5]]

3. byteswap(): Swaps the bytes in an array and returns the byteswapped array. If
the first argument is 'True', it byteswaps all items of the array in place and returns
the array. Supported byte sizes are 1, 2, 4, or 8. It is useful when reading data from
a file written on a machine with a different byte order. For details on machine
dependency, refer this. To convert data from big endian to little endian or vice versa,
add byteswap() in the same line where ‘fromfile’ is used, for example when your data
was written on a big-endian machine.
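The effect of the byte order can be demonstrated without any data file. The sketch below is an illustration under assumptions: it uses the standard library array and struct modules (Python 3) instead of the NumPy fromfile call mentioned above, and it builds the big-endian bytes in memory rather than reading them from disk.

```python
import struct
import sys
from array import array

# Three 4-byte floats packed in big-endian order, as a file written on a
# big-endian machine would contain them.
raw = struct.pack('>3f', 1.0, 2.5, -3.25)

values = array('f')
values.frombytes(raw)            # interpreted in this machine's byte order
if sys.byteorder == 'little':
    values.byteswap()            # swap in place to recover the numbers

print(sys.byteorder)
print(values.tolist())
```

On a little-endian machine the values are garbage before byteswap() and correct afterwards; on a big-endian machine no swap is needed, which is why the byte order is checked first.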
5
ESSENTIAL PYTHON SCRIPTING

Control statements, structure of a Python program, and system commands.


5.1. CONTROL FLOW TOOLS


This section briefly introduces common control statements in Python. The control
statements are written in a block structure and do not have an end statement. The end
of a block is expressed by indentation.

5.1.1. IF STATEMENT
The if statement is used to test a condition, which can have True or False values. An
example if block is:
In [234]: 1 if x < 0:
2     print x, 'is a negative number'
3 elif x > 0:
4     print x, 'is a positive number'
5 else:
6     print 'x is zero'

Xcan have zero or more elif, and else statement is also optional.
The if statement can also check whether a value exists within an iterable such as a list,
tuple, array, or string.
In [235]: 1 a_list =[ 'a ' , 'd ' , 'v ' ,2 ,4]
2 if 'd ' in a_list :
3 print a_list . index ( 'd ')

Out[235]: 1 1

In [236]: 1 a_str = 'We are in a Python Course'
2 if 'We' in a_str:
3     print a_str

Out[236]: 1 We are in a Python Course

5.1.2. FOR STATEMENT


The for statement iterates over the items of any sequence (a list or a string), and
repeats the steps within the for loop.
In [237]: 1 words = ['cat', 'window', 'defenestrate']
2 for _wor in words:
3     print _wor, words.index(_wor), len(_wor)

Dictionary can be iterated using items, keys, or values.



In [238]: 1 words = {1: 'cat', 2: 'window', 3: 'defenestrate'}
2 for _wor in words.items():
3     print _wor, len(_wor)

Out[238]: 1 (1 , ' cat ') 2


2 (2 , ' window ') 2
3 (3 , ' defenestrate ') 2
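When both the position and the item are needed in a loop, enumerate() avoids the repeated index() search, which would also return the wrong position for duplicated items. This is a Python 3 sketch; the names position, lengths, and summary are illustrative.

```python
words = ['cat', 'window', 'defenestrate']

# enumerate() yields (position, item) pairs.
for position, word in enumerate(words):
    print(position, word, len(word))

# items() yields (key, value) pairs that can be unpacked in the loop header.
lengths = {word: len(word) for word in words}
for word, length in lengths.items():
    print(word, length)

summary = [(i, w, len(w)) for i, w in enumerate(words)]
```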

5.1.3. WHILE STATEMENT


Similar to the if statement, but the block is executed repeatedly as long as the
condition is true. The while loop ends when the condition is no longer met.
In [239]: 1 count = 0
2 while (count < 2):
3     print 'The count is:', count
4     count = count + 1
5 print "Good bye!"

Out[239]: 1 The count is : 0


2 The count is : 1
3 Good bye !

5.1.4. BREAK AND CONTINUE

The break statement breaks out of the smallest enclosing for or while loop. The
continue statement continues with the next iteration of the same loop.
In [240]: for n in range(2, 10):
              for x in range(2, n):
                  if n % x == 0:
                      print n, 'equals', x, '*', n / x
                      break
              else:
                  # loop fell through without finding a factor
                  print n, 'is a prime number'

Out[240]: 2 is a prime number
          3 is a prime number
          4 equals 2 * 2
          5 is a prime number
          6 equals 2 * 3
          7 is a prime number
          8 equals 2 * 4
          9 equals 3 * 3

In [241]: for num in range(2, 10):
              if num % 2 == 0:
                  print "Found an even number", num
                  continue
              print "Found a number", num

Out[241]: Found an even number 2
          Found a number 3
          Found an even number 4
          Found a number 5
          Found an even number 6
          Found a number 7
          Found an even number 8
          Found a number 9

5.1.5. RANGE

As shown in previous chapters and examples, range is used to generate a sequence of numbers
from start up to, but not including, end at an interval of step. In Python 2, range builds the whole list in memory,
whereas in Python 3 it returns a lazy range object that produces the numbers on demand and so does not use
memory unnecessarily.
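
A short sketch of the start, end and step arguments follows. The results are wrapped in list() so that the same lines behave identically in Python 2 and Python 3; the variable names are only illustrative.

```python
# range(start, stop, step): stop itself is excluded from the result.
evens = list(range(2, 10, 2))      # 2, 4, 6, 8
countdown = list(range(5, 0, -1))  # a negative step counts down: 5, 4, 3, 2, 1
first_five = list(range(5))        # start defaults to 0 and step to 1

print(evens)
print(countdown)
print(first_five)
```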

5.2. PYTHON FUNCTIONS


Sections of a Python program can be put into an indented block (a container of code)
defined as a function. A function can be called "on demand". A basic syntax is as
follows:
In [242]: def funcname(param1, param2):
              prod = param1 * param2
              print prod
              return

In [243]: type(funcname(2, 3))

Out[243]: 6
          NoneType

• def identifies a function. A name follows the def.

• Parameters can be passed to a function. These parameter values are substituted
before calculation.

• return: Returns the result of the function. In the above example, the bare return
gives back None, an object of type NoneType.

If the return statement is followed by a value, that value is passed back to the
statement that calls the function. Default values of the parameters can also
be set. If the function is called without any arguments, the default values are used for
calculation. Here is an example.
In [244]: def funcname(param1=2, param2=3):
              prod = param1 * param2
              return prod

In [245]: funcname()

Out[245]: 6

In [246]: funcname(3, 4)

Out[246]: 12

In [247]: type(funcname(2, 3))

Out[247]: int

For details on defining function, refer here.
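
Default values combine naturally with keyword arguments, which let the caller override only some of the parameters. The function below is a hypothetical example (rectangle_area is not part of the course material), meant only as a sketch of the calling styles.

```python
def rectangle_area(width=2, height=3):
    """Return the area of a rectangle (illustrative function)."""
    return width * height

default_area = rectangle_area()            # both defaults used: 2 * 3
partial_area = rectangle_area(height=10)   # width keeps its default: 2 * 10
explicit_area = rectangle_area(4, 5)       # positional arguments: 4 * 5
```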

5.3. PYTHON MODULES


A module is a file containing Python definitions and statements. A module file (with
.py ending) provides a module named filename. This module name is available as
__name__ once the module is imported.

In [248]: #!/Users/skoirala/anaconda/envs/pyfull/bin/python
          import numpy as np

          def lcommon(m, n):  # def, then the function name with its parameters
              if m > 0 and n > 0 and type(m) == int and type(n) == int:  # int means integer
                  a = []  # an empty list
                  for i in range(1, n + 1):  # i runs from 1 to n
                      M = m * i
                      for k in range(1, m + 1):
                          N = n * k
                          if M == N:  # M (= N) is a common multiple of m and n
                              a = np.append(a, M)  # add the common multiple to a
                  return min(a)  # return the smallest common multiple in a
              else:
                  return "error"

          def computeHCF(x, y):
              # choose the smaller number
              if x > y:
                  smaller = y
              else:
                  smaller = x
              for i in range(1, smaller + 1):
                  if (x % i == 0) and (y % i == 0):
                      hcf = i
              return hcf

• After a file which includes a function is created and saved, the function can be
used as a module in an interactive shell started in the same directory, or in other
files in that directory.

If the saved filename is samp_func.py, the function can be called from
another program file in the same directory.
In [249]: import samp_func as sf
          print sf.lcommon(3, 29)

Out[249]: 87.0

Running this program gives the number 87, which is the least
common multiple of 3 and 29.

The module file can also be run as a standalone program if the following block of
code is added to the end.
In [250]: if __name__ == "__main__":
              import sys
              print lcommon(int(sys.argv[1]), int(sys.argv[2]))
              print computeHCF(int(sys.argv[1]), int(sys.argv[2]))

Also, the variables defined in the module file can be accessed as long as they are not
defined inside functions of the module.
In [251]: somevariable = ['1', 2, 4]

In [252]: import samp_func as sf
          print sf.somevariable

Out[252]: ['1', 2, 4]

A list of all the objects from the module can be obtained by using dir() as
In [253]: dir(sf)
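
As an aside, the two functions in the module are related: the least common multiple follows from the highest common factor, because lcm(m, n) * hcf(m, n) = m * n. The sketch below is not the module shown above. It swaps in Euclid's algorithm for the factor search, and the names compute_hcf and compute_lcm are hypothetical.

```python
def compute_hcf(x, y):
    """Highest common factor by Euclid's algorithm."""
    while y:
        x, y = y, x % y
    return x

def compute_lcm(m, n):
    """Least common multiple, using lcm(m, n) * hcf(m, n) == m * n."""
    return m * n // compute_hcf(m, n)

lcm_3_29 = compute_lcm(3, 29)
print(lcm_3_29)
```

Because only integer arithmetic is involved, the result here is the integer 87 rather than the float 87.0 returned by lcommon.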

5.4. PYTHON CLASSES


As Python is a fully object-oriented language, it provides classes, which allow you to create
(instantiate) objects. A class only contains structure: it defines
how something should be laid out, but doesn’t actually fill in the content.
This is useful when the same set of operations is to be carried out on several instances, while
keeping every object created distinct. The following is an example class taken
from here.
In [254]: import math
          class Point:
              def __init__(self, x, y):
                  self.x = x
                  self.y = y

              def __str__(self):
                  return "Point(%d, %d)" % (self.x, self.y)

              def distance_from_origin(self):
                  return math.sqrt(self.x**2 + self.y**2)

• It is customary to start class names with an upper case letter.

• __init__ and self are critical to create an object.

• self is the object that is being created when the class is called.

• __init__ initializes the new object self by assigning the attributes x and y to it.

In [255]: p1 = Point(1, 4)
          p2 = Point(2, 3)
          p1.x

Out[255]: 1

In [256]: p1.distance_from_origin()

Out[256]: 4.123105625617661

In [257]: p2.distance_from_origin()

Out[257]: 3.605551275463989
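
Methods can also take other objects as arguments. As a sketch, the class is repeated below with one extra method, distance_to, which is a hypothetical addition for illustration and not part of the example the class was taken from.

```python
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance_from_origin(self):
        return math.sqrt(self.x**2 + self.y**2)

    def distance_to(self, other):
        """Distance to another Point (hypothetical method, added for illustration)."""
        return math.sqrt((self.x - other.x)**2 + (self.y - other.y)**2)

p1 = Point(1, 4)
p2 = Point(2, 3)
separation = p1.distance_to(p2)   # sqrt(1**2 + 1**2)
```

Each object keeps its own x and y, so p1 and p2 stay independent even though they share the same class definition.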

Some simple and easy to understand examples of class are provided in:

• http://www.jesshamrick.com/2011/05/18/an-introduction-to-classes-and-inheritance-in-python/

• https://jeffknupp.com/blog/2014/06/18/improve-your-python-python-classes-and-object-oriented-programming/

5.5. ADDITIONAL RELEVANT MODULES


This section briefly introduces modules that are useful for running a Python
program efficiently.

5.5.1. SYS MODULE


This module provides access to some variables used or maintained by the Python
interpreter.

• argv: Probably the most useful attribute of sys. sys.argv is a list containing
the command-line arguments passed when running a Python script. The first
element (sys.argv[0]) is always the program name. Any number of arguments can be passed
into the program as strings, e.g., sys.argv[1] is the first argument after the program name, and so on.

• byteorder: Remember the byteswap? sys.byteorder provides info on the machine
Endianness.

• path: The list of directories that Python searches for modules is stored in sys.path. If you have
written modules and classes, and want to access them from anywhere, you can add their
directory to sys.path as,
In [258]: import sys
          sys.path.append('path to your directory')

A full set of sys methods is provided here.
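
Because every element of sys.argv is a string, numeric arguments have to be converted before use. The helper below is a hypothetical sketch (parse_two_ints is not a sys function). It is run here on a simulated argument list so that it can be tried without a command line.

```python
import sys

def parse_two_ints(argv):
    """Return the two integers that follow the program name in argv."""
    if len(argv) < 3:
        raise ValueError('expected two arguments after the program name')
    return int(argv[1]), int(argv[2])

# Simulated command line, as if we had typed: python samp_func.py 3 29
example_argv = ['samp_func.py', '3', '29']
m, n = parse_two_ints(example_argv)

# In a real script the call would be: parse_two_ints(sys.argv)
```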

5.5.2. OS MODULE
This module provides a unified interface to a number of operating system functions.
There are lots of useful functions for process management and file object creation in
this module. Among them, the functions for manipulating files
and directories are especially useful, and are briefly introduced below. For details on ‘OS module’, click
here.
Before using file and directory commands, it is necessary to import the os module:
In [259]: import os
          os.getcwd()

Same as pwd in UNIX. Returns the absolute path to the current working
directory.
In [260]: os.mkdir('dirname')

Same as mkdir in UNIX. Makes a new directory. dirname can be an absolute or
relative path to the directory you want to create.
In [261]: os.remove('filename')

Same as rm in UNIX. Removes a file.
In [262]: os.rmdir('dirname')

Same as rmdir in UNIX. Removes an empty directory.
In [263]: os.chdir('dirpath')

Same as cd in UNIX. Changes directory to the location given by dirpath.
dirpath can be absolute or relative.
In [264]: os.listdir('dirpath')

Same as ls in UNIX. Lists all files and directories located at dirpath.
In [265]: os.path.exists('filepath')

A very useful os function that checks if a file or directory exists. Returns True if it exists
and False if not.
If you want to know more about these functions, follow this.
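
To see these functions working together without touching any real files, the sketch below runs them inside a scratch directory created with tempfile.mkdtemp (tempfile is a standard library module that is not covered in this section, and the file and directory names are placeholders).

```python
import os
import tempfile

start_dir = os.getcwd()               # remember where we started
work_dir = tempfile.mkdtemp()         # scratch directory, so nothing real is touched

os.chdir(work_dir)
os.mkdir('dirname')                   # create a sub-directory
open('filename', 'w').close()         # create an empty file
listing = sorted(os.listdir('.'))     # both the file and the directory are listed

os.remove('filename')                 # delete the file
os.rmdir('dirname')                   # delete the (empty) directory
still_there = os.path.exists('filename')

os.chdir(start_dir)                   # go back to where we started
os.rmdir(work_dir)                    # clean up the scratch directory
```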

5.5.3. ERRORS AND EXCEPTIONS


There are two types of errors in Python, in general: Syntax Errors and Exceptions.
Syntax errors are caused by invalid syntax. Exceptions are raised while the program runs and may be caused by Value,
Name, IO, Type, etc. A full list of exceptions in Python is here.
Python has built-in statements (try and except) to handle exceptions. An example
is below:
In [266]: while True:
              try:
                  x = int(raw_input("Please enter a number: "))
                  break
              except ValueError:
                  print "Oops! That was no valid number. Try again..."

Tries to convert the input to int and store it in x. If the input cannot be converted
(for example, when letters are entered), int raises ValueError, the message is printed, and
the loop goes to the next iteration. The above is the simplest example, and there are many
other more "sophisticated" ways to handle exceptions here.
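
The same pattern works inside a function, which is handy when no keyboard input is involved. The function below is a hypothetical helper (to_int is not a built-in), given only as a sketch of try and except.

```python
def to_int(text, default=None):
    """Convert text to int; return default if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default

good = to_int('42')                  # a valid integer string
bad = to_int('forty-two')            # not a number, so the default (None) comes back
fallback = to_int('4.2', default=0)  # int('4.2') also raises ValueError
```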
6
ADVANCED STATISTICS AND MACHINE LEARNING

Using advanced statistics.


6.1. QUICK OVERVIEW


In this section, we will focus on a practical example to demonstrate the implementation
of some advanced statistics, specifically machine learning algorithms, to perform gap
filling of eddy covariance data. The concept is to take some gappy data and fill the holes
using the meteorological variables associated with the missing values, then compare
the methods. It should be noted that we will not go into depth about the statistical
methods themselves, but just give an example of the implementation. Indeed, in most
cases we will use the default hyper-parameters, which is nice for an overview, but bad
practice overall. One should always try to understand a method when implementing
it.

6.1.1. REQUIRED PACKAGES

This exercise will require the following packages (all should be available via "conda
install..."):

• numpy

• scipy

• pandas

• scikit-learn

• statsmodels

• netCDF4 (needs hdf4)

6.1.2. OVERVIEW OF DATA


Our sample dataset (provided by me) is a processed eddy covariance file, such as
what you would find from the FLUXNET database. If you are unfamiliar with eddy
covariance, don’t panic, just think of it as a fancy weather station that measures not
just the meteorological data, but how things come and go from the ecosystem (such
as water and carbon). This file has half-hourly resolution, so it
gives a value for each variable measured every half hour, or 48 points per day (17,520
points per year). One problem with eddy covariance datasets is that they tend to have
missing values, or gaps, due to equipment failures or improper measuring conditions.
So to fix this, we can predict the missing values, or gap-fill the dataset. This particular
dataset has about 40% of the data missing. As we are not the first to deal with gappy
eddy covariance datasets, there is a current "standard" method involving sorting all
the values into a look-up table, where values from a similar time-span and meteo
conditions are binned, and the gaps are filled with the mean from the bin. We will try
to fill the gaps using three statistical methods: random forest, neural networks, and a
multi-linear regression.

6.2. IMPORT AND PREPARE THE DATA


We will try to organize this project somewhat like you would a real project, which means
we will have a number of ".py" files in our project, as well as our data files. So to
start, find a nice cozy place in your file system (maybe something like in "Documents"
or "MyPyFiles") and create a new folder (maybe called "AdvStat").
In our nice, new, cozy folder, we can first copy the sample dataset, which should
have the file extension ".nc". Now we can make three files, one named "Calc.py",
one named "Regs.py", and one named "Plots.py". These files can be created and/or
opened into the Spyder IDE to make things a bit easier to work with, or simply in your
favorite text editor.
Now, starting in the "Calc.py" file, we can import numpy and netCDF4 to start us
off. We will import the variables we are interested in and convert them into a numpy
array. Because this provided file has over 300 variables, we will create a dictionary
containing only a subset of variables that we are interested in based on a list, namely:
    IncludedVars = ['Tair_f', 'Rg_f', 'VPD_f', 'LE_f', 'LE',
                    'year', 'month', 'day', 'hour']

So to build our dictionary, we can start with an empty dictionary (remember "{}")
called "df". Then we can loop through our IncludedVars and use each item in the list
as a key for df, and pair each key with a numpy array from the netCDF:
    df[var] = np.array(ncdf[var]).flatten()[48*365:]

You may notice two things: first is that we not only turn our netCDF variable
into a numpy array, but we also call "flatten". This is because the netCDF has three
dimensions (time, lat, lon), but as this is only one site, the lat and lon dimensions
don’t change, so we can just flatten the array to one dimension. Second is that we are
already slicing the data from 48*365 onwards. This is because the first year is only a
partial year, so we not only have some gaps in the fluxes, but in all the data, which
will mess us up a bit. Thankfully for you, I have been through this dataset and can tell
you to skip the first year. Now, this netCDF is fairly well annotated, so if you would
like more information on a variable, simply ask:
    ncdf[var]
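
Put together, the loop might look like the sketch below. Since the data file is not reproduced here, ncdf is replaced by an ordinary dictionary of made-up numpy arrays with shape (time, 1, 1). That stand-in and its values are assumptions for illustration only. In "Calc.py", ncdf would instead be the dataset opened with the netCDF4 package.

```python
import numpy as np

# Stand-in for the opened netCDF file: each variable has shape (time, lat, lon)
# for a single site, i.e. (time, 1, 1). Two years of half-hourly values.
n_steps = 48 * 365 * 2
ncdf = {'Tair_f': np.arange(n_steps, dtype=float).reshape(n_steps, 1, 1),
        'LE': np.full((n_steps, 1, 1), -9999.0)}

IncludedVars = ['Tair_f', 'LE']   # shortened list for this sketch

df = {}                           # start with an empty dictionary
for var in IncludedVars:
    # flatten (time, 1, 1) to (time,) and drop the first year
    df[var] = np.array(ncdf[var]).flatten()[48 * 365:]
```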

Some highlights are that we will be trying to gap-fill the "LE" variable (Latent
Energy, a measure of the water flux), which we can compare to the professionally filled
version "LE_f".
As this is a regression problem, we need to get things into an "X" vs "Y" format.
For the X variables we will use the following:
    XvarList = ['Tair_f', 'Rg_f', 'VPD_f', 'year', 'month', 'day', 'hour']

With our list, we can then create a 2 dimensional array in the form number-of-
samples by number-of-features. We can do this by first creating a list of the arrays,
then calling np.array and transposing. If we want to be fancy, we can do this in one
line as:
    X = np.array([df[var] for var in XvarList]).T

and like magic we are all ready to go with the Xvars. The Y variable is also easy,
it is just equal to LE, which if we remember is stored in our dictionary as df["LE"].
However, we will do a little trick that will seem a bit silly, but will make sense later.
Let's first store our Y variable name as a string, then set Y as:
    yvarname = "LE"
    Y = df[yvarname]

I promise this will come in handy. One final task is to figure out where the gaps
are, but we will come to that in the next section, which is...

6.3. SETTING UP THE GAPFILLERS


Now we can move on to our second python file in our cozy folder: "Regs.py". This file
will hold some of our important functions that help us complete our quest of gapfilling.
The only package to import will be numpy. After the import, we can make a very simple
function called "GetMask" that will find our gaps for us. As we extracted the data
from the netCDF, all gaps are given the value -9999, so our function will simply return
a boolean array where all gapped values are True. I tend to be a bit cautious, so I
usually look for things such as:
    mask = (Y < -9000)

but you could easily say (Y==-9999). Don’t forget to return our mask at the end
of the function!
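
A minimal sketch of such a function is given below. The threshold argument is an addition for illustration (the text only uses the fixed value -9000), and the small Y array is made up.

```python
import numpy as np

def GetMask(Y, threshold=-9000):
    """Return a boolean array that is True wherever Y holds a gap value."""
    # threshold is a hypothetical optional argument; gaps are stored as -9999
    return np.asarray(Y) < threshold

Y = np.array([12.5, -9999.0, 80.1, -9999.0])   # made-up values with two gaps
mask = GetMask(Y)
```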
Now, so we don’t forget, we can go ahead and use this function in our "Calc.py"
file right away. First we need to tell "Calc.py" where to find the "GetMask", so in
"Calc.py" we simply
    import Regs

and we can set our mask as:
    mask = Regs.GetMask(df[yvarname])

Easy as that! Now, we will want to keep everything tidy, so go ahead and also save
our mask into our dictionary (df) as something like "GapMask".
Now, let's go back to "Regs.py" and make a second function. This function will
take all the machine learning algorithms that we will use from the SKLearn package
and gap fill our dataset, so let's call it "GapFillerSKLearn" and it will take four input
variables: X, Y, GapMask, and model. As this function will be a bit abstract, let's add
some documentation, which will be a string right after we define the function. I have
made an example documentation for our function here:

    def GapFillerSKLearn(X, Y, GapMask, model):
        """
        GapFillerSKLearn(X, Y, GapMask, model)

        Gap fills Y via X with model

        Uses the provided model to gap fill Y via the X variable

        Parameters
        ----------
        X : numpy array
            Predictor variables
        Y : numpy array
            Training set
        GapMask : numpy boolean array
            array indicating where gaps are with True
        model : object with fit and predict methods
            regression model used to fill the gaps

        Returns
        -------
        Y_hat
            Gap filled Y as numpy array
        """

Now that the function is documented, we will never forget what this function does.
So we can now move on to the actual function. The reason we can write this
function is because the SKLearn module organizes all of its regressions in the same
way, so the method will be called "model" whether it is a random forest or a neural
net. In all cases we fit the model as:
    model.fit(X[~GapMask], Y[~GapMask])

where we are fitting only where we don’t have gaps. In this case the tilde (~) inverts
the boolean array, making all Trues False and all Falses True, which in our case now
gives True to all indices where we have original data. Next we can build our Y_hat
variable as an array of -9999 values by first creating an array of zeros and subtracting
9999. This way, if we mess up somewhere, we can see the final values as a -9999.
Now, we can fill the gaps by making a prediction of the model with the Xvars as
    Y_hat[GapMask] = model.predict(X[GapMask])

where we are no longer using the tilde (~) because we want the gap indices. We can
return our Y_hat at the end of our function and move back to our "Calc.py" file.
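
Assembled from the steps above, the body of the function could look like the following sketch (docstring shortened). To keep the example self-contained it is tried with MeanModel, a toy stand-in that only mimics the fit and predict interface. MeanModel and the tiny arrays are hypothetical and are not part of SKLearn.

```python
import numpy as np

def GapFillerSKLearn(X, Y, GapMask, model):
    """Fit model where Y is observed and predict Y where it is missing."""
    model.fit(X[~GapMask], Y[~GapMask])           # train on the non-gap rows only
    Y_hat = np.zeros(Y.shape) - 9999              # every value starts flagged as -9999
    Y_hat[GapMask] = model.predict(X[GapMask])    # fill only the gaps
    return Y_hat

class MeanModel(object):
    """Toy stand-in with a fit/predict interface (hypothetical, for illustration)."""
    def fit(self, X, Y):
        self.mean_ = np.mean(Y)
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

X = np.arange(8, dtype=float).reshape(4, 2)       # 4 samples, 2 features
Y = np.array([10.0, -9999.0, 20.0, -9999.0])      # two gaps
GapMask = Y < -9000
Y_hat = GapFillerSKLearn(X, Y, GapMask, MeanModel())
```

Note that, following the description, Y_hat holds predictions only at the gap positions and stays at -9999 everywhere else.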

6.4. ACTUALLY GAPFILLING


With our X, Y, mask, and filling functions built, we can actually do some calculations.
For this, we will need to import some more packages, namely:
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.neural_network import MLPRegressor
    import statsmodels.api as sm

where our random forest (RandomForestRegressor) and neural network (MLPRegressor,
or Multi-layer Perceptron) are from the SKLearn package and our linear model
will be from the statsmodels package. As everything is set up, we can immediately call
our SKLearn gap filler function as:
    df[yvarname + '_RF'] = Regs.GapFillerSKLearn(X, Y, mask,
                                                 RandomForestRegressor())

and likewise for the MLPRegressor (just remember to change the df key!). Note
that there are many, many options for both RandomForestRegressor and MLPRegressor
that should likely be changed, but as this is a quick overview, we will just use the
defaults. If you were to add the options, such as increasing to 50 trees in the random
forest, it would look like this
    df[yvarname + '_RF'] = Regs.GapFillerSKLearn(X, Y, mask,
                                                 RandomForestRegressor(n_estimators=50))

Unfortunately we cannot use the same function for the linear model, as statsmodels
uses a slightly different syntax (note that SKLearn also has an implementation for linear
models, but it’s good to be well rounded). The statsmodels portion will look strikingly
similar to our "GapFillerSKLearn" function, but with some key differences:
    X_ols = sm.add_constant(X)
    df[yvarname + '_OLS'] = Y.copy()  # copy, so filling the gaps does not overwrite Y
    model = sm.OLS(Y[~mask], X_ols[~mask])
    results = model.fit()
    df[yvarname + '_OLS'][mask] = results.predict(X_ols[mask])

Basically, we have to add another column to our array that acts as the intercept variable,
then we run the same set of commands, but the pesky X’s and Y’s are switched and are
passed when the model is built rather than in the fit command, making it too different
to adapt for our "GapFillerSKLearn" function.
Now, our script is basically done, and we can actually run it (in Spyder, just press
F5).

Depending on the speed of your computer, it may take a few seconds to run, more
than you might want to wait for over and over. Therefore, before we move on to the
"Plots.py" file, it would be a good idea to save the data so we don’t have to run it every
time. For this, we will use the "pickle" package. "pickle" does a nice job of saving
python objects as binary files, which Sujan loves, so after we import the package, we
can dump our pickle with:
    with open(yvarname + "_GapFills.pickle", "wb") as f:
        pickle.dump(df, f)

You can notice that we save the file with our yvarname, which you will see can
come in handy.
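
If you would like to convince yourself that nothing is lost on the way, a quick round trip can be tried on a small stand-in dictionary. The sketch below writes into a temporary directory so that no project file is overwritten; the values are made up.

```python
import os
import pickle
import tempfile

yvarname = "LE"
df = {yvarname: [1.0, 2.0], yvarname + "_RF": [1.1, 1.9]}   # small stand-in dictionary

path = os.path.join(tempfile.mkdtemp(), yvarname + "_GapFills.pickle")

with open(path, "wb") as f:        # "wb": write in binary mode
    pickle.dump(df, f)

with open(path, "rb") as f:        # "rb": read in binary mode
    restored = pickle.load(f)
```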

6.5. AND NOW THE PLOTS!


Now we can finally move on to our "Plots.py" file, where we will need the numpy,
pandas, and pickle packages. To start, we will keep things simple and just do a
comparison of each gap filling method to the standard "LE_f" from the datafile.
After comparing these, we will use a kernel density estimate to look at the distribution
of our gap-filled values compared to the real, measured values. So in total we will have
four figures.
First, we will use the exact same mysterious trick that we have been using where
we set the yvarname:
    yvarname = "LE"

Again, mysterious and will be useful I promise.


Now we will need to load the datafile we just created from "Calc.py", but this
time instead of using a dictionary, as the data is all neatly named and every vector is
the same length, we can use the magic of Pandas! So as we load our pickle, we can
directly convert it to a Pandas DataFrame with
    df = pd.DataFrame.from_dict(pickle.load(open(yvarname + "_GapFills.pickle", "rb")))

Now, in the python or ipython console, you can explore "df" a little bit and see
that it is a nice and orderly DataFrame, which R users will feel right at home in. And
with this DataFrame, we can do much of our initial plotting directly, so we didn’t even
have to import Matplotlib.

6.5.1. SCATTER PLOTS !

As we have three different methods to compare, we can write the plotting steps as a
function so we avoid doing all that copy and pasting. Let's call our function "GapComp"
and it will take the input variables df, xvar, yvar, and GapMask. First thing we will do
is make our scatter plot of the gap filled values. Pandas actually comes with much
of the plotting functionality built in, so the plot becomes one line:
    fig = df[GapMask].plot.scatter(x=xvar, y=yvar)

Notice that we will be using our boolean array "GapMask" to index the entire
DataFrame; this is the magic of Pandas. Now, we could call it a day, but what fun
is a scatter plot without some lines on it? So, we will add the results of a linear
regression between our gap filling and the "LE_f" using the "linregress" function from
"scipy.stats" (go ahead and add it to the import list). "linregress" gives a nice output
of a simple linear regression including all the standard stuff:
    slope, intercept, r_value, p_value, std_err = linregress(df[GapMask][xvar],
                                                             df[GapMask][yvar])
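
For the curious, the first three of these outputs follow from the usual least-squares formulas and can be sketched by hand. The function below is a hypothetical illustration (simple_linregress is not part of scipy) and does not compute the p-value or the standard error.

```python
def simple_linregress(x, y):
    """Least-squares slope, intercept and correlation coefficient r for paired data."""
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)                       # spread of x
    syy = sum((yi - mean_y) ** 2 for yi in y)                       # spread of y
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_value = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r_value

# Points lying exactly on y = 2x + 1
slope, intercept, r_value = simple_linregress([0, 1, 2, 3], [1, 3, 5, 7])
```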

Now that we have fit a model to our models, we can plot our line. We will need
an x variable spanning the range of our x values to draw the line, which we can create
with the "numpy.linspace" command as
    x = np.linspace(df[GapMask][xvar].min(), df[GapMask][xvar].max())

And finally, we can plot our line with a nice label showing both our equation and
the r^2 value with
    fig.plot(x, x*slope + intercept,
             label="y={0:0.4}*x+{1:0.4}, r^2={2:0.4}".format(slope, intercept, r_value**2))
    fig.legend()

And that finishes our function. We can now plot all of our models with a neat little
for loop:
    for var in ["_RF", "_NN", "_OLS"]:
        GapComp(df, yvarname + var, yvarname + "_f", df.GapMask)

6.5.2. DISTRIBUTIONS WITH KDE


With the first three plots done, we can move on to our kernel density plots. Again
Pandas will make our lives easier, as instead of "df.plot.scatter" we use "df.plot.kde".
Remember we want to compare both our gap filling techniques and the "LE_f" with
the distribution of the real dataset. We can start with plotting the filled dataset using
only the filled values ("df.GapMask"). One fancy trick of Pandas is you can pass a list
of columns, and it will plot all of them. However, because of our mysterious magic
trick with "yvarname", we have to build this list with a little loop, which looks like
    [yvarname + ending for ending in ("_RF", "_NN", "_OLS", "_f")]
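
Written out with a named variable, the little loop gives the following list. This is only a sketch to show the result; nee_list hints at why the variable name is kept as a string.

```python
yvarname = "LE"
endings = ("_RF", "_NN", "_OLS", "_f")

# One column name per gap filling method, plus the standard "_f" version
ThisFancyList = [yvarname + ending for ending in endings]

# Switching the target variable only requires a different name
nee_list = ["NEE" + ending for ending in endings]

print(ThisFancyList)
```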

Now we can pass this fancy list, either as a named variable, or in a one-liner if we
are even fancier, to the command
    KDEs = df[df.GapMask][ThisFancyList].plot.kde()

where our plot is saved as the variable KDEs. Now, we have to plot our final KDE
from the "LE" column, but we can no longer call it using "KDEs.plot" like we did for
our line in the "GapComp" function. What we have to do then is tell the "df.plot.kde"
command which plot we want it in. For this, we pass the "ax=" argument like so
    df[~df.GapMask][[yvarname]].plot.kde(ax=KDEs)

and voilà, our plotting is complete! There, some advanced statistics, easy as cake.

6.6. BONUS POINTS!


For some bonus points, you can gap fill another variable called "NEE". NEE stands
for net ecosystem exchange, and it measures how carbon comes and goes from the
ecosystem. All you have to do is extract it from the netCDF (both the NEE and
NEE_f), then switch out all the times you reference LE (hint, we can finally use the
magic trick).
7
DATA VISUALIZATION AND PLOTTING

An introduction to plotting using matplotlib and Bokeh

The first part of this chapter introduces plotting standard figures using matplotlib
and the second part introduces interactive plotting using Bokeh.
For a comprehensive set of examples with the source code used to plot each figure using
matplotlib, click here. For the same for Bokeh, click here.

7.1. PLOTTING A SIMPLE FIGURE


Read the data in the data folder using:
In [267]: import numpy as np
          dat = np.loadtxt('data/FD-Precipitation_Ganges_daily_kg.txt')[:365]

First, a figure object can be defined. figsize is the figure size as a (width, height) tuple.
The unit is inches.
In [268]: from matplotlib import pyplot as plt
          plt.figure(figsize=(3, 4))

In [269]: plt.plot(dat)
          plt.show()

There are several keyword arguments such as color, style and so on that control
the appearance of the line object. They are listed here. The line and marker styles in
matplotlib are shown in Table 7.1.
For axis labels and figure title:
In [270]: plt.xlabel('time')
          plt.ylabel('Precip', color='k', fontsize=10)
          plt.title('One Figure')

The axis limits can be set by using xlim() and ylim() as:
In [271]: plt.xlim(0, 200)
          plt.ylim(0, 1e14)

In [272]: plt.text(0.1, 0.5, 'the first text', fontsize=12, color='red', rotation=45, va='bottom')
          plt.text(0.95, 0.95, 'the second text', fontsize=12, color='green', ha='right',
                   transform=plt.gca().transAxes)
          plt.figtext(0.5, 0.5, 'the third text', fontsize=12, color='blue')

The color and fontsize can be changed. For color, use color= some color name
such as 'red' or color= hexadecimal color code such as '#0000FF'. For font size, use
fontsize=number (number is > 0).

Table 7.1: Line and marker styles

Line style               Marker style
Linestyle    Lines       Marker    Signs
'solid'      -----       'o'       Circle
'dashed'     - - -       'v'       Triangle_down
'dotted'     . . .       '^'       Triangle_up
                         '<'       Triangle_left
                         '>'       Triangle_right
                         's'       Square
                         'h'       Hexagon
                         '+'       Plus
                         'x'       X
                         'd'       Diamond
                         'p'       Pentagon

Also, grid lines can be turned on by using
In [273]: plt.grid(which='major', axis='x', ls=':', lw=0.5)

To set the scale to log
In [274]: plt.yscale('log')

7.2. MULTIPLE PLOTS IN A FIGURE


Matplotlib has several methods to make subplots within a figure. Here are some quick
examples of using the ’mainstream’ subplots.
In [275]: selVars = 'Precipitation Runoff'.split()
          nrows = 2
          ncols = 1
          plt.figure(figsize=(3, 4))
          for _var in selVars:
              dat = np.loadtxt('data/FD-' + _var + '_Ganges_daily_kg.txt')[:365]
              spI = selVars.index(_var) + 1
              plt.subplot(nrows, ncols, spI)
              plt.plot(dat)

7.3. PLOT WITH DATES


The datetime module supplies classes for manipulating dates and times. This module
comes in handy when calculating temporal averages, such as monthly mean from daily
time series. When these date objects are combined with dates functions of matplotlib,
time series data can be plotted with axis formatted as dates. First import the necessary
modules and functions. timeOp is a self made module consisting of functions to convert
daily data to monthly or yearly data.
In [276]: import timeOp as tmop  # a self made module to compute monthly data
                                 # from daily data considering calendar days and so on
          import numpy as np
          import matplotlib as mpl
          from matplotlib import pyplot as plt
          from matplotlib import dates
          import datetime
          dat1 = np.loadtxt('data/FD-Precipitation_Amazon_daily_kg.txt')

Now, date objects can be created using the datetime module. In the current file, the
data is available from 1979-01-01 to 2007-12-31. Using these date instances, a range
of date objects can be created using a step dt, which is a timedelta object from
datetime.
In [277]: sdate = datetime.date(1979, 1, 1)
          edate = datetime.date(2008, 1, 1)
          dt = datetime.timedelta(days=30.5)
          dates_mo = dates.drange(sdate, edate, dt)

Using the functions within the tmop module, monthly and yearly data are created.
In [278]: dat_mo = np.array([np.mean(_m) for _m in tmop.day2month(dat1, sdate)])
          dat_y = np.array([np.mean(_y) for _y in tmop.day2year(dat1, sdate)])
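
The timeOp module is self made and is not listed here. To give an idea of how daily values can be grouped by calendar month with datetime, the function below is a hypothetical sketch. It is not the authors' tmop.day2month and makes no claim to reproduce its interface.

```python
import datetime

def monthly_means(values, start_date):
    """Average daily values by calendar month; returns a list of (year, month, mean)."""
    sums = {}
    counts = {}
    for offset, value in enumerate(values):
        day = start_date + datetime.timedelta(days=offset)   # date of this value
        key = (day.year, day.month)
        sums[key] = sums.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
    return [(year, month, sums[(year, month)] / counts[(year, month)])
            for (year, month) in sorted(sums)]

# 59 made-up daily values from 1979-01-01: 31 days of 1.0, then 28 days of 3.0
daily = [1.0] * 31 + [3.0] * 28
result = monthly_means(daily, datetime.date(1979, 1, 1))
```

Working with real dates means that the different month lengths and leap years are taken care of automatically.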

Next up, we create axes instances on which the plots will be made. These axes
objects are the founding blocks of all subplots like object in Python and form the
basics for having as many subplots as one wants in a figure. It is defined by using
axes command with [lower left x, lower left y, width, and height] as an argument. The
co-ordinates and sizes are given in relative terms of figure, and thus, they vary from 0
to 1.
In [279]: 1 ax1 = plt . axes ([0.1 ,0.1 ,0.6 ,0.8])
2 ax1 . plot_date ( dates_mo , dat_mo , ls = ' - ' , marker = None )

XWhile plotting dates, plot_date function is used with the date range as the
7.4. Scatter Plots 75

x variable and data as the y variable. Note that the sizes of x and y variables should
be the same. Automatically, the axis is formatted as years.
In [280]: 1 ax2 = plt . axes ([0.75 ,0.1 ,0.25 ,0.8])
2 ax2 . plot ( dat_mo . reshape ( -1 ,12) . mean (0) )
3 ax2 . set_xticks ( range (12) )
4 ax2 . se t_ x ti ck la b el s ([ ' Jan ' , ' Feb ' , ' Mar ' , ' Apr ' , ' May ' , ' Jun ' , ' Jul ' , ' Aug ' ,
' Sep ' , ' Oct ' , ' Nov ' , ' Dec '] , rotation =90)
5 plt . show ()

Sometimes, it is easier to set the ticks and labels manually. In this case, the
mean seasonal cycle is plotted normally, and the xticks are changed to look like dates.
Remember that with a proper date range object, this can be achieved automatically with
plot_date as well.
Matplotlib has a dedicated ticker module that handles the location and formatting
of the ticks. Even though we do not go through the details, we recommend that
everyone read through the ticker page.
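
As a small taste of the ticker module, the month labels can also be produced by a locator and formatter pair instead of a manual list of tick labels. This is a sketch on synthetic data, not the approach used in the example above; the function name month_label is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import ticker

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def month_label(x, pos):
    """Return the month abbreviation for tick positions 0..11, else ''."""
    i = int(round(x))
    if 0 <= i < 12 and abs(x - i) < 1e-9:
        return MONTHS[i]
    return ''


fig, ax = plt.subplots()
ax.plot(np.arange(12), np.sin(np.arange(12) / 12.0 * 2 * np.pi))
# the locator decides where ticks go, the formatter decides what they say
ax.xaxis.set_major_locator(ticker.FixedLocator(list(range(12))))
ax.xaxis.set_major_formatter(ticker.FuncFormatter(month_label))
ax.tick_params(axis='x', labelrotation=90)
```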

7.4. Scatter Plots


Let’s read the data and import the modules first:
In [281]: import numpy as np
          from matplotlib import pyplot as plt
          dat1 = np.loadtxt('data/FD-Precipitation_Ganges_daily_kg.txt')[:365]
          dat2 = np.loadtxt('data/FD-Runoff_Ganges_daily_kg.txt')[:365]
          dat3 = np.loadtxt('data/FD-Evaporation_Ganges_daily_kg.txt')[:365]

Once the data is read, we can open a figure object and start adding things to it.
In [282]: plt.figure(figsize=(3, 4))
          plt.scatter(dat1, dat2, facecolor='blue', edgecolor=None)
          plt.scatter(dat1, dat3, marker='d', facecolor='red', alpha=0.4, linewidths=0.7)
          plt.xlabel(r'Precip ($kg\ d^{-1}$)')
          plt.ylabel(r'Runoff or ET ($\frac{kg}{d}$)', color='k', fontsize=10)
          plt.grid(which='major', axis='both', ls=':', lw=0.5)
          plt.title('A scatter')

scatter uses slightly different names for colors. The color of the marker and the
color of the line around it can be set separately using facecolor and edgecolor,
respectively. It also allows changing the transparency using the alpha argument. Note
that the width of the line around the markers is set by the linewidths argument;
scatter has no edgewidth argument.

In [283]: plt.legend(('Runoff', 'ET'), loc='best')
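
The scatter options can be checked on synthetic data without the Ganges files. In the sketch below the variable names and the random data are invented for illustration, the legend entries are attached through the label argument rather than passed to legend, and the Agg backend is selected so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
precip = rng.random(50)   # synthetic "precipitation"
runoff = 0.5 * precip     # synthetic "runoff"
et = 0.3 * precip         # synthetic "evapotranspiration"

fig, ax = plt.subplots(figsize=(3, 4))
# edgecolor='none' (a string) removes the marker outline
s1 = ax.scatter(precip, runoff, facecolor='blue', edgecolor='none',
                label='Runoff')
# linewidths sets the width of the marker outline
s2 = ax.scatter(precip, et, marker='d', facecolor='red', alpha=0.4,
                linewidths=0.7, label='ET')
leg = ax.legend(loc='best')
```

Attaching labels at plot time keeps each legend entry tied to its own scatter call, whatever the order in which the calls are made.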

7.5. Playing with the Elements


Until now, it’s been a dull and standard plotting library. A figure comprises several
instances or objects, which can be obtained through various methods and then modified.
This makes customization of a figure extremely fun. Here are some examples of what
can be done.

• The ugly lines: The boxes around figures are stored as spines, which is a
dictionary-like object holding each line (left, right, top, bottom) and its
properties. In the rem_axLine function of plotTools, you can see that the linewidth
of some of the spines has been set to zero.
In [284]: import plotTools as pt
          pt.rem_axLine()

• Getting the limits of the axis from the figure: Use the gca() function of pyplot
to get the current axes, and then read its x and y limits.
In [285]: ymin, ymax = plt.gca().get_ylim()
          xmin, xmax = plt.gca().get_xlim()

• Let’s draw that 1:1 line.


In [286]: # arrow takes a start point and the lengths dx, dy, not an end point
          plt.arrow(xmin, ymin, xmax - xmin, ymax - ymin, lw=0.1, zorder=0)

• A legendary legend: Here is an example of how flexible a legend object can be. It
has a tonne of options and methods. Sometimes, setting them becomes a matter of
manual calibration.

In [287]: leg = plt.legend(('Runoff', 'ET'), loc=(0.05, 0.914), markerscale=0.5,
                           scatterpoints=4, ncol=2, fancybox=True, handlelength=3.5,
                           handletextpad=0.8, borderpad=0.1, labelspacing=0.1,
                           columnspacing=0.25)
          leg.get_frame().set_linewidth(0)
          leg.get_frame().set_facecolor('firebrick')
          leg.legendPatch.set_alpha(0.25)
          texts = leg.get_texts()
          for t in texts:
              tI = texts.index(t)
              # t.set_color(cc[tI])
          plt.setp(texts, fontsize=7.83)
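
The spine, axis-limit and 1:1-line steps from the list above can be combined in one sketch on synthetic data. The spine handling is only a guess at what a helper like rem_axLine does, since plotTools is not shown here. The 1:1 line is drawn with plot over the range shared by both axes rather than with arrow, so that it follows y = x even when the x and y limits differ.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])  # synthetic data
y = np.array([0.5, 1.5, 1.0, 4.0])

fig, ax = plt.subplots()
ax.scatter(x, y)

# hide the top and right box lines by setting their width to zero
for side in ('top', 'right'):
    ax.spines[side].set_linewidth(0)

# read the current limits, then draw y = x over the range common to both axes
xmin, xmax = ax.get_xlim()
ymin, ymax = ax.get_ylim()
lo, hi = max(xmin, ymin), min(xmax, ymax)
(one_to_one,) = ax.plot([lo, hi], [lo, hi], lw=0.5, zorder=0)

# restore the limits, since plotting may have rescaled the axes
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
```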

7.6. Map Map Map!
This section explains the procedure to draw a map using basemap and matplotlib.

7.6.1. Global Data


Let’s read the data that we will use to make the map. The data are stored as
big-endian plain binary. The file consists of float32 values and has an unknown number
of time steps, but it is at a spatial resolution of 1°.
In [288]: import numpy as np
          datfile = 'runoff.1986-1995.bin'
          data = np.fromfile(datfile, np.float32).byteswap().reshape(-1, 180, 360)
          print(np.shape(data))
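
An alternative to byteswap is to state the byte order in the dtype itself, so that the result does not depend on the byte order of the machine. The sketch below writes a small synthetic big-endian file to a temporary directory and reads it back; the file name and the number of time steps are invented for illustration.

```python
import os
import tempfile

import numpy as np

nlat, nlon, ntime = 180, 360, 3  # 1-degree grid, three synthetic time steps
original = np.arange(ntime * nlat * nlon, dtype=np.float32).reshape(ntime, nlat, nlon)

# write the synthetic data as big-endian float32 ('>f4')
path = os.path.join(tempfile.mkdtemp(), 'synthetic.bin')
original.astype('>f4').tofile(path)

# read it back; -1 lets numpy infer the unknown number of time steps
data = np.fromfile(path, dtype='>f4').reshape(-1, nlat, nlon)
print(data.shape)
```

The -1 in reshape plays the same role as in the example above: the number of time steps follows from the file size.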

Once the data are read, a map object should first be created using the basemap module.
In [289]: from mpl_toolkits.basemap import Basemap
          _map = Basemap(projection='cyl',
                         llcrnrlon=lonmin,
                         urcrnrlon=lonmax,
                         llcrnrlat=latmin,
                         urcrnrlat=latmax,
                         resolution='c')

1. Set the projection and resolution of the background map:

projection: the map projection; 'cyl' is the cylindrical equidistant projection.

resolution: specifies the resolution of the map. 'c' (crude), 'l' (low),
'i' (intermediate), 'h' (high), 'f' (full) or None can be used.

The longitude and latitude of the lower left and upper right corners can
be specified by llcrnrlon, llcrnrlat, urcrnrlon and urcrnrlat:

llcrnrlon: LONgitude of the Lower Left hand CoRNeR of the desired map.

llcrnrlat: LATitude of the Lower Left hand CoRNeR of the desired map.

urcrnrlon: LONgitude of the Upper Right hand CoRNeR of the desired map.

urcrnrlat: LATitude of the Upper Right hand CoRNeR of the desired map.

In the current case, the latitudes and longitudes of the lower left and upper right
corners of the map are set to the following values:
In [290]: latmin = -90
          lonmin = -180
          latmax = 90
          lonmax = 180

2. To draw coastlines, country boundaries and rivers:


In [291]: _map.drawcoastlines(color='k', linewidth=0.8)

draws coastlines in black with a line width of 0.8.


In [292]: _map.drawcountries(color='brown', linewidth=0.3)

draws country boundaries.


In [293]: _map.drawrivers(color='navy', linewidth=0.3)

draws major rivers.

3. To add longitude and latitude labels:


In [294]: latint = 30
          lonint = 30
          parallels = np.arange(latmin + latint, latmax, latint)
          _map.drawparallels(parallels, labels=[1, 1, 0, 0], dashes=[1, 3],
                             linewidth=.5, color='gray', fontsize=3.33, xoffset=13)
          meridians = np.arange(lonmin + lonint, lonmax, lonint)
          _map.drawmeridians(meridians, labels=[1, 1, 1, 0], dashes=[1, 3],
                             linewidth=.5, color='gray', fontsize=3.33, yoffset=13)

arange: Defines the arrays of latitudes (parallels) and longitudes (meridians)
to be plotted over the map. In the above example, the parallels are drawn from
60°S to 60°N every 30°, and the meridians from 150°W to 150°E every 30°, because
arange starts at latmin + latint (lonmin + lonint) and excludes the end value.

color: Color of parallels (meridians).

linewidth: Width of parallels (meridians). If you want to draw only the axis
labels and do not want to draw parallels (meridians) on the map, linewidth should
be 0.

labels: List of 4 values (default [0,0,0,0]) that control whether parallels
are labelled where they intersect the left, right, top or bottom of the plot. For
example, labels=[1,0,0,1] will cause parallels to be labelled where they intersect the
left and bottom of the plot, but not the right and top.

xoffset: Distance of latitude labels from the vertical axis.

yoffset: Distance of longitude labels from the horizontal axis.
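
Because the parallels and meridians come from np.arange, it is worth checking which values are actually generated before drawing them. The following sketch only prints the arrays; the variable names ending in _with_ends are invented to show one way of including the end points, and they are not part of the example above.

```python
import numpy as np

latmin, latmax, latint = -90, 90, 30
lonmin, lonmax, lonint = -180, 180, 30

# as in the example: start one step inside the map, stop before the end value
parallels = np.arange(latmin + latint, latmax, latint)
meridians = np.arange(lonmin + lonint, lonmax, lonint)
print(parallels)   # -60 ... 60
print(meridians)   # -150 ... 150

# to include the map edges, start at the minimum and extend the stop by one step
parallels_with_ends = np.arange(latmin, latmax + latint, latint)
meridians_with_ends = np.arange(lonmin, lonmax + lonint, lonint)
```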

In the example program, the lines and ticks around the map are also removed by:
In [295]: import plotTools as pt
          pt.rem_axLine(['right', 'bottom', 'left', 'top'])
          pt.rem_ticks()

Now the data are plotted over the map object as:
In [296]: from matplotlib import pyplot as plt
          fig = plt.figure(figsize=(9, 7))
          ax1 = plt.subplot(211)
          _map.imshow(np.ma.masked_less(data.mean(0), 0.), cmap=plt.cm.jet,
                      interpolation='none', origin='upper', vmin=0, vmax=200)
          plt.colorbar(orientation='vertical', shrink=0.5)
          ax2 = plt.axes([0.18, 0.1, 0.45, 0.4])
          data_gm = np.array([np.ma.masked_less(_data, 0).mean() for _data in data])
          plt.plot(data_gm)
          data_gm_msc = data_gm.reshape(-1, 12).mean(0)
          pt.rem_axLine()
          ax3 = plt.axes([0.72, 0.1, 0.13, 0.4])
          plt.plot(data_gm_msc)
          pt.rem_axLine()
          plt.show()

A subplot can be combined with axes in a figure. In this case, the global mean
of runoff and its mean seasonal cycle are plotted in the axes ax2 and ax3, respectively.
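
The masking and the reshape(-1, 12) step can be followed on a tiny synthetic array. In the sketch below the grid size, the fill value of -999 and the "land" values are all invented for illustration. Like the example above, the spatial mean is unweighted, which ignores the fact that grid cells on a regular latitude-longitude grid cover less area towards the poles.

```python
import numpy as np

ntime, nlat, nlon = 24, 4, 8               # two years of monthly fields
data = np.full((ntime, nlat, nlon), -999.0)  # negative fill value marks missing cells
month = np.arange(ntime) % 12
# valid cells in the upper half carry the month index 0..11 as their value
data[:, :2, :] = month[:, None, None]

# spatial mean of each time step, ignoring the negative fill values
data_gm = np.array([np.ma.masked_less(field, 0).mean() for field in data])

# rows = years, columns = months; averaging over rows gives the mean seasonal cycle
data_gm_msc = data_gm.reshape(-1, 12).mean(0)
print(data_gm_msc)
```

The reshape only works when the number of time steps is a multiple of 12 and the series starts in the first month of the cycle.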

7.6.2. Customizing a Colorbar


• To specify the orientation of the colorbar:

In [297]: plt.colorbar()

the default orientation is a vertical colorbar on the right side of the main plot.


In [298]: plt.colorbar(orientation='horizontal')

will make a horizontal colorbar below the main plot.

• To specify the fraction of the total plot area occupied by the colorbar:


In [299]: plt.colorbar()

the default fraction is 0.15 (see Fig. ??).


In [300]: plt.colorbar(fraction=0.5)

50% of the plot area is used by the colorbar (see Fig. ??).

• To specify the ratio of length to width of the colorbar:


In [301]: plt.colorbar(aspect=20)

length:width = 20:1.
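
The colorbar options listed above can be combined in a single call. The sketch below is an illustration on a small synthetic image rather than on the runoff map; it passes the image explicitly to colorbar and uses the Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
im = ax.imshow(np.arange(12.0).reshape(3, 4), cmap='viridis', vmin=0, vmax=11)

# horizontal colorbar using 15% of the axes, length:width = 20:1,
# shrunk to half of its default length
cbar = fig.colorbar(im, ax=ax, orientation='horizontal',
                    fraction=0.15, aspect=20, shrink=0.5)
cbar.set_label('synthetic values')
```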

Various other colormaps are available in matplotlib. Fig. 7.1 shows some commonly
used colormaps and their names. More details of the options for colorbar can be
found here.

Figure 7.1: Some commonly used colormaps

For a list of all the colormaps available in python, click here.
