Python Scripting With Spatial Data Processing
Aberystwyth University
Institute of Geography and Earth Sciences.
Acknowledgements
The authors would like to acknowledge the support of others but specifically
(and in no particular order) Prof. Richard Lucas, Sam Gillingham (developer of
RIOS and TuiView) and Neil Flood (developer of RIOS) for their support and
time.
Authors
Peter Bunting
Dr Pete Bunting joined the Institute of Geography and Earth Sciences (IGES),
Aberystwyth University, in September 2004 for his Ph.D. and, upon completion
in the summer of 2007, received a lectureship in remote sensing and GIS. Prior
to joining the department, Peter received a BEng(Hons) in software engineering
from the department of Computer Science at Aberystwyth University. Pete also
spent a year working for Landcare Research in New Zealand before rejoining IGES
in 2012 as a senior lecturer in remote sensing.
Contact Details
EMail: pfb@aber.ac.uk
Senior Lecturer in Remote Sensing
Institute of Geography and Earth Sciences
Aberystwyth University
Aberystwyth
Ceredigion
SY23 3DB
United Kingdom
Daniel Clewley
Dr Dan Clewley joined IGES in 2006 undertaking an MSc in Remote Sensing
and GIS. Following his MSc, Dan undertook a Ph.D. entitled Retrieval of Forest
Biomass and Structure from Radar Data using Backscatter Modelling and
Inversion under the supervision of Prof. Lucas, Dr. Bunting and Prof. Mahta
Moghaddam. Prior to joining the department Dan completed his BSc(Hons) in
Physics within Aberystwyth University. Dan is currently an Airborne Remote
Sensing Data Analyst at Plymouth Marine Laboratory. He writes a blog on open
source software in GIS and Remote Sensing
(http://spectraldifferences.wordpress.com/).
Contact Details
Email: daniel.clewley@gmail.com
Table of Contents
1 Introduction
  1.1 Background
    1.1.1 What is Python?
    1.1.3 A word of warning
    1.2.1 Software in Python
  1.3 Python Libraries
  1.4 Installing Python
  1.5 Text Editors
    1.5.1 Windows
    1.5.2 Linux
    1.5.3 Mac OS X
  1.6 Starting Python
    1.6.1 Indentation
    1.6.2 Keywords
    1.6.3 File Naming
    1.6.4 Case Sensitivity
    1.6.7 Getting Help
  1.7 Further Reading
2 The Basics
  2.2 Comments
  2.3 Variables
    2.3.1 Numbers
    2.3.2 Boolean
    2.3.3 Text (Strings)
  2.4 Lists
    2.4.1 List Examples
    2.4.2 n-dimensional list
  2.5 IF-ELSE Statements
    2.5.1 Logic Statements
  2.6 Looping
    2.6.1 while Loop
    2.6.2 for Loop
  2.7 Exercises
  2.8 Further Reading
3 Text Processing
  3.3 Programming Styles
  3.5 Exercise
  3.6 Further Reading

  4.1 Introduction
  4.2 Recursion
  4.4 Exercises
  4.5 Further Reading
5 Plotting - Matplotlib
  5.1 Introduction
  5.2 Simple Script
  5.3 Bar Chart
  5.4 Pie Chart
  5.5 Scatter Plot
  5.6 Line Plot
  5.7 Exercise
  5.8 Further Reading

  6.1 Introduction
  6.2 Simple Statistics
    6.2.1 Exercises
  6.3 Calculate Biomass
    6.3.1 Exercise
  6.4 Linear Fitting
    6.4.1 Exercise
  6.5 Further Reading

  7.1 Introduction
  7.4 Exercises
  7.5 Further Reading

    8.1.3 No Data Values
    8.1.4 Band Name
  8.5 Exercises

  9.5 Exercises
Chapter 1
Introduction
1.1
Background
1.1.1
What is Python?
1.1.2
Python can be used for almost any task from simple file operations and text
manipulation to image processing. It may also be used to extend the functionality
of other, larger applications.
1.1.3
A word of warning
There are a number of different versions of Python and these are not always
compatible. For these worksheets we will be using version 3.X (at the time of
writing the latest version is 3.3.0). With the exception of the quiz in Chapter 2,
where raw_input must be used instead of input, the examples will also work with
Python 2.7. One of the most noticeable differences between Python 2 and Python 3
is that the print statement is now a function. So whilst:
print "Hello World"
will work under Python 2, scripts using it won't run under Python 3 and must
use:
print("Hello World")
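If you are unsure which version of Python a script is running under, the standard sys module reports it. The following is a minimal sketch; the function name describe_interpreter is an illustrative assumption and not part of the course material.

```python
import sys

def describe_interpreter():
    """Return a short description of the running Python version."""
    major = sys.version_info[0]
    minor = sys.version_info[1]
    return "Python {0}.{1}".format(major, minor)

# True when the script is being run by a Python 3 interpreter.
is_python3 = sys.version_info[0] >= 3

print(describe_interpreter())
print("Running under Python 3: " + str(is_python3))
```

Checking the version at the top of a script is one way to give a clear message when it is run with an interpreter it was not written for.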
1.2
1.2.1
Many applications have been built in Python and a quick search of the web
will reveal the extent of this range. Commonly, applications solely developed
in Python are web applications, run from within a web server (e.g., Apache;
http://httpd.apache.org with http://www.modpython.org), but desktop
applications and data processing software such as TuiView
(https://bitbucket.org/chchrsc/tuiview) and RIOS
(https://bitbucket.org/chchrsc/rios) have also been developed.
In large standalone applications Python is often used to facilitate the development
of plugins or extensions to the application. Examples of Python used in this form
include ArcMap and SPSS.
For a list of applications supporting or written in Python refer to the following
website: http://en.wikipedia.org/wiki/Python_software.
1.3
Python Libraries
1.4
Installing Python
1.5
Text Editors
To write your Python scripts a text editor is required. A simple text editor such
as Microsoft's Notepad will do but it is recommended that you use a syntax aware
editor that will colour, and in some cases format, your code automatically. There
are many text editors available for each operating system and it is up to you
to choose one to use. The recommended editor for this course is Spyder, which is
installed with Anaconda. From within Spyder you can directly run your Python
scripts (using the run button). Additionally, it will alert you to errors within
your scripts before you run them.
1.5.1
Windows
1.5.2
Linux
Under Linux either the command line editor ne (nice editor), vim or its graphical
interface equivalent gvim is recommended, but kdeveloper, gedit and many others
are also good choices.
1.5.3
Mac OS X
1.5.4
If you are writing your scripts on Windows and transferring them to a UNIX/Linux
machine to be executed (e.g., a High Performance Computing (HPC) environment)
then you need to be careful with the line endings (the invisible symbols defining
the end of a line within a file) as these differ between operating systems. Using
Notepad++ the line ending can be set to UNIX and this is recommended where
scripts are being composed under Windows.
Alternatively, if RSGISLib is installed then the command flip can be used to
convert the line endings; the example below converts to UNIX line endings.
flip -u InputFile.py
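The same conversion can also be done from Python itself. The sketch below is an illustration only and is not the flip tool: the function names to_unix_line_endings and convert_file are assumptions made for this example. It reads the file as bytes so that Python does not translate the line endings automatically.

```python
def to_unix_line_endings(data):
    """Return a copy of the bytes with Windows (CRLF) endings replaced by UNIX (LF)."""
    return data.replace(b"\r\n", b"\n")

def convert_file(path):
    """Rewrite the file at 'path' in place using UNIX line endings."""
    with open(path, "rb") as in_file:
        data = in_file.read()
    with open(path, "wb") as out_file:
        out_file.write(to_unix_line_endings(data))

# Demonstration on an in-memory example rather than a real file.
windows_text = b"print('Hello World')\r\nprint('Goodbye')\r\n"
unix_text = to_unix_line_endings(windows_text)
print(unix_text)
```

Calling convert_file("InputFile.py") would apply the same replacement to a script on disk.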
1.6
Starting Python
This opens Python in interactive mode. It is possible to perform some basic
maths; try:
>>> 1 + 1
2
To exit type:
>>> exit()
1.6.1
Indentation
There are several basic rules and pieces of syntax which you need to know to
develop scripts within Python. The first of these is code layout. To provide the
structure of the script Python uses indentation. Indentation can be in the form of
tabs or spaces but whichever is used needs to be consistent throughout the script.
The most common, and recommended, approach is to use 4 spaces for each level of
indentation. The example given below shows an if-else statement where the
statement executed when the if condition is true is indented from the rest of the
script, as is the statement in the corresponding else part. You will see this
indentation as you go through the examples and it is important that you follow
the indentation shown in the examples or your scripts will not execute.
if x == 1:
    x = x + 1
else:
    x = x - 1
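To see the effect of indentation in a script that can be run, the if-else statement above can be wrapped in a function. This is a minimal sketch; the function name adjust is an assumption made for the example.

```python
def adjust(x):
    # The body of the function is indented by 4 spaces.
    if x == 1:
        # Indented a further 4 spaces: only runs when x equals 1.
        x = x + 1
    else:
        # Only runs when x is not equal to 1.
        x = x - 1
    # Back at the function's indentation level: always runs.
    return x

print(adjust(1))
print(adjust(5))
```

Removing the indentation from any of the lines inside the if or else parts would result in an IndentationError when the script is run.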
1.6.2
Keywords
As with all scripting and programming languages python has a set of keywords,
which have special meanings to the compiler or interpreter when the code is executed. As with all python code, these keywords are case sensitive i.e., else is a
1.6.3
File Naming
It is important that you use sensible and identifiable names for all the files you
generate throughout these tutorial worksheets, otherwise you will not be able to
identify the script at a later date. Additionally, it is highly recommended that you
do not include spaces in file names or in the directory path you use to store the
files generated during this tutorial.
1.6.4
Case Sensitivity
Something else to remember when using Python is that the language is case
sensitive; therefore, if a name is in lowercase then it needs to remain in lowercase
everywhere it is used.
For example:
VariableName is not the same as variablename
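A short sketch of what this means in practice is given below. The values assigned are arbitrary and chosen only for illustration.

```python
# Two different variables whose names differ only by case.
VariableName = "upper"
variablename = "lower"

# Each name refers to its own value.
names_differ = VariableName != variablename
print(names_differ)

# Using a third spelling that was never defined raises a NameError.
try:
    print(Variablename)
except NameError as err:
    message = str(err)
    print(message)
```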
1.6.5
In the examples provided (in the text) file paths are given as
./PythonCourse/TutorialX/File.xxx.
When writing these scripts out for yourself you will need to update these paths to
the location on your machine where the files are located (e.g., /home/pete.bunting
or C:\). Please note that it is recommended that you do not have any spaces within
your file paths. In the example (answer) scripts provided no file path has been
written and you will therefore need to either save input and output files in the
same directory as the script or provide the path to the file. Please note that under
Windows you need to insert a double backslash (i.e., \\) within the file path, as a
single backslash is an escape character (e.g., \n for new line) within strings.
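The sketch below illustrates two ways of writing a Windows path in a Python string, and the use of os.path.join to build a path without typing the separators by hand. The path itself is an illustrative assumption based on the pattern used in the text.

```python
import os

# Each backslash is doubled so that it is not treated as an escape character.
escaped = "C:\\PythonCourse\\TutorialX\\File.xxx"

# A raw string (prefixed with r) leaves backslashes untouched.
raw = r"C:\PythonCourse\TutorialX\File.xxx"

# Both forms produce exactly the same string.
same_path = escaped == raw
print(escaped)
print(same_path)

# os.path.join inserts the separator appropriate to the operating system.
joined = os.path.join("PythonCourse", "TutorialX", "File.xxx")
print(joined)
```

Using os.path.join is one way of keeping a script portable between Windows and UNIX/Linux machines.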
1.6.6
There is a significant step to be made from working your way through notes and
examples, such as those provided in this tutorial, and independently developing
your own scripts from scratch. Our recommendation for this, and when undertaking the exercises from this tutorial, is to take it slowly and think through the steps
you need to undertake to perform the operation(s) you need.
I would commonly first write the script using comments or on paper breaking the
process down into the major steps required. For example, if I were asked to write
a script to uncompress a directory of files into another directory I might write the
following outline, where I use indentation to indicate where a process is part of
the parent:
# Get output directory (where the files, once uncompressed, will be placed).
By writing the process out in this form it makes translating this into python much
simpler as you only need to think of how to do small individual elements in python
and not how to do the whole process in one step.
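As an illustration of how an outline of this kind might translate into Python, the sketch below keeps each step as a comment and places the code for that step beneath it. It is only one possible approach and makes several assumptions not stated in the text: the files are gzip compressed with a .gz extension, and the function name uncompress_directory is hypothetical.

```python
import gzip
import os
import shutil

def uncompress_directory(input_dir, output_dir):
    """Uncompress every .gz file in input_dir into output_dir."""
    # Get output directory (create it if it does not already exist).
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    written = []
    # Iterate through the files in the input directory.
    for file_name in sorted(os.listdir(input_dir)):
        # Skip anything which is not a gzip file.
        if not file_name.endswith(".gz"):
            continue
        # Build the input and output paths (output drops the .gz extension).
        in_path = os.path.join(input_dir, file_name)
        out_path = os.path.join(output_dir, file_name[:-3])
        # Uncompress the file into the output directory.
        with gzip.open(in_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    # Return the list of files which were created.
    return written
```

Each comment corresponds to a small task which can be written and tested on its own before moving on to the next.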
1.6.7
Getting Help
Python provides a very useful help system through the command line. To get
access to the help, run python from the terminal:
> python
To exit the help system just press the q key on the keyboard.
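The text shown by the help system comes from the documentation strings stored with each function. As a small sketch, the documentation for the string function strip() can be printed directly from a script, which gives much the same information as typing help(str.strip) at the >>> prompt.

```python
# Every documented function stores its help text in the __doc__ attribute.
doc_text = str.strip.__doc__
print(doc_text)

# At the interactive prompt the equivalent would be:
# >>> help(str.strip)
```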
1.7
Further Reading
Chapter 2
The Basics
2.1
To create your first python script, create a new text file using your preferred text
editor and enter the text below:
#! /usr/bin/env python

#######################################
# A simple Hello World Script
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

print("Hello World")
Save your script to file (e.g., helloworld.py) and then run it either using a command
prompt (Windows) or Terminal (UNIX), using the following command:
> python helloworld.py
Hello World
To get a command prompt under Windows type cmd in the run dialog box
in the start menu (Start, then Run); further hints for using the command prompt
are given below. Under OS X, Terminal is located within the Utilities folder in
Applications. If you are using Spyder to create your Python scripts you can run
them by clicking the run button.
Hints for using the Windows command line
cd allows you to change directory, e.g.,
cd directory1\directory2
dir allows you to list the contents of a directory, e.g.,
dir
To change drives, type the drive letter followed by a colon, e.g.,
D:
If a file path has spaces, you need to use quotes, e.g., to change directory:
cd "Directory with spaces in name\another directory\"
2.2
Comments
In the above script there is a heading detailing the script function, author, and
version. These lines are preceded by a hash (#), which tells the interpreter they
are comments and are not part of the code. Any line starting with a hash is a
comment. Comments are used to annotate the code and all examples in this tutorial
use comments to describe the code. It is recommended you use comments in your
own code.
2.3
Variables
The key building blocks within all programming languages are variables. Variables
allow data to be stored either temporarily for use in a single operation or
throughout the whole program (global variables). Within Python the variable data
type does not need to be specified and is defined by the value assigned. Therefore,
if the first assignment to a variable is an integer (i.e., whole number) then that
variable will be an integer until a value of a different type is assigned to it.
Examples defining variables are provided below:
name = "Pete" # String
age = 25 # Integer
height = 6.2 # Float
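The built-in type() function reports the data type a variable currently holds. The sketch below reuses the variables defined above and also shows that, in practice, the type follows the most recent assignment.

```python
name = "Pete"   # String
age = 25        # Integer
height = 6.2    # Float

# Print the name of the data type held by each variable.
print(type(name).__name__)
print(type(age).__name__)
print(type(height).__name__)

# Assigning a floating point value changes the type held by 'age'.
age = age + 0.5
age_type = type(age).__name__
print(age_type)
```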
2.3.1
Numbers
2.3.2
Boolean
The boolean data type is the simplest and just stores a true or false value
(written True and False in Python). An example of the syntax is given below:
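A minimal sketch of boolean variables follows; the variable names are illustrative assumptions and are not taken from the course scripts.

```python
# Boolean values are written with a capital first letter.
is_spaceborne = True
is_airborne = False

print(is_spaceborne)
print(type(is_spaceborne).__name__)

# Boolean values can be combined using 'and', 'or' and 'not'.
both = is_spaceborne and is_airborne
either = is_spaceborne or is_airborne
print(both)
print(either)
```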
2.3.3
Text (Strings)
To store text the string data type is used. Although not a base data type like
a float or int, a string can be used in the same way. The difference is that a
string behaves like an object, with a set of functions (methods) available to
manipulate it. A comprehensive list of the functions available for a string is
given in the Python documentation http://docs.python.org/lib/string-methods.html.
The example below imports the string module, although the functions it uses are
called directly on the string variable and, in Python 3, no import is needed for
them. Copy this example out and save it as StringExamples.py. When you run this
script observe the change in the printed output and use the Python documentation
to identify what each of the functions lstrip(), rstrip() and strip() does.
#! /usr/bin/env python

#######################################
# Example with strings
#######################################

import string

# A string with whitespace before and after the text
stringVariable = "    Hello World    "

print("\"" + stringVariable + "\"")

stringVariable_lstrip = stringVariable.lstrip()
print("lstrip: \"" + stringVariable_lstrip + "\"")

stringVariable_rstrip = stringVariable.rstrip()
print("rstrip: \"" + stringVariable_rstrip + "\"")

stringVariable_strip = stringVariable.strip()
print("strip: \"" + stringVariable_strip + "\"")
2.3.4
An example script illustrating the use of variables is provided below. It is
recommended you copy this script and execute it, making sure you understand each
line. In addition, try making the following changes to the script:
1. Adding your own questions.
2. Including the person's name within the questions.
3. Removing the negative marking.
#! /usr/bin/env python

#######################################
# A simple script illustrating the use of
# variables.
# Author: <YOUR NAME>
#######################################

# NOTE: the definition of getInput() and the initial value of score
# are not shown in this listing; the minimal versions below are
# assumptions so that the questions can be run.
def getInput(question):
    # Under python 2.7 use raw_input(question) instead of input(question)
    return input(question)

score = 0

print("Question 1:")
answer = getInput("ALOS PALSAR is a L band spaceborne SAR.\n")
if answer == "y": # test whether the value returned was equal to y
    print("Well done")
    score = score + 1 # Add 1 to the score
else: # if not then the answer must be incorrect
    print("Bad Luck")
    score = score - 1 # Remove 1 from the score

print("Question 2:")
answer = getInput("CASI provides hyperspectral data in \
the Blue to NIR part of the spectrum.\n")
if answer == "y":
    print("Well done")
    score = score + 1
else:
    print("Bad Luck")
    score = score - 1

print("Question 3:")
answer = getInput("HyMap also only provides data in the \
Blue to NIR part of the spectrum.\n")
if answer == "y":
    print("Bad Luck")
    score = score - 1
else:
    print("Well done")
    score = score + 1

print("Question 4:")
answer = getInput("Landsat is a spaceborne sensor.\n")
if answer == "y":
    print("Well done")
    score = score + 1
else:
    print("Bad Luck")
    score = score - 1

print("Question 5:")
answer = getInput("ADS-40 is a high resolution aerial \
sensor capturing RGB-NIR wavelengths.\n")
if answer == "y":
    print("Well done")
    score = score + 1
else:
    print("Bad Luck")
    score = score - 1

print("Question 6:")
answer = getInput("eCognition is an object oriented \
image analysis software package.\n")
if answer == "y":
    print("Well done")
    score = score + 1
else:
    print("Bad Luck")
    score = score - 1

print("Question 7:")
answer = getInput("Adobe Photoshop provides the same \
functionality as eCognition.\n")
if answer == "y":
    print("Bad Luck")
    score = score - 1
else:
    print("Well done")
    score = score + 1

print("Question 8:")
answer = getInput("Python can be executed within \
a java virtual machine.\n")
if answer == "y":
    print("Well done")
    score = score + 1
else:
    print("Bad Luck")
    score = score - 1

print("Question 9:")
answer = getInput("Python is a scripting language \
not a programming language.\n")
if answer == "y":
    print("Well done")
    score = score + 1
else:
    print("Bad Luck")
    score = score - 1

print("Question 10:")
answer = getInput("Aberystwyth is within Mid Wales.\n")
if answer == "y":
    print("Well done")
    score = score + 1
else:
    print("Bad Luck")
    score = score - 1
2.4
Lists
Each of the data types outlined above only stores a single value at any one time;
to store multiple values in a single variable a sequence data type is required.
Python offers the List class, which allows any data type to be stored in a
sequence and even supports the storage of objects of different types within one
list. The string data type is a sequence data type and therefore the same
operations are available.
Lists are very flexible structures and support a number of ways to create, append
and remove content from the list, as shown below. Items in the list are numbered
consecutively from 0 to n, where n is one less than the length of the list.
Additional functions are available for List data types (e.g., len(aList),
aList.sort(), aList.reverse()) and these are described in
http://docs.python.org/lib/typesseq.html and
http://docs.python.org/lib/typesseq-mutable.html.
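The operations mentioned above can be tried with a short sketch such as the one below. The list contents are arbitrary values chosen for illustration.

```python
# Create a list and append an item to the end.
names = ["Pete", "Dan"]
names.append("Richard")

# Items are numbered from 0, so the last index is len(names) - 1.
first = names[0]
last = names[len(names) - 1]
print(first)
print(last)

# Remove an item by its value.
names.remove("Dan")
print(names)

# Sort a list in place, then reverse the order.
numbers = [3, 1, 2]
numbers.sort()
numbers.reverse()
print(numbers)

# A single list can hold values of different types.
mixed = [1, "two", 3.0]
print(len(mixed))
```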
2.4.1
List Examples
#! /usr/bin/env python

#######################################
# Example with lists
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

# Create List:
aList = list()
anotherList = [1, 2, 3, 4]
emptyList = []

print(aList)
print(anotherList)
print(emptyList)

# Assumed step (the intermediate lines of this listing are not shown):
# add an item to the list so that it can be removed below.
aList.append("Pete")

aList.remove("Pete")
print(aList)
2.4.2
n-dimensional list
#! /usr/bin/env python

#######################################
# Example with n-lists
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

# Create List:
aList = [
[1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[1,1,0,0,1,1,1,1,1,0,0,1,1,1],
[1,1,0,0,1,1,1,1,1,0,0,1,1,1],
[1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,0,1,1,1,1,1,1,1],
[1,1,1,1,1,1,0,1,1,1,1,1,1,1],
[1,1,1,1,1,0,0,0,1,1,1,1,1,1],
[1,0,1,1,1,1,1,1,1,1,1,1,0,1],
[1,0,1,1,1,1,1,1,1,1,1,1,0,1],
[1,1,0,0,0,0,0,0,0,0,0,0,1,1],
[1,1,1,1,1,1,1,1,1,1,1,1,1,1]
]

print(aList)
2.5
IF-ELSE Statements
As already illustrated in the earlier quiz example, the ability to make a decision
is key to any software. The basic construct for decision making in most
programming and scripting languages is the if-else statement. Python uses the
following syntax for if-else statements.
if <logic statement>:
    do this if true
else:
    do this

if <logic statement>:
    do this if true
elif <logic statement>:
    do this if true
elif <logic statement>:
    do this if true
else:
    do this
Logic statements return a true or false value. If true is returned the contents
of the if statement are executed and the remaining parts of the statement are
ignored. If false is returned then the if part of the statement is ignored and
the next logic statement is evaluated, until either one returns a true value or
an else statement is reached.
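The following sketch shows the if, elif and else parts working together. The function name classify_height and the height thresholds are hypothetical values chosen purely to illustrate the syntax.

```python
def classify_height(height_m):
    """Return a simple vegetation class for a height in metres."""
    if height_m < 2:
        # Only reached when the first logic statement is true.
        return "shrub"
    elif height_m < 10:
        # Only evaluated when the first logic statement was false.
        return "small tree"
    else:
        # Reached when none of the logic statements were true.
        return "tall tree"

print(classify_height(1.5))
print(classify_height(6))
print(classify_height(25))
```

Only one branch is ever executed for a given input, which is why the order of the logic statements matters.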
2.5.1
Logic Statements
Table 2.2 outlines the main logic statements used within Python. In addition to
these statements, functions which return a boolean value can also be used for
decision making, although these will be described in later worksheets.
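As a brief sketch, the comparison and boolean operators commonly used in logic statements are evaluated below and stored in a dictionary so the results can be printed together. The values of a and b are arbitrary.

```python
a = 5
b = 10

# Each entry pairs a logic statement (as text) with the value it returns.
results = {
    "a == b": a == b,
    "a != b": a != b,
    "a < b": a < b,
    "a >= b": a >= b,
    "(a < b) and (b < 20)": (a < b) and (b < 20),
    "(a > b) or (b == 10)": (a > b) or (b == 10),
    "not (a == b)": not (a == b),
}

for statement in sorted(results):
    print(statement + " : " + str(results[statement]))

# A function returning a boolean value can be used in the same way.
is_word = "pete".isalpha()
print(is_word)
```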
2.6
Looping
In addition to the if-else statements for decision making, loops provide another
key component of any program or script. Python offers two forms of loop,
while and for. The two can often be used interchangeably, depending on the
developer's preference and the information available. Both types are outlined below.
2.6.1
while Loop
The basic syntax of the while loop is very simple (shown below): a logic
statement controls the loop, which terminates when the statement returns false.
while <logic statement>:
    statements
Therefore, during the loop a variable in the logic statement needs to be altered
to allow the loop to terminate. An example of a while loop that counts from
0 to 10 is given below.
#! /usr/bin/env python

#######################################
# A simple example of a while loop
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

count = 0
while count <= 10:
    print(count)
    count = count + 1
2.6.2
for Loop
A for loop provides similar functionality to that of a while loop but it provides the
counter for termination. The syntax of the for loop is provided below:
for <variable> in <list>:
    statements
The common application of a for loop is the iteration of a list and an example
of this is given below:
#! /usr/bin/env python

#######################################
# A simple example of a for loop
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################
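A minimal sketch of a for loop iterating over a list is given here. The list contents (sensor names mentioned in the quiz) and the variable names are assumptions made for illustration.

```python
sensors = ["Landsat", "CASI", "HyMap"]

# The loop variable takes each item of the list in turn.
for sensor in sensors:
    print(sensor)

# range() provides a sequence of numbers, here 0, 1, 2, 3 and 4.
total = 0
for i in range(5):
    total = total + i
print(total)
```

Unlike the while loop, there is no counter to update by hand; the loop ends once the last item has been used.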
A more advanced example is given below where two for loops are used to iterate
through a list of lists.
#! /usr/bin/env python

#######################################
# Example with for loop and n-lists
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

# Create List:
aList = [
[1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[1,1,0,0,1,1,1,1,1,0,0,1,1,1],
[1,1,0,0,1,1,1,1,1,0,0,1,1,1],
[1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,0,1,1,1,1,1,1,1],
[1,1,1,1,1,1,0,1,1,1,1,1,1,1],
[1,1,1,1,1,0,0,0,1,1,1,1,1,1],
[1,0,1,1,1,1,1,1,1,1,1,1,0,1],
[1,0,1,1,1,1,1,1,1,1,1,1,0,1],
[1,1,0,0,0,0,0,0,0,0,0,0,1,1],
[1,1,1,1,1,1,1,1,1,1,1,1,1,1]
]
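One way the two loops might be arranged is sketched below on a smaller list of lists. The variable names and the choice to count zeros and print each row as text are assumptions made for this illustration.

```python
grid = [
    [1, 1, 0],
    [0, 1, 1],
]

zero_count = 0
row_strings = []

# The outer loop takes each inner list (row) in turn.
for row in grid:
    line = ""
    # The inner loop takes each value within the current row.
    for value in row:
        if value == 0:
            zero_count = zero_count + 1
        line = line + str(value)
    row_strings.append(line)
    print(line)

print("Number of zeros: " + str(zero_count))
```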
2.7
Exercises
During this tutorial you should have followed through each of the examples and
experimented with the code to understand each of the components outlined. To test
your understanding of all the material, you will now be asked to complete a series
of tasks:
1. Update the quiz so the questions and answers are stored in lists which are
2.8
Further Reading
Chapter 3
Text Processing
3.1
An example of a script to read a text file is given below. Copy this example out
and use the numbers.txt file to test your script. Note that the numbers.txt file
needs to be within the same directory as your Python script.
#! /usr/bin/env python

#######################################
# A simple example reading in a text file
# two versions of the script are provided
# to illustrate that there is not just one
# correct solution to a problem.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import string

numbers = list()
dataFile = open("numbers.txt", "r")

print(numbers)
dataFile.close()

print(numbers)
dataFile.close()
As you can see reading a text file from within Python is a simple process [1]. The
first step is to open the file for reading; option r is used as the file is only
going to be read, and the other options are listed in Table 3.1. If the file is a
text file then the contents can be read a line at a time; if it is a binary file
(e.g., tiff or doc) then reading is more complicated and is not covered in this
tutorial.
[1] If your data are in tabular format (e.g., CSV) the csv module in the Python
Standard Library and the genfromtxt function from NumPy provide even simpler ways
of reading data.
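The steps described above (open the file, read it a line at a time, close it) can be sketched as follows. This is one possible approach and not necessarily the one used in the course script: the function name read_numbers is hypothetical, the file is assumed to contain comma separated whole numbers, and a small sample file is written to a temporary directory so the example can be run on its own.

```python
import os
import tempfile

def read_numbers(path):
    """Read comma separated whole numbers from a text file into a list."""
    numbers = list()
    dataFile = open(path, "r")
    # Read the file one line at a time.
    for line in dataFile:
        # Remove the new line character and split on the commas.
        for token in line.strip().split(","):
            token = token.strip()
            # Ignore empty tokens produced by trailing commas.
            if token != "":
                numbers.append(int(token))
    dataFile.close()
    return numbers

# Write a small sample file to read back in (assumed content).
sample_path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
sampleFile = open(sample_path, "w")
sampleFile.write("1,\n2,3,\n4,5\n")
sampleFile.close()

numbers = read_numbers(sample_path)
print(numbers)
```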
Table 3.1: Options when opening a file.
File Mode   Operations
r           Open for read
w           Open for write (truncate)
a           Open for write (append)
r+          Open for read/write
w+          Open for read/write (truncate)
a+          Open for read/write (append)
rb          Open for binary read
wb          Open for binary write (truncate)
ab          Open for binary write (append)
rb+         Open for binary read/write
wb+         Open for binary read/write (truncate)
ab+         Open for binary read/write (append)
Now you need to adapt one of the methods given in the script above to allow
numbers and words to be split into separate lists. To do this you will need to use
the isalpha() function alongside the isdigit() function. Adapt the numbers.txt file
to match the input shown below and then run your script; you should receive
the output shown below:
Input:
1,
2,pete,
3,
4,dan,5,
6,7,8,richard,10,11,12,13
Output:
>python simplereadsplit.py
[1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13]
['pete', 'dan', 'richard']
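Before attempting the exercise it may help to see what the two functions return for individual pieces of text. The sketch below only demonstrates isalpha() and isdigit() on a few sample values; it is not a solution to the exercise, and the sample tokens are assumptions.

```python
tokens = ["4", "dan", "5", "richard", "10"]

# Store each token with the result of isdigit() and isalpha().
flags = []
for token in tokens:
    flags.append((token, token.isdigit(), token.isalpha()))
    print(token + " isdigit: " + str(token.isdigit())
          + " isalpha: " + str(token.isalpha()))

# isdigit() only accepts digit characters, so a decimal point
# or a minus sign causes it to return False.
decimal_is_digit = "3.5".isdigit()
negative_is_digit = "-1".isdigit()
print(decimal_is_digit)
print(negative_is_digit)
```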
3.2
Writing to a text file is similar to reading from the file. When opening the file
two choices are available either to append or truncate the file. Appending to the
file leaves any content already within the file untouched while truncating the file
removes any content already within the file. An example of writing a list to a file
with each list item on a new line is given below.
#! /usr/bin/env python

#######################################
# A simple script parsing numbers of
# words from a comma separated text file
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

dataFile = open("writetest.txt", "w")

dataFile.close()
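The difference between truncating and appending can be seen in the sketch below, which writes a list to a file with one item per line and then appends a further line. The list contents are illustrative assumptions, and the file is written to a temporary directory so that no existing file is overwritten.

```python
import os
import tempfile

out_path = os.path.join(tempfile.mkdtemp(), "writetest.txt")
items = ["Landsat", "CASI", "HyMap"]

# Mode "w" truncates: any existing content in the file is removed.
dataFile = open(out_path, "w")
for item in items:
    # write() does not add a new line, so one is added explicitly.
    dataFile.write(str(item) + "\n")
dataFile.close()

# Mode "a" appends: the existing content is left untouched.
dataFile = open(out_path, "a")
dataFile.write("ADS-40\n")
dataFile.close()

# Read the file back in to check what was written.
dataFile = open(out_path, "r")
lines = dataFile.read().splitlines()
dataFile.close()
print(lines)
```

Opening the file a second time with "w" instead of "a" would have left only the final line in the file.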
3.3
Programming Styles
There are two main programming styles, both of which are supported by python,
and these are procedural and object oriented programming. Procedural programming preceded object oriented programming and procedural scripts provide lists
of commands which are run through sequentially.
3.3.1
When creating a procedural python script each of your files will have the same
basic format outlined below:
#! /usr/bin/env python

#######################################
# Comment explaining script's purpose
# Author: <Author Name>
# Email: <Author's Email>
# Date: <Date Last Edited>
# Version: <Version Number>
#######################################

# IMPORTS
# e.g., import os

# SCRIPT
print("Hello World")

# End of File
3.3.2
When creating an object oriented script each python file you create will have the
same basic format outlined below:
#! /usr/bin/env python

#######################################
# Comment explaining script's purpose
# Author: <Author Name>
# Email: <Author's Email>
# Date: <Date Last Edited>
# Version: <Version Number>
#######################################

# IMPORTS
import os

# CLASS ATTRIBUTES
name = ""

# End of File
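As a rough sketch of how the parts of an object oriented script usually fit together, a small class is shown below. The class name Greeter, its functions and the use of an initialiser are assumptions made for this illustration and are not taken from the course template.

```python
class Greeter(object):
    # CLASS ATTRIBUTES
    name = ""

    # Initialise the class (called when an object is created)
    def __init__(self, name):
        self.name = name

    # A function which does some work and returns the result
    def greeting(self):
        return "Hello " + self.name

    # A function which controls the flow of the script
    def run(self):
        print(self.greeting())

# Only executed when the file is run as a script,
# not when it is imported by another file.
if __name__ == "__main__":
    obj = Greeter("World")
    obj.run()
```

Every function within the class takes self as its first parameter, which gives it access to the attributes and other functions of the object.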
3.4
For simple scripts like those demonstrated so far, simple procedural scripts are
all that have been required. When creating more complex scripts, more structured
and reusable designs are preferable. To support this, Python provides object
oriented program design.
3.4.1
#! /usr/bin/env python

#######################################
# A Python class to parse a comma
# separated text file to calculate
# the mean and standard deviation
# of the inputted floating point
# numbers.
#######################################
NOTE: __name__ and __main__ each have TWO underscores either side (i.e., __).
Although an object oriented design has been introduced, making the above code potentially more reusable, the design does not separate the more general functionality from the application. To do this the code will be split into two files: the first, named MyMaths.py, will contain the mathematical operations calcMean and calcStdDev, while the second, named FileSummary, contains the functions run, which controls the flow of the script, and parseCommaFile(). The code for these files is given below, but first try to split the code into the two files yourself.
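As a rough guide when attempting the split, the sketch below shows one way MyMaths.py could look. The class and method names (MyMathsClass, calcMean, calcStdDev) are those used in this tutorial, but the method bodies are an assumption: the standard deviation here is the population form (dividing by N), which may differ from the version written earlier.

```python
import math


class MyMathsClass(object):
    """Hypothetical contents of MyMaths.py: general maths helpers."""

    def calcMean(self, numbers):
        # Sum the values and divide by how many there are.
        return sum(numbers) / float(len(numbers))

    def calcStdDev(self, numbers, mean):
        # Population standard deviation: square root of the mean
        # squared deviation from the supplied mean.
        sumSqDiff = 0.0
        for value in numbers:
            sumSqDiff += (value - mean) ** 2
        return math.sqrt(sumSqDiff / len(numbers))


# Small made-up list of values to exercise the two methods.
mathsObj = MyMathsClass()
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = mathsObj.calcMean(values)
stddev = mathsObj.calcStdDev(values, mean)
print('Mean: ' + str(mean))
print('Stddev: ' + str(stddev))
```

Assuming both files sit in the same directory, the second file could then make the class available with a line such as: from MyMaths import MyMathsClass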
#! /usr/bin/env python

#######################################
# A python class to hold maths operations
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################
#! /usr/bin/env python

#######################################
# A python class to parse a comma
# separated text file to calculate
# the mean and standard deviation
# of the inputted floating point
# numbers.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

        mathsObj = MyMathsClass()
        mean = mathsObj.calcMean(numbers)
        stddev = mathsObj.calcStdDev(numbers, mean)

        print('Mean: ' + str(mean))
        print('Stddev: ' + str(stddev))

if __name__ == '__main__':
    obj = FileSummary()
    obj.run('randfloats.txt')
To allow the script to be used as a command line tool, the path to the file needs to be passed into the script at runtime; the following changes are therefore made to the script:
#! /usr/bin/env python

#######################################
# A python class to parse a comma
# separated text file to calculate
# the mean and standard deviation
# of the inputted floating point
# numbers.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

    def run(self):
        # To retrieve the command line arguments
        # sys.argv[X] is used, where X refers to
        # the argument. The argument number starts
        # at 1 and is the index of a list.
        filename = sys.argv[1]
        inFile = open(filename, 'r')
        numbers = self.parseCommaFile(inFile)

        mathsObj = MyMathsClass()
        mean = mathsObj.calcMean(numbers)
        stddev = mathsObj.calcStdDev(numbers, mean)

        print('Mean: ' + str(mean))
        print('Stddev: ' + str(stddev))

if __name__ == '__main__':
    obj = FileSummary()
    obj.run()
To run the new script, the following command needs to be run from the command prompt:
python fileSummary_commandline.py randfloats.txt
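A script that reads sys.argv[1] directly stops with an IndexError if the file name is left off the command line. A minimal sketch of a more forgiving check is shown below; the helper name getInputPath is hypothetical and is not part of the tutorial's script.

```python
import sys


def getInputPath(argv):
    # argv[0] is the script name, so the first user-supplied
    # argument (if any) is argv[1].
    if len(argv) < 2:
        return None
    return argv[1]


# Simulated command lines, used here instead of the real sys.argv.
print(getInputPath(['fileSummary_commandline.py', 'randfloats.txt']))
print(getInputPath(['fileSummary_commandline.py']))

# In the real script the call would be getInputPath(sys.argv).
```

When None is returned the script can print a short usage message rather than failing part way through.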
3.5
Exercise
Calculate the mean and standard deviation from only the first column of data.
Hint:
You will need to replace:
substrs = eachLine.split(',', eachLine.count(','))
for strVar in substrs:
    floatingNumbers.append(float(strVar))
With:
substrs = eachLine.split(',', eachLine.count(','))
# Select the column the data is stored in
column1 = substrs[0]
floatingNumbers.append(float(column1))
3.6
Further Reading
Chapter 4
File System - Finding files
4.1
Introduction
A common task for which python is used is to batch process a task or series of
tasks. To do this the files to be processed need to be identified from within the file
system. Therefore, in this tutorial you will learn to implement code to undertake
this operation.
To start, type out the code below into a new file (save it as IterateFiles.py).
#! /usr/bin/env python

#######################################
# A class that iterates through a directory
# or directory structure and prints out the
# identified files.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import os.path
import sys

    def run(self):
        # Set the folder to search
        searchFolder = './PythonCourse' # Update path...
        self.findFiles(searchFolder)

if __name__ == '__main__':
    obj = IterateFiles()
    obj.run()
Using the online python documentation read through the section on the file system:
http://docs.python.org/library/filesys.html
http://docs.python.org/library/os.path.html
This documentation will allow you to understand the functionality which is available for manipulating the file system.
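The functions a non-recursive search needs are os.listdir, os.path.join, os.path.isdir and os.path.isfile. The sketch below combines them in a stand-alone function that returns the paths in a list rather than printing them. The function name listFiles and the throw-away demonstration directory are assumptions made for illustration.

```python
import os
import tempfile


def listFiles(directory):
    # Return the paths of the files directly inside 'directory'.
    foundFiles = []
    if not os.path.isdir(directory):
        # Covers both a missing path and a path that is a file.
        return foundFiles
    for filename in sorted(os.listdir(directory)):
        fullPath = os.path.join(directory, filename)
        if os.path.isfile(fullPath):
            foundFiles.append(fullPath)
    return foundFiles


# Demonstration on a throw-away directory holding one file
# and one subfolder.
demoDir = tempfile.mkdtemp()
open(os.path.join(demoDir, 'notes.txt'), 'w').close()
os.mkdir(os.path.join(demoDir, 'subfolder'))

found = listFiles(demoDir)
for path in found:
    print(path)
```

The subfolder is skipped because only entries for which os.path.isfile is true are kept.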
4.2
Recursion
The next stage is to allow the function to go recursively through the directory structure. To do this add the function below to your script above:
#! /usr/bin/env python

#######################################
# A class that iterates through a directory
# or directory structure and prints out the
# identified files.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import os.path
import sys

                    if(os.path.isdir(os.path.join(directory,filename))):
                        # If a directory is found recall this function.
                        self.findFilesRecurse(os.path.join(directory,filename))
                    elif(os.path.isfile(os.path.join(directory,filename))):
                        print(os.path.join(directory,filename))
                    else:
                        print(filename + ' is NOT a file or directory!')

            else:
                print(directory + ' is not a directory!')
        else:
            print(directory + ' does not exist!')

    def run(self):
        # Set the folder to search
        searchFolder = './PythonCourse' # Update path...
        self.findFilesRecurse(searchFolder)

if __name__ == '__main__':
    obj = IterateFiles()
    obj.run()
Now call this function instead of findFiles. Think about, and observe, what effect a function which calls itself will have on the order in which the files are found.
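To make the ordering concrete, the sketch below collects the paths in a list instead of printing them. Because the function calls itself as soon as it meets a subdirectory, everything inside that subdirectory is reported before the remaining entries of the parent (a depth-first order). The function name, the small temporary directory tree and the sorted() call, added so that the order is repeatable, are assumptions made for this illustration.

```python
import os
import tempfile


def findFilesRecurseList(directory, foundFiles):
    # Append every file below 'directory' to foundFiles.
    for filename in sorted(os.listdir(directory)):
        fullPath = os.path.join(directory, filename)
        if os.path.isdir(fullPath):
            # Descend immediately; the rest of this directory waits.
            findFilesRecurseList(fullPath, foundFiles)
        elif os.path.isfile(fullPath):
            foundFiles.append(fullPath)
    return foundFiles


# Build a small throw-away tree: a.txt, b/inner.txt, c.txt
topDir = tempfile.mkdtemp()
os.mkdir(os.path.join(topDir, 'b'))
for relPath in ['a.txt', os.path.join('b', 'inner.txt'), 'c.txt']:
    open(os.path.join(topDir, relPath), 'w').close()

order = [os.path.relpath(p, topDir) for p in findFilesRecurseList(topDir, [])]
print(order)
```

The standard library function os.walk performs a similar traversal without explicit recursion.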
4.3
The next step is to add the function checkFileExtension to your class and create two new functions which only print out the files with the file extension of interest. This should be done for both the recursive and non-recursive functions above.
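One possible implementation of checkFileExtension is sketched below, written as a plain function so it can be tried on its own (inside the class it would take self as its first argument). It relies on os.path.splitext, which splits a name into its root and extension. Treating the comparison as case-insensitive is an assumption made here rather than something the tutorial specifies.

```python
import os.path


def checkFileExtension(filename, extension):
    # os.path.splitext('data.TXT') returns ('data', '.TXT')
    fileExt = os.path.splitext(filename)[1]
    # Compare in lower case so '.TXT' and '.txt' are treated alike.
    return fileExt.lower() == extension.lower()


print(checkFileExtension('randfloats.txt', '.txt'))
print(checkFileExtension('image.tif', '.txt'))
```

Note that the extension is passed with its leading dot (for example '.txt'), matching the way run() calls findFilesExt in the listing.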
#! /usr/bin/env python

#######################################
# A class that iterates through a directory
# or directory structure and prints out the
# identified files.

import os.path
import sys
    # A function which iterates through the directory and checks file extensions
    def findFilesExtRecurse(self, directory, extension):
        # check whether the current directory exists
        if os.path.exists(directory):
            # check whether the given directory is a directory
            if os.path.isdir(directory):
                # list all the files within the directory
                dirFileList = os.listdir(directory)
                # Loop through the individual files within the directory
                for filename in dirFileList:
                    # Check whether file is directory or file
                    if(os.path.isdir(os.path.join(directory,filename))):
                        # If a directory is found recall this function,
                        # passing the extension on so the check still applies.
                        self.findFilesExtRecurse(os.path.join(directory,filename), extension)
                    elif(os.path.isfile(os.path.join(directory,filename))):
                        if(self.checkFileExtension(filename, extension)):
                            print(os.path.join(directory,filename))
                    else:
                        print(filename + ' is NOT a file or directory!')
            else:
                print(directory + ' is not a directory!')
        else:
            print(directory + ' does not exist!')
    # A function which iterates through the directory and checks file extensions
    def findFilesExt(self, directory, extension):
        # check whether the current directory exists
        if os.path.exists(directory):
            # check whether the given directory is a directory
            if os.path.isdir(directory):
                # list all the files within the directory
                dirFileList = os.listdir(directory)
                # Loop through the individual files within the directory
                for filename in dirFileList:
                    # Check whether file is directory or file
                    if(os.path.isdir(os.path.join(directory,filename))):
                        print(os.path.join(directory,filename) + \
                              ' is a directory and therefore ignored!')
                    elif(os.path.isfile(os.path.join(directory,filename))):
                        if(self.checkFileExtension(filename, extension)):
                            print(os.path.join(directory,filename))
                    else:
                        print(filename + ' is NOT a file or directory!')
            else:
                print(directory + ' is not a directory!')
        else:
            print(directory + ' does not exist!')

    def run(self):
        # Set the folder to search
        searchFolder = './PythonCourse' # Update path...
        self.findFilesExt(searchFolder, '.txt')

if __name__ == '__main__':
    obj = IterateFiles()
    obj.run()
4.4
Exercises
1. Rather than print the file paths to screen, add them to a list and return them from the function. This would be useful for applications where the files to be processed need to be known up front, and creates a more generic piece of python which can be called from other scripts.
2. Using the returned list, add code to loop through the list and print out the file information in the following comma separated format.
[FILE NAME], [EXTENSION], [PATH], [DRIVE LETTER (On Windows)], [MODIFICATION TIME]
4.5
Further Reading
Chapter 5
Plotting - Matplotlib
5.1
Introduction
Many open source libraries are available from within python. These significantly
increase the available functionality, decreasing your development time. One such
library is matplotlib (http://matplotlib.sourceforge.net), which provides a
plotting library with a similar interface to those available within Matlab. The matplotlib website provides a detailed tutorial and documentation for all the different
options available within the library but this worksheet provides some examples of
the common plot types and a more complex example continuing on from previous
examples.
5.2
Simple Script
Below is your first script using the matplotlib library. The script demonstrates the plotting of a mathematical function, in this case a sine function. The plot function requires two lists of numbers to be provided, which give the x and y locations of the points that make up the displayed function. The axes can be labelled using the xlabel() and ylabel() functions, while the title is set using the title() function. Finally, the show() function is used to reveal the interface
#! /usr/bin/env python

#######################################
# A simple python script to display a
# sine function
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################
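A minimal sketch of a sine-plotting script of the kind described above is given below. It is not the original listing: the sampling (301 points between 0 and 3), the output file name, and the use of savefig() with the non-interactive Agg backend are assumptions, chosen so that the example runs without a display. Replacing the savefig() call with plt.show() opens the interactive window instead.

```python
import math
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # draw to file; no window is needed
import matplotlib.pyplot as plt

# Two lists: the x locations (t) and the sine of each (s).
t = [i * 0.01 for i in range(301)]
s = [math.sin(2 * math.pi * value) for value in t]

plt.figure()
plt.plot(t, s)
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.title('Simple Plot')

# Write the plot into a temporary directory.
outFile = os.path.join(tempfile.mkdtemp(), 'simpleplot.pdf')
plt.savefig(outFile, dpi=200, format='pdf')
plt.close()
print('Plot written to ' + outFile)
```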
5.3
Bar Chart
The creation of a bar chart is equally simple: two lists are provided, the first containing the locations on the X axis at which the bars start and the second the heights of the bars. The width of the bars and their colour can also be specified. More options are available in the documentation (http://matplotlib.
[Figure: Simple Plot; axes labelled X Axis and Y Axis]
#! /usr/bin/env python

#######################################
# A simple python script to display a
# bar chart.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################
5.4
Pie Chart
A pie chart is created in a similar way to the previous scripts: a list of the fractions making up the pie chart is given alongside a list of labels and, if required, a list of fractions by which to explode the pie chart. Other options, including colour and shadow, are available and outlined in the documentation (http://matplotlib.sourceforge.net/matplotlib.pylab.html#-pie). This script also demonstrates the use of the savefig() function, allowing the plot to be saved to file rather than simply displayed on screen.
#! /usr/bin/env python

#######################################
# A simple python script to display a
# pie chart.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################
5.5
Scatter Plot
#! /usr/bin/env python

#######################################
# A simple python script to display a
# scatter plot.
# Author: <YOUR NAME>

x = []
y = []
z = []

# Create figure
plt.figure()
# Create scatter plot where the points are coloured using the
# Z values.
plt.scatter(x, y, c=z, marker='o', cmap=cm.jet, vmin=-100, vmax=100)
# Display colour bar
plt.colorbar()
# Make axis tight to the data
plt.axis('tight')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.title('Simple Scatter Plot')
# save plot to disk.
plt.savefig('simplescatter.pdf', dpi=200, format='PDF')
[Figure: Simple Scatter Plot; axes labelled X Axis and Y Axis, with a colour bar]
5.6
Line Plot
A more complicated example is now given, building on the previous tutorial, where the data is read in from a text file before being plotted. In this case data was downloaded from the Environment Agency and converted from columns to rows. The dataset provides the five year average rainfall for the summer (June - August) and winter (December - February) from 1766 to 2006. Two examples of plotting this data are given: the first plots the two datasets onto the same axis (Figure 5.5) while the second plots them onto individual axes (Figure 5.6). Information on the use of the subplot() function can be found in the matplotlib documentation (http://matplotlib.sourceforge.net/matplotlib.pylab.html#-subplot).
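Because the file stores one variable per row (years, then summer, then winter), the parser has to keep track of which row it is on. The sketch below shows that idea on an in-memory file. The three short rows are made-up values for illustration only, not the real rainfall data, and the function is a stand-alone approximation of the parseDataFile method used in the script.

```python
import io


def parseDataFile(dataFile, year, summer, winter):
    # Row 0 holds years, row 1 summer values, row 2 winter values.
    line = 0
    for eachLine in dataFile:
        tokens = eachLine.strip().split(',')
        for token in tokens:
            if line == 0:
                year.append(int(token))
            elif line == 1:
                summer.append(float(token))
            elif line == 2:
                winter.append(float(token))
        # Move on to the next row once every token has been stored.
        line += 1


# Made-up sample in the same row-wise layout.
sample = io.StringIO(u'1766,1771,1776\n210.5,198.0,225.3\n250.1,240.7,262.9\n')
year, summer, winter = [], [], []
parseDataFile(sample, year, summer, winter)
print(year)
print(summer)
print(winter)
```

The three lists are filled in place, which is why the script creates them before calling the function.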
#######################################
# A python script to read in a text file
# of rainfall data for summer and winter
# within the UK and display as a plot.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

                if line == 0:
                    year.append(int(token))
                elif line == 1:
                    summer.append(float(token))
                elif line == 2:
                    winter.append(float(token))
            line += 1

        plt.subplot(2,1,2)
        plt.plot(year, winter)
        plt.xlabel('Year')
        plt.ylabel('Rainfall (5 Year Mean)')
        plt.title('Winter rainfall across the UK')
        plt.axis('tight')
        # save plot to disk.
        plt.savefig(outFile, dpi=200, format='PDF')

    def run(self):
        filename = 'ukweatheraverage.csv'
        if os.path.exists(filename):
            year = list()
            summer = list()
            winter = list()
            try:
                dataFile = open(filename, 'r')
            except IOError as e:
                print('\nCould not open file:\n', e)
                return
            self.parseDataFile(dataFile, year, summer, winter)
            dataFile.close()
            self.plotData(year, summer, winter, "Rainfall_SinglePlot.pdf")
            self.plotDataSeparate(year, summer, winter, "Rainfall_MultiplePlots.pdf")
        else:
            print('File \'' + filename + '\' does not exist.')

if __name__ == '__main__':
    obj = PlotRainfall()
    obj.run()
Figure 5.5: Rainfall data for summer and winter on the same axis.
Figure 5.6: Rainfall data for summer and winter on different axes.
5.7
Exercise:
Based on the available data, is there a correlation between summer and winter rainfall? Use the lists of summer and winter rainfall read in from the file to produce a scatter plot to answer this question.
5.8
Further Reading
Matplotlib http://matplotlib.sourceforge.net
Python Documentation http://www.python.org/doc/
Core Python Programming (Second Edition), W.J. Chun. Prentice Hall
ISBN 0-13-226993-7
Chapter 6
Statistics (SciPy / NumPy)
6.1
Introduction
6.2
Simple Statistics
Forest inventory data have been collected for a number of plots within Penglais woods (Aberystwyth, Wales). For each tree, the diameter, species, height, crown size and position have been recorded. An example script is provided to read the diameters into a separate list for each species. The lists are then converted to NumPy arrays, from which statistics are calculated and written out to a text file.
#! /usr/bin/env python

#######################################
# A script to calculate statistics from
# a text file using NumPy
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import numpy
import scipy
# Import scipy stats functions we need
import scipy.stats as spstats

    def run(self):
        # Set up lists to hold input diameters
        # A separate list is used for each species
        beechDiameter = list()
        ashDiameter = list()
        birchDiameter = list()
        oakDiameter = list()
        sycamoreDiameter = list()
        otherDiameter = list()

        outFileName = 'PenglaisWoodsStats.csv'
        inFile = open(inFileName, 'r')
        outFile = open(outFileName, 'w')

                species = substrs[3]
                if substrs[4].isdigit: # Check diameter is a number
                    diameter = float(substrs[4])

                    if species == 'BEECH':
                        beechDiameter.append(diameter)
                    elif species == 'ASH':
                        ashDiameter.append(diameter)
                    elif species == 'BIRCH':
                        birchDiameter.append(diameter)
                    elif species == 'OAK':
                        oakDiameter.append(diameter)
                    elif species == 'SYC':
                        sycamoreDiameter.append(diameter)
                    else:
                        otherDiameter.append(diameter)

        outFile.write(headerLine)

if __name__ == '__main__':
    obj = CalculateStatistics()
    obj.run()
Note that in tutorial three functions were written to calculate the mean and standard deviation of a list; in this tutorial the same result is accomplished using the built-in functionality of NumPy.
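One detail of the listing is worth checking. The test substrs[4].isdigit has no parentheses, so it refers to the method itself rather than calling it and is always true. Adding the parentheses would not be enough either, because isdigit() returns False for a value such as 12.5. A common alternative, sketched here with a hypothetical helper name, is to attempt the conversion and catch the ValueError.

```python
def toFloatOrNone(text):
    # Return the value as a float, or None if it is not a number.
    try:
        return float(text)
    except ValueError:
        return None


print(bool('abc'.isdigit))     # the method object itself: always True
print('12.5'.isdigit())        # False, although 12.5 is a valid diameter
print(toFloatOrNone('12.5'))
print(toFloatOrNone('n/a'))
```

Rows for which the helper returns None can then be skipped before any value is appended to a list.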
6.2.1
Exercises
1. Based on the example script, also calculate the mean, median and standard deviation for tree heights and add them to the output file.
2. Look at the other statistics functions available in SciPy and calculate them for height and diameter.
6.3
Calculate Biomass
One of the features of NumPy arrays is the ability to perform mathematical operations on all elements of an array at once.
For example, for NumPy array a:
a = numpy.array([1,2,3,4])
Performing
b = 2 * a
Gives
b = array([2,4,6,8])
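$\mathrm{Volume} = a + b\,D^{2}\,h^{0.75}$ (6.1)
$\mathrm{Biomass} = \mathrm{Volume} \times \mathrm{Specific\ gravity}$ (6.2)
where $D$ is the diameter, $h$ is the height, and $a$ and $b$ are species-specific coefficients.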
The specific gravity also varies by species; values for each species are given in Table 6.1.
Table 6.1: Coefficients for estimating volume and the specific gravity required for estimating the biomass by species.
Species     a-coefficient   b-coefficient   Specific gravity
Beech       0.014306        0.0000748       0.56
Ash         0.012107        0.0000777       0.54
Birch       0.009184        0.0000673       0.53
Oak         0.011724        0.0000765       0.56
Sycamore    0.012668        0.0000737       0.54
The following function takes two arrays containing diameter and height, and a string for species. From these biomass is calculated.
        # Calculate Volume
        volume = a + ((b*(inDiameterArray / 100)**2) * (inHeightArray**0.75))
        # Calculate biomass
        biomass = volume * specificGravity
        # Return biomass
        return biomass
Note that only the coefficients for BEECH have been included; therefore, if a different species is passed in, the program will produce an error (try to think about what the error would be). A neater way of dealing with the error would be to throw an exception if the species was not recognised. Exceptions form the basis of controlling errors in a number of programming languages (including C++ and Java). The simple concept is that as a program is running, if an error occurs an exception is thrown, at which point processing stops until the exception is caught and dealt with. If the exception is never caught, then the software crashes and stops. Python provides the following syntax for exception programming,
try:
    < Perform operations during which
      an error is likely to occur >
except <ExceptionName>:
    < If error occurs do something
      appropriate >
where the code you wish to run is written inside the try statement and the except statement is executed only when a named exception (within the except statement) is produced within the try block. It is good practice to use exceptions where possible, as when used properly they make code more robust and allow it to give more feedback to the user.
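As a small illustration of this syntax, the sketch below raises an exception for an unknown species and catches it in the calling code. The function name and the choice of ValueError are assumptions made for this example; the specific gravity values are those listed in Table 6.1 for beech and ash.

```python
def getSpecificGravity(species):
    # Look up the specific gravity for a species code.
    specificGravity = {'BEECH': 0.56, 'ASH': 0.54}
    if species not in specificGravity:
        # Signal the problem to the caller instead of carrying on.
        raise ValueError('Species not recognised: ' + species)
    return specificGravity[species]


results = []
for species in ['BEECH', 'ELM']:
    try:
        results.append(str(getSpecificGravity(species)))
    except ValueError as e:
        # The unknown species ends up here; record 'na' and continue.
        print(e)
        results.append('na')

print(','.join(results))
```

The loop keeps running after the unknown species because the exception is caught; without the try block the script would stop at the raise statement.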
The function to calculate biomass may be rewritten to throw an exception if the
species is not recognised.
        # Calculate Volume
        volume = a + ((b*(inDiameterArray / 100)**2) * (inHeightArray**0.75))
        # Calculate biomass
        biomass = volume * specificGravity
        # Return biomass
        return biomass
The function below calls calcBiomass to calculate biomass for an array. From this the mean, median and standard deviation are calculated and an output line is returned. By calling the function from within a try and except block, if the species is not recognised it will not try to calculate stats and will return the string 'na' (not available) for all values in the output line.
        except Exception:
            # Catch exception and write 'na' for all values
            biomassStatsLine = 'na,na,na'

        return biomassStatsLine
#! /usr/bin/env python

#######################################
# A script to calculate statistics from
# a text file using NumPy
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import numpy
import scipy
# Import scipy stats functions we need
import scipy.stats as spstats

    def run(self):
        # Set up lists to hold input diameters and heights
        # A separate list is used for each species
        beechDiameter = list()
        beechHeight = list()
        ashDiameter = list()
        ashHeight = list()
        birchDiameter = list()
        birchHeight = list()
        oakDiameter = list()
        oakHeight = list()
        sycamoreDiameter = list()
        sycamoreHeight = list()
        otherDiameter = list()
        otherHeight = list()
        # Iterate through the input file and save diameter and height
        # into lists, based on species
        header = True
        for eachLine in inFile:
            if header: # Skip header row
                print('Skipping header row')
                header = False
            else:
                substrs = eachLine.split(',',eachLine.count(','))

                species = substrs[3]
                if substrs[4].isdigit: # Check diameter is a number
                    diameter = float(substrs[4])
                    height = float(substrs[10])

                    if species == 'BEECH':
                        beechDiameter.append(diameter)
                        beechHeight.append(height)
                    elif species == 'ASH':
                        ashDiameter.append(diameter)
                        ashHeight.append(height)
                    elif species == 'BIRCH':
                        birchDiameter.append(diameter)
                        birchHeight.append(height)
                    elif species == 'OAK':
                        oakDiameter.append(diameter)
                        oakHeight.append(height)
                    elif species == 'SYC':
                        sycamoreDiameter.append(diameter)
                        sycamoreHeight.append(height)
                    else:
                        otherDiameter.append(diameter)
                        otherHeight.append(height)

        beechHeight = numpy.array(beechHeight)
        ashHeight = numpy.array(ashHeight)
        birchHeight = numpy.array(birchHeight)
        oakHeight = numpy.array(oakHeight)
        sycamoreHeight = numpy.array(sycamoreHeight)
        otherHeight = numpy.array(otherHeight)

        # Calculate statistics and biomass for each species and write to file
        outLine = 'Beech,' + self.createStatsLine(beechDiameter) + ',' + \
                  self.createStatsLine(beechHeight) + ',' + \
                  self.calcBiomassStatsLine(beechDiameter, beechHeight, 'BEECH') + '\n'
        outFile.write(outLine)
            medianBiomass = numpy.median(biomass)
            stDevBiomass = numpy.std(biomass)

        except Exception:
            # Catch exception and write 'na' for all values
            biomassStatsLine = 'na,na,na'

        return biomassStatsLine

        # Calculate volume
        volume = a + ((b*(inDiameterArray)**2) * (inHeightArray**0.75))
        # Calculate biomass
        biomass = volume * specificGravity
        # Return biomass
        return biomass

if __name__ == '__main__':
    obj = CalculateStatistics()
    obj.run()
6.3.1
Exercise
6.4
Linear Fitting
One of the built-in features of SciPy is the ability to perform fits. Using the linear regression function (linregress) it is possible to fit equations of the form:
$y = ax + b$ (6.3)
to two arrays x and y using:
(aCoeff, bCoeff, rVal, pVal, stdError) = spstats.linregress(x, y)
where aCoeff and bCoeff are the coefficients, rVal is the r value (rVal**2 gives $R^2$), pVal is the p value and stdError is the standard error.
It is possible to fit the following equation to the collected data, expressing height as a function of diameter:
$\mathrm{height} = a \log(\mathrm{diameter}) + b$ (6.4)
To fit an equation of this form an array must be created containing the log of the diameter. Linear regression may then be performed using:
spstats.linregress(numpy.log(inDiameterArray), inHeightArray)
To test the fit it may be plotted against the original data using Matplotlib. The following code first performs the linear regression then creates a plot showing the fit against the original data.
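To show what the fit involves, the sketch below reproduces the slope, intercept and r value of an ordinary least-squares fit of height against log diameter in plain Python. It is an illustration only and not a replacement for linregress (it omits the p value and standard error). The function name is hypothetical, and the five diameter and height pairs are invented so that the result is easy to check.

```python
import math


def fitLogLinear(diameters, heights):
    # Least-squares fit of: height = aCoeff * log(diameter) + bCoeff
    x = [math.log(d) for d in diameters]
    y = list(heights)
    n = float(len(x))
    meanX = sum(x) / n
    meanY = sum(y) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((xi - meanX) ** 2 for xi in x)
    syy = sum((yi - meanY) ** 2 for yi in y)
    sxy = sum((xi - meanX) * (yi - meanY) for xi, yi in zip(x, y))
    aCoeff = sxy / sxx
    bCoeff = meanY - aCoeff * meanX
    rVal = sxy / math.sqrt(sxx * syy)
    return aCoeff, bCoeff, rVal


# Invented data lying exactly on height = 5 * log(diameter) + 2
diameters = [10.0, 20.0, 40.0, 60.0, 80.0]
heights = [5.0 * math.log(d) + 2.0 for d in diameters]

aCoeff, bCoeff, rVal = fitLogLinear(diameters, heights)
print(round(aCoeff, 2), round(bCoeff, 2), round(rVal ** 2, 2))
```

With real field data the points scatter around the line, so the r value falls below 1.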
        # Create a string, showing the form of the equation (with fitted coefficients)

        # Save plot
        plt.savefig(outPlotName, dpi=200, format='PDF')
The coefficients and r^2 of the fit are displayed in the legend. To display the superscript 2 in the legend it is possible to use LaTeX syntax, so r^2 is written as: r$^2$.
The function may be called using:
# Set output directory for plots
outDIR = './output/directory/'
self.plotLinearRegression(beechDiameter, beechHeight, outDIR + 'beech.pdf')
Produce a plot similar to the one shown in Figure 6.1 and save as a PDF.
[Figure 6.1: Measured data and fitted line, 5.54log(D) -3.7 (r^2 = 0.37); axes: Diameter (cm) and Height (m).]
The final script should result in the following:
#! /usr/bin/env python

#######################################
# A script to calculate statistics from
# a text file using NumPy
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import numpy
import scipy
# Import scipy stats functions we need
import scipy.stats as spstats
# Import plotting library as plt
import matplotlib.pyplot as plt
    def run(self):
        # Set up lists to hold input diameters and heights
        # A separate list is used for each species
        beechDiameter = list()
        beechHeight = list()
        ashDiameter = list()
        ashHeight = list()
        birchDiameter = list()
        birchHeight = list()
        oakDiameter = list()
        oakHeight = list()
        sycamoreDiameter = list()
        sycamoreHeight = list()
        otherDiameter = list()
        otherHeight = list()
        # Iterate through the input file and save diameter and height
        # into lists, based on species
        header = True
        for eachLine in inFile:
            if header: # Skip header row
                print('Skipping header row')
                header = False
            else:
                substrs = eachLine.split(',',eachLine.count(','))

                species = substrs[3]
                if substrs[4].isdigit: # Check diameter is a number
                    diameter = float(substrs[4])
                    height = float(substrs[10])

                    if species == 'BEECH':
                        beechDiameter.append(diameter)
                        beechHeight.append(height)
                    elif species == 'ASH':
                        ashDiameter.append(diameter)
                        ashHeight.append(height)
                    elif species == 'BIRCH':
                        birchDiameter.append(diameter)
                        birchHeight.append(height)
                    elif species == 'OAK':
                        oakDiameter.append(diameter)
                        oakHeight.append(height)
                    elif species == 'SYC':
                        sycamoreDiameter.append(diameter)
                        sycamoreHeight.append(height)
                    else:
                        otherDiameter.append(diameter)
                        otherHeight.append(height)

        beechHeight = numpy.array(beechHeight)
        ashHeight = numpy.array(ashHeight)
        birchHeight = numpy.array(birchHeight)
        oakHeight = numpy.array(oakHeight)
        sycamoreHeight = numpy.array(sycamoreHeight)
        otherHeight = numpy.array(otherHeight)
        # Create a string, showing the form of the equation (with fitted coefficients)
        # and r squared value
        # Coefficients are rounded to two decimal places.
        equation = str(round(aCoeff,2)) + 'log(D) ' + str(round(bCoeff,2)) + \
                   ' (r$^2$ = ' + str(round(rVal**2,2)) + ')'

        # Save plot
        plt.savefig(outPlotName, dpi=200, format='PDF')

        return statsLine

        except Exception:
            # Catch exception and write 'na' for all values
            biomassStatsLine = 'na,na,na'

        return biomassStatsLine

        # Calculate volume
        volume = a + ((b*(inDiameterArray)**2) * (inHeightArray**0.75))
        # Calculate biomass
        biomass = volume * specificGravity
        # Return biomass
        return biomass

if __name__ == '__main__':
    obj = CalculateStatistics()
    obj.run()
6.4.1
Exercise
Produce plots, showing linear regression fits, for the other species.
6.5
Further Reading
SciPy http://www.scipy.org/SciPy
NumPy http://numpy.scipy.org
An Introduction to Python, G. van Rossum, F.L. Drake, Jr. Network Theory ISBN 0-95-416176-9 (Also available online http://docs.python.org/3/tutorial/) - Chapter 8.
Python Documentation http://www.python.org/doc/
Matplotlib http://matplotlib.sourceforge.net
Chapter 7
Batch Processing Command Line Tools
7.1
Introduction
There are many command line tools and utilities available for all platforms (e.g., Windows, Linux, Mac OSX). These tools are extremely useful and range from simple tasks such as renaming a file to more complex tasks such as merging ESRI shapefiles. One problem with these tools is that if you have a large number of files which need to be processed in the same way, it is time consuming and error prone to manually run the command for each file. Therefore, if we can write scripts to do this work for us then processing large numbers of individual files becomes a much simpler and quicker task.
For this worksheet you will need to have the command line tools which come with the GDAL/OGR (http://www.gdal.org) open source software library installed and available within your path. With the installation of python(x,y) the python libraries for GDAL/OGR have been installed but not the command line utilities which go along with these libraries. If you do not already have them installed, details are available on the GDAL website for your respective platform.
7.2
The first example illustrates how the ogr2ogr command can be used to merge shapefiles and how a python script can be used to turn this command into a batch process where a whole directory of shapefiles can be merged.
To perform this operation two commands are required. The first makes a copy of the first shapefile within the list of files into a new file, shown below (note that ogr2ogr takes the output file before the input file):
> ogr2ogr <outputfile> <inputfile>
The second command appends the contents of the inputted shapefile onto the end of an existing shapefile (i.e., the one just copied).
> ogr2ogr -update -append <outputfile> <inputfile> -nln <outputfilename>
For both these commands the shapefiles all need to be of the same type (point, polyline or polygon) and contain the same attributes. Therefore, your first exercise is to understand the use of the ogr2ogr command and try both commands from the command line with the data provided. Hint: running ogr2ogr without any options displays the help text.
The second stage is to develop a python script to call the appropriate commands to perform the required operation, where the following processes will be
required:
1. Get the user inputs.
2. List the contents of the input directory.
3. Iterate through the directory and run the required commands.
But the first step is to create the class structure in which the code will fit; this
will be something similar to that shown below:
#! /usr/bin/env python

#######################################
# MergeSHPfiles.py
# A python script to merge shapefiles
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import os
The script will have the input directory and output file hard coded (as shown)
within the run function. Therefore, you need to edit these file paths to the location
where you have the files saved. Please note that under Windows you need to insert a
double backslash (i.e., \\) within the file path as a single backslash is an escape
character (e.g., \n for new line) within strings.
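A minimal sketch of such a class structure is given below. The class name MergeSHPfiles and the placeholder paths are assumptions for illustration; the text only states that a run function holds the hard coded input directory and output file.

```python
#! /usr/bin/env python

import os


class MergeSHPfiles(object):
    # Hypothetical class name, chosen to match the script name

    # A function to control the merging of shapefiles
    def mergeSHPfiles(self, filePath, newSHPfile):
        # Placeholder: the merging logic is added in the later steps
        print('Merging ' + filePath + ' into ' + newSHPfile)

    # The starting point of the script
    def run(self):
        # Hard coded input directory and output file (placeholder paths).
        # Under Windows use escaped backslashes, e.g. 'C:\\Data\\TreeCrowns'
        filePath = os.path.join('.', 'TreeCrowns')
        newSHPfile = os.path.join('.', 'Merged_TreeCrowns.shp')
        self.mergeSHPfiles(filePath, newSHPfile)


if __name__ == '__main__':
    obj = MergeSHPfiles()
    obj.run()
```

Keeping the paths in run means only that one function needs editing when the data are moved.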
The next step is to check that the input directory exists and is a directory, to do
this edit your run function as below.
        if not os.path.exists(filePath):
            print('Filepath does not exist')
        elif not os.path.isdir(filePath):
            print('Filepath is not a directory!')
        else:
            # Merge the shapefiles within the filePath
            self.mergeSHPfiles(filePath, newSHPfile)
Additionally, you need to add the function mergeSHPfiles, which is where the
shapefiles will be merged.
# A function to control the merging of shapefiles
def mergeSHPfiles(self, filePath, newSHPfile):
To merge the shapefiles the first task is to get a list of all the shapefiles within a
directory. To do this, use the code you developed in Tutorial 4 to list files within a
directory and edit it such that the files are outputted to a list rather than printed
to screen, as shown below.
    # A function which iterates through the directory and checks file extensions
    def findFilesExt(self, directory, extension):
        # Define a list to store output list of files
        fileList = list()
        # check whether the current directory exists
        if os.path.exists(directory):
            # check whether the given directory is a directory
            if os.path.isdir(directory):
                # list all the files within the directory
                dirFileList = os.listdir(directory)
                # Loop through the individual files within the directory
                for filename in dirFileList:
                    # Check whether file is directory or file
                    if(os.path.isdir(os.path.join(directory,filename))):
                        print(os.path.join(directory,filename) + \
                              ' is a directory and therefore ignored!')
                    elif(os.path.isfile(os.path.join(directory,filename))):
                        if(self.checkFileExtension(filename, extension)):
                            fileList.append(os.path.join(directory,filename))
                    else:
                        print(filename + ' is NOT a file or directory!')
            else:
                print(directory + ' is not a directory!')
        else:
            print(directory + ' does not exist!')
        # Return the list of files
        return fileList
Note, that you also need the function to check the file extension.
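The extension check from the earlier tutorial is not reproduced here, so the following is only a sketch of what such a function might look like. It is written as a plain function and compares extensions without regard to case, which is an assumption rather than the behaviour of the original helper.

```python
import os


def checkFileExtension(filename, extension):
    """Return True if filename ends with the given extension (e.g. '.shp')."""
    # os.path.splitext splits 'crowns.shp' into ('crowns', '.shp')
    fileExt = os.path.splitext(filename)[1]
    # Compare in lower case so that '.SHP' also matches '.shp'
    return fileExt.lower() == extension.lower()


print(checkFileExtension('crowns.shp', '.shp'))
print(checkFileExtension('crowns.dbf', '.shp'))
```

To use it inside the class, add self as the first argument so that the call self.checkFileExtension(filename, extension) works.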
This can then be added to the mergeSHPfiles function with a loop to iterate through
the identified files.
When iterating through the files the ogr2ogr commands to merge
the shapefiles need to be built and executed, therefore the following code needs
to be added to your script. Once the command is built it is called using the
subprocess module's call function.
Note the use of the option shell=True within the subprocess function call; this
means the command is treated as a string and executed as is. While this is fine for
simple scripts for your own use it poses a security risk when used in production
code as it leaves the program open to shell injection attacks. For example, if
the program took a user provided file name and used this when building a command,
a user could potentially pass in a malicious command (e.g., rm -fr *;)
rather than a file name and this would be executed. To avoid these problems, by
default subprocess requires a list of command line arguments and shell is set to
False.
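As a sketch of the safer list form, the two ogr2ogr commands could be built as argument lists and run without the shell. The function names buildMergeCommands and runCommands are hypothetical, and the layer name is assumed to have been derived from the output file name already.

```python
import subprocess


def buildMergeCommands(fileList, newSHPfile, layerName):
    """Build one ogr2ogr argument list per input shapefile."""
    commands = []
    first = True
    for shpFile in fileList:
        if first:
            # Copy the first shapefile to create the output file
            # (ogr2ogr expects the destination before the source)
            commands.append(['ogr2ogr', newSHPfile, shpFile])
            first = False
        else:
            # Append every other shapefile onto the output file
            commands.append(['ogr2ogr', '-update', '-append',
                             newSHPfile, shpFile, '-nln', layerName])
    return commands


def runCommands(commands):
    for cmd in commands:
        # shell defaults to False, so each list item is passed as a
        # single argument and is never interpreted by the shell
        subprocess.call(cmd)


cmds = buildMergeCommands(['a.shp', 'b.shp'], 'merged.shp', 'merged')
print(cmds[0])
print(cmds[1])
```

Because a file name is passed as one list item, a name containing spaces or shell characters cannot be split into extra commands.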
You also require the additional functions to remove the shapefile extension (.shp)
and the file path, creating the layer name, which are given below.
        outName = name
        # Find how many .shp strings are in the current file
        # name
        count = name.find('.shp', 0, len(name))
        # If there are no instances of .shp then -1 will be returned
        if not count == -1:
            # Replace all instances of .shp with empty string.
            outName = name.replace('.shp', '', name.count('.shp'))
        # Return output file name without .shp
        return outName
If you want to use this script on UNIX (i.e., Linux or Mac OS X) you
need to use the removeFilePathUNIX function as shown, while for Windows change the code
to use the removeFilePathWINS function such that the double escaped slashes are
used.
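The following is a sketch of how these helpers might be written. The names removeFilePathUNIX and removeFilePathWINS come from the text; removeSHPExtension is a hypothetical name, and this version only strips a trailing .shp, which is an assumption.

```python
def removeSHPExtension(name):
    # Hypothetical helper: strip a trailing '.shp' if present
    if name.lower().endswith('.shp'):
        return name[:-4]
    return name


def removeFilePathUNIX(name):
    # Keep only the text after the last forward slash
    return name.split('/')[-1]


def removeFilePathWINS(name):
    # Keep only the text after the last backslash; the backslash
    # is escaped because it is an escape character in strings
    return name.split('\\')[-1]


layer = removeSHPExtension(removeFilePathUNIX('/data/TreeCrowns/merged.shp'))
print(layer)
```

The function os.path.basename does the same job using the separator of the platform the script is running on, which avoids choosing between the two versions by hand.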
Your script should now be complete so execute it on the data provided, within the
TreeCrowns directory. Take time to understand the lines of code which have been
provided and make sure your script works.
#! /usr/bin/env python

#######################################
# MergeSHPfiles.py
# A python script to merge shapefiles
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import os
import subprocess

    # A function which iterates through the directory and checks file extensions
    def findFilesExt(self, directory, extension):
        # Define a list to store output list of files
        fileList = list()
        # check whether the current directory exists
        if os.path.exists(directory):
            # check whether the given directory is a directory
            if os.path.isdir(directory):
                # list all the files within the directory
                dirFileList = os.listdir(directory)
                # Loop through the individual files within the directory
                for filename in dirFileList:
                    # Check whether file is directory or file
                    if(os.path.isdir(os.path.join(directory,filename))):
                        print(os.path.join(directory,filename) + \
                              ' is a directory and therefore ignored!')
                    elif(os.path.isfile(os.path.join(directory,filename))):
                        if(self.checkFileExtension(filename, extension)):
                            fileList.append(os.path.join(directory,filename))
                    else:
                        print(filename + ' is NOT a file or directory!')
            else:
                print(directory + ' is not a directory!')
        else:
            print(directory + ' does not exist!')
        # Return the list of files
        return fileList
7.3
The next example will require you to use the script developed above as the basis
for a new script which converts a directory of images to
GeoTIFF using the command given below:
gdal_translate -of <OutputFormat> <InputFile> <OutputFile>
A useful step is to first run the command from the command line manually to
make sure you understand how this command is working.
The two main things you need to think about are:
1. What file extension will the input files have? This should be user selectable
alongside the file paths.
2. What output file name should be provided? The script should generate this.
Four test images have been provided in ENVI format within the directory ENVI Images;
you can use these for testing your script. If you are struggling then an example script with a solution to this task has been provided within the code directory.
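As a starting point, the sketch below shows one way the output file name and the gdal_translate command could be generated for a single input file. The function name, the .tif output extension and the example file name are assumptions; GTiff is the GDAL format name for GeoTIFF.

```python
import os


def buildTranslateCommand(inputFile, outputDir, outFormat='GTiff', outExt='.tif'):
    """Build the gdal_translate argument list for one input image."""
    # Output name: the input base name with the new extension, in outputDir
    baseName = os.path.splitext(os.path.basename(inputFile))[0]
    outputFile = os.path.join(outputDir, baseName + outExt)
    return ['gdal_translate', '-of', outFormat, inputFile, outputFile]


cmd = buildTranslateCommand('image1.env', 'out')
print(' '.join(cmd))
```

The list returned can be passed directly to subprocess.call within the loop over the files found in the input directory.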
7.3.1
It is often convenient to provide the inputs the script requires (e.g., input and
output file locations) as arguments to the script rather than needing to edit
the script each time a different set of parameters is required (i.e., changing the
file paths in the scripts above). This is easy within python and just requires
the following changes to your run function (in this case for the merge shapefiles
script).
In addition to these changes, you need to import the sys module into your
script to access these arguments.
# Import the sys package from within the
# standard library
import sys
Please note that the list of user provided inputs starts at index 1 and not 0. If you
call sys.argv[0] then the name of the script being executed will be returned. When
retrieving values from the user in this form it is highly advisable to check whether
the inputs provided are valid and that all required inputs have been provided.
Create a copy of the script you created earlier and edit the run function to be as
shown above, making note of the lines which require editing.
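A minimal sketch of reading and checking the two inputs is shown below. The function name readInputs and the usage message are assumptions; the indexing follows the description above, with the script name at index 0.

```python
import sys


def readInputs(argv):
    """Return (input directory, output file) from a list such as sys.argv."""
    # argv[0] is the script name, so two user inputs give a length of 3
    if len(argv) != 3:
        raise ValueError('Usage: ' + argv[0] +
                         ' <input directory> <output shapefile>')
    return argv[1], argv[2]


# A fixed list is used here for illustration; pass sys.argv in a script
filePath, newSHPfile = readInputs(['MergeSHPfiles.py', './TreeCrowns', 'merged.shp'])
print(filePath, newSHPfile)
```

Checking the number of arguments first gives the user a clear message rather than an IndexError.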
7.4
Exercises
1. Using ogr2ogr develop a script that will convert the attribute table of a
shapefile to a CSV file which can be opened within Microsoft Excel. Note,
that the outputted CSV will be put into a separate directory.
2. Create a script which calls the gdal_translate command and converts all the
images within a directory to a byte data type (i.e., with a range of 0 to 255).
7.5
Further Reading
GDAL - http://www.gdal.org
OGR - http://www.gdal.org/ogr
Python Documentation - http://www.python.org/doc
Core Python Programming (Second Edition), W.J. Chun. Prentice Hall
ISBN 0-13-226993-7
Learn UNIX in 10 minutes - http://freeengineer.org/learnUNIXin10minutes.html
The Linux Command Line. W. E. Shotts. No Starch Press. ISBN 978-159327-389-7 (Available to download from http://linuxcommand.org/tlcl.php)
Chapter 8
Image Processing using GDAL and RIOS
8.1
Image files used within spatial data processing (i.e., remote sensing and GIS)
require the addition of a spatial header to the files which provides the origin
(usually from the top left corner of the image), the pixel resolution of the image and
a definition of the coordinate system and projection of the dataset. Additionally,
most formats also allow a rotation to be defined. Using these fields the geographic
position on the Earth's surface can be defined for each pixel within the scene.
Images can also contain other information in the header of the file including no
data values, image statistics and band names/descriptions.
8.1.1
The GDAL software library provides a python interface to the C++ library, such
that when the python functions are called it is the C++ implementation which is
executed. This model has significant advantages for operations such as reading
and writing to and from image files, as in pure python these operations would be
slow but they are very fast within C++. Python, however, is an easier language
for people to learn and use, and therefore allows software to be more quickly developed,
so combining C++ and python in this way is a very productive way for software to
be developed.
Argparser
Up until this point we have read parameters from the system by just using the
sys.argv list where the user is required to enter the values in a given pre-defined
order. The problem with this is that it is not very helpful to the user as no
help is provided or error messages given if the wrong parameters are entered. For
command line tools it is generally accepted that when providing command line
options they will use switches such as -i or --input where the user specifies with a
switch what the input they are providing is.
Fortunately, python provides a library to simplify the implementation of this type
of interface. An example of this is shown below, where first the argparse library
is imported. The parser is then created and the arguments added to the parser
so the parser knows what to expect from the user. Finally, the parser is called to
parse the arguments. Examples will be shown in all the following scripts.
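A short sketch of these steps using the standard argparse library follows. The option names and help strings are chosen for illustration only.

```python
# Import the python Argument parser
import argparse

# Create the parser
parser = argparse.ArgumentParser()
# Define the arguments the parser should expect from the user
parser.add_argument("-i", "--input", type=str, required=True,
                    help="Specify the input image file.")
parser.add_argument("-o", "--output", type=str,
                    help="Specify the output image file.")
# An explicit list is parsed here for illustration; call parse_args()
# with no arguments to read the values from the command line instead
args = parser.parse_args(["-i", "in.img", "-o", "out.img"])
print(args.input, args.output)
```

The parser also generates a help message automatically, shown when the script is run with the -h switch.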
8.1.2
The following example demonstrates how to import the GDAL library into python
and to read the image header information and print it to the console, similar to
the functionality within the gdalinfo command. Read the comments within the
code and ensure you understand the steps involved.
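The sketch below illustrates the kind of calls involved; it is not the original listing. The function names are hypothetical, while the GDAL calls used (gdal.Open, RasterXSize, RasterYSize, RasterCount, GetProjection and GetGeoTransform) are part of the GDAL python bindings. It assumes gdal.Open returns None on failure, which is not the case if GDAL exceptions have been enabled.

```python
def pixelToGeo(geoTransform, col, row):
    """Convert a pixel (col, row) to map coordinates with a GDAL geotransform."""
    # geoTransform = (originX, pixelWidth, rotation, originY, rotation, pixelHeight)
    x = geoTransform[0] + col * geoTransform[1] + row * geoTransform[2]
    y = geoTransform[3] + col * geoTransform[4] + row * geoTransform[5]
    return x, y


def printImageHeader(inputFile):
    # Imported here so pixelToGeo can be tried without GDAL installed
    from osgeo import gdal
    # Open the image file as read only
    dataset = gdal.Open(inputFile, gdal.GA_ReadOnly)
    if dataset is None:
        print("Could not open the input image file: ", inputFile)
        return
    print("Size (x, y, bands):", dataset.RasterXSize,
          dataset.RasterYSize, dataset.RasterCount)
    print("Projection:", dataset.GetProjection())
    geoTransform = dataset.GetGeoTransform()
    print("Top left corner:", pixelToGeo(geoTransform, 0, 0))
    print("Pixel size:", geoTransform[1], geoTransform[5])
    # Drop the reference so GDAL closes the file
    dataset = None


# A made-up geotransform with 30 m pixels, for illustration only
print(pixelToGeo((500000.0, 30.0, 0.0, 9000000.0, 0.0, -30.0), 10, 20))
```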
8.1.3
No Data Values
GDAL also allows us to edit the image header values, therefore the following
example shows how to edit the no data value for an image band.
Note that when opening the image file the gdal.GA_Update option is used rather
than gdal.GA_ReadOnly.
A no data value is useful for defining regions of the image which are not valid (i.e.,
outside of the image boundaries) and can be ignored during processing.
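A sketch of the core of such a script is given below, assuming the dataset has already been opened with gdal.Open(inputFile, gdal.GA_Update). The function name setNoData is hypothetical; RasterCount, GetRasterBand and SetNoDataValue are GDAL calls.

```python
def setNoData(dataset, noDataValue):
    """Set the same no data value on every band of an open GDAL dataset."""
    for i in range(dataset.RasterCount):
        # GDAL numbers image bands from 1 rather than 0
        band = dataset.GetRasterBand(i + 1)
        band.SetNoDataValue(noDataValue)
        print("Setting No data (" + str(noDataValue) +
              ") for band " + str(i + 1))
```

After the loop the dataset reference should be released (e.g., set to None) so that the change is written to the file.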
-i LSTOA_Tanz_2000Wet.img -n 0.0
for band 1
for band 2
for band 3
for band 4
for band 5
for band 6
To check that the command successfully edited the input file use the gdalinfo command, as shown below:
gdalinfo -norat LSTOA_Tanz_2000Wet.img
#!/usr/bin/env python

    else:
        # Print an error message if the file
        # could not be opened.
        print("Could not open the input image file: ", inputFile)
8.1.4
Band Name
Band names are useful for a user to understand a data set more easily. Therefore,
naming the image bands, such as Blue, Green, Red, NIR and SWIR, is very
useful. The following example illustrates how to edit the band name description.
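The central step can be sketched as follows, assuming a dataset opened with gdal.GA_Update. The function name setBandName is hypothetical; GetRasterBand and SetDescription are GDAL calls.

```python
def setBandName(dataset, bandIndex, bandName):
    """Set the description of one band; returns False for an invalid band."""
    # Bands are numbered from 1 up to RasterCount
    if bandIndex < 1 or bandIndex > dataset.RasterCount:
        print("Band " + str(bandIndex) + " is not within the image.")
        return False
    band = dataset.GetRasterBand(bandIndex)
    # The band description is what viewers display as the band name
    band.SetDescription(bandName)
    return True
```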
setbandname.py -i LSTOA_Tanz_2000Wet.img -b 1 -n Blue
setbandname.py -i LSTOA_Tanz_2000Wet.img -b 2 -n Green
setbandname.py -i LSTOA_Tanz_2000Wet.img -b 3 -n Red
setbandname.py -i LSTOA_Tanz_2000Wet.img -b 4 -n NIR
setbandname.py -i LSTOA_Tanz_2000Wet.img -b 5 -n SWIR1
setbandname.py -i LSTOA_Tanz_2000Wet.img -b 6 -n SWIR2
Use your script for reading the image header values and printing them to the screen
to find out whether it worked.
8.1.5
GDAL Meta-Data
GDAL supports the concept of meta-data on both the image bands and the whole
image. The meta-data allows any other data to be stored within the image file as
a string.
The following example shows how to read the meta-data values and to list all the
meta-data variables available on both the image bands and the image.
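A sketch of the two operations is given below. The function names are hypothetical; GetMetadata and GetMetadataItem are GDAL calls available on both the dataset and the band objects.

```python
def listMetaData(gdalObject):
    """Return the sorted meta-data names of an open dataset or band."""
    # GetMetadata returns the meta-data as a dictionary of strings
    metaData = gdalObject.GetMetadata() or {}
    return sorted(metaData.keys())


def readMetaDataItem(gdalObject, name):
    """Return the value of one meta-data item (None if it is not present)."""
    return gdalObject.GetMetadataItem(name)
```

Passing the dataset lists the image level variables, while passing dataset.GetRasterBand(1) lists those of the first band (e.g., LAYER_TYPE or STATISTICS_MEAN where they have been set).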
ReadGDALMetaData.py -h
ReadGDALMetaData.py -i LSTOA_Tanz_2000Wet.img -l
ReadGDALMetaData.py -i LSTOA_Tanz_2000Wet.img -b 1 -l
ReadGDALMetaData.py -i LSTOA_Tanz_2000Wet.img -b 1 -n LAYER_TYPE
ReadGDALMetaData.py -i LSTOA_Tanz_2000Wet.img -b 3 -n STATISTICS_MEAN
8.2
The raster input and output (I/O) simplification (RIOS) library is a set of python
modules which makes it easier to write raster processing code in Python. Built
on top of GDAL, it handles the details of opening and closing files, checking
alignment of projections and raster grid, stepping through the raster in small
blocks, etc., allowing the programmer to concentrate on implementing the solution
to the problem rather than on how to access the raster data and deal with the
spatial header.
Although GDAL provides access to the image data through python, RIOS makes it
much more user friendly and easier to use. RIOS is available as a free download
from https://bitbucket.org/chchrsc/rios/overview
8.2.1
Python provides a very useful help system through the command line. To get
access to the help run python from the terminal
> python
To exit the help system just press the q key on the keyboard.
8.2.2
Band Maths
Being able to apply equations to combine image bands, images or scale single bands
is a key tool for remote sensing, for example to calibrate Landsat to radiance. The
following examples demonstrate how to do this within the RIOS framework.
8.2.3
Multiply by a constant
The first example just multiplies all the image bands by a constant (provided by
the user). The first part of the code reads the user's parameters (input file, output
file and scale factor). To use the applier interface within RIOS you need to first
set up the input and output file associations and then any other options required,
in this case the constant for multiplication. Also, the controls object should be
defined to set any other parameters.
All processing within RIOS is undertaken on blocks, by default 200 × 200 pixels
in size. To process the block an applier function needs to be defined (e.g., multiplyByValue) where the inputs and outputs are passed to the function (these are
the pixel values) along with the other arguments object previously defined. The pixel
values are represented as a numpy array, the dimensions are (n, y, x) where n is
the number of image bands, y is the number of rows and x the number of columns
in the block.
Because numpy will iterate through the array for us, to multiply the whole array
by a constant (e.g., 2) we just need the syntax shown below, which makes
it very simple.
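The applier function itself can be sketched as below. The stand-in objects built with SimpleNamespace are used only so the function can be tried without RIOS; within RIOS the function would be passed to applier.apply together with the file associations, other arguments and controls.

```python
import numpy
from types import SimpleNamespace


def multiplyByValue(info, inputs, outputs, otherargs):
    # numpy applies the multiplication to every pixel in the block
    outputs.outimage = inputs.image1 * otherargs.scale


# Stand-in objects: a block of 3 bands, 4 rows and 5 columns
inputs = SimpleNamespace(image1=numpy.ones((3, 4, 5)))
outputs = SimpleNamespace()
otherargs = SimpleNamespace(scale=2.0)
multiplyByValue(None, inputs, outputs, otherargs)
print(outputs.outimage.shape)
```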
#!/usr/bin/env python

import sys
# Import the python Argument parser
import argparse
# Import the RIOS applier interface

    infiles.image1 = args.input
    # Create output files file names associations
    outfiles = applier.FilenameAssociations()
    # Set outImage to the output image specified
    outfiles.outimage = args.output
    # Create other arguments object
    otherargs = applier.OtherInputs()
    # Define the scale arguments
    otherargs.scale = args.multiply
    # Create a controls objects
    aControls = applier.ApplierControls()
    # Set the progress object.
    aControls.progress = cuiprogress.CUIProgressBar()
8.2.4
Calculate NDVI
To use the image bands independently to calculate a new value, usually indices
such as the NDVI
NDVI = \frac{NIR - RED}{NIR + RED} \qquad (8.1)
requires that the bands are referenced independently within the input data. Using
numpy to calculate the index, as shown below, results in a single output block with
the dimensions of the block but without the third dimension (i.e., the band)
which is required for RIOS to identify how to create the output image. Therefore,
as you will see in the example below an extra dimension needs to be added before
outputting the data to the file. Within the example given the input pixel values
are converted to floating point values (rather than whatever they were inputted as
from the input) because the output will be a floating point number (i.e., NDVI
has a range of -1 to 1).
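The numpy part of this calculation can be sketched as follows. The function name calcNDVI and the band indexes are assumptions; numpy.expand_dims is used to add the band dimension back. Pixels where NIR + RED is zero are not handled here.

```python
import numpy


def calcNDVI(block, redIdx, nirIdx):
    """block has shape (n, y, x); the result has shape (1, y, x)."""
    # Convert to floating point before the division
    red = block[redIdx].astype(numpy.float32)
    nir = block[nirIdx].astype(numpy.float32)
    # The result has shape (y, x) as the band dimension is lost
    ndvi = (nir - red) / (nir + red)
    # Add the band dimension back so the block is (1, y, x)
    return numpy.expand_dims(ndvi, axis=0)


# A made-up block: 2 bands (red, NIR), 1 row, 2 columns
block = numpy.array([[[10, 20]], [[30, 60]]])
out = calcNDVI(block, 0, 1)
print(out.shape)
```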
8.2.5
Where multiple input files are required, in this case the NIR and Red bands are
represented by different image files, the input files need to be specified in the input
files association as image1, image2 etc. and the pixel values within the applier
function are therefore referenced in the same way. Because in this example the
images only have a single image band, the input images have the same dimensions
as the output so no extra dimensions need to be added.
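A sketch of such an applier function is given below, assuming image1 holds the NIR band and image2 the Red band (the assignment is an assumption). The SimpleNamespace objects stand in for the RIOS inputs and outputs.

```python
import numpy
from types import SimpleNamespace


def calcNDVI(info, inputs, outputs):
    # image1 is assumed to be the NIR image and image2 the Red image
    nir = inputs.image1.astype(numpy.float32)
    red = inputs.image2.astype(numpy.float32)
    # Both inputs are (1, y, x) so the result is already (1, y, x)
    outputs.outimage = (nir - red) / (nir + red)


inputs = SimpleNamespace(image1=numpy.array([[[30, 60]]]),
                         image2=numpy.array([[[10, 20]]]))
outputs = SimpleNamespace()
calcNDVI(None, inputs, outputs)
print(outputs.outimage.shape)
```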
#!/usr/bin/env python

    aControls.progress = cuiprogress.CUIProgressBar()
8.3
Filtering Images
#!/usr/bin/env python

import sys
# Import the python Argument parser
import argparse
# Import the scipy filters.
from scipy import ndimage
#Import the numpy library
import numpy
# Import the RIOS image reader
from rios.imagereader import ImageReader
# Import the RIOS image writer
from rios.imagewriter import ImageWriter

        # create the loop.
        if writer is None:
            # Create the writer for output image.
            writer = ImageWriter(outputFile,
                                 info=info,
                                 firstblock=out,
                                 drivername='HFA')
        else:
            # If the writer is created write the
            # output block to the file.
            writer.write(out)
    # Close the writer and calculate
    # the image statistics.
    writer.close(calcStats=True)
After you have run this command open the images in TuiView and flick between
them to observe the change in the image, what do you notice?
8.4
Another option we have is to use the where function within numpy to select
pixels corresponding to certain criteria (e.g., pixels with an NDVI < 0.2 are not
vegetation) and classify them accordingly, where pixel values are used to indicate
the corresponding class (e.g., 1 = Forest, 2 = Water, 3 = Grass, etc). These images,
where pixel values are not continuous but categories, are referred to as thematic
images and there is a header value that can be set to indicate this type of image.
Therefore, in the script below there is a function for setting the image band metadata field LAYER_TYPE to be thematic. Setting an image as thematic means
that the nearest neighbour algorithm will be used when calculating pyramids and
the histogram needs to be binned with single whole values. It also means that a
colour table (see Chapter 9) can be added.
To build the rule base the output pixel values need to be created, here using the
numpy function zeros (http://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html). The function zeros creates a numpy array of the requested
shape (in this case the shape is taken from the inputted image) where all the pixels
have a value of zero.
Using the where function (http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html) a logic statement can be applied to an array or set of arrays
(which must be of the same size) to select the pixels for which the statement is
true. The where function returns an array of indexes which can be used to address
another array (i.e., the output array) and set a suitable output value (i.e., the
classification code).
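The pattern can be sketched with a small made-up NDVI block. The 0.2 threshold is taken from the text above, while the 0.6 threshold and the three class codes are assumptions for illustration.

```python
import numpy

# A made-up block of NDVI values with shape (1, 2, 2)
ndvi = numpy.array([[[0.1, 0.5], [0.8, -0.2]]])

# Create the output array, with the same shape, filled with zeros
output = numpy.zeros(ndvi.shape, dtype=numpy.uint8)

# Class 1: not vegetation
output[numpy.where(ndvi < 0.2)] = 1
# Class 2: an intermediate class, where both conditions must be true
output[numpy.where(numpy.logical_and(ndvi >= 0.2, ndvi < 0.6))] = 2
# Class 3: dense vegetation
output[numpy.where(ndvi >= 0.6)] = 3

print(output)
```

Each call to where returns the indexes of the pixels meeting the rule, and those indexes are then used to set the class code in the output array.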
#!/usr/bin/env python

    infiles = applier.FilenameAssociations()
    # Set image1 to the input image specified
    infiles.image1 = args.input
    # Create output files file names associations
    outfiles = applier.FilenameAssociations()
    # Set outImage to the output image specified
    outfiles.outimage = args.output
    # Create a controls objects
    aControls = applier.ApplierControls()
    # Specify that stats shouldn't be calc'd
    aControls.calcStats = False
    # Set the progress object.
    aControls.progress = cuiprogress.CUIProgressBar()
8.5
Exercises
8.6
Further Reading
GDAL - http://www.gdal.org
Python Documentation - http://www.python.org/doc
Core Python Programming (Second Edition), W.J. Chun. Prentice Hall
ISBN 0-13-226993-7
Learn UNIX in 10 minutes - http://freeengineer.org/learnUNIXin10minutes.html
SciPy http://www.scipy.org/SciPy
NumPy http://numpy.scipy.org
RIOS https://bitbucket.org/chchrsc/rios/wiki/Home
Chapter 9
Raster Attribute Tables (RAT)
The RIOS software also allows raster attribute tables to be read and written
through GDAL. Raster attribute tables (RAT) are similar to the attribute tables
which are present on a vector (e.g., shapefile). Each row of the attribute table
refers to a pixel value within the image (e.g., row 0 refers to all pixels with a value
of 0). Therefore, RATs are used within thematic datasets where pixel values are
integers and refer to a category, such as a class from a classification, or a spatial
region, such as a segment from a segmentation. The columns of the RAT therefore
refer to variables, which correspond to information associated with the spatial
region covered by the image pixels of the clump(s) relating to the row within the
attribute table.
9.1
Reading Columns
To access the RAT using RIOS, you need to import the rat module. The RAT
module provides a simple interface for reading and writing columns. When a
column is read it is returned as a numpy array where the size is n × 1 (i.e., the
number of rows in the attribute table).
As shown in the example below, reading a column is just a single function call
specifying the input image file and the column name.
#!/usr/bin/env python

    sys.exit()
9.2
Writing Columns
Writing a column is also quite straightforward, just requiring an n × 1 numpy array
with the data to be written to the output file, the image file path and the
name of the column to be written to.
9.2.1
The first example reads a column from the input image, multiplies it by 2
and writes it to the image file as a new column.
#!/usr/bin/env python

# column by 2.
def multiplyRATCol(imageFile, inColName, outColName):
    # Read the input column
    col = rat.readColumn(imageFile, inColName)
    # Multiply the column by 2.
    col = col * 2
    # Write the output column to the file.
    rat.writeColumn(imageFile, outColName, col)

    sys.exit()
9.2.2
A useful column to have within the attribute table, where a classification has been
undertaken, is class names. This allows a user to click on the image and rather
than having to remember which codes correspond to which class they will be shown
a class name.
To add class names to the attribute table a new column needs to be created, where
the data type is set to be ASCII (string). To do this a copy of the histogram column
is made where the new numpy array is empty, of type string and the same length
as the histogram.
The following line uses the ... syntax within the array index to specify all elements
of the array, such that they are all set to a value of NA.
Once the new column has been created then the class names can be simply defined
through referencing the appropriate array index.
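The numpy steps can be sketched as below, using a made-up histogram array in place of the column read from the image. The string length of 255 characters and the class names are assumptions.

```python
import numpy

# Stand-in for the histogram column read from the attribute table
histogram = numpy.array([120, 45, 300, 12])

# Create an empty string array with the same length as the histogram
classNames = numpy.empty_like(histogram, dtype=numpy.dtype('U255'))
# Use the ... syntax to set every element to NA
classNames[...] = "NA"

# Define the class names by referencing the array index (pixel value)
classNames[1] = "Forest"
classNames[2] = "Water"

print(classNames)
```

The resulting array could then be written to the image as a new column in the same way as the numeric column in the previous example.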
9.3
Another useful tool is being able to add a colour table to an image, such that classes
are displayed in colours appropriate to make interpretation easier. To colour up
the per pixel classification undertaken at the end of the previous exercise and given
class names using the previous scripts, the following script is used to add a colour
table.
The colour table is represented as an n × 5 array, where n is the
number of colours which are to be present within the colour table.
The 5 values associated with each colour are
1. Image Pixel Value
2. Red (0-255)
3. Green (0-255)
4. Blue (0-255)
5. Opacity (0-255)
Where an opacity of 0 means completely transparent and 255 means solid with
no transparency (opacity is sometimes also referred to as alpha or alpha channel).
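Building such an array can be sketched as follows. The three rows and the choice of a transparent colour for pixel value 0 are assumptions for illustration; the Forest Green and Royal Blue values are those used in the script.

```python
import numpy

# Create an n x 5 array for n = 3 colours, initialised to zero
ct = numpy.zeros([3, 5], dtype=int)

# Each row: pixel value, red, green, blue, opacity
ct[0] = [0, 0, 0, 0, 0]          # pixel value 0: fully transparent
ct[1] = [1, 34, 139, 34, 255]    # pixel value 1: Forest Green
ct[2] = [2, 72, 118, 255, 255]   # pixel value 2: Royal Blue

print(ct.shape)
# The table would then be applied with rat.setColorTable(imageFile, ct)
```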
#!/usr/bin/env python

    # Set 2 to be Royal Blue.
    ct[2][0] = 2    # Pixel Val
    ct[2][1] = 72   # Red
    ct[2][2] = 118  # Green
    ct[2][3] = 255  # Blue
    ct[2][4] = 255  # Opacity

    # Set 4 to be Forest Green.
    ct[4][0] = 4    # Pixel Val
    ct[4][1] = 34   # Red
    ct[4][2] = 139  # Green
    ct[4][3] = 34   # Blue
    ct[4][4] = 255  # Opacity

    rat.setColorTable(imageFile, ct)
To find the Red, Green and Blue (RGB) values to use with the colour table there
are many websites available online that provide lists of these colours (e.g.,
http://cloford.com/resources/colours/500col.htm).
9.4
To use a RAT to undertake a rule based object oriented classification the first
step is to create a set of image clumps (e.g., through segmentation see appendix A
section A.3), then the rows of the attribute table need populating with information
(e.g., see appendix A section A.4). Once these steps have been completed then a
rule base using the numpy where statements can be created and executed, resulting
in a similar process as the eCognition software.
9.4.1
#!/usr/bin/env python

    # Photosynthetic Vegetation
    l1P1 = numpy.where(numpy.logical_and(l1P1 == "NA",
                       numpy.logical_or(fdiPeak > FDI_PEAK_THRES,
                                        fdiPost > FDI_POST_THRES)),
                       "Photosynthetic Vegetated", l1P1)
    # Non PhotoSynthetic Vegetation
    l1P1 = numpy.where(numpy.logical_and(l1P1 == "NA",
                       psriPre >= PSRI_PRE_THRES),
                       "Non Photosynthetic Vegetated", l1P1)
    # Non Submerged Aquatic Veg
    l1P1 = numpy.where(numpy.logical_and(l1P1 == "NA",
                       numpy.logical_and(repPeak >= REP_PEAK_THRES,
                                         wbiPost <= WBI_POST_THRES)),
                       "Non Submerged Aquatic Vegetated", l1P1)
    return l1P1

def runClassification(fname):
    # Open the GDAL Dataset so it is just opened once
    # and reused rather than each rios call reopening
    # the image file, which with large attribute tables
    # can be slow.
    ratDataset = gdal.Open( fname, gdal.GA_Update )
    # Check the image file was opened correctly.
    if not ratDataset == None:
        # Provide feedback to the user.
        print("Import Columns.")
        urban = rat.readColumn(ratDataset, "PropUrban")
        cult = rat.readColumn(ratDataset, "PropCult")

        wbiPeak = PeakBlue/PeakNIR1
        fdiPeak = PeakNIR1 - (PeakRedEdge + PeakCoastal)
        repPeak = PeakRedEdge - (PeakNIR2 - PeakRed)

        wbiPost = PostBlue/PostNIR1
        fdiPost = PostNIR1 - (PostRedEdge + PostCoastal)
#!/usr/bin/env python
# A function for
def colourLevel3(classLevel3):
    # Create the empty output arrays and set them
    # so they all have a value of 0 other than
    # opacity which is 255 to create solid colours.
    # The built-in int is used because the numpy.int
    # alias has been removed from recent NumPy releases.
    level3red = numpy.empty_like(classLevel3, dtype=int)
    level3red[...] = 0
    level3green = numpy.empty_like(classLevel3, dtype=int)
    level3green[...] = 0
    level3blue = numpy.empty_like(classLevel3, dtype=int)
    level3blue[...] = 0
    level3alpha = numpy.empty_like(classLevel3, dtype=int)
    level3alpha[...] = 255
255, level3alpha)
9.5 Exercises
9.6 Further Reading
GDAL - http://www.gdal.org
Python Documentation - http://www.python.org/doc
Core Python Programming (Second Edition), W.J. Chun. Prentice Hall, ISBN 0-13-226993-7
Learn UNIX in 10 minutes - http://freeengineer.org/learnUNIXin10minutes.html
SciPy - http://www.scipy.org/SciPy
NumPy - http://numpy.scipy.org
RIOS - https://bitbucket.org/chchrsc/rios/wiki/Home
Chapter 10
Golden Plover Population Model
10.1 Introduction
The aim of this worksheet is to develop a population model for a bird called the
Golden Plover.
10.2 Model Output
The model is required to output, for each year, the total population of the birds
together with the number of birds, eggs, fledglings and fledglings which are
a year old. An option to export the results as a plot should also be
provided.
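One possible way to keep the plot export optional is sketched below. It assumes that matplotlib may or may not be installed; the function name plot_population, the population values and the output file name are hypothetical and are not taken from the model script. If matplotlib cannot be imported the function prints a message and returns False instead of failing.

```python
import os
import tempfile

# Try to import matplotlib; the model should still run without it.
try:
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, no display needed
    import matplotlib.pyplot as plt
    HAVE_MATPLOTLIB = True
except ImportError:
    HAVE_MATPLOTLIB = False


def plot_population(years, population, out_file):
    """Write a simple line plot to out_file; return True if it was written."""
    if not HAVE_MATPLOTLIB:
        print("Matplotlib is not available and therefore the plots cannot be created.")
        return False
    fig = plt.figure()
    plt.plot(years, population)
    plt.xlabel("Year")
    plt.ylabel("Number of adult pairs")
    plt.savefig(out_file, format="pdf")
    plt.close(fig)
    return True


# Made-up values, used only to exercise the function.
years = list(range(5))
population = [15, 14, 16, 18, 17]
out_file = os.path.join(tempfile.mkdtemp(), "numOfAdultPairs.pdf")
created = plot_population(years, population, out_file)
```

Returning a flag rather than exiting lets the calling code decide whether a missing plot is an error or simply a skipped optional step.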
10.3 Reading Parameters
To allow a user to parameterise the model a parameter card, such as the one shown
below, needs to be provided.
numOfYears=20
initalAdultPairPop=15
winterSurvivalRate=0.66
averageEggsPerPair=3.64
averageFledgelingsPerPair=3.2
predatorControl=False
numOfFledgelings=14
numOfFledgelingsYearOld=8
fledgelingsSurvivePredatorsCtrl=0.75
fledgelingsSurvivePredatorsNoCtrl=0.18
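A parameter card of this form can be read by splitting each line at the equals sign and converting the value to a suitable type. The sketch below is one way this might be done; the function name parse_param_lines and the assignment of each parameter to an integer or floating point type are assumptions made for illustration, based on the values shown in the card above.

```python
def parse_param_lines(lines):
    """Parse 'name=value' lines into a dictionary of typed parameters."""
    # Assumed types, inferred from the example parameter card.
    int_params = {"numOfYears", "initalAdultPairPop",
                  "numOfFledgelings", "numOfFledgelingsYearOld"}
    float_params = {"winterSurvivalRate", "averageEggsPerPair",
                    "averageFledgelingsPerPair",
                    "fledgelingsSurvivePredatorsCtrl",
                    "fledgelingsSurvivePredatorsNoCtrl"}
    params = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError("Expected name=value but found: " + line)
        name = name.strip()
        value = value.strip()
        if name in int_params:
            params[name] = int(value)
        elif name in float_params:
            params[name] = float(value)
        elif name == "predatorControl":
            if value.lower() not in ("true", "false"):
                raise ValueError("predatorControl must be either True or False.")
            params[name] = (value.lower() == "true")
        else:
            # If the parameter is not known just store it as a string.
            params[name] = value
    return params


card = """numOfYears=20
initalAdultPairPop=15
winterSurvivalRate=0.66
predatorControl=False
"""
params = parse_param_lines(card.splitlines())
print(params)
```

Raising an exception for a malformed line, rather than exiting, leaves it to the calling script to decide how the error is reported to the user.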
#!/usr/bin/env python
obj = GoldenPloverPopModel()
obj.run(args.input)
10.4 The Model
#!/usr/bin/env python
                    params[paramVals[0]] = False
                elif paramVals[1].lower() == "true":
                    params[paramVals[0]] = True
                else:
                    print("predatorControl must be either True or False.")
                    sys.exit()
            elif paramVals[0] == "numOfFledgelings":
                params[paramVals[0]] = int(paramVals[1])
            elif paramVals[0] == "numOfFledgelingsYearOld":
                params[paramVals[0]] = int(paramVals[1])
            elif paramVals[0] == "fledgelingsSurvivePredatorsCtrl":
                params[paramVals[0]] = float(paramVals[1])
            elif paramVals[0] == "fledgelingsSurvivePredatorsNoCtrl":
                params[paramVals[0]] = float(paramVals[1])
            else:
                # If parameter is not known then just store as
                # a string.
                params[paramVals[0]] = paramVals[1]
        # Return the parameters and parameters string
        return params, paramsStr
numYearOldFledgelingsOut.append(numOfFledgelingsYearOld)
numOfFledgelingsOut.append(numOfFledgelings)
# Once the model has completed return the output variables for analysis.
return numOfAdultsPairsOut, numYearOldFledgelingsOut, numOfEggsOut, numOfFledgelingsOut,
10.5 Exporting Data
#!/usr/bin/env python
import argparse
# Import the maths library
import math
# Once the model has completed return the output variables for analysis.
return numOfAdultsPairsOut, numYearOldFledgelingsOut, numOfEggsOut, numOfFledgelingsOut,
outFile.write(numOfFledgesStrs)
outFile.write(numOfEggsStrs)
10.6 Creating Plots
#!/usr/bin/env python
                    params[paramVals[0]] = True
                else:
                    print("predatorControl must be either True or False.")
                    sys.exit()
            elif paramVals[0] == "numOfFledgelings":
                params[paramVals[0]] = int(paramVals[1])
            elif paramVals[0] == "numOfFledgelingsYearOld":
                params[paramVals[0]] = int(paramVals[1])
            elif paramVals[0] == "fledgelingsSurvivePredatorsCtrl":
                params[paramVals[0]] = float(paramVals[1])
            elif paramVals[0] == "fledgelingsSurvivePredatorsNoCtrl":
                params[paramVals[0]] = float(paramVals[1])
            else:
                # If parameter is not known then just store as
                # a string.
                params[paramVals[0]] = paramVals[1]
        # Return the parameters and parameters string
        return params, paramsStr
# Once the model has completed return the output variables for analysis.
return numOfAdultsPairsOut, numYearOldFledgelingsOut, numOfEggsOut, numOfFledgelingsOut,
outFile.close()
plt.savefig(outputFile + "_numOfEggs.pdf", format="pdf")
        else:
            # If the matplotlib library is not available
            # print out a suitable error message.
            print("Matplotlib is not available and therefore the plots cannot be created.")
# to a text file.
self.writeResultsFile(outputFile, paramsStr, params, numOfAdultsPairsOut, numYearOldFledg
# Check whether a path has been provided
# for the plots. If it has then generate
# output plots.
if plotsPath is not None:
# Give the user feedback on what is happening.
print("Generating plots of the results")
# Call the function to generate plots
self.plots(plotsPath, params, numOfAdultsPairsOut, numYearOldFledgelingsOut, numOfEgg
10.7 Exercises
10.8 Further Reading
Appendix A
RSGISLib
A.1 Introduction to RSGISLib
The remote sensing and GIS software library (RSGISLib) was developed at Aberystwyth University by Pete Bunting and Daniel Clewley. Development started in
April 2008 and the library has been actively maintained and extended ever since. For more
information see http://www.rsgislib.org.
A.2 Using RSGISLib
RSGISLib has an XML interface and Python bindings. The Python bindings
allow the RSGISLib functions to be called from Python scripts in the same way
as any other functions.
A.3 Segmentation
#! /usr/bin/env python

############################################################################
# Purpose:
# Email: pfb@aber.ac.uk
# Date: 24/07/2013
# Version: 1.0
# History:
############################################################################
RSGISLibSegmentation.py

import argparse
import rsgislib
import rsgislib.imageutils
import rsgislib.imagecalc
import rsgislib.segmentation
import rsgislib.rastergis
import os.path
import os
import collections
import fnmatch

        rsgisUtils = rsgislib.RSGISPyUtils()

        basefile = os.path.basename(inputImg)
        basename = os.path.splitext(basefile)[0]

        outFileExt = rsgisUtils.getFileExtension(gdalFormat)

        createdDIR = False
        if not os.path.isdir(tmpath):
            os.makedirs(tmpath)
            createdDIR = True

        segmentFile = inputImg
        if not noStretch:
            segmentFile = os.path.join(tmpath,basename+
                                str("_stchd")+outFileExt)
            strchFile = os.path.join(tmpath,basename+
                                str("_stchdonly")+outFileExt)
            strchFileOffset = os.path.join(tmpath,basename+
                                str("_stchdonlyOff")+outFileExt)
            strchMaskFile = os.path.join(tmpath,basename+
                                str("_stchdmaskonly")+outFileExt)

            rsgislib.imageutils.stretchImage(inputImg, strchFile,
                                ...
                                rsgislib.imageutils.STRETCH_LINEARSTDDEV, 2)

            ImgBand = collections.namedtuple('ImgBands',
                                ...
            bandMathBands = list()
            bandMathBands.append(ImgBand(bandName="b1",
                                fileName=inputImg, bandIndex=1))

            rsgislib.imageutils.maskImage(strchFileOffset, strchMaskFile,
                                ...

            if not noDelete:
                rsgisUtils.deleteFileWithBasename(strchFile)
                rsgisUtils.deleteFileWithBasename(strchFileOffset)
                rsgisUtils.deleteFileWithBasename(strchMaskFile)

        # Perform KMEANS
        print("Performing KMeans.")
        outMatrixFile = os.path.join(tmpath,basename+str("_kmeansclusters"))

        # Apply KMEANS
        kMeansFileZones = os.path.join(tmpath,basename+str("_kmeans")+outFileExt)
        rsgislib.segmentation.labelPixelsFromClusterCentres(segmentFile,
                                ...

        kMeansFileZonesNoSgls = os.path.join(tmpath,basename+
                                str("_kmeans_nosgl")+outFileExt)
        kMeansFileZonesNoSglsTmp = os.path.join(tmpath,basename+
                                str("_kmeans_nosglTMP")+outFileExt)
        rsgislib.segmentation.eliminateSinglePixels(segmentFile, kMeansFileZones,
            kMeansFileZonesNoSgls, kMeansFileZonesNoSglsTmp, gdalFormat, False, True)

        # Clump
        print("Perform clump.")
        initClumpsFile = os.path.join(tmpath,basename+str("_clumps")+outFileExt)
        rsgislib.segmentation.clump(kMeansFileZonesNoSgls, initClumpsFile,
                                gdalFormat, False, 0)

        elimClumpsFile = os.path.join(tmpath,basename+str("_clumps_elim")+outFileExt)
        rsgislib.segmentation.RMSmallClumpsStepwise(segmentFile, initClumpsFile,
                                ...

        # Relabel clumps
        print("Relabel clumps.")
        rsgislib.segmentation.relabelClumps(elimClumpsFile, outputClumps,
                                gdalFormat, False)

        if not noStats:
            ...

        if not noDelete:
            rsgisUtils.deleteFileWithBasename(kMeansFileZones)
            rsgisUtils.deleteFileWithBasename(kMeansFileZonesNoSgls)
            rsgisUtils.deleteFileWithBasename(kMeansFileZonesNoSglsTmp)
            rsgisUtils.deleteFileWithBasename(initClumpsFile)
            rsgisUtils.deleteFileWithBasename(elimClumpsFile)
            if not noStretch:
                rsgisUtils.deleteFileWithBasename(segmentFile)
            if createdDIR:
                rsgisUtils.deleteDIR(tmpath)

if __name__ == '__main__':
    """
    """
    parser = argparse.ArgumentParser(prog='rsgislibsegmentation.py',
                                ...

                                ...
                                default=1000000, required=False)

    # Define the argument for specifying that the input image bands

    # Define the argument for specifying that the input image bands

    args = parser.parse_args()

    segsObj = RSGISLibSegmentation()

A.4 Populating Segments
To populate the segments with statistics (e.g., the mean for each spectral band) there
is a function within the rastergis module of RSGISLib. To populate segments with
statistics from two images and a DEM the following script is used:
#! /usr/bin/env python

#######################################
# segments
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import argparse
import rsgislib.rastergis

        stats2Calc = list()
        stats2Calc.append(rsgislib.rastergis.BandAttStats(band=1,
                    meanField="MayBlue", stdDevField="MaySDBlue"))
        stats2Calc.append(rsgislib.rastergis.BandAttStats(band=2,
                    meanField="MayGreen", stdDevField="MaySDGreen"))
        stats2Calc.append(rsgislib.rastergis.BandAttStats(band=3,
                    meanField="MayRed", stdDevField="MaySDRed"))
        stats2Calc.append(rsgislib.rastergis.BandAttStats(band=4,
                    meanField="MayNIR", stdDevField="MaySDNIR"))
        stats2Calc.append(rsgislib.rastergis.BandAttStats(band=5,
                    meanField="MaySWIR1", stdDevField="MaySDSWIR1"))
        stats2Calc.append(rsgislib.rastergis.BandAttStats(band=6,
                    meanField="MaySWIR2", stdDevField="MaySDSWIR2"))

        stats2Calc = list()
        stats2Calc.append(rsgislib.rastergis.BandAttStats(band=1,
                    meanField="JuneBlue", stdDevField="JuneSDBlue"))
        stats2Calc.append(rsgislib.rastergis.BandAttStats(band=2,
                    meanField="JuneGreen", stdDevField="JuneSDGreen"))
        stats2Calc.append(rsgislib.rastergis.BandAttStats(band=3,
                    meanField="JuneRed", stdDevField="JuneSDRed"))
        stats2Calc.append(rsgislib.rastergis.BandAttStats(band=4,
                    meanField="JuneNIR", stdDevField="JuneSDNIR"))
        stats2Calc.append(rsgislib.rastergis.BandAttStats(band=5,
                    meanField="JuneSWIR1", stdDevField="JuneSDSWIR1"))
        stats2Calc.append(rsgislib.rastergis.BandAttStats(band=6,
                    meanField="JuneSWIR2", stdDevField="JuneSDSWIR2"))

        stats2Calc = list()
        stats2Calc.append(rsgislib.rastergis.BandAttStats(band=1,
                    minField="MinDEM", maxField="MaxDEM",
                    stdDevField="StdDevDEM", meanField="MeanDEM"))

if __name__ == '__main__':

    parser = argparse.ArgumentParser()

    parser.add_argument("--inclumps", type=str,
                    help='Specify the input clumps file.', required=True)
    parser.add_argument("--mayimage", type=str,
                    help='Specify the input image for May.', required=True)
    parser.add_argument("--juneimage", type=str,
                    help='Specify the input image for June.', required=True)
    parser.add_argument("--dem", type=str,
                    ...

    args = parser.parse_args()

    popStats = PopWithStats(args.inclumps)

    popStats.popWithMayStats(args.mayimage)

    popStats.popWithMayStats(args.juneimage)

    popStats.popWithMayStats(args.dem)

If you are going to use indices and other derived information within your classification, it is often a good idea to set up a separate Python script to calculate those indices
and write them back to the image, rather than overcomplicating your classification
script. An example of this is shown below.
#!/usr/bin/env python

import sys
import numpy

#Input file.
fname = "L7ETM_530N035W_Classification.kea"

print("Import Columns.")

# astype returns a new array, so the result is assigned back.
MayNIR = MayNIR.astype(numpy.float32)
MayRed = MayRed.astype(numpy.float32)
JuneNIR = JuneNIR.astype(numpy.float32)
JuneRed = JuneRed.astype(numpy.float32)
MayBlue = MayBlue.astype(numpy.float32)
JuneBlue = JuneBlue.astype(numpy.float32)

print("Calculate Indices.")

MayWBI = MayBlue/MayNIR
JuneWBI = JuneBlue/JuneNIR
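
The band ratios above divide one column by another, so segments where the denominator is zero produce infinite or undefined values. The following sketch shows one way of guarding against this; the input values are made up, and writing 0 where the ratio is undefined is an assumption rather than the behaviour of the script above.

```python
import numpy

# Hypothetical column values standing in for columns
# read from the raster attribute table.
MayBlue = numpy.array([120, 80, 60, 0])
MayNIR = numpy.array([240, 40, 0, 0])

# astype returns a new array, so the result must be assigned.
MayBlue = MayBlue.astype(numpy.float32)
MayNIR = MayNIR.astype(numpy.float32)

# Divide only where the denominator is non-zero; elsewhere the
# output keeps its initial value of 0 (an assumed fill value).
MayWBI = numpy.zeros_like(MayBlue)
numpy.divide(MayBlue, MayNIR, out=MayWBI, where=(MayNIR != 0))

print(MayWBI)
```

A different fill value, such as a no data value used elsewhere in the attribute table, could be written into the output array before the division if 0 is a valid index value.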