AIMS-Python Notes 2016

AIMS-Python_Notes_2016

September 9, 2016

1 AIMS Ghana Scientific Computing with Python


1.1 Day One Notes: Introduction to Python
Author: Eyram Schwinger
Python is a programming language known for its ease of use. It has gained a lot of recognition in the scientific community for the following reasons:
1. Ease of use: Python was designed to be easy to use, even for beginner programmers. Many institutions use it to teach introductory programming.
2. Dynamic typing: In Python, a variable can be reassigned a value of a different data type. Types are not declared in advance, and Python handles the memory allocation for new values in the background.
3. Batteries included: The web page pypi.python.org hosts more than 10,000 Python modules written by developers and available for use. They include modules for machine learning, image/signal processing, visualization, differential equations, interpolation, etc., and are available for free download. Chances are, whatever you want to do, you can find a base module without reinventing the wheel.
Python has some fundamental (built-in) data types that are available with the raw installation. Examples include:
1. Numeric data types:
   a. Integers
   b. Floating point numbers
   c. Complex numbers
   d. Boolean
2. Strings
3. Lists
4. Tuples
5. Dictionaries
6. Sets

1.2 Variable Names


A variable is a named location in computer memory that stores a value. Naming variables gives storage locations human-readable names instead of cryptic machine addresses. In Python, variable names are built from alphanumeric characters and the underscore character; the first character cannot be a digit. It is advisable to use descriptive names so that others can understand the purpose of a variable. The Python style convention recommends lowercase characters with words separated by underscores for readability. Mixed case is allowed where that is already the existing convention. The following are valid variable names:

In [1]: students = 50

In [2]: std_in_cls = 30

In [3]: n = 10

In these definitions, ‘=’ is the assignment operator. It assigns the value on the right to the name on the left. In the examples above, the value 50 is assigned to the variable students.
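The naming rules above can be checked programmatically. The following is a minimal sketch, assuming a Python 3 interpreter (the notes themselves use Python 2, where str.isidentifier is not available). The helper is_valid_name and the list of candidate names are hypothetical and only serve as illustration.

```python
import keyword

# Hypothetical candidate names, chosen to exercise each rule.
candidates = ['students', 'std_in_cls', 'n', '2nd_group', 'my-var', 'for']

def is_valid_name(name):
    # A usable variable name must be a valid identifier
    # (letters, digits, underscore, not starting with a digit)
    # and must not be a reserved word such as 'for'.
    return name.isidentifier() and not keyword.iskeyword(name)

results = {name: is_valid_name(name) for name in candidates}
print(results)
```

The first three names are accepted; the last three are rejected because they start with a digit, contain a hyphen, or are a reserved word.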

1.3 Numeric Types
1.3.1 Int and Long
Integers store numerical values without fractional parts. An integer is defined by assigning a number without a decimal point to a variable. Example:

In [4]: students = 50

In [5]: num = 0

In [6]: neg_num = -5

The range of plain integers is limited by the number of bits of the platform (32 or 64). If a calculation exceeds this range, Python automatically converts the result into a long integer, whose size is unlimited.

In [7]: num = 2

In [8]: num**32-1

Out[8]: 4294967295

In [9]: num**63

Out[9]: 9223372036854775808L

The L at the end of the last calculation indicates that the result of the calculation was more than
the integer data type could accommodate. The resulting value was converted to a long integer
type.
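For comparison, here is a short sketch of the same calculation under Python 3, which is an assumption since these notes were written with Python 2. In Python 3 there is a single integer type of unlimited size, so no L suffix appears and no separate long type exists.

```python
num = 2

small = num**32 - 1   # fits easily in a machine word
big = num**63         # exceeds a signed 64-bit integer

print(small)
print(big)

# In Python 3 both values have the same type.
same_type = type(small) is type(big) is int
print(same_type)
```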

1.3.2 Floats
Floating point numbers are used by the computer to represent real numbers. Since the real numbers are infinite and the computer is a finite precision device, floats represent only a subset of the real number system. Floats are defined by including a decimal point in the number. Examples:

In [10]: num = .5

In [11]: num = 5.

In [12]: num = 234.567
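One practical consequence of finite precision is that some decimal fractions cannot be stored exactly. The sketch below assumes Python 3.5 or later, where math.isclose is available; the variable names are illustrative only.

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because 0.1 and 0.2 are stored approximately.
exactly_equal = (total == 0.3)

# Comparing within a small tolerance is the usual remedy.
nearly_equal = math.isclose(total, 0.3)

print(exactly_equal)
print(nearly_equal)
```

For this reason floats are usually compared within a tolerance rather than with ==.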

1.3.3 Complex Numbers


Complex numbers have both a real and an imaginary part. The imaginary part is written using the engineering notation j, where $j = \sqrt{-1}$.
To create a complex number in Python, write it as real + imag j, with the j attached directly to the imaginary value:

In [13]: cmp_num = 5+25j

In [14]: c = 12j

In [15]: c = 12 + 0j

Complex numbers possess certain properties. They have a real part and an imaginary part. They also have a conjugate, which is the number with the same real part but with the sign of the imaginary part changed. Python can return the real and imaginary parts of a complex number as attributes and calculate its conjugate with a method. These are built into the complex type as defined in Python and are accessed using the dot operator (.), as in c.property or c.method(). Methods are invoked with brackets at the end while attributes are not. Essentially, attributes are variables contained in the object while methods are functions built into the object. Example usage of the attributes and methods of the complex type is as follows:

In [16]: cmp_num.real

Out[16]: 5.0

In [17]: cmp_num.imag

Out[17]: 25.0

In [18]: cmp_num.conjugate()

Out[18]: (5-25j)

Calling instance variables or attributes as functions leads to errors

In [19]: cmp_num.real()

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-19-e9872ef84718> in <module>()
----> 1 cmp_num.real()

TypeError: 'float' object is not callable

In [20]: cmp_num.conjugate

Out[20]: <function conjugate>

The last case, a method referenced without brackets, is less obvious because Python does not raise an error. Instead, it returns the function object itself, which can then be called to get the actual value.

In [21]: test = cmp_num.conjugate

In [22]: test()

Out[22]: (5-25j)

1.4 Strings
Strings are sequences of characters. In Python, strings are defined by placing the characters between matching quotes. Triple quotes allow a string to span several lines. Example:

In [23]: mystr = "some string"

In [24]: mystr2 = 'another string'

In [25]: mystr3 = '''yet another long
multiline string'''

In [26]: mystr4 = """another multiline
line string"""

Strings have numerous methods attached to them. Object methods can be called using the dot
operator. Example

In [27]: mystr.capitalize()

Out[27]: 'Some string'

In [28]: mystr.upper()

Out[28]: 'SOME STRING'

In [29]: mystr.count('s')

Out[29]: 2

In [30]: mystr.replace('string','text')

Out[30]: 'some text'

1.5 Lists
Lists store a group of values so they can be accessed individually or collectively. The values do not need to be of the same type. A list can be created empty or be populated during creation. Example:

In [31]: mylist = [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7]

In [32]: mylist2 = [0.0,'a',2+4j,2000,'learn python',[1,2,3,4,5,6,7]]

Python also makes it possible to create a numeric sequence as a list using the range function. The sequence a + di, i = 0, ..., n, which is equivalent to a, a + d, a + 2d, ..., a + nd, is created with range(a,b,d), where b = a + nd + 1 for a positive step d (or a + nd - 1 for a negative step). The stop value b itself is never included.

In [33]: seq = range(0,11)

In [34]: seq

Out[34]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In defining seq, since we omitted the stepsize d, it defaults to one.

In [35]: seq2 = range(0,11,2)

In [36]: seq2

Out[36]: [0, 2, 4, 6, 8, 10]

In [37]: seq3 = range(0,-24,-3)

In [38]: seq3

Out[38]: [0, -3, -6, -9, -12, -15, -18, -21]

Note that the range function expects all arguments (inputs to the function) to be integers. You get an error when that restriction is not respected. Example:

In [39]: seq4 = range(0.0,-24,-3)

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-39-98221f92311b> in <module>()
----> 1 seq4 = range(0.0,-24,-3)

TypeError: range() integer start argument expected, got float.
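The range examples above can be reproduced under Python 3 with one change: there, range returns a lazy sequence object rather than a list, so it is wrapped in list() to display the values. This is a sketch under the assumption of a Python 3 interpreter. The last part shows one possible way to build a sequence with a non-integer step using a list comprehension (covered later in these notes); the names a, d, n follow the formula a + di.

```python
seq = list(range(0, 11))          # step defaults to 1
seq2 = list(range(0, 11, 2))      # a = 0, d = 2
seq3 = list(range(0, -24, -3))    # negative step counts down

# Float arguments are still rejected.
try:
    range(0.0, -24, -3)
    float_error = None
except TypeError as exc:
    float_error = type(exc).__name__

# Non-integer steps: compute a + d*i from an integer index instead.
a, d, n = 1.0, 0.25, 4
float_seq = [a + d * i for i in range(n + 1)]

print(seq3)
print(float_error)
print(float_seq)
```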

The elements of lists and other sequences can be accessed using their indexes. Sequence indexing in Python begins from 0, so the first element is at index 0, the second at index 1, the third at index 2 and so on. To access the first element in seq3, we type

In [40]: seq3[0]

Out[40]: 0

Note that the square brackets used to create a list are also used to index it. Python generally uses square brackets for indexing, since round brackets () are reserved for function calls. The len function returns the length of a sequence. This means it is possible to query even a programmatically created sequence for the number of elements it has.

In [41]: len(seq3)

Out[41]: 8

In [42]: mylist2

Out[42]: [0.0, 'a', (2+4j), 2000, 'learn python', [1, 2, 3, 4, 5, 6, 7]]

mylist2 has different kinds of elements, including two strings and a list. To access the elements of a sequence contained in another sequence, as in this case, we use two indices: the first gives the location of the inner sequence and the second the location of the element within it. Example:

In [43]: mylist2[1][0]

Out[43]: 'a'

In [44]: mylist2[4][6]

Out[44]: 'p'

In [45]: mylist2[5][3]

Out[45]: 4

Python uses the negative index to return elements from the back of the list. In this sense, -1 is
for the last element, -2 for the element before the last and so on.

In [46]: mylist2[-1]

Out[46]: [1, 2, 3, 4, 5, 6, 7]

In [47]: mylist2[-2]

Out[47]: 'learn python'

1.5.1 List Slicing


Apart from accessing individual elements, it is also possible to access a group of elements in a sequence at once. This is called slicing. The slicing operator in Python is the colon (:). Its basic usage is start:stop, which returns the elements from index start up to but excluding index stop, that is, the indices in [start, stop). Note however that this is a discrete set of indices, not a continuum. The extended form start:stop:step allows step sizes other than one. Like the range function, this gives the indices start, start + step, start + 2step, ..., start + n·step < stop. When the start value is omitted, it defaults to 0. When the stop value is omitted, it defaults to the end of the list, and the step value defaults to 1. With a negative step the slice runs backwards, so an omitted start defaults to the last element and an omitted stop to the beginning of the list. Examples:

In [48]: seq

Out[48]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [49]: seq[0:4]

Out[49]: [0, 1, 2, 3]

In [50]: seq[0:4:2]

Out[50]: [0, 2]

In [51]: seq[:4:2]

Out[51]: [0, 2]

In [52]: seq[4::2]

Out[52]: [4, 6, 8, 10]

In [53]: seq[4:len(seq):2]

Out[53]: [4, 6, 8, 10]

In [54]: seq[::2]

Out[54]: [0, 2, 4, 6, 8, 10]

In [55]: seq[len(seq):4:-1]

Out[55]: [10, 9, 8, 7, 6, 5]

In [56]: seq[-1:4:-1]

Out[56]: [10, 9, 8, 7, 6, 5]

In [57]: seq[::]

Out[57]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [58]: seq[::-1]

Out[58]: [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
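The relationship between slicing and the range function can be made explicit. The sketch below, written for Python 3 as an assumption, uses a hypothetical helper slice_by_hand that picks elements one index at a time, and also shows that a slice is a new list rather than a view of the original.

```python
seq = list(range(11))

def slice_by_hand(items, start, stop, step):
    # Collect items[i] for every index produced by range(start, stop, step).
    return [items[i] for i in range(start, stop, step)]

forward = seq[4:11:2]
forward_by_hand = slice_by_hand(seq, 4, 11, 2)

backward = seq[10:4:-1]
backward_by_hand = slice_by_hand(seq, 10, 4, -1)

# A full slice makes a copy: changing the copy leaves seq untouched.
copy_of_seq = seq[:]
copy_of_seq[0] = 99

print(forward)
print(backward)
print(seq[0])
```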

1.5.2 List Methods


Lists have some methods that can be accessed using the dot notation. They help make working with lists relatively easy. For instance, to add a single element to the end of the list, the append method is called on the list. The argument to append is the element to be added. Whatever the type of the element, it is added to the list as a single element.

In [59]: seq.append(11)

In [60]: seq

Out[60]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

To add multiple elements to the end of the list, the extend method is used. It takes a sequence and adds each of its elements to the list individually.

In [61]: seq.extend([12,13,14])

In [62]: seq
Out[62]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
In [63]: seq.append([15,16,17])
In [64]: seq
Out[64]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, [15, 16, 17]]
The pop method returns the last element from the list and removes it from the list.
In [65]: seq.pop()
Out[65]: [15, 16, 17]
In [66]: seq
Out[66]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
The index method returns the location of an element in the list. If the element exists, the method returns the index of its first occurrence; otherwise a ValueError is raised.
In [67]: seq.index(6)
Out[67]: 6
In [68]: seq.index(15)

---------------------------------------------------------------------------

ValueError Traceback (most recent call last)

<ipython-input-68-416a6e38e5a1> in <module>()
----> 1 seq.index(15)

ValueError: 15 is not in list

The remove method removes the first occurrence of an element from the list if it exists; otherwise a ValueError is raised.


In [69]: seq.remove(6)
In [70]: seq
Out[70]: [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14]
We can also remove an element from the list using its index. To remove the element at index 5
from the list, the statement to use is
In [71]: del seq[5]
In [72]: seq
Out[72]: [0, 1, 2, 3, 4, 7, 8, 9, 10, 11, 12, 13, 14]
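The list methods above are summarized in the following sketch on a small list, assuming Python 3. The list numbers and the variable target are hypothetical. It also shows one way to avoid the ValueError from index by testing membership first.

```python
numbers = [0, 1, 2]

numbers.append([3, 4])       # adds ONE element, which is itself a list
length_after_append = len(numbers)

nested = numbers.pop()       # removes and returns the last element
numbers.extend(nested)       # adds each element of nested separately
after_extend = list(numbers)

# Guard index() with a membership test to avoid a ValueError.
target = 7
position = numbers.index(target) if target in numbers else -1

numbers.remove(2)            # remove by value
del numbers[0]               # remove by index

print(after_extend)
print(position)
print(numbers)
```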

1.6 Arithmetic Operators
The basic arithmetic operators are available in python. These are
1. Addition - (+)
2. Subtraction - (-)
3. Multiplication - (*)
4. Division - (/)
5. Remainder - (%)
6. Exponentiation (power) - (**)
We also have the abs function for absolute value and the unary negation (-). Here are some
examples of their use
In [73]: x = 2; y = 3; z = 2+5j; t = 4.5
In [74]: x + y
Out[74]: 5
In [75]: y + t
Out[75]: 7.5
In [76]: y + z
Out[76]: (5+5j)
In [77]: t + z
Out[77]: (6.5+5j)
In [78]: t*1j+z
Out[78]: (2+9.5j)
In [79]: y/x
Out[79]: 1
In [80]: y/t
Out[80]: 0.6666666666666666
In [81]: t**y
Out[81]: 91.125
In [82]: y%x
Out[82]: 1
In [83]: cmp_num
Out[83]: (5+25j)
In [84]: cmp_num*cmp_num.conjugate()
Out[84]: (650+0j)
In [85]: (cmp_num*cmp_num.conjugate()).real
Out[85]: 650.0

1.6.1 Operator Precedence
Operator precedence refers to the order in which Python performs operations. For the arithmetic operators, the precedence is as follows (from high to low precedence):
1. ()
2. **
3. Unary -, +
4. *, /, //, %
5. +, -
The operator // is also a division operator. In the strictest sense, it is the integer division operator. It does floor division: it returns only the quotient, rounded down, no matter what the data type. Its complement is the remainder operator (%), which returns what is left over. Performing the calculation below

In [86]: 2 + 3 * 24 / 6. - 4 % 2**2

Out[86]: 14.0

will first calculate 2**2, which is 4. Then 3*24/6., which is 12.0. Then 4%4, which is 0. So we get 2+12.0-0=14.0. If we want to alter the natural order, then brackets must be introduced. An example is

In [87]: ((2 + 3) * 24 / (6. - 4 % 2))**2

Out[87]: 400.0
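The division and precedence rules can be checked with a few small calculations. This sketch assumes Python 3, where / always performs true division (under Python 2, as used in these notes, / between two integers behaves like //). The variable names are illustrative.

```python
true_div = 7 / 2          # true division in Python 3
quotient = 7 // 2         # floor division
remainder = 7 % 2

# Floor division rounds down, also for negative numbers.
neg_quotient = -7 // 2
neg_remainder = -7 % 2

# Quotient and remainder together always rebuild the original number.
rebuilt = neg_quotient * 2 + neg_remainder

# ** binds more tightly than unary minus.
power_first = -2**2       # read as -(2**2)
grouped = (-2)**2

expr = 2 + 3 * 24 / 6. - 4 % 2**2

print(true_div)
print(neg_quotient)
print(power_first)
print(expr)
```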

The arithmetic operators + and * also apply to sequences. In such use, the addition operator is a joining (concatenation) operator while the multiplication operator is a repetition operator. For example, given the lists

In [88]: a = [1,2,['a','b']]; b = [2+3j,4.0]

In [89]: a + b

Out[89]: [1, 2, ['a', 'b'], (2+3j), 4.0]

In [90]: a * 2

Out[90]: [1, 2, ['a', 'b'], 1, 2, ['a', 'b']]

The same works for strings. Given the strings

In [91]: a = 'Welcome'; b = 'Python'

In [92]: a + b

Out[92]: 'WelcomePython'

In [93]: a + ' to ' + b

Out[93]: 'Welcome to Python'

1.7 Python blocks
In Python, a block refers to a group of statements that collectively perform a given task. Different programming languages define statement blocks in different ways. Some use curly braces {}, others have explicit end statements. Python recognizes blocks by levels of indentation. This is a very important feature in Python. One out-of-place indent can result in Python raising an error or, worse still, silently giving you wrong values. All statements in a block must have the same indentation level. The recommended indentation is 4 spaces per level. IPython automatically sets the indentation when a block is detected. Indentation can also be set using the tab key on the keyboard; tab length can be configured in most text editors. A block definition in Python is as follows:

block header:
    block statement 1
    block statement 2
    block statement 3
    ...
    block statement n
other statements

In this definition, block header can be a function header, a while header, a for header, etc. ‘other statements’ is not part of the block; it is run after the block completes. ‘block statement 1’ to ‘block statement n’ are all part of the block. If we had mistakenly typed

block header:
    block statement 1
  block statement 2
    block statement 3
    ...
    block statement n
other statements

the result would be an error, because the indentation of ‘block statement 2’ does not fit the general indentation of the block. The general definition of a block therefore includes the header, followed by a colon, and the statements of the body, all indented. For a one-line block, the block statement can immediately follow the colon, as in

block header: block statement
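As a concrete illustration of nested blocks, the sketch below (Python 3 assumed) defines a hypothetical function count_even whose body contains a for block, which in turn contains an if block. It then feeds a deliberately mis-indented piece of source text to the built-in compile function to show the kind of error Python raises.

```python
def count_even(values):
    count = 0                 # level 1: body of the function
    for v in values:
        if v % 2 == 0:        # level 2: body of the for loop
            count += 1        # level 3: body of the if statement
    return count              # back at level 1, after the loop

result = count_even([1, 2, 3, 4, 6])

# The third line is indented by 2 spaces, matching no enclosing block.
bad_source = "if True:\n    x = 1\n  y = 2\n"
try:
    compile(bad_source, '<example>', 'exec')
    error_name = None
except IndentationError as exc:
    error_name = type(exc).__name__

print(result)
print(error_name)
```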

1.8 Iterating through sequences


1.8.1 for loop
The for loop is used to iterate through the elements of a sequence. It goes through the sequence, picking each element in turn. For each element picked from the sequence, Python applies the block statements in order. When all the elements are exhausted, the for loop ends. A for loop is defined in Python as

for each_element in list:
    statement 1
    statement 2
    statement 3
    ...
    statement n

each_element in this case is a variable name that is bound to each element of the sequence in turn. Once the loop ends, the name is not destroyed; it keeps the last value assigned to it.

In [94]: mylist

Out[94]: [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

In [95]: list_res = []
         for i in mylist:
             list_res.append(i**2)
         print list_res

[0.0, 0.010000000000000002, 0.04000000000000001, 0.09, 0.16000000000000003, 0.25, 0
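A detail worth verifying is what happens to the loop variable once the loop finishes. In Python the loop variable is an ordinary name and stays bound to the last element after the loop. The sketch below assumes Python 3 and uses a shortened, hypothetical version of mylist.

```python
mylist = [0.0, 0.1, 0.2, 0.3]

squares = []
for item in mylist:
    squares.append(item**2)

# The loop has ended, but 'item' still refers to the last element.
last_item = item

print(len(squares))
print(last_item)
```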

1.9 Boolean values and comparison operators
Boolean values evaluate to True or False. Python provides these two values for evaluating conditional statements. Note that case sensitivity rules apply here: the first letter of each boolean value is capitalized.

In [96]: a = True

In [97]: b = False

In [98]: c = true

---------------------------------------------------------------------------

NameError Traceback (most recent call last)

<ipython-input-98-1f00ce3411f8> in <module>()
----> 1 c = true

NameError: name 'true' is not defined

When evaluating truth values, Python considers the following as false: False, an empty string '', an empty list [], and the zeros 0, 0.0 and 0 + 0j. Nonzero values and nonempty sequences are considered true. Python uses comparison operators to evaluate truth values. The comparison operators are:
1. Equal - ==
2. Not equal - !=
3. Greater than - >
4. Greater than or equal to - >=
5. Less than - <
6. Less than or equal to - <=
7. Identity - is
In addition, sequences also have the ‘in’ operator to query whether an object is an element of the sequence. Python also provides the operators not, and, or for combining comparisons. not negates the truth value of a statement. and and or combine two truth statements: and evaluates to true only if both statements are true, else it evaluates to false; or evaluates to false only if both statements are false, else it evaluates to true. The == operator compares values, while is tests whether two names refer to the very same object.

In [99]: a = 5; b = 3; c = 5; d = [1,2,'xyz','AIMS','python']; e = d; f = [1,2,'xyz


In [100]: a == b

Out[100]: False

In [101]: a > b

Out[101]: True

In [102]: a >= b

Out[102]: True

In [103]: a <= b

Out[103]: False

In [104]: b <= c

Out[104]: True

In [105]: 'AIMS' in d

Out[105]: True

In [106]: e == d

Out[106]: True

In [107]: e is d

Out[107]: True

In [108]: a == c

Out[108]: True

In [109]: d == f

Out[109]: True

In [110]: d is f

Out[110]: False

In [111]: a == b or a > b

Out[111]: True

In [112]: d is not f

Out[112]: True

In [113]: d == f and d is f

Out[113]: False
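The truth-value rules and the difference between == and is can be checked directly with the built-in bool function. This is a sketch assuming Python 3; the lists falsy and truthy are hypothetical test values, and d, e, f mirror the roles they play in the cells above with shorter contents.

```python
falsy = [False, '', [], 0, 0.0, 0 + 0j]
truthy = [True, 'a', [0], 1, -2.5, 1j]

falsy_results = [bool(v) for v in falsy]
truthy_results = [bool(v) for v in truthy]

d = [1, 2, 'xyz']
e = d                  # e is another name for the same list object
f = [1, 2, 'xyz']      # f is a separate list with equal contents

equal_values = (d == f)
same_object = (d is f)

# Because e and d are the same object, a change through e is seen through d.
e.append('AIMS')

print(falsy_results)
print(equal_values, same_object)
print(d)
```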

1.10 The while loop
The while statement is another block statement. Unlike for loops however, while is not limited to working with sequences. The while statement keeps running while a certain condition is true. The format of the while statement is

while condition:
    statement 1
    statement 2
    ...
    statement m

The condition is checked before each pass through the body; as soon as it evaluates to false, the loop terminates. If there is no provision for the condition to become false, we have what is called an infinite loop, where the loop runs until it is forced to terminate either by a keyboard interrupt or by the operating system. This is not an acceptable use of the while loop. When coding a while loop it is therefore important to provide a means for the loop to terminate within your code. The following is an example:

In [114]: seq

Out[114]: [0, 1, 2, 3, 4, 7, 8, 9, 10, 11, 12, 13, 14]

In [115]: seq_sq = [0]*len(seq)
          seq_sq

Out[115]: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In [116]: i = 0
          while i < len(seq):
              seq_sq[i] = seq[i]**2
              i = i + 1
          seq_sq

Out[116]: [0, 1, 4, 9, 16, 49, 64, 81, 100, 121, 144, 169, 196]

In the code above, i is initialized to zero. Then the while loop checks if i is less than the length of seq. If it is, we enter the loop body, where the value of seq at index i is squared and assigned to seq_sq at index i. To ensure that i eventually becomes greater than or equal to len(seq), so that the loop ends, we increment i on every pass. The same thing can be done the following way, counting down instead:

In [117]: seq_sq2 = [0]*len(seq)
          i = len(seq) - 1
          while i >= 0:
              seq_sq2[i] = seq[i]**2
              i = i - 1
          seq_sq2

Out[117]: [0, 1, 4, 9, 16, 49, 64, 81, 100, 121, 144, 169, 196]

In the code above, i = i + 1 just says to increment the value of i by 1 and assign the result back to the variable i. In the same way, i = i - 1 says to decrease the value of i by 1 and assign the result back to i. There are shorter ways of writing these statements:

1. i += 1 (i = i + 1) or i += n (for i = i + n)
2. i -= n (i = i - n)
3. i *= n (i = i * n)
4. i /= n (i = i / n)
5. i **= n (i = i ** n)
6. i %= n (i = i % n)
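The shorthand operators can be traced step by step. The sketch below assumes Python 3, where /= produces a float; the starting value and the list history are hypothetical and only record each intermediate result.

```python
history = []

i = 10
i += 5        # i = i + 5
history.append(i)
i -= 3        # i = i - 3
history.append(i)
i *= 2        # i = i * 2
history.append(i)
i /= 4        # i = i / 4 (a float in Python 3)
history.append(i)
i **= 2       # i = i ** 2
history.append(i)
i %= 7        # i = i % 7
history.append(i)

print(history)
```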

1.11 if statements
The if statement is another block with the format:

if condition:
    if_statement 1
    if_statement 2
    ...
    if_statement n

The whole block is executed only if condition evaluates to true. If not, the block is skipped. The statement can be extended to perform another set of actions if condition is false, using the format:

if condition:
    if_statement 1
    if_statement 2
    ...
    if_statement n
else:
    else_statement 1
    else_statement 2
    ...
    else_statement m

In this case, the else block is executed if condition is false. The if statement can also be used to evaluate multiple conditions using elif, which is the short form of else if. In that case, we have the format:

if condition_1:
    if_statement 1
    if_statement 2
    ...
    if_statement n
elif condition_2:
    elif1_statement 1
    elif1_statement 2
    ...
    elif1_statement p
elif condition_3:
    elif2_statement 1
    elif2_statement 2
    ...
    elif2_statement s
...
elif condition_x:
    elifx_statement 1
    elifx_statement 2
    ...
    elifx_statement t
else:
    else_statement 1
    else_statement 2
    ...
    else_statement m

Note that in evaluating multiple conditions, the first condition that evaluates to true is used; Python does not go further to check if there is a better matching condition. It is up to the programmer to ensure that the most specific condition is tested first. In the first example, we output a pass or fail based on whether a student's mark is greater than or equal to 60.

In [118]: a = 75
          if a >= 60:
              print 'pass'
          else:
              print 'fail'

pass

We now make it a bit more challenging. We introduce the following grading scheme:

1. 85 - 100 for Distinction
2. 70 - 84 for Good pass
3. 60 - 69 for Pass
4. <60 for Fail

Both of the following versions work:

In [119]: if a >= 85:
              print 'Distinction'
          elif a >= 70:
              print 'Good Pass'
          elif a >= 60:
              print 'Pass'
          else:
              print 'Fail'

Good Pass

In [120]: if a < 60:
              print 'Fail'
          elif a < 70:
              print 'Pass'
          elif a < 85:
              print 'Good Pass'
          else:
              print 'Distinction'

Good Pass

In the first example, if the value provided is greater than or equal to 85, the grade is Distinction; if not, we check whether it is greater than or equal to 70. Since the first condition takes care of values of 85 and above, anything that reaches the second condition is necessarily less than 85. The third condition is evaluated only if the first two evaluate to false, so a number can never fall through to the third condition if it is 70 or higher. If the conditions are ordered wrongly, the results can be wrong. Take the following example:

In [121]: if a >= 60:
              print 'Pass'
          elif a >= 70:
              print 'Good Pass'
          elif a >= 85:
              print 'Distinction'
          else:
              print 'Fail'

Pass

Although a = 75, the student is graded wrongly, because the value of a satisfies both the first and the second condition, but Python picks the first true condition it finds, and that gives a Pass. If we rewrite this example using the and operator with the beginning and end of each range, we get:

In [122]: if a >= 60 and a < 70:
              print 'Pass'
          elif a >= 70 and a < 85:
              print 'Good Pass'
          elif a >= 85 and a <= 100:
              print 'Distinction'
          else:
              print 'Fail'

Good Pass

This works since the ranges of values are well defined, and the operation succeeds no matter the order of the conditions. Python can also chain comparison operators to test a range directly. The above code can then be written as:

In [123]: if 60<=a<70:
              print 'Pass'
          elif 70<=a<85:
              print 'Good Pass'
          elif 85<=a<=100:
              print 'Distinction'
          else:
              print 'Fail'

Good Pass
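To test the grading logic on several marks at once, including the boundaries between grades, the chained-comparison version can be wrapped in a function. This is a sketch assuming Python 3; the function name grade and the sample marks are hypothetical. Functions are introduced later in these notes.

```python
def grade(mark):
    # Same conditions as the chained-comparison cell above.
    # Marks above 100 fall through to the else branch, as in that cell.
    if 60 <= mark < 70:
        return 'Pass'
    elif 70 <= mark < 85:
        return 'Good Pass'
    elif 85 <= mark <= 100:
        return 'Distinction'
    else:
        return 'Fail'

sample_marks = [59, 60, 69, 70, 75, 84, 85, 100]
grades = [grade(m) for m in sample_marks]
print(grades)
```

Checking the values just below and just above each boundary is a quick way to catch an off-by-one error in the conditions.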

1.12 A few more python iterables


1.12.1 Tuples
A Python tuple is another iterable. A tuple is created using round brackets () but its elements are accessed with square brackets, in the same manner as list elements.

In [124]: a = (1,2,3,4,5,6,7)

In [125]: a[2]

Out[125]: 3

1.12.2 Dictionaries
A dictionary is a Python structure that holds key-value pairs. A dictionary is created using curly braces {}. Key-value pairs are separated from each other by commas, and each key is separated from its value by a colon. An example of a dictionary is

In [126]: reg_cap = {'Ashanti': 'Kumasi','Greater Accra':'Accra','Volta':'Ho','Cent
          'Western':'Takoradi','Eastern':'Koforidua','Brong Ahafo':'Suny
          'Northern':'Tamale','Upper East':'Bolgatanga','Upper West':'Wa

In [127]: reg_cap

Out[127]: {'Ashanti': 'Kumasi',
           'Brong Ahafo': 'Sunyani',
           'Central': 'Cape Coast',
           'Eastern': 'Koforidua',
           'Greater Accra': 'Accra',
           'Northern': 'Tamale',
           'Upper East': 'Bolgatanga',
           'Upper West': 'Wa',
           'Volta': 'Ho',
           'Western': 'Takoradi'}

In [128]: reg_pop = {'Ashanti': 4780380,'Greater Accra':4010054,'Volta':2118252,'Ce
          'Western':2376021,'Eastern':2633154,'Brong Ahafo':2310983,\
          'Northern':2479461,'Upper East':1046545,'Upper West':702110}

In [129]: reg_pop

Out[129]: {'Ashanti': 4780380,
           'Brong Ahafo': 2310983,
           'Central': 2201863,
           'Eastern': 2633154,
           'Greater Accra': 4010054,
           'Northern': 2479461,
           'Upper East': 1046545,
           'Upper West': 702110,
           'Volta': 2118252,
           'Western': 2376021}

In [130]: reg_size = {'Ashanti': 24389,'Greater Accra':3245,'Volta':20570,'Central'
          'Western':23941,'Eastern':19323,'Brong Ahafo':39557,\
          'Northern':70384,'Upper East':8842,'Upper West':18476}

In [131]: reg_size

Out[131]: {'Ashanti': 24389,
           'Brong Ahafo': 39557,
           'Central': 9826,
           'Eastern': 19323,
           'Greater Accra': 3245,
           'Northern': 70384,
           'Upper East': 8842,
           'Upper West': 18476,
           'Volta': 20570,
           'Western': 23941}

Dictionary elements are accessed by key. This means that, when using dictionaries, instead of entering an index in the square brackets, you enter the key and Python returns its value. If the key does not exist, Python raises a KeyError.

In [132]: reg_cap['Greater Accra']

Out[132]: 'Accra'

In [133]: reg_size['Ashanti']

Out[133]: 24389

In [134]: reg_size['Komenda']

---------------------------------------------------------------------------

KeyError Traceback (most recent call last)

<ipython-input-134-c92c314160be> in <module>()

----> 1 reg_size['Komenda']

KeyError: 'Komenda'

Assuming the capital of Greater Accra is changed to Adenta, we can effect the change by typing

In [135]: reg_cap['Greater Accra'] = 'Adenta'

In [136]: reg_cap

Out[136]: {'Ashanti': 'Kumasi',
           'Brong Ahafo': 'Sunyani',
           'Central': 'Cape Coast',
           'Eastern': 'Koforidua',
           'Greater Accra': 'Adenta',
           'Northern': 'Tamale',
           'Upper East': 'Bolgatanga',
           'Upper West': 'Wa',
           'Volta': 'Ho',
           'Western': 'Takoradi'}

If Greater Accra were not already a key in the dictionary, this assignment would add a new key-value pair to the dictionary. A dictionary is thus a lookup table that returns the value for a given key, such as a table of named constants. We can iterate through the key-value pairs using a for loop. Note that the pairs are not returned in the order in which they were entered:

In [137]: for key in reg_cap:
              print key + ', ' + reg_cap[key]

Upper East, Bolgatanga
Greater Accra, Adenta
Central, Cape Coast
Western, Takoradi
Northern, Tamale
Eastern, Koforidua
Brong Ahafo, Sunyani
Volta, Ho
Upper West, Wa
Ashanti, Kumasi

The keys and values of a dictionary can be extracted using the appropriately named methods
keys() and values() respectively. The results are returned as lists.

In [138]: print reg_cap.keys()

['Upper East', 'Greater Accra', 'Central', 'Western', 'Northern', 'Eastern', 'Brong

In [139]: print reg_cap.values()

['Bolgatanga', 'Adenta', 'Cape Coast', 'Takoradi', 'Tamale', 'Koforidua', 'Sunyani'

The existence of a key can be checked using the has_key() method of the dictionary object, or equivalently with the in operator, as in 'Accra' in reg_cap.

In [140]: reg_cap.has_key('Accra')

Out[140]: False

In [141]: reg_cap.has_key('Greater Accra')

Out[141]: True
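The dictionaries above can be combined, since they share the same keys. The following sketch assumes Python 3, where has_key() no longer exists and the in operator is used instead. It uses only three of the regions from reg_pop and reg_size to stay short, and it assumes that reg_size holds land areas, so that the ratio is a population density; the names density and densest are hypothetical.

```python
# A three-region subset of the dictionaries defined above.
reg_pop = {'Ashanti': 4780380, 'Greater Accra': 4010054, 'Volta': 2118252}
reg_size = {'Ashanti': 24389, 'Greater Accra': 3245, 'Volta': 20570}

# Membership test with 'in' (works in Python 2 and Python 3).
has_accra = 'Accra' in reg_pop

# get() returns None instead of raising KeyError for a missing key.
missing_size = reg_size.get('Komenda')

# Build a new dictionary from two existing ones with matching keys.
density = {}
for region in reg_pop:
    density[region] = reg_pop[region] / float(reg_size[region])

# Key whose value is largest.
densest = max(density, key=density.get)

print(has_accra)
print(missing_size)
print(densest)
```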

1.12.3 Mutability and immutability


The difference between tuples and lists is that the values of list elements can be changed but that
cannot be done for tuples. Example:

In [142]: seq

Out[142]: [0, 1, 2, 3, 4, 7, 8, 9, 10, 11, 12, 13, 14]

In [143]: seq[0] = 1000

In [144]: seq

Out[144]: [1000, 1, 2, 3, 4, 7, 8, 9, 10, 11, 12, 13, 14]

In this example, we changed the value of the first element of the list seq to 1000. If we try the
same for the tuple a, we have

In [145]: a

Out[145]: (1, 2, 3, 4, 5, 6, 7)

In [146]: a[0] = 1000

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-146-080be1ca53ac> in <module>()
----> 1 a[0] = 1000

TypeError: 'tuple' object does not support item assignment

As shown above, we get an error stating that tuples do not support item assignment. In Python, objects whose values cannot be changed once they are created are said to be immutable. Examples are tuples and strings. For strings, we also have

In [147]: mystr

Out[147]: 'some string'

In [148]: mystr[0] = 'x'

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-148-02bb48027c20> in <module>()
----> 1 mystr[0] = 'x'

TypeError: 'str' object does not support item assignment

Objects whose values can be changed after creation are said to be mutable.
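Since immutable objects cannot be changed in place, the usual approach is to build a new object from pieces of the old one. The sketch below (Python 3 assumed) does this for a tuple and a string using slicing and concatenation, and contrasts it with a list, where a change made through one name is visible through every other name for the same list. The names new_tuple, new_str and alias are hypothetical.

```python
a = (1, 2, 3, 4, 5, 6, 7)
mystr = 'some string'

# Build new objects instead of modifying the originals.
new_tuple = (1000,) + a[1:]
new_str = 'x' + mystr[1:]

# Lists are mutable: both names below refer to the same list.
seq = [0, 1, 2]
alias = seq
alias[0] = 1000

print(new_tuple)
print(new_str)
print(seq)
```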

1.13 List Comprehension


List comprehension provides an elegant and compact way of creating lists. Many authors say it is used to create lists the way mathematicians define sets. For example, to create the set $\{x^3 \mid x \in \{0, \ldots, 9\}\}$ we can use the list comprehension

In [149]: a = [x**3 for x in range(10)]
          a

Out[149]: [0, 1, 8, 27, 64, 125, 216, 343, 512, 729]

For the set $\{2^x \mid x \in \{0, \ldots, 15\}\}$, we use

In [150]: a = [2**x for x in range(16)]
          a

Out[150]: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32

For the set $\{(x, y) \mid x \in \{0, \ldots, 10\},\ y \in \{10, \ldots, 0\}\}$, we have

In [151]: a = [(x,y) for x in range(11) for y in range(10,-1,-1)]
          print a

[(0, 10), (0, 9), (0, 8), (0, 7), (0, 6), (0, 5), (0, 4), (0, 3), (0, 2), (0, 1), (

List comprehensions can include conditional statements. Let us modify a above so that the list comprehension only keeps the elements with both x and y even:

In [152]: a = [(x,y) for x in range(11) for y in range(10,-1,-1) if x%2==0 and y%2=
          print a

[(0, 10), (0, 8), (0, 6), (0, 4), (0, 2), (0, 0), (2, 10), (2, 8), (2, 6), (2, 4),
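A list comprehension with several for clauses and a condition is equivalent to nested for loops with an if statement inside, read from left to right. The sketch below (Python 3 assumed) shows both forms on smaller ranges than the cells above, so that the full result is easy to inspect; the names pairs and loop_pairs are hypothetical.

```python
# Comprehension form: x runs over 0..2, y over 2..0, keep even x and even y.
pairs = [(x, y) for x in range(3) for y in range(2, -1, -1)
         if x % 2 == 0 and y % 2 == 0]

# Equivalent nested loops: the first 'for' is the outer loop.
loop_pairs = []
for x in range(3):
    for y in range(2, -1, -1):
        if x % 2 == 0 and y % 2 == 0:
            loop_pairs.append((x, y))

print(pairs)
print(pairs == loop_pairs)
```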

1.14 Defining Functions


1.14.1 lambda functions
Python provides the keyword lambda, which is used to create anonymous functions. These are generally short functions, usually one-liners, consisting of a single expression; they can be assigned to variables like any other object. The syntax of the lambda statement is:

varname = lambda variables: function_statement

Examples are:

In [153]: f = lambda x: x**2 + 2*x + 1
          g = lambda x,y : x**y
          sign = lambda x: 1 if x > 0 else -1 if x < 0 else 0

The expression used inside the sign function is an example of a conditional expression, also called a ternary operator. It evaluates to 1 if x > 0; if not, it goes into the else expression, where another conditional expression tests whether x < 0. More complex functions are defined using Python's def statement, which has the following syntax:

def funcname(var_list):
    statement_1
    statement_2
    ...
    statement_n
    return statement

def is a block statement, so the usual rules for Python blocks apply: the def line is the block
heading and ends in a colon, and the body of the block is indented. The return statement transfers
control back to the caller. It can be a bare return, or return followed by an object that becomes
the result of the function call. Example:

In [154]: def f(x):
    y = x**2 + 2*x + 1
    return y

def g(x):
    return x**2 + 2*x + 1

def h(x,y):
    return x**y

def sign(x):
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0

def greet(name):
    print 'hello ' + name
    return

The function greet above just ends with a return with nothing following. In python, we say
the greet function returns None. This means the result of greet can still be assigned to a variable
just that the variable will contain None. The following illustrates this concept:

In [155]: c = greet('John')
print 'c = ',c
print 'type of c is = ', type(c)

hello John
c = None
type of c is = <type 'NoneType'>
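
As a quick check of the ideas in this section, the sketch below compares a lambda version and a def version of the sign function, and confirms that a function ending in a bare return hands back None. The names sign_lambda, sign_def and greet_quietly are made up for this illustration.

```python
# Two equivalent definitions of the sign function.
sign_lambda = lambda x: 1 if x > 0 else -1 if x < 0 else 0

def sign_def(x):
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0

def greet_quietly(name):
    greeting = 'hello ' + name   # built but never returned
    return                       # a bare return hands back None

# Both versions agree on a negative, a zero and a positive input.
agree = all(sign_lambda(v) == sign_def(v) for v in (-2.5, 0, 7))
result = greet_quietly('John')
```

The lambda form is convenient for one-line expressions, while def is needed as soon as the body requires several statements.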

1.15 Module imports and namespaces


In Python, a name is assigned to a variable, a function, etc. When a function is created, its name
is added to Python's default namespace (its pool of names). The same thing happens when a variable
is created. Python also allows the creation of aliases, which are likewise added to the default
namespace. Example

In [156]: s = sign
type(s)

Out[156]: function

In [157]: s(4)

Out[157]: 1

In the definition above, s = sign makes the name s refer to the same object as the name sign,
which in this case is our function sign. Any time s is called, it runs the code within the sign
function. A module in Python is a file that contains Python code. Usually, it contains a set of
functions and variables. Modules can be imported into the current working environment to make
their functions and variables available to the user. So far, we are only able to perform
elementary mathematical operations. If we try to find the sine of 3, we get an error:

In [158]: sin(3)

---------------------------------------------------------------------------

NameError Traceback (most recent call last)

<ipython-input-158-eb391534dd1e> in <module>()
----> 1 sin(3)

NameError: name 'sin' is not defined

The sin function is available in the math and cmath modules. Individual functions can be
imported into the default namespace using the syntax:

from module import function_list

In [159]: from math import sin, cos, sqrt


sin(3) + cos(3)

Out[159]: -0.8488724885405782

In [160]: sqrt(-1)

---------------------------------------------------------------------------

ValueError Traceback (most recent call last)

<ipython-input-160-e94865f03ce3> in <module>()
----> 1 sqrt(-1)

ValueError: math domain error

In [161]: from cmath import sin, cos, sqrt


sin(3) + cos(3)

Out[161]: (-0.8488724885405782-0j)

In [162]: sqrt(-1)

Out[162]: 1j

As can be seen above, the functions in the math module work with floating point numbers
and return floating point numbers, while the functions in cmath work with complex numbers. To
list the names available in the default namespace, use the dir function:

In [163]: print dir()
['In', 'Out', '_', '_100', '_101', '_102', '_103', '_104', '_105', '_106', '_107',

In [164]: print dir(__builtin__)


['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BufferErr

As can be seen above, the name __builtin__ in the default namespace is itself a collection of
names, that is, another namespace. A module can be imported into its own namespace using the
statements
import modulename
import modulename as alias
The first statement imports the module into the namespace modulename, while the second
imports it into the namespace alias.
In [165]: import math
import cmath as cm
In [166]: print dir(math)
['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh', 'asin', 'asinh'

In [167]: print dir(cm)


['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh', 'asin', 'asinh'

It is possible to import all names from a module into the default namespace using the statement
from modulename import *
Apart from a few special cases (such as the pylab import), this method is not considered good
programming style: it pollutes the namespace, and a new import silently overwrites any existing
name that is the same.
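
The overwriting of names can be seen with the sqrt function, which exists in both math and cmath. The following is a small sketch; the variable names are illustrative.

```python
import math
import cmath

from math import sqrt
real_root = sqrt(4)          # math.sqrt returns a float

from cmath import sqrt       # the name sqrt now refers to cmath.sqrt
complex_root = sqrt(4)       # cmath.sqrt returns a complex number

# Qualified names stay unambiguous whatever the import order.
qualified_real = math.sqrt(4)
qualified_complex = cmath.sqrt(-1)
```

Importing each module into its own namespace, as in the last two lines, avoids the clash entirely.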

2 The numpy package


Numpy is the backbone of scientific computing in Python. It provides two basic data types, array
and matrix, for working with large data (functions, intervals, or plain data), and a set of
functions for working on these data types.
The matrix type is defined in numpy as a specialized 2D array that retains its 2D nature
through operations and has the default operations overloaded to act as matrix operations. For
example, multiplication works as matrix multiplication and exponentiation works as matrix
powers.
Numpy's main object is the array, which defines a multidimensional collection of elements of the
same type. Numpy arrays can be seen as homogeneous Python lists with mathematical operations
defined to work on the individual elements within the lists. The following statement shows the
conventional way of importing the numpy module

In [168]: import numpy as np

2.0.1 Type casting


In Python, the following are the type casting functions: int, float, complex, str, tuple, list, dict.
They convert a value of one type to another. Below are some examples of their use

In [169]: int(3.5)

Out[169]: 3

In [170]: float(3)

Out[170]: 3.0

In [171]: complex(3)

Out[171]: (3+0j)

In [172]: str(3+4j)

Out[172]: '(3+4j)'

In [173]: list('I love python')

Out[173]: ['I', ' ', 'l', 'o', 'v', 'e', ' ', 'p', 'y', 't', 'h', 'o', 'n']

In [174]: tuple([1,2,3,4,5])

Out[174]: (1, 2, 3, 4, 5)
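
A few further casts are sketched below to show behaviour that the examples above do not cover, in particular that int truncates toward zero rather than rounding. The variable names are illustrative.

```python
truncated_pos = int(3.9)       # 3: the fractional part is dropped
truncated_neg = int(-3.9)      # -3: truncation is toward zero, not downward
rounded = int(round(3.9))      # 4: round first if rounding is what is wanted
parsed = float('2.5')          # a string holding a number can be cast too
chars = tuple('abc')           # any sequence can be cast to a tuple
pairs = dict([('a', 1), ('b', 2)])   # dict accepts a sequence of (key, value) pairs
```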

One can also create an array by casting a sequence type, usually a list or a tuple. Example

In [175]: np.array((1,2,3,4,5))

Out[175]: array([1, 2, 3, 4, 5])

In [176]: np.array([1,2,3,4,5])

Out[176]: array([1, 2, 3, 4, 5])

Numpy also has special array-creating functions. These include arange, linspace, zeros, ones and
empty.
The arange function is similar to Python's range function. The difference is that arange
accepts floating point arguments. The format is still the same, arange(start,stop,step), and
as with range the result stops short of the endpoint.
linspace is similar to arange, but instead of the distance between successive numbers, linspace
takes the number of points to generate and determines the step size from that. The syntax
is linspace(start,stop,num), where num defaults to 50. In the case of linspace, the endpoint
is included in the values returned. This behaviour can be changed using the syntax
linspace(start,stop,num,endpoint=False) instead.
zeros(n) creates an array of all zeros with length n, while ones(n) creates an array of all ones
with length n. empty(n) creates an array of length n whose elements are uninitialized and so can
be anything. Arrays created using empty must be populated by the user before their values are
read.

In [177]: np.arange(5.0)

Out[177]: array([ 0., 1., 2., 3., 4.])

In [178]: np.arange(0,4,.25)

Out[178]: array([ 0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ,


2.25, 2.5 , 2.75, 3. , 3.25, 3.5 , 3.75])

In [179]: np.linspace(0,4,9)

Out[179]: array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])

In [180]: np.zeros(5)

Out[180]: array([ 0., 0., 0., 0., 0.])

In [181]: np.ones(5)

Out[181]: array([ 1., 1., 1., 1., 1.])

In [182]: np.empty(5)

Out[182]: array([ 0., 0., 0., 0., 0.])
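
The difference in endpoint handling between arange and linspace is easy to get wrong, so here is a short sketch that compares them on the interval from 0 to 1. The variable names are illustrative.

```python
import numpy as np

a = np.arange(0, 1, 0.25)                   # stop value 1 is excluded
b = np.linspace(0, 1, 5)                    # stop value 1 is included
c = np.linspace(0, 1, 4, endpoint=False)    # same four points as a
default_len = len(np.linspace(0, 1))        # num defaults to 50
```

When the number of points matters more than the exact step, linspace is usually the safer choice, because the length of an arange result with a floating point step can be affected by rounding.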

Numpy arrays can be multidimensional and their dimensions can be changed after creation.
The shape property of the array returns the number of elements in each dimension of the array.
The total number of elements in the array is the product of the entries in the shape tuple.

In [183]: c = np.linspace(0,1,21)
c.shape

Out[183]: (21,)

This is a vector of length 21, with no row or column information.

In [184]: c.shape = 3,7

In [185]: c

Out[185]: array([[ 0. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 ],


[ 0.35, 0.4 , 0.45, 0.5 , 0.55, 0.6 , 0.65],
[ 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95, 1. ]])

Now c is a 2D array with 3 rows and 7 columns.

In [186]: c = c.reshape((7,3))
c

Out[186]: array([[ 0. , 0.05, 0.1 ],
[ 0.15, 0.2 , 0.25],
[ 0.3 , 0.35, 0.4 ],
[ 0.45, 0.5 , 0.55],
[ 0.6 , 0.65, 0.7 ],
[ 0.75, 0.8 , 0.85],
[ 0.9 , 0.95, 1. ]])

c now has 7 rows and 3 columns. When the shape of an array is changed, all the elements must
be accounted for. If the change of shape would alter the number of elements in the array, numpy
raises an exception (error).

In [187]: c.shape = 3,3

---------------------------------------------------------------------------

ValueError Traceback (most recent call last)

<ipython-input-187-4424930b2dbb> in <module>()
----> 1 c.shape = 3,3

ValueError: total size of new array must be unchanged

Numpy can, however, fill in one piece of information itself. If the size along any one
dimension is given as -1, numpy calculates that value from the total number of elements. Example

In [188]: c.shape = 1, -1
c.shape

Out[188]: (1, 21)
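
The -1 placeholder also works with the reshape method. The sketch below shows a few cases, including one where no valid size exists; the variable names are illustrative.

```python
import numpy as np

c = np.linspace(0, 1, 21)
rows = c.reshape((3, -1))     # numpy infers 7 columns from 21 / 3
cols = c.reshape((-1, 3))     # numpy infers 7 rows
flat = rows.reshape(-1)       # a single -1 flattens back to 1D

try:
    c.reshape((4, -1))        # 21 is not divisible by 4
    failed = False
except ValueError:
    failed = True
```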

The data type of the array can equally be changed after creation. The dtype property returns
the data type of the array

In [189]: c.dtype

Out[189]: dtype('float64')

In [190]: c

Out[190]: array([[ 0. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 ,
0.45, 0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85,
0.9 , 0.95, 1. ]])

In [191]: c.astype('complex')

Out[191]: array([[ 0.00+0.j, 0.05+0.j, 0.10+0.j, 0.15+0.j, 0.20+0.j, 0.25+0.j,
0.30+0.j, 0.35+0.j, 0.40+0.j, 0.45+0.j, 0.50+0.j, 0.55+0.j,
0.60+0.j, 0.65+0.j, 0.70+0.j, 0.75+0.j, 0.80+0.j, 0.85+0.j,
0.90+0.j, 0.95+0.j, 1.00+0.j]])

Note that in the code c.astype('complex'), c itself is unchanged unless the
result of the statement is assigned back to c. The zeros, ones and empty functions can also be
used to create multidimensional arrays. To do that, the argument must be a tuple of integers. Each
number in the tuple is the number of elements in a particular dimension. Example

In [192]: np.zeros((2,4))

Out[192]: array([[ 0., 0., 0., 0.],


[ 0., 0., 0., 0.]])
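
The following sketch confirms that astype returns a new array and leaves the original alone, and shows that the creation functions also accept a dtype argument alongside the shape tuple. The variable names are illustrative.

```python
import numpy as np

c = np.linspace(0, 1, 5)
z = c.astype(complex)               # new array; c keeps its float64 dtype
grid = np.ones((2, 3), dtype=int)   # shape tuple plus an explicit data type
```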

Assigning one array to another name does not create a new array in memory, as one might
think. Given our existing array c, if we do

In [193]: b = c

One would expect that if we alter the first element in b, c will remain unchanged so we can
revert to it if we so wish but what happens can be seen below

In [194]: b[0,0] = 2
b

Out[194]: array([[ 2. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 ,
0.45, 0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85,
0.9 , 0.95, 1. ]])

In [195]: c

Out[195]: array([[ 2. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 ,
0.45, 0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85,
0.9 , 0.95, 1. ]])

Altering b also altered c because assignment in Python does not copy an object: it binds a
second name to the same array in memory, which also avoids duplicating data that can be very
large. One can, however, make an independent copy in memory using the copy method of the array
instance.

In [196]: d = b.copy()

In [197]: d[0,0] = 0
d

Out[197]: array([[ 0. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 ,
0.45, 0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85,
0.9 , 0.95, 1. ]])

In [198]: b

Out[198]: array([[ 2. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 ,
0.45, 0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85,
0.9 , 0.95, 1. ]])
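
The sketch below gathers the three cases side by side: a plain assignment, an explicit copy, and a slice. That slices of a numpy array are views sharing data with the original is not covered in the notes above and is added here as a related fact. The variable names are illustrative.

```python
import numpy as np

c = np.arange(6.0)
b = c              # alias: b and c are the same object
d = c.copy()       # independent copy with its own data
v = c[:3]          # a slice is a view that shares data with c

b[0] = 2.0         # visible through c as well
v[1] = 9.0         # also visible through c
d[2] = -1.0        # does not touch c

same_object = b is c
shares = np.shares_memory(c, v)
independent = not np.shares_memory(c, d)
```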

2.1 array indexing and slicing
Indexing and slicing as learnt in lists still work for arrays in one dimension. For multidimensional
arrays for instance

In [199]: c = np.arange(0,20).reshape((5,4))
c

Out[199]: array([[ 0, 1, 2, 3],


[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]])

which is a 2D array, we can index in all dimensions. The indexing format is
arrayname[dim0,dim1,...,dimn]. So to access the element in row 0 and column 1 (rows are counted
downwards and columns from left to right), we type

In [200]: c[0,1]

Out[200]: 1

We can equally use the slice notation for each dimension. To extract the subarray of c from row
3 downwards and from column 2 to the right, we can type

In [201]: c[3:,2:]

Out[201]: array([[14, 15],


[18, 19]])
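
A few more slicing patterns on the same 5 by 4 array are sketched below. A lone colon selects everything along that dimension; the variable names are illustrative.

```python
import numpy as np

c = np.arange(0, 20).reshape((5, 4))
row1 = c[1, :]            # the whole of row 1
col2 = c[:, 2]            # the whole of column 2
block = c[1:3, 0:2]       # rows 1 and 2, columns 0 and 1
every_other = c[::2, :]   # rows 0, 2 and 4
```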

Once numpy is imported, the user has access to a large number of functions for array
operations. The following is an example

In [202]: x = np.linspace(-5,5,1001)
fx = x**3+3*x**2+3*x+1
fx2 = np.exp(-x**2)*np.sin(x)+np.cos(x)
fx[0]

Out[202]: -64.0

As can be seen, there is no need to iterate through each element to perform array operations.
This is called vectorization: numpy performs the operation on each element of the array. The first
line creates the discrete interval x ∈ [−5, 5] with 1001 points. The second line applies the
expression
$x^3 + 3x^2 + 3x + 1$
to each point in x and assigns the result to the variable fx. The third line applies the expression
$e^{-x^2} \sin(x) + \cos(x)$
to x and assigns the result to fx2, and the last line returns the value of the first element of fx.
It is also possible to create standard 2D arrays using some matrix type operations. Examples
follow:

In [203]: I = np.eye(4) #This creates the standard nxn identity matrix
I

Out[203]: array([[ 1., 0., 0., 0.],


[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])

In [204]: zero = np.zeros((4,3)) #To use zeros and ones for high dimensional matric
# the size must be a tuple
one = np.ones((5,5))
zero

Out[204]: array([[ 0., 0., 0.],


[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])

In numpy, arrays can be treated not just as traditional matrices but also as functions or
intervals, and can be used in operations as such. For example, in the code

x = np.linspace(-5,5,1001)
fx = x**3+3*x**2+3*x+1

each element of x is a point in the discrete interval [-5,5], and the element of fx at a given
index is the expression evaluated at the element of x with the same index. So x in this case is
the interval, or the independent variable, and fx is the numerical result f(x).
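
To make the element-by-element meaning concrete, the sketch below computes the same polynomial once as a vectorized expression and once with an explicit loop, and checks that the two agree. The name fx_loop is illustrative.

```python
import numpy as np

x = np.linspace(-5, 5, 1001)

# Vectorized: one expression over the whole array.
fx = x**3 + 3*x**2 + 3*x + 1

# Explicit loop: the same computation, one element at a time.
fx_loop = np.empty(len(x))
for i in range(len(x)):
    fx_loop[i] = x[i]**3 + 3*x[i]**2 + 3*x[i] + 1

agree = np.allclose(fx, fx_loop)
factored = np.allclose(fx, (x + 1)**3)   # the polynomial equals (x + 1)^3
```

The vectorized form is shorter and, for large arrays, typically much faster, since the loop runs inside numpy rather than in the Python interpreter.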
Working with large arrays is usually easier when it is possible to visualize results. The
matplotlib module/library makes that possible. It is also possible to use Python in 'scientific
mode' in ipython or jupyter. The module pylab contains the numpy and matplotlib libraries as a
full scientific package. Jupyter, or ipython in recent versions, also has the %pylab magic command,
which essentially prepares the ipython frontend for plotting and imports the numpy and matplotlib
libraries into the interactive namespace. This is usually done once, and the syntax is

%pylab backend

where backend can be one of qt, gtk, wx or inline, and determines the graphics API used by
matplotlib for plotting. inline is a special backend that shows all your plots within the notebook
or console, and only works in the notebook or the qtconsole. Once pylab is started with a plotting
backend, switching backends usually means restarting the kernel.

In [205]: %pylab inline

Populating the interactive namespace from numpy and matplotlib

/opt/anaconda2/lib/python2.7/site-packages/IPython/core/magics/pylab.py:161: UserWa
`%matplotlib` prevents importing * from pylab and numpy
"\n`%matplotlib` prevents importing * from pylab and numpy"

The matplotlib module has an associated magic command

%matplotlib backend

which prepares the notebook for plotting. Note that this does not import anything, so
numpy and matplotlib must be imported separately before they can be used. With this method,
it must also be noted that the magic command has to be issued before matplotlib is imported.
Once the pylab magic command is issued, the contents of the numpy and matplotlib modules
are loaded into the interactive namespace, which means there is no longer any need to qualify
functions from the numpy module with np. Everything is inside the
interactive namespace. We can now use functions like

In [206]: c = array([1,2,3,4])
c

Out[206]: array([1, 2, 3, 4])

Once pylab is loaded, our function fx can be plotted with x using the command

In [207]: plot(x,fx)

Out[207]: [<matplotlib.lines.Line2D at 0x7fd3cb4f6fd0>]

The plot function is useful for the visualization of 1D functions. The syntax is

plot(x,y)

where x is the x-axis variable and y is the y-axis variable. The number of elements in x must
be the same as that in y. A point in x and the corresponding point in y make up the (x,y) pair for a
point. In case the x variable is not available or is just the sequence 0,1,2,3,4,etc, the command

plot(y)

can be used. In this case, matplotlib automatically generates the x values. Plots in matplotlib
can be decorated in many ways. These include: 1. Axis labels 2. Plot titles 3. Text within plots
4. Markers, colours and line styles, etc. In the examples that follow, we will apply some of
matplotlib's plot enhancement methods.
The color of lines can be controlled using the color keyword, an example is

In [208]: plot(x,fx,color='red')

Out[208]: [<matplotlib.lines.Line2D at 0x7fd3cb1ed910>]

The color value can take on string values like red, green, blue, black, white, yellow, cyan,
magenta, pink, brown, purple, etc. The color can equally be represented as short strings; examples
are 1. r - red 2. g - green 3. b - blue 4. k - black 5. w - white 6. c - cyan etc. So we can plot using a
black line by the command

In [209]: plot(x,fx,color='k')

Out[209]: [<matplotlib.lines.Line2D at 0x7fd3cb12e9d0>]

It is also possible to define color as an (r,g,b) tuple, where r, g and b must be in the floating point
interval [0,1], or as a hexadecimal string #RRGGBB, where RR, GG and BB are two-digit hexadecimal
numbers. Examples are

In [210]: plot(x,fx,color=(.8,1.,.2))

Out[210]: [<matplotlib.lines.Line2D at 0x7fd3cb06d110>]

In [211]: plot(x,fx,color='#FF3EAA')

Out[211]: [<matplotlib.lines.Line2D at 0x7fd3caf9d690>]

Marker types can also be controlled within a plot. This is done using the marker keyword.
Marker values include * - for a star, . - for a dot, o - for a circle, p - for a pentagon, ^ - triangle
pointing up, v - triangle pointing down, h,H - hexagon, d,D - diamond, < - triangle pointing left,
> - triangle pointing right, 1, 2, 3, 4 - three-pointed (tri) markers, _ - underscore, | - bar

In [212]: plot(x[::4],fx[::4], marker = 'o')

Out[212]: [<matplotlib.lines.Line2D at 0x7fd3caf50e10>]

The color of the marker can also be controlled using the markerfacecolor and markeredgecolor
keywords, with the color values already discussed for color. Example

In [213]: plot(x[::4],fx[::4],marker = 'o', markerfacecolor = 'r', markeredgecolor

Out[213]: [<matplotlib.lines.Line2D at 0x7fd3cae8c290>]

The special value 'none' can be used to specify no colour for that particular keyword. Example

In [214]: plot(x[::4],fx[::4],marker = 'o', markerfacecolor = 'none', markeredgecol

Out[214]: [<matplotlib.lines.Line2D at 0x7fd3cadc0fd0>]

The markersize keyword also makes it possible to change the size of the marker. It takes
numeric values.

In [215]: plot(x[::4],fx[::4],marker = 'o', markerfacecolor = 'none', markeredgecol

Out[215]: [<matplotlib.lines.Line2D at 0x7fd3cad006d0>]

In [216]: plot(x[::4],fx[::4],marker = 'o', markerfacecolor = 'none', markeredgecol

Out[216]: [<matplotlib.lines.Line2D at 0x7fd3cac2edd0>]

The linestyle keyword allows for the use of different line styles within a plot. It takes on values
such as '-' or 'solid' for a solid line, '--' or 'dashed' for a dashed line, ':' or 'dotted' for a dotted line,
'-.' or 'dashdot' for a dash-dot line, or the special value 'none' for no line. Examples are

In [217]: plot(x,fx,linestyle = '--')

Out[217]: [<matplotlib.lines.Line2D at 0x7fd3cab730d0>]

In [218]: plot(x,fx,linestyle = ':')

Out[218]: [<matplotlib.lines.Line2D at 0x7fd3caaa6710>]

In [219]: plot(x,fx,linestyle = 'dotted')

Out[219]: [<matplotlib.lines.Line2D at 0x7fd3caa55d50>]

Controlling the width of the line can be done using the linewidth keyword which like the
markersize keyword takes on a numeric value.

In [220]: plot(x,fx,linewidth = 5)

Out[220]: [<matplotlib.lines.Line2D at 0x7fd3ca9163d0>]
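
The keywords discussed so far can be combined in a single call. The sketch below does so outside the pylab namespace, using explicit imports; the choice of the Agg backend is an assumption made only so that the code runs without a display, and the particular style values are arbitrary.

```python
import matplotlib
matplotlib.use('Agg')            # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-5, 5, 1001)
fx = x**3 + 3*x**2 + 3*x + 1

# Every fourth point, drawn as hollow red-edged circles joined by a dashed black line.
lines = plt.plot(x[::4], fx[::4], color='k', linestyle='--', linewidth=2,
                 marker='o', markersize=6,
                 markerfacecolor='none', markeredgecolor='r')
line = lines[0]                  # plot returns a list of Line2D objects
```

The returned Line2D object can be queried or modified afterwards, for example with line.get_linestyle() or line.set_linewidth(3).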

Adding descriptive text to the axes can be done using the xlabel and ylabel functions. The title
function adds a title to the plot. Example

In [221]: plot(x,fx)
xlabel(r'$x$-axis')
ylabel(r'$y$-axis')
title(r'Plot of $x^3+3x^2+3x+1$')

Out[221]: <matplotlib.text.Text at 0x7fd3ca89c6d0>

The r at the beginning of each string marks it as a raw string, so Python does not attempt
to interpret the contents. This is especially important when the string holds latex code
containing backslashes: the backslash has a special meaning in ordinary Python strings, whereas a
raw string is passed unchanged to the function that uses it. In our example, the dollar signs ($)
enclose latex code.
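
The effect of the r prefix can be seen on a short latex fragment. In the sketch below, the string '\theta' is chosen only because it starts with the escape sequence \t; the variable names are illustrative.

```python
plain = '\theta'        # '\t' is read as a TAB character, followed by 'heta'
raw = r'\theta'         # the backslash is kept literally
escaped = '\\theta'     # doubling the backslash is the alternative

lengths = (len(plain), len(raw), len(escaped))
```

The raw and escaped forms are the same string, but the raw form is easier to read when a label contains several latex commands.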
Matplotlib allows for the placement of text within a plot. The function text allows for just that.
In its simplest form, the syntax is

text(x,y,string)

where x and y give the point at which to place the text and string is the text to place at that location.
The fontsize keyword takes a numeric value and sets the font size in points. The
color keyword changes the color of the text, and the rotation keyword rotates it by the given
angle in degrees. The fontsize and color arguments also apply to the title, xlabel and ylabel functions.

In [222]: plot(x,fx)
xlabel(r'$x$-axis')
ylabel(r'$y$-axis')
title(r'Plot of $x^3+3x^2+3x+1$')
text(-1,0,'root (-1,0)',color = 'red', rotation = 45)
grid('on')

The grid function turns the grid on and off. It is possible to include legends in charts. This is
achieved by adding labels to the individual plots and invoking the legend function.

In [223]: x = linspace(-2*pi,2*pi,1001)
fx1 = cos(x)
fx2 = sin(x)
fx3 = 0.5*(exp(-x**2)*sin(x) + cos(x))

In [224]: plot(x,fx1,'r--',label=r'$\cos(x)$')
plot(x,fx2,'b-',label=r'$\sin(x)$')
plot(x,fx3,'k:',label=r'$0.5\left(e^{-x^2}\sin(x)+\cos(x)\right)$')
legend(loc = 'best')

Out[224]: <matplotlib.legend.Legend at 0x7fd3ca76add0>

The possible options to loc combine upper, center and lower for the vertical location with left,
center and right for the horizontal position. So we can get options like 'upper left', 'center right',
'center', etc. We can use the xlim and ylim functions to restrict or expand the canvas. The syntax for xlim is

xlim(x1,x2)

and for ylim, we have

ylim(y1,y2)

where x1 is the left endpoint and x2 is the right. y1 is the lower endpoint and y2 is the upper.
Example:

In [225]: plot(x,fx1,'r--',label=r'$\cos(x)$')
plot(x,fx2,'b-',label=r'$\sin(x)$')
plot(x,fx3,'k:',label=r'$0.5\left(e^{-x^2}\sin(x)+\cos(x)\right)$')
xlim(-8,20)
ylim(-1,1)

legend(loc = 'upper right')

Out[225]: <matplotlib.legend.Legend at 0x7fd3ca6c6d90>
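
The same legend and axis limits can be set through an axes object instead of the pylab functions. The following is a sketch with explicit imports; the Agg backend is an assumption made so that the code runs without a display, and the limit values are arbitrary.

```python
import matplotlib
matplotlib.use('Agg')            # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2*np.pi, 2*np.pi, 1001)

fig, ax = plt.subplots()
ax.plot(x, np.cos(x), 'r--', label=r'$\cos(x)$')
ax.plot(x, np.sin(x), 'b-', label=r'$\sin(x)$')
ax.set_xlim(-8, 8)               # axes-method counterpart of xlim
ax.set_ylim(-1.5, 1.5)           # axes-method counterpart of ylim
legend = ax.legend(loc='upper right')

# The legend entries come from the label keywords, in plotting order.
labels = [t.get_text() for t in legend.get_texts()]
```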

It is possible in matplotlib to draw multiple plots on the same canvas, as in the previous
example, where we called plot three times and all the curves appeared on the
same canvas. Matplotlib has a function hold that determines whether or not the canvas should be
wiped clean on subsequent plot commands. It takes the booleans True or False as arguments.
hold(True) draws new plots while keeping the old ones; hold(False) wipes the canvas clean before
drawing new plots. (hold belongs to the matplotlib version used in these notes; it was removed
in later matplotlib releases, where existing plots are always kept.) In the next example, we set
hold to False, so if we redraw our previous plots, only the last will be visible.

In [226]: hold(False)
plot(x,fx1,'r--',label=r'$\cos(x)$')
plot(x,fx2,'b-',label=r'$\sin(x)$')
plot(x,fx3,'k:',label=r'$0.5\left(e^{-x^2}\sin(x)+\cos(x)\right)$')
xlim(-8,20)
ylim(-1,1)
legend(loc = 'upper right')

Out[226]: <matplotlib.legend.Legend at 0x7fd3ca62d390>

As can be seen in our previous plots, we can use a format string for the color and linestyle. For
example,

plot(x,fx3,'k:')

means plot x, fx3 with a black dotted line. The first character is the colour and the second is the
linestyle. To plot x, fx3 with a black dotted line and underscore markers, we have

In [227]: plot(x,fx3,'k:_')

Out[227]: [<matplotlib.lines.Line2D at 0x7fd3ca58bc90>]

We can also plot multiple plots by putting them all in the same plot function as in

In [228]: plot(x,fx1,'r-',x,fx2,'b--',x,fx3,'k:')

Out[228]: [<matplotlib.lines.Line2D at 0x7fd3ca573490>,


<matplotlib.lines.Line2D at 0x7fd3ca41bad0>,
<matplotlib.lines.Line2D at 0x7fd3ca42a1d0>]

As can be seen, matplotlib gives text output on what it is doing. These outputs can be
suppressed by assigning the result of the plot command to a variable or ending the command with a semicolon.

In [229]: plot(x,fx1,'r-',x,fx2,'b--',x,fx3,'k:');

In [230]: c = plot(x,fx1,'r-',x,fx2,'b--',x,fx3,'k:')

There is also the subplot command, which draws multiple plots, each in its own axes. Take
the example below

In [231]: subplot(2,2,1)
plot(x,fx1);
subplot(2,2,2)
plot(x,fx2);
subplot(2,2,3)
plot(x,fx3);
subplot(2,2,4)
plot(x,fx1,'r-',x,fx2,'b--',x,fx3,'k:');

In the above code, subplot(a,b,i) creates a grid of subplots with a rows and b columns; i is the
plot we are working on and 1 ≤ i ≤ ab. It is also possible to plot each curve in its own canvas.
That is achieved by using the figure function. The figure function creates a new figure window
and subsequent plots are drawn within that window.

In [232]: plot(x,fx3);
figure()
plot(x,fx1,'r-',x,fx2,'b--',x,fx3,'k:');
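
A grid of axes can also be created in one call with the subplots function of matplotlib.pyplot, which returns the figure together with an array of axes. The sketch below is one possible way to reproduce a 2 by 2 layout; the Agg backend is an assumption made so that the code runs without a display, and the curves plotted are arbitrary.

```python
import matplotlib
matplotlib.use('Agg')            # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2*np.pi, 2*np.pi, 1001)

fig, axes = plt.subplots(2, 2)   # axes is a 2x2 array of Axes objects
axes[0, 0].plot(x, np.cos(x))
axes[0, 1].plot(x, np.sin(x))
axes[1, 0].plot(x, np.cos(x) * np.sin(x))
axes[1, 1].plot(x, np.cos(x), 'r-', x, np.sin(x), 'b--')

n_axes = len(fig.axes)                  # four axes in the figure
n_lines_last = len(axes[1, 1].lines)    # two curves in the last panel
```

Indexing the axes array by row and column replaces the running index i used by subplot(a,b,i).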

It is possible to assign calls to figure and axes to variables and work with those variables.
Example

In [233]: fig = figure()
ax = fig.add_subplot(111)
plot(x,fx3)

Out[233]: [<matplotlib.lines.Line2D at 0x7fd3ca11df50>]

Alternatively, after plotting, it is possible to grab the current axes using the function gca and
use that to set some properties of the plot. For example, to move the spines to the zero position in
both axes, we can use the following lines of code

In [234]: plot(x,fx1,'r-',x,fx2,'b--',x,fx3,'k:');
ax = gca()
ax.spines['bottom'].set_position(('data',0)) #reposition bottom spine to
ax.spines['left'].set_position(('data',0)) #reposition left spine to the
ax.spines['top'].set_color('none') #make top spine transparent
ax.spines['right'].set_color('none') #make right spine transparent
ax.xaxis.set_ticks_position('bottom') #remove the tick marks from the top
ax.yaxis.set_ticks_position('left') #remove the tick marks from the right

In the code above, set_position takes a tuple (position_type, amount) as argument. The
position type 'outward' moves the spine outward by amount if it is positive, or inward otherwise.
The position type 'axes' positions the spine in relation to the axes, in which case amount will be
between 0 and 1, with 0.5 being the middle of the axes. The position type 'data' positions the spine
in relation to the data. We also set the color to 'none' with set_color to make a spine transparent.
When the plot is all done, the figure can be saved for later use using the savefig function with the syntax

savefig(filename)
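
A minimal sketch of saving a figure follows. The output path is hypothetical (a temporary directory is used so nothing is left behind in the working directory), the dpi value is arbitrary, and the Agg backend is an assumption made so that the code runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')            # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Hypothetical output location; the file extension selects the output format.
out_path = os.path.join(tempfile.mkdtemp(), 'example_plot.png')
fig.savefig(out_path, dpi=150)

saved = os.path.exists(out_path) and os.path.getsize(out_path) > 0
plt.close(fig)                   # release the figure once it is on disk
```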

Matplotlib is capable of many different kinds of plots, for example scatter plots, bar
charts, histograms, pie charts, etc.

In [235]: scatter(range(100),rand(100))

Out[235]: <matplotlib.collections.PathCollection at 0x7fd3c9d43290>

Three dimensional plots can be done in matplotlib by importing the Axes3D class from
mpl_toolkits.mplot3d as

In [236]: from mpl_toolkits.mplot3d import Axes3D

Axes3D provides access to a number of 3D plotting functions, as can be seen by doing

print dir(Axes3D)

Axes3D however insists that functions are called not from the class itself but from an axes instance.
Calling a function directly from Axes3D results in the following error

In [237]: Axes3D.plot(x,fx1,fx2)

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-237-7c821a39ff50> in <module>()
----> 1 Axes3D.plot(x,fx1,fx2)

TypeError: unbound method plot() must be called with Axes3D instance as fir

To use the functions, we need to define an Axes3D instance and then use it. As in

In [238]: ax = subplot(111, projection = '3d') #the projection = '3d' is all we nee
ax.plot(x,fx1,fx2)
ax.set_xlabel(r'$x$-axis')
ax.set_ylabel(r'$y$-axis')
ax.set_zlabel(r'$z$-axis')

Out[238]: <matplotlib.text.Text at 0x7fd3c9c73a50>

Again, it is important to set the labels using the axes instance because matplotlib, as
imported through pylab, has no zlabel function; calling zlabel directly will produce an error.
Axes3D also has functions for plotting surfaces and other 3D plots. The following
code plots the surface
$z = y^2 - x^2$.

In [239]: x = linspace(-2,2,101)
y = linspace(-2,2,101)
xx,yy = meshgrid(x,y)
z = yy**2 - xx**2
ax = subplot(111,projection = '3d')
ax.plot_surface(xx,yy,z)

Out[239]: <mpl_toolkits.mplot3d.art3d.Poly3DCollection at 0x7fd3c9c78790>

To decompose the code above, x and y are created with linspace to define their intervals of interest.
After that, the two are combined to create a meshgrid. The meshgrid function is used to create
the coordinate arrays of an n-dimensional grid. For example, to create the 2
dimensional grid (x, y) ∈ [1, 4] × [5, 8] we can have

In [240]: x = linspace(1,4,4)
y = linspace(5,8,4)
xx,yy = meshgrid(x,y)
print 'xx is', xx
print 'yy is', yy

xx is [[ 1. 2. 3. 4.]
[ 1. 2. 3. 4.]
[ 1. 2. 3. 4.]
[ 1. 2. 3. 4.]]
yy is [[ 5. 5. 5. 5.]
[ 6. 6. 6. 6.]
[ 7. 7. 7. 7.]
[ 8. 8. 8. 8.]]
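
The way grid points are paired up can be checked directly. In the sketch below, the index pair [2, 1] is picked arbitrarily, and the names point and z are illustrative.

```python
import numpy as np

x = np.linspace(1, 4, 4)
y = np.linspace(5, 8, 4)
xx, yy = np.meshgrid(x, y)

# Element [i, j] of the grid is the point (x[j], y[i]).
point = (xx[2, 1], yy[2, 1])

# Any elementwise expression in xx and yy is evaluated at every grid point.
z = yy**2 - xx**2
```

So the row index runs over y and the column index runs over x, which is why xx repeats x along every row.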

The array xx has the values of x repeated in every row while the array yy has the values of y
repeated for every column. The (x,y) points are chosen by picking corresponding elements from
xx and yy. The array xx has the values in the interval [1,4] while the array yy has the values in the
interval [5,8]. The line

z = yy**2 - xx**2

then performs the calculation $y^2 - x^2$ on each point of xx and yy to create z, which has the same size as xx
and yy. The last line

ax.plot_surface(xx,yy,z)

then plots xx, yy and z as a surface. We can also create a contour plot of our surface using

In [241]: x = linspace(-2,2,101)
y = linspace(-2,2,101)
xx,yy = meshgrid(x,y)
z = yy**2 - xx**2
ax = subplot(111)
ax.contour(xx,yy,z)

Out[241]: <matplotlib.contour.QuadContourSet instance at 0x7fd3c9dfc680>

It is also possible to view a surface as an image using the imshow function as shown below.

In [242]: imshow(z)

Out[242]: <matplotlib.image.AxesImage at 0x7fd3c9d56310>

In this case matplotlib takes the surface and maps its values to colors using a colormap. In the
matplotlib version used here, the default colormap is jet, which maps the lower values to blue and the
higher ones to red and interpolates in between. The colors in the colormap and the values they represent can be
added to the plot by using the colorbar function as follows

In [243]: imshow(z)
colorbar()

Out[243]: <matplotlib.colorbar.Colorbar instance at 0x7fd3c9efea28>

Pylab also has the capability to work with images. Pylab has the imread function to read an
image. Example

In [244]: c = imread('cocoa.tif')
c

Out[244]: array([[ 91, 91, 91, ..., 107, 107, 105],


[ 91, 91, 91, ..., 106, 106, 104],
[ 91, 91, 91, ..., 103, 103, 101],
...,
[ 96, 91, 91, ..., 122, 121, 121],
[ 97, 94, 91, ..., 121, 120, 119],
[ 98, 96, 91, ..., 120, 120, 117]], dtype=uint8)

Here c is an array of numbers. Each point in the array is a pixel, short for picture element.
The value at each pixel location is the colour intensity or value. The array is of type uint8 which
means the minimum value it can have is 0 and the maximum is 255. We can check the minimum
and maximum values using the min and max methods of the array object

In [245]: c.min()

Out[245]: 91

In [246]: c.max()

Out[246]: 138
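The range of uint8 can be checked with numpy's iinfo. The sketch below uses a tiny hand-made array (a hypothetical stand-in for the image, not the real cocoa.tif data) and also shows why arithmetic on uint8 images needs care: values that exceed 255 wrap around.

```python
import numpy as np

# Limits of the unsigned 8-bit integer type
info = np.iinfo(np.uint8)
print(info.min, info.max)

# Hypothetical 2x2 "image" with values in the same range as c
img = np.array([[91, 107], [98, 138]], dtype=np.uint8)
print(img.min(), img.max())

# Adding to a uint8 array stays in uint8, so results above 255 wrap around
wrapped = img + np.uint8(200)
print(wrapped)
```

This wrap-around is one reason for converting an image to a float type before doing arithmetic on it.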

In [247]: imshow(c)
gray()

The function gray() switches the colormap to grayscale, where black represents the lowest
value and white the highest value.

In [248]: imshow(c,vmin = 0, vmax = 255)

Out[248]: <matplotlib.image.AxesImage at 0x7fd3ca152bd0>

When we visualize the array, imshow by default finds the minimum and maximum values within
the image and scales the colors to that range. We can use the vmin and vmax keywords to force
the minimum and maximum values that imshow should use. The two images above show the
difference in the result. The image can be converted to a float image using the following code

In [249]: c = c.astype(float32)
c = c/255.
c

Out[249]: array([[ 0.35686275, 0.35686275, 0.35686275, ..., 0.41960785,
0.41960785, 0.41176471],
[ 0.35686275, 0.35686275, 0.35686275, ..., 0.41568628,
0.41568628, 0.40784314],
[ 0.35686275, 0.35686275, 0.35686275, ..., 0.40392157,
0.40392157, 0.39607844],
...,
[ 0.3764706 , 0.35686275, 0.35686275, ..., 0.47843137,
0.47450981, 0.47450981],
[ 0.38039216, 0.36862746, 0.35686275, ..., 0.47450981,
0.47058824, 0.46666667],
[ 0.38431373, 0.3764706 , 0.35686275, ..., 0.47058824,
0.47058824, 0.45882353]], dtype=float32)

In [250]: c.max()

Out[250]: 0.5411765

In [251]: c.min()

Out[251]: 0.35686275

We can, for instance, perform intensity scaling on the image c. We know the minimum and
maximum values in c. Using simple algebra, we can construct a linear map that sends the
minimum and maximum values in c to 0 and 1 respectively:

$$d = \frac{1}{c_{\max} - c_{\min}} \, (c - c_{\min}).$$

The new array d has the value 0 where c has its minimum and the value 1 where c has its
maximum, and the intermediate values are scaled appropriately. We then compare the
results.

In [252]: d = 1/(c.max()-c.min())*(c - c.min())

In [253]: d.min()

Out[253]: 0.0

In [254]: d.max()

Out[254]: 1.0
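The scaling formula can be wrapped in a small function. The following is a minimal sketch: the function name rescale_intensity and the tiny 2x2 array are illustrative assumptions, and the guard for a constant image is an addition that the notes do not discuss.

```python
import numpy as np

def rescale_intensity(img):
    """Linearly map img so that its minimum becomes 0 and its maximum 1."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # Constant image: nothing to stretch, avoid division by zero
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)

# Hypothetical float image with values between 91/255 and 138/255
c_small = np.array([[91, 100], [120, 138]], dtype=np.uint8) / 255.0
d_small = rescale_intensity(c_small)

print(d_small.min(), d_small.max())
print(d_small - c_small)   # change applied to each pixel
```

In this toy array the darkest pixel moves down to 0 and the brightest moves up to 1, so the change d_small - c_small is negative at one end of the range and positive at the other.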

In [255]: imshow(c, vmin = 0, vmax = 1)

Out[255]: <matplotlib.image.AxesImage at 0x7fd3ca1521d0>

In [256]: imshow(d, vmin = 0, vmax = 1)

Out[256]: <matplotlib.image.AxesImage at 0x7fd3ca04fe10>

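We can also see how much c has been changed by displaying the difference d - c. Note that
d > c does not hold at every point: the scaling stretches the intensity range, so pixels near the
minimum of c become darker (d < c) and pixels near the maximum become brighter (d > c). With
vmin = 0, negative differences are drawn in the darkest color. The result shows a non-constant
enhancement.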

In [257]: imshow(d-c, vmin=0, vmax = 1)

Out[257]: <matplotlib.image.AxesImage at 0x7fd3c8e8ca50>

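We want to load the image house.png, which is the image of a building, and find the edges by
computing the gradient magnitude $\sqrt{f_x^2 + f_y^2}$, where $f_x$ and $f_y$ are the differences
between neighbouring pixels along the two image axes.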

In [258]: d = imread('house.png')
d = d[:,:,0]
imshow(d)

Out[258]: <matplotlib.image.AxesImage at 0x7fd3ca5f2c10>


In [259]: e = sqrt((roll(d,-1,0) - d)**2 + (roll(d,-1,1) - d)**2)

In [260]: imshow(1-e)

Out[260]: <matplotlib.image.AxesImage at 0x7fd3ca06ead0>
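The edge calculation can be checked on a synthetic image where the answer is known. The sketch below assumes a hypothetical 6x6 image with a single vertical step edge, and uses explicit numpy imports instead of pylab.

```python
import numpy as np

# Synthetic image: left half 0, right half 1 (a vertical step edge)
img = np.zeros((6, 6))
img[:, 3:] = 1.0

# Forward differences: roll shifts the array by one pixel, wrapping around
fy = np.roll(img, -1, axis=0) - img   # difference along rows
fx = np.roll(img, -1, axis=1) - img   # difference along columns

# Gradient magnitude, large where the intensity changes quickly
edges = np.sqrt(fx**2 + fy**2)
print(edges[0])
```

Because roll wraps around, the last column is compared with the first one, which produces a second, spurious edge at the image border. For a real image this usually affects only the outermost row and column.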

2.2 IO in numpy
Python itself allows for file IO using its open function. Python can open a file as text or as binary,
and it supports three basic modes: 'r' for read, 'w' for write and 'a' for append. For instance,
in python, one can open a file using the snippet

In [261]: f = open('test','w')
f.write('Writing from python')
f.close()

The open function opens the file test for writing. Note that when a file is opened for writing,
an existing file with that name is overwritten; if the file does not exist, it is created. If a file
is opened for reading and it does not exist, the operation raises an error. The variable f is a
file object referring to the open file test on disk. The write method of the file object writes
a string to the file. If there are multiple lines to write, the file object also provides the
writelines method, which writes a sequence of strings to the file (it does not add line endings
itself). For a file opened for reading, there are the read and readline methods. The former reads
the entire file content as a string while the latter reads a single line of the file on each
invocation. There is also the readlines method, which reads the entire file into a list with
each element containing a line of the file. After working with the file, close it to release its
resources and flush remaining data.
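The methods described above can be sketched as follows. The example writes to a temporary folder (an assumption made so that it does not overwrite anything) and uses the with statement, which closes the file automatically at the end of the block.

```python
import os
import tempfile

# Hypothetical file location inside a fresh temporary folder
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# 'w' creates the file (or overwrites an existing one)
with open(path, 'w') as f:
    f.write('Writing from python\n')
    f.writelines(['second line\n', 'third line\n'])

# 'a' appends to the end of the existing file
with open(path, 'a') as f:
    f.write('appended line\n')

# 'r' reads: readline returns one line, readlines the remaining lines as a list
with open(path, 'r') as f:
    first = f.readline()
    rest = f.readlines()

print(first)
print(rest)
```

Using with is equivalent to calling close explicitly, but the file is also closed if an error occurs inside the block.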
For working with arrays, numpy provides its own file IO mechanisms. To save an array for
later use, you can use the numpy save function. This saves the array in a file with the .npy
extension, in a binary format readable by numpy. For example, for the array c,

In [262]: c

Out[262]: array([[ 0.35686275, 0.35686275, 0.35686275, ..., 0.41960785,
0.41960785, 0.41176471],
[ 0.35686275, 0.35686275, 0.35686275, ..., 0.41568628,
0.41568628, 0.40784314],
[ 0.35686275, 0.35686275, 0.35686275, ..., 0.40392157,
0.40392157, 0.39607844],
...,
[ 0.3764706 , 0.35686275, 0.35686275, ..., 0.47843137,
0.47450981, 0.47450981],
[ 0.38039216, 0.36862746, 0.35686275, ..., 0.47450981,
0.47058824, 0.46666667],
[ 0.38431373, 0.3764706 , 0.35686275, ..., 0.47058824,
0.47058824, 0.45882353]], dtype=float32)

In [263]: save('test.npy',c)

The file test.npy is created in the location specified, in this case the current working folder,
and its content is the array. When the array is later needed, it can be loaded using the
code

In [264]: c = load('test.npy')
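A quick way to convince yourself that save and load preserve an array is a round trip. This is a minimal sketch with a small hypothetical array and a temporary folder; the file name is only illustrative.

```python
import os
import tempfile
import numpy as np

# Small float32 array standing in for the image c
a = np.arange(6, dtype=np.float32).reshape(2, 3) / np.float32(255)

# Save to a temporary folder and load the array back
path = os.path.join(tempfile.mkdtemp(), 'test.npy')
np.save(path, a)
b = np.load(path)

print(b.dtype, b.shape)
print(np.array_equal(a, b))
```

The shape and the data type are stored in the file along with the values, so the loaded array matches the original.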

If there are multiple arrays to be saved, numpy also provides the savez function. Suppose the
following arrays have been defined:

x = linspace(-2*pi, 2*pi, 1001)
fx1 = cos(x)
fx2 = sin(x)
fx3 = 0.5*(exp(-x**2)*sin(x) + cos(x))

In [265]: savez('vars.npz',x,fx1,fx2,fx3)

In this case the file vars.npz is created by numpy and it contains the data from the arrays x,
fx1, fx2 and fx3, each stored as an npy file. If savez is used as in the example above, numpy just
saves the variables as arr_0.npy, arr_1.npy, arr_2.npy, arr_3.npy. If we want to save the names
of the variables with the data, we pass the variables as key=value pairs:

In [266]: savez('vars.npz',x=x,fx1=fx1,fx2=fx2,fx3=fx3)

To load the arrays back, we again use load as

In [267]: data = load('vars.npz')

In [268]: data.keys()

Out[268]: ['x', 'fx1', 'fx3', 'fx2']

In [269]: data.items()

Out[269]: [('x', array([-2. , -1.96, -1.92, -1.88, -1.84, -1.8 , -1.76, -1.72, -1.
-1.64, -1.6 , -1.56, -1.52, -1.48, -1.44, -1.4 , -1.36, -1.32,
-1.28, -1.24, -1.2 , -1.16, -1.12, -1.08, -1.04, -1. , -0.96,
-0.92, -0.88, -0.84, -0.8 , -0.76, -0.72, -0.68, -0.64, -0.6 ,
-0.56, -0.52, -0.48, -0.44, -0.4 , -0.36, -0.32, -0.28, -0.24,
-0.2 , -0.16, -0.12, -0.08, -0.04, 0. , 0.04, 0.08, 0.12,
0.16, 0.2 , 0.24, 0.28, 0.32, 0.36, 0.4 , 0.44, 0.48,
0.52, 0.56, 0.6 , 0.64, 0.68, 0.72, 0.76, 0.8 , 0.84,
0.88, 0.92, 0.96, 1. , 1.04, 1.08, 1.12, 1.16, 1.2 ,
1.24, 1.28, 1.32, 1.36, 1.4 , 1.44, 1.48, 1.52, 1.56,
1.6 , 1.64, 1.68, 1.72, 1.76, 1.8 , 1.84, 1.88, 1.92,
1.96, 2. ])),
('fx1', array([ 1. , 0.99992104, 0.99968419, ..., 0.99968419,
0.99992104, 1. ])),
('fx3', array([ 0.5 , 0.49996052, 0.49984209, ..., 0.49984209,
0.49996052, 0.5 ])),
('fx2', array([ 2.44929360e-16, 1.25660399e-02, 2.51300954e-02, ...
-2.51300954e-02, -1.25660399e-02, -2.44929360e-16]))]

In [270]: data['fx1']

Out[270]: array([ 1. , 0.99992104, 0.99968419, ..., 0.99968419,
0.99992104, 1. ])
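The difference between positional and keyword arguments to savez can be seen in a short sketch. It writes to a temporary folder (an assumption for the example) and uses the files attribute of the loaded object, which lists the stored array names.

```python
import os
import tempfile
import numpy as np

x = np.linspace(-2 * np.pi, 2 * np.pi, 1001)
fx1 = np.cos(x)
fx2 = np.sin(x)

folder = tempfile.mkdtemp()

# Positional arguments: numpy invents the names arr_0, arr_1, ...
unnamed_path = os.path.join(folder, 'unnamed.npz')
np.savez(unnamed_path, x, fx1)
with np.load(unnamed_path) as data:
    unnamed = sorted(data.files)

# Keyword arguments: the keywords become the array names
named_path = os.path.join(folder, 'vars.npz')
np.savez(named_path, x=x, fx1=fx1, fx2=fx2)
with np.load(named_path) as data:
    named = sorted(data.files)
    fx1_back = data['fx1']

print(unnamed)
print(named)
```

Arrays are read from the file only when they are indexed, so it is convenient to pull out the arrays you need inside the with block.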

For large scientific datasets, python can also interface with HDF5. HDF stands for Hierarchical
Data Format and is meant to store large data. More information on HDF5 can be found at
https://www.hdfgroup.org/HDF5/ and https://en.wikipedia.org/wiki/Hierarchical_Data_Format.
The third-party module h5py is used for working with hdf5 data. The following lines of code
write our arrays x, fx1, fx2, fx3 into an hdf5 file test.h5.

In [271]: import h5py

In [272]: hf = h5py.File('test.h5','w') #open the file for writing w = write

In [273]: hf.create_dataset(name = 'x', data = x)
hf.create_dataset(name = 'fx1', data = fx1)
hf.create_dataset(name = 'fx2', data = fx2)
hf.create_dataset(name = 'fx3', data = fx3)

Out[273]: <HDF5 dataset "fx3": shape (1001,), type "<f8">

In [274]: hf.close()

The following lines of code read the data for array x back.

In [275]: hf = h5py.File('test.h5','r') #open the file test.h5 for reading

In [276]: hf.keys()

Out[276]: [u'fx1', u'fx2', u'fx3', u'x']

In [277]: x = hf['x'][:]

In [278]: hf.close()

2.3 Pandas
Pandas strives to be an easy-to-use data analysis module for python. It can handle large data
files and work with several data formats, including Excel files, data from databases (in
combination with the python database modules, especially sqlalchemy) and csv (comma-separated
values) files. Pandas can be imported into python, and a csv file read into a dataframe, using

In [279]: import pandas

In [280]: c = pandas.read_csv('iris.data')
c

Out[280]: 5.1 3.5 1.4 0.2 Iris-setosa
0 4.9 3.0 1.4 0.2 Iris-setosa
1 4.7 3.2 1.3 0.2 Iris-setosa
2 4.6 3.1 1.5 0.2 Iris-setosa
3 5.0 3.6 1.4 0.2 Iris-setosa
4 5.4 3.9 1.7 0.4 Iris-setosa
5 4.6 3.4 1.4 0.3 Iris-setosa
6 5.0 3.4 1.5 0.2 Iris-setosa
7 4.4 2.9 1.4 0.2 Iris-setosa
8 4.9 3.1 1.5 0.1 Iris-setosa
9 5.4 3.7 1.5 0.2 Iris-setosa
10 4.8 3.4 1.6 0.2 Iris-setosa
11 4.8 3.0 1.4 0.1 Iris-setosa

12 4.3 3.0 1.1 0.1 Iris-setosa
13 5.8 4.0 1.2 0.2 Iris-setosa
14 5.7 4.4 1.5 0.4 Iris-setosa
15 5.4 3.9 1.3 0.4 Iris-setosa
16 5.1 3.5 1.4 0.3 Iris-setosa
17 5.7 3.8 1.7 0.3 Iris-setosa
18 5.1 3.8 1.5 0.3 Iris-setosa
19 5.4 3.4 1.7 0.2 Iris-setosa
20 5.1 3.7 1.5 0.4 Iris-setosa
21 4.6 3.6 1.0 0.2 Iris-setosa
22 5.1 3.3 1.7 0.5 Iris-setosa
23 4.8 3.4 1.9 0.2 Iris-setosa
24 5.0 3.0 1.6 0.2 Iris-setosa
25 5.0 3.4 1.6 0.4 Iris-setosa
26 5.2 3.5 1.5 0.2 Iris-setosa
27 5.2 3.4 1.4 0.2 Iris-setosa
28 4.7 3.2 1.6 0.2 Iris-setosa
29 4.8 3.1 1.6 0.2 Iris-setosa
.. ... ... ... ... ...
119 6.9 3.2 5.7 2.3 Iris-virginica
120 5.6 2.8 4.9 2.0 Iris-virginica
121 7.7 2.8 6.7 2.0 Iris-virginica
122 6.3 2.7 4.9 1.8 Iris-virginica
123 6.7 3.3 5.7 2.1 Iris-virginica
124 7.2 3.2 6.0 1.8 Iris-virginica
125 6.2 2.8 4.8 1.8 Iris-virginica
126 6.1 3.0 4.9 1.8 Iris-virginica
127 6.4 2.8 5.6 2.1 Iris-virginica
128 7.2 3.0 5.8 1.6 Iris-virginica
129 7.4 2.8 6.1 1.9 Iris-virginica
130 7.9 3.8 6.4 2.0 Iris-virginica
131 6.4 2.8 5.6 2.2 Iris-virginica
132 6.3 2.8 5.1 1.5 Iris-virginica
133 6.1 2.6 5.6 1.4 Iris-virginica
134 7.7 3.0 6.1 2.3 Iris-virginica
135 6.3 3.4 5.6 2.4 Iris-virginica
136 6.4 3.1 5.5 1.8 Iris-virginica
137 6.0 3.0 4.8 1.8 Iris-virginica
138 6.9 3.1 5.4 2.1 Iris-virginica
139 6.7 3.1 5.6 2.4 Iris-virginica
140 6.9 3.1 5.1 2.3 Iris-virginica
141 5.8 2.7 5.1 1.9 Iris-virginica
142 6.8 3.2 5.9 2.3 Iris-virginica
143 6.7 3.3 5.7 2.5 Iris-virginica
144 6.7 3.0 5.2 2.3 Iris-virginica
145 6.3 2.5 5.0 1.9 Iris-virginica
146 6.5 3.0 5.2 2.0 Iris-virginica
147 6.2 3.4 5.4 2.3 Iris-virginica

148 5.9 3.0 5.1 1.8 Iris-virginica

[149 rows x 5 columns]

The data above is a machine learning dataset with 150 rows and 5 columns. It can be found
at https://archive.ics.uci.edu/ml/datasets/Iris. Each row gives the measurements of one iris
flower together with its class. There are 50 rows of data for the class Iris-setosa, 50 for the class
Iris-versicolor and 50 for Iris-virginica. (The output above reports 149 rows because the first
row of the file was read as the header.) The columns contain the following data

column 1 - sepal length
column 2 - sepal width
column 3 - petal length
column 4 - petal width
column 5 - flower class

In the output c, it can be seen that the first row is bold. This is because pandas assumes the
first row of the data is the header row. We can alter that by telling pandas there is no header in the
data by rewriting the snippet as

In [281]: c = pandas.read_csv('iris.data',header=None)
c.head()

Out[281]: 0 1 2 3 4
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa

The head method of the dataframe lists the first five rows of the data. Its counterpart, the
tail method, lists the last five. When header = None is set, pandas can either be given a list
of column names or be left to generate its own. In this case, because column names were not
provided, pandas numbers the columns. We can provide column names using the
names keyword. In that case, our read_csv code becomes

In [282]: c = pandas.read_csv('iris.data', header = None, names = ['slength','swidt
c.head()

Out[282]: slength swidth plength pwidth fclass
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
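The effect of the header and names keywords can be reproduced without the data file. The sketch below reads three rows from an in-memory string (io.StringIO stands in for iris.data, which is an assumption of this example) and uses the five column names shown in the output header above.

```python
import io
import pandas as pd

# First three rows of the iris data, as they appear in the csv file
text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)
cols = ['slength', 'swidth', 'plength', 'pwidth', 'fclass']

# Default behaviour: the first row is consumed as the header
df_default = pd.read_csv(io.StringIO(text))

# header=None keeps every row as data; names supplies the column labels
df_named = pd.read_csv(io.StringIO(text), header=None, names=cols)

print(len(df_default), len(df_named))
print(df_named.head())
```

With the default settings one data row is lost to the header, which is the same effect that reduces the full dataset from 150 rows to 149.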

The describe method of the dataframe gives descriptive statistics of the dataframe.

In [283]: c.describe()

Out[283]: slength swidth plength pwidth
count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.054000 3.758667 1.198667
std 0.828066 0.433594 1.764420 0.763161
min 4.300000 2.000000 1.000000 0.100000
25% 5.100000 2.800000 1.600000 0.300000
50% 5.800000 3.000000 4.350000 1.300000
75% 6.400000 3.300000 5.100000 1.800000
max 7.900000 4.400000 6.900000 2.500000

The unique method returns the unique elements in a column. To return the unique flower
classes, we have

In [284]: c['fclass'].unique()

Out[284]: array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

c['fclass'] returns the fclass column of the dataframe. Multiple columns can be selected
by passing a list of column names, for example

In [285]: c[['slength','fclass']].head()

Out[285]: slength fclass
0 5.1 Iris-setosa
1 4.9 Iris-setosa
2 4.7 Iris-setosa
3 4.6 Iris-setosa
4 5.0 Iris-setosa

The correlation and covariance of the dataframe can be extracted as follows

In [286]: c.cov() #covariance

Out[286]: slength swidth plength pwidth
slength 0.685694 -0.039268 1.273682 0.516904
swidth -0.039268 0.188004 -0.321713 -0.117981
plength 1.273682 -0.321713 3.113179 1.296387
pwidth 0.516904 -0.117981 1.296387 0.582414

In [287]: c.corr() #correlation

Out[287]: slength swidth plength pwidth
slength 1.000000 -0.109369 0.871754 0.817954
swidth -0.109369 1.000000 -0.420516 -0.356544
plength 0.871754 -0.420516 1.000000 0.962757
pwidth 0.817954 -0.356544 0.962757 1.000000
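The two tables are related: the correlation of two columns is their covariance divided by the product of their standard deviations, $r_{xy} = \mathrm{cov}(x, y) / (\sigma_x \sigma_y)$. As a rough check, the sketch below recomputes the slength/plength correlation from the rounded numbers printed by describe and cov above (the variable names are illustrative).

```python
# Values copied from the describe() and cov() outputs above
cov_slength_plength = 1.273682
std_slength = 0.828066
std_plength = 1.764420

# Correlation = covariance / (std_x * std_y)
corr_slength_plength = cov_slength_plength / (std_slength * std_plength)
print(round(corr_slength_plength, 4))
```

The result agrees with the slength/plength entry of the correlation table to within the rounding of the printed values.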

The data in the dataframe can also be grouped by one or more columns.

In [288]: c_grp = c.groupby('fclass')

In [289]: c_grp.describe()

Out[289]: plength pwidth slength swidth
fclass
Iris-setosa count 50.000000 50.000000 50.000000 50.000000
mean 1.464000 0.244000 5.006000 3.418000
std 0.173511 0.107210 0.352490 0.381024
min 1.000000 0.100000 4.300000 2.300000
25% 1.400000 0.200000 4.800000 3.125000
50% 1.500000 0.200000 5.000000 3.400000
75% 1.575000 0.300000 5.200000 3.675000
max 1.900000 0.600000 5.800000 4.400000
Iris-versicolor count 50.000000 50.000000 50.000000 50.000000
mean 4.260000 1.326000 5.936000 2.770000
std 0.469911 0.197753 0.516171 0.313798
min 3.000000 1.000000 4.900000 2.000000
25% 4.000000 1.200000 5.600000 2.525000
50% 4.350000 1.300000 5.900000 2.800000
75% 4.600000 1.500000 6.300000 3.000000
max 5.100000 1.800000 7.000000 3.400000
Iris-virginica count 50.000000 50.000000 50.000000 50.000000
mean 5.552000 2.026000 6.588000 2.974000
std 0.551895 0.274650 0.635880 0.322497
min 4.500000 1.400000 4.900000 2.200000
25% 5.100000 1.800000 6.225000 2.800000
50% 5.550000 2.000000 6.500000 3.000000
75% 5.875000 2.300000 6.900000 3.175000
max 6.900000 2.500000 7.900000 3.800000
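Besides describe, a grouped dataframe supports individual aggregations such as mean and size. The sketch below uses a small hand-made dataframe with two rows per class (toy values, not the full dataset) so that the results are easy to verify by hand.

```python
import pandas as pd

# Toy dataframe: two flowers per class, one measurement column
df = pd.DataFrame({
    'slength': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'fclass': ['Iris-setosa', 'Iris-setosa',
               'Iris-versicolor', 'Iris-versicolor',
               'Iris-virginica', 'Iris-virginica'],
})

grouped = df.groupby('fclass')

# Mean sepal length per class and number of rows per class
means = grouped['slength'].mean()
counts = grouped.size()

print(means)
print(counts)
```

Both results are indexed by the class name, so means['Iris-setosa'] gives the value for a single group.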

The column names of c can be extracted using the columns property. For example the following
code saves the columns of c as a list in the variable name c_cols. This can then be used to extract
columns

In [290]: c_cols = list(c.columns)

In [291]: c[c_cols[:-1]].head()

Out[291]: slength swidth plength pwidth
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2

It is possible to extract rows from a dataframe where the data satisfies certain conditions.
For example, to check whether the fclass column is equal to Iris-setosa, we can use

In [292]: (c['fclass'] == 'Iris-setosa').head()

Out[292]: 0 True
1 True
2 True
3 True
4 True
Name: fclass, dtype: bool

In [293]: (c['fclass'] == 'Iris-setosa').tail()

Out[293]: 145 False
146 False
147 False
148 False
149 False
Name: fclass, dtype: bool

This returns True for the rows where the fclass column is equal to Iris-setosa and False
otherwise. The result can be used as a mask to select the rows satisfying this condition.

In [294]: c[c['fclass'] == 'Iris-setosa']

Out[294]: slength swidth plength pwidth fclass
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
5 5.4 3.9 1.7 0.4 Iris-setosa
6 4.6 3.4 1.4 0.3 Iris-setosa
7 5.0 3.4 1.5 0.2 Iris-setosa
8 4.4 2.9 1.4 0.2 Iris-setosa
9 4.9 3.1 1.5 0.1 Iris-setosa
10 5.4 3.7 1.5 0.2 Iris-setosa
11 4.8 3.4 1.6 0.2 Iris-setosa
12 4.8 3.0 1.4 0.1 Iris-setosa
13 4.3 3.0 1.1 0.1 Iris-setosa
14 5.8 4.0 1.2 0.2 Iris-setosa
15 5.7 4.4 1.5 0.4 Iris-setosa
16 5.4 3.9 1.3 0.4 Iris-setosa
17 5.1 3.5 1.4 0.3 Iris-setosa
18 5.7 3.8 1.7 0.3 Iris-setosa
19 5.1 3.8 1.5 0.3 Iris-setosa
20 5.4 3.4 1.7 0.2 Iris-setosa
21 5.1 3.7 1.5 0.4 Iris-setosa
22 4.6 3.6 1.0 0.2 Iris-setosa
23 5.1 3.3 1.7 0.5 Iris-setosa
24 4.8 3.4 1.9 0.2 Iris-setosa
25 5.0 3.0 1.6 0.2 Iris-setosa
26 5.0 3.4 1.6 0.4 Iris-setosa

27 5.2 3.5 1.5 0.2 Iris-setosa
28 5.2 3.4 1.4 0.2 Iris-setosa
29 4.7 3.2 1.6 0.2 Iris-setosa
30 4.8 3.1 1.6 0.2 Iris-setosa
31 5.4 3.4 1.5 0.4 Iris-setosa
32 5.2 4.1 1.5 0.1 Iris-setosa
33 5.5 4.2 1.4 0.2 Iris-setosa
34 4.9 3.1 1.5 0.1 Iris-setosa
35 5.0 3.2 1.2 0.2 Iris-setosa
36 5.5 3.5 1.3 0.2 Iris-setosa
37 4.9 3.1 1.5 0.1 Iris-setosa
38 4.4 3.0 1.3 0.2 Iris-setosa
39 5.1 3.4 1.5 0.2 Iris-setosa
40 5.0 3.5 1.3 0.3 Iris-setosa
41 4.5 2.3 1.3 0.3 Iris-setosa
42 4.4 3.2 1.3 0.2 Iris-setosa
43 5.0 3.5 1.6 0.6 Iris-setosa
44 5.1 3.8 1.9 0.4 Iris-setosa
45 4.8 3.0 1.4 0.3 Iris-setosa
46 5.1 3.8 1.6 0.2 Iris-setosa
47 4.6 3.2 1.4 0.2 Iris-setosa
48 5.3 3.7 1.5 0.2 Iris-setosa
49 5.0 3.3 1.4 0.2 Iris-setosa

In [295]: c[c['fclass'] == 'Iris-setosa'][['slength','swidth']].head()

Out[295]: slength swidth
0 5.1 3.5
1 4.9 3.0
2 4.7 3.2
3 4.6 3.1
4 5.0 3.6

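Multiple conditions can be combined using the python bitwise operator &. Each condition must be
enclosed in parentheses because & binds more tightly than comparison operators such as == and >.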

In [296]: c[(c['fclass'] == 'Iris-setosa') & (c['slength'] > 5) & (c['swidth'] > 4)

Out[296]: slength swidth plength pwidth fclass
15 5.7 4.4 1.5 0.4 Iris-setosa
32 5.2 4.1 1.5 0.1 Iris-setosa
33 5.5 4.2 1.4 0.2 Iris-setosa

In [297]: c[(c['fclass'].isin(['Iris-setosa','Iris-virginica'])) & (c['slength'] >

Out[297]: slength swidth plength pwidth fclass
15 5.7 4.4 1.5 0.4 Iris-setosa
32 5.2 4.1 1.5 0.1 Iris-setosa
33 5.5 4.2 1.4 0.2 Iris-setosa

In [298]: c[(c['fclass'].isin(['Iris-setosa','Iris-virginica'])) & (c['slength'] >

Out[298]: slength swidth plength pwidth fclass
119 6 2.2 5 1.5 Iris-virginica
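The pattern behind these queries is sketched below on a small hand-made dataframe (toy rows, not the full dataset; the thresholds are chosen only for illustration). Besides & for "and", the sketch uses | for "or" and ~ for "not", which work on masks in the same way.

```python
import pandas as pd

# Toy dataframe with a few rows per condition of interest
df = pd.DataFrame({
    'slength': [5.1, 5.7, 5.2, 6.0, 7.0],
    'swidth': [3.5, 4.4, 4.1, 2.2, 3.2],
    'fclass': ['Iris-setosa', 'Iris-setosa', 'Iris-setosa',
               'Iris-virginica', 'Iris-versicolor'],
})

# "and": every condition in its own parentheses
mask_and = (df['fclass'] == 'Iris-setosa') & (df['slength'] > 5) & (df['swidth'] > 4)

# isin tests membership in a list of values
mask_isin = df['fclass'].isin(['Iris-setosa', 'Iris-virginica']) & (df['slength'] > 5.5)

# "or" and "not"
mask_or = (df['swidth'] > 4) | (df['slength'] > 6.5)
mask_not = ~(df['fclass'] == 'Iris-setosa')

print(df[mask_and])
print(df[mask_isin])
print(df[mask_or])
print(df[mask_not])
```

Each mask is a boolean column with one entry per row, and indexing the dataframe with it keeps the rows where the mask is True.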

It is also possible in pandas to access rows of a dataframe by integer position using the iloc
indexer. For example

In [299]: c.iloc[5:10]

Out[299]: slength swidth plength pwidth fclass
5 5.4 3.9 1.7 0.4 Iris-setosa
6 4.6 3.4 1.4 0.3 Iris-setosa
7 5.0 3.4 1.5 0.2 Iris-setosa
8 4.4 2.9 1.4 0.2 Iris-setosa
9 4.9 3.1 1.5 0.1 Iris-setosa
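Position and index label coincide in the example above because the dataframe has the default index 0, 1, 2, and so on. The sketch below uses a toy dataframe whose labels start at 10 to show the difference; loc, which selects by label and is not covered in these notes, is included only for contrast.

```python
import pandas as pd

# Toy dataframe whose index labels differ from the row positions
df = pd.DataFrame({'slength': [5.1, 4.9, 7.0, 6.4]}, index=[10, 11, 12, 13])

# iloc selects by position; the end of the slice is excluded
by_position = df.iloc[1:3]

# loc selects by label; the end of the slice is included
by_label = df.loc[11:12]

print(by_position)
print(by_label)
```

Both selections return the rows labelled 11 and 12, but they arrive there in different ways, which matters after filtering or sorting a dataframe.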
