AIMS-Python Notes 2016
September 9, 2016
In [1]: students = 50
In [2]: std_in_cls = 30
In [3]: n = 10
In a variable assignment, ‘=’ is the assignment operator. It assigns the value on the right to the name
on the left. In the examples above, the value 50 is assigned to the variable students.
1.3 Numeric Types
1.3.1 Int and Long
Integers store numerical values without fractional parts. Integers therefore have no decimals. An
integer is defined by just assigning a number to a variable without any decimals trailing. Example:
In [4]: students = 50
In [5]: num = 0
In [6]: neg_num = -5
The range of plain integers is limited by the word size of the platform (32 or 64 bits). If this
limit is exceeded during calculations, python automatically converts integers into long integers,
whose size is limited only by the available memory.
In [7]: num = 2
In [8]: num**32-1
Out[8]: 4294967295
In [9]: num**63
Out[9]: 9223372036854775808L
The L at the end of the last calculation indicates that the result of the calculation was more than
the integer data type could accommodate. The resulting value was converted to a long integer
type.
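A minimal sketch of how this limit can be inspected. It assumes sys.maxsize as a stand-in for the native integer bound (Python 2 also exposes sys.maxint; Python 3 has a single unbounded integer type and prints no L suffix). The names native_max, beyond and big are illustrative.

```python
import sys

# Largest value that fits in a native machine word on this platform
# (2**63 - 1 on a typical 64-bit build, 2**31 - 1 on a 32-bit build).
native_max = sys.maxsize

# Going past the native bound does not overflow; python switches to
# arbitrary-precision arithmetic automatically.
beyond = native_max + 1
big = 2 ** 100

print(native_max)
print(beyond)
print(big)
```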
1.3.2 Floats
Floating point numbers are numbers used by the computer to represent real numbers. Since the
real numbers are infinite and the computer is a finite precision device, floats just represent a subset
of the real number system. Floats are defined by adding the decimal point. Examples
In [10]: num = .5
In [11]: num = 5.
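The finite precision mentioned above has a practical consequence, which the following sketch illustrates: some decimal fractions cannot be stored exactly, so floats are better compared with a tolerance. The tolerance 1e-9 and the variable names are assumptions chosen for illustration.

```python
# Floats carry roughly 15-17 significant decimal digits, so some
# decimal fractions cannot be stored exactly.
total = 0.1 + 0.2

exact_match = (total == 0.3)             # False: a tiny rounding error remains
close_enough = abs(total - 0.3) < 1e-9   # True: compare with a tolerance instead

# Both spellings from the examples above create floats.
half = .5
five = 5.

print(exact_match)
print(close_enough)
```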
In [14]: c = 12j
In [15]: c = 12 + 0j
Complex numbers possess certain properties. They have a real part and an imaginary part.
They also have a conjugate, which is the number with the same real part but the sign of the
imaginary part changed. Python can calculate the conjugate of a complex number through a method
and can also return the real and imaginary parts of the number. These are built into the complex
number type as defined in python. They are accessed using the dot operator (.) as in
c.property or c.method(). Methods are invoked with brackets at the end while properties are not.
Essentially, the attributes are variables contained in the object while the methods are functions
built into the object. Example usage of the attributes and methods of the complex type is as follows
In [16]: cmp_num.real
Out[16]: 5.0
In [17]: cmp_num.imag
Out[17]: 25.0
In [18]: cmp_num.conjugate()
Out[18]: (5-25j)
In [19]: cmp_num.real()
---------------------------------------------------------------------------
<ipython-input-19-e9872ef84718> in <module>()
----> 1 cmp_num.real()
In [20]: cmp_num.conjugate
The last problem is not so obvious because python does not spit out an error message. Instead,
it returns the function as the result. The function can be called to get the actual value.
In [22]: test()
Out[22]: (5-25j)
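The difference between attributes and methods can be summarised in one sketch. The value 5 + 25j is assumed from the outputs shown above, and the helper names real_part, imag_part, conj, later and call_failed are illustrative.

```python
# Assumed value: the cells above print (5-25j) for the conjugate.
cmp_num = 5 + 25j

real_part = cmp_num.real        # attribute: no brackets
imag_part = cmp_num.imag        # attribute: no brackets
conj = cmp_num.conjugate()      # method: called with brackets

# Leaving the brackets off returns the method itself, which can be
# stored under another name (test here) and called later.
test = cmp_num.conjugate
later = test()

# Calling an attribute fails, because a float is not callable.
try:
    cmp_num.real()
    call_failed = False
except TypeError:
    call_failed = True
```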
1.4 Strings
Strings are sequences of characters such as letters, digits and punctuation. In python, strings are
defined by placing them between matching quotes. Example
Strings have numerous methods attached to them. Object methods can be called using the dot
operator. Example
In [27]: mystr.capitalize()
In [28]: mystr.upper()
In [29]: mystr.count('s')
Out[29]: 2
In [30]: mystr.replace('string','text')
1.5 Lists
Lists store a group of values so they can be accessed individually or collectively. The values
do not need to be of the same type. A list can be created empty or be populated during creation.
Example
Python also makes it possible to create a numeric sequence as a list using the range function. In
this case, we can create the sequence a + di, i = 0, ..., n, which is a, a + d, a + 2d, ..., a + nd.
This is done with range(a,b,d) where, for a positive step d, b = a + nd + 1; the stop value b itself
is never included.
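A short sketch of the range call described above. The values of a, d and n are illustrative, and list() is wrapped around range so that the same code runs on Python 2 (where range already returns a list) and on Python 3 (where it returns a lazy range object).

```python
# Illustrative values: a = 0, d = 3, n = 4 gives 0, 3, 6, 9, 12.
a, d, n = 0, 3, 4
b = a + n * d + 1            # stop value; range never includes it

# list() is harmless in Python 2 and materialises the lazy
# range object in Python 3.
seq_up = list(range(a, b, d))

# With one argument, range counts from 0 up to (not including) the stop.
seq_default = list(range(11))

# A negative step counts downwards; the stop must then lie below the start.
seq_down = list(range(0, -24, -3))
```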
In [34]: seq
Out[34]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
In [36]: seq2
In [38]: seq3
Note that the range function expects all arguments (inputs to the function) to be integers. You
get an error when that restriction is not respected. Example
---------------------------------------------------------------------------
<ipython-input-39-98221f92311b> in <module>()
----> 1 seq4 = range(0.0,-24,-3)
The elements of lists and other iterables (sequences) can be accessed using their indexes. Se-
quence indexing in python begins from 0. So the first element is index 0, the second is 1, the third
is 2 and so on. To access the first element in seq3, we type
In [40]: seq3[0]
Out[40]: 0
Note that the square brackets used to create a list are also used to index it.
Python generally uses square brackets for indexing, since round brackets () are
reserved for function calls. The len function returns the length of a sequence. This means
it is possible to query even a programmatically created sequence for the number of elements it has.
In [41]: len(seq3)
Out[41]: 8
In [42]: mylist2
Out[42]: [0.0, 'a', (2+4j), 2000, 'learn python', [1, 2, 3, 4, 5, 6, 7]]
mylist2 has different kinds of elements including two strings and a list. To access the elements
of sequences contained in other sequences, as in this case, we use two indices: the first points to
the location of the inner sequence and the second to the location of the element within it. Example
In [43]: mylist2[1][0]
Out[43]: 'a'
In [44]: mylist2[4][6]
Out[44]: 'p'
In [45]: mylist2[5][3]
Out[45]: 4
Python uses the negative index to return elements from the back of the list. In this sense, -1 is
for the last element, -2 for the element before the last and so on.
In [46]: mylist2[-1]
Out[46]: [1, 2, 3, 4, 5, 6, 7]
In [47]: mylist2[-2]
In [48]: seq
In [49]: seq[0:4]
Out[49]: [0, 1, 2, 3]
In [50]: seq[0:4:2]
Out[50]: [0, 2]
In [51]: seq[:4:2]
Out[51]: [0, 2]
In [52]: seq[4::2]
In [53]: seq[4:len(seq):2]
In [54]: seq[::2]
In [55]: seq[len(seq):4:-1]
Out[55]: [10, 9, 8, 7, 6, 5]
In [56]: seq[-1:4:-1]
Out[56]: [10, 9, 8, 7, 6, 5]
In [57]: seq[::]
In [58]: seq[::-1]
Out[58]: [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
In [59]: seq.append(11)
In [60]: seq
To add multiple elements to the end of the list, the extend method is used. It takes a sequence
and appends each of its elements in turn. By contrast, passing a list to append adds that list
as a single element.
In [61]: seq.extend([12,13,14])
In [62]: seq
Out[62]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
In [63]: seq.append([15,16,17])
In [64]: seq
Out[64]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, [15, 16, 17]]
The pop method returns the last element from the list and removes it from the list.
In [65]: seq.pop()
Out[65]: [15, 16, 17]
In [66]: seq
Out[66]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
The index method of lists returns the location of an element. If the element exists, the
method returns the position of its first occurrence; otherwise a ValueError is raised
In [67]: seq.index(6)
Out[67]: 6
In [68]: seq.index(15)
---------------------------------------------------------------------------
<ipython-input-68-416a6e38e5a1> in <module>()
----> 1 seq.index(15)
1.6 Arithmetic Operators
The basic arithmetic operators are available in python. These are
1. Addition - (+)
2. Subtraction - (-)
3. Multiplication - (*)
4. Division - (/)
5. Remainder - (%)
6. Exponentiation (power) - (**)
We also have the abs function for absolute value and the unary negation (-). Here are some
examples of their use
In [73]: x = 2; y = 3; z = 2+5j; t = 4.5
In [74]: x + y
Out[74]: 5
In [75]: y + t
Out[75]: 7.5
In [76]: y + z
Out[76]: (5+5j)
In [77]: t + z
Out[77]: (6.5+5j)
In [78]: t*1j+z
Out[78]: (2+9.5j)
In [79]: y/x
Out[79]: 1
In [80]: y/t
Out[80]: 0.6666666666666666
In [81]: t**y
Out[81]: 91.125
In [82]: y%x
Out[82]: 1
In [83]: cmp_num
Out[83]: (5+25j)
In [84]: cmp_num*cmp_num.conjugate()
Out[84]: (650+0j)
In [85]: (cmp_num*cmp_num.conjugate()).real
Out[85]: 650.0
1.6.1 Operator Precedence
Operator precedence refers to the order in which python performs operations. For the arithmetic
operators, the precedence is as follows (from high to low precedence):
1. ()
2. **
3. Unary -, +
4. *, /, //, %
5. +, -
The operator // is also a division operator. In the strictest sense, it is the integer division
operator. It does floor division. This means it returns the quotient only, no matter what the data
type. Its opposite is the remainder operator (%). Performing the calculation below
In [86]: 2 + 3 * 24 / 6. - 4 % 2**2
Out[86]: 14.0
will first calculate 2 ** 2 which is 4. Then 3*24/6 which is 12. Then 4%4 which is 0. So we get
2+12-0=14. If we want to alter the natural order, then brackets must be introduced. An example is
Out[87]: 400.0
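The evaluation order described above can be checked step by step. In the sketch below the intermediate names (power, product, remainder) and the bracketed example are illustrative.

```python
# Step-by-step evaluation of 2 + 3 * 24 / 6. - 4 % 2**2
power = 2 ** 2                 # ** binds tightest: 4
product = 3 * 24 / 6.          # * and / run left to right: 72 / 6. = 12.0
remainder = 4 % power          # % has the same rank as * and /: 0
result = 2 + product - remainder

# The one-line expression gives the same value.
direct = 2 + 3 * 24 / 6. - 4 % 2**2

# Brackets override the default order.
grouped = (2 + 3) * 24 / 6.    # 5 * 24 / 6. = 20.0
```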
The arithmetic operators + and * also apply to sequences. In such use, the addition operator
is a joining/concatenation operator while the multiplication operator is a repetition operator.
For example, given the lists
In [89]: a + b
In [90]: a * 2
In [92]: a + b
Out[92]: 'WelcomePython'
1.7 Python blocks
In python, a block refers to a group of statements that collectively perform a given task. Different
programming languages define statement blocks in different ways. Some use curly braces {},
others have explicit end statements for them. Python recognizes blocks by levels of indentation.
This is a very important feature in python. One out-of-place indent can result in python raising
an error or, worse still, giving you wrong values. All statements in a block must have the same
indentation level. The conventional indentation in python is 4 spaces per level. ipython automatically
sets indentation when a block is detected. Indentation can also be entered using the tab key on the
keyboard; the tab width can be configured in most text editors. A block definition in python is as follows:
block header:
    block statement 1
    block statement 2
    block statement 3
    ...
    block statement n
other statements
In this definition, block header can be a function header, while header, for header, etc. ‘other
statements’ is not part of the block. It will be run after the block execution completes. ‘block
statement 1’ to ‘block statement n’ are all part of the block. If we had mistakenly typed
block header:
    block statement 1
  block statement 2
    block statement 3
    ...
    block statement n
other statements
the result will be an error because the indentation of ‘block statement 2’ does not match the
indentation of the rest of the block. The general definition of a block therefore includes the header,
followed by a colon, and the statements within the body, all indented. For a one line block, the
block statement can immediately follow the colon as in
block header: block statement
In [94]: mylist
In [95]: list_res = []
for i in mylist:
    list_res.append(i**2)
print list_res
1.9 Boolean values and comparison operators
Boolean values evaluate to True or False. Python provides these two values for evaluating
conditional statements. Note that case sensitivity rules apply here. The first letter of each
boolean value is capitalized.
In [96]: a = True
In [97]: b = False
In [98]: c = true
---------------------------------------------------------------------------
<ipython-input-98-1f00ce3411f8> in <module>()
----> 1 c = true
When considering boolean values, python considers the following as false: False, None, an empty
string '', an empty list [], and the zeros 0, 0.0 and 0 + 0j. Nonzero values and nonempty sequences
are considered true. Python uses comparison operators to evaluate truth values. The comparison
operators are:
1. Equal - ==
2. Not equal - !=
3. Greater than - >
4. Greater than or equal to - >=
5. Less than - <
6. Less than or equal to - <=
7. Identity - is
In addition, sequences support the ‘in’ operator to query whether an object is an element
of the sequence. Python also provides the operators not, and, or for combining comparisons.
not negates the truth value of a statement, while and and or combine two truth statements.
and evaluates to true only if both statements are true; otherwise it evaluates to false. or evaluates
to false only if both statements are false; otherwise it evaluates to true.
In [100]: a == b
Out[100]: False
In [101]: a > b
Out[101]: True
In [102]: a >= b
Out[102]: True
In [103]: a <= b
Out[103]: False
In [104]: b <= c
Out[104]: True
In [105]: 'AIMS' in d
Out[105]: True
In [106]: e == d
Out[106]: True
In [107]: e is d
Out[107]: True
In [108]: a == c
Out[108]: True
In [109]: d == f
Out[109]: True
In [110]: d is f
Out[110]: False
In [111]: a == b or a > b
Out[111]: True
In [112]: d is not f
Out[112]: True
In [113]: d == f and d is f
Out[113]: False
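The difference between == (equal value) and is (same object), together with in, and, or and not, can be seen in a self-contained sketch. The contents of d are an assumption made for illustration; e and f are built from it.

```python
d = ['AIMS', 'Python', 2016]   # illustrative list
e = d                          # second name for the same object
f = list(d)                    # new list with equal contents

same_value = (d == f)          # True: contents are equal
same_object = (d is f)         # False: two separate objects
alias = (e is d)               # True: e and d name one object

member = 'AIMS' in d                     # True
combined = (d == f) and (d is not f)     # True
either = (d is f) or (e is d)            # True
negated = not member                     # False
```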
1.10 The while loop
The while statement is another block statement. Unlike for loops, however, while is not limited to
working with sequences. The while statement will continue running while a certain condition is
true. The format of the while statement is
while condition:
    statement 1
    statement 2
    ...
    statement m
The condition is checked before each pass through the body; as soon as it evaluates to false, the
loop terminates. If there is no provision for the condition to become false, we have what we call an
infinite loop, where the loop runs until it is forced to terminate using either a keyboard interrupt
or by the operating system. This is not an acceptable use of the while loop. In coding the while loop,
therefore, it is important to provide a means for the loop to terminate within your code. The
following is an example:
In [114]: seq
Out[115]: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
In [116]: i = 0
while i < len(seq):
    seq_sq[i] = seq[i]**2
    i = i + 1
seq_sq
Out[116]: [0, 1, 4, 9, 16, 49, 64, 81, 100, 121, 144, 169, 196]
In the code above, i is initialized to zero. Then the while loop checks if i is less than the length
of seq. If it is, we enter the loop body where the value of seq at index i is squared and assigned to
seq_sq at index i. To ensure that i eventually becomes greater than or equal to len(seq), we
increment i. The same thing can be done the following way
Out[117]: [0, 1, 4, 9, 16, 49, 64, 81, 100, 121, 144, 169, 196]
In the code, i = i + 1 just says to increment the value of i by 1 and assign the value back to the
variable i. In the same way, i = i - 1 says to decrease the value of i by 1 and assign the value back
to i. There are shorter ways of writing these statements:
1. i += 1 (i = i + 1) or i += n (for i = i + n)
2. i -= n (i = i - n)
3. i *= n (i = i * n)
4. i /= n (i = i / n)
5. i **= n (i = i ** n)
6. i %= n (i = i % n)
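The shorthand forms above can be traced with a few lines. The starting values are illustrative, and a float is used for the division so that the result is the same on Python 2 and Python 3.

```python
i = 10
i += 5       # i = i + 5   -> 15
i -= 3       # i = i - 3   -> 12
i *= 2       # i = i * 2   -> 24
i %= 7       # i = i % 7   -> 3
i **= 2      # i = i ** 2  -> 9

j = 9.0
j /= 2       # j = j / 2   -> 4.5
```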
1.11 if statements
The if statement is another block with the format:
if condition:
    if_statement 1
    if_statement 2
    ...
    if_statement n
The whole block is executed only if condition evaluates to true. If not, the block is skipped.
The statement can be extended to perform another set of actions if condition is false, using the
format:
if condition:
    if_statement 1
    if_statement 2
    ...
    if_statement n
else:
    else_statement 1
    else_statement 2
    ...
    else_statement m
In this case, the else block is executed if condition is false. The if statement can be used to
evaluate multiple conditions using elif, which is the short form of else if. In that case, we have
the format:
if condition_1:
    if_statement 1
    if_statement 2
    ...
    if_statement n
elif condition_2:
    elif1_statement 1
    elif1_statement 2
    ...
    elif1_statement p
elif condition_3:
    elif2_statement 1
    elif2_statement 2
    ...
    elif2_statement s
...
elif condition_x:
    elifx_statement 1
    elifx_statement 2
    ...
    elifx_statement t
else:
    else_statement 1
    else_statement 2
    ...
    else_statement m
Note that in evaluating multiple conditions, the first condition that evaluates to true is used;
python does not go further to check if there is a better matching condition. It is up to the
programmer to ensure that the best condition is caught first. In the first example, we output a pass
or fail based on a student mark being greater than or equal to 60.
In [118]: a = 75
if a >= 60:
    print 'pass'
else:
    print 'fail'
pass
We now make it a bit more challenging. We introduce the following grading scheme
Good Pass
In [120]: if a < 60:
    print 'Fail'
elif a < 70:
    print 'Pass'
elif a < 85:
    print 'Good Pass'
else:
    print 'Distinction'
Good Pass
In the first example, if the value provided is greater than or equal to 85, the grade is distinction;
if not, we check to see if it is greater than or equal to 70. Since the first condition takes care of
values of 85 or higher, anything that passes to the second condition is naturally less than 85. The
third condition is evaluated only if the first two evaluate to false. This means there is no way a
number will fall to the third condition if it is 70 or higher. If the conditions are ordered wrongly,
the results can be wrong. Take the following example:
Pass
Although a = 75, the student is graded wrongly, because the value of a satisfies both the first
and the second condition, but python picks the first true evaluation it finds and that gives a pass.
If we rewrite this example using the and operator with the beginning and end of each range, we get:
Good Pass
This works since the ranges of values are well defined, so the operation succeeds no matter
the order. Python can also chain comparison operators to test a range directly. The
above code in this case can be written as:
In [123]: if 60<=a<70:
    print 'Pass'
elif 70<=a<85:
    print 'Good Pass'
elif 85<=a<=100:
    print 'Distinction'
else:
    print 'Fail'
Good Pass
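The effect of condition order discussed above can be reproduced with a short sketch. The thresholds follow the code above; the names wrong and right are illustrative, and the results are stored in variables rather than printed so that the code runs on both Python 2 and Python 3.

```python
a = 75

# Badly ordered: 75 >= 60 is tested first, so the later branches never run.
if a >= 60:
    wrong = 'Pass'
elif a >= 70:
    wrong = 'Good Pass'
elif a >= 85:
    wrong = 'Distinction'
else:
    wrong = 'Fail'

# Well ordered: the most restrictive condition comes first.
if a >= 85:
    right = 'Distinction'
elif a >= 70:
    right = 'Good Pass'
elif a >= 60:
    right = 'Pass'
else:
    right = 'Fail'
```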
In [124]: a = (1,2,3,4,5,6,7)
In [125]: a[2]
Out[125]: 3
1.12.2 Dictionaries
A dictionary is a python structure that holds key-value pairs. A dictionary is created using
curly braces {}. Key-value pairs are separated from each other by commas, and each key is separated
from its value by a colon. An example of a dictionary is
In [127]: reg_cap
In [129]: reg_pop
In [131]: reg_size
Dictionary elements are accessed by key. This means, when using dictionaries, instead of
entering an index in the square brackets, you enter the key and python returns its value. If the key
does not exist, python raises a KeyError.
Out[132]: 'Accra'
In [133]: reg_size['Ashanti']
Out[133]: 24389
In [134]: reg_size['Komenda']
---------------------------------------------------------------------------
<ipython-input-134-c92c314160be> in <module>()
----> 1 reg_size['Komenda']
KeyError: 'Komenda'
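A minimal sketch of key lookup and of two ways to deal with a missing key. The dictionary contents here are illustrative assumptions, not the full tables used in these notes, and the get method with a default value is one common alternative to catching the error.

```python
# Illustrative data; the keys and values are assumptions.
reg_cap = {'Greater Accra': 'Accra', 'Ashanti': 'Kumasi'}

capital = reg_cap['Ashanti']          # look up a value by key

# A missing key raises KeyError, which can be caught ...
try:
    reg_cap['Komenda']
    found = True
except KeyError:
    found = False

# ... or avoided with get(), which returns a default instead.
fallback = reg_cap.get('Komenda', 'unknown')
```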
Assuming the capital of Greater Accra is changed to Adenta, we can effect the change by typing
In [136]: reg_cap
If Greater Accra were not a key in the dictionary, this would add a new key-value pair to the
dictionary. A dictionary in this case is just a lookup table that returns the value for a
corresponding key, such as a table of constants. We can iterate through the key-value pairs using a
for loop:
The keys and values of a dictionary can be extracted using the appropriately named methods
keys() and values() respectively. The results are returned as lists.
In [139]: print reg_cap.values()
The existence of a key can be checked using the has_key() method of the dictionary object.
In [140]: reg_cap.has_key('Accra')
Out[140]: False
Out[141]: True
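Updating, iterating and key testing can be put together in one sketch. The data is illustrative. The in operator is used for the key test because has_key() exists only in Python 2, and sorted() is applied to keys() and values() so that the result is a list in a predictable order on both Python 2 and Python 3.

```python
reg_cap = {'Greater Accra': 'Accra', 'Ashanti': 'Kumasi'}  # illustrative data

reg_cap['Greater Accra'] = 'Adenta'   # existing key: value is replaced
reg_cap['Volta'] = 'Ho'               # new key: pair is added

# Iterating over a dictionary yields its keys.
pairs = []
for region in reg_cap:
    pairs.append((region, reg_cap[region]))

regions = sorted(reg_cap.keys())
capitals = sorted(reg_cap.values())

# The in operator tests for a key.
has_volta = 'Volta' in reg_cap
has_accra = 'Accra' in reg_cap        # False: 'Accra' was a value, not a key
```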
In [142]: seq
In [144]: seq
In this example, we changed the value of the first element of the list seq to 1000. If we try the
same for the tuple a, we have
In [145]: a
Out[145]: (1, 2, 3, 4, 5, 6, 7)
---------------------------------------------------------------------------
<ipython-input-146-080be1ca53ac> in <module>()
----> 1 a[0] = 1000
As shown above, we get an error stating that tuples do not support item assignment. In python,
objects whose values cannot be changed once they are created are said to be immutable. Examples are
tuples and strings. For strings, we also have
In [147]: mystr
---------------------------------------------------------------------------
<ipython-input-148-02bb48027c20> in <module>()
----> 1 mystr[0] = 'x'
Objects whose values can be changed after creation are said to be mutable.
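The contrast between mutable and immutable objects can be condensed into a short sketch. The value of mystr and the names errors, new_a and new_str are illustrative. The last two lines show the usual workaround for immutable objects, which is to build a new object from pieces of the old one.

```python
seq = [0, 1, 2, 3]            # list: mutable
a = (1, 2, 3, 4, 5, 6, 7)     # tuple: immutable
mystr = 'my string'           # string: immutable (illustrative value)

seq[0] = 1000                 # allowed, the list changes in place

# Item assignment on a tuple or a string raises TypeError.
errors = []
for obj, value in ((a, 1000), (mystr, 'x')):
    try:
        obj[0] = value
    except TypeError:
        errors.append(type(obj).__name__)

# Immutable objects are "changed" by building a new object instead.
new_a = (1000,) + a[1:]
new_str = 'x' + mystr[1:]
```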
Out[150]: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32
[(0, 10), (0, 9), (0, 8), (0, 7), (0, 6), (0, 5), (0, 4), (0, 3), (0, 2), (0, 1), (
List comprehensions can include conditional statements. Let us modify a above so the list
comprehension only has the elements with x and y even
[(0, 10), (0, 8), (0, 6), (0, 4), (0, 2), (0, 0), (2, 10), (2, 8), (2, 6), (2, 4),
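A sketch of list comprehensions that produce output of the kind shown above. The upper limits (16 powers, x running over 0, 1, 2) are assumptions, since the truncated output does not show where the lists end.

```python
# Powers of two (the upper limit 16 is an assumption).
powers = [2 ** i for i in range(16)]

# All (x, y) pairs with y counting down from 10 (the x range is an assumption).
pairs = [(x, y) for x in range(3) for y in range(10, -1, -1)]

# Adding an if clause keeps only the pairs where both x and y are even.
even_pairs = [(x, y) for x in range(3) for y in range(10, -1, -1)
              if x % 2 == 0 and y % 2 == 0]
```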
Examples are:
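A minimal sketch, assuming the one-line functions are written with python's lambda keyword. The function f is illustrative, and sign is written to match the description that follows.

```python
# A one-line function of a single variable.
f = lambda x: x ** 2 + 1

# A conditional expression nested inside another one.
sign = lambda x: 1 if x > 0 else (-1 if x < 0 else 0)
```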
The expression used inside the sign function is an example of a conditional expression, also called
a ternary operator. This evaluates x and returns 1 if x > 0; if not, it goes into the else expression,
where we have another ternary operator that tests whether x < 0. More complex functions are defined
using python’s def statement. def has the following syntax:
def funcname(var_list):
    statement_1
    statement_2
    ...
    statement_n
    return statement
def is a block statement. This means the rules of working with python blocks apply. The def
statement is part of the block heading which ends in a colon. The body of the block is indented.
The return statement transfers control back to the calling function. The return statement can be just
the return statement or return followed by an object that is returned as the result of the function
evaluation. Example:
def g(x):
    return x**2 + 2*x +1
def h(x,y):
    return x**y
def sign(x):
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0
def greet(name):
    print 'hello ' + name
    return
The function greet above just ends with a return with nothing following. In python, we say
the greet function returns None. This means the result of greet can still be assigned to a variable
just that the variable will contain None. The following illustrates this concept:
In [155]: c = greet('John')
print 'c = ',c
print 'type of c is = ', type(c)
hello John
c = None
type of c is = <type 'NoneType'>
In [156]: s = sign
type(s)
Out[156]: function
In [157]: s(4)
Out[157]: 1
In the definition above, s = sign points the name s to the same function object as the name sign.
Any time s is called, it just runs the code within the sign function.
A module in python is a file that contains python code. Usually, it contains a set of functions and
variables. A module can be imported into the current working environment to make its functions and
variables available to the user. So far, we are only able to perform elementary mathematical
operations. If we want to find the sine of 3, we will get an error:
In [158]: sin(3)
---------------------------------------------------------------------------
<ipython-input-158-eb391534dd1e> in <module>()
----> 1 sin(3)
The sin function is available in the math and cmath modules. Individual functions can be
imported into the default namespace using the syntax:
Out[159]: -0.8488724885405782
In [160]: sqrt(-1)
---------------------------------------------------------------------------
<ipython-input-160-e94865f03ce3> in <module>()
----> 1 sqrt(-1)
Out[161]: (-0.8488724885405782-0j)
In [162]: sqrt(-1)
Out[162]: 1j
As can be seen above, the functions in the math module work with floating point numbers
and return floating point numbers, while the functions in cmath work with complex numbers. To
list the names available in the default namespace, the function to use is
In [163]: print dir()
['In', 'Out', '_', '_100', '_101', '_102', '_103', '_104', '_105', '_106', '_107',
As can be seen above, the name __builtin__ in the default namespace is itself a space with
multiple names, that is, another namespace. A module can be imported into its own namespace using
either of the statements
import modulename
import modulename as alias
The first statement imports the module into the namespace modulename while the second statement
imports the module into the namespace alias.
In [165]: import math
import cmath as cm
In [166]: print dir(math)
['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh', 'asin', 'asinh'
It is possible to import all names from a module into the default namespace using the statement
from modulename import *
Apart from certain special exceptions (such as the pylab import), this method is not considered good
programming style, because newly imported names silently overwrite existing names with the same
spelling, thereby polluting the namespace.
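The three import styles and the difference between math and cmath can be compared in one sketch. The variable names (real_sine, same, real_root, complex_root) are illustrative.

```python
import math
import cmath as cm
from math import sin           # sin is now usable without a prefix

real_sine = sin(3)             # same function as math.sin
same = (real_sine == math.sin(3))

# math works on real numbers only and rejects sqrt(-1) ...
try:
    math.sqrt(-1)
    real_root = True
except ValueError:
    real_root = False

# ... while cmath returns a complex result.
complex_root = cm.sqrt(-1)
```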
In [168]: import numpy as np
In [169]: int(3.5)
Out[169]: 3
In [170]: float(3)
Out[170]: 3.0
In [171]: complex(3)
Out[171]: (3+0j)
In [172]: str(3+4j)
Out[172]: '(3+4j)'
Out[173]: ['I', ' ', 'l', 'o', 'v', 'e', ' ', 'p', 'y', 't', 'h', 'o', 'n']
In [174]: tuple([1,2,3,4,5])
Out[174]: (1, 2, 3, 4, 5)
One can also create an array by casting a sequence type, usually a list or a tuple. Example
In [175]: np.array((1,2,3,4,5))
In [176]: np.array([1,2,3,4,5])
Numpy also has special array creating functions. These include arange, linspace, zeros, ones,
empty.
The arange function is similar to python’s range function. The difference is that arange accepts
floating point arguments. The format however is still the same, arange(start,stop,step), and
as with range the stop value is not included in the result.
linspace is similar to arange but instead of the distance between successive numbers, linspace
takes how many numbers to calculate and it determines the step size from that. The syn-
tax is linspace(start,stop,num). num defaults to 50 elements. In the case of linspace, the end-
point is included in the values returned. This behaviour can be changed using the syntax
linspace(start,stop,num,endpoint=False) instead.
zeros(n) creates an array of all zeros with length n while ones(n) creates an array of all ones
with length n. empty(n) creates an array of length n. The catch here is that the elements of the
array are uninitialized and so can be anything. Arrays created using empty must be
populated by the user before values are read.
In [177]: np.arange(5.0)
In [178]: np.arange(0,4,.25)
In [179]: np.linspace(0,4,9)
In [180]: np.zeros(5)
In [181]: np.ones(5)
In [182]: np.empty(5)
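The endpoint behaviour of arange and linspace is easiest to see by inspecting the last element of each result. The sketch below assumes numpy is installed and imported as np; the variable names and the fill value 7.0 are illustrative.

```python
import numpy as np

a = np.arange(0, 4, .25)                       # step given, stop value 4 excluded
lin = np.linspace(0, 4, 9)                     # count given, stop value 4 included
lin_open = np.linspace(0, 4, 8, endpoint=False)  # stop value excluded

z = np.zeros(5)
o = np.ones(5)
e = np.empty(5)      # contents are arbitrary until assigned
e[:] = 7.0           # fill every element before reading
```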
Numpy arrays can be multidimensional and their dimensions can be changed after creation.
The shape property of the array returns the number of elements in each dimension of the array.
The total number of elements in the array is the product of the entries in the shape tuple.
In [183]: c = np.linspace(0,1,21)
c.shape
Out[183]: (21,)
In [185]: c
In [186]: c = c.reshape((7,3))
c
Out[186]: array([[ 0. , 0.05, 0.1 ],
[ 0.15, 0.2 , 0.25],
[ 0.3 , 0.35, 0.4 ],
[ 0.45, 0.5 , 0.55],
[ 0.6 , 0.65, 0.7 ],
[ 0.75, 0.8 , 0.85],
[ 0.9 , 0.95, 1. ]])
c now has 7 rows and 3 columns. It is important that when the shape of an array is changed,
the elements must all be accounted for. If the change of shape will result in a change to the number
of elements in the array, python raises an exception (error).
---------------------------------------------------------------------------
<ipython-input-187-4424930b2dbb> in <module>()
----> 1 c.shape = 3,3
Numpy is however smart enough to fill in certain information if required. If the size of any one
dimension is given as -1, numpy understands this to mean that it should calculate that one value.
Example
In [188]: c.shape = 1, -1
c.shape
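A sketch of both cases, a shape that does not fit and a shape with one dimension left for numpy to work out. It assumes numpy is imported as np and uses the reshape method, which returns a new array object, instead of assigning to c.shape; the names fits, row and cols are illustrative.

```python
import numpy as np

c = np.linspace(0, 1, 21).reshape((7, 3))

# 21 elements cannot fill a 3x3 array, so this fails.
try:
    c.reshape((3, 3))
    fits = True
except ValueError:
    fits = False

# -1 asks numpy to work out that dimension: 21 / 1 = 21 columns.
row = c.reshape((1, -1))
cols = c.reshape((-1, 7))       # 21 / 7 = 3 rows
```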
The data type of the array can equally be changed after creation. The dtype property returns
the data type of the array
In [189]: c.dtype
Out[189]: dtype('float64')
In [190]: c
Out[190]: array([[ 0. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 ,
0.45, 0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85,
0.9 , 0.95, 1. ]])
In [191]: c.astype('complex')
Out[191]: array([[ 0.00+0.j, 0.05+0.j, 0.10+0.j, 0.15+0.j, 0.20+0.j, 0.25+0.j,
0.30+0.j, 0.35+0.j, 0.40+0.j, 0.45+0.j, 0.50+0.j, 0.55+0.j,
0.60+0.j, 0.65+0.j, 0.70+0.j, 0.75+0.j, 0.80+0.j, 0.85+0.j,
0.90+0.j, 0.95+0.j, 1.00+0.j]])
Note that in the code c.astype(‘complex’), the value of c itself is not changed unless the
result of the statement is assigned back to c. The zeros, ones and empty functions can also be
used to create multidimensional arrays. To do that, the argument must be a tuple of integers. Each
number in the tuple is the number of elements in a particular dimension. Example
In [192]: np.zeros((2,4))
Assigning one array to another name does not create a new array in memory as one might
think. Given our existing array c, if we do
In [193]: b = c
One would expect that if we alter the first element in b, c will remain unchanged so we can
revert to it if we so wish but what happens can be seen below
In [194]: b[0,0] = 2
b
Out[194]: array([[ 2. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 ,
0.45, 0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85,
0.9 , 0.95, 1. ]])
In [195]: c
Out[195]: array([[ 2. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 ,
0.45, 0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85,
0.9 , 0.95, 1. ]])
Altering b also altered c because assignment in python does not copy an object; it makes the two
names refer to the same array in memory. One can however force python to make a copy in memory
using the copy method of the array instance.
In [196]: d = b.copy()
In [197]: d[0,0] = 0
d
Out[197]: array([[ 0. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 ,
0.45, 0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85,
0.9 , 0.95, 1. ]])
In [198]: b
Out[198]: array([[ 2. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 ,
0.45, 0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85,
0.9 , 0.95, 1. ]])
2.1 Array indexing and slicing
Indexing and slicing as learnt in lists still work for arrays in one dimension. For multidimensional
arrays for instance
In [199]: c = np.arange(0,20).reshape((5,4))
c
which is a 2D array, we can index in all dimensions. The indexing format is
arrayname[dim0,dim1,...,dimn]. So to access the element in row 0 (counted vertically) and
column 1 (counted horizontally), we type
In [200]: c[0,1]
Out[200]: 1
We can equally use the slice notation for each dimension. To extract the subarray of c from row
3 downwards and from column 2 to the right, we can type
In [201]: c[3:,2:]
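A sketch showing what these index expressions select, assuming numpy is imported as np. The names element, sub, row and col are illustrative; a bare colon selects everything along that dimension.

```python
import numpy as np

c = np.arange(0, 20).reshape((5, 4))

element = c[0, 1]        # row 0, column 1
sub = c[3:, 2:]          # rows 3 and 4, columns 2 and 3
row = c[1, :]            # the whole of row 1
col = c[:, 0]            # the whole of column 0
```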
Once numpy is imported, the user has access to a large number of functions for array opera-
tions. The following is an example
In [202]: x = np.linspace(-5,5,1001)
fx = x**3+3*x**2+3*x+1
fx2 = np.exp(-x**2)*np.sin(x)+np.cos(x)
fx[0]
Out[202]: -64.0
As can be seen, there is no need to iterate through each element to perform array operations.
This is called vectorization. numpy is smart enough to perform the operation on each element of
the array. So the first line creates the discrete interval x ∈ [−5, 5] with 1001 discrete points. The
second line applies the equation
$x^3 + 3x^2 + 3x + 1$
to each point in x and assigns the result to the variable fx. The third line applies the equation
$e^{-x^2} \sin(x) + \cos(x)$
to x and assigns the result to fx2, and the last line returns the value of the first element of fx.
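To make the idea of vectorization concrete, the sketch below computes the polynomial once as a single array expression and once with an explicit loop, then checks that the results agree. It assumes numpy is imported as np; fx_loop and same are illustrative names, and the loop is shown only for comparison.

```python
import numpy as np

x = np.linspace(-5, 5, 1001)

# Vectorized: one expression is applied to all 1001 points.
fx = x**3 + 3*x**2 + 3*x + 1

# Equivalent explicit loop over the indices of x.
fx_loop = np.empty(len(x))
for i in range(len(x)):
    fx_loop[i] = x[i]**3 + 3*x[i]**2 + 3*x[i] + 1

same = np.allclose(fx, fx_loop)
```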
It is also possible to create standard 2D arrays using some matrix type operations. Examples
follow:
In [203]: I = np.eye(4) #This creates the standard nxn identity matrix
I
In [204]: zero = np.zeros((4,3)) #To use zeros and ones for high dimensional matric
# the size must be a tuple
one = np.ones((5,5))
zero
In numpy, arrays can be taken not just as the traditional matrix but as functions or intervals
and therefore can be used in operations as such. For example in the code
x = np.linspace(-5,5,1001)
fx = x**3+3*x**2+3*x+1
each element in x is a point in the discrete interval [-5,5] and each index in fx is the same index
in x with the equation applied. So x in this case is the interval or the independent variable and fx
is the numerical result f(x).
Working with large arrays is usually easier when it is possible to visualize results. The
matplotlib module/library makes that possible. It is however also possible to use python in
‘scientific mode’ in ipython or jupyter. The module pylab contains the numpy and matplotlib
libraries as a full scientific package. Jupyter, or in recent versions ipython, also has the %pylab
magic command which essentially prepares the ipython frontend for plotting and imports the numpy and
matplotlib libraries into the interactive namespace. This is usually done once and the syntax is
%pylab backend
where backend can be one of qt, gtk, wx, inline, which determines the graphics API used by
matplotlib for plotting. inline is a special backend that shows all your plots within the notebook
or console and only works in the notebook or the qtconsole. Once pylab is started with a plotting
backend, switching backends usually means restarting the kernel.
/opt/anaconda2/lib/python2.7/site-packages/IPython/core/magics/pylab.py:161: UserWa
`%matplotlib` prevents importing * from pylab and numpy
"\n`%matplotlib` prevents importing * from pylab and numpy"
The matplotlib module has an associated magic command
%matplotlib backend
which prepares the notebook for plotting. Note that this does not import anything:
numpy and matplotlib must be imported separately before they can be used. With this method
it should also be noted that the magic command must be issued before matplotlib is imported.
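Outside ipython the magic commands are not available, so a plain script has to select the backend and import the libraries explicitly. The following is a minimal sketch of that route. The Agg backend (which draws to memory instead of a window) and the names plt, fig and ax are choices made here for illustration, not part of the notes.

```python
import matplotlib
matplotlib.use("Agg")            # choose the backend before pyplot is imported
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 1001)
fx = x**3 + 3*x**2 + 3*x + 1

fig, ax = plt.subplots()         # one figure holding one set of axes
lines = ax.plot(x, fx)           # plot returns a list of line objects
```

With explicit imports every function is qualified by its namespace (np or plt), which avoids name clashes between the two libraries.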
Once the pylab magic command is issued, the contents of the numpy and matplotlib modules
are loaded into the interactive namespace, which means there is no longer any need to qualify functions
from the numpy module with np. Everything is inside the
interactive namespace. We can now use functions like
In [206]: c = array([1,2,3,4])
c
Once pylab is loaded, our function fx can be plotted with x using the command
In [207]: plot(x,fx)
The plot function is useful for the visualization of 1D functions. The syntax is
plot(x,y)
where x is the x-axis variable and y is the y-axis variable. The number of elements in x must
be the same as that in y. A point in x and a corresponding point in y make up the (x,y) pair for a
point. In case the x variable is not available or is just the sequence 0,1,2,3,4,etc, the command
plot(y)
can be used. In this case, matplotlib automatically generates the x values 0, 1, 2, ... Plots in matplotlib
can be decorated in many ways. These include:
1. Axis labels
2. Plot titles
3. Text within plots
4. Markers, colours and line styles, etc.
In the examples that follow, we will apply some of
matplotlib's plot enhancement methods.
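To see which x values are generated when only y is passed, the line object returned by plot can be inspected. This is a small sketch with made-up y values; the Agg backend and the explicit plt and ax names are assumptions made so that it runs as a script.

```python
import matplotlib
matplotlib.use("Agg")             # draw off-screen; an assumption for script use
import numpy as np
import matplotlib.pyplot as plt

y = np.array([2.0, 4.0, 8.0, 16.0])   # illustrative data
fig, ax = plt.subplots()
line = ax.plot(y)[0]              # only y is given
xdata = line.get_xdata()          # x values generated by matplotlib
print(list(xdata))
```

The generated x values are simply the indices of y.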
The color of lines can be controlled using the color keyword, an example is
In [208]: plot(x,fx,color='red')
The color value can take on string values like red, green, blue, black, white, yellow, cyan,
magenta, pink, brown, purple, etc. The color can equally be given as a one-letter string, for example
r (red), g (green), b (blue), k (black), w (white), c (cyan), etc. So we can plot using a
black line by the command
In [209]: plot(x,fx,color='k')
It is also possible to define the color as an (r,g,b) tuple, where r, g and b are floating point values in the interval
[0,1], or as a hexadecimal string #RRGGBB, where RR, GG and BB are two-digit hexadecimal numbers. Examples
are
In [210]: plot(x,fx,color=(.8,1.,.2))
In [211]: plot(x,fx,color='#FF3EAA')
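The different ways of writing a color all end up as an (r,g,b) triple. The sketch below converts a few of them with the to_rgb helper from matplotlib.colors; it assumes matplotlib 2.0 or later, where that helper is available, and the variable names are only illustrative.

```python
from matplotlib import colors

named = colors.to_rgb('red')            # full colour name
short = colors.to_rgb('r')              # one-letter abbreviation
tup = colors.to_rgb((.8, 1., .2))       # an (r,g,b) tuple is passed through
hexa = colors.to_rgb('#FF3EAA')         # each hex pair is divided by 255

print(named, short)
print(hexa)
```

Here 'red' and 'r' give the same triple, and the hexadecimal pair 3E (62 in decimal) becomes 62/255 in the green channel.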
Marker types can also be controlled within a plot. This is done using the marker keyword.
Marker values include * for a star, . for a dot, o for a circle, p for a pentagon, ^ for a triangle
pointing up, v for a triangle pointing down, h and H for hexagons, d and D for diamonds, < for a triangle pointing left,
> for a triangle pointing right, 1, 2, 3, 4 for tri-shaped markers, _ for an underscore (horizontal line) and | for a vertical bar.
The color of the marker can also be controlled using the markerfacecolor and markeredgecolor
keywords, which accept the color values already discussed for color.
The special value none can be used to specify no colour for that particular keyword.
The markersize keyword makes it possible to change the size of the marker. It
takes numeric values. An example is
In [216]: plot(x[::4],fx[::4],marker = 'o', markerfacecolor = 'none', markeredgecol
The linestyle keyword allows for the use of different line styles within a plot. It takes values
such as - or solid for a solid line, -- or dashed for a dashed line, : or dotted for a dotted line, -.
or dashdot for a dash-dot line, or the special value 'none' for no line. Examples are
In [219]: plot(x,fx,linestyle = 'dotted')
Controlling the width of the line can be done using the linewidth keyword which like the
markersize keyword takes on a numeric value.
In [220]: plot(x,fx,linewidth = 5)
Descriptive text can be added to the axes using the xlabel and ylabel functions. The title
function adds a title to the plot. Example
In [221]: plot(x,fx)
xlabel(r'$x$-axis')
ylabel(r'$y$-axis')
title(r'Plot of $x^3+3x^2+3x+1$')
The r at the beginning of the text marks it as a raw string, so python does not
process backslash escape sequences in it. This is especially important when the string holds latex code
containing backslashes: since the backslash has a special meaning for python, we
use a raw string so that python leaves the string untouched and sends it directly to the function that
uses it. In our example, the dollar signs ($) enclose latex code.
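The effect of the r prefix can be seen without any plotting. In this small sketch the label \theta is a made-up example (it does not appear in the notes), chosen because \t is a tab escape in an ordinary string.

```python
plain = 'a\tb'          # \t is turned into a single tab character
raw = r'a\tb'           # backslash and t are kept as two characters
print("%d %d" % (len(plain), len(raw)))   # prints: 3 4

bad_label = '$\theta$'    # python replaces \t by a tab, corrupting the latex
good_label = r'$\theta$'  # the backslash reaches matplotlib unchanged
print(good_label)
```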
Matplotlib allows text to be placed within a plot using the text function.
In its simplest form, the syntax is
text(x,y,string)
where x and y give the point, in data coordinates, at which to place the text and string is the text to place at that location.
The fontsize keyword takes a numeric value and sets the font size in points. The
color keyword changes the color of the text and the rotation keyword rotates it by an angle in degrees. The fontsize and color arguments also
apply to the title, xlabel and ylabel functions.
In [222]: plot(x,fx)
xlabel(r'$x$-axis')
ylabel(r'$y$-axis')
title(r'Plot of $x^3+3x^2+3x+1$')
text(-1,0,'root (-1,0)',color = 'red', rotation = 45)
grid('on')
The grid function turns the grid on and off. It is possible to include legends in charts. This is
achieved by adding labels to the individual plots and invoking the legend function.
In [223]: x = linspace(-2*pi,2*pi,1001)
fx1 = cos(x)
fx2 = sin(x)
fx3 = 0.5*(exp(-x**2)*sin(x) + cos(x))
In [224]: plot(x,fx1,'r--',label=r'$\cos(x)$')
plot(x,fx2,'b-',label=r'$\sin(x)$')
plot(x,fx3,'k:',label=r'$0.5\left(e^{-x^2}\sin(x)+\cos(x)\right)$')
legend(loc = 'best')
The loc option combines upper, center or lower for the vertical position with left, center or
right for the horizontal position, giving options like 'upper left', 'center right', 'center', etc.; 'best' lets matplotlib choose. We
can use xlim and ylim to restrict or expand the range shown on each axis. The syntax is
xlim(x1,x2)
ylim(y1,y2)
where x1 is the left endpoint and x2 is the right, while y1 is the lower endpoint and y2 is the upper.
Example:
In [225]: plot(x,fx1,'r--',label=r'$\cos(x)$')
plot(x,fx2,'b-',label=r'$\sin(x)$')
plot(x,fx3,'k:',label=r'$0.5\left(e^{-x^2}\sin(x)+\cos(x)\right)$')
xlim(-8,20)
ylim(-1,1)
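The limits and legend entries can also be set and read back through an axes object, which is convenient in scripts. This is a sketch under the assumption of explicit imports (np, plt) and the Agg backend; set_xlim, set_ylim and legend are the axes-level counterparts of the pylab functions used above.

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2*np.pi, 2*np.pi, 1001)
fig, ax = plt.subplots()
ax.plot(x, np.cos(x), 'r--', label=r'$\cos(x)$')
ax.plot(x, np.sin(x), 'b-', label=r'$\sin(x)$')
ax.set_xlim(-8, 20)               # wider than the data, so the canvas expands
ax.set_ylim(-1, 1)
leg = ax.legend(loc='upper right')

xlimits = ax.get_xlim()           # the limits that were just set
entries = [t.get_text() for t in leg.get_texts()]
```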
It is possible in matplotlib to draw multiple plots on the same canvas, as in the previous example,
where we called plot three times and all three curves appeared on the
same canvas. The function hold determines whether or not the canvas is
wiped clean on subsequent plot commands. It takes the booleans True or False as arguments: hold(True)
draws new plots while keeping the old ones, and hold(False) wipes the canvas clean before drawing
new plots. In the next example, we set hold to False, so if we redraw our previous plots only
the last will be visible. Note that hold was deprecated in matplotlib 2.0 and removed in 3.0, where new plots are always added to the existing ones.
In [226]: hold(False)
plot(x,fx1,'r--',label=r'$\cos(x)$')
plot(x,fx2,'b-',label=r'$\sin(x)$')
plot(x,fx3,'k:',label=r'$0.5\left(e^{-x^2}\sin(x)+\cos(x)\right)$')
xlim(-8,20)
ylim(-1,1)
legend(loc = 'upper right')
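In matplotlib versions where hold is not available, a similar effect can be obtained by clearing the axes explicitly before drawing. The sketch below is one possible way to do that, using the cla method of the axes; it is not the method used in the notes, and the explicit np, plt and ax names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2*np.pi, 2*np.pi, 1001)
fig, ax = plt.subplots()
ax.plot(x, np.cos(x), 'r--')
ax.plot(x, np.sin(x), 'b-')
ax.plot(x, 0.5*(np.exp(-x**2)*np.sin(x) + np.cos(x)), 'k:')
n_before = len(ax.get_lines())    # all three curves share the axes

ax.cla()                          # wipe the axes clean
ax.plot(x, np.sin(x), 'b-')
n_after = len(ax.get_lines())     # only the curve drawn after clearing remains
```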
As can be seen in our previous plots, we can use a format string for the color and linestyle. For
example
plot(x,fx3,'k:')
means plot x, fx3 with a black dotted line. The first character is the colour while the second is the
linestyle. A marker character can be added as well: to plot x, fx3 with a black dotted line and underscore markers, we have
In [227]: plot(x,fx3,'k:_')
We can also plot multiple plots by putting them all in the same plot function as in
In [228]: plot(x,fx1,'r-',x,fx2,'b--',x,fx3,'k:')
As can be seen, matplotlib gives text output on what it is doing. This output can be suppressed
by assigning the result of the plot command to a variable or by ending the command with a semicolon.
In [229]: plot(x,fx1,'r-',x,fx2,'b--',x,fx3,'k:');
In [230]: c = plot(x,fx1,'r-',x,fx2,'b--',x,fx3,'k:')
There is also the subplot command, which draws multiple plots, each in its own axes. Take
the example below
In [231]: subplot(2,2,1)
plot(x,fx1);
subplot(2,2,2)
plot(x,fx2);
subplot(2,2,3)
plot(x,fx3);
subplot(2,2,4)
plot(x,fx1,'r-',x,fx2,'b--',x,fx3,'k:');
In the above code, subplot(a,b,i) creates a grid of subplots with a rows and b columns and selects the i-th one, counting along rows from the top left, where 1 ≤ i ≤ ab. It is also possible to plot each curve in its own canvas.
That is achieved by using the figure function. The figure function creates a new figure window
and subsequent plots are drawn within that window.
In [232]: plot(x,fx3);
figure()
plot(x,fx1,'r-',x,fx2,'b--',x,fx3,'k:');
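The numbering of subplots and the separation between figures can be checked with a short sketch. It assumes explicit imports and the Agg backend; the panel titles and variable names are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig1 = plt.figure()
for i in range(1, 5):                 # i runs over 1..a*b for a 2 x 2 grid
    ax = fig1.add_subplot(2, 2, i)
    ax.set_title('panel %d' % i)

fig2 = plt.figure()                   # a second, independent canvas
ax2 = fig2.add_subplot(1, 1, 1)

# Panel 2 sits to the right of panel 1, and panel 3 sits below panel 1.
p1, p2, p3 = [a.get_position() for a in fig1.axes[:3]]
```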
It is possible to assign calls to figure and axes to variables and work with those variables.
Example
In [233]: fig = figure()
ax = fig.add_subplot(111)
plot(x,fx3)
Alternatively, after plotting, it is possible to grab the current axes using the function gca and
use that to set some properties of the plot. For example, to move the spines to the zero position in
both axes, we can use the following lines of code
In [234]: plot(x,fx1,'r-',x,fx2,'b--',x,fx3,'k:');
ax = gca()
ax.spines['bottom'].set_position(('data',0)) #reposition bottom spine to
ax.spines['left'].set_position(('data',0)) #reposition left spine to the
ax.spines['top'].set_color('none') #make top spine transparent
ax.spines['right'].set_color('none') #make right spine transparent
ax.xaxis.set_ticks_position('bottom') #remove the tick marks from the top
ax.yaxis.set_ticks_position('left') #remove the tick marks from the right
In the code above, set_position takes a tuple (position_type, amount) as argument. The
position type outward moves the spine
outward by amount points if amount is positive, or inward otherwise. It is also possible to position the spine
in relation to the axes, in which case the position_type is axes and amount is between 0 and
1, with 0.5 being the middle of the axes. The position_type data positions the spine at the given
data coordinate. We also use set_color with none to make a spine transparent. When the plot is all done, the
figure can be saved for later use using the savefig function with the syntax
savefig(filename)
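A short sketch of saving a figure follows. The file name parabola.png, the temporary folder and the dpi value are arbitrary choices made here; the file format is taken from the extension of the file name.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

out_dir = tempfile.mkdtemp()                  # scratch folder for the example
path = os.path.join(out_dir, 'parabola.png')  # hypothetical file name
fig.savefig(path, dpi=100)                    # .png extension selects PNG output
print(os.path.exists(path))
```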
Matplotlib is capable of many different kinds of plots, for example scatter plots, bar
charts, histograms and pie charts. The scatter function draws a scatter plot:
In [235]: scatter(range(100),rand(100))
Three dimensional plots can be done in matplotlib by importing the Axes3D class from the
mpl_toolkits.mplot3d module as
from mpl_toolkits.mplot3d import Axes3D
print dir(Axes3D)
Axes3D however requires that its plotting functions are called on an axes instance rather than on the class itself.
Calling a function directly on Axes3D results in the following error
In [237]: Axes3D.plot(x,fx1,fx2)
---------------------------------------------------------------------------
<ipython-input-237-7c821a39ff50> in <module>()
----> 1 Axes3D.plot(x,fx1,fx2)
TypeError: unbound method plot() must be called with Axes3D instance as fir
To use the functions, we need to define an Axes3D instance and then use it. As in
In [238]: ax = subplot(111, projection = '3d') #the projection = '3d' is all we nee
ax.plot(x,fx1,fx2)
ax.set_xlabel(r'$x$-axis')
ax.set_ylabel(r'$y$-axis')
ax.set_zlabel(r'$z$-axis')
Again, it is important to set the labels using the axes instance because matplotlib, as
imported through pylab, has no zlabel function. Calling zlabel directly instead of through the axes instance will produce an error.
The Axes3D module also has functions for plotting surfaces and other 3D plots. The following
code plots the surface
$z = y^2 - x^2$.
In [239]: x = linspace(-2,2,101)
y = linspace(-2,2,101)
xx,yy = meshgrid(x,y)
z = yy**2 - xx**2
ax = subplot(111,projection = '3d')
ax.plot_surface(xx,yy,z)
To break down the code above: x and y are created with linspace to define their intervals of interest.
The two are then combined by the meshgrid function, which creates
the coordinate arrays of an n-dimensional grid. For example, to create the 2
dimensional grid (x, y) ∈ [1, 4] × [5, 8] we can have
In [240]: x = linspace(1,4,4)
y = linspace(5,8,4)
xx,yy = meshgrid(x,y)
print 'xx is', xx
print 'yy is', yy
xx is [[ 1. 2. 3. 4.]
[ 1. 2. 3. 4.]
[ 1. 2. 3. 4.]
[ 1. 2. 3. 4.]]
yy is [[ 5. 5. 5. 5.]
[ 6. 6. 6. 6.]
[ 7. 7. 7. 7.]
[ 8. 8. 8. 8.]]
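A short check of how the two grids pair up: the row index selects the y value and the column index selects the x value. The sketch assumes numpy imported as np, and the indices i and j are picked only for illustration.

```python
import numpy as np

x = np.linspace(1, 4, 4)
y = np.linspace(5, 8, 4)
xx, yy = np.meshgrid(x, y)

i, j = 2, 1                        # row 2, column 1
point = (xx[i, j], yy[i, j])       # the grid point (x[1], y[2]) = (2.0, 7.0)

z = yy**2 - xx**2                  # evaluated at every grid point at once
print("z at (%.0f, %.0f) is %.0f" % (point[0], point[1], z[i, j]))  # 49 - 4 = 45
```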
The array xx has the values of x repeated in every row while the array yy has the values of y
repeated for every column. The (x,y) points are chosen by picking corresponding elements from
xx and yy. The array xx has the values in the interval [1,4] while the array yy has the values in the
interval [5,8]. The line
z = yy**2 - xx**2
then performs the calculation $y^2 - x^2$ on each point of xx and yy to create z, which has the same size as xx
and yy. The last line
ax.plot_surface(xx,yy,z)
then plots xx, yy and z as a surface. We can also create a contour plot of our surface using
In [241]: x = linspace(-2,2,101)
y = linspace(-2,2,101)
xx,yy = meshgrid(x,y)
z = yy**2 - xx**2
ax = subplot(111)
ax.contour(xx,yy,z)
It is also possible to view a surface as an image using the imshow function as shown below.
In [242]: imshow(z)
In this case matplotlib takes the surface and maps its values to colors using a colormap. In the matplotlib
version used here the default colormap is jet (from matplotlib 2.0 onward it is viridis). Jet maps the lower values to blue and the higher ones to red
and interpolates in between. The colors in the colormap and the values they represent can be
added to the plot by using the colorbar function as follows
In [243]: imshow(z)
colorbar()
Pylab also has the capability to work with images. Pylab has the imread function to read an
image. Example
In [244]: c = imread('cocoa.tif')
c
Here c is an array of numbers. Each point in the array is a pixel, short for picture element.
The value at each pixel location is the colour intensity or value. The array is of type uint8 which
means the minimum value it can have is 0 and the maximum is 255. We can check the minimum
and maximum values using the min and max methods of the array object
In [245]: c.min()
Out[245]: 91
In [246]: c.max()
Out[246]: 138
In [247]: imshow(c)
gray()
The function gray() displays the image in grayscale, where black represents the lowest
value and white the highest.
When we try to visualize the array, imshow finds the maximum and minimum values within
the image and scales the picture to those values. We can use the vmin and vmax keywords to force
the minimum and maximum values that imshow should use. The two images above show the
resulting difference. The image can be changed to a float image using the following code
In [249]: c = c.astype(float32)
c = c/255.
c
In [250]: c.max()
Out[250]: 0.5411765
In [251]: c.min()
Out[251]: 0.35686275
We can, for instance, perform intensity scaling on the image c. We know the minimum and
maximum values in c. Using simple algebra, we can create a line that maps the
minimum and maximum values in c to 0 and 1 respectively using
$d = \frac{1}{c_{max} - c_{min}} (c - c_{min}).$
The new array d has its minimum value equal to 0 and its maximum equal to 1 at the points where c has its minimum
and maximum, and the intermediate values are scaled accordingly. We then compare the
results.
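A minimal sketch of this rescaling on a small synthetic array is given below. The four pixel values are made up (only the extremes 91 and 138 are taken from the output above), so that the example runs without the image file.

```python
import numpy as np

# A tiny stand-in for the image: uint8 values between 91 and 138.
c = np.array([[91, 100], [120, 138]], dtype=np.uint8)
c = c.astype(np.float32) / 255.0       # convert to a float image in [0, 1]

cmin, cmax = c.min(), c.max()
d = (c - cmin) / (cmax - cmin)         # the line through (cmin, 0) and (cmax, 1)

print("%.1f %.1f" % (d.min(), d.max()))   # prints: 0.0 1.0
```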
In [253]: d.min()
Out[253]: 0.0
In [254]: d.max()
Out[254]: 1.0
In [256]: imshow(d, vmin = 0, vmax = 1)
We can equally see the difference between the two
images by subtracting one from the other to see how much c has been changed. Note that d is not larger than c at every point: at the darkest pixels d is 0 while c is about 0.357. The result shows a non-constant
enhancement.
We want to load the image house.png, which is the image of a building, and find the edges by
taking the magnitude of the gradient (the square root of the sum of the squared partial derivatives),
$\sqrt{f_x^2 + f_y^2}$
In [258]: d = imread('house.png')
d = d[:,:,0]
imshow(d)
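One possible way to compute this quantity is with numpy's gradient function, sketched here on a synthetic image with a single vertical edge. The use of np.gradient and the name e for the result are assumptions made for illustration; the notes may have computed the derivatives differently.

```python
import numpy as np

# Synthetic image: dark left half, bright right half (edge between columns 3 and 4).
img = np.zeros((6, 8))
img[:, 4:] = 1.0

# np.gradient returns one array per axis: rows (y direction) first, then columns (x).
fy, fx = np.gradient(img)
e = np.sqrt(fx**2 + fy**2)         # large where the intensity changes quickly

print(e[0])
```

The result is nonzero only in the two columns next to the step, which is where the edge lies.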
In [260]: imshow(1-e)
2.2 IO in numpy
Python itself allows for file IO using its open function. Python can open a file as text or binary,
and in three basic modes: 'r' for read, 'w' for write and 'a' for append. So for instance,
in python, one can open a file using the snippet
In [261]: f = open('test','w')
f.write('Writing from python')
f.close()
The open function opens the file test for writing. Note that when a file is opened for
writing, if the file already exists, it is overwritten; if it does not exist, it is
created. If a file is opened for reading and it does not exist, the operation raises an error. The
variable f is a file object pointing to the open file test on disk. The write method of the file object writes
a string to the file. If there are multiple lines to write, the file object also provides the writelines
method, which writes a sequence of strings to the file without adding line endings. For a file opened for reading, there are the read and
readline methods. The former reads the entire file content as a string while the latter reads a single line of
the file on each invocation. There is also the readlines method, which reads the entire file into a list with
each element containing a line of the file. After working with the file, you close it to release its
resources and flush remaining data.
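The methods described above can be tried together in a short sketch. It uses the with statement, which closes the file automatically at the end of the block; the temporary folder and the file name test.txt are choices made here so that the example does not overwrite anything.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')   # scratch file

with open(path, 'w') as f:                 # 'w' creates or overwrites the file
    f.write('Writing from python\n')
    f.writelines(['second line\n', 'third line\n'])   # line endings are explicit

with open(path, 'a') as f:                 # 'a' adds to the end of the file
    f.write('appended line\n')

with open(path, 'r') as f:
    first = f.readline()                   # one line per call
    rest = f.readlines()                   # the remaining lines as a list

print(first.strip())
print(len(rest))
```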
For working with arrays, numpy provides its own file IO mechanisms. To save an array for
later use, you can use the numpy save function. This saves the array in a file with the npy extension, readable
by numpy. For example, for the array c,
In [262]: c
In [263]: save('test.npy',c)
The file test.npy is created in the location specified, in this case, the current working folder and
the content of that file will be the array. When the array is later needed, it can be loaded using the
code
In [264]: c = load('test.npy')
If there are multiple arrays to be saved, numpy also provides the savez function. The following example saves the arrays
x = linspace(-2*pi,2*pi,1001)
fx1 = cos(x)
fx2 = sin(x)
fx3 = 0.5*(exp(-x**2)*sin(x) + cos(x))
In [265]: savez('vars.npz',x,fx1,fx2,fx3)
In this case the file vars.npz is created by numpy and it contains the data from the arrays x,
fx1, fx2, fx3, each stored as an npy file. If savez is used as in the example above, numpy just saves the
arrays as arr_0.npy, arr_1.npy, arr_2.npy, arr_3.npy. If we want to save the names of the variables
with the data, we pass the variables as key=value pairs as
In [266]: savez('vars.npz',x=x,fx1=fx1,fx2=fx2,fx3=fx3)
In [268]: data.keys()
In [269]: data.items()
Out[269]: [('x', array([-2. , -1.96, -1.92, -1.88, -1.84, -1.8 , -1.76, -1.72, -1.
-1.64, -1.6 , -1.56, -1.52, -1.48, -1.44, -1.4 , -1.36, -1.32,
-1.28, -1.24, -1.2 , -1.16, -1.12, -1.08, -1.04, -1. , -0.96,
-0.92, -0.88, -0.84, -0.8 , -0.76, -0.72, -0.68, -0.64, -0.6 ,
-0.56, -0.52, -0.48, -0.44, -0.4 , -0.36, -0.32, -0.28, -0.24,
-0.2 , -0.16, -0.12, -0.08, -0.04, 0. , 0.04, 0.08, 0.12,
0.16, 0.2 , 0.24, 0.28, 0.32, 0.36, 0.4 , 0.44, 0.48,
0.52, 0.56, 0.6 , 0.64, 0.68, 0.72, 0.76, 0.8 , 0.84,
0.88, 0.92, 0.96, 1. , 1.04, 1.08, 1.12, 1.16, 1.2 ,
1.24, 1.28, 1.32, 1.36, 1.4 , 1.44, 1.48, 1.52, 1.56,
1.6 , 1.64, 1.68, 1.72, 1.76, 1.8 , 1.84, 1.88, 1.92,
1.96, 2. ])),
('fx1', array([ 1. , 0.99992104, 0.99968419, ..., 0.99968419,
0.99992104, 1. ])),
('fx3', array([ 0.5 , 0.49996052, 0.49984209, ..., 0.49984209,
0.49996052, 0.5 ])),
('fx2', array([ 2.44929360e-16, 1.25660399e-02, 2.51300954e-02, ...
-2.51300954e-02, -1.25660399e-02, -2.44929360e-16]))]
In [270]: data['fx1']
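The whole round trip, saving named arrays and reading them back, can be sketched as follows. It assumes numpy imported as np and writes to a temporary folder; only two arrays are saved to keep the example short.

```python
import os
import tempfile
import numpy as np

x = np.linspace(-2*np.pi, 2*np.pi, 1001)
fx1 = np.cos(x)

path = os.path.join(tempfile.mkdtemp(), 'vars.npz')
np.savez(path, x=x, fx1=fx1)          # keyword names become the array names

data = np.load(path)                  # a dictionary-like object
names = sorted(data.files)            # names of the stored arrays
restored = data['fx1']                # the array is read when it is indexed
data.close()

print(names)
```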
For very large scientific datasets, python can also interface with
HDF5. HDF stands for Hierarchical Data Format and is meant to store large data.
More information on HDF5 can be found at https://www.hdfgroup.org/HDF5/ and
https://en.wikipedia.org/wiki/Hierarchical_Data_Format. The h5py module provides
support for working with hdf5 data in python. The following lines of code write our arrays x, fx1, fx2, fx3 into
an hdf5 file test.h5.
In [271]: import h5py
In [274]: hf.close()
The following lines of code read the data for array x back.
In [276]: hf.keys()
In [277]: x = hf['x'][:]
In [278]: hf.close()
2.3 Pandas
Pandas strives to be an easy to use data analysis module for python. It can handle
large data files and work with many data formats, including excel files, data from databases (in
combination with the python database modules, especially sqlalchemy) and csv (comma separated values)
files. Pandas can be imported into python using
import pandas
In [280]: c = pandas.read_csv('iris.data')
c
12 4.3 3.0 1.1 0.1 Iris-setosa
13 5.8 4.0 1.2 0.2 Iris-setosa
14 5.7 4.4 1.5 0.4 Iris-setosa
15 5.4 3.9 1.3 0.4 Iris-setosa
16 5.1 3.5 1.4 0.3 Iris-setosa
17 5.7 3.8 1.7 0.3 Iris-setosa
18 5.1 3.8 1.5 0.3 Iris-setosa
19 5.4 3.4 1.7 0.2 Iris-setosa
20 5.1 3.7 1.5 0.4 Iris-setosa
21 4.6 3.6 1.0 0.2 Iris-setosa
22 5.1 3.3 1.7 0.5 Iris-setosa
23 4.8 3.4 1.9 0.2 Iris-setosa
24 5.0 3.0 1.6 0.2 Iris-setosa
25 5.0 3.4 1.6 0.4 Iris-setosa
26 5.2 3.5 1.5 0.2 Iris-setosa
27 5.2 3.4 1.4 0.2 Iris-setosa
28 4.7 3.2 1.6 0.2 Iris-setosa
29 4.8 3.1 1.6 0.2 Iris-setosa
.. ... ... ... ... ...
119 6.9 3.2 5.7 2.3 Iris-virginica
120 5.6 2.8 4.9 2.0 Iris-virginica
121 7.7 2.8 6.7 2.0 Iris-virginica
122 6.3 2.7 4.9 1.8 Iris-virginica
123 6.7 3.3 5.7 2.1 Iris-virginica
124 7.2 3.2 6.0 1.8 Iris-virginica
125 6.2 2.8 4.8 1.8 Iris-virginica
126 6.1 3.0 4.9 1.8 Iris-virginica
127 6.4 2.8 5.6 2.1 Iris-virginica
128 7.2 3.0 5.8 1.6 Iris-virginica
129 7.4 2.8 6.1 1.9 Iris-virginica
130 7.9 3.8 6.4 2.0 Iris-virginica
131 6.4 2.8 5.6 2.2 Iris-virginica
132 6.3 2.8 5.1 1.5 Iris-virginica
133 6.1 2.6 5.6 1.4 Iris-virginica
134 7.7 3.0 6.1 2.3 Iris-virginica
135 6.3 3.4 5.6 2.4 Iris-virginica
136 6.4 3.1 5.5 1.8 Iris-virginica
137 6.0 3.0 4.8 1.8 Iris-virginica
138 6.9 3.1 5.4 2.1 Iris-virginica
139 6.7 3.1 5.6 2.4 Iris-virginica
140 6.9 3.1 5.1 2.3 Iris-virginica
141 5.8 2.7 5.1 1.9 Iris-virginica
142 6.8 3.2 5.9 2.3 Iris-virginica
143 6.7 3.3 5.7 2.5 Iris-virginica
144 6.7 3.0 5.2 2.3 Iris-virginica
145 6.3 2.5 5.0 1.9 Iris-virginica
146 6.5 3.0 5.2 2.0 Iris-virginica
147 6.2 3.4 5.4 2.3 Iris-virginica
148 5.9 3.0 5.1 1.8 Iris-virginica
The data above is a machine learning dataset with 150 rows and 5 columns. It can be found
at https://archive.ics.uci.edu/ml/datasets/Iris. Each row gives the measurements of one iris
flower together with its class. There are 50 rows of data for the class Iris-setosa, 50 for the class
Iris-versicolor and 50 for Iris-virginica. The columns show the properties of the flowers:
in order, the sepal length, sepal width, petal length and petal width, followed by the class.
In the output c, it can be seen that the first row is bold. This is because pandas assumes the
first row of the data is the header row. We can alter that by telling pandas there is no header in the
data by rewriting the snippet as
In [281]: c = pandas.read_csv('iris.data',header=None)
c.head()
Out[281]: 0 1 2 3 4
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
The head method of the dataframe lists the first five rows of the data. Its counterpart is the
tail method, which lists the last five. When header = None is set, pandas can either be given a list of column names
or be left to set its own. In this case, because column names were not
provided, pandas numbers the columns. We can provide column names using the
names keyword. In that case, our read_csv code becomes
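A sketch of such a call is given here on three rows held in memory, so that it runs without the file iris.data. The column names are the ones that appear in the outputs further on; the in-memory text and the variable names raw and cols are assumptions made for illustration.

```python
import io
import pandas

# Three rows in the same layout as iris.data (no header line).
raw = (u"5.1,3.5,1.4,0.2,Iris-setosa\n"
       u"4.9,3.0,1.4,0.2,Iris-setosa\n"
       u"7.0,3.2,4.7,1.4,Iris-versicolor\n")

cols = ['slength', 'swidth', 'plength', 'pwidth', 'fclass']
c = pandas.read_csv(io.StringIO(raw), header=None, names=cols)

print(list(c.columns))
print(c.shape)
```

With a real file, the first argument would be the file name instead of the StringIO object.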
The describe method of the dataframe gives descriptive statistics of the dataframe.
In [283]: c.describe()
Out[283]: slength swidth plength pwidth
count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.054000 3.758667 1.198667
std 0.828066 0.433594 1.764420 0.763161
min 4.300000 2.000000 1.000000 0.100000
25% 5.100000 2.800000 1.600000 0.300000
50% 5.800000 3.000000 4.350000 1.300000
75% 6.400000 3.300000 5.100000 1.800000
max 7.900000 4.400000 6.900000 2.500000
The unique method returns the unique elements in a column. To return the unique flower
classes, we have
In [284]: c['fclass'].unique()
c['fclass'] just returns the fclass column of the dataframe. Multiple columns can be selected
using a list of column names, for example
In [285]: c[['slength','fclass']].head()
The data in the dataframe can also be grouped by one or more columns.
In [289]: c_grp.describe()
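Grouping can be sketched with the groupby method on a small hand-made dataframe. The four rows below are illustrative, and grouping by the fclass column is an assumption about how c_grp was built in the notes.

```python
import pandas

c = pandas.DataFrame({
    'slength': [5.1, 4.9, 7.0, 6.4],
    'fclass': ['Iris-setosa', 'Iris-setosa',
               'Iris-versicolor', 'Iris-versicolor'],
})

c_grp = c.groupby('fclass')            # one group per flower class
means = c_grp['slength'].mean()        # mean sepal length within each group
sizes = c_grp.size()                   # number of rows in each group

print(means)
```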
The column names of c can be extracted using the columns property. For example, the following
code saves the columns of c as a list in the variable c_cols. This can then be used to extract
columns
In [291]: c[c_cols[:-1]].head()
It is possible to extract relevant data from dataframes where the data satisfies certain conditions.
For example, to check if the fclass column is equal to Iris-setosa, we can use
Out[292]: 0 True
1 True
2 True
3 True
4 True
Name: fclass, dtype: bool
This returns True for the rows where the fclass column is equal to Iris-setosa, and
False otherwise. The result can be used as a mask to select the rows satisfying this condition.
27 5.2 3.5 1.5 0.2 Iris-setosa
28 5.2 3.4 1.4 0.2 Iris-setosa
29 4.7 3.2 1.6 0.2 Iris-setosa
30 4.8 3.1 1.6 0.2 Iris-setosa
31 5.4 3.4 1.5 0.4 Iris-setosa
32 5.2 4.1 1.5 0.1 Iris-setosa
33 5.5 4.2 1.4 0.2 Iris-setosa
34 4.9 3.1 1.5 0.1 Iris-setosa
35 5.0 3.2 1.2 0.2 Iris-setosa
36 5.5 3.5 1.3 0.2 Iris-setosa
37 4.9 3.1 1.5 0.1 Iris-setosa
38 4.4 3.0 1.3 0.2 Iris-setosa
39 5.1 3.4 1.5 0.2 Iris-setosa
40 5.0 3.5 1.3 0.3 Iris-setosa
41 4.5 2.3 1.3 0.3 Iris-setosa
42 4.4 3.2 1.3 0.2 Iris-setosa
43 5.0 3.5 1.6 0.6 Iris-setosa
44 5.1 3.8 1.9 0.4 Iris-setosa
45 4.8 3.0 1.4 0.3 Iris-setosa
46 5.1 3.8 1.6 0.2 Iris-setosa
47 4.6 3.2 1.4 0.2 Iris-setosa
48 5.3 3.7 1.5 0.2 Iris-setosa
49 5.0 3.3 1.4 0.2 Iris-setosa
Multiple conditions can be combined using the python bitwise operator &, with each condition enclosed in parentheses.
Out[298]: slength swidth plength pwidth fclass
119 6 2.2 5 1.5 Iris-virginica
It is also possible in pandas to access rows of dataframes by integer position using the iloc indexer. For
example
In [299]: c.iloc[5:10]
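The three kinds of selection discussed above (a boolean mask, combined conditions and positional access) can be sketched on a small hand-made dataframe. The rows and the threshold values are made up for illustration.

```python
import pandas

c = pandas.DataFrame({
    'slength': [5.1, 4.9, 7.0, 6.0, 5.8],
    'swidth': [3.5, 3.0, 3.2, 2.2, 2.7],
    'fclass': ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
               'Iris-virginica', 'Iris-virginica'],
})

is_setosa = c['fclass'] == 'Iris-setosa'     # boolean mask, one value per row
subset = c[is_setosa]                        # rows where the mask is True

# Each condition needs its own parentheses because & binds tighter than > and <.
both = c[(c['slength'] > 5.5) & (c['swidth'] < 3.0)]

rows = c.iloc[1:3]                           # rows at positions 1 and 2

print(list(subset.index), list(both.index), list(rows.index))
```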