Chapter 2 - Loops & Functions - Python Programming For Data Science
Chapter 2 - Loops & Functions - Python Programming For Data Science
1. for Loops
For loops allow us to execute code a specific number of times.
Keyword for begins the loop. Colon : ends the first line of the loop.
Block of code indented is executed for each value in the list (hence the name “for” loops)
The loop ends after the variable n has taken all the values in the list
We can iterate over any kind of “iterable”: list, tuple, range, set, string.
An iterable is really just any object with a sequence of values that can be looped over. In this case, we are
iterating over the values in a list.
word = "Python"
for letter in word:
print("Gimme a " + letter + "!")
Gimme a P!
Gimme a y!
Gimme a t!
Gimme a h!
Gimme a o!
Gimme a n!
What's that spell?!! Python!
A very common pattern is to use for with the range(). range() gives you a sequence of integers up to some value
(non-inclusive of the end-value) and is typically used for looping.
range(10)
range(0, 10)
list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for i in range(10):
print(i)
0
1
2
3
4
5
6
7
8
9
We can also specify a start value and a skip-by value with range:
1
11
21
31
41
51
61
71
81
91
We can write a loop inside another loop to iterate over multiple dimensions of data:
list_1 = [0, 1, 2]
list_2 = ["a", "b", "c"]
for i in range(3):
print(list_1[i], list_2[i])
0 a
1 b
2 c
There are many clever ways of doing these kinds of things in Python. When looping over objects, I tend to use zip()
and enumerate() quite a lot in my work. zip() returns a zip object which is an iterable of tuples.
(0, 'a')
(1, 'b')
(2, 'c')
0 a
1 b
2 c
enumerate() adds a counter to an iterable which we can use within the loop.
for i in enumerate(list_2):
print(i)
(0, 'a')
(1, 'b')
(2, 'c')
for n, i in enumerate(list_2):
print(f"index {n}, value {i}")
index 0, value a
index 1, value b
index 2, value c
We can loop through key-value pairs of a dictionary using .items(). The general syntax is for key, value in
dictionary.items().
2. while loops
We can also use a while loop to excute a block of code several times. But beware! If the conditional expression is
always True, then you’ve got an infintite loop!
n = 10
while n > 0:
print(n)
n -= 1
print("Blast off!")
10
9
8
7
6
5
4
3
2
1
Blast off!
Let’s read the while statement above as if it were in English. It means, “While n is greater than 0, display the value of
n and then decrement n by 1. When you get to 0, display the word Blast off!”
For some loops, it’s hard to tell when, or if, they will stop! Take a look at the Collatz conjecture. The conjecture
states that no matter what positive integer n we start with, the sequence will always eventually reach 1 - we just
don’t know how many iterations it will take.
n = 11
while n != 1:
print(int(n))
if n % 2 == 0: # n is even
n = n / 2
else: # n is odd
n = n * 3 + 1
print(int(n))
11
34
17
52
26
13
40
20
10
5
16
8
4
2
1
Hence, in some cases, you may want to force a while loop to stop based on some criteria, using the break keyword.
n = 123
i = 0
while n != 1:
print(int(n))
if n % 2 == 0: # n is even
n = n / 2
else: # n is odd
n = n * 3 + 1
i += 1
if i == 10:
print(f"Ugh, too many iterations!")
break
123
370
185
556
278
139
418
209
628
314
Ugh, too many iterations!
The continue keyword is similar to break but won’t stop the loop. Instead, it just restarts the loop from the top.
n = 10
while n > 0:
if n % 2 != 0: # n is odd
n = n - 1
continue
break # this line is never executed because continue restarts the loop from the
top
print(n)
n = n - 1
print("Blast off!")
Print to PDF
10
8
6
4
2
Blast off!
3. Comprehensions
Comprehensions allow us to build lists/tuples/sets/dictionaries in one convenient, compact line of code. I use these
quite a bit! Below is a standard for loop you might use to iterate over an iterable and create a list:
subliminal = ['Tom', 'ingests', 'many', 'eggs', 'to', 'outrun', 'large', 'eagles', 'after',
'running', 'near', '!']
first_letters = []
for word in subliminal:
first_letters.append(word[0])
print(first_letters)
['T', 'i', 'm', 'e', 't', 'o', 'l', 'e', 'a', 'r', 'n', '!']
['T', 'i', 'm', 'e', 't', 'o', 'l', 'e', 'a', 'r', 'n', '!']
We can make things more complicated by doing multiple iteration or conditional iteration:
[0, 2, 4, 6, 8, 10]
[-i if i % 2 else i for i in range(11)] # condition the value, -ve odd and +ve even
numbers
Dictionary comprehension:
Tuple comprehension doesn’t work as you might expect… We get a “generator” instead (more on that later).
y = (word[-1] for word in words) # this is NOT a tuple comprehension - more on generators
later
print(y)
4. try / except
Above: the Blue Screen of Death at a Nine Inch Nails concert! Source: cnet.com.
If something goes wrong, we don’t want our code to crash - we want it to fail gracefully. In Python, this can be
accomplished using try/except. Here is a basic example:
this_variable_does_not_exist
print("Another line") # code fails before getting to this line
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-27-dd878f68d557> in <module>
----> 1 this_variable_does_not_exist
2 print("Another line") # code fails before getting to this line
try:
this_variable_does_not_exist
except:
pass # do nothing
print("You did something bad! But I won't raise an error.") # print something
print("Another line")
Python tries to execute the code in the try block. If an error is encountered, we “catch” this in the except block (also
called try/catch in other languages). There are many different error types, or exceptions - we saw NameError
above.
5/0 # ZeroDivisionError
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-29-9866726f0353> in <module>
----> 1 5/0 # ZeroDivisionError
my_list = [1,2,3]
my_list[5] # IndexError
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-30-8f0c4b3b2ce1> in <module>
1 my_list = [1,2,3]
----> 2 my_list[5] # IndexError
my_tuple = (1,2,3)
my_tuple[0] = 0 # TypeError
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-31-90cd0bd9ddec> in <module>
1 my_tuple = (1,2,3)
----> 2 my_tuple[0] = 0 # TypeError
Ok, so there are apparently a bunch of different errors one could run into. With try/except you can also catch the
exception itself:
try:
this_variable_does_not_exist
except Exception as ex:
print("You did something bad!")
print(ex)
print(type(ex))
You did something bad!
name 'this_variable_does_not_exist' is not defined
<class 'NameError'>
In the above, we caught the exception and assigned it to the variable ex so that we could print it out. This is useful
because you can see what the error message would have been, without crashing your program. You can also catch
specific exceptions types. This is typically the recommended way to catch errors, you want to be specific in catching
your error so you know exactly where and why your code failed.
try:
this_variable_does_not_exist # name error
# (1, 2, 3)[0] = 1 # type error
# 5/0 # ZeroDivisionError
except TypeError:
print("You made a type error!")
except NameError:
print("You made a name error!")
except:
print("You made some other sort of error")
The final except would trigger if the error is none of the above types, so this sort of has an if/elif/else feel to it.
There is also an optional else and finally keyword (which I almost never used), but you can read more about here.
try:
this_variable_does_not_exist
except:
print("The variable does not exist!")
finally:
print("I'm printing anyway!")
We can also write code that raises an exception on purpose, using raise:
add_one("blah")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-36-96e0142692a3> in <module>
----> 1 add_one("blah")
<ipython-input-35-eabf290fc405> in add_one(x)
1 def add_one(x): # we'll get to functions in the next section
----> 2 return x + 1
def add_one(x):
if not isinstance(x, float) and not isinstance(x, int):
raise TypeError(f"Sorry, x must be numeric, you entered a {type(x)}.")
return x + 1
add_one("blah")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-38-96e0142692a3> in <module>
----> 1 add_one("blah")
<ipython-input-37-3a3a8b564774> in add_one(x)
1 def add_one(x):
2 if not isinstance(x, float) and not isinstance(x, int):
----> 3 raise TypeError(f"Sorry, x must be numeric, you entered a {type(x)}.")
4
5 return x + 1
This is useful when your function is complicated and would fail in a complicated way, with a weird error message.
You can make the cause of the error much clearer to the user of the function. If you do this, you should ideally
describe these exceptions in the function documentation, so a user knows what to expect if they call your function.
Finally, we can even define our own exception types. We do this by inheriting from the Exception class - we’ll
explore classes and inheritance more in the next chapter!
class CustomAdditionError(Exception):
pass
def add_one(x):
if not isinstance(x, float) and not isinstance(x, int):
raise CustomAdditionError("Sorry, x must be numeric")
return x + 1
add_one("blah")
---------------------------------------------------------------------------
CustomAdditionError Traceback (most recent call last)
<ipython-input-41-96e0142692a3> in <module>
----> 1 add_one("blah")
<ipython-input-40-25db54189b4f> in add_one(x)
1 def add_one(x):
2 if not isinstance(x, float) and not isinstance(x, int):
----> 3 raise CustomAdditionError("Sorry, x must be numeric")
4
5 return x + 1
5. Functions
A function is a reusable piece of code that can accept input parameters, also known as “arguments”. For example,
let’s define a function called square which takes one input parameter n and returns the square n**2:
def square(n):
n_squared = n**2
return n_squared
square(2)
square(100)
10000
square(12345)
152399025
Functions begin with the def keyword, then the function name, arguments in parentheses, and then a colon (:). The
code executed by the function is defined by indentation. The output or “return” value of the function is specified
using the return keyword.
string
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-48-edbf08a562d5> in <module>
----> 1 string
If a function changes the variables passed into it, then it is said to have side effects. For example:
def silly_sum(my_list):
my_list.append(0)
return sum(my_list)
l = [1, 2, 3, 4]
out = silly_sum(l)
out
10
The above looks like what we wanted? But wait… it changed our l object…
[1, 2, 3, 4, 0]
If your function has side effects like this, you must mention it in the documentation (which we’ll touch on later in this
chapter).
def f(x):
x + 1 # no return!
if x == 999:
return
print(f(0))
None
Optional & Required Arguments
Sometimes it is convenient to have default values for some arguments in a function. Because they have default
values, these arguments are optional, and are hence called “optional arguments”. For example:
repeat_string("mds", 2)
'mdsmds'
repeat_string("mds", 5)
'mdsmdsmdsmdsmds'
'mdsmds'
Ideally, the default value for optional arguments should be carefully chosen. In the function above, the idea of
“repeating” something makes me think of having 2 copies, so n=2 feels like a reasonable default.
You can have any number of required arguments and any number of optional arguments. All the optional arguments
must come after the required arguments. The required arguments are mapped by the order they appear. The
optional arguments can be specified out of order when using the function.
example(1, 2, 3, 4)
1 2 3 4
example(1, 2)
1 2 DEFAULT DEFAULT
1 2 3 4
example(1, 2, c=3)
1 2 3 DEFAULT
Specifying all the arguments as keyword arguments, even though only c and d are optional:
1 2 3 4
Specifying c by the fact that it comes 3rd (I do not recommend this because I find it is confusing):
example(1, 2, 3)
1 2 3 DEFAULT
Specifying the optional arguments by keyword, but in the wrong order (this can also be confusing, but not so terrible
- I am fine with it):
1 2 3 4
example(a=1, b=2)
1 2 DEFAULT DEFAULT
Specifying the non-optional arguments by keyword, but in the wrong order (not recommended, I find it confusing):
example(b=2, a=1)
1 2 DEFAULT DEFAULT
example(a=2, 1)
sum_and_product(5, 6)
(11, 30)
The parentheses can be omitted (and often are), and a tuple is implicitly returned as defined by the use of the
comma:
sum_and_product(5, 6)
(11, 30)
It is common to immediately unpack a returned tuple into separate variables, so it really feels like the function is
returning multiple values:
s, p = sum_and_product(5, 6)
11
p
30
s, _ = sum_and_product(5, 6)
11
11
def add(*args):
print(args)
return sum(args)
add(1, 2, 3, 4, 5, 6)
(1, 2, 3, 4, 5, 6)
21
def add(**kwargs):
print(kwargs)
return sum(kwargs.values())
12
def do_nothing(x):
return x
type(do_nothing)
function
print(do_nothing)
This means you can pass functions as arguments into other functions.
def square(y):
return y**2
evaluate_function_on_x_plus_1(square, 5)
36
7. Anonymous Functions
There are two ways to define functions in Python. The way we’ve beenusing up until now:
def add_one(x):
return x+1
add_one(7.2)
8.2
type(add_one)
function
add_one(7.2)
8.2
The two approaches above are identical. The one with lambda is called an anonymous function. Anonymous
functions can only take up one line of code, so they aren’t appropriate in most cases, but can be useful for smaller
things.
evaluate_function_on_x_plus_1(lambda x: x ** 2, 5)
36
Above:
First, lambda x: x**2 evaluates to a value of type function (otice that this function is never given a name -
hence “anonymous functions”).
Then, the function and the integer 5 are passed into evaluate_function_on_x_plus_1
At which point the anonymous function is evaluated on 5+1, and we get 36.
As an example, consider the task of turning each element of a list into a palindrome.
names = ["milad", "tom", "tiffany"]
name = "tom"
name[::-1] # creates a slice that starts at the end and moves backwards, syntax is
[begin:end:step]
'mot'
names_backwards = list()
names_backwards.append(names[0] + names[0][::-1])
names_backwards.append(names[1] + names[1][::-1])
names_backwards.append(names[2] + names[2][::-1])
names_backwards
The code above is gross, terrible, yucky code for several reasons:
names_backwards = list()
names_backwards
The above is slightly better and we have solved problems (1) and (3). But let’s create a function to make our life
easier:
def make_palindromes(names):
names_backwards = list()
return names_backwards
make_palindromes(names)
Okay, this is even better. We have now also solved problem (2), because you can call the function with any list, not
just names. For example, what if we had multiple lists:
make_palindromes(names1)
make_palindromes(names2)
In my personal opinion, make_palindromes() does a bit too much to be understandable. I prefer this:
def make_palindrome(name):
return name + name[::-1]
make_palindrome("milad")
'miladdalim'
From here, if we want to “apply make_palindrome to every element of a list” we could use list comprehension:
There is also the in-built map() function which does exactly this, applies a function to every element of a sequence:
list(map(make_palindrome, names))
9. Generators
Recall list comprehension from earlier in the chapter:
[n for n in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Comprehensions evaluate the entire expression at once, and then returns the full data product. Sometimes, we
want to work with just one part of our data at a time, for example, when we can’t fit all of our data in memory. For
this, we can use generators.
(n for n in range(10))
Notice that we just created a generator object. Generator objects are like a “recipe” for generating values. They
don’t actually do any computation until they are asked to. We can get values from a generator in three main ways:
Using next()
Using list()
Looping
next(gen)
next(gen)
1
Once the generator is exhausted, it will no longer return values:
0
1
2
3
4
5
6
7
8
9
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-108-14d35f56c593> in <module>
1 gen = (n for n in range(10))
2 for i in range(11):
----> 3 print(next(gen))
StopIteration:
We can see all the values of a generator using list() but this defeats the purpose of using a generator in the first
place:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
0
1
2
3
4
5
6
7
8
9
Above, we saw how to create a generator object using comprehension syntax but with parentheses. We can also
create a generator using functions and the yield keyword (instead of the return keyword):
def gen():
for n in range(10):
yield (n, n ** 2)
g = gen()
print(next(g))
print(next(g))
print(next(g))
(0, 0)
(1, 1)
(2, 4)
Below is some real-world motivation of a case where a generator might be useful. Say we want to create a list of
dictionaries containing information about houses in Canada.
import random # we'll learn about imports in a later chapter
import time
import memory_profiler
city = ['Vancouver', 'Toronto', 'Ottawa', 'Montreal']
def house_list(n):
houses = []
for i in range(n):
house = {
'id': i,
'city': random.choice(city),
'bedrooms': random.randint(1, 5),
'bathrooms': random.randint(1, 3),
'price ($1000s)': random.randint(300, 1000)
}
houses.append(house)
return houses
house_list(2)
[{'id': 0,
'city': 'Toronto',
'bedrooms': 3,
'bathrooms': 2,
'price ($1000s)': 567},
{'id': 1,
'city': 'Toronto',
'bedrooms': 3,
'bathrooms': 1,
'price ($1000s)': 989}]
What happens if we want to create a list of 1,000,000 houses? How much time/memory will it take?
start = time.time()
mem = memory_profiler.memory_usage()
print(f"Memory usage before: {mem[0]:.0f} mb")
people = house_list(500000)
print(f"Memory usage after: {memory_profiler.memory_usage()[0]:.0f} mb")
print(f"Time taken: {time.time() - start:.2f}s")
def house_generator(n):
for i in range(n):
house = {
'id': i,
'city': random.choice(city),
'bedrooms': random.randint(1, 5),
'bathrooms': random.randint(1, 3),
'price ($1000s)': random.randint(300, 1000)
}
yield house
start = time.time()
print(f"Memory usage before: {mem[0]:.0f} mb")
people = house_generator(500000)
print(f"Memory usage after: {memory_profiler.memory_usage()[0]:.0f} mb")
print(f"Time taken: {time.time() - start:.2f}s")
Although, if we used list() to extract all of the genertator values, we’d lose our memory savings:
10. Docstrings
One problem we never really solved when talking about writing good functions was: “4. It is hard to understand
what it does just by looking at it”. This brings up the idea of function documentation, called “docstrings”. The
docstring goes right after the def line and is wrapped in triple quotes """.
def make_palindrome(string):
"""Turns the string into a palindrome by concatenating itself with a reversed version
of itself."""
In Python we can use the help() function to view another function’s documentation. In IPython/Jupyter, we can use
? to view the documentation string of any function in our environment.
make_palindrome?
But, even easier than that, if your cursor is in the function parentheses, you can use the shortcut shift + tab to
open the docstring at will.
Docstring Structure
General docstring convention in Python is described in PEP 257 - Docstring Conventions. There are many different
docstring style conventions used in Python. The exact style you use can be important for helping you to render your
documentation, or for helping your IDE parse your documentation. Common styles include:
1. Single-line: If it’s short, then just a single line describing the function will do (as above).
2. reST style: see here.
3. NumPy style: see here. (RECOMMENDED! and MDS-preferred)
4. Google style: see here.
Parameters
----------
param1 : datatype
A description of param1.
param2 : datatype
A description of param2.
param3 : datatype
A longer description because maybe this requires
more explanation and we can use several lines.
Returns
-------
datatype
A description of the output, datatypes and behaviours.
Describe special cases and anything the user needs to
know to use the function.
Examples
--------
>>> function_name(3,8,-5)
2.0
"""
def make_palindrome(string):
"""Turns the string into a palindrome by concatenating
itself with a reversed version of itself.
Parameters
----------
string : str
The string to turn into a palindrome.
Returns
-------
str
string concatenated with a reversed version of string
Examples
--------
>>> make_palindrome('tom')
'tommot'
"""
return string + string[::-1]
make_palindrome?
# scipy style
def repeat_string(s, n=2):
"""
Repeat the string s, n times.
Parameters
----------
s : str
the string
n : int, optional
the number of times, by default = 2
Returns
-------
str
the repeated string
Examples
--------
>>> repeat_string("Blah", 3)
"BlahBlahBlah"
"""
return s * n
Type Hints
Type hinting is exactly what it sounds like, it hints at the data type of function arguments. You can indicate the type
of an argument in a function using the syntax argument : dtype, and the type of the return value using def
func() -> dtype. Let’s see an example:
# NumPy style
def repeat_string(s: str, n: int = 2) -> str: # <---- note the type hinting here
"""
Repeat the string s, n times.
Parameters
----------
s : str
the string
n : int, optional (default = 2)
the number of times
Returns
-------
str
the repeated string
Examples
--------
>>> repeat_string("Blah", 3)
"BlahBlahBlah"
"""
return s * n
repeat_string?
Type hinting just helps your users and IDE identify dtypes and identify bugs. It’s just another level of
documentation. They do not force users to use that date type, for example, I can still pass an dict to repeat_string
if I want to:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-129-48f8613bcebb> in <module>
----> 1 repeat_string({'key_1': 1, 'key_2': 2})
<ipython-input-127-964bad9c977b> in repeat_string(s, n)
21 "BlahBlahBlah"
22 """
---> 23 return s * n
Most IDE’s are clever enough to even read your type hinting and warn you if you’re using a different dtype in the
function, e.g., this VScode screenshot:
By Tomas Beuzen
© Copyright 2021.