Introducing Python
Bernie Hogan
July 13, 2022
v1.0
Introducing Python
Intro ucing Python
ntro ucing Python
ntro ucing Pyt on
ntro ucing Pyt o
ntro ucing yt o
ntro uc ng yt o
tro uc ng yt o
tro c ng yt o
tro c n yt o
tr c n yt o
tr c yt o
tr c y o
t c y o
t y o
t o
o
Contents
3 Collections
3.1 Collections
3.2 Ordered by position: The list
3.2.1 List indexing and slicing
3.2.2 Adding data to a list (and adding two lists together)
3.2.3 A list versus a tuple
3.3 Ordered by inclusion: the set
3.3.1 Set inclusion and efficiency in code
3.3.2 Set logic: Union and Intersection
3.4 Ordered by key: the dictionary or dict
3.4.1 Checking in on syntax with indexers and sets
3.4.2 Accessing a dictionary’s components
3.4.3 Dictionary gotchas
3.5 Conclusion
7 Where to next?
7.1 Continuing your Python learning
7.1.1 Websites
7.1.2 Communities and forums
7.1.3 Online courses
7.2 Ideas for directions next
A Short Questions
A.1 Short questions for practice
A.2 Chapter 1. Introducing Python
A.3 Chapter 2. Data Types
A.3.1 Practicing making strings
A.3.2 Making a greeting
A.4 Chapter 3. Collections
A.4.1 Building an algorithm to reproduce concrete poetry
A.4.2 A Table of Muppets
A.5 Flow control
A.5.1 Fozzie Bear!
A.5.2 List (and dictionary) comprehension practice
A.5.3 Code refactoring I
A.6 Chapter 5. Functions and classes
A.6.1 Who said programming was better than flipping burgers?
0.1 Welcome!
Thanks for checking out this book.
This book began many, many years ago as a series of Python scripts for teaching social
scientists how to code in Python. Over the years these grew into ever more extensive lecture notes. During
the process of compiling lecture notes for my larger book “From Social Science to Data Science”, I thought
it might be a good idea to also take my introductory notes and compile them as a book.
It turned out to be a little more work than I expected! Part of the issue is that in reading my notes in
a book form I became aware of certain assumptions I made or glossed over because I thought I could talk
through such issues during a lecture. Yet, students appreciate resources. I adore using this book as a set of
“living notebooks” in Jupyter. You can edit this book, run it, change it etc. But not much beats being able
to have a printed or at least nicely formatted version of these notes to annotate, flip through and refer back
to at a glance. So I hope you appreciate these notes. I further hope they encourage you to consider learning
Python and developing your own way of working programmatically.
Language tends to have a notion of nouns and verbs. Nouns are often thought of as objects, like a pizza.
Verbs are actions like throw. You should not throw a pizza. Python has a similar notion of objects and
functions/methods. Objects contain things. Functions are the ways that we make use of things in Python.
They are like the verbs. For example:
print("Hello world!")
In this case, print is the function (the verb) and "Hello world!" is the object (the noun) that it acts on.
Most of programming will involve moving around data between these functions in increasing layers of com-
plexity, starting first with step by step instructions, and then subsequently with increasing amounts of
abstraction. That is how the programming in this book will proceed.
0.3 An outline
The first thing we will need to do is get you started with an environment for programming. Chapter 1
introduces some basics of Python and of an environment called Jupyter Lab. You might already be reading
these chapters in Jupyter Lab! But if not, Chapter 1 has some tips on how to install and work with it.
In Chapter 2, we are going to start with a discussion of primitive data types. These are the basic building
blocks of objects. They correspond to letters and numbers. Then in Chapter 3 we will
show collections. Collections include multiple data objects, either of the same type or of different types.
That chapter covers just the logic of collections.
In Chapter 4, we ask: if we have a collection, how can we repeat an action for each element in that collection?
Thus, we will learn about iteration. Iterating leads quite naturally to the question: what if I want to do
something only some of the time (or for only some of the elements)? This means we are doing something under
some condition. Thus, in Python we have an important notion of conditionals, the most common of which
are if and else statements. As in:
if YEAR == 2020:
    buy("mask")
The means to assign a bunch of variables, have iterations, and do it under some conditions form the basis
of programming in virtually any language. But this can also get very messy. So different languages have
ways of organising code, often to minimise redundancy or maximise reusability or robustness. One essential
concept for organising code in Python is to make use of functions, both functions that are pre-built and
those that you create yourself. In Chapter 5 we look at how to build a function, what sort of inputs are possible (and
useful), and how to bundle functions together as objects.
Functions can stand alone, or they can be dedicated functions for a specific class of object. These dedicated
functions are called methods. We cover objects in more detail also in Chapter 5.
In Python, if it is a noun and it is not a primitive data type, then it is an object. So you can have a tweet
object which contains data about a tweet, such as its author, time, URL, hashtags, etc. A simpler object
is a list, which is just an ordered collection such as ["Abba", "Toto", "Gaga"]. The type of object in
Python is called its class. Classes are also covered in Chapter 5.
Chapter 6 looks at how to read and write files, both for reading and writing data, but also for running
Python programs. This does not really involve more complex programming concepts but starts you along a
path towards using Python in Jupyter and beyond.
The book concludes with some ideas and resources for further learning in Chapter 7.
This book has a series of appendices. These are exercises that I have put together based on the material for
the different chapters. These tend to work better as Jupyter notebooks, but I also wanted to compile them.
I feel that some of the most interesting work I’ve done is not in telling people what to learn in Python, but
inviting them to explore Python themselves. I hope the exercises encourage a level of playfulness with the
code. The first, Chapter A, presents a few shorter exercises tied to each chapter of the book. The second,
Chapter B, are some example answers for the prior appendix. I’m sure you won’t just copy and paste the
answers, but this way removes some of the temptation. Then afterwards in Chapter C I propose a few longer
projects that might be a fun challenge.
0.5 Dedication
This one is dedicated to the teachers who take risks to keep their learning fresh.
In my undergraduate degree 20 years ago Prof. Ron Byrne had been teaching a class on ‘vocational languages’
for years. The year I took it, he was using Python for the first time. It was a relatively new language then.
He figured it was going to replace the scripting language Perl. Little did he realise that it was to become the
pre-eminent language in data science and machine learning. But it was a gift to me to be engaged with a
language so early on. Dozens of my papers have been touched by Python in some way or another.
At my current department, it’s impossible to stand still. AI and machine learning are rapidly suffusing every
domain of academic life in some way or another. I cannot keep teaching the same material or in the same
way year on year. Who knows? In ten years I might not be teaching with Python at all! Even up until 2014
I was just teaching with script files and not Jupyter. It wasn’t until around 2015 that I was using pandas!
Sometimes these decisions seem a little late and sometimes they seem a little early. But they’re always a
risk. It’s easier to teach what you know. But it’s important to teach what students need to know and that’s
always a risk. More times than I can count I’ve been asked a question that I felt like I should have known the
answer but didn’t. Taking these risks is challenging and I’ve looked to some of my best teachers in the past
to see how they navigated unknown and choppy waters. And in the end it’s all the same - treat teaching as
an opportunity to learn and not just an opportunity to teach.
I wish I could list all my great teachers here, but that list is more important to me than you. So why not take
a couple seconds and reflect yourself on those teachers in your life who took a risk to teach you something
new? I think these people are all around us if we know where to look and act like we want to learn. So this
book goes out to them.
Chapter 1
Python is a programming language. It is interpreted by the computer and transformed into low level code
that can be executed by a processor. To make statements in Python, the most direct way is to type
commands into a “Python console” (for example, by opening the terminal, or on Windows the “Anaconda
Prompt”, then typing “python”). If you already have Python installed, on a Mac or Linux computer (and
maybe someday Windows PowerShell) you can open up the terminal and type python. Then you will see a
welcome message and a series of three chevrons, like so (with what is likely to be some slight difference in
the welcome message depending on your setup):
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
And in here you can enter commands. For more complex work, you will want to turn to Jupyter notebooks
and Python script files (*.py files).
To exit the Python console, type exit() or press control-D. Then you will see the standard prompt, which
will likely be either a single chevron (>) or a $.
In this book, I am writing as if we are working in a Jupyter lab environment. You might be reading this on
the screen or in a book form. But we will act as if the grey shaded areas are blocks of code that you can
run, and the unshaded areas, including code, are just text to read.
Because this book began its life as a Jupyter notebook, it leverages certain features of Jupyter, such as
syntax highlighting. So you will see different text in a different font or colour, such as:
print("Hello world")
Instead of typing into a console, we will be typing Python into cells in a Jupyter notebook and running
these. This is roughly halfway between running code piecemeal in a console and running it all at once in a single
script.
1.2 Working with Anaconda and Jupyter
One of the best things about Python is that it now has an entire ecosystem for scientific computing. In this
book we will be using the incredible Anaconda package for Python. This package includes a recent version
of the base Python language and a bunch of useful libraries for scientific computing. This book will not
make use of most of these libraries, but they do feature in data science work generally.
In addition to installing Python, Anaconda comes with a few ways to code and develop Python programs.
For standalone programs there’s Spyder. It is a very involved development environment with features to help
in developing code that spans multiple script files. Similar to Spyder is PyCharm, which is not presently
included in Anaconda. But one program is especially useful, and that’s Jupyter. Jupyter is a browser-based
tool for viewing Python code alongside text, figures and results. It’s like a Microsoft Word document where
paragraphs can be run and results can be inserted directly into the document. Jupyter Lab is like a workspace
where you can run multiple files in different tabs, like browser windows. It also has a nice and growing set
of extensions, such as my personal favorite, the Table of Contents extension.
These documents are called notebooks. Their default extension is .ipynb, which stands for “ipython note-
book”. Before Jupyter there was a souped up Python console called ipython. You can still run it from the
terminal by typing ipython. But Jupyter is like a souped up ipython.
Jupyter notebooks can be run on their own in a variety of ways.
1. The Jupyter notebook app, which you can run from the “Anaconda Navigator” application. You
can also run it from the terminal by typing jupyter notebook (that’s the regular terminal $, not the
Python console >>>).
2. If you have a Google account, you can go to https://colab.google.com and upload your notebook or
start a new one. If you store your notebooks on your Google drive you can open them in Google
Colab by right clicking on them and selecting open with → Google Colab. There are similar open
source and university run services like Binder. They tend not to be as polished as Google but might
be right for you and do not require you to have a Google account.
3. You can read a Jupyter notebook if it has been uploaded to a GitHub repository. Often these
notebooks also have little badges in them you can click, like “Open in Google Colab”. This book does
not, yet. But GitHub will render the file even though you cannot run the code there.
4. Last but definitely not least, Jupyter Lab is the most fully fledged way to run Jupyter notebooks. It is
my current Python coding environment of choice. You can run it from the Terminal by typing jupyter
lab.
$$\bar{x} = \frac{1}{n}\sum^{n}_{i=1}x_{i}$$
This formula was not written with Markdown but with a special typesetting language called LaTeX. Technical
papers are often drafted in LaTeX as are many books in STEM fields. It is less common in social sciences,
but it is really handy. I wrote my dissertation in a combination of LaTeX and Markdown.
The formula was given its own line because it was enclosed with $$ characters. Here is what the code looks
like: \bar{x} = \frac{1}{n}\sum^{n}_{i=1}x_{i}. If we enclose it with single $ it will be a formula
inline, like so: $\bar{x} = \frac{1}{n}\sum^{n}_{i=1}x_{i}$. I use inline formulae for most numbers.
If you click on this cell, you can see the formatting underneath. Here we are just using MathJax, which is
a subset of LaTeX used for formulae. StackExchange have a nice brisk tutorial of the syntax of MathJax.
Basically, there is syntax for:
• Superscripts, x^i for $x^i$, and subscripts, x_i for $x_i$.
• Fractions, with \frac{NUMERATOR}{DENOMINATOR}, as in $\frac{x^3}{y_i}$.
• Parentheses (using \left( and \right) to scale properly), as in $\left(\sqrt{\frac{(\bar{x}-x_i)^2}{n}}\right)$.
• Summation, product, and related symbols: \sum for $\sum$ and \prod for $\prod$.
• Greek symbols. Use their name for the symbol such as \alpha for α or \omega for ω.
• A host of diacritics, maths symbols, and fonts. Check the tutorial above for clear examples.
There are two basic types of data in Python, primitive data types and object data types. Primitives are
the basic building blocks of more complex data structures, much like how letters are the building blocks of
words and digits the building blocks of numbers. For example, each letter is a primitive data type in Python
called a character. An ordered list of characters is called a string object.
There are (as I understand it) five primitive types in Python:
• int for integer or whole numbers.
• float for floating point numbers. These are numbers with decimals in them.
• char for characters. (Strictly, Python has no separate char type: a single character is a str of length one.)
• bytes for raw byte data as handled by the computer, for example for storing image data.
• bool for Boolean, namely True or False
Usually you do not want to type out some primitive data every time. Instead you would use a label to
represent them. This label is called a variable. You assign a value to a variable and then you can use that
variable to ‘represent’ the value. See how this works below:
Hello world!
Variable names in Python start with a letter (a-z or A-Z) or an underscore, and can also include numbers and
underscores. There are some reserved words in Python (such as if and for) as well as built-in names (such as
print). If you type them in Jupyter they usually show up in green. If you try to use a reserved word as a
variable name, Python will raise a SyntaxError. If you overwrite a built-in name (like saying print = 4), you
might have some unexpected consequences.
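As a minimal sketch of assignment and naming, the cell below stores a string in a variable and then uses the standard library's keyword module to ask which names are truly reserved. The variable name greeting is just an illustration and is not from the original notebook. Note that in Python 3 print is a built-in function rather than a reserved word, which is why Python lets you overwrite it.

```python
import keyword

# Assign a value to a variable, then use the variable in place of the value.
greeting = "Hello world!"
print(greeting)

# Reserved words cannot be used as variable names at all.
print(keyword.iskeyword("if"))     # True

# print is a built-in name, not a reserved word, so it can be overwritten
# (which is exactly why doing so causes confusing behaviour later).
print(keyword.iskeyword("print"))  # False
```

Running keyword.kwlist in a cell will list every reserved word for your version of Python.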
Don’t worry, you won’t ‘break Python’, but you might mess up that specific instance of Python. In which
case, you can always start again by restarting the kernel from up in the menu. So I encourage you to toy
around with the code, get a feel for some errors or ways to tinker. Then if you feel it’s messed up, you can
always restart the kernel. Of course, as you get further into production or academic level code you will not
so easily want to restart the kernel, but by then you will likely have developed other approaches to tinkering
with your code.
2.2 Characters
The first primitive data type is the character. We don’t really interact with characters directly but instead
through their collection as a ‘string’ or str object. You saw above the string Hello world!.
Characters become a string when encased in quotes. There are three types of quotes: the single tick, the
double quote, and the triple tick.
Quote 1. The single tick.
print('One small tick for strings')
Quote 2. The double quote. This is not two ticks, but the ‘double-quote’ character, ". It looks like two
vertical ticks, but closer together than two ticks (e.g., " vs ' '). Be careful with this character. Some programs
like Microsoft Word like to replace the generic " with stylised quote characters that are different at the
beginning and end of a quote such as “these”. Python doesn’t like stylised quotes and prefers the generic ".
print("Python doesn't mind the tick here")
The advantage of using double-quotes is that you can write phrases like “don’t bring me down” without the
program being confused about where the string stops. But if you then wrote "society", man, inside of a string, as in:
print("We all live in a "society", man")
Then Python will assume the string ends when the next double quote appears, which was not the
intention.
Quote 3. The triple tick. This is indeed three ticks in a row. Python will evaluate everything inside of the
three ticks literally. So if you want to have a string break across a line you can just type that in between
three ticks and it will not throw an error.
print('''I have no problems breaking
across the lines!''')
print()
Input In [3]
print("We all live in a "society", man")
^
SyntaxError: invalid syntax
Decoding errors is a bit of an art that will develop over time. Here, we can see indeed, the syntax is invalid.
But Python is not great at explaining why. This is where experience comes in. Over time you will get better
at deducing errors and cleaning them up.
Some pointers to help with errors:
• The bottom part is closest to your code. The error might have been triggered at many different layers
of abstraction and so the output looks long and intimidating. But each entry really just gives the line number
of a line that was running at that point, and an indication of where the code
was when the error was raised;
• Try to print() out at many points to get feedback (then remove these print statements from working
code);
• Break the problem down: make the smallest possible changes and see if it affects the code;
• Examine online forums that received a similar error. But be mindful of what you throw into a search
engine. Your variable names might be noise or might be research data. Be cautious of online sources,
know that the most popular is not always the best (often it is biased by age of comment). Often Stack
Overflow comments are popular because of how they explain the content. Students sometimes rush to
paste a code snippet without understanding it, but the snippet does not work because it is an example
with slightly different details than in the students’ code. Be patient and read the explanation rather
than immediately copy-paste-hope.
In the case of the error above, the program used a caret character (^) to indicate that there should not be
an s directly after a closing quote. But that’s not actually a closing quote is it? It’s those darned inverted
commas used by sceptical academics everywhere. So in order to preserve those inverted commas we need to
“escape” them.
We use the backslash character to escape, so we should see a string that looks more like:
print("We all live in a \"society\", man")
Notice that the text colour is also a hint of when things are amiss. Observe the correctly formatted print
statements below:
print("We all live in a \"society\", man, but at least we can escape the quotes.")
We will continue to explore features of characters when we return to collections, since a string is a collection
of characters.
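Escaping is not the only way to keep inverted commas inside a string. As a small sketch (the variable names here are illustrative, not from the original notebook), you can also wrap the string in the other kind of quote, or in triple quotes, and end up with exactly the same characters:

```python
# Option 1: escape the inner double quotes with a backslash.
escaped = "We all live in a \"society\", man"

# Option 2: use single ticks on the outside so the double quotes are safe.
single_outside = 'We all live in a "society", man'

# Option 3: use triple quotes on the outside.
triple_outside = """We all live in a "society", man"""

print(escaped)
print(escaped == single_outside == triple_outside)  # True
```

Which one to use is mostly a matter of readability. Escaping becomes necessary when a string contains both kinds of quote.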
# An integer
x = 7
# A floating point number. Still a whole number, but the .0 makes it a float rather than an integer.
y = 4.0
print ( type(x) )
print ( type(y) )
z = x + y
# See how z inherits the floating point number even though the value could be an integer?
print (type(z), z)
# Start from an integer, then convert it step by step
var_int = 7
var_float = float(var_int)
print(var_float, type(var_float))
var_str = str(var_float)
print(var_str, type(var_str))
So far, so good. We went from an int 7→float 7.0→str "7.0". But now we have a problem if we want to
go back to int. It will throw a ValueError. If you run it, then it will state ValueError: invalid literal
for int() with base 10: '7.0'. The problem here is that you might think that .0 means nothing. But
it’s actually something out of nothing. Python is not fussy that the value after the decimal point is zero or
some long string of digits. The matter is that it is something after the decimal and that particular conversion
does not like it.
var_int = int(var_str)
print(var_int, type(var_int))
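If you do need to get back to an int from a string like "7.0", one possible route (a sketch, assuming you are happy to discard whatever follows the decimal point) is to convert to float first and then to int. The try/except wrapper is only there so the cell keeps running after the expected error.

```python
var_str = "7.0"

# Going straight from "7.0" to int fails, as described above.
try:
    var_int = int(var_str)
except ValueError as err:
    print("Direct conversion failed:", err)

# Going via float works: str -> float -> int.
# Note that int() truncates, so "7.9" would also become 7.
var_int = int(float(var_str))
print(var_int, type(var_int))  # 7 <class 'int'>
```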
It is often said that a language which requires you to specify the class of a variable ahead of time is a
“strongly cast” language (the more common term is statically typed). For example, in Java you cannot write x=5 unless you have previously declared x
as an integer, or you declare it at the same time, like int x = 5.
Python is a “weakly cast” (dynamically typed) language, meaning that a variable is not tied to one data type: Python does not check the type before assigning an object
to that variable. So in one line you could code x=0, making x an integer number. On the next line, you can
code x="fabulous" and Python does not have a problem with that.
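A quick way to see this for yourself is to reuse one variable name for two different types and check type() after each assignment. This is a minimal sketch; the list seen_types is only there to keep a record of what happened.

```python
seen_types = []

x = 0
seen_types.append(type(x))
print(x, type(x))   # 0 <class 'int'>

# The same name can now point at a string instead.
x = "fabulous"
seen_types.append(type(x))
print(x, type(x))   # fabulous <class 'str'>
```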
x = 9
y = 4
print("x = ",x)
print("y = ", y)
print("x + y = ", x + y)
print("x - y = ", x - y)
print("x * y = ", x * y)
print("x ** y = ", x ** y)
print("x / y = ", x / y)
print("x // y = ", x // y)
print("x % y = ", x % y)
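The last three operators are the ones that tend to surprise people, so here is a short sketch of how they relate, using the same x and y. Regular division (/) always gives a float, floor division (//) gives the whole-number quotient, and modulo (%) gives the remainder. The built-in divmod() returns the last two together.

```python
x = 9
y = 4

quotient = x // y    # how many whole times y goes into x
remainder = x % y    # what is left over

print(x / y)                 # 2.25
print(quotient, remainder)   # 2 1

# The quotient and remainder always reassemble into the original number.
print(quotient * y + remainder == x)  # True

# divmod gives both in one call.
print(divmod(x, y))          # (2, 1)
```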
# Example 1
x1 = 1/3
y1 = 1/3
z1 = 1/3
x1 is 0.3333333333333333
y1 is 0.3333333333333333
z1 is 0.3333333333333333
# Example 2
x2 = 0.333333333333333 # <- Notice one digit short
x2a = 0.3333333333333333
y2 = 0.6666666666666666
If x2 + y2 is 0.9999999999999996:
0.9999999999999996
Will x2a + y2 be 0.9999999999999999:
1.0
# Example 3
x3 = 16/9
y3 = 7/9
print("Won't 16/9 minus 7/9 equal 9/9, which is the same as 1?\n",x3-y3)
Won't 16/9 minus 7/9 equal 9/9, which is the same as 1?
 0.9999999999999999
This is why we say floating point numbers are approximations of real numbers. π is a real number, but it
is also infinitely non-repeating. The computer then cannot load the full number of π (as it does not have
infinite memory), but it loads in an approximation.
When we calculate things in Python, we are accepting a certain loss of precision. It does not do fractional
maths. In Example 3, the variable x3 was first calculated as 1.7777777777777777 and stored as such.
We rarely encounter that level of precision, but I think it’s nice to get a sense of our limits.
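Two tools from the standard library can help when these limits matter. This is a sketch of options rather than something covered elsewhere in this chapter: math.isclose() compares floats with a small tolerance, and fractions.Fraction stores exact numerators and denominators, so it can do the fractional maths that floats cannot.

```python
import math
from fractions import Fraction

x3 = 16/9
y3 = 7/9
difference = x3 - y3

# Comparing floats with == is fragile; compare with a tolerance instead.
print(difference == 1)
print(math.isclose(difference, 1))    # True

# Fractions keep exact values, so 16/9 - 7/9 really is 1.
exact = Fraction(16, 9) - Fraction(7, 9)
print(exact, exact == 1)              # 1 True
```

Fractions are slower than floats, so they are best kept for cases where exactness genuinely matters.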
• <var>.strip(): This is a useful command to remove whitespace from the beginning and the end of a
string. To remove only from the beginning use <var>.lstrip(). To remove only from the right, use
<var>.rstrip().
If you want to find out details about a method, you can check the help. There are several ways to do that
in Jupyter. The first is to run help(<var>.<method>). But don’t include the () at the end of the method
or it will first run the method and then query what was returned for help.
help("str".find)
Return -1 on failure.
The second way in Jupyter is to create a new tab (via the Launcher) and instead of selecting a notebook or
Terminal, notice that in the lower right corner is a type of file called “Show contextual help”. This will give
realtime help on what command you are using. It’s a second tab so drag it around Jupyter lab until you
find a spot where it is visible but not obtrusive. Finally, if you are in a code cell and you place your cursor
inside a method and hit shift+tab it should bring up the help for that method as a tooltip.
example_string = "The quick brown fox jumps over the lazy dog"
To upper case: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
To lower case: the quick brown fox jumps over the lazy dog
Replacing 'o' with 'you': The quick bryouwn fyoux jumps youver the lazy dyoug
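The outputs above are consistent with the following calls. This is a sketch, assuming the upper(), lower(), and replace() string methods were used; every variable name other than example_string is illustrative.

```python
example_string = "The quick brown fox jumps over the lazy dog"

upper_version = example_string.upper()
lower_version = example_string.lower()
replaced_version = example_string.replace("o", "you")

print("To upper case:", upper_version)
print("To lower case:", lower_version)
print("Replacing 'o' with 'you':", replaced_version)

# String methods return a new string; the original is left untouched.
print(example_string)
```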
print("blue" + "berry")
print(7 + 9)
print("7" + "9")
blueberry
16
79
So if you have a variable, name, and you want to insert it into a greeting, you can concatenate with
print("Hello " + name).
print("Your score was " + score + "out of 30, or " + score/30 + "percent")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [16], in <cell line: 3>()
1 score = 17
----> 3 print("Your score was " + score + "out of 30, or " + score/30 + "percent")
So we can format the number as a string ahead of time, but that would be unnecessary. Instead we can
format the string itself. There’s the classic way of doing this using the format() method. Then there’s the
new way (which I adore) using f-insertions. Let’s see them both since you will encounter both in the wild.
In both cases they use {} inside of string quotes to create a marker for where the variable should go in the
string. So for the example above it would be like: "Your score was {} out of 30, or {}.". See below
(you will see why I left out “percent” in a bit):
score = 17
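Here is a minimal sketch of both approaches applied to the template string above. The variable names classic, modern, and as_percent are illustrative. The final line goes one step beyond the text: the :.1% format specification is my assumption about a convenient way to display the fraction as a percentage.

```python
score = 17

# Classic way: empty {} markers are filled in order by format().
classic = "Your score was {} out of 30, or {}.".format(score, score/30)
print(classic)

# New way: an f in front of the string lets you put expressions inside {}.
modern = f"Your score was {score} out of 30, or {score/30}."
print(modern)

# Optional: a format specification after the colon controls the display.
# .1% multiplies by 100, keeps one decimal place, and adds the % sign.
as_percent = f"Your score was {score} out of 30, or {score/30:.1%}."
print(as_percent)  # Your score was 17 out of 30, or 56.7%.
```

Neither approach requires converting the number to a string first, which is what caused the TypeError with concatenation.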
2.6 Conclusion
In this chapter we went from the most primitive data types (int, float, char) towards more meaningful data
types. I showed how to convert data types, how to consider errors, and how to do some string manipulation.
These get more interesting when we have many strings, numbers, or calculations. Then we can put these in
a collection and start to ask questions of the collection itself. That is where we are headed next.
Chapter 3
Collections
3.1 Collections
Python has many ways in which to organise a collection of data. Computer scientists often search for
interesting ways to collect and structure data. Some reasons are to make its storage and retrieval more
robust and efficient. Later on you will find some structures work better than others for a task at hand.
Here we are going to focus on three main kinds of collections which are extremely common and impressively
versatile: list, set, and dictionary (or dict). Getting used to the logic of these three will enable you to
manage a huge number of situations in programming. Other, fancier, and more abstract packages will come
along for specific tasks, but these three are the most important.
The thing about collections that matters most is how they associate data. Each type of collection may
associate data in a different way. You might have guessed from the discussion so far that lists place data in
some ordered sequence. We would say that a list orders by position. A set, by contrast, does not really
have a notion of position. There’s no element guaranteed to be first or last when you print a set. Instead, a
set is ordered by inclusion. This means that some object or data is either in a set or not in a set. Finally
a dictionary is ordered by key. Keys are like an index for more data. We will show what keys mean after
covering lists and sets more fully first.
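To make the three kinds of ordering concrete before going through each in detail, here is a brief sketch. The variable names and values are made up for illustration. A list is queried by position, a set by asking whether something is in it, and a dictionary by key.

```python
# A list keeps every item in the position where it was placed.
fruit_list = ["apples", "bananas", "apples"]
print(fruit_list[0])           # apples

# A set only tracks whether something is included, so duplicates collapse.
fruit_set = {"apples", "bananas", "apples"}
print("apples" in fruit_set)   # True
print(len(fruit_set))          # 2

# A dictionary looks up a value by its key.
fruit_dict = {"apples": 3, "bananas": 5}
print(fruit_dict["bananas"])   # 5
```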
Virtually all collections in Python are iterable. This means that you can ask for one item from the collection and
then keep asking until you get all the items. Sometimes the order might be different depending on the type
of collection, but you will definitely get through all of them eventually. We will iterate through collections
in the next chapter.
list_example = ["apples","bananas","cucumbers","durians"]
print(list_example)
print(list_example[1])
bananas
We can also count backwards through the list using negative numbers, like so:
print(list_example[-1])
durians
Lists have a range that goes from 0 to n − 1 where n is the number of elements. If you try to index an
element that’s out of the range, Python will throw an error. To get the range, we can use a function called
len, which is short for length.
Notice what we do below. We get the length, then use that as an index. The length itself will give us an error, but length
minus one won’t.
list_length = len(list_example)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Input In [5], in <cell line: 2>()
1 # This should give an IndexError
----> 2 print(list_example[list_length])
durians
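The same experiment can be run in a single cell without stopping on the error. This is a sketch that wraps the out-of-range index in try/except so the remaining lines still run.

```python
list_example = ["apples", "bananas", "cucumbers", "durians"]
list_length = len(list_example)
print(list_length)  # 4

# The valid indexes are 0 to 3, so index 4 is out of range.
try:
    print(list_example[list_length])
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range

# Length minus one is the index of the final element.
print(list_example[list_length - 1])  # durians
```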
The first number is the list index for the element we start with. The second number is the index that we get up
to, but not including. This way, list_example[0:2] and list_example[2:4] will not return overlapping
lists.
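A short sketch of that non-overlapping behaviour, using the same list (first_half and second_half are illustrative names):

```python
list_example = ["apples", "bananas", "cucumbers", "durians"]

# Index 2 is the stopping point of the first slice and the start of the second.
first_half = list_example[0:2]
second_half = list_example[2:4]

print(first_half)   # ['apples', 'bananas']
print(second_half)  # ['cucumbers', 'durians']

# Because the end index is excluded, the two slices rebuild the list exactly.
print(first_half + second_half == list_example)  # True
```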
Yet, we can also slice lists from the end rather than the front of the list by using negative numbers. -1 is
the final element, and this happy one :-1 will give us everything up to the final element.
list_example = ["apples","bananas","cucumbers","durians"]
print( list_example[:-1] )
print( list_example[1:-1] )
print( list_example[:])
• Add things:
– One at a time: Append an item to the end of the list with list.append(item)
– Adding any collection: extend a list with the items from any iterable collection with
list1.extend(list2).
• Remove things:
– Remove everything with clear: Want to keep the name but empty the list? list1.clear()
– Removing one item by index: pop a value from the list. By default it is the final element, but
you can pass an index to pop an element anywhere in the list. list.pop() or list.pop(4). This
is the same as del before the list, like del list1[-1] or del list1[4], except that deleting
doesn’t return the things you deleted whereas pop does.
– Remove an item by its value: remove the first instance of a value in the list with
list.remove("item").
• Sort things:
list1 = list1.extend(list2)
Because while extend() changes list1 it does not return list1. You can test this below by printing what
is returned from the list.
# Attempt 1. Extending a list - it returns none.
list1 = [1,4,9]
list2 = [4,5,6]
print(list1.extend(list2))
list1.extend(list2)
print(list1)
None
[1, 4, 9, 4, 5, 6]
Here are the other list methods in action. Most of the time I start with a fresh list1 = [1,4,9].
# Append
list1 = [1,4,9]
new_val = 70
print("Original:\t",list1)
list1.append(new_val)
print("Appended:\t",list1)
Original: [1, 4, 9]
Appended: [1, 4, 9, 70]
# Extend
list1 = [1,4,9]
list2 = [4,5,6]
print("Original:\t",list1)
list1.extend(list2)
print("Extended:\t",list1)
Original: [1, 4, 9]
Extended: [1, 4, 9, 4, 5, 6]
# Clear
list1 = [1,4,9]
print("Original:\t",list1)
list1.clear()
print("Cleared:\t",list1)
Original: [1, 4, 9]
Cleared: []
# Del
list1 = [0,54,31,5,77,-3]
print("Original:\t\t",list1)
del list1[-2:]
print("Deleted last two:\t",list1)
# Remove
list1 = [10,20,30]
print("Original:\t",list1)
list1.remove(20)
print("Removed '20':\t",list1)
# Sort
list3 = [7,3,1,2,3,4]
print("Original:\t",list3)
list3.sort()
print("Sorted:\t\t",list3)
Original: [7, 3, 1, 2, 3, 4]
Sorted: [1, 2, 3, 3, 4, 7]
One neat thing about a tuple is that you can split it up when you are working with it. So watch below how
I create a tuple with two elements, and then I assign them each to a variable.
xy_tup = (-1,3)
x,y = xy_tup
print(xy_tup)
print(x)
print(y)
(-1, 3)
-1
3
This will be handy later when we get sent tuples and we really want one part of the tuple or another.
3.3 Ordered by inclusion: the set
A set is a data structure that contains only unique values. So if you have a list like so: ex1 =
["Spain","France","Spain","Italy","Italy"] and you convert it to a set set(ex1), it will only be
the following: {"Spain","France","Italy"}. If you start with an empty set you can add more elements to
it. But if you add an element that was already in the set nothing changes.
s1 = set()
s1.add("Cherry")
s1.add("Lemon")
print(s1)
s1.add("Cherry")
print(s1)
{'Lemon', 'Cherry'}
{'Lemon', 'Cherry'}
s2 = [1,2,2,3,4,5,5,5,5,5,6]
print("As a list:\t",s2)
print("As a set:\t",set(s2))
As a list: [1, 2, 2, 3, 4, 5, 5, 5, 5, 5, 6]
As a set: {1, 2, 3, 4, 5, 6}
print("Lemon" in s1)
print("Pineapple" in s1)
True
False
Granted you can often ask if <element> in <collection> for lots of collections. But it turns out that there
are a multitude of ways of organising data in the computer. A set is faster than a list for this kind of check
because of the way it maps data to memory. A frozenset can be faster still. It is like a set but cannot be
altered (i.e. it’s frozen).
speed_list = [1,2,3,4,5,6,7,8,9]
speed_set = set(speed_list)
75.9 ns ± 24.6 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
241 ns ± 21.8 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
74.7 ns ± 2.46 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
50.2 ns ± 13.6 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
We used a “magic command” here for the first time. This is %timeit. It’s called “magic” because it
does something in Jupyter but not necessarily in Python generally. You will see a few of these peppered
throughout code. In this case we said “do what’s on this line 10000 times and report the average time”. Why
10,000? Because I used the -n flag for number of times and set it to 10000. You can remove that but the
program will probably do it a few million times and be a bit slower.
The first two are using the list. It took ~75 nanoseconds on my machine to look for the first element. Then
to get the last item in the list it took about three times as long (and that’s with only nine elements!). That’s
because a list looks one by one through the data structure. With a set, it has a way of checking if there
is a value at the memory address for 1 and it is either there or it is not. So that’s why it took about 50 to 75
nanoseconds for the first element or the last element (or any element regardless of set size).
This difference matters for our programming. Sometimes we want to search through a list to
find the element, but most of the time we just want the element in the fastest way possible.
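Outside Jupyter, where %timeit is not available, a similar comparison can be sketched with the timeit module from the standard library. The collection size and the number of repetitions below are arbitrary choices, and the timings will differ from machine to machine.

```python
import timeit

speed_list = list(range(1, 1001))
speed_set = set(speed_list)

# Each lambda is just a small piece of code for timeit to run repeatedly.
# Here we time 10,000 membership checks for the final element.
list_time = timeit.timeit(lambda: 1000 in speed_list, number=10_000)
set_time = timeit.timeit(lambda: 1000 in speed_set, number=10_000)

print(f"list: {list_time:.6f} seconds")
print(f"set:  {set_time:.6f} seconds")
```

On most machines the set line should report the smaller number, and the gap tends to grow as the collection gets longer.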
print(f"setCount:\t{setCount}")
print(f"setOdd: \t{setOdd}")
print()
setCount: {1, 2, 3, 4, 5}
setOdd: {1, 3, 5, 7, 9}
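With two sets like these, the union and intersection operations might be sketched as follows. This assumes the two sets were created as literals; the names set_union and set_intersection are hypothetical.

```python
setCount = {1, 2, 3, 4, 5}
setOdd = {1, 3, 5, 7, 9}

# Union: every element that is in either set.
set_union = setCount | setOdd          # same as setCount.union(setOdd)

# Intersection: only the elements that are in both sets.
set_intersection = setCount & setOdd   # same as setCount.intersection(setOdd)

print(f"Union:       \t{sorted(set_union)}")
print(f"Intersection:\t{sorted(set_intersection)}")
```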
set3 = {"Cherry","Lemon","Orange","Grape"}
new_flavor_list = ["Orange","Pineapple","Sarsaparilla"]
set3.update(new_flavor_list)
print(set3)
food_dict = {"breakfast":"porridge",
             "lunch":"pizza",
             "dinner":"stir fry"}
print(food_dict["lunch"])
pizza
print(error_dict)
print(error_dict["breakfast"])
{'breakfast': 'fruit'}
fruit
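The output above is what you would expect if the same key appeared twice in a dictionary literal: the later value silently replaces the earlier one. Here is a minimal sketch of that behaviour. The name duplicate_dict and the value "porridge" are assumptions for illustration, not the original error_dict.

```python
# A key can only appear once; repeating it keeps the last value.
duplicate_dict = {"breakfast": "porridge",
                  "breakfast": "fruit"}

print(duplicate_dict)
print(len(duplicate_dict))
```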
As a small programming aside, you will notice by now that a Python command does not always need to be on a
single line. There are a few places where you can ‘naturally’ break a line and not confuse the Python interpreter.
One is after a comma inside brackets, braces, or parentheses. I tend to want my code to be shorter than 80
characters wide, so I use the comma to create breaks and keep the code readable.
ex1_dict = {"fish":"salmon",
            "mushroom":"enoki"}
ex1_dict["fruit"] = "apple"
print(ex1_dict)
keys = ex1_dict.keys()
vals = ex1_dict.values()
items = ex1_dict.items()
items[0]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [28], in <cell line: 1>()
----> 1 items[0]
list(items)[0]
('fish', 'salmon')
Also notice that in the items collection, the keys and values are represented like ('fish','salmon'). This
is the infamous tuple I mentioned above. You can’t change this value, but you can iterate through these.
Iterating will be featured in the next chapter.
food_dict = {"fish":"salmon"}
food_dict["fish"] = "cod"
food_dict
{'fish': 'cod'}
However, if you want to add cod, then clearly that strategy would not work. You also cannot do
{"fish":"salmon", "fish":"cod"} as we established above with our breakfast dictionary. So what can
you do? You can create a new list and make that the value. Here is one approach:
food_dict = {"fish":["salmon"]}
food_dict["fish"].append("cod")
food_dict
But in case you did not have "salmon" in a list in the first place, you can always just insert it into a list:
food_dict = {"fish":"salmon"}
food_dict["fish"] = [food_dict["fish"],"cod"]
food_dict
This last one might have tripped you up. But remember that food_dict["fish"] at the beginning of line
2 was "salmon", so that way we inserted it as the first entry in a list, with "cod" as the second entry. Then
we assigned this list (["salmon","cod"]) to the value of "fish" in the dictionary.
3.5 Conclusion
Herein we have seen a number of collections. We focused on three main ones, list, set, and dict. In
the course of working through these, we also encountered tuples, dict_items, dict_values, and dict_keys.
These other collections tend to work in pretty similar ways. Ultimately, if we have a collection we need a
way to access the data from that collection. That’s why I chose list, set, and dict. They exemplify three
key ways of ordering (and thus querying) for data: by position, by inclusion, and by key.
In the next chapter we will make use of collections through iteration. We will be able to do something for
each element in a collection, or only under certain conditions.
The exercises in Appendix A enable some practice with dictionaries, and particularly some objects that are
a messy mix of nested lists and dictionaries. This will start to resemble some of the messy data structures we
see in the real world, like JSON (which really resembles combinations of lists and dictionaries).
Chapter 4
One way to manage the flow of data through an algorithm is to use conditionals. This means that we will
evaluate some data. If the evaluation is true then we can do something. If the evaluation is false we can
do something else. Now in life there are many sides to every story. But in Python we tend to work with
strict conditions that are either fulfilled or not. For example we would have a set of things, characters =
{"Kermit","Piggy","Fozzie"} and if we ask "Kermit" in characters the program will return True.
characters = {"Kermit","Piggy","Fozzie"}
"Kermit" in characters
You might have noticed that Kermit, Piggy, and Fozzie are all names of characters from The Muppet Show.
So in a sense you might say muppet in characters is true, but Python doesn’t know that. Python is not
giving a meaning to the text, it is simply evaluating the string of characters against another string.
The most common conditional in Python is probably the == comparator. This is two = characters in a row.
One equals character is what we use for variable assignment. Two characters is for comparisons. See below
for an example of comparisons using ==.
a = 42
print(a == 42)
b = 55
print(a == b)
print(a = 42)
4.1.1 Boolean operators
The primitive data type we spent virtually no time on in the prior chapter is the Boolean variable. This is
a variable that is either True or False. To get Python to return a Boolean, all you need to do is make a
comparison using a Boolean operator.
• == is used for comparison. Does X equal Y? x == y
• and is used to ask if two things are both true. x and y
• or is used to ask if either thing is true. x or y
• not is used for negation, and != is used for ‘not equal’. not x, x != y
• > is used for left side greater than right side. x > y
• < is used for left side less than right side. x < y
• in is used for membership of an element in a collection. x in y
• is is used to check if two labels point to the same object. x is y
x = "yes"
print("x:",x)
x_list = ["yes","no","maybe"]
print("x_list:",x_list)
y_list = x_list
print("y_list is x_list:", y_list is x_list)
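To see how is differs from ==, here is a small sketch that compares a second label for the same list with a separate copy of it. The name z_list is hypothetical.

```python
x_list = ["yes", "no", "maybe"]
y_list = x_list          # a second label for the same list
z_list = list(x_list)    # a new list with the same contents

print("y_list == x_list:", y_list == x_list)
print("y_list is x_list:", y_list is x_list)
print("z_list == x_list:", z_list == x_list)
print("z_list is x_list:", z_list is x_list)
```

Only the last line prints False: z_list has equal contents but is a different object.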
x = 5
if x == 5:
    print("Yep, x did equal 5, thank golly.")
With an if statement also comes an else statement. Below I am going to simulate a coin toss using the
random module. That module has a function called choice. You can probably follow all the rest by reading
the code and running it a few times.
import random
coin_sides = ["heads","tails"]
if random.choice(coin_sides) == "heads":
    print("You win!")
else:
    print("Sorry, better luck next time.")
As it happens, there might be more than two conditions you want to cover. One way to include multiple
conditions is to use an elif statement, which is a combination of else and if.
coin_sides = ["heads","tails","rim"]
side = random.choice(coin_sides)
if side == "heads":
    print("You win!")
elif side == "tails":
    print("Sorry, better luck next time.")
elif side == "rim":
    print("It landed on its side, what are the odds?")
Notice I had to put the walrus in parentheses. That’s because I wanted to assign
random.choice(coin_sides) to side and not the full random.choice(coin_sides) == "heads" to
side. In the former case side would be one of our three choices. In the latter case side would end up
being True or False.
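A sketch of the kind of statement this paragraph describes, using the walrus operator := (available from Python 3.8), is below. Treat it as an illustration rather than the exact original code.

```python
import random

coin_sides = ["heads", "tails", "rim"]

# The parentheses make sure side gets the chosen string,
# not the True/False result of the == comparison.
if (side := random.choice(coin_sides)) == "heads":
    print("You win!")
elif side == "tails":
    print("Sorry, better luck next time.")
else:
    print("It landed on its side, what are the odds?")

print("side is:", side)
```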
# String comparisons
print("Is a > b?")
print('a' > 'b')
print("\nIs A > b?")
print('A' > 'b')
Comparing numbers: zero is false, one is true, the rest are trouble
When comparing numbers, you can of course assess whether numbers are greater or less than each other
with ease. But what if you want to assess whether a number is true or false? In Python, the number 0
equals False, the number 1 equals True, and the rest are neither True nor False. This is quite strange to
say they are neither true nor false. But follow the code below to see this in action:
print("-1\t", -1 == False)
print("0\t", 0 == False)
print("1\t", 1 == False)
print("2\t", 2 == False)
import numpy as np
print(np.nan == True)
print(np.nan == False)
Ok so nan, like the numbers 2 and -1, is equal to neither True nor False. Yet observe below:
if np.nan:
    print("Nan evaluates as True")
else:
    print("Nan evaluates as False")
Curiouser and curiouser. Below are some other empty values. Again neither of them is equal to the
Boolean True or False values, yet they can still somehow be used in if statements.
print(None == True)
print(None == False)
if None:
    print("None evaluates as True")
else:
    print("None evaluates as False")
print("" == True)
print("" == False)
if "":
    print("Empty quotes evaluate as True")
else:
    print("Empty quotes evaluate as False")
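One way to see what an if statement will do with a value is to pass it to the built-in bool() function. The values below are just a sample chosen for illustration, and float("nan") stands in for np.nan so that the sketch needs no imports.

```python
nan = float("nan")   # the same kind of value as np.nan

print("0\t", bool(0))
print("2\t", bool(2))
print("-1\t", bool(-1))
print("nan\t", bool(nan))
print("None\t", bool(None))
print('""\t', bool(""))
print("[]\t", bool([]))
```

In this sample only 0, None, the empty string, and the empty list come out as False. Everything else, including nan, is treated as True.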
records = ["John_Smith_131313.pdf",
           "Bernie_Hogan_113113.pdf",
           "Richard_D_James_112358.pdf"]
So we didn’t get all the way to changing the names here, but that is unnecessary to make the main point.
Notice that we did the same thing to each of the records iteratively. We took each record one after the
other, split the record at the underscores and printed the last element. We used the Python syntax of for
<element> in <collection>: <do something>. This syntax is called a for loop.
As a verb we would loop over a collection, but as a noun we would refer to ‘the loop’ as the set of iterations.
So we could loop over a collection of items, but if we stop halfway through because of an error we have
just broken the loop.
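For reference, a minimal sketch of the loop described here is below: it takes each record in turn, splits it at the underscores, and prints the last element. The names parts and last_parts are hypothetical.

```python
records = ["John_Smith_131313.pdf",
           "Bernie_Hogan_113113.pdf",
           "Richard_D_James_112358.pdf"]

last_parts = []   # keep a copy of what we print, in case we need it later
for record in records:
    parts = record.split("_")    # e.g. ["John", "Smith", "131313.pdf"]
    print(parts[-1])             # the last element, however long the name is
    last_parts.append(parts[-1])
```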
print(record)
Sometimes the collection is ordered, such as a list, and sometimes the collection is unordered, such as a
set or a dictionary. Regardless, you are able to perform some action with each element of the collection
and know that once you’ve finished you’ve iterated through every element of the collection. The process of
going through all the items is called iterating.
The way to iterate through a collection in Python is to use a for loop. It is called a for loop because we
use it to do something for each element in a collection. In fact, the way it is written is meant to be similar
to written English.
Since we iterate through the collection, we need a temporary variable to represent the next item each time.
That was the record variable above. We could have named it anything, but record was useful to help us
understand what kind of item was in the collection. Often times people will use i for the iterator, like with
the following:
for i in range(5):
    print(i)
Sometimes you will encounter _ instead of i. That’s meant to signify that while you are iterating through
a collection you are not interested in the variable. For example, imagine you wanted to print a counter for
each element in a list but only print the counter and not the value.
counter = 0
for _ in ["apple","banana","cherry"]:
    print(counter)
    counter += 1
Now to be clear, this is not if and then else. This is for and then else. It’s a bit of a strange name.
Perhaps it might be better if it was called after or once_finished, but they called it else.
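A minimal sketch of for followed by else: the else block runs once the loop has gone through every item without hitting a break. The list, the message, and the finished variable are illustrative.

```python
finished = False

for fruit in ["apple", "banana", "cherry"]:
    print(fruit)
else:
    # Runs after the last item, because the loop was never broken.
    finished = True
    print("That was all the fruit.")
```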
4.2.3 Enumerate
Sometimes when you are iterating through a for loop it is useful to have a counter. Python does not provide
one by default. So you might end up doing the following:
list_of_fruit = ["apple","banana","cherry","date"]
counter = 0
for fruit in list_of_fruit:
    print(str(counter),fruit,sep="\t")
    counter += 1
A more pythonic way of doing this would be to use a special function called enumerate(). This function
takes in your collection and then during the for loop returns both a counter and one element at a time from
your collection. See enumerate in action below:
list_of_fruit = ["apple","banana","cherry","date"]
The enumerate function has an additional parameter start. The default value is 0 which is why
above we printed 0. apple on the first line. If we say start=1 inside the enumerate (or just
enumerate(<collection>, 1)) then we will get a count starting from 1 instead.
list_of_fruit = ["apple","banana","cherry","date"]
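A sketch of both forms of enumerate described above, first with the default start of 0 and then with start=1:

```python
list_of_fruit = ["apple", "banana", "cherry", "date"]

# Default: the counter starts at 0.
for counter, fruit in enumerate(list_of_fruit):
    print(f"{counter}. {fruit}")

# With start=1 the counter begins at 1 instead.
for counter, fruit in enumerate(list_of_fruit, start=1):
    print(f"{counter}. {fruit}")
```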
One very common use of enumerate is to report something every nth iteration. The trick here is to use ‘the
remainder’ from integer division. Since we have a counter c, when we divide that counter by n using integer
division we will get a remainder. That remainder will be 0 every n times. See below where we will print
every fifth number:
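A sketch of this pattern, assuming we loop over twenty numbers and report every fifth iteration. The % operator gives the remainder; the names n and reported, and the choice of range(100, 120), are hypothetical.

```python
n = 5
reported = []

for c, number in enumerate(range(100, 120)):
    # c % n is the remainder after dividing c by n; it is 0 every n steps.
    if c % n == 0:
        print(c, number)
        reported.append(c)
```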
fruits = ["apple","banana","cherry","durian","eggplant"]
So notice that it skipped over both cherry and durian because they had an r in there? That’s how continue
works. On the other hand, if we wanted to stop when we come across a word with r in it, we would have
used break like so:
fruits = ["apple","banana","cherry","durian","eggplant"]
In this case, the program stopped once it got to cherry, since that triggered the break statement.
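The two behaviours can be compared side by side in a sketch like the one below. The lists kept and seen are hypothetical names used to record which fruits each loop reached.

```python
fruits = ["apple", "banana", "cherry", "durian", "eggplant"]

# continue: skip the rest of this iteration and move to the next item.
kept = []
for fruit in fruits:
    if "r" in fruit:
        continue
    kept.append(fruit)
print("With continue:", kept)

# break: leave the loop entirely at the first match.
seen = []
for fruit in fruits:
    if "r" in fruit:
        break
    seen.append(fruit)
print("With break:   ", seen)
```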
4.2.5 Double loops
You can loop while inside of a loop. This is useful for thinking about data in a table form. Imagine you
have a table of records such as person and their test results:
Now imagine you want to calculate an average. You could either calculate the average per-test, which would
move down the columns or you could calculate the average per-student, which would move across the rows.
Both of these can be done with a double for loop. But the way the loop works will depend on the data
structure. It would be different if we have a list of test results (e.g., test1 = [9,7,7,8,5]) and we knew
that Alan was always the first result versus if we have a list of students and knew that Alan was the first
student (e.g., Alan = [9,6,8,9]).
Below I will create a list-of-lists. In this example, the inner lists will be the scores per person. That means
we received a list of scores for each person one at a time. Then if we combine them we will have a table like
above.
results = [[9,6,8,9],
           [7,7,7,6],
           [7,6,4,9],
           [8,9,9,8],
           [5,7,8,7]]
We can print this table using a double loop. Let’s do this first before embarking on some more complicated
algorithms. First let’s just print it using a single loop to see what each object in results looks like:
So this printed each row as a list, complete with the [] on either side. But we want to print each element
in these lists.
I don’t like having that trailing comma in there, so I’m going to complicate this just a little bit. Watch
how the inner loop is for every element in the row except the last one with row[:-1]. Then I do something
different for the last one row[-1].
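A sketch of that idea follows. The inner loop prints every score except the last with a trailing comma, and the final score is printed on its own so the line ends cleanly. The name score is hypothetical.

```python
results = [[9, 6, 8, 9],
           [7, 7, 7, 6],
           [7, 6, 4, 9],
           [8, 9, 9, 8],
           [5, 7, 8, 7]]

for row in results:
    # Every element except the last is followed by a comma.
    for score in row[:-1]:
        print(score, end=", ")
    # The last element ends the line instead.
    print(row[-1])
```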
Just to reassure you, however, this is not where we stop with Python if we want to get an average. Later
we will find ways that are less cumbersome and look more like a one-stop <collection>.average(). But
it helps to understand how to build your own algorithms like this so that when we get to these other ones
you will know what to look for and how to circumvent it in case it does not do precisely what you expected.
Also, there will still be lots of times where you will have a collection, and another collection inside there.
student_averages = []
print(student_averages)
From here we can see that Alan’s score was 8.0. Carl had the lowest score with 6.5 and Dana the highest
with 8.5.
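The averages quoted above can be reproduced with a double loop along these lines. This is a sketch rather than the original listing, and the name total is an assumption.

```python
results = [[9, 6, 8, 9],
           [7, 7, 7, 6],
           [7, 6, 4, 9],
           [8, 9, 9, 8],
           [5, 7, 8, 7]]

student_averages = []
for student_scores in results:
    total = 0
    for score in student_scores:
        total += score
    # One average per student, in the same order as the rows.
    student_averages.append(total / len(student_scores))

print(student_averages)
```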
So that double loop was relatively straightforward. You had an object (the results list) and inside was
another collection (the test scores) and we did something for that. What if we wanted to calculate the
average per test with the results data? This might be a little more complicated. Later, we might find a more
amenable structure like a pandas table or numpy array. But that’s for another day. Today, let’s see how one
might go about such a challenge with the tools on hand.
num_students = len(results)
num_tests = len(results[0])
test_totals = [0]*num_tests
test_averages = []
print(test_averages)
That one is probably a bit tricky just starting out. But if you do not understand what I did in that algorithm,
try to break it down. One thing I like to do when learning what’s going on in a loop is to print some parts
of it and then insert a break statement so that something is only done once. For example if I forget what
exactly are student_scores I would insert a print(student_scores) inside the outer for loop and then
insert a break.
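For comparison, here is one way the per-test average might be computed with the same results data. It is a sketch under the assumption that every student has the same number of scores; the loop variable names are hypothetical.

```python
results = [[9, 6, 8, 9],
           [7, 7, 7, 6],
           [7, 6, 4, 9],
           [8, 9, 9, 8],
           [5, 7, 8, 7]]

num_students = len(results)
num_tests = len(results[0])
test_totals = [0] * num_tests

# Outer loop: one student at a time. Inner loop: one test at a time.
for student_scores in results:
    for test_number, score in enumerate(student_scores):
        test_totals[test_number] += score

test_averages = []
for total in test_totals:
    test_averages.append(total / num_students)

print(test_averages)
```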
Notice that it just printed the keys. This makes sense in a way for a dictionary; it’s why you can ask 'salt'
in ingredients and it will return True, but 'parmesean' in ingredients will return False even though
we can see above that it is a value. The dictionary is focusing on the keys.
print('salt' in ingredients)
print('parmesean' in ingredients)
To get the dictionary to print the values you would want to use ingredients.values() which will return
the values as a list (or specifically a dict_values object, which is close enough):
for i in ingredients.values():
    print(i)
We can ask for both the key and the value at the same time with <dict>.items(). So in a for loop we can
do the following:
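A sketch of that loop, using a small stand-in dictionary. The contents of ingredients here are invented for illustration, as is the pairs list.

```python
# Hypothetical contents; the original ingredients dictionary may differ.
ingredients = {"salt": "1 tsp", "flour": "2 cups", "butter": "100 g"}

pairs = []
for key, value in ingredients.items():
    # Each item is a (key, value) tuple, split into two variables here.
    print(key, value, sep="\t")
    pairs.append((key, value))
```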
my_list = ["allspice","basil","cumin"]
new_list = []
for i in my_list:
    i = i.upper()
    new_list.append(i)
print(new_list)
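The same result can be written as a list comprehension, sketched below with the same my_list.

```python
my_list = ["allspice", "basil", "cumin"]

# [<expression> for <element> in <collection>]
new_list = [i.upper() for i in my_list]
print(new_list)
```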
The list comprehension was able to reduce the for loop down considerably. List comps can be pretty complex
as well. I will leave out much of that complexity for now, but one thing worth introducing is how you can
include a conditional in the comprehension. Imagine you only want the new list to keep some of the items.
In such cases, just place a conditional after the for statement, like so:
new_list = [i.upper() for i in my_list if len(i) == 5]
print(new_list)
x = True
while x:
    random_number = random.randint(0,10)
    print(random_number)
If you leave out the x = False above, you will get an infinite loop since there is no stopping condition. You
can try this for fun but you will eventually run out of memory or your processor will overheat. Either one
is not good. You can stop an infinite loop by selecting Kernel -> Restart Kernel from the menu above.
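For reference, a sketch of a while loop with a stopping condition. It assumes the loop should end when the random number is 5; the stopping value in the original may differ.

```python
import random

x = True
while x:
    random_number = random.randint(0, 10)
    print(random_number)
    if random_number == 5:
        x = False   # the stopping condition: without this the loop never ends
```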
One common code pattern for a while loop is when asking for user input. Imagine you have a series of
choices and a text prompt. You can loop through this prompt until you get a valid choice. First let’s observe
a simple user input. Then let’s put that in a while loop.
input_continue = True
while input_continue:
    value = input("Type Y to leave the loop. Type something else to stay:")
    if value.lower() == 'y':
        input_continue = False
    else:
        print("I'm sorry, that was not a valid choice")
else:
    print("You have now exited the loop")
1/0
The error is a zero division error. As we know from maths, we cannot divide by zero for zero means nothing
(sparing us all the complex proofs of this, which are really interesting, but not for here). The important
part is that we can catch the error and move on if we anticipate it and think it’s not going to affect future
code. We do this using try and except statements. See the example below:
import numpy as np
for i in range(-2,2):
    try:
        print(f"1/{i} is {1/i}")
    except ZeroDivisionError:
        print(f"Caught an error! 1/{i} is {np.nan}")
If you wanted to create your own exception, we would say you ‘raise’ an error. So for example, if your code
should throw an error when you receive a variable of the wrong type, you can raise a TypeError. This is
much less common in data analysis than simply catching errors.
x = 8
# isinstance checks the type of x; int(x) would only convert the value
if isinstance(x, int):
    raise TypeError("The program did not expect an integer.")
Then by combining raising and catching errors, you can find ways to ensure that your code is more robust,
particularly when importing and processing raw data.
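As a sketch of combining the two, the loop below raises a TypeError for any value that is not an integer and then catches it so the remaining values are still processed. The data in raw_values and the name clean_values are hypothetical.

```python
raw_values = [4, "7", None, 10]   # hypothetical raw data
clean_values = []

for value in raw_values:
    try:
        if not isinstance(value, int):
            raise TypeError(f"Expected an integer, got {type(value).__name__}")
        clean_values.append(value * 2)
    except TypeError as err:
        # The error is caught here, so the loop carries on.
        print("Skipping value:", err)

print(clean_values)
```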
When we raise an error, that means we send an error object to the program. This is a special kind of object
that will usually end the program unless we catch it with except.
If in the 1/0 case above we had just said except and left out the ZeroDivisionError, it would catch all possible
errors. This is not necessarily what we want, since some of those errors might be legitimately concerning
while others are things we anticipate as a matter of course. But sometimes you might want to catch all
errors and then find out which one it was. Since the error itself is an object, it has properties we can work
with. See below:
try:
    1/0
except Exception as err:
    print(err)
    print(type(err).__name__)
Normally properties of an object do not start with __ or “the dunder” for double underscore. When a
variable or method starts with __ it normally means it is only meant to be used by the system and not in
code. But Python still exposes these methods in case you want to do any advanced tinkering, like printing
the name of an error.
4.5 Conclusion
There are many ways to direct the flow of a program. Here we first saw the use of Boolean operators to
evaluate something as True or False. We then saw how we could use this in if statements in order to do
something under some conditions. Then we looked at loops, first the for loop, followed by list comprehensions
and while loops. In each case, we were able to direct the flow of a program by doing something for each
element of a collection. Finally, we explored errors, which are ways in which we can halt a program entirely
(as well as how to catch these errors so our program does not halt if we can anticipate that error).
These provide a very sound basis for a program. Yet they are not the last topic on flow within a program.
In fact they are not even the last word on loops. But now it would be worth thinking about how we can
collect many of these operations in a single place if we want to use them together repeatedly. This single
place will be a function. Then with a function we can really start to create programs with a variety of
structures depending on our needs. You can see this in the next chapter.
Chapter 5
Functions allow you to group together related operations in such a way that you can abstract away details in
your program. Two main use cases of functions come to mind: 1. Avoiding repetition and the bugs that can
come from inconsistent code; 2. Grouping together operations used elsewhere (like in list comprehensions
and equality comparisons).
x = 5
if x %2 == 1: list_result.append(x * 2)
else: list_result.append(x)
x = 7
if x %2 == 1: list_result.append(x * 2)
else: list_result.append(x)
x = 12
if x %2 == 1: list_result.append(x * 2)
else: list_result.append(x)
print(list_result)
def doubleIfOdd(num):
    if num % 2 == 1:
        return num * 2
    else:
        return num
numbers = [1,4,6,7,9,14,17]
print(new_numbers)
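One way the step between numbers and new_numbers could be written is with a list comprehension that calls the function on each element. This is an assumption about how the two are connected, offered as a sketch.

```python
def doubleIfOdd(num):
    if num % 2 == 1:
        return num * 2
    else:
        return num

numbers = [1, 4, 6, 7, 9, 14, 17]

# Call the function once for every element of numbers.
new_numbers = [doubleIfOdd(num) for num in numbers]
print(new_numbers)
```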
def multiplyTheValue(input_number):
    x = input_number * 2
    print("Value of x inside the function",x)
    return x
x = 4
output_number = multiplyTheValue(x)
print("Result from the function:",output_number)
print("Value of x after the function:",x)
But x wasn’t the argument, input_number was. So what if we change input_number inside the function?
# Local / Global scope example 2: Argument sent to function doesn't escape the function.
def multiplyTheValue(input_number):
    print("Inside the function",input_number)
    return input_number
x = 4
output_number = multiplyTheValue(x)
print("After the function",input_number)
print("Value of X after the function:",x)
We sent x to the function, at which point it became the value for the input_number parameter. So we could
use input_number inside the function, but then when we try to call it outside the function it throws an
error. Making a variable available outside the function is not an advised code pattern, but it is possible by using
the global keyword.
def multiplyTheValue(input_number):
    global x
    x = input_number * 2
    print("Value of x inside the function",x,id(x))
    return x
x = 4
print("Value of x before the function",x,id(x))
output_number = multiplyTheValue(x)
print("Value of x after the function",x,id(x))
print("After the function",output_number)
In this third example, we can see that when we declare x as a global variable inside the function, that value
then becomes the value outside of the function. We double x inside the function and then later when we
print x it is no longer 4, it retains the value it had inside the function.
def tinyexample(word):
    print("Tiny examples!", word)
tinyexample("Big ideas!")
word is the parameter, "Big ideas!" is the argument. That said, most people use these terms interchangeably.
There are a number of different kinds of parameters. Some of these allow a function to take in a flexible
number of arguments, others define the type of argument that the parameter will permit. Parameters can
take default values. If the parameter has a default value, then one does not need to send an argument when
running the function.
Note that since a function can have a combination of different parameter types, the ones without defaults
come first. Let’s see how some different functions take multiple arguments below:
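To make these parameter types concrete, here is a sketch with two hypothetical functions (they are not the original example1 to example4). The first has a default value; the second accepts a flexible number of arguments.

```python
def with_default(var1, var2="default value"):
    # var2 can be left out when calling the function.
    return f"{var1} | {var2}"

def with_many(var1, *args, **kwargs):
    # *args collects extra positional arguments in a tuple,
    # **kwargs collects extra keyword arguments in a dict.
    return var1, args, kwargs

print(with_default("only one argument"))
print(with_default("two arguments", "second value"))
# Named arguments can be sent in any order.
print(with_default(var2="named second", var1="named first"))
print(with_many("first", 2, 3, colour="red"))
```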
example1("example 1 argument")
example4("example",
var1="some data from v1",
var3="Maybe it's v3?",
var2="v2's valuedata")
# Example 5. Showing the possibilities (and dangers) of fragile code and weakly cast variables.
def MakeDouble(value):
    try:
        output = value*2
    except TypeError:
        output = None
    return output
print( MakeDouble(2) )
print( MakeDouble("Double") )
print( MakeDouble(["2"]))
print( MakeDouble({1:4}))
def noReturn():
    pass
print(noReturn())
if noReturn():
    print("Did it work?")
else:
    print("Oh right, None evaluates to false.")
class Pizza:
    def __init__(self):
        self.toppings = []
        self.base = 'classic'
        self.sauce = 'tomato'
p = Pizza()
z = Pizza()
Now we can consider the pizza object as a combination of multiple other objects that all work together. A
shopping cart, for example, might be a class that includes a list of items, a discount code, and an identifier
for the customer that owns the shopping cart. Admittedly, for something like pizza or a shopping cart we
can also get away with just using a dictionary. That is, we could have simply written:
pizza = {"toppings":[], "base":"classic", "sauce":"tomato"}
pizza["toppings"].append("red peppers")
pizza["base"] = "thin and crispy"
So what is the advantage of using a class rather than this structure? It depends on the purpose. For simple
data transfer, actually it is nice to just keep it as dictionaries and lists. Later when we look at JSON
files from the web we will see how they are essentially just collections of lists and dictionaries. But when
programming, it is useful to be able to have a structure to the various objects that are related to each other.
This structure can give some sense to the objects as well as ensure that they all work in sync. For example,
what if we want to manage two pizza orders? Will we create another variable called pizza2?
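With a class, a second order is simply a second instance. The sketch below reuses the Pizza class and shows that changing one pizza leaves the other untouched. The toppings and base chosen are only examples.

```python
class Pizza:
    def __init__(self):
        self.toppings = []
        self.base = 'classic'
        self.sauce = 'tomato'

p = Pizza()
z = Pizza()

# Each instance has its own toppings list and its own base.
p.toppings.append("red peppers")
z.base = "thin and crispy"

print(p.toppings, p.base)
print(z.toppings, z.base)
```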
Below I will show two approaches to printing off a receipt. Compare how I would do it for a dictionary like
above, and then for a class:
cart = {}  # start from an empty dictionary
cart["items"] = ["Turntable","Microphone","Keyboard"]
cart["code"] = "HAPPY2020"
cart["customer"] = "Tom"
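For the dictionary approach, printing a receipt might look like the sketch below. The message format is borrowed from the receipt method of the Cart class so the two are easy to compare; it is an assumption, not the original listing.

```python
cart = {}
cart["items"] = ["Turntable", "Microphone", "Keyboard"]
cart["code"] = "HAPPY2020"
cart["customer"] = "Tom"

# Build the receipt by reaching into the dictionary directly.
message = f"Welcome {cart['customer']}\n\nYour items:\n"
message += "\n".join(cart["items"]) + "\n"
if cart["code"]:
    message += f"Discount code:{cart['code']} applied"

print(message)
```

Every part of the program that prints a receipt would need to repeat these lines, and each would have to agree on the exact key names.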
class Cart:
    def __init__(self):
        self.items = []
        self.code = None
        self.customer = None
    def receipt(self):
        message = f"Welcome {self.customer}\n\nYour items:\n"
        message += "\n".join(self.items) + "\n"
        if self.code:
            message += f"Discount code:{self.code} applied"
        return message
So above is just the class definition. From here we can see a couple of differences. The first is that the receipt
function is inside the Cart class. The second is that when we are referring to objects that belong to the Cart
class inside of the class definition we refer to them as self.<object>. So __init__ is never really called
directly, you never say x = Cart.__init__(), instead you initialise by saying x = Cart(), which then will
automatically run the __init__ method. In this case, it will create three internal variables, self.items,
self.code, and self.customer, and give them some values. Although this seems a little overkill compared
to the nested dictionary, it creates more of a structure to work with. Then we can create multiple cart
instances, as can be seen below.
x = Cart()
x.items = ["Turntable","Microphone","Mixer"]
x.code = "HAPPYSPINNING"
x.customer = "Chuck"
print(x.receipt())
Compare how the receipt was printed this time with the code above. We abstracted away the details of
printing to the receipt() method of the Cart class, which we defined elsewhere. We were still able to access
the objects in the Cart class, but instead of self.items, we first instantiated an object called x, and then
used x.items. Some classes can be fussy and expect you to use a dedicated method to get these objects,
like x.get_items(). Other times classes allow you to access the objects directly. It’s a bit of trial and error
as well as checking in on the docs for a particular package.
Below I will create a second object just to demonstrate how we can have separate Cart objects and use them
together in a print() statement.
y = Cart()
y.items = ["808 Drum Machine", "Keyboard", "Laptop"]
y.customer = "Caterina"
print(y.receipt(),x.receipt(),sep="\n###########\n")
class Trolley(Cart):
    def __init__(self):
        Cart.__init__(self) # observe what happens if you remove this!
        self.post_code = "OX1 3JS"

    def delivery(self):
        message = "Your basket currently includes:\n"
        message += "\n".join(self.items) + "\n"
        message += "It will be delivered to " + self.post_code
        return message
z = Trolley()
z.items = ["Cables","Cassette Player"]
z.placename = "OII"
print(z.receipt())
print(z.delivery())
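The comment in Trolley.__init__ invites an experiment. As a minimal sketch of what it is hinting at, the code below trims Cart down to a single attribute and adds a class called BrokenTrolley (a hypothetical name, introduced here only for illustration) that forgets to run the parent's __init__. The inherited receipt() method is still there, but the self.items it relies on was never created.

```python
class Cart:
    def __init__(self):
        self.items = []

    def receipt(self):
        return "Your items:\n" + "\n".join(self.items)


class Trolley(Cart):
    def __init__(self):
        Cart.__init__(self)  # runs the parent's set-up, so self.items exists
        self.post_code = "OX1 3JS"


class BrokenTrolley(Cart):  # hypothetical class, for illustration only
    def __init__(self):
        # The parent __init__ is NOT called, so self.items is never created.
        self.post_code = "OX1 3JS"


good = Trolley()
print(good.receipt())  # works: receipt() is inherited from Cart

bad = BrokenTrolley()
try:
    bad.receipt()
except AttributeError as err:
    error_message = str(err)
    print("Missing attribute:", error_message)
```

So a child class gets the parent's methods for free, but the parent's attributes only exist if the parent's __init__ actually runs.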
df = pd.DataFrame(columns=["name","age"])
You can already notice that pandas is a library. In this library, which we have imported under the name
pd for short, is a class called DataFrame. By calling df = pd.DataFrame() we are creating an instance
of the DataFrame class called df. By using columns=["name","age"] we are sending these two values to the
DataFrame.__init__ method. Thus, when it initialises the DataFrame object df, we will have a table with
two columns, name and age. See below (notice that it may run slowly the first time you import
pandas).
import pandas as pd
The DataFrame in this case is now an empty table. To create a table with data or to manipulate data is
outside the scope of this book. Rather it is where we start off in the book “From Social Science to Data
Science”.
5.3 Conclusion
Now we can see how programming can become pretty complicated, with objects referring to other objects
and other functions or methods all over the place. Often, when I’m trying something new with
programming, I have to check the documentation or print a lot to get a sense of what methods an object
has available or simply to determine what type of object was returned from some method or function. Being
able to understand how to query an object or manipulate it will be an important skill moving forward in
Python and in giving your scripts some structure. This structure is not merely for its own sake. It helps to
create code that is more reusable and robust. By structuring our code we are structuring our ideas about
data. That is good if we want to do something repeatedly or consistently across many cases.
The next chapter does not expand our basic programming knowledge much. Instead, it will
focus on how to get out of Jupyter by writing Python scripts as well as by learning the basics of how to read
and write files.
Chapter 6
Up until now we have only used Jupyter notebooks and stayed pretty closely in this environment. But we
will need to branch out eventually. This involves learning a number of features about the operating system
and how to interface with it. We will also learn how to create a file, write data to it and read the data from
it.
Windows, MacOS, Linux, Chrome or iOS, all computers have an operating system. The operating system
runs programs, manages memory, and allocates computer tasks so that multiple programs can run at the
same time. An operating system is what tends to differentiate modern computing from mechanical tasks.
The OS is meant to be a general purpose means for accomplishing tasks.
Virtually all operating systems operate on Von Neumann architecture. This means they will have an
input device (typically a keyboard but could be any sensor), a processor that does calculations, memory
(typically fast memory such as RAM and slow memory such as a Hard Drive for more long term storage)
and an output device (often, but not necessarily a screen). Other architectures based on state machines and
quantum computing exist but are often experimental and not quite relevant at the moment.
Most operating systems these days are either Windows-based or *nix-based (i.e., Unix or Linux). Macintosh
used to have its own operating system kernel, but switched to a Unix base roughly 20 years ago with Mac OS X.
There are many differences under the hood between Mac and Linux, but herein we will focus on the many
similarities related to file storage. These operating systems store files on a hard drive, where they can be
accessed using the file path.
6.1.1 Navigating the file path with Python and in the terminal
Files, or more properly speaking file indices, are stored in hierarchical structures. The topmost structure
would be considered the root directory. Under the root directory would be child directories. Even though
it can be characterised as a ‘tree’, we tend to use the terms children and parent. Perhaps think of it more
like a family tree, a very peculiar family tree.
Note: File systems are different on *Nix and Windows.
Windows was a small operating system back in the day. It wanted to preserve backwards compatibility with
earlier DOS systems and IBM systems, but the / was already used in those systems as a ‘switch’, meaning it
would not be easy (so it seemed) to use it to denote directory structures. Thus, \ was used instead since it
looks similar. It didn’t seem like a big deal at the time, but it has certainly been a nuisance for programmers
ever since. This, along with some other quirks such as drives having letter names, means that there are a
number of little differences between Windows and *Nix systems.
In the code herein, I want to have notes that are as generic as possible across systems as well as notes that
are ‘robust’. So I will be using some Python libraries to ensure the code stays consistent. In particular we
will be using the os module, as you will likely see a lot of code snippets that use this module. I will also be
using the pathlib library, which has been available since Python 3.4. I think it is more tidy and coherent
than the earlier os module, but it is less commonly used still.
In the os library is a variable called os.sep. If you print it on Windows it will print a \ and if you print it on
a *Nix system it will print /. You can use this to build a string representing a path that is operating system
independent. However, a more robust way is to use pathlib to create a Path object, for example:
from pathlib import Path
p = Path(".")
open(p / "file.txt")
Notice that if we initialise an object called p with a path of "." that means we are initialising it with a
location of the current working directory. Then we can write in Python strings and / for folder dividers, yet
the / here works regardless of Windows or Mac because it is not literally a "/" character. It is outside the
string and represents a folder divider more generally.
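To make the comparison concrete, here is a small sketch that builds the same path both ways. The folder and file names (data, raw, file.txt) are invented for illustration; nothing is read from or written to disk.

```python
import os
from pathlib import Path

# Hypothetical folder and file names, used only for illustration.
parts = ["data", "raw", "file.txt"]

# Approach 1: glue the parts together with the separator for this system.
string_path = os.sep.join(parts)

# Approach 2: let pathlib combine the parts with the / operator.
path_object = Path("data") / "raw" / "file.txt"

print(string_path)
print(path_object)

# Converting the Path to a string gives the same text as approach 1.
print(str(path_object) == string_path)
```

The two agree on any system, but the Path object also carries methods for navigating and querying the path, which a plain string does not.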
import os
print( os.getcwd() )
print("The separator on this computer is: ", os.sep)
This means that if we were to do something like create a file in Jupyter then the file will, by default, be
written to this specific directory. Watch that happen below:
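A minimal sketch of that step: the file name example_file.txt is the one mentioned in the text, while the sentence written into the file is just a placeholder I have made up.

```python
import os

# Write a small file using only its name (no folder), so it lands in the
# current working directory.
fileout = open("example_file.txt", "w")
fileout.write("This file was written from Python.\n")
fileout.close()

# Build the full address of the new file to confirm where it went.
full_path = os.path.join(os.getcwd(), "example_file.txt")
print(full_path)
print(os.path.exists(full_path))
```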
You should now see a new file named ‘example_file.txt’ in the same directory as these notes. To write it to
a different directory, you can specify the absolute path to that directory. So in my case, since I know that I
have an account called work, then I can create a file under that directory using the following:
That is a pretty bad form, for two reasons. 1. You probably do not have a /Users/work/ path on your
computer. So when you run the above you would get a FileNotFoundError. 2. Writing example files all
over your computer will create a mess.
To solve the second problem, I will first clean up my file with the help of os.remove, which will delete a file
given the path name, or throw an error if the file is not found.
try:
    os.remove("/Users/work/example_file2.txt")
except FileNotFoundError:
    pass
Solving the first problem is a little trickier, since I want a solution that will work for both my computer
and yours. The first thing we can consider is using relative paths instead of absolute paths. The simplest
relative path is ., which means “here”. So if you see ./file.txt, that refers to the file.txt directly underneath
this directory. If you see .. that means the parent directory. In my case, since this is in a folder called
Python, under 2021MT, if I wanted to create a new file in the parent directory, /Users/work/OneDrive -
Nexus365/Teaching/2021MT/, I could write:
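As a hedged illustration of relative paths, the sketch below only works out where . and .. point on whatever computer runs it. The file name example_file2.txt is used as a placeholder and nothing is written to disk.

```python
from pathlib import Path

here = Path(".")
parent = Path("..")

# resolve() turns a relative path into the absolute path it points to.
print(here.resolve())
print(parent.resolve())

# Placeholder file name; this only builds the path, it does not create a file.
target = parent / "example_file2.txt"
print(target.resolve())
```

Passing a path like target to open() with 'w' would then create the file in the parent directory, whatever that directory happens to be called on your machine.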
Where this comes in handy is in having a folder for data separate from your notes. For my data courses, I
recommend having a folder structure like the following:
<course_name>
|- notebooks
|- data
|- output
|- other
It’s always a challenging moment when I have to help a student who has their entire course (and most others)
running as notebooks from their downloads folder. It might seem like a good idea for the first file, but after a
dozen or so files things will get messy and lost. More importantly, with this structure, even if you are working
from multiple different files, you always know where to put your data and your output (like charts and tables).
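If you would rather have Python create that structure than click through a file browser, something like the following would do it. This is a sketch, not a prescribed tool: make_course_folders and my_course are names I have invented, and the demonstration runs in a throwaway temporary folder so it will not touch your real files.

```python
from pathlib import Path
import tempfile


def make_course_folders(course_dir):
    """Create the four recommended sub-folders inside course_dir."""
    course_dir = Path(course_dir)
    for name in ["notebooks", "data", "output", "other"]:
        # parents=True also creates course_dir itself if it is missing;
        # exist_ok=True means running this twice is harmless.
        (course_dir / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in course_dir.iterdir() if p.is_dir())


# Try it in a temporary folder rather than a real course folder.
with tempfile.TemporaryDirectory() as tmp:
    created = make_course_folders(Path(tmp) / "my_course")
    print(created)
```

From a notebook saved in the notebooks folder, the data folder is then always reachable with the relative path Path("..") / "data".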
This StackOverflow conversation discusses many of the tricky aspects of getting the path of a specific Python
file.
6.1.2 How to navigate the file system through a terminal. (*Nix edition)
This section will be done in a terminal window, so you’ll have to switch back and forth. If I write a command
it will be preceded by $ and if I write the expected output, it will be preceded by >. So we will want to open
a terminal and type ls. As in don’t type the $, just:
$ ls
That should list all the files in the current working directory for the shell. Notice that it is probably not the
same directory as the one you saw above. But let’s navigate to our current Python working directory. To
do this we use cd <desired directory>. In my case (since I have learned the location above) it is:
$ cd "/Users/work/OneDrive - Nexus365/Teaching/2021MT/Python"
Removing a file.
You can delete a file with rm command.
$ rm temp_file.txt
You can use * to pattern match in the shell. Thus, you can delete multiple files that match a pattern with
$ rm *.txt
Removing a directory
Directories have files in them. This means that they cannot be removed in the shell without
also removing the files.
$ rmdir <directory_name>
will only work on an empty directory. If it has files you will need to use rm with the -r (or -R, for recursive)
argument before the directory name.
$ rm -r <directory_name>
6.1.3 How to navigate the file system through PowerShell (Windows edition)
This is a rewritten section to reflect the fact that the Windows PowerShell should now include Python with
the Anaconda install and even launch Jupyter Lab. Where possible, I would strongly encourage you to use
the PowerShell over the Anaconda Prompt or the standard cmd command line. You can even run Jupyter
directly in the PowerShell by typing jupyter lab in the Anaconda PowerShell.
If I write a command it will be preceded by > and if I write the expected output, it will be preceded by |.
So we will want to open a console or ‘command line window’ and type dir. As in don’t type the >:
> dir
That should list all the files in the current working directory for the shell. Notice that it is probably not the
same directory as the one you saw above. But let’s navigate to our current Python working directory. To
do this we use cd <desired directory>.
In this case, $null is the empty character (meaning send ‘nothing’). Normally it would send it to standard
out (i.e. to the terminal screen). But by using > in the terminal we are saying send it to a file.
Removing a file.
You can delete a file with the Remove-Item or rm command.
> Remove-Item temp_file.txt
You can use * to pattern match in the shell. Thus, you can delete multiple files that match a pattern with
> Remove-Item *.txt
This Microsoft help page goes into greater detail on how to remove items under a variety of conditions.
rmdir <directory_name>
will only work directly on an empty directory. If it has files, you will need to add the /s argument in the
standard cmd command line; in PowerShell, the equivalent is Remove-Item <directory_name> -Recurse.
However on PowerShell if you type the first character it will then automatically encase it with quotes for
you.
Pay attention to the argument in the open() function. For writing it is 'w', for reading it is 'r' and for
appending it is 'a'. There are others as well, but we won’t be using them today. They are primarily for
bytestrings which is relevant if you are writing image data or other streaming data rather than characters.
You can review those in the docstring for the open() function.
# Here is the first line from the Tao Te Ching (trans. Stephen Mitchell)
# It reminds us that in life we can only give guidance but not specific instructions.
# Writing:
fileout = open("example_tao.txt",'w')
fileout.write(str_to_be_written)
fileout.close()
str_to_be_appended = '''
The unnamable is the eternally real.
Naming is the origin
of all particular things.'''
# Appending:
fileout = open("example_tao.txt",'a')
fileout.write(str_to_be_appended)
fileout.flush()
fileout.close()
filein = open("example_tao.txt",'r')
for i in filein:
    print(i)
That seemed to print every line and then a space, unlike what happened above. Why is that? It’s because it
prints the entire line including the new line character at the end. Remember from day 1 that we can remove
characters from a string using [:-1]. We can use this to remove the last character. However, sometimes
that doesn’t work as intended (if there’s a \r\n for example, which is often the case with Excel documents).
Luckily, there’s a string method called strip for removing whitespace characters from the ends of a string.
As with most methods (outside of those pesky lists), it returns the cleaned string rather than altering the
variable in place.
To remove whitespace from both sides:
newvar = strvar.strip()
To remove it only from the left:
newvar = strvar.lstrip()
But what we really want is to remove the new lines on the right:
newvar = strvar.rstrip()
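To see why rstrip() is the safer choice than [:-1], here is a short sketch using an invented line that ends in \r\n. I print with repr() so that the otherwise invisible characters show up.

```python
# An invented line with a Windows-style ending (\r\n), for illustration.
line = "first line\r\n"

sliced = line[:-1]        # removes only the final character, the \n
stripped = line.rstrip()  # removes all trailing whitespace

print(repr(sliced))    # 'first line\r'
print(repr(stripped))  # 'first line'
print(repr(line))      # the original string is unchanged
```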
filein = open("example_tao.txt",'r')
for i in filein:
    print(i.rstrip())
print(filein.closed)
Now while this works, it is not necessarily the most robust or Pythonic way to open a file. For example,
we have created a file opener, but we haven’t closed it when we finished. Generally, you’ll want to close the
file when you’re done with it using <filein>.close(). However, if you are doing something where you are
reading the file line by line, you can condense this by using a with statement, such as the following. The
with statement will automatically close the file when you exit that block of code.
print(filein.closed)
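A minimal sketch of the with statement described above. So that it runs anywhere, it first writes its own small file into a temporary folder; the file name example_with.txt and its two lines of text are placeholders of mine.

```python
import os
import tempfile

# Placeholder file in a temporary folder so the sketch runs anywhere.
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "example_with.txt")

with open(file_path, "w") as fileout:
    fileout.write("line one\nline two\n")

lines = []
with open(file_path, "r") as filein:
    for i in filein:
        lines.append(i.rstrip())
    print(filein.closed)  # still inside the block, so the file is open

print(filein.closed)  # the block has ended, so the file was closed for us
print(lines)
```

Notice there is no call to close() anywhere. Leaving the indented block is what closes the file, even if an error happens part way through.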
In [1]:
Fun fact: ipython is the ancestor of JupyterLab.
import os
fileout = open("test.py",'w')
fileout.write(filetext)
fileout.close()
print("The file can be found by navigating with the following command:\n\ncd \"{}\" ".format(os.getcwd()))
fileout = open("example_argv.py",'w')
fileout.write(filetext)
fileout.close()
print("The file can be found by navigating with the following command:\n\ncd \"{}\"".format(os.getcwd()))
Now go to the console and navigate to that folder. Then you can run the command below (omitting the
‘>’). It will work if there’s an example_argv.py file in that folder.
> python example_argv.py arg1 arg34 YetAnotherArgument "is this also an argument?"
You can see that if you encase words in quotes they are counted as one argument. Then the program will be
able to access these as a list of arguments in the sys.argv object. The first one (sys.argv[0]) will be the
file name. The next few will be the argument strings written in the command.
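To illustrate how a script might walk through sys.argv, here is a sketch. It is not the contents of example_argv.py; describe_arguments is a helper name I have made up, and the hand-written list simply mimics what sys.argv would hold for the command above.

```python
import sys


def describe_arguments(argv):
    """Return one line of text per entry in an argv-style list."""
    report = []
    for position, argument in enumerate(argv):
        label = "Script name" if position == 0 else f"Argument {position}"
        report.append(f"{label}: {argument}")
    return report


# When this is run as a script, sys.argv holds the real arguments.
print("\n".join(describe_arguments(sys.argv)))

# A hand-written list mimicking the command shown above.
example = ["example_argv.py", "arg1", "arg34", "YetAnotherArgument",
           "is this also an argument?"]
example_report = describe_arguments(example)
print("\n".join(example_report))
```

Every entry arrives as a string, so a number typed on the command line still needs int() or float() before you can do arithmetic with it.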
print(Path('/Applications').exists())
print(Path('dfasdfjlsfgg/fsadgag').exists())
# This may or may not work on windows. If it does not, please continue.
import os
print(os.popen('pwd').read())
Notice that we did not use os.system to run the command in the terminal. If you run a terminal command
with os.system it will run the command but it will only return an exit status (0 for success, a nonzero value
otherwise). Since we wanted the result from the terminal we can use os.popen, which opens a pipe from the
terminal to here, so the result of the terminal gets piped in to Python. Then we read that result. It’s a bit
overcomplicated, so it makes sense that Python has found simpler ways of doing it.
The first is to use the os module directly. Notice that it is now cwd (for current working directory) and
not pwd (for print working directory). Because this is an old, old command dating back to Python 1.0, it
doesn’t use underscores like more recent commands.
print(os.getcwd())
This works well, but there is one problem with this approach. What gets returned is not a path, per se, as
in a thing that you can navigate, but a string that represents the address to that path. Just watch:
result = os.getcwd()
print(type(result))
print(result.split(os.sep))
We used os.sep since that will be the correct separator, whether it is Windows or Mac, but it split the
string.
Using pathlib
You see you can transform this path just like a string. What might help us is to have a path as an object
where we do things like navigate the folder structure. Then you can ask that object for the stem (meaning
the file name without its suffix) or add to it using the directory separator /. You can check what is
the directory ‘above’ this one or navigate to a directory below. Notice that this directory separator works
the same on Windows and Linux. Let’s see that below:
from pathlib import Path
print(Path.cwd())
Path.cwd()[:5]
curdir = Path.cwd()
print([mthd for mthd in dir(curdir) if not mthd.startswith("_")])
if curdir.is_dir():
    for c,pp in enumerate(curdir.iterdir()):
        if c > 5: break
        print(pp.name)
If you want to check for a specific kind of file, you can use the glob command, which refers to global.
There is also the full glob module, but nowadays I actually recommend switching to <pathobject>.glob()
instead. So I strongly suspect that unless you changed your working directory, this notebook is in there and
will end in .ipynb. Do you have other notebooks in the same folder? Let’s inspect with a wildcard search:
for i in curdir.glob("Ch.0*.ipynb"):
    print(i.name)
Parts of a path
What do you call the specific parts of a path? You can query for the parent, stem, suffix, and the name
(unshown, but it’s stem + suffix).
for i in curdir.glob("Ch.0*.ipynb"):
    print(i.stem, i.suffix, sep=" >>> ")
else:
    print(f"These are in:{i.parent}")
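If you have no matching notebooks in your folder, the loop above prints nothing useful, so here is a self-contained sketch of the same four parts. The path is invented, and I use PurePosixPath (which only manipulates the text of a path) so that it behaves identically on every system.

```python
from pathlib import PurePosixPath

# Invented path; PurePosixPath never touches the disk, so the file
# does not need to exist.
p = PurePosixPath("/Users/work/data/results.csv")

print(p.name)    # results.csv
print(p.stem)    # results
print(p.suffix)  # .csv
print(p.parent)  # /Users/work/data
print(p.stem + p.suffix == p.name)  # True
```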
The parent then is both the directory above and a way to get the part of the path address other than the
file name. Thus, you can navigate to the folder above this one and discover what files are in there, too.
for i in curdir.parent.iterdir():
    print(i.name)
else:
    print(f"These are in: {i.parent}")
for i in curdir.glob("**/*"):
    if i.is_dir():
        print(i)
    else:
        print("FILE_REDACTED", i.suffix,sep=None)
The way that operation worked was to use ‘recursion’. This means that it does the same operation
within itself. So for each path object, if it was a file it would just return it, but if it was a directory it would
then start again, list a file if it was a file, but if it was a directory it would start again, and so forth. Once it
runs out of directories to list, it finishes.
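The idea can be sketched by hand. This is not how glob is implemented internally; it is my own small illustration of a function that calls itself, with list_directories and the folder names a, b and c all invented for the demonstration.

```python
from pathlib import Path
import tempfile


def list_directories(folder):
    """Return every directory below folder, however deeply nested."""
    found = []
    for item in sorted(folder.iterdir()):
        if item.is_dir():
            found.append(item)
            # The function calls itself on the sub-directory: recursion.
            found.extend(list_directories(item))
    return found


# Build a small throwaway folder tree to try it on (invented names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a" / "b").mkdir(parents=True)
    (root / "c").mkdir()
    (root / "a" / "notes.txt").write_text("hello")
    names = [d.relative_to(root).as_posix() for d in list_directories(root)]
    print(names)
```

The function stops on its own because a directory with no sub-directories never triggers another call.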
(curdir / "temp").exists()
if (curdir / "temp").exists():
    (curdir / "temp").rmdir()
    print("I removed a directory called temp")
else:
    (curdir / "temp").mkdir()
    print("I made a directory called temp")
6.5 Conclusion
So you might be a little terrified at just how much you can get away with in Python. You can read and
write files all over your computer. Your operating system will sandbox some of this but you really are out
in the wild here with the ability to read and alter files on your computer. Hopefully, this skill will help you
think of small Python projects you might do on your own.
This is pretty much it for the start of the journey. With these skills you can set out to learn in a variety
of directions. The next chapter signposts some places to go. As usual, a reminder that the exercises are in the
appendices. There are no specific exercises for this chapter, but this chapter will really benefit you for the
longer exercises which do ask you to think about writing files as well as running scripts.
Chapter 7
Where to next?
into different categories. And this is a very partial list.
7.1.1 Websites
Some websites are especially thorough when teaching Python, with pages dedicated to single topics. If you
find that the coverage of a topic here (which is often just a paragraph and a single example) is not enough,
it’s worth having a look at the following sites:
• w3schools (https://www.w3schools.com): They have very extensive examples for many specific
Python concepts, but many other web technologies as well. The examples are very sparse though.
With Python you are often getting a clear view on a single concept rather than a sense of how this
concept fits into a larger whole.
• realpython (https://realpython.com/): This one is similar to w3schools. It locks away some advanced
content for paid subscribers, which is a shame. This tends to be the case for a lot of sites.
• Towards Data Science (https://towardsdatascience.com/): This is presently a blog on Medium, so
it suffers from the same paid content issues. But I think that the blog posts on this site tend to be the
right size for me. Digestible and often well-contextualised. It’s not painfully slow and repetitive, but
is still slow enough to be clear (at least for topics that are just at the cusp of where my own learning
is).
• Data Carpentry (https://datacarpentry.org/): This non-profit community-driven site has some excellent
tutorials for all skill levels in Python. I think it’s a great resource both for its content and for
how it’s produced. That said, it is not as thorough as w3schools or realpython. It’s more practice
based and for specific concepts. Even so, it is a great place to learn about some things just
beyond this book.
• Jake VanderPlas’ Python Repositories. Jake was early out of the gate with a book that doubles
as a Jupyter notebook. He’s the author of A Whirlwind Tour of Python, which is very close in spirit
and style to this book, and the Python Data Science Handbook, both from O’Reilly. I really want to
cheer on these two books especially. Similarly written in Jupyter Notebooks and available on GitHub,
they are a great complement, though perhaps with a little less social science flair.
• Twitter: Twitter can often surface some real gems that might be hidden otherwise. Follow those
practicing the sort of skills you want to learn. If there’s an author of a package out there you find
useful, check out their feed. Academics are particularly keen to share code and notebooks on Twitter.
Beyond this, tons of social media sites will have resources, news, and discussion about programming.
Short Questions
Below are some short exercises to check your knowledge of the topics introduced in each of the chapters.
Each section corresponds to one of the prior chapters in the book. After these are two more appendices.
Appendix 2 is just this appendix but with some example code for answers for the questions below. Finally,
in Appendix 3 is a series of longer creative exercises that you might want to attempt with skills from this
book. They are marked by which skills you would reasonably need to try your hand at the exercise.
In many cases I have provided some starter code and you should finish that code. You’ll see where you
should finish with an ... or a similar sort of marker.
Debug one
Debug two
Debug three
greeting = ''
name = ''
origin = ''
destination = ''
st2 = ...
# Third using f insertions, as in print(f"{var}{var}")
st3 = ...
a a a a a
 c c c c
r r r r r
 o o o o
b b b b b
 a a a a
t t t t t
 s s s s
Using only a variable word = "acrobats", string insertions, spaces, and lists, try to print a reproduction of
the poem.
In this version, the answer below should be done without for loops or if statements, which means it will
likely have some repetition. Your goal is to minimise that repetition even if you can’t eliminate it.
word = "acrobats"
print((word[0] + " ")*5)
...
...;
muppet_text = '''
name gender species first_appearance
Fozzie Male Bear 1976
Kermit Male Frog 1955
Piggy Female Pig 1974
Gonzo Male Unknown 1970
Rowlf Male Dog 1962
Beaker Male Muppet 1977
Janice Female Muppet 1975
Hilda Female Muppet 1976
'''
muppet_list[0] = ...
muppet_list[1] = ...
print(muppet_list)
# Answer
muppet_header = ...
muppet_data = '...'
Transform each line into a dictionary entry so that the whole dictionary will look something like this:
muppet_dict = {"Fozzie":["Male","Bear",1976],
"Kermit": ...,
...}'''
To create this dictionary you might need to repeat lines of code while only changing the indices.
This will be the last repetitive code example to complete. I will give fewer instructions here. If you know
loops, you can try them here, but I am assuming you have not skipped to chapter 3. In case you have, know
that I provide two answers to this in the next appendix. One with and one without loops.
# Answer
m_dict = {}
m_dict[muppet_data[0][0]] = ...
m_dict[muppet_data[1][0]] = ...
...
print(m_dict)
print(f"It is {len(m_dict.keys())==8} that the muppet_dict has 8 keys.")
Then take the data from user_input and print a profile of the muppet in the following form:
Character:
    Fozzie
Profile:
    Gender: Male
    Species: Bear
    First Appearance: 1976
Consider printing a list of all muppets (i.e. all keys from the dictionary) before asking for user input so the
user can get the correct spelling. You might find other ways to make this robust, especially after reading
the later chapters.
user_input = input("Which muppet do you want to profile:")
print(...)
ex_list = []
for i in range(1,10):
    ex_list.append(i)
# List comprehension
lc_ex_list = ...
every_second_list = []
for i in range(1,10):
    if i%2 == 0:
        every_second_list.append(i)
# List comprehension
lc_every_second_list = ...
powers_of_two_list = []
for i in range(10):
    powers_of_two_list.append(i**2)
# List comprehension
lc_powers_of_two_list = ...
# Dictionary comprehension
dc_new_dict = ...
word = "acrobats"
...
muppet_text = '''
name gender species first_appearance
Fozzie Male Bear 1976
Kermit Male Frog 1955
Piggy Female Pig 1974
Gonzo Male Unknown 1970
Rowlf Male Dog 1962
Beaker Male Muppet 1977
Janice Female Muppet 1975
Hilda Female Muppet 1976'''
while ...:
    user_input = input("Which muppet do you want to profile:(x to quit)")
    ...
    break
def burger_order():
    ...
    return
# Testing code. Check the output of this code with the strings provided.
default_burger = burger_order()
print(default_burger)
# output should be:
'''
***Burger Order***
Bun: white
Patty: beef
'''
Bun: white
Patty: beef
Extras:
- cheese
'''
print(super_burger)
# output should be:
'''
***Burger Order***
Bun: white
Patty: chicken
Extras:
- lettuce
- tomato
'''
health_burger = burger_order("gluten-free","veggie",["lettuce","tomato","pickle"])
print(health_burger)
# output should be:
'''
***Burger Order***
Bun: gluten-free
Patty: veggie
extras:
- lettuce
- tomato
- pickle
''';
Some extensions to this include:
• Give each order a number. Try to remember the previous order number.
• What about using wildcard kwargs arguments in order to allow for any topping?
• What about set burger types? How might these be best expressed?
Appendix B
These are in a different sheet so you can avoid them until you need them.
Debug one
Debug two
Debug three
B.3.2 Making a greeting
With this exercise, you should learn about string insertions. We will do them three ways:
1. Using a + to concatenate the strings
2. Using "<str>{<VAR>}".format()
3. Using f"{<var>}"
All three different approaches should print:
<greeting>! My name is <name> and I'm from <origin>. Someday I hope to get to
<destination>, got any suggestions?
Remember you can check on https://pyformat.info/
# Answer
st1 = greeting + "! My name is "+ name + " and I'm from " + origin + ". Someday I hope to get to " + destination + ", got any suggestions?"
st2 = "{}! My name is {} and I'm from {}. Someday I hope to get to {}, got any suggestions?".format(greeting, name, origin, destination)
st3 = f"{greeting}! My name is {name} and I'm from {origin}. Someday I hope to get to {destination}, got any suggestions?"
a a a a a
 c c c c
r r r r r
 o o o o
b b b b b
 a a a a
t t t t t
 s s s s
word = "acrobats"
print((word[0] + " ")*5)
print(" " + (word[1] + " ")*4)
print((word[2] + " ")*5)
print(" " + (word[3] + " ")*4)
print((word[4] + " ")*5)
print(" " + (word[5] + " ")*4)
print((word[6] + " ")*5)
print(" " + (word[7] + " ")*4)
muppet_list[0] = muppet_list[0].split("\t")
muppet_list[1] = muppet_list[1].split("\t")
muppet_list[2] = muppet_list[2].split("\t")
muppet_list[3] = muppet_list[3].split("\t")
muppet_list[4] = muppet_list[4].split("\t")
muppet_list[5] = muppet_list[5].split("\t")
muppet_list[6] = muppet_list[6].split("\t")
muppet_list[7] = muppet_list[7].split("\t")
muppet_list[8] = muppet_list[8].split("\t")
print(muppet_list)
# Answer
muppet_header = muppet_list[0]
muppet_data = muppet_list[1:]
# Answer
m_dict = {}
m_dict[muppet_data[0][0]] = muppet_data[0][1:]
m_dict[muppet_data[1][0]] = muppet_data[1][1:]
m_dict[muppet_data[2][0]] = muppet_data[2][1:]
m_dict[muppet_data[3][0]] = muppet_data[3][1:]
m_dict[muppet_data[4][0]] = muppet_data[4][1:]
m_dict[muppet_data[5][0]] = muppet_data[5][1:]
m_dict[muppet_data[6][0]] = muppet_data[6][1:]
m_dict[muppet_data[7][0]] = muppet_data[7][1:]
# Example answer
gen = m_dict[user_input][0]
sp = m_dict[user_input][1]
fa = m_dict[user_input][2]
print(f"Character:\n\t{user_input}\nProfile:\n\tGender: {gen}\n\tSpecies: {sp}\n\tFirst Appearance: {fa}")
ex_list = []
for i in range(1,10):
    ex_list.append(i)
# List comprehension
lc_ex_list = [i for i in range(1,10)]
every_second_list = []
for i in range(1,10):
    if i%2 == 0:
        every_second_list.append(i)
# List comprehension
lc_every_second_list = [i for i in range(1,10) if i%2 == 0]
powers_of_two_list = []
for i in range(10):
    powers_of_two_list.append(i**2)
# List comprehension
lc_powers_of_two_list = [i**2 for i in range(10)]
# Dictionary comprehension
In addition to these, just a reminder that all of the exercises in the previous section (for the chapter on
collections) that have repetitive code can benefit from loops. Have a look at the answers for these and see
if you can refactor them to use loops. The answers with loops are provided below.
Concrete poetry with a for loop
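One possible version with a for loop is sketched below. It relies on the pattern in the repetitive answer: letters at even positions are printed five times with no indent, and letters at odd positions are printed four times after a single leading space.

```python
word = "acrobats"

lines = []
for position, letter in enumerate(word):
    if position % 2 == 0:
        # Even positions: five copies, no indent.
        lines.append((letter + " ") * 5)
    else:
        # Odd positions: one leading space, then four copies.
        lines.append(" " + (letter + " ") * 4)

print("\n".join(lines))
```

Because the loop reads the letters from word, changing the variable to any other word reshapes the poem without touching the rest of the code.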
muppet_text = '''
name gender species first_appearance
Fozzie Male Bear 1976
Kermit Male Frog 1955
Piggy Female Pig 1974
Gonzo Male Unknown 1970
Rowlf Male Dog 1962
Beaker Male Muppet 1977
Janice Female Muppet 1975
Hilda Female Muppet 1976'''
m_dict = {}
header_row = True
m_dict
m_dict = {i.split("\t")[0]:i.split("\t")[1:]
for i in muppet_text.strip().split("\n")[1:]}
print(m_dict)
Making the profile display more robust
Try then to do the profiling code with a while statement for user input. Here you can now use elif statements
to do something different in each case, such as check for valid input and keep going until the user types quit
or x, etc.
while True:
    user_input = input("Which muppet do you want to profile:(x to quit)")
    if user_input.lower() == "l":
        print("\n".join(m_dict.keys()))
    elif user_input.lower() == "x":
        break
    elif user_input in m_dict.keys():
        gen = m_dict[user_input][0]
        sp = m_dict[user_input][1]
        fa = m_dict[user_input][2]
        print(f"Character:\n\t{user_input}\nProfile:\n\tGender: {gen}\n\tSpecies: {sp}\n\tFirst Appearance: {fa}")
    else:
        print("That was not a valid name. Type L to see names of muppets.")
return receipt
# Testing code. Check the output of this code with the strings provided.
default_burger = burger_order()
print(default_burger)
# output should be:
'''
***Burger Order***
Bun: white
Patty: beef
'''
Bun: white
Patty: beef
Extras:
- cheese
'''
print(super_burger)
# output should be:
'''
***Burger Order***
Bun: white
Patty: chicken
Extras:
- lettuce
- tomato
'''
health_burger = burger_order("gluten-free","veggie",["lettuce","tomato","pickle"])
print(health_burger)
# output should be:
'''
***Burger Order***
Bun: gluten-free
Patty: veggie
Extras:
- lettuce
- tomato
- pickle
''';
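If you want something to test against, here is one possible sketch of a burger_order function that matches the expected strings in these tests. The parameter names (bun, patty, extras) and the exact spacing of the receipt are assumptions inferred from the test calls and outputs, so treat it as one way to do it rather than the answer.

```python
def burger_order(bun="white", patty="beef", extras=None):
    """Return a receipt string for a single burger order.

    The defaults are inferred from the default_burger test; extras is
    None by default to avoid sharing one mutable list between calls.
    """
    receipt = "***Burger Order***\n"
    receipt += f"Bun: {bun}\n"
    receipt += f"Patty: {patty}\n"
    if extras:
        # Only print the Extras section when there is at least one extra
        receipt += "Extras:\n"
        for extra in extras:
            receipt += f"- {extra}\n"
    return receipt


print(burger_order())
print(burger_order("gluten-free", "veggie", ["lettuce", "tomato", "pickle"]))
```

Using None as the default for extras, rather than an empty list, is a common way to sidestep the mutable default argument gotcha.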
Appendix C
These are some of the longer creative questions that I have used in classes in the past. These normally involve
combining multiple ideas from the book in creative ways as well as perhaps drawing upon some knowledge
beyond the book. I don’t have specific answers to these questions available; they are for you to explore on
your own.
First get the keys from the dictionary, shuffle the keys, and then iterate through those keys.
On each iteration, assign a value to the dictionary by asking for user input.
The user input should use the key in the prompt, like
"Please suggest a noun for the story".
# Example answer
import random
libs = list(answer_dict.keys())
random.shuffle(libs)
for i in libs:
    answer_dict[i] = input(f"Please suggest a {i[:-1]} for the story:")
ad = answer_dict
story = f'{ad["proper noun1"]} was so surprised when they saw the {ad["noun1"]} {ad["verb continuous1"]} in {ad["place1"]}, they immediately packed their bags full
print(story)
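To try the shuffle-and-ask steps without typing anything, here is a small self-contained sketch. The short answer_dict, the canned replies and the ask parameter are all hypothetical and only there for illustration; passing a stand-in function instead of input lets the code run unattended.

```python
import random


def fill_answers(answer_dict, ask=input):
    """Ask for each key in a random order and store the replies in place."""
    keys = list(answer_dict.keys())
    random.shuffle(keys)
    for key in keys:
        # key[:-1] drops the trailing digit, so "noun1" is prompted as "noun"
        answer_dict[key] = ask(f"Please suggest a {key[:-1]} for the story:")
    return answer_dict


# Hypothetical short template with three blanks
answer_dict = {"proper noun1": "", "noun1": "", "place1": ""}

# Canned replies stand in for a user typing at the keyboard
canned = {"proper noun": "Kermit", "noun": "banjo", "place": "the swamp"}


def fake_ask(prompt):
    # Pull the word type back out of the prompt text
    kind = prompt.split(" a ", 1)[1].split(" for ", 1)[0]
    return canned[kind]


ad = fill_answers(answer_dict, ask=fake_ask)
story = f'{ad["proper noun1"]} took the {ad["noun1"]} to {ad["place1"]}.'
print(story)
```

Calling fill_answers(answer_dict) with no second argument would fall back to input and prompt a real user.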
Below I have written two different ways to create a word waterfall algorithmically.
Read them both, edit and play with them as you like.
Then below write the following:
1. A pseudocode explanation of algorithm 1.
2. A pseudocode explanation of algorithm 2.
3. An evaluation of algorithms 1 and 2: What is common about them and what is different?
import random
WORD = "waterfall"
word = list(WORD)
wordmap = list(range(len(word)))
random.shuffle(wordmap)
for i in wordmap:
    print("".join(word))
    word[i] = " "
waterfall
aterfall
aterfal
a erfal
a er al
a er l
er l
er
r
word = list(WORD)
print()
waterfall
waterfal
wat rfal
wat rf l
wat rf
at rf
t rf
t r
r
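As one further variation to compare against, here is a sketch that picks a random remaining position on every pass instead of shuffling all the positions up front. This is my own illustrative variant and is not necessarily how either algorithm above works; the function name word_waterfall and the seed parameter are assumptions added so that a run can be repeated.

```python
import random


def word_waterfall(text, seed=None):
    """Return the lines of a waterfall, blanking one more random letter per line."""
    rng = random.Random(seed)  # a seeded generator makes a run repeatable
    letters = list(text)
    remaining = list(range(len(letters)))  # positions not yet blanked
    lines = []
    while remaining:
        lines.append("".join(letters))
        pos = rng.choice(remaining)  # pick one position that still has a letter
        remaining.remove(pos)
        letters[pos] = " "
    return lines


for line in word_waterfall("waterfall", seed=1):
    print(line)
```

Returning the lines as a list, rather than printing inside the loop, makes it easier to check the result or reuse it elsewhere.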
3. Have the user select an order for delivery from the items on the menu.
• There should be at least five mains and three sides. The mains should be made of multiple ingredients.
• When the program starts, randomly select one ingredient to be ‘sold out due to those panic buyers’
(or a similarly plausible reason for an ingredient to be sold out). If the user selects an item with that
ingredient, mention that it is sold out and offer an alternative.
4. When finished, print the order, an order number, the total price, and the delivery time.
5. Ask them to confirm by giving their mobile phone number.
Note:
• Consider what would be the maximum stock available. We will test by making a ridiculously large
order.
• Items contain multiple (often overlapping) ingredients. So if cheese is sold out, then cheeseburgers and
lasagna would not be available, but meatballs would still be available.
• Ensure that you can recommend at least one meal to the customer.
• You can make a larger menu but remember to test it for legibility and usability.
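To make the point about overlapping ingredients concrete, here is a minimal sketch that maps each item to a set of ingredients. The menu contents and the function names available_items and pick_sold_out are hypothetical; your own program will likely need prices, sides and stock levels as well.

```python
import random

# Hypothetical menu: each item maps to the set of ingredients it needs
MENU = {
    "cheeseburger": {"bun", "beef", "cheese"},
    "lasagna": {"pasta", "beef", "cheese", "tomato"},
    "meatballs": {"beef", "tomato"},
    "veggie wrap": {"wrap", "lettuce", "tomato"},
}


def available_items(menu, sold_out):
    """Return the items that do not use the sold-out ingredient."""
    return [item for item, ingredients in menu.items()
            if sold_out not in ingredients]


def pick_sold_out(menu, rng=random):
    """Pick a random ingredient that still leaves at least one item on offer."""
    all_ingredients = sorted(set().union(*menu.values()))
    candidates = [ing for ing in all_ingredients if available_items(menu, ing)]
    return rng.choice(candidates)


print(available_items(MENU, "cheese"))
print(pick_sold_out(MENU))
```

With cheese sold out, the cheeseburger and lasagna drop off the list while the meatballs stay, which is the behaviour described in the note.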
Challenge: Store the order in a file before the program exits and give the user an ‘order number’. Then if
they run the program again and present the correct order number (and the correct telephone number) it will
print a duplicate receipt.
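One way to approach the challenge is to keep the orders in a JSON file keyed by order number. The sketch below is only a starting point under that assumption; the function names save_order and load_order and the layout of the stored data are hypothetical.

```python
import json
from pathlib import Path


def save_order(path, order_number, phone, items):
    """Add one order to a JSON file, creating the file if needed."""
    path = Path(path)
    orders = json.loads(path.read_text()) if path.exists() else {}
    # JSON object keys are always strings, so store the number as text
    orders[str(order_number)] = {"phone": phone, "items": items}
    path.write_text(json.dumps(orders, indent=2))


def load_order(path, order_number, phone):
    """Return the stored items, or None if the number or phone does not match."""
    path = Path(path)
    if not path.exists():
        return None
    order = json.loads(path.read_text()).get(str(order_number))
    if order is None or order["phone"] != phone:
        return None
    return order["items"]
```

For example, after save_order("orders.json", 17, "07000000000", ["meatballs"]), a later run could call load_order("orders.json", 17, "07000000000") and print a duplicate receipt from the returned items.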
Some things you’ll want to consider about your data structure:
• How are you storing the order? As a list? As an object of the Order class?
• What will you do with bad input and how will you make the text input as easy as possible?
• How will you manage the inventory so that you will know which goods include which ingredients?
• What if you had to expand the list of menu items? How hard would that be with your program?
• Does the program end gracefully?
• Will your printing of output be attractive and easy to read?