(Treading On Python 1) Matt Harrison-Treading On Python Volume 1 - Foundations of Python. 1 (2011)
Treading on Python: Volume 1
Foundations
Matt Harrison
hairysun.com
COPYRIGHT © 2013
Contents
1 Why Python?
3 The Interpreter
3.1 Interactive interpreter
3.2 A REPL example
4 Running Programs
4.1 Unixy embellishments
6 Variables
6.1 Mutation and state
6.2 Python variables are like tags
6.3 Cattle tags
7 Basic Types
7.1 Strings
7.2 Integers and floats
7.3 Booleans
7.4 Rebinding variables
7.5 Naming variables
7.6 Additional naming considerations
9 Numbers
9.1 Addition
9.2 Subtraction
9.3 Multiplication
9.4 Division
9.5 Modulo
9.6 Power
9.7 Order of operations
10 Strings
11 Formatting Strings
11.1 Format string syntax
11.2 format examples
17 Iteration
17.1 Looping with an index
17.2 Breaking out of a loop
17.3 Skipping over items in a loop
17.4 Removing items from lists during iteration
17.5 else clauses
18 Dictionaries
18.1 Dictionary assignment
18.2 Retrieving values from a dictionary
18.3 The in operator
18.4 Dictionary shortcuts
18.5 setdefault
18.6 Deleting keys
18.7 Dictionary iteration
19 Functions
19.1 Invoking functions
19.2 Multiple parameters
19.3 Default parameters
19.4 Naming conventions for functions
22 Classes
22.1 Defining a class
22.2 Creating an instance of a class
22.3 Calling a method on a class
22.4 Examining an instance
24 Exceptions
24.1 Look before you leap
24.2 Easier to ask for forgiveness
24.3 Multiple exceptional cases
24.4 finally clause
24.5 else clause
24.6 Raising exceptions
24.7 Defining your own exceptions
Index
Introduction
Chapter 1
Why Python?
Chapter 2
Which Version of Python?
This book will focus on Python 2. Python 3 has been out for a bit
now and is somewhat backwards incompatible with the 2 series.
Why use Python 2 then? Frankly, for beginning Python, there are
few differences. The current advantage of Python 2 is that
third-party library support is better. Every major library for Python
supports version 2, yet some still do not feel much pressure to
migrate to version 3.
For the most part, the examples in this book will run on both
Python 2 and 3, but it was written and tested with Python 2 in mind.
Which version of Python 2 does the book focus on? The examples
were tested in 2.6 and 2.7 but should work in 2.4 and above.
time to learn to use their tool appropriately and it will pay dividends.
Learning to use the features of an editor can make churning out
code easier. Many modern editors today have some semblance of
support for Python. Note that Notepad and word processors are not
really text editors, though they might act as such on Halloween. For
Windows users, Notepad++ and the latest version of Visual Studio
have Python support. For Mac people, Sublime Text appears to be a
popular choice (note that this is also cross-platform). Kate and gedit
are sensible Linux choices.
If you are just beginning with Python and have not had much
experience with real text editors, most Python installations include
IDLE. IDLE has decent Python editing features. The IDLE develop-
ment environment also runs on Windows, Mac and Linux.
Many programmers favor Vim or Emacs. For Java people, both
Eclipse (via PyDev) and JetBrains (via PyCharm) provide Python
support. Wing is another cross-platform, Python-specific editor
that many favor. As mentioned previously, Sublime Text is another
cross-platform editor that is gaining momentum. There are many
other editors available, though one advantage of those mentioned
in this paragraph is that they are cross-platform, which helps if you
find yourself working in different environments.
As you can tell, there are many editors, and each has its advantages
and disadvantages. If you do not have a favorite editing tool,
it is probably best to use a simple one such as Notepad++, Sublime
Text, or gedit. Editors such as Vim and Emacs have a slightly steeper
learning curve, though their adherents would argue that it pays off
to learn them well.
Chapter 3
The Interpreter
† http://www.pypy.org
$ python
>>> 2 + 2
4
>>>
In the above example, typing python opened the interpreter. The
first >>> could be thought of as the read portion: Python
is waiting for input. 2 + 2 is typed in, read, and evaluated. The result
of that expression, 4, is printed. The second >>> illustrates the loop,
because the interpreter is waiting for more input.
The REPL, by default, prints any non-None result of an expression
to standard out. This behaviour is inconsistent with normal Python
programs, where the print statement must be explicitly invoked.
But it saves a few keystrokes when in the REPL.
Note
The >>> prompt is only used on the first line of each input.
If the statement typed into the REPL takes more than one line,
the ... prompt follows:
The REPL ends up being quite handy. You can use the interactive
interpreter to write small functions, to test out code samples, or
**********************************
Personal firewall software may
warn about the connection IDLE
makes to its subprocess using this
computer's internal loopback
interface. This connection is not
visible on any external interface
and no data is sent to or received
from the Internet.
**********************************
IDLE 2.6.6
>>>
The >>> is a prompt. That is where you type your program. Type
print "hello world" after the >>> and hit the enter key. Make sure
there are not any spaces or tabs before the word print. You should
see this:
>>> print "hello world"
hello world
Note
Programming requires precision. If you were not careful
in typing exactly print "hello world" you might have seen
something like this:
>>> print "hello world
SyntaxError: EOL while scanning string literal
Chapter 4
Running Programs
$ python hello.py
Note
When running a command from the command line, this
book will precede the command with a $. This will distinguish
it from interpreter contents (>>> or ...) and file contents
(nothing preceding the content).
Note
The previous command, python hello.py, will probably
fail unless you have a file named hello.py.
In the previous chapter you used the REPL to run “hello
world”. How does one run the same program standalone? Create a
file named hello.py using your favorite text editor.
In your hello.py file type:
Save the file, go to its directory, and execute the file (here execute
and run have the same meaning, i.e., type python before the file name
and let the Python interpreter evaluate the code for you).
Note
Typing python standalone launches the interpreter. Typing
python some_file.py executes that file.
Note
It is not uncommon to hear about shell scripts, Perl scripts,
Python scripts, etc. What is the difference between a Python
script and a Python program? Nothing, really; it is only semantics.
A Python script usually refers to a Python program run
from the command line, whereas a Python program is any program
written in Python (which runs the gamut from small one-liners
to fancy GUI applications to “enterprise” class services).
Note
This new first line tells the shell that executes the file to run
the rest of the file with the #!/usr/bin/env python executable.
(Shell scripts usually start with #!/bin/bash or #!/bin/sh.)
Save hello.py with the new initial line.
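As a rough sketch, a hello.py with the hashbang in place might look like the following. The message variable is only an illustration and is not part of the original listing; the parenthesized print call behaves the same on Python 2 and Python 3 when it is given a single string.

```python
#!/usr/bin/env python
# hello.py: with the hashbang on the first line, a Unix shell hands this
# file to whichever python executable is found first on the PATH.

message = "hello world"  # illustrative name, assumed for this sketch
print(message)
```

On Windows the first line is simply treated as a comment, so the same file still runs with python hello.py.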
Tip
#!/usr/bin/env is a handy way to indicate that the first
python executable found on your PATH environment variable
should be used. Because the python executable is located in
different places on different platforms, this solution turns out
to be cross platform. Note that Windows ignores this line. Un-
less you are absolutely certain that you want to run a specific
Python version, you should probably use #!/usr/bin/env.
Using hardcoded hashbangs such as:
• #!/bin/python
• #!/usr/bin/python2.4
$ chmod +x hello.py
This sets the executable bit on the file. The Unix environment
has different permissions (set by flipping a corresponding bit) for
reading, writing, and executing a file. If the executable bit is set, the
Unix environment will look at the first line and execute it accordingly,
when the file is run.
Tip
If you are interested in knowing what the chmod command
does, use the man (manual) command to find out by typing:
$ man chmod
Now you can execute the file by typing its name in the terminal
and hitting enter. Type:
$ ./hello.py
And your program (or script) should run. Note the ./ included
before the name of the program. Normally when you type a com-
mand into the terminal, the environment looks for an executable
$ hello.py
bash: hello.py command not found
Yes, all that work just to avoid typing python hello.py. Why?
The main reason is that perhaps you want your program to be named
hello (without the trailing .py). And perhaps you want the program
on your PATH so you can run it at any time. By making a file
executable, and adding a hashbang, you can create a file that looks
like an ordinary executable. The file will not require a .py extension,
nor will it need to be explicitly executed with the python command.
Chapter 5
Writing and Reading Data
Programs usually have some notion of input and output. For simple
programs, printing values to the screen and allowing the end user
to type in a value is usually sufficient. In Python both of these are
really straightforward.
Note
The above example illustrates typing print into the Python
interpreter shell. In that case the interpreter will immediately
execute the request, in this case printing Hello there.
Printing from the interpreter is common. Seasoned pro-
grammers will keep an interpreter open during coding and
use it as a scratch pad. If you follow along with the exam-
ples in this book, you will likely do many of them from the
interpreter too.
Note
Part of the cleanup effort of Python 3 was to make the
language more consistent. As a result print is a function in
Python 3. Functions require parentheses to be invoked:
>>> print('Hello there')
Hello there
If you typed the above into the interpreter, it might look like
your computer is frozen. In reality, it is waiting for you to type
in some input. After you type something in and press enter, the
variable name will hold the value you typed. Type the name Matt
and press the enter key. If you print name it will print the value you
just typed:
Note
The value entered for raw_input is always a string. If you
want to convert it to another type like an integer or float, you
will need to use the int and float functions respectively:
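A minimal sketch of those conversions follows. The variable typed is an assumption: it stands in for the string that raw_input (or input in Python 3) would hand back, so the example runs without waiting at a prompt.

```python
# "typed" stands in for the string returned by raw_input (Python 2)
# or input (Python 3) after the user presses enter.
typed = "42"

age = int(typed)       # the string '42' becomes the integer 42
ratio = float("2.5")   # the string '2.5' becomes the float 2.5

# A string that does not look like a number raises ValueError.
try:
    int("forty-two")
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(age + 1)
print(ratio * 2)
print(conversion_failed)
```

Wrapping the conversion in try/except is one way to cope with users who type something unexpected.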
Note
In Python 3 raw_input is removed and input replaces it:
Chapter 6
Variables
Now that you know about running programs via the interpreter (or
the REPL) and the command line, it is time to start learning about
programming. Variables are the basic building blocks of computer
programs.
and keep track of its state you need to have a variable to tag that data.
Here the state of the bulb is stored in a variable named status:
This tells Python to create a string with the contents of off. Create
a variable named status, and attach it to that string. Later on when
you need to know what the status is, you can ask your program to
print it out like so:
Later on in your program you can access wattage, you can print it
out, and you can even assign another variable to it, or assign wattage
to another new value (say if your incandescent bulb broke and you
replaced it with an LED bulb):
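The tagging idea can be sketched as below. The numbers 100 and 10 and the name incandescent are assumptions chosen for illustration; only status and wattage come from the text.

```python
status = "off"           # the tag 'status' is attached to the string "off"
print(status)

wattage = 100            # a new tag attached to the integer 100
incandescent = wattage   # a second tag on the very same object
wattage = 10             # move the 'wattage' tag to a new object (the LED bulb)

print(incandescent)      # the old object is still reachable through its other tag
print(wattage)
```

Reassigning wattage moves only that one tag; any other variable attached to the old object is unaffected.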
Chapter 7
Basic Types
The last chapter discussed variables for storing string objects. Strings
in and of themselves are quite useful, but often it makes sense to
represent other types of state. There are many types of objects built
into Python that come in handy. If these built-in types are not
sufficient to model what you need, you can even define your own
objects—classes.
7.1 Strings
A string holds character information that can have any number of
characters (including 0):
>>> a = 4 # integer
>>> b = 5.6 # float
Tip
If you are interested in understanding more about how
computers represent floats, Wikipedia probably has more
information than you would want on the subject. Just search for
the term “Floating point”.
7.3 Booleans
Booleans are a built-in type to represent a binary toggle between true
and false. Booleans are frequently used to keep track of the status of
a task, i.e., whether it is done or not. You could use a boolean to keep
track of whether you have cleaned your room. Here the variable
c is set to the boolean that represents true, while the d variable is
set to the boolean that represents false:
>>> c = True
>>> d = False
>>> a = 4
>>> a = '4' # now a is a string
Note
Keywords are reserved for use in Python language con-
structs, so it confuses Python if you try to make them vari-
ables.
The module keyword has a kwlist attribute, which is a list
containing all the current keywords for Python:
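A short sketch of inspecting the keyword list follows. The exact contents of kwlist vary between Python versions, so no output is shown; the names is_reserved and is_free are illustrative.

```python
import keyword

# kwlist is a plain list of strings, one per reserved word.
print(keyword.kwlist)

# iskeyword answers the question for a single candidate name.
is_reserved = keyword.iskeyword("for")       # a reserved word
is_free = not keyword.iskeyword("status")    # fine as a variable name
print(is_reserved, is_free)
```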
• be lowercase
• use an underscore to separate words
• not start with numbers
• not override a built-in function
Here are examples of variable names, both good and bad:
>>> good = 4
>>> bAd = 5 # bad - capital letters
>>> a_longer_variable = 6
Tip
Rules and conventions for naming in Python come from
a document named “PEP 8 – Style Guide for Python Code”.
PEP stands for Python Enhancement Proposal, which is a
community process for documenting a feature, enhancement
or best practice for Python. PEP documents are found on the
Python website.
Note
Although Python will not allow keywords as variable
names, it will allow you to use a built-in name as a variable.
Built-ins are functions, classes or variables that Python au-
tomatically preloads for you, so you get easy access to them.
Unlike keywords, Python will let you use a built-in as a variable
name without so much as a peep. However, you should
refrain from doing this; it is bad practice.
Using a built-in name as a variable name shadows the built-
in. The new variable name prevents you from getting access to
the original built-in. Essentially you took the built-in variable
and co-opted it for your use. To get access to the original built-
in you will need to access it through the __builtin__ module.
But it is much better not to shadow it in the first place.
Here is a list of Python’s built-ins that you should avoid
using as variables:
'execfile', 'exit', 'file',
'filter', 'float', 'format',
'frozenset', 'getattr', 'globals',
'hasattr', 'hash', 'help', 'hex',
'id', 'input', 'int', 'intern',
'isinstance', 'issubclass', 'iter',
'len', 'license', 'list', 'locals',
'long', 'map', 'max', 'min', 'next',
'object', 'oct', 'open', 'ord',
'pow', 'print', 'property', 'quit',
'range', 'raw_input', 'reduce',
'reload', 'repr', 'reversed',
'round', 'set', 'setattr', 'slice',
'sorted', 'staticmethod', 'str',
'sum', 'super', 'tuple', 'type',
'unichr', 'unicode', 'vars',
'xrange', 'zip']
Note
In Python 3 the __builtin__ module is renamed to
builtins.
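To make the shadowing problem concrete, here is a small sketch. The function shadow_demo and the value "groceries" are hypothetical; the import dance at the top just picks whichever module name the running Python provides.

```python
try:
    import builtins                  # Python 3 name
except ImportError:
    import __builtin__ as builtins   # Python 2 name

def shadow_demo():
    list = "groceries"   # BAD: this local name now hides the built-in list
    try:
        list("abc")      # fails, because a string is not callable
        broke = False
    except TypeError:
        broke = True
    # The original is still reachable through the builtins module.
    recovered = builtins.list("abc")
    return broke, recovered

broke, recovered = shadow_demo()
print(broke)
print(recovered)
```

The shadowing is confined to the function here so the rest of the session keeps the real list; at the top level of a module the damage lasts until the name is deleted.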
Tip
Here are built-ins that would be good variable names oth-
erwise:
• dict
• file
• id
• list
• open
• str
• sum
• type
Chapter 8
More about objects
This chapter will dive into objects a little bit more. You will cover
three important properties of objects:
• identity
• type
• value
8.1 Identity
Identity at its lowest level refers to the location in the computer’s
memory of an object. Python has a built-in function, id, that tells
you the identity of an object:
When you type this, the identity of the string "Matt" will appear
as 140310794682416 (which refers to a location in the RAM of your
computer). This will generally vary for each computer and for each
time you start the shell, but the id of an object is consistent across
the lifetime of a program.
Do note that just as it is possible for a single cow to have two tags
on its ears, it is also possible for two variables to refer to the same
object. Hence, running id on either of the two variables will return
the same id. If you want another variable—first—to also refer to
the same object referred to by name, you could do the following:
This tells Python to give the first variable the same id as name.
Later you could use the is operator to validate that they are actually
the same:
>>> first is name
True
>>> id(first)
140310794682416
If you print either first or name at the REPL, it will print the
same value because they are pointing to the exact same object:
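The steps described in this section can be sketched in a few lines. The actual id values differ on every run, so the sketch compares them instead of printing fixed numbers.

```python
name = "Matt"
first = name                      # a second tag on the same string object

same_object = first is name       # identity comparison
same_id = id(first) == id(name)   # the same check, spelled out with id

print(same_object, same_id)
print(first)
print(name)
```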
8.2 Type
Another property of an object is its type. Common types are strings,
integers, floats, and booleans. There are many other kinds of types,
and you can create your own as well. The type of an object refers
to the class of an object. A class defines the state of data an object
holds, and the methods or actions that it can perform. Python allows
you to easily view the type of an object with the built-in function
type:
The type function tells you that the variable name holds a string
(str).
The table below shows the types for various objects in Python.
Object                                   Type
String                                   str
Integer                                  int
Floating point                           float
List                                     list
Dictionary                               dict
Tuple                                    tuple
function                                 function
User defined class‡                      classobj
Instance of User defined class§          instance
User defined class (subclass object)     type
Instance of class (subclass of class)    class
built-in function                        builtin_function_or_method
type                                     type
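A quick way to check several rows of the table is sketched below. The sample values are arbitrary, and the classobj and instance rows are left out because they describe Python 2 old-style classes, which Python 3 does not have.

```python
name = "Matt"
print(type(name))

# Collect the type names of a handful of sample objects.
samples = ["Matt", 4, 5.6, [], {}, ()]
kinds = [type(sample).__name__ for sample in samples]
print(kinds)

builtin_kind = type(len).__name__   # len is a built-in function
type_of_type = type(type)           # the type of type is type itself
print(builtin_kind, type_of_type)
```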
8.3 Mutability
A final interesting property of an object is its mutability. Many objects
are mutable while others are immutable. Mutable objects can change
their value in place, in other words you can alter their state, but
their identity stays the same. Objects that are immutable do not allow
you to change their value. Instead, you can point the variable
at a new object, but the identity seen through that variable then
changes to that of the new object.
In Python, dictionaries and lists are mutable types. Strings, tuples,
integers, and floats are immutable types. Here is an example demon-
strating that the identity of a variable holding an integer will change
if you change the value:
>>> age = 10
>>> id(age)
140310794682416
>>> age = age + 1
>>> id(age)
140310793921824 # DIFFERENT!
>>> names = []
>>> id(names)
140310794682432
>>> names.append("Fred")
>>> id(names)
140310794682432 # SAME!
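The same comparison can be written as a script that does not depend on particular memory addresses. This is only a sketch; the helper names int_changed and list_same are made up for the example.

```python
# Immutable: "changing" an integer really rebinds the name to a new object.
age = 10
id_before = id(age)
age = age + 1
int_changed = id(age) != id_before

# Mutable: appending alters the list in place, so its identity is unchanged.
names = []
id_before = id(names)
names.append("Fred")
list_same = id(names) == id_before

print(int_changed, list_same)
```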
Chapter 9
Numbers
9.1 Addition
The Python REPL can also be used as a simple calculator. If you
want to add two integers it is easily done:
>>> 2 + 6
8
>>> 6 + .2
6.2000000000000002
Note
If you have an operation involving two numerics, coercion
generally does the right thing. For operations involving an
integer and a float, the integer is coerced to a float. If both
numerics are floats or integers, no coercion takes place. The
function coerce is a built-in that illustrates numeric coercion.
It takes two arguments and returns a tuple with numeric co-
ercion applied:
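Because coerce exists only in Python 2, the sketch below observes the same effect indirectly by looking at the type of each result. This is a substitute illustration rather than the coerce call itself, and the variable names are assumptions.

```python
mixed = 6 + .2          # integer and float: the integer is coerced to a float
both_ints = 2 + 6       # no coercion needed
both_floats = .25 + .5  # no coercion needed

print(type(mixed).__name__)
print(type(both_ints).__name__)
print(type(both_floats).__name__)
```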
Note
Coercion between strings and numerics does not occur
with most mathematical operations. The exception being the
string formatting operator, %, if the left operand is a string:
>>> coerce('2', 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: number coercion failed
>>> print 'num: %s' % 2
num: 2
Note
Explicit conversion can be done with the int and float
built-in classes. (Note that they look like functions but are
really classes):
9.2 Subtraction
Subtraction is similar to addition. Subtraction of two integers or two
floats returns an integer or a float respectively. For mixed numeric
types, the operands are coerced before performing subtraction:
>>> 2 - 6
-4
>>> .25 - 0.2
0.049999999999999989
>>> 6 - .2
5.7999999999999998
9.3 Multiplication
In many programming languages the * (asterisk) is used for multi-
plication. You can probably guess what is going to happen when
you multiply two integers:
>>> 6 * 2
12
If you have been following carefully, you will also know what
happens when you multiply two floats:
And if you mix the types of the operands you end up with a float
as a result:
>>> 4 * .3
1.2
9.4 Division
In Python (like many languages) the / symbol is used for division.
What happens when the operation is applied to the different types?
Start with integers:
>>> 12 / 4
3
>>> 3 / 4
0
Note
This is considered such a heinous wart that in Python 3,
the / operator automatically coerces the operands to floats
before performing the operation. Hence, the answer will be a
float. If you really want integer division in Python 3, you need
to use the // (double slash) operator.
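The two operators can be compared side by side. This sketch assumes it is run under Python 3; Python 2 code can opt in to the same behavior by placing from __future__ import division at the top of a file.

```python
# Assumes Python 3 semantics for "/".
true_quotient = 3 / 4     # true division always yields a float
floor_quotient = 3 // 4   # floor division discards the fractional part
whole = 12 // 4           # divides evenly, still an integer

print(true_quotient)
print(floor_quotient)
print(whole)
```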
>>> numerator / 4
0
>>> float(numerator) / 4
0.75
If your string does not really look like a number though, Python
will complain:
Note
The built-in class, int, will coerce variables into integers if
it can:
>>> int('2')
2
>>> int(2.2)
2
9.5 Modulo
The modulo operator (%) calculates the remainder in a division oper-
ation. This is useful for determining whether a number is odd or
even (or whether you have just iterated over 1000 items):
# remainder of 4 divided by 3
>>> 4 % 3
1
Tip
Be careful with the modulo operator and negative numbers.
Modulo can behave differently, depending on which operand
is negative. It makes sense that if you are counting down, the
modulo should cycle at some interval:
>>> 3 % 3
0
>>> 2 % 3
2
>>> 1 % 3
1
>>> 0 % 3
0
>>> -1 % 3
2
But when you switch the sign of the denominator, the be-
havior becomes weird:
>>> -1 % -3
-1
>>> 1 % -3
-2
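The uses mentioned in this section can be sketched as follows. The function is_even and the loop bound of 3000 are illustrative choices. In Python the result of % takes the sign of the right-hand operand, which accounts for the values in the tip above.

```python
def is_even(number):
    """Return True when number divides by 2 with no remainder."""
    return number % 2 == 0

# Note every 1000th item while counting (the bound 3000 is arbitrary).
checkpoints = [count for count in range(1, 3001) if count % 1000 == 0]

# With a negative operand the result follows the sign of the right operand.
wrapped = -1 % 3
negative_divisor = 1 % -3
both_negative = -1 % -3

print(is_even(4), is_even(7))
print(checkpoints)
print(wrapped, negative_divisor, both_negative)
```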
9.6 Power
Python also gives you the power operator by using the ** (double
asterisks). If you wanted to square 4, the following will do it:
>>> 4 ** 2
16
>>> 10 ** 100
10000000000000000000000000000000000000
00000000000000000000000000000000000000
0000000000000000000000000L
If you look carefully at the result, after the 100 zeros, there is an
L. The L stands for Long (or long integer). Programs need to use a
certain amount of memory to store integers. Because integers are
Note
Python will tell you at what point it considers a number an
integer or a long integer. In the sys module there is a variable,
maxint that defines this. On my computer (a 64-bit computer)
this number is:
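sys.maxint exists only in Python 2. As a sketch that also runs on Python 3, the example below uses sys.maxsize instead, which is a related but not identical value (the largest size a container can have); treating it as a stand-in for maxint is an assumption of this example.

```python
import sys

limit = sys.maxsize      # platform dependent
beyond = limit + 1       # Python switches representation silently
googol = 10 ** 100       # far beyond any machine-sized integer

print(limit)
print(beyond)
print(len(str(googol)))  # number of digits in the result
```

No overflow occurs in either case; the interpreter simply uses more memory for the larger number.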
Tip
Why not always work with longs if they can represent more
numbers? In practice, you do not need them because Python
will do the right thing (unless you are counting on integer
overflow). The main reason is performance: longs are slower
to work with (bigger bags/more memory). Unless you need
longs, stay away so as to not incur a performance penalty. If
you feel an urge to use longs, the class long will coerce to a
long, much like int or float.
In Python 3, there is no long type because the details of
how the number is represented by the computer are completely
handled behind the scenes.
Note
Python includes the operator module that has functions
for the common mathematical operations. When using more
advanced features of Python such as lambda functions or list
comprehensions, these come in handy:
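A few of those functions in use are sketched below. The sample numbers are arbitrary, and reduce is imported from functools so that the example runs on Python 2.6+ as well as Python 3.

```python
import operator
from functools import reduce  # also a built-in on Python 2

total = operator.add(2, 6)                    # same as 2 + 6
product = reduce(operator.mul, [1, 2, 3, 4])  # 1 * 2 * 3 * 4
squares = [operator.pow(n, 2) for n in range(4)]

print(total)
print(product)
print(squares)
```

Passing operator.mul avoids writing a one-off lambda just to multiply two arguments.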
>>> 4 + 2 * 3
10
>>> (4 + 2) * 3
18
Chapter 10
Strings
Strings are objects that hold character data. A string could hold
a single character, a word, a line of words, a paragraph, multiple
paragraphs or even zero characters.
Python denotes strings by wrapping them with ’ (single quotes),
" (double quotes), """ (triple doubles) or ’’’ (triple singles). Here
are some examples:
>>> character = 'a'
>>> word = "Hello"
Notice that the strings always start and end with the same style of
quote. As illustrated in the line example you can put double quotes
inside of a single quoted string—and vice versa. Furthermore, if
you need to include the same type of quote within your string, you
can escape the quote by preceding it with a \ (backslash). When you
print out an escaped character the backslash is ignored.
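The quoting rules can be sketched as follows. Apart from character and word, the variable names and the sample sentences are assumptions made for the example.

```python
character = 'a'
word = "Hello"
line = 'He said "hello" to me'   # double quotes inside a single-quoted string
reply = "It's fine"              # single quote inside a double-quoted string
escaped = 'It\'s fine'           # same text, with the quote escaped instead
paragraph = """Line one
Line two"""                      # triple quotes may span several lines

print(line)
print(escaped)                   # the backslash does not appear in the output
print(paragraph)
```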
Note
Attentive readers may wonder how to include a backslash
in a string. To include a backslash in a normal string, you
must escape the backslash with ... you guessed it, another
backslash:
>>> backslash = '\\'
>>> print backslash
\
Note
Here are the common escape sequences in Python:
Tip
If you do not want to use an escape sequence, you can make
a raw string, by preceding the string with an r. Raw strings
are normally used in regular expressions, where the backslash
can be common.
Raw strings interpret the character content literally (i.e.,
there is no escaping). The following illustrates the difference
between raw and normal strings:
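One way to see the difference is to compare lengths, as in this sketch. The text '\tnew' is an arbitrary sample chosen because \t is a recognized escape sequence.

```python
normal = '\tnew'   # \t is read as a single tab character
raw = r'\tnew'     # the backslash and the t are kept as two characters

print(len(normal))
print(len(raw))
print(raw)
```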
... aute irure dolor in reprehenderit in
... voluptate velit esse cillum dolore eu
... fugiat nulla pariatur. Excepteur sint
... occaecat cupidatat non proident, sunt
... in culpa qui officia deserunt mollit
... anim id est laborum."""
Chapter 11
Formatting Strings
If you are paying careful attention, you will note that the num-
bers in the curly braces are incrementing. In reality they tell the
format operation which object to insert and where. Many computer
languages start counting from 0, so {0} would correspond with the
integer 1, and the {1} would be 2.5, while the {2} is the string ’foo’.
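The paragraph above describes a call along the lines of the following sketch. The template text is an assumption; only the three arguments 1, 2.5 and 'foo' are taken from the description.

```python
template = 'Num: {0} Float: {1} Str: {2}'   # hypothetical template text
filled = template.format(1, 2.5, 'foo')

# The numbers select arguments by position, so they can be reordered or skipped.
swapped = '{2} {0}'.format(1, 2.5, 'foo')

print(filled)
print(swapped)
```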
Field        Meaning
fill         Fills in space with align
align        <-left align, >-right align, ^-center align, =-put padding after sign
sign         +-for all number, --only negative, space-leading space for positive, sign on negative
#            Prefix integers. 0b-binary, 0o-octal, 0x-hex
0            Enable zero padding
width        Minimum field width
,            Use comma for thousands separator
.precision   Digits after period (floats). Max string length (non-numerics)
type         s-string format (default) see Integer and Float charts

Float Types  Meaning
e/E          Exponent. Lower/upper-case e
f            Fixed point
g/G          General. Fixed with exponent for large, and small numbers (g default)
n            g with locale specific separators
%            Percentage (multiplies by 100)
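A few of the fields from the tables are combined in the sketch below. The sample values are arbitrary, and the comma separator assumes Python 2.7 or later.

```python
centered = '{0:*^12}'.format('Ringo')   # fill with *, center, width 12
grouped = '{0:,}'.format(1234567)       # comma as thousands separator
fixed = '{0:.2f}'.format(3.14159)       # two digits after the period
signed = '{0:+d}'.format(5)             # always show the sign
hexed = '{0:#x}'.format(255)            # hex with the 0x prefix
percent = '{0:.1%}'.format(0.256)       # multiplies by 100, adds %

for result in (centered, grouped, fixed, signed, hexed, percent):
    print(result)
```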
Note
The format method on a string replaces the % operator
which was similar to C’s printf. This operator is still available
and some users prefer it because it requires less typing for
simple statements and because it is similar to C. %s, %d, and %x
are replaced by their string, integer, and hex value respectively.
Here are some examples:
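For comparison, here is one line written both ways. This is a sketch with made-up values; both spellings are expected to produce the same text.

```python
# Old style: the % operator with C-like placeholders.
old_style = '%s has %d items (0x%x)' % ('cart', 255, 255)

# New style: the format method with positional fields.
new_style = '{0} has {1:d} items ({1:#x})'.format('cart', 255)

print(old_style)
print(new_style)
```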
Chapter 12
dir, help, and pdb
You have only just touched the surface of strings, but you need to
take a break to discuss two important functions and one library that
are built into Python. The first function is dir, which illustrates
wonderfully how powerful and useful the REPL is. The dir function
indicates the attributes of an object. If you had a Python interpreter
open and wanted to know what the attributes of a string are, you
can do the following:
dir lists all the attributes of the object passed into it. Since
you passed in the string ’Matt’ to dir, the function displays the
attributes of the string Matt. This handy feature of Python illus-
trates its “batteries included” philosophy. Python gives you an easy
mechanism to discover the attributes of any object. Other languages
might require special websites, documentation or IDEs to access
similar functionality. But in Python, because you have the REPL,
you can get at this information quickly and easily.
The attribute list is in alphabetical order, and you can normally
ignore the first couple of attributes starting with __. Later on you will
see attributes such as capitalize (which is a method that capitalizes
a string), format (which, as illustrated earlier, allows for formatting
of strings), or lower (which is a method used to ensure the string is
lowercase). These attributes happen to be methods, which are easy
to invoke on a string:
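A sketch of discovering and then calling a few of these methods follows. The list comprehension that hides the double-underscore names is an addition for readability, not something dir does itself.

```python
attributes = dir('Matt')

# Skip the names starting with __ to see the everyday methods.
public = [attr for attr in attributes if not attr.startswith('__')]
has_capitalize = 'capitalize' in attributes

print(public[:5])
print(has_capitalize)

# Methods found this way can be invoked directly on a string.
shout = 'matt'.upper()
tidy = 'MATT'.lower()
print(shout, tidy)
```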
12.2 help
help is another built-in function that is useful in combination with
the REPL. This function provides documentation for methods, mod-
ules, classes, and functions (if the documentation exists). For exam-
ple, if you are curious what the attribute upper on a string does, the
following gives you the documentation:
upper(...)
    S.upper() -> string
12.3 pdb
Python includes a debugger named pdb for stepping through code. This
library is modeled somewhat after the gdb debugger for C. To drop
into the debugger at any point in a Python program, insert the code
import pdb; pdb.set_trace(). When this line is executed it will
present a (Pdb) prompt, which is similar to the REPL. Code can
be evaluated and inspected live. Also, breakpoints can be set and
further inspection can take place.
Below is a table listing useful pdb commands:
Command             Purpose
h, help             List the commands available
n, next             Execute the next line
c, cont, continue   Continue execution until a breakpoint is hit
w, where, bt        Print a stack trace showing where execution is
u, up               Pop up a level in the stack
d, down             Push down a level in the stack
l, list             List source code around current line
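A sketch of where a set_trace call might be placed is shown below. The average function and the DEBUG switch are hypothetical; the switch is off so the example runs straight through, and turning it on would pause execution at that line and show the debugger prompt.

```python
import pdb

DEBUG = False  # hypothetical switch; set to True to stop in the debugger

def average(values):
    total = sum(values)
    if DEBUG:
        pdb.set_trace()  # execution would pause here, ready for n, l, w, etc.
    return total / float(len(values))

result = average([2, 4, 6])
print(result)
```

At the prompt, typing total or values would display their current contents, and c would let the function finish.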
Note
Many Python developers use print debugging. They insert
print statements to provide clarity as to what is going on.
This is often sufficient. Just make sure to remove the debug
statements or change them to logging statements before re-
leasing the code. When more exploration is required, the pdb
module can be useful.
Chapter 13
Strings and methods
In the previous chapter you learned about the built-in dir function
and saw some methods you can call on string objects. Strings allow
you to capitalize them, format them, make them lowercase (lower),
as well as many other actions. These attributes of strings are methods.
Methods are functions that are called on an instance of a type. Try
to parse out that last sentence a little. The string type allows you to
call a method (another term for call is invoke) by placing a . (period)
and the method name directly after the variable name holding the
data (or the data itself), followed by parentheses with arguments
inside of them.
Here is an example of calling the capitalize method on a string:
# invoked on variable
>>> name = 'matt'
>>> correct = name.capitalize()
>>> print correct
Matt
# invoked on data
>>> print 'fred'.capitalize()
Fred
13. Strings and methods
Note
Do integers and floats have methods? Yes, all types in
Python are classes, and classes have methods. This is easy to
verify by invoking dir on an integer (or a variable holding an
integer):
>>> 5.conjugate()
  File "<stdin>", line 1
    5.conjugate()
              ^
SyntaxError: invalid syntax
>>> five = 5
>>> five.conjugate()
5
13.2 endswith
If you have a variable holding a filename, you might want to check
the extension. This is easy with endswith:
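A minimal sketch of such a check. The variable name xl and the filename are assumptions made for illustration, and the code avoids the print statement so that it also runs on Python 3:

```python
# hypothetical filename held in a variable
xl = 'Oct2000.xls'

# True when the string ends with the given suffix
is_excel = xl.endswith('xls')

# the comparison is case sensitive
is_upper = xl.endswith('XLS')
```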
Note
Notice that you had to pass in a parameter, 'xls', into the
method. Methods have a signature, which is a funky way of
saying that they need to be called with the correct number
(and type) of parameters. For endswith it makes sense that
if you want to know if a string ends with another string you
have to tell Python which ending you want to check for. This is
done by passing the end string to the method.
Tip
Again, it is usually easy to find out this sort of information
via help. The documentation should tell you what parameters
are required as well as any optional parameters. Here is the
help for endswith:
endswith(...)
    S.endswith(suffix[, start[, end]]) -> bool
>>> xl.endswith()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: endswith() takes at least 1 argument (0 given)
13.3 find
The find method allows you to find substrings inside other strings.
It returns the index (offset starting at 0) of the matched substring. If
no substring is found it returns -1:
>>> word = 'grateful'
# 0 is g, 1 is r, 2 is a
>>> word.find('ate')
2
>>> word.find('great')
-1
13.4 format
format allows for easy creation of new strings by combining existing
variables. The variables replace {X} (where X is an integer):
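A small sketch of format, with made-up values. It assigns the result instead of using the print statement so that it also runs on Python 3, and it splits the call across two lines with a backslash placed after the period:

```python
# illustrative values, not taken from the text
name = 'Matt'
age = 10

# {0} is replaced by the first argument, {1} by the second;
# the backslash after the period continues the statement
message = 'name: {0}, age: {1}'.\
    format(name, age)
```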
Note
In the above example, the print statement is spread across
two lines. By placing a \ following a . you indicate to Python
that you want to continue on the next line. If you have opened
a left parenthesis, (, you can also place the arguments on
multiple lines without a \:
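A sketch of the parenthesis form, again with made-up values. Because the parenthesis of the call is still open, Python keeps reading on the following lines:

```python
# no backslash is needed inside an open parenthesis
message = 'name: {0}, age: {1}'.format(
    'Matt',
    10)
```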
13.5 join
join creates a new string from a sequence by inserting a string
between every member of the list:
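A minimal sketch, assuming a hypothetical list of names. Note that join is called on the separator string, not on the list:

```python
# hypothetical list of strings to combine
names = ['George', 'Ringo', 'Paul']

# the string join is called on is placed between the items
joined = ', '.join(names)
```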
Tip
For most Python interpreters, using join is faster than
repeated concatenation using the + operator. The above idiom
is common.
13.6 startswith
startswith is analogous to endswith except that it checks that a
string starts with another string:
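A short sketch using an arbitrary string:

```python
word = 'bar'  # arbitrary example value

starts_with_b = word.startswith('b')
starts_with_a = word.startswith('a')
```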
13.7 strip
strip removes preceding and trailing whitespace (spaces, tabs, new-
lines) from a string. This may come in handy if you have to normalize
data or parse input from a user (or the web):
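A sketch with a made-up input string that has three leading spaces, two spaces between the words, and two trailing spaces:

```python
# three spaces in front, two in the middle, two at the end
raw = '   hello  there  '

cleaned = raw.strip()      # both ends
left_only = raw.lstrip()   # leading whitespace only
right_only = raw.rstrip()  # trailing whitespace only
```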
Note that three spaces at the front of the string were removed
as were the two at the end. But the two spaces between the words
were left intact. If you are interested in removing only the lead-
ing whitespace or rightmost whitespace, the methods lstrip and
rstrip respectively will perform those duties.
There are other string methods, but they are used less often. Feel
free to explore them by reading the documentation and trying them
out.
Chapter 14
14.1 Comments
Comments are not a type per se, because they are ignored by Python.
Comments serve as reminders to the programmer. Opinions vary about
their purpose and utility: some programmers are against any and all
comments, some comment almost every line of code, and many fall
somewhere in between. If you are contributing to a project, try to
be consistent with its commenting scheme. A basic rule of thumb is
that a comment should explain the why rather than the how (code
alone should be sufficient for the how).
To create a comment in Python simply start a line with a #:
Tip
A rogue use of comments is to temporarily disable code
during editing. If your editor supports this, it is sometimes
easier to comment out code rather than remove it completely.
But the common practice is to remove commented-out code
before sharing the code with others.
14. Comments, Booleans, and None
Tip
You may be tempted to comment out multiple lines of code
by making those lines a triple quoted string. This is ugly and
confusing. Try not to do this.
14.2 Booleans
Booleans represent the true and false values. You have already seen
them in previous code examples, such as the result of .startswith:
>>> a = True
>>> b = False
>>> 'bar'.startswith('b')
True
Note
The actual name of the boolean class in Python is bool.
Note
For the built-in types, int, float, str, and bool, even
though they are lowercase as if they were functions, they
are classes. Invoking help(str) will confirm this:
 |  the object.  If the argument is a string,
 ...
Tip
Be careful when parsing content that you want to turn into
booleans. Strings that are non-empty evaluate to True. One
example of a string that might bite you is the string ’False’
which evaluates to True:
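A sketch of the pitfall, followed by one possible way around it. The helper parse_bool and the spellings it accepts are assumptions for illustration, not a standard library feature:

```python
# any non-empty string is truthy, including 'False'
flag = bool('False')

def parse_bool(text):
    """Hypothetical helper: treat a few spellings as true."""
    return text.strip().lower() in ('true', 'yes', '1')

parsed = parse_bool('False')
```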
Truthy              Falsey
True                False
Most objects        None
1                   0
3.2                 0.0
[1, 2]              [] (empty list)
{'a': 1, 'b': 2}    {} (empty dict)
'string'            "" (empty string)
'False'
'0'
Tip
Do not test boolean values to see if they are equal to True.
If you have a variable, done, containing a boolean, this is suffi-
cient:
>>> if done:
...     # do something
>>> members = []
>>> if members:
...     # do something if members
...     # have values
... else:
...     # members is empty
Note
If you wish to define the implicit truthiness for self defined
objects, the __nonzero__ method specifies this behavior (Python 3
names this method __bool__). It can return True or False. If this
magic method is not defined, the __len__ method is checked for a
non-zero value. If neither method is defined, an object defaults
to True.
14.3 None
None is a special value in Python; its type is NoneType. Other
languages have similar constructs such as NULL or undefined.
Variables can be assigned to None to indicate that they are waiting
to hold a real value. None coerces to False in a boolean context:
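A minimal sketch of that coercion, using a hypothetical placeholder variable:

```python
result = None  # placeholder until a real value is available

as_bool = bool(result)

if not result:
    status = 'still waiting'
else:
    status = 'ready'
```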
Note
A Python function defaults to returning None if no return
statement is specified.
Note
None is a singleton (Python only has one copy of None in the
interpreter):
>>> a = None
>>> id(a)
140575303591440
>>> b = None
>>> id(b)
140575303591440
>>> a is b
True
>>> a is not b
False
Chapter 15
In addition to the boolean type in Python, you can also use expres-
sions to get boolean values. For comparing numbers, it is common
to check if they are greater than or less than other numbers. > and <
do this respectively:
>>> 5 > 9
False
Check   Meaning
>       Greater than
<       Less than
>=      Greater than or equal to
<=      Less than or equal to
==      Equal to
!=      Not equal to
is      Identical object
is not  Not identical object
15. Conditionals and whitespace
Note
The “rich comparison” magic methods, __gt__, __lt__,
__ge__, __le__, __eq__, and __ne__ correspond to >, <, >=, <=,
==, and != respectively. For classes where these comparisons
are commonly used, the functools.total_ordering class decorator
allows for defining only __eq__ and one ordering method such as
__le__. The decorator will automatically derive the remaining
ordering methods. Otherwise all six methods should be implemented:
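A sketch of the decorator approach rather than the six-method one. The Version class is hypothetical and exists only to show which comparisons get derived:

```python
from functools import total_ordering

@total_ordering
class Version(object):
    """Hypothetical class used only to illustrate the decorator."""
    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return self.number == other.number

    def __le__(self, other):
        return self.number <= other.number

old = Version(1)
new = Version(2)

less_than = old < new       # __lt__ derived by the decorator
greater_equal = old >= new  # __ge__ derived by the decorator
```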
Tip
is and is not are for comparing identity. When testing
for identity—if two objects are the same actual object (not
just have the same value)—use is or is not. Since None is a
singleton and only has one identity, is and is not are used
with None:
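A short sketch contrasting identity with equality. The variable names are made up for illustration:

```python
value = None
if value is None:
    message = 'no value yet'
else:
    message = 'has a value'

# equal values are not necessarily the same object
first = [1, 2]
second = [1, 2]
same_value = first == second
same_object = first is second
```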
15.1 Combining conditionals
Boolean Operator  Meaning
x and y           Both x and y must evaluate to True for true result
x or y            If x or y is True, result is true
not x             Negate the value of x (True becomes False and vice versa)
>>> score = 91
>>> if score > 90 and score <= 100:
...     grade = 'A'
Note
In the above example the \ following ’George’ or indicates
that the statement will be continued on the next line.
Like most programming languages, Python allows you to
wrap conditional statements in parentheses. Because they are
not required in Python, most developers leave them out unless
they are needed for operator precedence. But another subtlety
of using parentheses is that they serve as a hint to the inter-
preter when a statement is still open and will be continued on
the next line, hence the \ is not needed in that case:
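A sketch of the parenthesized form. The variable name and the second name compared against are assumptions for illustration:

```python
name = 'Ringo'  # hypothetical value

# the open parenthesis lets the condition span two lines
if (name == 'George' or
        name == 'Ringo'):
    beatle = True
else:
    beatle = False
```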
15.2 if statements
Booleans (True and False) are often used in conditional statements.
Conditional statements are instructions that say “if this statement is
true, do that, otherwise do something else.” This is a useful construct
and is used frequently in Python. Sometimes the “if statement”
will check values that contain booleans, other times it will check
expressions that evaluate to booleans. Another common check is for
implicit coercion to “truthy” or “falsey” values:
>>> score = 87
>>> if score >= 90:
...     grade = 'A'
... else:
...     grade = 'B'
>>> score = 87
>>> if score >= 90:
...     grade = 'A'
... elif score >= 80:
...     grade = 'B'
Note
The if statement can have zero or more elif statements,
but it can have at most one else statement.
Note
Note that after the if and elif statements come their
blocks. The block following a conditional statement is only
executed when its conditional statement evaluates to True, or
when the else statement is encountered.
15.5 Whitespace
Another peculiarity you may have noticed is the colon (:) following
the boolean expression in the if statement. The lines immediately
after were indented by four spaces. The indented lines are the block
of code that is executed when the if expression evaluates to True.
In many other languages an if statement looks like this:
• a colon (:)
• indentation
Tip
What is consistent indentation? Normally either tabs or
spaces are used to indent code. In Python four spaces is the
preferred way to indent code. This is described in PEP 8. If
you mix tabs and spaces you will eventually run into problems.
Although spaces are the preferred mechanism, if you are
working on code that already uses tabs, it is better to be con-
sistent. In that case continue using tabs with the code.
The python executable also has a -tt command line option
that will cause any inconsistent usage of spaces and tabs to
throw errors.
Chapter 16
Many of the types discussed so far have been scalars, which hold a
single value. Integers, floats, and booleans are all scalar values.
Sequences hold collections of objects (scalar types or even other
sequences). This chapter will discuss collections—lists, tuples and
sets.
16.1 Lists
Lists, as the name implies, are used to hold a list of objects. They are a
mutable type, meaning you can add, remove, and alter the contents of
them. There are two ways to create empty lists, one is with the list
function, and the other is using the square bracket literal syntax—[
and ]:
>>> names = list()
>>> other_names = []
If you want to have prepopulated lists, you can provide the values
in between the square brackets, using the literal syntax:
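A minimal sketch with made-up values:

```python
# the values go between the brackets, separated by commas
names = ['Fred', 'George']
count = len(names)
```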
Note
The list function can also create prepopulated lists, but it
is somewhat redundant because you have to pass a list into it:
16. Sequences: Lists, tuples, and sets
Lists, like other types, have methods that you can call on them
(use dir([]) to see a complete list of them). To add items to the end
of a list use the append method:
>>> names.append('Matt')
>>> names.append('Fred')
>>> print names
['Matt', 'Fred']
16.4 List deletion
Note
CPython’s underlying implementation of a list is actually
an array of pointers. This provides quick random access to
indices. Also appending/removing at the end of a list is quick
(O(1)), while inserting/removing from the middle of a list is
slower (O(n)). If you find yourself inserting and popping from
the front of a list, a collections.deque might be a better data
structure.
As with many of the other operations on lists, you can also delete
by index using the bracket notation:
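A short sketch using the del statement on a made-up list:

```python
names = ['Matt', 'Fred', 'George']

# remove the item at index 1 ('Fred')
del names[1]
```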
If the previous order of the list was important, you can make
a copy of it before sorting. Another option is to use the sorted
function. sorted creates a new list that is reordered:
>>> old = [5, 3, -2, 1]
>>> nums_sorted = sorted(old)
>>> print nums_sorted
[-2, 1, 3, 5]
>>> print old
[5, 3, -2, 1]
Both the sort method and sorted function allow arbitrary control
of sorting by passing in a function for the key parameter. In this
example, by passing in str as the key parameter, every item in the
list is sorted as if it were a string:
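A sketch with made-up numbers, comparing the default ordering with the ordering produced by key=str:

```python
nums = [5, 3, -2, 1, 10]

as_numbers = sorted(nums)
# each item is compared by its string form: '-2', '1', '10', ...
as_strings = sorted(nums, key=str)
```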
Note
In Python 2, sort and sorted also accept a cmp parameter.
This is removed from Python 3. key can provide the same
functionality and is slightly faster.
Tip
The Python built-in function range constructs integer lists.
If you needed the numbers zero through four, you can easily
get it with range:
# numbers < 5
>>> nums = range(5)
>>> print nums
[0, 1, 2, 3, 4]
Notice that range does not include 5 in its list. Many Python
functions dealing with final indices mean “up to but not in-
cluding”. (Slices are another example of this you will see
later).
If you need to start at a non-zero number, range can take
two parameters as well. When there are two parameters, the
first is the starting number (including it), and the second is
the “up to but not including” number:
# numbers from 2 to 5
>>> nums2 = range(2, 6)
>>> print nums2
[2, 3, 4, 5]
Note
The “up to but not including” construct is also more for-
mally known as the half-open interval convention. It is com-
monly used when defining sequences of natural numbers.
This has a few nice properties:
16.7 Tuples
Tuples (commonly pronounced as either “two”-ples or “tuh”-ples)
are immutable sequences. Once you create them, you cannot change
them. Similar to the list square bracket literal syntax, there is a
parentheses literal syntax for tuples. There is also a tuple function
that you can use to construct a new tuple from an existing list or tuple:
>>> b = (2, 3)
>>> print b
(2, 3)
There are two ways to create an empty tuple, using the tuple
function or parentheses:
>>> empty = tuple()
>>> print empty
()
>>> empty = ()
>>> print empty
()
Here are three ways to create a tuple with one item in it:
>>> one = tuple([1])
>>> print one
(1,)
>>> one = (1,)
>>> print one
(1,)
>>> one = 1,
>>> print one
(1,)
Note
Because parentheses are used both for calling functions or
methods and for tuple creation, this can lead to confusion.
Here's the simple rule: if there is one item in the parentheses,
then Python treats the parentheses as normal parentheses (for
operator precedence), such as those that you might use when
writing (2 + 3) * 8. If there is more than one item in the
parentheses, then Python treats it as a tuple:
>>> d = (3)
>>> type(d)
<type 'int'>
To create a tuple with a single item, put a comma following
the item, or use the tuple function with a single item list:
>>> e = (3,)
>>> type(e)
<type 'tuple'>
Here are three ways to create a tuple with more than one item:
>>> many = tuple([1, 2, 3])
>>> many = (1, 2, 3)
>>> many = 1, 2, 3
>>> print many
(1, 2, 3)
Note
Why the distinction between tuples and lists? Why not use
lists since they appear to be a super-set of tuples?
The main difference is mutability. Because tuples are im-
mutable they are able to serve as keys in dictionaries. Tuples
are often used to represent a record of data such as the results
of a database query, which may contain heterogeneous types
of objects. Perhaps a tuple would contain a name, address,
and age:
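A sketch of both uses. The record values and the coordinate key are invented for illustration:

```python
# hypothetical record: name, address, age
person = ('George', '123 Main', 32)

# unpack the record into separate variables
name, address, age = person

# tuples are immutable, so they can serve as dictionary keys
locations = {(40.7, -74.0): 'New York'}
city = locations[(40.7, -74.0)]
```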
16.8 Sets
Another collection type found in Python is a set. A set is an
unordered collection that cannot contain duplicates. Like a tuple it
can be instantiated with a list. There are a few differences. First,
unlike lists and tuples, a set does not care about order. Also,
unlike a tuple or list, older versions of Python have no special
syntax to create sets (Python 2.7 and 3 add a {1, 2} literal), so
the examples here call the set class (another coercion class that
appears as a function). Pass the set class a list, and it will
create a set with any duplicates removed:
>>> digits = [0, 1, 1, 2, 3, 4, 5, 6,
...     7, 8, 9]
# remove extra 1
>>> digit_set = set(digits)
>>> digit_set
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Sets are useful because they allow for set operations, such as union
(|), intersection (&), difference (-), and xor (^) among two sets.
Difference (-) allows you to remove items in one set from another:
>>> odd = set([1, 3, 5, 7, 9])
>>> prime = set([2, 3, 5, 7])
# difference
>>> even = digit_set - odd
>>> print even
set([0, 8, 2, 4, 6])
# those in both
>>> prime_even = prime & even
>>> print prime_even
set([2])
The union (|) operation returns a set composed of both sets, with
duplicates removed:
Xor (^) is an operation that returns a set of items that only are
found in one set or the other, but not both:
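A sketch of union and xor, assuming two small sets of digits (the contents of odd and prime are chosen for illustration):

```python
odd = set([1, 3, 5, 7, 9])
prime = set([2, 3, 5, 7])

# union: members of either set, duplicates removed
either = odd | prime

# xor: members of exactly one of the two sets
only_one = odd ^ prime
```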
Tip
Why use a set instead of a list? Sets are optimized for
set operations. If you find yourself performing unions or
differences among lists, look into using a set instead.
Sets are also quicker for testing membership. The in oper-
ator runs faster for sets than lists. However, this speed comes
at a cost. Sets do not keep the elements in any particular order,
whereas lists do.
Chapter 17
Iteration
Note
Notice that a for loop construct contains a colon (:) fol-
lowed by indented code. (The indented code is the block of
the for loop).
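A minimal sketch of such a loop, summing a made-up list of numbers:

```python
total = 0
for number in [1, 2, 3]:
    # this indented line is the block of the for loop
    total = total + number
```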
>>> animals = ["cat", "dog", "bird"]
>>> for index, value in \
...         enumerate(animals):
...     print index, value
0 cat
1 dog
2 bird
>>> numbers = [3, 5, 9, -1, 3, 1]
>>> result = 0
>>> for item in numbers:
...     if item < 0:
...         break
...     result = result + item
>>> print result
17
Note
Note that the if block inside the for block is indented
eight spaces. Blocks can be nested, and each level needs to be
indented consistently.
17.4. Removing items from lists during iteration
>>> numbers = [3, 5, 9, -1, 3, 1]
>>> result = 0
>>> for item in numbers:
...     if item < 0:
...         continue
...     result = result + item
>>> print result
21
Chapter 18
Dictionaries
In the above example the keys are the names. For example
’George’ is the key that maps to the integer 10, the value.
The above example illustrates the literal syntax for creating an
initially populated dictionary. It also shows how the square brackets
are used to insert items into a dictionary. They associate a key with
a value, when used in combination with the assignment operator (=).
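A sketch of both operations. Only the 'George' entry comes from the text; the dictionary name and the other entries are assumptions:

```python
# literal syntax: the key 'George' maps to the value 10
ages = {'George': 10, 'Fred': 12}

# square brackets with = insert (or replace) a key
ages['Henry'] = 10

george_age = ages['George']
```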
Be careful though, if you try to access a key that does not exist in
the dictionary, Python will throw an exception:
Tip
Dictionaries also have a has_key method, that is similar to
the in operator. Idiomatic Python favors in to has_key:
Tip
The get method of dictionaries is one way to get around
the KeyError thrown when trying to use the bracket notation
to pull out a key not found in the dictionary.
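A sketch that puts the three behaviors side by side. The dictionary contents and the missing key 'Charles' are made up:

```python
ages = {'George': 10}

# bracket access raises KeyError for a missing key
try:
    ages['Charles']
    raised = False
except KeyError:
    raised = True

# in tests for membership without raising
has_george = 'George' in ages

# get returns the supplied default (or None) instead of raising
fallback = ages.get('Charles', 0)
no_default = ages.get('Charles')
```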
18.5 setdefault
A useful, but somewhat confusingly named, method of dictionaries
is the setdefault method. The method has the same signature as
get and initially behaves like it, returning a default value if the key
does not exist. In addition to that, it also sets the value of the key to
the default value if the key is not found. Because setdefault returns
a value, if you initialize it to a mutable type, such as a dict or list,
you can mutate the result in place.
setdefault can be used to provide an accumulator or counter
for a key. For example if you wanted to count the number of people
with the same name, you could do the following:
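A sketch of the counting idea, assuming a hypothetical list of names. The second part shows the mutable-default case mentioned earlier, with made-up band data:

```python
names = ['George', 'Fred', 'George']  # hypothetical data

count = {}
for name in names:
    # make sure the key exists, starting at 0
    count.setdefault(name, 0)
    count[name] += 1

# setdefault returns the stored list, which can be mutated in place
bands = {}
bands.setdefault('Paul', []).append('Beatles')
bands.setdefault('Paul', []).append('Wings')
```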
Tip
The collections.Counter class found in Python 2.7 and
Python 3 can perform the above operations much more suc-
cinctly:
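A sketch using the same hypothetical list of names:

```python
from collections import Counter

names = ['George', 'Fred', 'George']  # hypothetical data

# Counter tallies the items of any iterable
count = Counter(names)
```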
Tip
The collections module from the Python standard library
includes a handy class named defaultdict. This class behaves just
like a dictionary but it also allows for setting the default value
of a key to an arbitrary factory. If the default factory is not
None, it is called with no arguments and its result is inserted as
the value any time a key is missing.
The previous example re-written with defaultdict is the
following:
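A self-contained sketch of the idea, assuming the same hypothetical names and band data as a stand-in for the original listing:

```python
from collections import defaultdict

names = ['George', 'Fred', 'George']  # hypothetical data

# int() returns 0, so missing keys start counting from 0
count = defaultdict(int)
for name in names:
    count[name] += 1

# list() returns [], so missing keys start as an empty list
names_to_bands = defaultdict(list)
names_to_bands['Paul'].append('Beatles')
names_to_bands['Paul'].append('Wings')
```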
...     names_to_bands[name].\
...         append('Wings')
>>> print names_to_bands['Paul']
['Beatles', 'Wings']
18.6 Deleting keys
Tip
Like deletion from a list while iterating over it, be careful
about removing keys from a dictionary while iterating over
the same dictionary.
Note
The dictionary has a method—keys—that will also list out
the keys of a dictionary.
To retrieve both key and value during iteration, use the items
method:
Tip
If the order of iteration is important, either sort the se-
quence of iteration or use a data structure that stores order
information.
The built-in function sorted will return a new sorted list,
given a sequence:
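A sketch of ordered iteration over a dictionary with made-up contents:

```python
ages = {'George': 10, 'Fred': 12, 'Henry': 10}  # hypothetical data

ordered = []
# iterating over a dictionary yields its keys; sorted orders them
for name in sorted(ages):
    ordered.append((name, ages[name]))
```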
Chapter 19
Functions
Tip
The help function has been emphasized through this book.
It is important to note that the help function actually gets its
content from the docstring of the object passed into it. If you
call help on add_2, you should see the following (provided
you actually typed out the add_2 code above):
add_2()
    return 2 more than num
(END)
19.1. Invoking functions
• indentation
• docstring
• logic
• return statement
>>> add_two_nums(4, 6)
10
>>> add_two_nums('4', '6')
'46'
>>> add_two_nums('4', 6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in add_two_nums
TypeError: cannot concatenate 'str' and 'int' objects
Tip
Default parameters must be declared after non-default pa-
rameters. Otherwise Python will give you a SyntaxError:
19.3. Default parameters
Tip
Do not use mutable types (lists, dictionaries) for default
parameters unless you know what you are doing. Because
of the way Python works, the default parameters are created
only once—when a function is defined. If you use a mutable
default value, you will end up re-using the same instance of
the default parameter during each function invocation:
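A sketch of the pitfall and one common workaround. The function names to_list and to_list_safe are hypothetical:

```python
def to_list(value, default=[]):
    # the same list object is reused on every call
    default.append(value)
    return default

first = to_list(4)
second = to_list('hello')  # still the list from the first call

def to_list_safe(value, default=None):
    # create a fresh list on each call instead
    if default is None:
        default = []
    default.append(value)
    return default

safe_first = to_list_safe(4)
safe_second = to_list_safe('hello')
```

Using None as the default and creating the list inside the function keeps each call independent.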
• be lowercase
• have_an_underscore_between_words
• not start with numbers
• not override built-ins
• not be a keyword
Chapter 20
Two nice constructs that Python provides to pull data out of sequence-
like types (lists, tuples, and even strings) are indexing and slicing.
Indexing allows you to access single items out of a sequence, while
slicing allows you to pull out a sub-sequence from a sequence.
20.1 Indexing
For example, if you have a list containing pets, you can pull out
animals by index:
>>> my_pets = ["dog", "cat", "bird"]
>>> print my_pets[0]
dog
>>> print my_pets[-1]
bird
Tip
Indices start at 0. If you want to pull out the first item you
reference it by 0, not 1. This is zero-based indexing.
Tip
You can also reference items using negative indices. -1
references the last item, -2 the second to last item, etc. This is
especially useful for pulling off the last item.
20. Indexing and Slicing
>>> my_pets = ["dog", "cat", "bird"] # a list
>>> print my_pets[0:2]
['dog', 'cat']
You can also use negative indices when slicing. It works for either
the first or second index. An index of -1 would be the last item. If
you slice up to the last item, you will get everything but that item
(remember Python usually goes up to but not including the end
range):
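A sketch of negative indices in both slice positions, reusing the pets list:

```python
my_pets = ['dog', 'cat', 'bird']

# up to, but not including, the last item
all_but_last = my_pets[0:-1]

# a negative first index counts from the end
last_two = my_pets[-2:3]
```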
If you include the colon (:), the final index is optional. If the final
index is missing, the slice defaults to the end of the list:
If you include the colon (:), the first and second indices are
optional. If both indices are missing, the slice returned will contain
a copy of the list. This is actually a construct you see to quickly copy
lists in Python.
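A sketch of the optional indices, again with the pets list:

```python
my_pets = ['dog', 'cat', 'bird']

from_second = my_pets[1:]   # missing final index: to the end
first_two = my_pets[:2]     # missing first index: from the start
pets_copy = my_pets[:]      # both missing: a copy of the list
```

The copy is a new list object, so changing it leaves my_pets untouched.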
20.3. Striding slices
>>> my_pets = ["dog", "cat", "bird"]
>>> dog_and_bird = my_pets[0:3:2]
>>> print dog_and_bird
['dog', 'bird']
Note
Again, the range function has a similar third parameter
that specifies stride.
Chapter 21
21. File Input and Output
Be careful, if you try to read a file that does not exist, Python will
throw an error:
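A sketch of the failure and of catching it. The path is built inside a freshly created temporary directory so that it is known not to exist; the filename is made up:

```python
import os
import tempfile

# a path inside a brand new, empty directory cannot exist yet
missing = os.path.join(tempfile.mkdtemp(), 'missing.txt')

try:
    fin = open(missing)
except IOError:
    # Python 2 raises IOError; on Python 3 the error raised is
    # also caught by this name
    opened = False
else:
    opened = True
    fin.close()
```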
Tip
The open function returns a file object instance. This object
has methods to read and write data. You might be tempted
to name a variable file. Try not to, since file is the name of
a built-in type in Python 2.
Common variable names for file objects that do not override
built-ins are fin (file input), fout (file output), fp (file pointer,
used for either input or output) or names such as passwd_file.
Names like fin and fout are useful because they indicate
whether the file is used for reading or writing respectively.
21.4. Writing files
Tip
In addition to typing a few fewer characters for line iteration,
the __iter__ method has an additional benefit. Rather than
loading all the lines into memory at once, it reads the lines
one at a time. If you happen to be examining a large log file,
readlines will read the whole file into memory. If the file is
large enough, this might fail. However, looping over lines in
the file with __iter__ will not fail even for large text files since
it only reads one line from a file at a time.
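A sketch of line-by-line iteration. It first writes a small file with made-up contents into a temporary directory so that the example is self-contained:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'names.txt')

fout = open(path, 'w')
fout.write('George\nRingo\n')
fout.close()

line_count = 0
fin = open(path)
for line in fin:  # the file object hands back one line at a time
    line_count += 1
fin.close()
```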
Note
If you want to include newlines in your file you need to
explicitly pass them to the file methods. Strings passed into
write should end with \n. Likewise, each of the strings in the
sequence that is passed into writelines should end in \n. On
Windows, the newline string on disk is \r\n, and the linesep
string found in the os module holds the correct newline string
for the platform. For files opened in text mode (the default),
write a plain \n on every platform; it is translated to the
platform newline for you.
Tip
If you are trying this out on the interpreter right now, you
may notice that the /tmp/names.txt file is empty even though
you told Python to write George in it. What is going on?
File output is buffered by Python. In order to optimize writes
to the storage media, Python will only write data after a certain
threshold has been passed. On Linux systems this is normally
4K bytes.
To force writing the data, you can call the flush method,
which flushes the pending data to the storage media.
A more heavy-handed mechanism, to ensure that data is
written, is to call the close method. This informs Python that
you are done writing to the file:
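A sketch of flush and close, followed by the with form that closes the file automatically. The file lives in a temporary directory rather than /tmp so that the example runs anywhere; the variable names are assumptions:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'names.txt')

fout = open(path, 'w')
fout.write('George\n')
fout.flush()  # push buffered data out of Python's buffer
fout.close()  # closing also flushes any pending data

# the with statement closes the file when its block ends
with open(path, 'a') as fout2:
    fout2.write('Ringo\n')

fin = open(path)
contents = fin.read()
fin.close()
```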
Notice that the with line ends with a colon. Indented content
following a colon is a block. In the above example, the block consisted
of writing Ringo to a file. Then the block finished. At this point the
context manager kicks in. The file context manager tells Python to
automatically close the file for you when the block is finished.
Tip
Use the with construct for reading and writing files. It is
a good practice to close files, and if you use with you do not
have to worry about it, since it automatically closes the files
for you.
21.6 Designing around files
This code will probably work okay. But what will happen when
the requirement comes to insert line numbers in front of lines not
coming from files? Or if you want to test the code, now you need to
have access to the file system. One way around this is to write
add_numbers similar to above, but have it call another function,
add_nums_to_seq that actually adds the line numbers to a sequence:
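A sketch of that split. The function names come from the text, but their signatures, the numbering format, and the return type are assumptions:

```python
def add_nums_to_seq(seq):
    """Return the items of seq, each prefixed by its line number."""
    results = []
    for num, line in enumerate(seq):
        results.append('{0}-{1}'.format(num, line))
    return results

def add_numbers(filename):
    """Number the lines of the file named filename."""
    with open(filename) as fin:
        # a file object is just another sequence of lines
        return add_nums_to_seq(fin)

# the sequence version can be exercised without touching the disk
numbered = add_nums_to_seq(['first\n', 'second\n'])
```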
Hint
Actually there are other types that implement the file-like
interface (read and write). Anytime you find yourself coding
with a filename, ask yourself if you may want to apply the
logic to other sequence like things. If so, use the previous
example of nesting the functions to obtain code that is much
easier to reuse.
Chapter 22
Classes
You have read about objects, such as strings, files, and integers. In
Python almost everything is an object (keywords such as in are not
objects). This chapter will delve deeper into what an object really is.
Object is a somewhat ambiguous term. When you hear about
“Object Oriented Programming”, it means grouping together data
(state) and methods (functions to alter state). Many object oriented
languages such as C++, Java, and Python use classes to define state
and methods. Whereas classes are the definition of the state and
methods, instances are occurrences of said classes.
For example, in Python, str is the name of the class used to store
strings. The str class defines the methods of strings.
You can create an instance of the str class by using Python’s
literal string syntax:
# an instance of a String
>>> "I'm a string"
"I'm a string"
Note
The str class can also be used to create strings, but is nor-
mally used for casting. It is a bit of overkill to pass in a string
literal into the str class.
There are a few things to notice here. First, the keyword class
indicates the definition of a class. This is followed by the name of the
class—Animal. In Python 2.x, the class name is normally followed
by a superclass, (object) in this case.
The last character on the line is the colon (:). Remember that a
colon indicates that you are going to define a block. In this case the
block will be the code describing the state and methods on the class.
Note that everything below that first colon is indented.
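A sketch of a class definition matching this description. The class name Animal, the name attribute, and the talk method appear in the text; the method bodies and the wording of the output are assumptions, and talk returns its string here rather than printing it:

```python
class Animal(object):
    """Hypothetical reconstruction of the class described above."""

    def __init__(self, name):
        # state is stored on the instance
        self.name = name

    def talk(self):
        """Return what the animal says."""
        return '{0} says, "generic sound"'.format(self.name)

animal = Animal('thing')
sound = animal.talk()
```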
Note
Class names are normally camel cased. Unlike functions
where words are joined together with underscores, in camel
casing, you capitalize the first letter of each word and then
shove them together. Normally class names are nouns. In
Python they cannot start with numbers. The following are
examples of class names both good and bad:
• Kitten # good
• jaguar # bad - starts with lowercase
• SnowLeopard # good - camel case
• White_Tiger # bad - has underscores
• 9Lives # bad - starts with a number
Inside of the body of a class are methods. You have already seen
many methods, such as format on a string. Methods are simply
22.2. Creating an instance of a class
Note
Classes and their methods can have docstrings. These are
useful while browsing code, and are accessible from the REPL
by invoking help on the class name:
Note that an instance of Animal has the attributes for both the
data bound to the instance (in the __init__ method), as well as any
methods defined. You see the name attribute for the data and the
talk attribute for the method.
Chapter 23
Subclassing a Class
When Animal was defined, the first line defining the class was
class Animal(object). This tells Python to create a class Animal
that is more specialized than object, the base class. Here the line
defining the class, class Cat(Animal), indicates the definition of a
new class, Cat, that is more specific than the base class of Animal.
Because Cat defines talk, that method is overridden. The
constructor, __init__, was not overridden, so Cat uses the same
constructor as its superclass.
By overriding methods, you can further specialize a class. If you
do not need to override, you can encourage code reuse, which is
nice because it eliminates typing, but also can eliminate bugs.
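A self-contained sketch of the override. The class names come from the text; the method bodies are assumptions, and talk returns its string instead of printing it:

```python
class Animal(object):
    def __init__(self, name):
        self.name = name

    def talk(self):
        return '{0} says, "generic sound"'.format(self.name)

class Cat(Animal):
    # __init__ is inherited from Animal; only talk is overridden
    def talk(self):
        return '{0} says, "Meow!"'.format(self.name)

cat = Cat('Groucho')
sound = cat.talk()
```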
Here is an example of instantiating a cat and calling a method
on it:
>>> cat = Cat("Groucho")
>>> cat.talk() # invoke method
Groucho says, "Meow!"
23.1 Superclasses
The class definitions of Animal and Cat indicated that their
superclasses are object and Animal respectively. If you are coming
from a language like Java, this might be somewhat confusing. Why
should Animal have a superclass, when it should be the base class?
In reality, using object as the superclass of Animal is not required
in Python 2.x, but Python 2.2 introduced a change to allow subclass-
ing of lists and dicts, which changed the underlying implementation
for classes. This change introduced new-style classes which should
derive from object. The original classic class is still available in
Python 2.x, when a class is defined without a baseclass:
But classic classes cannot derive from the dict or list types, nor
can they have properties defined on them. The method resolution order,
which determines which parent class is searched first for a method,
also differs between the two types of classes.
Tip
Superclasses can be confusing because they have changed
in Python 2 and again in Python 3.
In Python 3.x the (object) is not required. If you do not
use it in Python 2.x you will create what is known as a “classic”
class. In this case the old school style is bad, so use (object)
in 2.x. But in 3.x you do not need to. Yes, Python 3.x cleaned
up that wart, but it is now somewhat confusing.
23.2. Calling parent class methods
Note
The semantics of super are interesting. You pass in the
name of the class (TomCat) and the class instance (self), and
super returns a proxy for the superclass, on which you can call
the superclass’ method. Also, super only works on new-style
classes.
In Python 3 the semantics of super have been simplified
and super().talk() would work on the example above.
There are two cases where super really comes in handy.
One is in classes that have multiple parents: super follows
the method resolution order (MRO), which guarantees that
parent methods are called in a consistent order. The other is
when you change the base class: super determines the new base
for you, so the calls do not have to be edited. This aids in
code maintainability.
The old school way (before Python 2.2, when super was
introduced) was just to invoke the method on the parent. It
still works in a pinch on all Python versions:
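Both spellings can be sketched as follows. The Animal, Cat and TomCat bodies are assumptions reconstructed from the surrounding text rather than the book's exact listing, AlleyCat is a hypothetical name added to show the older style, and talk returns its text here (instead of printing it) so the result is easy to check:

```python
class Animal(object):
    def __init__(self, name):
        self.name = name

    def talk(self):
        return "Generic Animal Sound"


class Cat(Animal):
    def talk(self):
        return '%s says, "Meow!"' % self.name


class TomCat(Cat):
    def talk(self):
        # Two-argument form: works on Python 2.2+ and on Python 3
        parent_sound = super(TomCat, self).talk()
        return parent_sound + " Burp!"


class AlleyCat(Cat):
    def talk(self):
        # Old-school form: name the parent class and pass self by hand
        parent_sound = Cat.talk(self)
        return parent_sound + " Hiss!"


print(TomCat("Tom").talk())
print(AlleyCat("Scruffy").talk())
```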
Chapter 24
Exceptions
>>> 3/0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
The above states that in line 1 there was a divide by zero error.
When you execute a program with an exception, the stack trace
will indicate the file name and line number of where the problem
occurred.
>>> numerator = 10
>>> divisor = 0
>>> if divisor != 0:
...     result = numerator / divisor
... else:
...     result = None
Note
Note that None is used to represent the undefined state.
This is a common idiom throughout Pythondom. Be careful,
though, not to invoke methods on a variable that contains
None.
>>> numerator = 10
>>> divisor = 0
>>> try:
...     result = numerator / divisor
... except ZeroDivisionError as e:
...     result = None
Notice that the try construct creates a block following the try
statement (because there is a colon and indentation). Inside of the
try block are the statements that might throw an exception. If the
statements actually throw an exception, Python looks for an except
block that catches that exception. Here the except block states that
it will catch any exception that is an instance (or subclass) of the
ZeroDivisionError class. If an error is thrown in the try block, the
except block is executed and result is set to None.
Tip
Try to limit the scope of the try block. Instead of including
all of the code in a function inside a try block, put only the
line that will possibly throw the error.
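As a sketch of that advice (average and its argument are hypothetical names), only the division sits inside the try block, so an unrelated bug elsewhere in the function cannot be silently swallowed by the handler:

```python
def average(values):
    total = sum(values)        # not expected to raise ZeroDivisionError
    try:
        result = total / float(len(values))   # the one risky line
    except ZeroDivisionError:
        result = None
    return result


print(average([1, 2, 3]))
print(average([]))
```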
24.3. Multiple exceptional cases
Tip
Which method is better? It depends. If you find yourself
running into exceptions often it is possible that look before
you leap might be favorable. Raising exceptions and catching
them is a relatively expensive operation.
>>> try:
...     some_function()
... except ZeroDivisionError as e:
...     pass  # handle specific
... except Exception as e:
...     pass  # handle others
>>> try:
...     some_function()
... except Exception as e:
...     pass  # handle errors
... finally:
...     pass  # cleanup
>>> try:
...     print 'hi'
... except Exception as e:
...     print 'Error'
... else:
...     print 'Success'
... finally:
...     print 'at last'
hi
Success
at last
Normally you will not raise the generic BaseException class, but
will raise subclasses that are predefined, or define your own.
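A brief sketch of both options: raising a predefined exception and defining a small subclass of your own. The names check_age and NegativeAgeError are hypothetical, chosen only for illustration:

```python
class NegativeAgeError(ValueError):
    """Raised when an age below zero is supplied."""


def check_age(age):
    if not isinstance(age, int):
        # a predefined exception is enough here
        raise TypeError("age must be an integer")
    if age < 0:
        # a custom subclass lets callers catch this case specifically
        raise NegativeAgeError("age cannot be negative: %d" % age)
    return age


for candidate in (42, -1, "old"):
    try:
        print(check_age(candidate))
    except NegativeAgeError as e:
        print("custom: %s" % e)
    except TypeError as e:
        print("builtin: %s" % e)
```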
24.7. Defining your own exceptions
Chapter 25
Importing libraries
The previous chapters have covered the basic constructs for Python.
In this chapter you’ll learn about importing code. Many languages
have the concept of libraries or reusable chunks of code. Python
comes with “batteries included”, which really means that the
standard libraries that come included with Python should allow you
to do a lot without having to look elsewhere.
To use libraries you have to load the code into your namespace. The
namespace holds the functions, classes, and variables (i.e., “names”)
your module has access to. For example, the built-in math library
has a sin function that calculates the sine of an angle expressed in
radians:
>>> from math import sin
>>> sin(0)
0.0
The above code loads the sin function from the math module into
your namespace. If you do this from the REPL as illustrated above,
you have access to the sin function from the REPL. If you include
that code in a file, code in that file should now have access to the
sin function.
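The other common form imports the library itself and reaches its functions as attributes of the module. A short sketch:

```python
import math

# functions are accessed through the module name
print(math.tan(0))
```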
In the above we imported the math library and invoked its tan
function.
Tip
When would you import a function using from, and when would
you import a library using import? If you are using only a
couple of attributes of a library, a from style import is
convenient. It is possible to specify multiple comma-delimited
attributes in the from construct:
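For example (a sketch; the two names are arbitrary picks from the math module):

```python
from math import sin, cos

print(sin(0), cos(0))
```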
Note
Prior to Python 2.4 an import line could not span multiple
lines unless backslashes were used to escape line endings:
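Both spellings are sketched below. The parenthesized form is the one Python 2.4 added; the imported names are arbitrary picks from the math module:

```python
# Backslash continuation: the only option before Python 2.4
from math import sin, \
    cos

# Parenthesized form, available since Python 2.4
from math import (tan,
                  sqrt)

print(sin(0), cos(0), tan(0), sqrt(4))
```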
25.3. Star imports
Tip
The as keyword can also be used to eliminate typing. If
your favorite library has overly long and verbose names you
can easily shorten them in your code. Users of the Numpy¶
library have adopted the standard of reducing keystrokes by
using a two letter acronym:
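The shape of such an import is sketched below. Because Numpy is a third-party package that may not be installed, the runnable lines use a long-named standard library module as a stand-in; the widely used Numpy spelling appears only as a comment:

```python
# Third-party convention (requires Numpy to be installed):
#     import numpy as np

# Same idea with a standard library module
import xml.etree.ElementTree as ET

root = ET.fromstring("<animals><cat/><dog/></animals>")
print([child.tag for child in root])
```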
¶ numpy.scipy.org
‖ pandas.pydata.org
Notice that the above calls the arc sine, which has not yet been
defined. The line where asin is invoked is the first reference to asin
in the code. What happened? When you say from library import *,
it tells Python to throw everything from the library (class
definitions, functions, and variables) into the local namespace.
While this might appear handy at first glance, it is quite dangerous.
Star imports make debugging harder, because it is not explicit
where code comes from. Even worse are star imports from multiple
libraries. Subsequent library imports might override something
defined in an earlier library. As such star imports are discouraged
and frowned upon by many Python programmers.
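The clobbering problem can be sketched with two standard library modules that both define sqrt. The star imports are run with exec against a scratch dictionary (an arrangement chosen for this sketch, so the resulting namespace can be inspected) rather than against the real module namespace:

```python
import cmath
import math

namespace = {}
exec("from math import *", namespace)
print(namespace["sqrt"] is math.sqrt)    # math's version is visible

exec("from cmath import *", namespace)
print(namespace["sqrt"] is cmath.sqrt)   # silently replaced

# asin was never imported by name, yet it is available
print("asin" in namespace)
```

Nothing in the second import line hints that sqrt changed meaning, which is what makes this kind of bug hard to track down.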
Tip
Do not use star imports!
Note
The possible exceptions to this rule are when you are writing
your own testing code, or messing around in the REPL.
Library authors do this as a shortcut to importing everything
from the library that they want to test. But just because you see
it in testing code, do not be tempted to use it in other places.
Notice that the from construct allows importing only the functions
and classes needed. Using the import construct would require
more typing (but also allow access to everything from the package):
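A sketch of the comparison, using the standard library's xml.dom.minidom as an assumed example of a nested library:

```python
# from style: only the needed function, short to call
from xml.dom.minidom import parseString

dom = parseString("<xml><foo/></xml>")
print(dom.documentElement.tagName)

# import style: more typing at every call, but the whole module is reachable
import xml.dom.minidom

dom = xml.dom.minidom.parseString("<xml><foo/></xml>")
print(dom.documentElement.firstChild.tagName)
```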
25.5. Import organization
Tip
It is useful to organize the grouped imports alphabetically.
Tip
It can be useful to postpone some imports to:
Chapter 26
Libraries: Packages and Modules
26.1 Modules
Modules are just Python files that end in .py, and have a name that
is importable. PEP 8 states that module filenames should be short
and in lowercase. Underscores may be used for readability.
26.2 Packages
A package in Python is a directory that contains a file named __init__.py.
The file named __init__.py can have any implementation it pleases
or it can be empty. In addition the directory may contain an arbitrary
number of modules and subpackages.
Here is an example from a portion of the directory layout of
the popular sqlalchemy project (an Object Relational Mapper for
databases):
sqlalchemy/
    __init__.py
    engine/
        __init__.py
        base.py
    schema.py
26. Libraries: Packages and Modules
or:
or:
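As a sketch of the alternative spellings, here are three ways to reach code that lives inside a package. The standard library's logging package stands in for sqlalchemy (an assumption made so the block runs without third-party installs):

```python
# 1. Import the module by its full dotted name
import logging.handlers

# 2. Import the module from its package
from logging import handlers

# 3. Import a single name out of the module
from logging.handlers import RotatingFileHandler

# All three refer to the same objects
print(handlers is logging.handlers)
print(RotatingFileHandler is handlers.RotatingFileHandler)
```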
26.4 PYTHONPATH
PYTHONPATH is simply an environment variable listing non-standard
directories in which Python looks for modules or packages. It is
usually empty by default. It is not necessary to change this unless
you are developing code and want to use libraries that have not been
installed.
If you had some code in /home/test/a/plot.py, but were working
out of /home/test/b/, using PYTHONPATH allows access to that
code. Otherwise, if plot.py was not installed using system or Python
tools, trying to import it would raise an ImportError:
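The failure, and the effect of making the directory visible, can be sketched in a single process. The module name plot_demo and the temporary directory are hypothetical stand-ins for plot.py and /home/test/a, and appending to sys.path here plays the role that PYTHONPATH plays when the interpreter starts:

```python
import os
import sys
import tempfile

lib_dir = tempfile.mkdtemp()                 # stands in for /home/test/a
with open(os.path.join(lib_dir, "plot_demo.py"), "w") as fout:
    fout.write("def draw():\n    return 'plotted'\n")

try:
    import plot_demo                         # directory not searched yet
except ImportError as e:
    print("ImportError: %s" % e)

sys.path.append(lib_dir)                     # now Python knows where to look
import plot_demo

print(plot_demo.draw())
```

From a Unix shell, the equivalent would be along the lines of PYTHONPATH=/home/test/a python script.py, where script.py is whatever file does the import.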
Tip
Python packages can be installed via package managers,
Windows executables or Python-specific tools such as pip or
easy_install.
26.5 sys.path
sys.path is accessible after importing the sys module that comes
with Python. It lists all the directories that are scanned for Python
modules and packages. If you inspect this variable you will see all
the locations that are scanned. It might look something like this:
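A quick way to inspect it is sketched below. The directories printed depend entirely on the local installation, so no sample output is shown:

```python
import sys

for directory in sys.path:
    print(directory)
```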
Tip
If you find yourself encountering an ImportError complaining
that there is no module named foo, use sys.path to see if it
has the directory holding foo.py (if it is a module) or the
parent of the foo/ directory (in the case of a package):
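One way to automate that check is sketched below. The helper name find_on_path is hypothetical; it simply reports which sys.path entries hold a foo.py file or a foo/ directory:

```python
import os
import sys


def find_on_path(name, search_path=None):
    """Return the directories that contain name.py or a name/ directory."""
    if search_path is None:
        search_path = sys.path
    hits = []
    for directory in search_path:
        # an empty entry in sys.path means the current directory
        base = os.path.join(directory or os.curdir, name)
        if os.path.isfile(base + ".py") or os.path.isdir(base):
            hits.append(directory)
    return hits


# An empty list means no directory on sys.path can supply the module
print(find_on_path("foo"))
```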
Chapter 27
A complete example
27.1 cat.py
Here are the contents of the Python implementation of cat. It only
includes an option for adding line numbers (--number), but none of
the other cat options:
import argparse
import logging
import sys
__version__ = '0.0.1'
logging.basicConfig(
    level=logging.DEBUG)
    if args.run_tests:
        import doctest
        doctest.testmod()
    else:
        cat = Catter(args.files, args.number)
        cat.run(sys.stdout)
        logging.debug('done catting')
if __name__ == '__main__':
    main(sys.argv[1:])
27.2 Common layout
Note
The above list is a recommendation. Most of those items
can be in an arbitrary order. And not every file will have all
these items. For instance not every file needs to be runnable
as a shell script.
You are free to organize files how you please, but you do so
at your own peril. Users of your code will likely complain (or
submit patches). You will also appreciate code that follows
the recommendation, since it will be quickly discoverable.
27.3 Shebang
The first line of a file that is also used as a script is usually the
shebang line (#!/usr/bin/env python). On Unix operating systems, this
line is parsed to determine how to execute the script. Thus, this line
is only included in files that are meant to be runnable as scripts.
Note
The Windows platform ignores the shebang line.
Note
Rather than hardcoding a specific path to Python,
/usr/bin/env selects the first python executable found on the
user’s PATH. Tools such as virtualenv will modify your PATH
to use a custom python executable.
Tip
If the directory containing the file is present in the user’s
PATH environment variable, and the file is executable, then the
file name alone is sufficient for execution from a shell.
27.4 Docstring
A module may have a docstring as the first piece of code. Since a
docstring serves as an overview of the module, it should contain a
basic summary of the code. Also it may contain examples of using
the module.
Tip
Python contains a library, doctest, that can verify examples
from an interactive interpreter. Using docstrings that contain
REPL code snippets can serve both as documentation and
simple sanity tests for your library.
cat.py includes doctest code at the end of its docstring.
When cat.py runs with --run-tests, the doctest library will
check any docstrings and validate the code found in them.
Normally a non-developer end user would not see options
for running tests in a script. In this case it is included as an
example of using doctest.
27.5 Imports
Imports are usually included at the top of Python modules. The
import lines are normally grouped by location of library. First come
any libraries found in the Python standard library. Next come third
party libraries. Finally listed are libraries that are local to the current
code. Such organization allows end users of your code to quickly
see imports, requirements, and where code is coming from.
Note
Though the Python language does not have support for
constant data types, globals are often used to indicate that a
variable should be constant.
Note
By defining constants as globals, and using well thought-out
variable names, you can avoid a problem found in programming:
“magic numbers”. A magic number is a number sitting in code or
formulas that is not stored in a variable. That in itself is bad
enough, especially when someone else starts reading your code.
Another problem with magic numbers is that the same
value tends to propagate through the code over time. The
solution to both these problems (context and repetition) is to
put the value in a named variable. Having them in a variable
gives context and naming around the number. It also allows
you to easily change the value in one place.
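A before-and-after sketch of that advice. The function names and the value 0.07 are invented for illustration:

```python
# Before: what does 0.07 mean, and where else is it repeated?
def total_with_magic_number(price):
    return price + price * 0.07


# After: the value has a name, a single home, and some context
SALES_TAX_RATE = 0.07


def total(price):
    return price + price * SALES_TAX_RATE


print(total(100.0))
```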
Note
It is a good idea to define a version for your library if you
intend on releasing it to the wild. PEP 386 suggests best practices
for how to declare version strings.
27.7 Logging
One more variable that is commonly declared at the global level is
the logger for a module. The Python standard library includes the
logging library, which allows you to report different levels of
information in well-defined formats.
Multiple classes or functions in the same module will likely need
to log information. It is common to just do initialization once at
the global level and then reuse the logger handle that you get back
throughout the module.
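A minimal sketch of that pattern. The logger name and the two functions are hypothetical; logging.basicConfig and logging.getLogger are the standard library calls:

```python
import logging

logging.basicConfig(level=logging.DEBUG)

# created once at the global level, reused by everything in the module
logger = logging.getLogger('cat_sketch')


def open_files(names):
    logger.debug('opening %d file(s)', len(names))
    return list(names)


def close_files(names):
    logger.debug('closing %d file(s)', len(names))
    return len(names)


close_files(open_files(['a.txt', 'b.txt']))
```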
27.9 Implementation
Following any global and logging setup comes the actual meat of
the code—the implementation. This is accomplished by defining
functions and classes. The Catter class would be considered the
core logic of the module.
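Since the listing itself is not reproduced here, the following is only a guess at what such a class might look like, based on how it is called elsewhere in this chapter: Catter(files, number) followed by run(output). The attribute names and the number format are assumptions:

```python
import io


class Catter(object):
    def __init__(self, files, show_numbers=False):
        self.files = files              # iterable of open file objects
        self.show_numbers = show_numbers

    def run(self, fout):
        # right-align line numbers in a six character column
        fmt = '{0:>6} {1}'
        count = 1
        for fin in self.files:
            for line in fin:
                if self.show_numbers:
                    fout.write(fmt.format(count, line))
                    count += 1
                else:
                    fout.write(line)


source = io.StringIO(u'first\nsecond\n')
sink = io.StringIO()
Catter([source], show_numbers=True).run(sink)
print(sink.getvalue())
```

Passing the output stream to run, rather than writing to sys.stdout directly, keeps the class easy to exercise with an in-memory buffer as shown.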
27.10 Testing
Normally bona fide test code is separated from the implementation
code. Python allows a small exception to this. Python docstrings
can be defined at module, function, class, and method levels. Within
docstrings, you can place Python REPL snippets illustrating how
to use the function, class or module. These snippets, if well crafted
and thought-out, can be effective in documenting common usage
of the module. In addition, Python includes a library, doctest, that
allows testing and validation of Python REPL snippets.
Another nice feature of doctest is validation of documentation.
If your snippets once worked, but now they fail, either your code
has changed or your snippets are wrong. You can easily find this
out before end users start complaining to you.
Tip
doctest code can be in a stand-alone text file. To execute
arbitrary files using doctest, use the testfile function:
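A runnable sketch: the snippet writes a small text file (its name and contents are made up for this example) and hands it to doctest.testfile. Passing module_relative=False tells doctest to treat the argument as an ordinary filesystem path:

```python
import doctest
import os
import tempfile

text = """\
Plain prose is ignored; only the REPL-style lines are executed.

>>> 1 + 1
2
>>> 'meow'.upper()
'MEOW'
"""

path = os.path.join(tempfile.mkdtemp(), 'examples.txt')
with open(path, 'w') as fout:
    fout.write(text)

results = doctest.testfile(path, module_relative=False)
print(results)
```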
Note
In addition to doctest, the Python standard library includes
the unittest module that implements the common xUnit style
methodology: setup, assert, and teardown. There are pros and
cons to both doctest and unittest styles of testing. doctest
tends to be more difficult to debug, while unittest contains
boilerplate code that is regarded as too Java-esque. It is
possible to combine both to achieve well documented and well
tested code.
if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]) or 0)
27.12 __name__
Python defines the module level variable __name__ for any module
you import, or any file you execute. Normally __name__’s value is
the name of the module:
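For instance (a sketch using two standard library modules):

```python
import logging.handlers
import sys

print(sys.__name__)
print(logging.handlers.__name__)
```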
Note
It is easy to illustrate __name__. Create a file,
some_module.py, with the following contents:
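The file's listing is not reproduced here; a one-line sketch that would produce the output shown next is:

```python
print("The __name__ is: {0}".format(__name__))
```

Importing the same file from another module or from the REPL, instead of executing it, would report some_module rather than __main__.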
$ python some_module.py
The __name__ is: __main__
if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]) or 0)
This simple statement will run the main function when the file is
executed. Conversely, if the file is used as a module, main will not
be run automatically. It calls sys.exit with the return value of main
(or 0 if main does not return an exit code) to behave as a good
citizen in the Unix world.
Tip
Some people place the execution logic that would otherwise
live in a main function directly under the
if __name__ == '__main__': test. Reasons to actually put the
logic in a function include:
Chapter 28
Appendix A: Testing in Python
This section will discuss testing strategies using unittest and doctest
in Python. It will assume you want to test the latest internet
sensation: an integer generation site called Integr! Imagine a web-based
service that could convert a string like “2,5,8” to an actual list of the
numbers 2, 5 and 8. Integers are quite useful for activities such as
counting and if you hurry you can beat the market. Heavy thick
clients have used Integr technology in the “Select Pages” box of
printer dialogs for ages. Yet no one has had the foresight to bring
this functionality to the internet. The critical feature you need to
start out is the ability to demarcate individual integers by commas!
This chapter will consider implementing the basic logic of the
integer web-service using tests.
Detour to unittest
Out of the box Python provides unittest, a library that implements
the xUnit pattern. If you are familiar with Java, JUnit is an
implementation of this same paradigm. The basic idea is to start
from a well known state, call the code you want to test, and assert
that something has happened. If the assertion fails, the code is
broken and needs to be fixed. Otherwise the assertion passes and all
is well. Groups of tests can be collected into “suites” and the
execution of these suites can be easily automated.
Here is some code for testing the initial spec:
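import unittest
import integr
class TestIntegr(unittest.TestCase):  # class name is an assumption
    def test_basic(self):
        results = integr.parse('1,3,4')
        self.assertEqual(results, [1, 3, 4])
if __name__ == '__main__':
    unittest.main()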
unittest details
Unit tests are meant to test a single unit of code. A unit test tests a
single method or function (if your code is broken down into functions
that perform a single concrete action). The goal of unit testing
is to ensure that the individual chunks of code are acting as they
should. There are other types of testing as well, such as performance
and integration testing, whose goals are different from those of unit
tests. Rather than verifying logic of small blocks, these tests would
measure performance and ensure that disparate systems work in
harmony.
Creating a unit test in Python is easy. At the most basic level you
subclass the unittest.TestCase class and implement methods that
start with test.
Note
Any method that starts with test will be treated as a unit
test. Even though PEP 8 encourages one to use underscores
(_) between words in function and method names, unittest
supports camel casing as well. In fact the implementation of
unittest itself disregards the PEP 8 naming recommendations
in favor of following a more Java-like style.
Assertion Methods
Method signature                                Explanation
assert_(expression, [message])                  Complains if expression is False
assertEqual(this, that, [message])              Complains if this != that
assertNotEqual(this, that, [message])           Complains if this == that
assertRaises(exception, callable, *args, **kw)  Complains if callable(*args, **kw) does not raise exception (made a context manager in 2.7)
fail([message])                                 Complains immediately
There are other methods, but these are what you will be using 99%
of the time for assertion. Python 2.7 added many new methods for
assertions, cleanup during exceptional cases in setUp and tearDown,
decorators for skipping tests on certain platforms and test discovery.
$ python testintegr.py
Traceback (most recent call last):
  File "testintegr.py", line 1, in <module>
    import integr
ImportError: No module named integr
After writing your code, re-run your test. Does it work? If so,
you are off on the right track to creating the next internet sensation.
Being controlled by testing
Handling Whitespace
To implement whitespace handling utilizing TDD, first add another
test method to testintegr.py:
Now when you run your test suite, the original test_basic method
should pass and test_spaces should fail. Having a failing test case
gives you a clear direction of what you need to develop. This is a
compelling feature of TDD. Not only do you have tests, but if you
develop the tests of the required functionality first, you will spend
your time working on the desired functionality. Writing the test first
also allows you to think about the API of your code.
This is not to say that the TDD methodology is a panacea. It
certainly requires a small amount of determination (or gentle
persuasion by a manager) to implement, but this upfront effort has
both short- and long-term benefits.
Exception Handling
Suppose input is unparseable and the spec says that integr should
raise a BadInput exception. unittest provides a few ways for that:
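Two of those ways are sketched below: passing the callable to assertRaises, and (on Python 2.7 and later) using assertRaises as a context manager. BadInput and parse are inline stand-ins for what the integr module would provide, written here as assumptions so the block is self-contained:

```python
import unittest


class BadInput(Exception):
    pass


def parse(text):
    try:
        return [int(piece.strip()) for piece in text.split(',')]
    except ValueError:
        raise BadInput('%r not valid' % text)


class TestBadInput(unittest.TestCase):
    def test_callable_form(self):
        # assertRaises(exception, callable, *args)
        self.assertRaises(BadInput, parse, 'abcd')

    def test_context_manager_form(self):
        # Python 2.7+: the call goes inside the with block
        with self.assertRaises(BadInput):
            parse('abcd')


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestBadInput)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful())
```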
Note
The new unittest features found in Python 2.7+ are available
as a third-party package for Python 2.4+ users. This package
is called unittest2 and is available on PyPI.
This sounds like someone had some gripes with the lack of both
documentation and testing in software projects. The above points
were actually taken from a snippet of a post that Tim Peters wrote
to the comp.lang.python newsgroup in 1999 to enumerate the reasons
for creating a new module called doctest.
Mr. Peters makes some good points. It is true, examples are
priceless. How many people code simply by copying and pasting
examples? How much software has good documentation? Or any
documentation? Most developers are not concerned with
documentation, so documentation is commonly nonexistent. And many
developers do not like testing. This oftentimes puts one in a pickle
when picking up a new library.
Since the time of Mr. Peters’ post, newer programming
methodologies with more emphasis on testing have come out, so the
situation might be better. Nonetheless, the ideas are compelling:
what if you could somehow provide examples and documentation that
also could serve as test cases?
Test my docstrings
Docstrings can be located on a module, class, method or function.
While they can serve as examples of the API, they can also serve as
tests. Creating a doctest is quite simple. Any docstring that has >>>
followed by Python code can serve as a doctest. If you are developing
with a REPL alongside, you can simply copy and paste the code out
of it. Here is a trivial example of creating docstrings and executing
them with the doctest library:
def add_10(x):
    """
    adds 10 to the input value

    >>> add_10(5)
    15
    >>> add_10(-2)
    8
    """
    return x + 10
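For the file to check its own docstrings when executed, it also needs to invoke doctest at the bottom. A sketch of the complete module (the file name add10.py is taken from the failure report shown later in this section):

```python
def add_10(x):
    """
    adds 10 to the input value

    >>> add_10(5)
    15
    >>> add_10(-2)
    8
    """
    return x + 10


if __name__ == "__main__":
    import doctest
    doctest.testmod()
```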
If you run this module from the command line, apparently nothing
happens. A tweak to the code will illustrate what is going on
here. Change the 8 to a 6:
def add_10(x):
    """
    adds 10 to the input value

    >>> add_10(5)
    15
    >>> add_10(-2)
    6
    """
    return x + 10
*******************************************
File "add10.py", line 7, in __main__.add_10
Failed example:
    add_10(-2)
Expected:
    6
Got:
    8
*******************************************
1 items had failures:
   1 of   2 in __main__.add_10
***Test Failed*** 1 failures.
100% line code coverage for their products with it. (So does that
mean their documentation was complete since it effectively tests
every line of code?)
integr doctests
The same testing done by unittest could be done with the following
Python file, integrdoctest.py:
"""
This is an explanation of integr

>>> import integr
>>> integr.parse('1,3,4')
[1, 3, 4]

Gracefully fails!

>>> integr.parse('abcd')
Traceback (most recent call last):
  ...
BadInput: 'abcd' not valid
"""
if __name__ == "__main__":
    import doctest
    doctest.testmod()
unittest or doctest?
At this point you may be asking “which style of testing should I use?”
While unittest follows a well known paradigm, doctest appears to
have many benefits. Who wouldn’t want documentation, examples
and tests?
First of all, using one library does not preclude using the other.
However, doctest rubs some people the wrong way. It is certainly
nice to be able to copy a chunk of interactive code, paste it into a
docstring and turn it into a test. But when those chunks are more
than a screenful, or those tests end up being longer than the
implementation, maybe they are overkill.
As doctest tries to combine documentation, examples and testing,
it comes off as a jack-of-all-trades and master-of-none. For
example, doctest does not include built-in support for setup and
teardown, which can make some doctests unnecessarily verbose.
Also, trying to do extensive testing of corner cases can be distracting
when using the docstrings as examples only.
import doctest
doctest.testfile('/path/to/file')
Note
The author of this book uses doctest to test the examples
in the source of the book. Tooling from the third-party docutils
library allows easy creation of documents, slides, webpages
and handouts that have tested Python code.
Indentation
Which of the following is correct?
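The two candidates are not reproduced here. A sketch of what such a pair might look like follows; the function names former and latter, and the small helper that runs them, are hypothetical, and the examples use Python 3's print function:

```python
import doctest


def former():
    """
    >>> print("foo")
    foo
    """


def latter():
    """
    >>> print("foo")
        foo
    """


def count_failures(func):
    """Run the doctests in func's docstring; return how many failed."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, func.__name__, globs={}):
        runner.run(test, out=lambda text: None)   # discard the report
    return runner.failures


print(count_failures(former))
print(count_failures(latter))
```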
The former is the correct one. The latter might look right but
doctest will complain with something like this:
Failed example:
    print "foo"
Expected:
        foo
Got:
    foo
Tip
If you forget how to line up your output, just try running
the same code from the interpreter. Since doctest is meant to
be copied and pasted from interactive sessions, usually what
the interpreter says is correct.
One more thing that might be confusing is that the starting column
of a doctest block does not matter (in fact subsequent chunks
might be indented at a different level). It is assumed that the start
of the >>> is the first column for that block. All code for that block
should be aligned to that column.
Blanklines
If the following code were in a doctest, the first test would fail. This
is because doctest has not yet learned how to read your mind. Is
the blank line inserted for clarity or is it actually significant?
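The code in question is not reproduced here. The sketch below shows the same situation with hypothetical function names: in ambiguous, doctest takes the empty line as the end of the expected output, so the test fails; in explicit, the <BLANKLINE> marker that doctest provides states that the empty line is part of the output:

```python
import doctest


def ambiguous():
    r"""
    >>> print("foo\n\nbar")
    foo

    bar
    """


def explicit():
    r"""
    >>> print("foo\n\nbar")
    foo
    <BLANKLINE>
    bar
    """


def count_failures(func):
    """Run the doctests in func's docstring; return how many failed."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, func.__name__, globs={}):
        runner.run(test, out=lambda text: None)   # discard the report
    return runner.failures


print(count_failures(ambiguous))
print(count_failures(explicit))
```

The docstrings are raw strings so that the \n sequences reach doctest as written instead of being turned into real line breaks inside the example.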
Spacing
As you have seen, doctest can be really picky about spacing. One
confusing aspect might be when something like the following fails:
>>> print range(3)
[0,1,2]
When executing the code you would see an error like this:
*******************************************
File "/tmp/foo.py", line 4, in __main__.foo
Failed example:
    print range(3)
Expected:
    [0,1,2]
Got:
    [0, 1, 2]
*******************************************
Tip
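The doctest module has various directives to tell it to
ignore some spacing issues. The #doctest:
+NORMALIZE_WHITESPACE directive treats any run of whitespace
in the expected output as equal to any run of whitespace in
the actual output, so [0,  1,  2] matches [0, 1, 2]. It does
not add whitespace where the expected output has none, so
the [0,1,2] of the previous example still needs a space after
each comma.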
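A sketch of what the directive does and does not forgive, using Python 3's print function and hypothetical function names. NORMALIZE_WHITESPACE collapses each run of whitespace to a single space on both sides before comparing, so uneven spacing and line breaks in the expected output are accepted, while output written with no space at all after the commas still differs from what Python prints:

```python
import doctest


def uneven_spacing():
    """
    >>> print(list(range(3)))  # doctest: +NORMALIZE_WHITESPACE
    [0,   1,
     2]
    """


def no_spacing():
    """
    >>> print(list(range(3)))  # doctest: +NORMALIZE_WHITESPACE
    [0,1,2]
    """


def count_failures(func):
    """Run the doctests in func's docstring; return how many failed."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, func.__name__, globs={}):
        runner.run(test, out=lambda text: None)   # discard the report
    return runner.failures


print(count_failures(uneven_spacing))
print(count_failures(no_spacing))
```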
Here is one that can leave a developer scratching their head for
a while. Say you get the following error in your doctest. What is
wrong? (Hint: doctest compares characters, and this is the “spacing”
section.)
*******************************************
File "/tmp/foo.py", line 6, in __main__.foo
Failed example:
    print range(3)
Expected:
    [0, 1, 2]
Got:
    [0, 1, 2]
*******************************************
If you left extra spaces at the end of the expected result, [0, 1,
2]__, (note that there are two spaces at the end of the line, here they
are replaced by underscores) instead of [0, 1, 2], you could see
this issue.
Start Testing
Testing is part of programming. Whether you do it in a formal
manner or an ad hoc style, Python has tools suited to aid those
efforts.
About the author
Matt Harrison has over 11 years of Python experience across the
domains of search, build management and testing, business
intelligence and storage.
He has presented and taught tutorials at conferences such as
SCALE, PyCON and OSCON as well as local user conferences. The
structure and content of this book is based on firsthand experience
teaching Python to many individuals.
He blogs at hairysun.com and occasionally tweets useful
Python-related information at @__mharrison__.
Index
input, 16
int, 31
integer, 22, 31
interpreter, 7
invocation, 91
is, 63
is not, 63
package, 125
PATH, 132
pdb, 49
PYTHONPATH, 126
raise, 116
range, 72
raw_input, 16
readline, 99
readlines, 100
REPL, 8
return, 90
self, 106
set, 76
shebang, 12, 132
slices, :, 96
sort, 71
sorted, 88
stride, 97
string, 21
strings, 39
super, 110
sys.path, 127